Thursday, October 21, 2010

Terabyte Scale for JRuby and Rails

In this, my inaugural blog post, I'm going to explain how you can use Ehcache and Terracotta from your JRuby and Rails applications.  Ehcache is the most popular open source caching solution in the Java landscape, and Terracotta's Enterprise Ehcache lets you store over a terabyte of data in a fully coherent cache.  Now you can bring that level of scale to the Rails world.

A Little Bit of History

I work at Terracotta and one of the fun things we do there is to have a semi-annual "dev week", during which the whole distributed team gathers together in our San Francisco office to collaborate in person.  During our last dev week we had a product improvement competition and my entry into the competition was a set of Ruby Gems to provide integration between Ehcache and JRuby/Rails.  To do this, I built on top of the work done by Dylan Stamat, who originally created the JRuby Ehcache integration.  As with so many other open-source projects maintained by people in their free time, this project had been lying dormant for some time while Dylan was busy with other obligations.  I decided to pick up where Dylan left off and as a result of this there is a new set of Ruby Gems for working with Ehcache from JRuby, Rails 2, and Rails 3.

Caching in Rails has traditionally been done using memcached, either directly using the Memcache API or using the Rails caching API with a memcached backend.  While this has worked fairly well for most people, there are some limitations to memcached that have led some to seek alternative solutions.  If you limit yourself to solutions that work with the standard C implementation of Ruby (MRI, "Matz's Ruby Interpreter") then memcached is likely your best bet.  However, if you are able to use JRuby, a whole new universe of caching solutions emerges, now including the very popular Ehcache.

Using Ehcache with JRuby

Ehcache JRuby integration is provided by the jruby-ehcache gem.  To install it simply execute:

jgem install jruby-ehcache

Ehcache configuration is done with the ehcache.yml YAML file.  The jruby-ehcache gem includes a default ehcache.yml file, but if you would like to customize it you can copy the ehcache.yml file bundled with the gem and place it in your $HOME/lib/config directory and then edit it as you see fit. (This is no longer true. See my blog post Ehcache for JRuby and Rails: Now with more flavor and fewer calories for information about recent changes.)

Now it is time to start using Ehcache.  Let's take a look at a simple example that uses the Ehcache::CacheManager to create a cache, and then puts some data in the cache and gets it back out.

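require 'ehcache'

# Create a cache manager and get a cache from it
manager = Ehcache::CacheManager.new
cache = manager.cache

# Store an entry with a time to live, then read it back
cache.put("answer", "42", {:ttl => 120})
answer = cache.get("answer")
puts "Answer: #{answer}"

# Fall back to a default for a key that was never stored
question = cache.get("question") || 'unknown'
puts "Question: #{question}"

# Shut down the cache manager when finished
manager.shutdown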

Save this code as "jruby-ehcache-demo.rb" and execute it as follows:

jruby -rubygems jruby-ehcache-demo.rb

As you can see from the example, you create a cache using CacheManager.new, and you can control the "time to live" value of a cache entry using the :ttl option in cache.put.  Note that not all of the Ehcache API is currently exposed in the JRuby API, but most of what you need is available and we plan to add a more complete API wrapper in the future.
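A common way to use the get and put calls shown above is the read-through (cache-aside) pattern: look the key up first, and only compute and store the value on a miss.  The following is a minimal sketch of that idea, not part of the jruby-ehcache gem.  Both names are assumptions made for illustration: fetch_or_compute is a hypothetical helper, and InMemoryCache is a hypothetical stand-in that mimics only the get and put(key, value, :ttl => seconds) calls from the example, so that the sketch runs without JRuby.  In a real application you would pass in the cache returned by manager.cache instead.

```ruby
# Hypothetical stand-in for the cache object; it mimics only get and
# put(key, value, :ttl => seconds) as used in the example above.
class InMemoryCache
  def initialize
    @entries = {}
  end

  def put(key, value, options = {})
    ttl = options[:ttl]
    expires_at = ttl ? Time.now + ttl : nil
    @entries[key] = [value, expires_at]
  end

  def get(key)
    value, expires_at = @entries[key]
    return nil if value.nil?
    if expires_at && Time.now >= expires_at
      @entries.delete(key) # entry has outlived its time to live
      return nil
    end
    value
  end
end

# Hypothetical helper: return the cached value for key, or run the block,
# store its result with the given time to live, and return it.
def fetch_or_compute(cache, key, ttl)
  cached = cache.get(key)
  return cached unless cached.nil?

  value = yield
  cache.put(key, value, {:ttl => ttl})
  value
end

cache = InMemoryCache.new
computations = 0

2.times do
  answer = fetch_or_compute(cache, "answer", 120) do
    computations += 1 # stands in for an expensive lookup
    "42"
  end
  puts "Answer: #{answer}"
end

puts "Computed #{computations} time(s)"
```

Note that this sketch treats a nil result as a miss, so a block that returns nil would be re-run on every call.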

Using Ehcache in Rails Applications

To use Ehcache from a Rails application you must first install the correct gem for your Rails version.

jgem install jruby-ehcache-rails2 # for Rails 2
jgem install jruby-ehcache-rails3 # for Rails 3

Configuration of Ehcache is still done with the ehcache.yml file, but for Rails applications you must place this file in the config directory of your Rails app.  Note that you must use JRuby to execute your Rails application, as these gems utilize JRuby's Java integration to call the Ehcache API.

With this configuration out of the way, you can now use the Ehcache API directly from your Rails controllers and/or models.  You could of course create a new Cache object everywhere you want to use it, but it is better to create a single instance and make it globally accessible by creating the Cache object in your Rails environment.rb file.  For example, you could add the following lines to config/environment.rb:

require 'ehcache'
EHCACHE = Ehcache::CacheManager.new.cache

By doing so, you make the EHCACHE constant available to all Rails-managed objects in your application.  Using the Ehcache API is now just like the above JRuby example.

If you are using Rails 3 then you have a better option at your disposal: the built-in Rails 3 caching API.  This API provides an abstraction layer for caching underneath which you can plug in any one of a number of caching providers.  The most common provider to date has been the memcached provider, but now you can also use the Ehcache provider.  Switching to the Ehcache provider requires only one line of code in your Rails environment file:

config.cache_store = :ehcache_store

A very simple example of the Rails caching API is as follows:

Rails.cache.write("answer", "42")
Rails.cache.read("answer")  # => '42'

Using this API, your code can be agnostic about the underlying provider, or even switch providers based on the current environment (e.g. memcached in development mode, Ehcache in production).
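One way to switch providers by environment is to keep the mapping in one place and look it up when configuring the cache store.  This is only a sketch under stated assumptions: CACHE_STORES and cache_store_for are hypothetical names, and the choice of :memory_store as a fallback is my own.  The :mem_cache_store and :memory_store symbols are the standard Rails store names, and :ehcache_store is the one shown above.

```ruby
# Hypothetical mapping from environment name to cache store symbol.
CACHE_STORES = {
  "development" => :mem_cache_store,
  "production"  => :ehcache_store
}.freeze

# Hypothetical helper: pick the store for an environment, falling back
# to the in-process memory store for anything not listed (an assumption).
def cache_store_for(env)
  CACHE_STORES.fetch(env.to_s, :memory_store)
end

# In a Rails configuration file this could be used as:
#   config.cache_store = cache_store_for(Rails.env)
%w[development production test].each do |env|
  puts "#{env}: #{cache_store_for(env).inspect}"
end
```

Because application code only talks to Rails.cache, nothing outside the configuration needs to change when the store does.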

If you'd like to see a complete example of an Ehcache-enabled Rails application, check out this demo that I wrote to show Ehcache in action:

http://svn.terracotta.org/svn/forge/projects/ehcache-rails-demo/trunk

Conclusion

With Ehcache plugged into your Rails application you have a whole slew of options to address all of those Enterprisey concerns.  With Terracotta's Enterprise Ehcache product you can have the ultimate in scalability by taking advantage of distributed caching or eliminating Java garbage collection pauses with BigMemory.  Plug in a Terracotta Server Array behind your clustered Rails application and you can store hundreds of millions of keys in your cache, and over a terabyte of data.

If you'd like to use Rails for your web app but your boss has concerns about scalability, Ehcache might just be for you.

You can read further documentation on the Ehcache web site, or grab the source from GitHub.

Comments:

  1. Great article. I love to see fusions of technology like this, especially as a day-to-day Rails developer, a massive advocate of JRuby, and a real believer in the power of Java fused with Ruby.

    However, what sort of advantages would the setup above have over a memcached cluster?

    With memcached being so simple to use in a cluster, I wonder where the real advantages would lie in using the above setup instead of the memcached setup that so many Rails devs already use.

  2. Hi Paul, thanks for the comments.

    Memcached is often a "sufficient" solution for many Rails apps. However, there are some limitations that might render it unusable for truly scalable applications. These include:


    * Memcached is not for large media and streaming huge blobs. There is a hard limit of 1 MB per cache entry.
    * Keys must be strings and are limited to 250 characters.
    * Anyone can just telnet to any memcached server. If you're on a shared system, watch out!
    * Memcached is not a persistent store, and doesn’t guarantee something will be in the cache just because you stored it. So you should never rely on the fact that Memcached is storing your data. Memcached should strictly be used for caching purposes only, and not for reliable storage.
    * Memcached is not coherent.
    * Ehcache offers much better performance for many use cases, particularly when your app is working mostly with a "hot set" of data.

  3. Hi Jason,
    thanks for the post.
    BTW, I cannot access the svn repo where the 'Ehcache-enabled Rails application' demo is hosted...

  4. Hi Pietro,

    Sorry, I had mistakenly used the secure read-write URL for the demo instead of the read-only public URL. I have corrected this now so you should be able to get to the demo at this point.

  5. My colleague, Nabib, has written a blog explaining another important advantage of Ehcache over memcached: high availability. Read the blog here:

    http://javasmith.blogspot.com/2010/10/ehcache-bigmemory-simple-high.html

  6. Hi Jason, first of all, thanks for working on this open source project and bringing such a beautiful solution to all of us. Ehcache is indeed the best possible caching solution nowadays, and integration with Ruby is just going to make it more valuable.

    Javin
    10 tips on debugging Java Program in eclipse
