Showing posts with label terracotta. Show all posts

Monday, November 15, 2010

Ehcache for JRuby and Rails: Now with more flavor and fewer calories

In my last blog post I discussed how you can achieve Terabyte scale for JRuby and Rails, and judging from the response I received this is a topic of some interest to Ruby and Rails developers. In that post I explained how it is possible to use Ehcache as a Rails caching provider, or use its API directly from any JRuby application. Since that time, I've done some further work to make Ehcache integration with JRuby and Rails more robust and production ready. In this post I'll describe what's new and improved, including complete coverage of the Ehcache Java API. In a follow-up post I will discuss how to utilize the new Ehcache JRuby and Rails integration for large scale enterprise applications, including fully coherent distributed caching with Ehcache and Terracotta, and also how to use BigMemory for Enterprise Ehcache to make sure your application can handle any load you can throw at it.

More Flavor

In previous iterations of jruby-ehcache we took the approach of providing Ruby wrapper classes to encapsulate the functionality of Ehcache behind a nice Ruby interface. This had the advantage of making the API more idiomatic to Ruby, but it also meant that we did not provide full API coverage and that any time the Ehcache API changed we also had to update our Ruby wrapper API to accommodate the changes. As you can imagine, we weren't entirely satisfied with this approach and so went in search of a better mechanism. It turns out that JRuby's Java integration, combined with Ruby's dynamic open classes, provides exactly what we need for this.

The Java integration provided by JRuby makes it incredibly easy to use any Java API from Ruby code. All that is required is adding a simple require 'java' to your Ruby code and the whole of the Java landscape is opened up to you. We built on top of this in the latest jruby-ehcache gem by having the gem automatically set up the CLASSPATH and invoke require 'java' for you, and now with one line of code you instantly have the complete Ehcache API available to your Ruby application:

require 'ehcache'

With that one line of code in place, you can now use any part of the Ehcache API just as you would in a Java application, and JRuby even provides some extra niceties to make the Java API more Ruby-friendly:

require 'ehcache'
cache_manager = Java::NetSfEhcache::CacheManager.new
cache = cache_manager.getCache("myCache")
cache.put("answer", "42")
answer = cache.get("answer")  # Returns Ehcache Element object
puts "Answer: #{answer.value}"
question = cache.get("question") # Returns nil
if question
  puts "Question: #{question.value}"
else
  puts "I don't know the question"
end

I won't cover every detail of JRuby Java integration (see Calling Java From Ruby on the JRuby wiki for full details), but I do want to point out a couple of important details. First, notice how Java classes are referenced from Ruby code. The expression Java::NetSfEhcache::CacheManager is a reference to the Java class net.sf.ehcache.CacheManager. More generally, any Java class can be accessed within the Java module by removing the dots from its package name and converting it to CamelCase, so the package net.sf.ehcache becomes the module NetSfEhcache. Second, JRuby performs some magic to convert Java method names and JavaBeans property accessors to more Ruby-like equivalents. Thus, you can call the getCache method in any of three ways: getCache, get_cache, or simply cache.
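To make those two naming conventions concrete, here is a small plain-Ruby sketch that mimics them. The helpers jruby_constant_path and ruby_method_names are hypothetical illustrations only; JRuby does this mapping internally and exposes no such methods, and the sketch assumes all-lowercase package names, as is conventional in Java:

```ruby
# Hypothetical helpers that imitate JRuby's naming rules in plain Ruby.
# They are illustrations only; JRuby performs this mapping internally.

# "net.sf.ehcache.CacheManager" -> "Java::NetSfEhcache::CacheManager"
# Assumes lowercase package segments (capitalize lowercases the rest).
def jruby_constant_path(java_class_name)
  *packages, class_name = java_class_name.split('.')
  package_module = packages.map(&:capitalize).join
  "Java::#{package_module}::#{class_name}"
end

# "getCache" -> ["getCache", "get_cache", "cache"]
def ruby_method_names(java_method)
  # camelCase -> snake_case
  snake = java_method.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  names = [java_method, snake]
  # JavaBeans getters also get a bare, property-style name.
  names << snake.sub(/\Aget_/, '') if snake.start_with?('get_')
  names.uniq
end

puts jruby_constant_path('net.sf.ehcache.CacheManager')
puts ruby_method_names('getCache').inspect
```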

That is nice enough, but as any good Rubyist will attest, Java APIs tend to be bloated and difficult to use compared to an equivalent Ruby API. Luckily, we can take advantage of Ruby's dynamic nature and support for open classes to provide a much more Rubyesque API without sacrificing full access to the underlying Java API. For instance, it would be nice if we could use the familiar array access notation to access cache entries, and while we're at it couldn't we also do away with the Ehcache Element object and just access the cache entry value directly? Let's see how this is done.

class Java::NetSfEhcache::Cache
  # Gets an element value from the cache.  Unlike the #get method, this method
  # returns the element value, not the Element object.
  def [](key)
    element = self.get(key)
    element ? element.value : nil
  end
end

# Later...
forty_two = cache['answer']   # Returns the value, not the Element object

Here we open up the net.sf.ehcache.Cache class and add our own custom method to it to provide array access notation. Note that this is not inheritance and we are actually modifying the Cache class directly, so you can now use the [] operator on any Cache object, whether you created it yourself in Ruby code or it was created deep in the bowels of some legacy Java code. The world is yours.
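If you want to try the open-class technique without JRuby or Ehcache at hand, the following plain-Ruby sketch shows the same idea. StandInElement and StandInCache are hypothetical stand-ins for the Ehcache Element and Cache classes, written only so the example runs anywhere. Notice that an object created before the class is reopened picks up the new [] method as well:

```ruby
# Hypothetical stand-ins for net.sf.ehcache.Element and Cache, so this runs
# under plain Ruby without Ehcache on the CLASSPATH.
StandInElement = Struct.new(:key, :value)

class StandInCache
  def initialize
    @store = {}
  end

  # Java-style API: put takes an element, get returns one (or nil).
  def put(element)
    @store[element.key] = element
  end

  def get(key)
    @store[key]
  end
end

# An instance created BEFORE the class is reopened.
existing = StandInCache.new
existing.put(StandInElement.new('answer', '42'))

# Reopen the class (no subclassing involved) and add array-style access.
class StandInCache
  def [](key)
    element = get(key)
    element ? element.value : nil
  end
end

puts existing['answer']             # the pre-existing object gains []
puts existing['question'].inspect   # missing keys come back as nil
```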

Another bit of Ruby goodness we've added in the latest version is to make the Ehcache::CacheManager and Ehcache::Cache classes include the Ruby Enumerable module. Anyone who's done a significant amount of Ruby programming knows how powerful this module is, but for those who might not be familiar with it let's have a look at a few example usages that illustrate its power.

# Find all cache entries with time to live greater than one minute.
cache.find_all {|e| e.ttl > 60}

# Which cache entry has the largest time to live value?
cache.max {|e1, e2| e1.ttl <=> e2.ttl}

# Are all cache entries strings?
cache.all? {|e| e.value.is_a?(String)}

# Does cache contain the ultimate question of life, the universe, and everything?
cache.any? {|e| e.key == 'The Ultimate Question'}

# Sum of all numeric cache entries.
cache.inject(0) {|sum, e| e.value.is_a?(Numeric) ? sum + e.value : sum}

# Find the email addresses of the creators of inferior programming languages
cache.reject {|e| e.value == 'Ruby'}.map {|e| e.value.creator_email}.uniq

This is just a small preview of what Ruby's Enumerable provides. For full reference, see the documentation on ruby-doc.org. Be aware that if you have a large cache, this kind of iteration over every element could be prohibitively expensive, but for smaller caches it provides a very powerful querying mechanism. For large caches, there is a new search API for Ehcache currently in the works, which uses indexing for efficient searching and will be available in an upcoming Ehcache release.
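The reason all of this comes for free is that Enumerable only asks the including class for an each method that yields every element. Here is a plain-Ruby sketch of that contract; EnumerableCache and StandInElement are hypothetical stand-ins (not the gem's classes) that hold a few fake entries with a ttl in seconds:

```ruby
# Hypothetical stand-in for a cache element with a time to live in seconds.
StandInElement = Struct.new(:key, :value, :ttl)

# Hypothetical cache that mixes in Enumerable. Enumerable needs only #each;
# find_all, max, any?, inject, and friends are built on top of it.
class EnumerableCache
  include Enumerable

  def initialize(*elements)
    @elements = elements
  end

  # The single method Enumerable relies on. Without a block it returns an
  # Enumerator, which is the usual Ruby convention.
  def each
    return enum_for(:each) unless block_given?
    @elements.each { |e| yield e }
    self
  end
end

cache = EnumerableCache.new(
  StandInElement.new('answer', 42, 120),
  StandInElement.new('greeting', 'hello', 30),
  StandInElement.new('max_users', 8, 600)
)

long_lived = cache.find_all { |e| e.ttl > 60 }.map(&:key)
longest    = cache.max { |e1, e2| e1.ttl <=> e2.ttl }.key
total      = cache.inject(0) { |sum, e| e.value.is_a?(Numeric) ? sum + e.value : sum }

puts long_lived.inspect  # ["answer", "max_users"]
puts longest             # max_users
puts total               # 50
```

Because each walks every entry, this style inherits exactly the cost caveat mentioned above: every query is a full scan.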

There are several other ways in which the Ruby API has been enhanced but I can't describe all of them here. If you're curious, see the RDoc API documentation that is bundled with the jruby-ehcache gem. And, of course, because the full Java API is available to you, you can also use the Ehcache Javadocs for reference.

Fewer Calories

In addition to adding the above features, we've also done some fat trimming for this latest release. First and foremost, we have deprecated the YAML configuration option in favor of using the Ehcache native XML configuration. We know that some Rubyists will be disappointed by this decision ("What? More XML?"), but we feel it was the right decision for several reasons:

  • The YAML configuration code is by far the most complicated bit of code in the Ehcache JRuby integration and we feel that it is a likely source of bugs.
  • YAML configuration is handled by pure Ruby code, and the Ehcache Java code is completely ignorant of it. Any time that there is a change to the Ehcache configuration format, it would require an update and new release to the Ruby YAML code, meaning that we'd be playing a continual catch-up game with Ehcache core.
  • There are subtle differences between the YAML configuration and the XML configuration that we feel can only lead to confusion in the long run.
  • Java developers who already use Ehcache will already have ehcache.xml configuration files that they can now use directly instead of translating to YAML.

While we're talking about configuration, I should mention that we've made some improvements to how your configuration files are located. Previous versions of jruby-ehcache required that you place a config.yml file in your $HOME/lib/config directory, which of course made it less than practical to have more than one application using jruby-ehcache at any given time. With version 1.0.0 you have a lot more options available to you. Now you can put your ehcache.xml either in the same directory as the Ruby file that creates the CacheManager object, or place it in your Java CLASSPATH, or you can specify any location in your call to the CacheManager constructor. If you are using Rails, then ehcache.xml will continue to reside in the canonical Rails config directory.
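Here is a hypothetical sketch of that kind of lookup in plain Ruby. The method name find_ehcache_config and, in particular, the precedence order (an explicit path first, then the directory of the calling Ruby file, then CLASSPATH directories) are my assumptions for illustration; the paragraph above does not say which location wins, and a real CLASSPATH lookup happens on the Java side and also covers JAR files, which this sketch ignores:

```ruby
require 'tmpdir'

# Hypothetical resolver. The precedence below is an assumption, not the
# gem's documented behavior; CLASSPATH entries are treated as directories.
def find_ehcache_config(explicit_path: nil, caller_file: nil, classpath: [])
  if explicit_path
    # An explicitly requested file must exist.
    raise ArgumentError, "no such config: #{explicit_path}" unless File.file?(explicit_path)
    return explicit_path
  end

  candidates = []
  if caller_file
    # ehcache.xml next to the Ruby file that creates the CacheManager
    candidates << File.join(File.dirname(File.expand_path(caller_file)), 'ehcache.xml')
  end
  # ehcache.xml at the root of each CLASSPATH directory
  candidates.concat(classpath.map { |dir| File.join(dir, 'ehcache.xml') })

  candidates.find { |path| File.file?(path) } # nil when nothing is found
end

# Usage: a throwaway app directory containing app.rb and ehcache.xml.
Dir.mktmpdir do |app_dir|
  script = File.join(app_dir, 'app.rb')
  File.write(script, "# app code\n")
  File.write(File.join(app_dir, 'ehcache.xml'), "<ehcache/>\n")
  puts find_ehcache_config(caller_file: script)
end
```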

Finally, we've removed a limitation that prevented you from using versions of Ehcache other than that bundled with the jruby-ehcache gem, and made it easy to drop Enterprise Ehcache JARs into your application. With the latest updates, jruby-ehcache will use your Java CLASSPATH to locate the Ehcache JARs it should use, instead of forcing the use of the bundled Ehcache. In my next blog post I will discuss how you can take advantage of this to add BigMemory to your application, or utilize distributed caching with Terracotta for linear scale out.

Further Reading

If you are interested in learning more about Ehcache and any of its associated add-ons, there are numerous resources available to you. Here are a few to get you started.

During my development on jruby-ehcache, I heavily utilized Gregory T. Brown's excellent book Ruby Best Practices for tips and techniques. I highly recommend this book to anyone doing serious Ruby development.

Thursday, October 21, 2010

Terabyte Scale for JRuby and Rails

In this, my inaugural blog post, I'm going to explain how you can use Ehcache and Terracotta from your JRuby and Rails applications.  Ehcache is the most popular open source caching solution in the Java landscape, and Terracotta's Enterprise Ehcache provides the ultimate in scalability, allowing you to store over a terabyte of data in a fully coherent cache.  And now you can use it to provide unprecedented levels of scale to the Rails world.

A Little Bit of History

I work at Terracotta and one of the fun things we do there is to have a semi-annual "dev week", during which the whole distributed team gathers together in our San Francisco office to collaborate in person.  During our last dev week we had a product improvement competition and my entry into the competition was a set of Ruby Gems to provide integration between Ehcache and JRuby/Rails.  To do this, I built on top of the work done by Dylan Stamat, who originally created the JRuby Ehcache integration.  As with so many other open-source projects maintained by people in their free time, this project had been lying dormant for some time while Dylan was busy with other obligations.  I decided to pick up where Dylan left off and as a result of this there is a new set of Ruby Gems for working with Ehcache from JRuby, Rails 2, and Rails 3.

Caching in Rails has traditionally been done using memcached, either directly using the Memcache API or using the Rails caching API with a memcached backend.  While this has worked fairly well for most people, there are some limitations to memcached that have led some to seek alternative solutions.  If you limit yourself to solutions that work with the standard C implementation of Ruby ("Matz's Ruby Implementation") then memcached is likely your best bet.  However, if you are able to use JRuby, a whole new universe of caching solutions emerges, now including the very popular Ehcache.

Using Ehcache with JRuby

Ehcache JRuby integration is provided by the jruby-ehcache gem.  To install it simply execute:

jgem install jruby-ehcache

Ehcache configuration is done with the ehcache.yml YAML file.  The jruby-ehcache gem includes a default ehcache.yml file, but if you would like to customize it you can copy the ehcache.yml file bundled with the gem and place it in your $HOME/lib/config directory and then edit it as you see fit. (This is no longer true. See my blog post Ehcache for JRuby and Rails: Now with more flavor and fewer calories for information about recent changes.)

Now it is time to start using Ehcache.  Let's take a look at a simple example that uses the Ehcache::CacheManager to create a cache, and then puts and gets some data in the cache.

require 'ehcache'

manager = Ehcache::CacheManager.new
cache = manager.cache

cache.put("answer", "42", {:ttl => 120})
answer = cache.get("answer")
puts "Answer: #{answer}"
question = cache.get("question") || 'unknown'
puts "Question: #{question}"

manager.shutdown

Save this code as "jruby-ehcache-demo.rb" and execute it as follows:

jruby -rubygems jruby-ehcache-demo.rb

As you can see from the example, you create a cache manager using CacheManager.new and obtain a cache from it with manager.cache, and you can control the "time to live" value (in seconds) of a cache entry using the :ttl option in cache.put.  Note that not all of the Ehcache API is currently exposed in the JRuby API, but most of what you need is available and we plan to add a more complete API wrapper in the future.
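To make the :ttl behavior concrete, here is a small pure-Ruby model of an expiring cache. It is a hypothetical sketch (TtlCacheSketch and its injectable clock are my own illustration, not part of jruby-ehcache or Ehcache, and not how Ehcache stores data), assuming only that an entry stops being returned once its time to live has elapsed. The fake clock lets the example show expiry without sleeping for two minutes:

```ruby
# Hypothetical model of time-to-live expiry. The clock is injectable so the
# example can advance time without sleeping.
class TtlCacheSketch
  Entry = Struct.new(:value, :expires_at)

  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {}
  end

  # Mirrors the shape of cache.put("answer", "42", {:ttl => 120}).
  def put(key, value, options = {})
    ttl = options[:ttl]
    expires_at = ttl ? @clock.call + ttl : nil # no :ttl means no expiry here
    @entries[key] = Entry.new(value, expires_at)
  end

  # Returns the value, or nil if the key is missing or has expired.
  def get(key)
    entry = @entries[key]
    return nil unless entry
    if entry.expires_at && @clock.call >= entry.expires_at
      @entries.delete(key) # drop the stale entry
      return nil
    end
    entry.value
  end
end

now = 0
cache = TtlCacheSketch.new(clock: -> { now })
cache.put("answer", "42", {:ttl => 120})

now = 60
puts cache.get("answer").inspect   # "42": still within its 120 seconds
now = 121
puts cache.get("answer").inspect   # nil: the entry has expired
```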

Using Ehcache in Rails Applications

To use Ehcache from a Rails application you must first install the correct gem for your Rails version.

jgem install jruby-ehcache-rails2 # for Rails 2
jgem install jruby-ehcache-rails3 # for Rails 3

Configuration of Ehcache is still done with the ehcache.yml file, but for Rails applications you must place this file in the config directory of your Rails app.  Note that you must use JRuby to execute your Rails application, as these gems utilize JRuby's Java integration to call the Ehcache API.

With this configuration out of the way, you can now use the Ehcache API directly from your Rails controllers and/or models.  You could of course create a new Cache object everywhere you want to use it, but it is better to create a single instance and make it globally accessible by creating the Cache object in your Rails environment.rb file.  For example, you could add the following lines to config/environment.rb:

require 'ehcache'
EHCACHE = Ehcache::CacheManager.new.cache

By doing so, you make the EHCACHE constant available to all Rails-managed objects in your application.  Using the Ehcache API is now just like the above JRuby example.
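If you would rather not rely on a constant, the same create-once, share-everywhere idea can be written as a memoized accessor. This is a hypothetical alternative, not something jruby-ehcache provides; SharedCacheStandIn stands in for the object returned by Ehcache::CacheManager.new.cache so the sketch runs under plain Ruby, and AppCache is an illustrative name:

```ruby
# Hypothetical stand-in for the object returned by
# Ehcache::CacheManager.new.cache, so this runs without JRuby.
class SharedCacheStandIn
  @instances = 0
  class << self
    attr_accessor :instances # counts how many caches were created
  end

  def initialize
    self.class.instances += 1
    @data = {}
  end

  def put(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end
end

# Memoized accessor: the cache is built on first use, and every later call
# returns that same object, much like the EHCACHE constant.
module AppCache
  def self.instance
    @instance ||= SharedCacheStandIn.new
  end
end

AppCache.instance.put('answer', '42')
puts AppCache.instance.get('answer')
puts SharedCacheStandIn.instances
```

One caveat with this variant: ||= is not atomic, and JRuby runs Ruby threads truly in parallel, so two threads reaching AppCache.instance for the first time at once could each build a cache. Creating the instance eagerly at boot, as the EHCACHE constant in environment.rb does, sidesteps that.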

If you are using Rails 3 then you have a better option at your disposal: the built-in Rails 3 caching API.  This API provides an abstraction layer for caching underneath which you can plug in any one of a number of caching providers.  The most common provider to date has been the memcached provider, but now you can also use the Ehcache provider.  Switching to the Ehcache provider requires only one line of code in your Rails environment file:

config.cache_store = :ehcache_store

A very simple example of the Rails caching API is as follows:

Rails.cache.write("answer", "42")
Rails.cache.read("answer")  # => '42'

Using this API, your code can be agnostic about the underlying provider, or even switch providers based on the current environment (e.g. memcached in development mode, Ehcache in production).
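The provider-agnostic idea itself can be sketched in plain Ruby. HashStore, LoggingStore, build_cache_store, and remember_answer are all hypothetical names, not Rails classes; the point is only that code written against read and write works unchanged whichever store the environment selects:

```ruby
# Hypothetical in-memory store exposing a Rails.cache-like read/write pair.
class HashStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
    true
  end

  def read(key)
    @data[key]
  end
end

# Hypothetical second backend; in a real app this slot could be Ehcache.
class LoggingStore < HashStore
  attr_reader :log

  def initialize
    super
    @log = []
  end

  def write(key, value)
    @log << "write #{key}"
    super
  end
end

# Pick a store per environment, as the paragraph above suggests.
def build_cache_store(environment)
  environment == 'production' ? LoggingStore.new : HashStore.new
end

# Application code only depends on read/write, not on the concrete store.
def remember_answer(store)
  store.write('answer', '42')
  store.read('answer')
end

%w[development production].each do |env|
  puts "#{env}: #{remember_answer(build_cache_store(env))}"
end
```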

If you'd like to see a complete example of an Ehcache-enabled Rails application, check out this demo that I wrote to show Ehcache in action:

http://svn.terracotta.org/svn/forge/projects/ehcache-rails-demo/trunk

Conclusion

With Ehcache plugged into your Rails application you have a whole slew of options to address all of those Enterprisey concerns.  With Terracotta's Enterprise Ehcache product you can have the ultimate in scalability by taking advantage of distributed caching or eliminating Java garbage collection pauses with BigMemory.  Plug in a Terracotta Server Array behind your clustered Rails application and you can store hundreds of millions of keys in your cache, and over a terabyte of data.

If you'd like to use Rails for your web app but your boss has concerns about scalability, Ehcache might just be for you.

You can read further documentation on the Ehcache web site, or grab the source from GitHub.