Tag Archives: Rails

Typeahead.js, Elasticsearch and Rails!

Recently, for a project, I had to take a list of keywords and build an autosuggest feature. Typically, this can be achieved with some simple but crude SQL.

Since the project already required Elasticsearch, all the records were already neatly indexed for me (minus the keywords). Additionally, the project was developed in Rails, so I already had the benefit of the elasticsearch-rails project. With that in mind, I got started…

Creating an Analyzer

The first step was to create an analyzer in Elasticsearch for our typeahead column. Creating new analyzers is pretty straightforward with the elasticsearch-rails gem. The “keyword” data we were getting was not checked for errors, so in our case we wanted the analyzer to lowercase everything to cut down on duplicate suggestions later. For example, if there were searches for “Cats” and “cats” in our keywords, we didn’t want to return two suggestions.
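As a rough sketch of what such an analyzer could look like, the settings can be written as a plain Ruby hash. The analyzer name “typeahead” comes from this post; the tokenizer choice and the constant name are assumptions made for illustration, not the exact settings used on the project.

```ruby
# Hypothetical analysis settings. Only the analyzer name "typeahead" and the
# lowercasing requirement come from the post; the rest is an assumption.
TYPEAHEAD_SETTINGS = {
  analysis: {
    analyzer: {
      typeahead: {
        type: "custom",
        tokenizer: "keyword",   # treat each keyword phrase as one token
        filter: ["lowercase"]   # "Cats" and "cats" end up as the same term
      }
    }
  }
}.freeze

# In a model that includes Elasticsearch::Model, a hash like this would
# typically be handed to the `settings` class method, for example:
#
#   settings TYPEAHEAD_SETTINGS do
#     # mapping goes here
#   end
```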

Creating a mapping

Now that the analyzer is set up, a mapping can be created. Within the mapping an index is created that uses the keyword info, but this is where it gets kinda goofy. Elasticsearch wants this data to be structured in a hash with the key of “input”, which in turn contains an array of the different keywords. See the example below for more details. The final thing to note is that the index is assigned our “typeahead” analyzer that was created above for both index analysis and search analysis.
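A minimal sketch of the mapping and of the “input” hash follows. The field name keyword_suggest is the one used in this post; the helper name, the constant name, and the assumption that the raw keywords arrive as a comma-separated string are hypothetical. The analyzer parameter names have also changed between Elasticsearch versions (older releases used index_analyzer), so check the ones your version expects.

```ruby
# Field definition for the completion suggester, using the "typeahead"
# analyzer for both indexing and searching. Parameter names are an assumption
# and depend on the Elasticsearch version in use.
KEYWORD_SUGGEST_MAPPING = {
  keyword_suggest: {
    type: "completion",
    analyzer: "typeahead",
    search_analyzer: "typeahead"
  }
}.freeze

# Builds the { input: [...] } hash the suggester expects.
# Assumes (hypothetically) that keywords are stored as a comma-separated string.
def keyword_suggest_payload(raw_keywords)
  inputs = raw_keywords.to_s.split(",").map(&:strip).reject(&:empty?).uniq
  { input: inputs }
end
```

In an elasticsearch-rails model, a hash like this would usually be merged into the document inside as_indexed_json, under the keyword_suggest key.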

Querying suggestions

Now that both the analyzer and mapping are set up, the data can be re-indexed and a query can be run. The following will run a “suggestion” query within the elasticsearch-rails gem. Note: the “text” key is the search term, while the “field” that is being searched on is the keyword_suggest field that was defined in the mapping above.
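The body of such a suggestion query can be sketched as a small hash builder. The “text” and “field” keys and the keyword_suggest field come from this post; the method name and the suggestion name keyword_suggestions are made up for illustration.

```ruby
# Builds the body of a completion-suggester request.
# `name` is an arbitrary label that Elasticsearch echoes back in the response.
def suggestion_body(term, field: "keyword_suggest", name: "keyword_suggestions")
  {
    name => {
      text: term.to_s,               # what the user has typed so far
      completion: { field: field }   # the completion field from the mapping
    }
  }
end

# With older versions of the Ruby client this body could be sent through the
# client's suggest call, roughly:
#
#   Model.__elasticsearch__.client.suggest(
#     index: Model.index_name,
#     body: suggestion_body("ca")
#   )
#
# Later Elasticsearch versions expect the same hash under a "suggest" key in a
# regular search request instead, so treat this as version dependent.
```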

If all goes according to plan, our model should be returning keyword results in the format below…
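For reference, a completion suggester response generally groups its matches under an “options” array, with each option carrying a “text” value. The sample hash below is illustrative data, not output captured from the project, and the helper name is hypothetical.

```ruby
# Illustrative response shape only; the values are made up.
SAMPLE_RESPONSE = {
  "keyword_suggestions" => [
    {
      "text" => "ca", "offset" => 0, "length" => 2,
      "options" => [
        { "text" => "cats", "score" => 1.0 },
        { "text" => "cars", "score" => 1.0 }
      ]
    }
  ]
}.freeze

# Flattens the response into an array of hashes keyed by "text", which is the
# shape the front end consumes later in this post.
def extract_suggestions(response, name: "keyword_suggestions")
  Array(response[name])
    .flat_map { |entry| Array(entry["options"]) }
    .map { |option| { "text" => option["text"], "score" => option["score"] } }
end
```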

Controller time

The controller to return JSON from the model is dead simple.
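The core of that action can be sketched in plain Ruby so it runs without Rails. The parameter name q, the method name, and the lookup callable are all assumptions standing in for the real model call.

```ruby
require "json"

# Plain-Ruby stand-in for the controller action. `lookup` is any callable that
# takes a search term and returns an array of suggestion hashes.
def suggestions_json(params, lookup)
  term = params[:q].to_s.strip
  results = term.empty? ? [] : lookup.call(term)  # skip the query for blank input
  JSON.generate(results)
end

# Inside a Rails controller the same idea would look roughly like this, where
# Keyword.suggest is a hypothetical model method:
#
#   def index
#     render json: Keyword.suggest(params[:q])
#   end
```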

Displaying the results

Now that the controller is returning suggestions as JSON, it’s time to wire that up to a user interface. The project was using the popular Bootstrap framework. With version 2.3.2 there was a JavaScript plugin for typeahead functionality. However, this plugin has been removed as of 3.x in favor of using the typeahead.js library from Twitter. It turns out that, with a little effort, typeahead isn’t that much more complicated to set up than the old Bootstrap plugin.

First, an input tag is defined. Make sure that the autocomplete attribute is set to “off” so the browser’s native autocomplete doesn’t kick in with its own suggestions.

Typeahead.js uses an engine called Bloodhound to make Ajax calls as you type. It has some intelligent caching features and makes consuming that data pretty hands off. Once a new Bloodhound object is created, the initialize() method must be called to finish the process.

Now that the Bloodhound engine is set up, the input tag can be selected and typeahead will wire everything together. One thing to note: the “displayKey” setting tells typeahead which key to use from the hash values within the array. In this case “text” is the key that should be used.

Going out with style

The last thing that I did was add a bit of custom styling to the typeahead box.

Overall, I was pretty impressed by the solution. It took less than a couple of hours to set up and was pretty fast and responsive. When considering a typeahead option for your next project, give Elasticsearch a spin!

COPY millions of rows to PostgreSQL with Rails

ActiveRecord is great when you need to quickly access and manipulate a few rows within a database. Loading records into a database is just as easy… Seed files and custom rake tasks make inserting records a breeze. But what happens when you need to import lots of rows? To clarify, when I say lots of rows, I’m talking about millions of rows from a delimited text file.

So many rows, so little time

My first instinct was to read in the text file and do a Model.create() on each row. This took a long, long time. (I actually gave up on it.)

Next, I tried batching the rows into an array in an effort to limit the number of database calls. I then imported each batch using the activerecord-import gem. Batching helps if your recordset is a couple thousand rows, but doesn’t scale efficiently to a million+ rows.
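The batching step can be sketched as below. The method name, the batch size, and the tab delimiter are assumptions for illustration; the import call in the comment is the class method that activerecord-import adds to models.

```ruby
# Reads a delimited file and yields rows in fixed-size batches, so each batch
# can be written with a single bulk insert instead of one insert per row.
def each_row_batch(path, batch_size: 1_000, delimiter: "\t")
  File.foreach(path).each_slice(batch_size) do |lines|
    # -1 keeps trailing empty columns instead of dropping them
    yield lines.map { |line| line.chomp.split(delimiter, -1) }
  end
end

# With activerecord-import, each batch could then be written roughly like this
# (LargeTable and the column names are hypothetical):
#
#   each_row_batch("data.txt") do |rows|
#     LargeTable.import([:name, :value], rows, validate: false)
#   end
```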

COPY to the rescue

In order to get into the millions, I needed to take Rails out of the equation as much as possible. This meant using an ActiveRecord raw_connection and the COPY command from PostgreSQL.

Due to permission issues, you most likely will not be able to use the filename option of the COPY command. This actually turns out to not be a big deal, since you can still use STDIN to pass data to the command. Confused? Let’s look at the example.

It’s actually pretty simple. I execute the COPY command on our large_table with data from STDIN. Then I read in the file and put the data into STDIN for each line. When the file is finished, I issue an end copy instruction.
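The steps above can be sketched with the pg gem’s copy methods (exec, put_copy_data, put_copy_end, get_result). The method name, the pipe delimiter, and the file name in the usage note are assumptions; only large_table comes from this post.

```ruby
# Streams a delimited text file into a table using COPY ... FROM STDIN.
# `conn` is expected to be a PG::Connection, e.g. the object returned by
# ActiveRecord::Base.connection.raw_connection on a PostgreSQL database.
# Returns the number of lines sent.
def copy_file_into(conn, table, path, delimiter: "|")
  # Put the connection into COPY mode; the table name must be trusted input.
  conn.exec("COPY #{table} FROM STDIN WITH (FORMAT text, DELIMITER '#{delimiter}')")

  sent = 0
  File.foreach(path) do |line|
    conn.put_copy_data(line)  # each line keeps its trailing newline
    sent += 1
  end

  conn.put_copy_end           # tell the server the data stream is finished

  # Drain the pending results so errors surface and the connection is reusable.
  while (result = conn.get_result)
    result.check
  end

  sent
end
```

A call might look like copy_file_into(ActiveRecord::Base.connection.raw_connection, "large_table", "data.txt"). Recent versions of the pg gem also offer a block-based copy_data method that handles ending the copy for you, which may be worth a look.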

Conclusion

With the COPY technique above, I was able to import over 2.4 million rows in less than 4 minutes. Not too shabby. I would be interested in hearing what strategies you all have used for large imports.

Going to the limit

Using the asset pipeline seems like a no-brainer. It works great in almost every circumstance. It is sort of like magic: you deploy to production and all your files are neatly bundled into one concatenated application file. This works great, until it doesn’t. Even the asset pipeline, at some point, has to deal with Internet Explorer.

Hitting the limit

Limit – “The point, edge, or line beyond which something cannot or may not proceed.”

The definition above perfectly describes the behavior of Internet Explorer when dealing with large stylesheet files. IE (6-9) cannot proceed past 4095 rules in one stylesheet. The behavior that occurs is really hard to understand and debug. Once IE hits the magic number of 4095 it stops applying style rules to the page. Ruh ro!

What about the rails?

Now that we know about this problem, the question is… What does this have to do with Rails and the asset pipeline? Well, you could imagine that if you started to build a large enough site with several stylesheets and they were all combined into one sheet (via the asset pipeline), you could start to approach that 4095 limit. Again, once you hit that limit IE will stop applying styles and your page will look broken.
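To get a feel for how close a compiled stylesheet is to that number, a rough count can help. The sketch below is a naive regex-based counter, not a real CSS parser: the method name is made up, and it will miscount things like commas inside :not(...) or keyframe steps, so treat the result as an estimate.

```ruby
# Naive estimate of how many style rules and selectors a stylesheet contains.
# Assumption: a regex scan is good enough for a ballpark figure.
def css_counts(css)
  stripped = css.gsub(%r{/\*.*?\*/}m, "")  # drop comments first
  rules = 0
  selectors = 0

  # Every chunk of text directly before a "{" is either a selector list or an
  # at-rule such as @media; at-rules themselves are skipped.
  stripped.scan(/([^{}]+)\{/) do |(prelude)|
    prelude = prelude.strip
    next if prelude.empty? || prelude.start_with?("@")

    rules += 1
    selectors += prelude.split(",").count { |part| !part.strip.empty? }
  end

  { rules: rules, selectors: selectors }
end
```

Running this against the compiled application stylesheet gives a quick indication of whether the combined file is anywhere near 4095.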

The Breakup

The only thing we can do is break up. Breaking up the stylesheet into smaller ones will hopefully put us under the 4095 limit. Remember, the limit is per stylesheet, so if we break the rules up we shouldn’t have a problem.

There’s a gem for that

Of course, with Rails there is a gem for everything, and this case is no exception. The CSS splitter gem has been built just for this reason. Full disclosure: I have not used this gem, but it looks like an easy solution.

Manual Split

For my project I didn’t use the CSS splitter gem. Most of my styles were in two different stylesheets, plus Bootstrap. Bootstrap has a ton of rules, and that was what originally put me over the 4095 limit, so I added another stylesheet link in my layout for Bootstrap and split the files up manually. The last thing to remember is to remove “*= require_tree .” from your application.css file.

Conclusion

As much as we would all like IE to go away, it’s not going anywhere soon. The good news is with some relatively easy manipulation we can get our site back up and running full speed ahead.
