Matthew Hutchinson

You Say Party! We Say Die - Laura Palmers Prom

no comments yet, post one now

January 06, 2010 23:57 by

NGINX upload module with Paperclip (on Rails)

19 comments

Over the Christmas holidays I started looking at integrating the nginx upload module into Bugle.

The nginx upload module has been around for a while, but I couldn’t find anything to explain exactly what went on with the params and the best way to integrate it with the Paperclip gem in Rails (which Bugle uses for all upload handling). As I worked with it I found a few caveats along the way.

Why bother?

Without the module, your Rails app will receive the raw uploaded data, parsing its entire contents before it can be used. For large uploads this can be quite slow, since Ruby is the work horse throughout.

With this module, parsing the file happens in C through nginx and before your Ruby application gets it. The module puts the parsed file into a tmp directory and strips all the multipart params out of the POST body, replacing it with params you can use (in Rails) to get the name and location of the file on disk. So by the time the request hits your application, all the expensive parsing has been done and the file is ready to be used by your app. Basically hard work is moved from Ruby to C.

Compiling Nginx to include the module

To install the module you need to build nginx from source and pass it the upload module source directory as an argument. Since I run Bugle on a live production machine I wanted to work with things locally first. I began by setting up my local (OSX) box to match the production stack. I currently use nginx with Passenger and Ruby Enterprise Edition.

First download and untar both the nginx and upload module sources. Then build using the following commands (these worked on both OSX and my Ubuntu production server)

sudo /opt/ruby-enterprise-1.8.7-20090928/bin/passenger-install-nginx-module --nginx-source-dir=<path to nginx sources> --extra-configure-flags=--add-module='<path to upload module sources>'

Or if you’re just building nginx from source (without using the handy passenger installer) go with this;

cd <path to nginx sources>
./configure --add-module=<path to upload module sources>
make
make install

Don’t worry about any existing nginx.conf files you have or nginx vhosts etc. They will be unaffected after recompiling.

Configuring Nginx

The next step is to configure nginx to use the module. In Bugle uploads are normally sent via a POST request to the uploads controller using a restful URL that can be one of;

POST /admin/blogs/:blog_id/uploads
or
POST /admin/themes/:theme_id/uploads

These hit the ‘uploads’ controller, ‘create’ action. I wanted to keep the same restful URLs so I tried the following regex in the nginx config.

location ~* admin\/(themes|blogs)\/([0-9]+)\/uploads { }

While this did work in recognising the URL, the upload module wouldn’t work with it. In the end I opted to use a new defined url for faster uploads, here is the route for it in Rails. You may have a simpler upload URL controller action making this unnecessary.

map.connect 'admin/uploads/fast_upload', :controller => 'admin/uploads', 
                                        :action     => 'create', 
                                        :conditions => { :method => :post }

So the modified nginx server config becomes;

server {
 listen   80;
 server_name bugleblogs.com *.bugleblogs.com;
 
 # ...
 # somewhere inside your server block
 # ...
 
 # Match this location for the upload module
 location /admin/uploads/fast_upload {
   # pass request body to here
   upload_pass @fast_upload_endpoint;

   # Store files to this directory
   # The directory is hashed, subdirectories 0 1 2 3 4 5 6 7 8 9 should exist    
   # i.e. make sure to create /u/apps/bugle/shared/uploads_tmp/0 /u/apps/bugle/shared/uploads_tmp/1 etc.
   upload_store /u/apps/bugle/shared/uploads_tmp 1;

   # set permissions on the uploaded files
   upload_store_access user:rw group:rw all:r;

   # Set specified fields in request body
   # this puts the original filename, new path+filename and content type in the requests params
   upload_set_form_field upload[fast_asset][original_name] "$upload_file_name";
   upload_set_form_field upload[fast_asset][content_type] "$upload_content_type";
   upload_set_form_field upload[fast_asset][filepath] "$upload_tmp_path";

   upload_pass_form_field "^theme_id$|^blog_id$|^authenticity_token$|^format$";
   upload_cleanup 400 404 499 500-505;
 }
 
 location @fast_upload_endpoint {
   passenger_enabled on;  # or this could be your mongrel/thin backend
 } 
}
  • After processing, the upload module puts the file in one of 10 tmp directories, these should be already created and accessible by your Rails app, since I am using Capistrano, I’ve chosen the shared/ folder to house the tmp files (you’d risk loosing a file mid-deploy if you chose somewhere in the RAILS_APP current/ directory)
  • The upload_pass_form_field directive preserves any params that match the regex (we don’t want the module to strip the following params: format, blog_id, theme_id or authenticity_token)
  • The upload_pass directive sets what should handle the request after the upload module has finished with the file (we want to handle it using Rails, through passenger, so a @fast_upload_endpoint location is defined)
  • The upload_set_form_field directives are used to specify the params that Rails will now receive, this will give;
    • params[‘upload’][‘fast_asset’][‘original_name’]
    • params[‘upload’][‘fast_asset’][‘content_type’]
    • params[‘upload’][‘fast_asset’][‘filepath’]

At this point its worth testing the app. Perform an upload and check that nginx is sending these params to your controller action. For more info here is a complete guide to all the module directives.

Working with Paperclip and Rails

Finally I needed to modify Rails to make use of these new params. In Bugle the upload module has_attached_file :asset using Paperclip. One problem is that new file in the tmp/ directory exists with a hashed meaningless filename, so simply passing this file to self.asset will not work for Paperclip processing. It needs to have the original filename and content_type. Fortunately we have those in the new params too. So the new fast_asset= method shifts and renames the file into a sub tmp directory (which gets cleaned on the after_create filter). All this seems a little convoluted, but I couldn’t see any other way to do this, without perhaps modifying the Paperclip internals. If anyone has any suggestions around this let me know in the comments.

class Upload < ActiveRecord::Base
  has_attached_file :asset, :styles => {:thumb => ["64x64#", :jpg]},
                            :url    => ":base_url/:resource/:styles_folder:basename:style_filename.:extension",
                            :path   => "public/u/:resource/:styles_folder:basename:style_filename.:extension" 
        
  attr_accessor :tmp_upload_dir
  after_create  :clean_tmp_upload_dir
  
  # handle new param
  def fast_asset=(file)
    if file && file.respond_to?('[]')
      self.tmp_upload_dir = "#{file['filepath']}_1"
      tmp_file_path = "#{self.tmp_upload_dir}/#{file['original_name']}"
      FileUtils.mkdir_p(self.tmp_upload_dir)
      FileUtils.mv(file['filepath'], tmp_file_path)
      self.asset = File.new(tmp_file_path)
    end
  end    
  
  private
  # clean tmp directory used in handling new param
  def clean_tmp_upload_dir
    FileUtils.rm_r(tmp_upload_dir) if self.tmp_upload_dir && File.directory?(self.tmp_upload_dir)
  end                     
end

For completeness here is the regular controller action;

def create
  # ...
  @upload = @resource.uploads.build(params[:upload])
  if @upload.save
  # ...
end

Go Go Uploads!

Thats it, you should now have much faster uploads through nginx! To see the improvement try uploading a 50Mb+ file with/without the module. In a future series of posts I will be conducting a complete walkthrough of the uploader I have built for Bugle. End to end from the browser, to Rails and the actual server configuration.

Related Links

Spaceships!

no comments yet, post one now

Concept Ships

Concept Ships - spaceships and experimental aircraft art

December 18, 2009 22:25 by

Gource on OSX (Snow Leopard)

9 comments

Some months ago I played around with code_swarm by Michael Ogawa – partly for fun and partly to see what all the fuss was about with the Processing framework (something I have yet to really investigate). Last week I came across Gource another source code visualisation tool this time using 3D rendering.

Software projects are displayed by Gource as an animated tree with the root directory of the project at its centre. Directories appear as branches with files as leaves. Developers can be seen working on the tree at the times they contributed to the project.

Some recent commits to the project fixed build issues on OSX. Despite these fixes I still had trouble compiling. So in summary here’s what I did to get it working. Note that I did resort to installing mac ports (something I’d rather NOT do) – after many attempts to download and manually compile the prerequisites (FTGL 2.1.3~rc5-2 kept giving me problems)

# get mac ports
http://www.macports.org/install.php

# get all the ports you need for gource
sudo port install pcre libsdl libsdl_image ftgl

# get and build Gource from github and follow the instructions in INSTALL
git clone git://github.com/acaudwell/Gource.git
cd Gource
autoreconf -f -i
./configure && make && sudo make install

# navigate to your project directory and type;
gource

# have a look at all the options
man gource

# here are the settings I used (to get a large user icon and change the speed/size
gource ./ -s 0.5 -b 000000 --user-image-dir ~/images/avatars/ --user-scale 2.0 -800x600

# video it in h264 using ffmpeg, first install ffmpeg via mac ports (with codecs)
sudo port install ffmpeg +gpl +lame +x264 +xvid

# pipe PPM images to ffmpeg to generate a h264 encoded movie
gource ./ -s 0.5 -b 000000 --user-image-dir ~/images/avatars/ --user-scale 2.0 -800x600 --output-ppm-stream - | ffmpeg -y -b 3000K -r 60 -f image2pipe -vcodec ppm -i - -vcodec libx264 -vpre hq -crf 28 -threads 0 bugle.mp4

# finally, if you want to add some audio to the video (from an mp3)
ffmpeg -i audio.mp3 -i bugle.mp4 -vcodec libx264 -vpre hq -crf 28 -threads 0 bugle-with-audio.mp4
(this output will be the length of the video/audio track, whichever is longer)

# phew! - now go here to see how to remove macports when you're done :)
http://trac.macports.org/wiki/FAQ#uninstall

Bugle did exist (for a time) as an open source project on github before I moved it to be privately hosted. So here is the result; Bugle’s git log from April ’09 to the present parsed through Gource, with just one committer (me).

In summary you can see real bursts of activity at the start followed by some long periods of inactivity, and coming toward the present date renewed development and work going on. Gource is more impressive when visualising big projects with multiple committers over long periods, like the history of Git itself for example.

The Snowman

no comments yet, post one now

The Snowman

The Snowman, happy holidays!

December 06, 2009 23:22 by

Dead mans bones - My body's a zombie for you

no comments yet, post one now

December 01, 2009 00:01 by

Bluepill monitoring delayed_job

4 comments

Lately I have been using bluepill to to monitor long-running processes on my application servers. The guys at serious business wrote bluepill out of their frustrations with god and monit, which gradually leak memory over long periods in certain conditions. bluepill is a simple piece of code with a small feature set, but does all you need to keep your processes alive. It even has parent/child monitoring for the likes of Unicorn master/worker processes.

Right now I am using it in Bugle production to monitor the delayed_job master process. It will also be useful if (or when) I get a chance to try Unicorn. Delayed Job is used in Bugle for two things right now, processing uploads (storing/deleting to/from S3) and delivering all application emails asynchronously. Here is the bluepill monitoring configuration script for it;

Bluepill.application("bugle") do |app|
  app.process("delayed_job") do |process|
    process.start_command = "/apps/bugle/current/script/delayed_job start -eproduction"
    process.pid_file = "/apps/bugle/current/tmp/pids/delayed_job.pid"
    process.uid = "bugle"
    process.gid = "bugle"
  end
end
November 26, 2009 18:26 by

Open the pod bay doors Hal

no comments yet, post one now

So, I'm back here again. Its been two years since I posted regularly and a lot has happened since then. Here's a quick summary of that, and some reasoning behind my motivation to start writing again.

read the rest of this article →
November 20, 2009 18:49 by

1955 Mercedes Benz 300 SLR Coupe

no comments yet, post one now

1955 Mercedes Benz 300 SLR Coupe

1955 Mercedes Benz 300 SLR Coupe at cartype

November 20, 2009 11:42 by

Paging Keys

no comments yet, post one now

Some time ago I wrote a javascript class to implement keyboard short-cuts for paging through listings one item at a time (and across paginated pages). It was inspired by the navigation at FFFFOUND! and explained nicely by Ryan Singer of 37Signals.

Some time ago Ryan posted a link to my script on the 37Signals blog, now with over 130 watchers on github a creeping sense of responsibility has settled in. So the latest revision now has both jQuery and Prototype support and a handier configuration object, so you can further customise how your page and CSS work with the script.

var config = {
  nodeSelector:        '.hentry h2 a.entry-title',  // used to select each item on the page and place in the map (must be a link)
  prevPageSelector:    '.prev_page',                // link on this element should always jump to prev page a.prev_page (must be a link)
  nextPageSelector:    '.next_page',                // link on this element should always jump to next page a.next_page (must be a link)
  pagingNavId:         'paging-nav',                // dom id of the floating page navigation element
  keyNext:             'j',                         // hot keys used 
  keyPrev:             'k',
  keyNextPage:         'h',
  keyPrevPage:         'l',
  keyRefresh:          'r',
  additionalBodyClass: 'paging-keys',               // this class is assigned to the page body on load
  bottomAnchor:        'bottom'                     // the name of the anchor (without #) at end of page, e.g. set on last post on the page
};

I’m using the script on this site (see the overlay on the top right) and you can try it out by pressing j/k to navigate through the articles. Some things worth mentioning about the code;

  • It tries to closely follow the ’7 rules of unobtrusive javascript":http://icant.co.uk/articles/seven-rules-of-unobtrusive-javascript/
  • By default (and in the demo) it hooks to items in the HTML that formated with in the hATOM microformat
  • It also latches onto pagination links generated from the popular will_paginate gem
  • It makes use of Hotkey.js
  • Eventually this script will live inside many of the default templates in Bugle
November 18, 2009 11:14 by
← (k) prev | next (j) →