Learning Ruby on Rails: File Upload

Whenever I decide to pick up a new language – or framework – there are usually two ubiquitous problems that I like to try to solve immediately: authentication and file uploads. Over the years I’ve found that these are great learning tools for a couple of reasons.

  1. They’re well known problems. They’ve been around for a long time and the best practices for solving them are pretty well documented.
  2. They’re finite problems. There are only so many ways to skin these particular cats and the solutions are very contained. I could build an app whose sole purpose is to log in, upload a file and display it.
  3. The problems may be well known and finite, but they’re not trivial. Trivial solutions don’t make for good learning tools.

This month, I’ve finally taken the plunge and picked up Ruby and Rails. That’s a two-fer (a language and a framework), so once again I started figuring out how to solve these same problems with the new toolkit.

Authentication

After looking around, I found that there is already an authentication plugin, AuthLogic, that looked remarkably comprehensive and solved that particular problem just about the same way I would have if I’d done it manually, but then threw a few more useful tools in the kit that I probably wouldn’t have. Additionally, it looked relatively easy to install and configure.

That sounded like a pretty good deal to me, so I followed along with Ryan Bates’ excellent Railscast and got authentication hooked up rather quickly. No point in re-inventing the wheel as long as the existing wheel is as good as or better than the one you’d have invented, right?

Upload

I did not, however, find a similarly comprehensive plugin for file uploads. There are a few out there – Attachment_fu and Paperclip seem to lead the pack – but none that appear to handle physical media the way I like to handle it. Since my wheel is better than (or at least different from) those wheels, this became my de facto learning experience.

Disclaimer: I only took a quick look at what these plugins had to offer, so maybe they’re closer to my way than it appeared. Nonetheless, I needed a problem to solve and this became the one.

Requirements

As I mentioned, my requirements are slightly different from most I’ve seen, so it’s probably worth outlining them really quickly.

  1. Store physical files to their own directory structure dedicated to user contributed content. In my Rails project, I have a Rails.root/public/bin/ directory for this. However, since I don’t want to store (what I hope will be) thousands of files or more in a single directory, I create 26 subdirectories named a-z. In each of those, I create 26 more directories also named a-z. Now I have 676 possible locations so my (potentially) huge number of files can be nicely distributed (for example, Rails.root/public/bin/r/m/file.ext) for performance.
  2. Store physical file metadata (file size, MIME type, URI, etc.) in the database. I often find myself needing or wanting to query this kind of data directly and I don’t want to have to do so through the application. I want easy access to find out how many images I have that are over 50KB, for example.
  3. Abstract the common functionality to allow the upload of many different types of file (images, videos, flash movies, etc.) with very little effort. In a traditional inheritance model, I might have a File class (a concrete class) that is extended by any type with specialized properties or actions.
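
To make the first requirement concrete, here is a minimal sketch of one way to pick a bucket. Nothing above says how the two letters get chosen, so hashing the file name with MD5 is purely an assumption, and bucket_for and bin_path_for are hypothetical helper names rather than anything from the project.

```ruby
require 'digest/md5'

LETTERS = ('a'..'z').to_a

# Hypothetical helper: maps a file name to one of the 26 x 26 = 676 buckets.
# Hashing the name is an assumption; any even distribution would do.
def bucket_for(filename)
  hex    = Digest::MD5.hexdigest(filename)
  first  = LETTERS[hex[0, 8].to_i(16) % LETTERS.size]
  second = LETTERS[hex[8, 8].to_i(16) % LETTERS.size]
  File.join(first, second)
end

# Hypothetical helper: builds the full storage path under the bin root.
def bin_path_for(bin_root, filename)
  File.join(bin_root, bucket_for(filename), filename)
end

puts bin_path_for('public/bin', 'file.ext')
```

Because the bucket is derived from the name, the same file name always lands in the same directory, which also makes a file easy to find again without a database lookup.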

For the sake of this article, I’ll keep this simple and focus on images as the only subtype. The extrapolation should be easy enough.

Approach

As I alluded to above, an image is-a file, so a simple inheritance model would be a perfect fit, but…

  1. Rails doesn’t offer much in the way of inheritance (no, single table inheritance is not a sufficient solution).
  2. “File” is already taken: it’s a core Ruby class, so it can’t double as a model name.

The latter problem was easy to solve, so I created an Image model and a Binary model. The first, though, was a little trickier. Although Rails’, well, ActiveRecord’s, support for inheritance is non-existent, Rails offers something very helpful for this purpose: observer support.

At a Really High Level

From 10,000’, it’s quite simple (isn’t everything at that distance?). I create an image, but before the image object is actually saved, I want to hand off the physical file to the Binary class for uploading to a temporary directory, inspection and extraction of metadata. If that’s successful, I want to move the file to a permanent location and continue saving the image. If anything fails at any point in the process, I want to remove any database records and delete the physical file.
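
Stripped of Rails entirely, that flow might look something like the sketch below. It is not the code from this article, just a plain-Ruby illustration of the same idea: store_upload, its parameters and the metadata hash are all hypothetical names, and the optional block stands in for whatever inspection a subtype wants to do.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for the upload flow: write to a temporary
# directory, let the caller inspect the file, then move it to its
# permanent home. On any failure the temporary file is deleted and
# the exception is re-raised.
def store_upload(data, filename, tmp_dir, final_dir)
  name       = File.basename(filename)
  tmp_path   = File.join(tmp_dir, name)
  final_path = File.join(final_dir, name)

  File.open(tmp_path, 'wb') { |file| file.write(data) }
  metadata = { :name => name, :size => File.size(tmp_path) }

  # Subtype-specific inspection or validation happens here.
  yield tmp_path, metadata if block_given?

  FileUtils.mkdir_p(final_dir)
  FileUtils.mv(tmp_path, final_path)
  metadata.merge(:path => final_path)
rescue StandardError
  FileUtils.rm_f(tmp_path) if tmp_path
  raise
end

Dir.mktmpdir do |root|
  tmp_dir = File.join(root, '_tmp')
  FileUtils.mkdir_p(tmp_dir)
  info = store_upload('hello', 'note.txt', tmp_dir, File.join(root, 'r', 'm'))
  puts info[:size]
end
```

The real version has database records to worry about as well, which is where the transaction behavior described later comes in.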

Code

Enough with the chit-chat, let’s write some code.

View

The only relevant view code is that for the form partial. It allows the user to enter a title for the image, a description and to select a file. Images have other properties (like width and height), but they’re derived and I don’t want the user to enter those.

<% form_for( @image, :html => { :multipart => true } ) do |f| %>
  <%= f.error_messages -%>
  <p>
    <%= f.label :name, 'Title' %>
    <%= f.text_field :name %>
  </p>
  <p>
    <%= f.label :upload, 'File' -%>
    <%= f.file_field :upload -%>
  </p>
  <p>
    <%= f.label :description %>
    <%= f.text_area :description %>
  </p>
  <p><%= f.submit( 'Upload' ) %></p>
<% end %>

Controller

As with most resources, there are two controller actions in play to upload a new image: new and create.

class ImagesController < ApplicationController
  def new
    @image = Image.new
  end

  def create
    @image = Image.new( params[:image] )
    # Assumes current_user returns the logged-in User record
    @image.user_id = current_user.id
    @image.save!
    flash[:notice] = 'Successfully created image.'
    redirect_to @image
  rescue => e
    # e is an exception, not a string, so log its message
    logger.error( 'Upload failed. ' + e.message )
    flash[:error] = 'Upload failed. Please try again.'
    render :action => 'new'
  end
end

The only significant – and perhaps non-obvious – difference from the basic scaffold code is the move away from if @image.save to an exception-centric approach built on @image.save!. I did that, first, because I can raise exceptions anywhere and how they’re handled is absolutely predictable. I can’t always return false and be sure that activity simply stops. Second, Rails has a very nice feature, in my opinion, that encapsulates the entirety of a save action in a transaction. If an exception is raised at any time during the save process, the transaction is rolled back and it’s like nothing ever happened. That’s a lot of goodness for absolutely no effort and I wanted to take advantage of it.
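
For anyone who hasn’t seen that rollback behavior before, the toy below mimics the idea in plain Ruby. It is emphatically not how ActiveRecord implements transactions; ToyTable is a made-up class that snapshots an in-memory array and restores it if the block raises.

```ruby
# A toy, in-memory illustration of "roll back on exception".
# ToyTable is hypothetical and has nothing to do with ActiveRecord.
class ToyTable
  attr_reader :rows

  def initialize
    @rows = []
  end

  # Run the block; if it raises, restore the rows and re-raise.
  def transaction
    snapshot = @rows.dup
    yield self
  rescue StandardError
    @rows = snapshot
    raise
  end

  def insert(row)
    @rows << row
  end
end

table = ToyTable.new
begin
  table.transaction do |t|
    t.insert(:name => 'binary record')
    raise 'upload failed'   # anything raised here undoes the insert
  end
rescue RuntimeError => e
  puts "Rolled back: #{e.message}"
end
puts table.rows.length
```

The caller still sees the exception, which is exactly what the rescue clause in the controller relies on.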

Models and Observer

I find it easiest to think about the model code in the order in which it’s encountered, so I’m going to split the code views and build on them accordingly. If that proves unpopular, I’ll publish a more unified version of the model code.

First, a snippet of the Image model to show its association, accessor attribute and validation.

class Image < ActiveRecord::Base
  belongs_to :binary

  validates_presence_of( :upload )
  attr_accessor :upload

  # snip (for now)
end

The observer is watching the Image model. When the save process is kicked off (and because we’re creating a new image), the BinaryObserver is engaged and its before_create callback is executed. In that callback we’re offering the Image model the opportunity to execute any instructions it may have before the physical file is actually uploaded. If no exceptions are raised, the Binary model is told to upload the file.

class BinaryObserver < ActiveRecord::Observer
  observe :image

  def before_create( model )
    # in this case, the Image model is passed
    if model.respond_to?( 'before_upload' )
      model.before_upload( model ) rescue raise
    end

    binary = Binary.new.upload( model.upload )

    # snip (for now)
  end
end

The file uploaded by the form is passed to the Binary model, whose upload method stores it in a temporary location on the server so that it can be further inspected and/or validated by the subtype for any details that may be relevant to that subtype. Any file that remains in the temporary location is assumed to be orphaned and is subject to deletion. Any file that makes it into the bin/ directory structure is assumed to be live and is left alone unless there’s a good reason for touching it.

class Binary < ActiveRecord::Base
  has_one :image

  def upload( uploaded_file )
    self.name      = uploaded_file.original_path
    self.mime_type = uploaded_file.content_type

    # get_bin_root() returns File.join( Rails.root, 'public', 'bin' )
    save_as = File.join( get_bin_root(), '_tmp', uploaded_file.original_path )

    # Write in binary mode so image data isn't altered on platforms
    # that translate line endings
    File.open( save_as.to_s, 'wb' ) do |file|
      file.write( uploaded_file.read )
    end

    self.extension = File.extname( self.name ).sub( /^\./, '' ).downcase
    self.size      = File.size( save_as )
    self.path      = save_as.sub( Rails.root.to_s + '/', '' )
    self.uri       = get_uri_from_path()
    self.save!

    return self
  end
end

Since this is a specific type of file – an image, I want to extract certain additional properties (in this case, the image’s width and height) from the physical file before the final save. I may also want to perform additional validation on the physical file while it’s stored in its temporary location on the file system. That’s where the custom after_upload callback comes into play.

# models/binary_observer.rb
class BinaryObserver < ActiveRecord::Observer
  observe :image

  def before_create( model )
    if model.respond_to?( 'before_upload' )
      model.before_upload( model ) rescue raise
    end

    binary = Binary.new.upload( model.upload )

    if model.respond_to?( 'after_upload' )
      model.after_upload( model, binary ) rescue raise
    end
  end
end

The original model (Image, in case you’ve forgotten) is passed along with the Binary object. The image is read and additional metadata pertinent only to images (not to generic files) is extracted and written to the model.

# models/image.rb
require 'RMagick' # The rmagick gem is required to inspect/manipulate images

class Image < ActiveRecord::Base
  belongs_to :binary

  validates_presence_of( :upload )
  attr_accessor :upload

  def after_upload( model, file )
    # Insert any physical file validation requirements here
    image = Magick::Image.read( file.path ).first
    self.width  = image.columns
    self.height = image.rows
  end
end

Assuming everything goes well, we now have complete Binary and Image models as well as a valid file on our file system. To close out, we’re going to tell the Binary class to move the file to its permanent location, update any model properties accordingly and send the user to the image page so they can see their new upload.

class BinaryObserver < ActiveRecord::Observer
  observe :image

  def before_create( model )
    if model.respond_to?( 'before_upload' )
      model.before_upload( model ) rescue raise
    end

    binary = Binary.new.upload( model.upload )

    if model.respond_to?( 'after_upload' )
      model.after_upload( model, binary ) rescue raise
    end

    binary = binary.store()
    model.binary_id = binary.id
    model.active = 1
  rescue => e
    #
    # Because we're raising an exception, Rails will rollback
    # the binary save operation at the database level.
    #
    File.delete( File.join( Rails.root, binary.path ) ) if binary
    #
    # Rethrow any exception that was raised.
    #
    raise e
  end
end
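
The optional-hook pattern the observer leans on can be seen in isolation in the sketch below. It runs without Rails; ToyObserver, ToyImage and ToyDocument are hypothetical stand-ins, and duplicating the upload is only a placeholder for the real Binary#upload call.

```ruby
# Plain-Ruby illustration of the optional before/after hooks.
# All class names here are hypothetical stand-ins.
class ToyObserver
  def before_create(model, uploaded)
    # Call the hook only if the subtype defines it.
    model.before_upload(model) if model.respond_to?(:before_upload)

    stored = uploaded.dup # placeholder for Binary.new.upload( ... )

    model.after_upload(model, stored) if model.respond_to?(:after_upload)
    stored
  end
end

# A subtype that cares about the uploaded file.
class ToyImage
  attr_reader :inspected

  def after_upload(model, file)
    @inspected = file
  end
end

# A subtype with no hooks at all still works.
class ToyDocument
end

image = ToyImage.new
ToyObserver.new.before_create(image, 'image bytes')
ToyObserver.new.before_create(ToyDocument.new, 'document bytes')
puts image.inspected
```

A subtype opts in simply by defining a method, which is what keeps the cost of supporting a new file type low.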

What I Like

  1. It should take very little effort to support a new file type.
  2. The view and controller for a subtype doesn’t deviate very far from what’s provided by scaffolding. That makes for easy reading for new developers.
  3. The controller really has very little to do. The work is done down the stack by the models. I love a skinny controller.

What I Don’t Like

  1. I’d prefer to add the upload virtual attribute to the Binary class so I don’t have to add it to each new subtype, but doing so made the views more complex. Overall, it just feels like less effort to do it this way.
  2. Although it will take very little to support a new file subtype, it will require some effort (updating the observer, including the attribute accessor, etc.). Ideally, I’d like to have it be a no-effort kind of endeavor. That’s probably just wishful thinking.

It should be fairly clear from the title and intro that I’m on the (very) short side of the learning curve with Ruby & Rails. It’s entirely possible that I’ve made a mockery of any number of best practices or conventions. Constructive criticism is welcome. I’ve learned a lot by doing this, and input from experience is never a bad thing.

Learning CakePHP: Validation

I started developing for the web sometime in 1995 while I was still in the military. I “turned pro” in 1997. At that time there were two methods of data validation: client-side and server-side. Yeah, I know, that’s still all we have, but I think that Ajax has helped to blur the line between the two – perceptually, I mean, not technically.

Then

Even back then I hated duplication. Frankly, I hated doing validation at all; I damn sure hated doing it twice. Moreover, I hated server-side validation because doing it (without also doing a whole lot of extra work) involved:

  1. Submitting a form to an action page
  2. Having that action page validate the form values
  3. Displaying (what was usually) an ugly error message to the user
  4. Asking the user to use their Back button to return to the form
  5. Expecting the user to remember the errors that had been displayed
  6. Asking the user to correct each error
  7. Rinse
  8. Repeat

That’s a lot of lousy user experience. Since that sucked so mightily, I chose client-side. With JavaScript, I could validate user input without having to make the round trip to the server. Talk to me all you want about graceful degradation, but first of all, that concept didn’t exist in 1996 and second…this is JavaScript we’re talking about.

Now

With the relatively recent proliferation of APIs, server-side validation has taken on a whole new urgency. With multiple gateways into an application’s business core, it’s more critical than ever that validation be moved as far down the application stack as possible in order to eliminate redundancy and, in the process, ensure consistency. For most applications that I’ve built in the last 5 years or so, there’s been some kind of API involved.

With CakePHP

Once I had worked with CakePHP for a few months, I got comfortable enough to start thinking about validation and how I wanted it to work from end-to-end. I had a few requirements:

  1. I only wanted to write my rules and messages once. Do something once and you don’t have to worry about consistency. One less pain point is always going to be a good thing.
  2. I wanted to be able to validate without necessarily saving. Most of the time I’d be doing both, but decoupling them is just a good idea.
  3. Hide from the user the fact that my validation was happening on the server. To them, generating and displaying error messages should be a smooth and seamless experience.

I came up with something that seems reasonably elegant and has met my needs to date.

The Basics

Although I’m not a big fan of the semantics, CakePHP nicely decouples the validation and save actions. I would expect this of any framework, of course, but that’s a place to start. If I’m validating something independent of a save operation, I usually want that validation to return one or more error messages. CakePHP makes this easy and even offers me a convenient means of using my preferred semantics.

Never, Ever Mess With the Core

This is one of my basic tenets for developing with third-party libraries. Messing with core code is almost certain to destroy your upgrade path. Find another way.

In this case, I want to be able to validate model data based on the rules defined by that model and I want this validation to be available to all models in my application in exactly the same way. To do that, I created a copy of CakePHP’s app_model.php file as app/app_model.php. By creating a copy in my app/ directory, I can preserve my upgrade path, but still inherit from the proper parent class.

Validate

In the newly copied AppModel class, I added this function:

/**
 * function validate
 * 
 * Validates model data. This function can be called independently
 * on any model for validation independent of a save operation. It
 * can also be overridden, for example, by a model requiring more
 * complex validation.
 *
 * param   $data
 * return  array     An array of error messages or an empty array
 *                   if the data validates properly.
 */
public function validate ( $data = array() ) {
   $this->set ( $data );

   /**
    * Corrected based on input from Miles Johnson in the comments
    * below.
    */
   return !$this->validates() ? $this->validationErrors : array();
}

That snippet alone provides basic validation across my entire application with a simple line of code:

$this->Model->validate ( $this->data )

Complex validation

Something of a misnomer since this isn’t very complex, but there are times when more is required. For example, if I have customers and vendors that have addresses, I usually want to break the address out so that the data structure can be shared. Separate table, separate model. The data is abstracted, but I can’t really have a valid customer or vendor unless their address also validates. To do that, I need to override my simple validate() method in the Vendor (or Customer) model so that I can do just a little bit more:

public function validate ( $data = array() ) {
   return array_merge ( $this->Address->validate ( $data ), parent::validate ( $data ) );
}

User Interface

As I alluded to initially, what I really want to do is to perform server-side validation while providing the illusion of client-side validation. To do that, I employ the appropriate controller. If I’m able to submit the entire form via Ajax, this becomes trivial because the same request can perform both actions (provided no errors are reported). Here’s how it might look on a vendor application page (/vendors/apply):

public function apply() {
   $errors = $this->Vendor->validate ( $this->data );

   if ( !empty ( $errors ) ) {
      /** Package the errors for an ajax return */
      echo json_encode ( array ( 'errors' => $errors ) );
      exit();
   }

   /** If there are no errors…save */
   exit();
}

If I want to validate first (again, via Ajax) and then submit to a different action page, all I have to do is create a validate() method in the controller, make the Ajax call to that and, if no errors are returned, submit the form to the preferred action. If errors are returned, I can use jQuery’s JSON parser to extract the messages and drop them on the screen.

With this technique, I can consolidate my validation, rarely write more than one block of code to access that validation and present the results to a user attractively without that user ever being aware that anything more than client-side error handling has been done. What technique(s) do you employ to minimize your validation headaches?

Labels. They're Not Just for Forms Anymore.

Think semantically, not dogmatically. Labels can be used to describe data as well as form fields. I can’t tell you how often I’ve seen something like this:

<div id="image-info">
   <span class="label">Name:</span><span>myimage.jpg</span>
   <span class="label">Size:</span><span>5KB</span>
   <span class="label">MIME Type:</span><span>image/jpg</span>
</div>

The obvious red flag here is that the name of the class is also the name of a tag, but a lot of examples aren’t quite so obvious. Nonetheless, when labeling data, use the label tag to do it. I’ve never seen any indication that it’s incorrect in any way.

Disable the System Bell in iTerm

Spend enough time in a terminal session and eventually the system “bell” will drive you nuts. I honestly don’t remember it being this much of an issue on my old MacBook Pro, but it’s been maddening since I got my new one a few weeks ago. Because of its bookmarks feature, iTerm is my emulator of choice and there’s nothing in its preferences (I’m using Build 0.9.6.20090415) that even acknowledges a system bell exists, much less allows me to disable it. I did a clean install when I got my new machine, so maybe this is a recent change. I haven’t looked at the release history to determine why it’s not there; I can only be sure that it’s not.

In a fit of desperation this morning, I decided to scour the plist file to see if there was anything I could do at a slightly lower level to quiet my terminal sessions. Fortunately, I found an answer:

  1. Navigate to ~/Library/Preferences.
  2. Open net.sourceforge.iTerm.plist in your favorite plist file editor. I use Property List Editor.app because I have Xcode installed and the app is available to me. There are other plist editors out there or you can just open the file in a text editor – it’s just an XML file with a fancy extension.
  3. Navigate the XML nodes (different editors may offer different means of drilling down) to Root > Terminals > Default > Silence Bell
  4. Click the checkbox to enable that property.
  5. Save the change.
  6. Restart iTerm.

Enjoy the silence.

MacPorts, MySQL 5 and the Launch Daemons

Update, 7/29/2009: In response to my question about this on StackOverflow, Mike Richards offered an infinitely better solution. Apparently MacPorts is effectively deprecating the mysql5 +server path in favor of a new mysql-server package. I can’t confirm this personally, but it sounds reasonable enough.

That sounds a little bit like a Harry Potter title, but the content isn’t nearly as entertaining. For the past year or two, I’ve been using a MySQL installed via MacPorts, the (pseudo-) apt repository for Mac ports (get it?) of Unix applications and utilities. MacPorts has been fantastic and I haven’t regretted the decision to move away from either OS X’s native MySQL install or from MAMP, an all-in-one solution that I had used previously. The last few times I’ve installed MySQL, though, I’ve noticed that I haven’t been able to get MySQL to start automatically when I log in.

Following Chad Kieffer’s excellent tutorial for installing & configuring a MacPorts MySQL install, I would get myself to the point where I execute launchctl to load the plist file that will start MySQL automatically:

$ sudo launchctl load -w /Library/LaunchDaemons/org.macports.mysql5.plist

Unlike Chad, I want MySQL to start automatically. Admittedly, my work-life balance sucks; I’m more than likely doing something work-related if I’m sitting behind the keyboard. Given that, the server might as well be ready to respond, right? Except that the plist I’m trying to load…isn’t there to be loaded.

The first time that I did the install, the plist was there and loaded as expected, but the last 2 or 3 times that has not been the case. I don’t know what changed with the MacPorts bundle, but that plist simply isn’t there. Fortunately, I still have my old install around, so I faked it.

If anyone else is having the same issue, here’s how you too can fake it:

  1. Create a directory for the launch scripts.
    $ sudo mkdir -p /opt/local/etc/LaunchDaemons/org.macports.mysql5
  2. Download the files that no longer get installed, mysql5.wrapper and org.macports.mysql5.plist. I’m making mine available since I don’t know where else to get them. Save both files to the directory you just created.
  3. Set the proper ownership and permissions.
    $ sudo chown root:wheel /opt/local/etc/LaunchDaemons/org.macports.mysql5/*
    $ sudo chmod 755 /opt/local/etc/LaunchDaemons/org.macports.mysql5/mysql5.wrapper
    $ sudo chmod 644 /opt/local/etc/LaunchDaemons/org.macports.mysql5/org.macports.mysql5.plist
  4. Create a soft link to the newly downloaded plist file in /Library/LaunchDaemons.
    $ cd /Library/LaunchDaemons
    $ sudo ln -s /opt/local/etc/LaunchDaemons/org.macports.mysql5/org.macports.mysql5.plist org.macports.mysql5.plist
  5. Load the plist file, as indicated in Chad’s instructions and duplicated above. For the sake of keeping it all in one place:
    $ sudo launchctl load -w /Library/LaunchDaemons/org.macports.mysql5.plist
  6. Reboot.
  7. Verify that MySQL has started.
    $ sudo ps -ef | grep mysql

You should see output that looks something like this:

    0    65     1   0   0:00.00 ??         0:00.00 /opt/local/bin/daemondo --label=mysql5 --start-cmd /opt/local/etc/LaunchDaemons/org.macports.mysql5/mysql5.wrapper start ; --stop-cmd /opt/local/etc/LaunchDaemons/org.macports.mysql5/mysql5.wrapper stop ; --restart-cmd /opt/local/etc/LaunchDaemons/org.macports.mysql5/mysql5.wrapper restart ; --pid=none
    0    85     1   0   0:00.01 ??         0:00.01 /bin/sh /opt/local/lib/mysql5/bin/mysqld_safe --datadir=/opt/local/var/db/mysql5 --pid-file=/opt/local/var/db/mysql5/rob17.local.pid
   74   111    85   0   0:07.48 ??         0:19.75 /opt/local/libexec/mysqld --basedir=/opt/local --datadir=/opt/local/var/db/mysql5 --user=mysql --pid-file=/opt/local/var/db/mysql5/rob17.local.pid --socket=/tmp/mysql.sock
  501  3370  3145   0   0:00.00 ttys003    0:00.00 grep mysql

If you do, then you’re golden. If you don’t, then you probably made a mistake. If the mistake is mine, please let me know in the comments and I’ll make the appropriate adjustments.