RSpec Mocks for the Sad and Desperate
No matter how good I think I am at something, I try to remember that I’m always learning and I can always be better. This is especially true with all things technological, whether it’s servers or code, and one of my favorite things to suck less at is RSpec. When I first started working with Ruby and Rails, I was mystified by it. Written poorly, a spec quickly turns into a mess of muddled expectations and shoulds, vague statements in large blocks that can require more management than the code it claims to prove; written well, it’s a dream, a revolution, a safety net and documentation all in one.
Lately, I’ve been working on leveling up my specs quite a bit. It’s sort of embarrassing to say but I had a really hard time understanding exactly how or why someone would use mocks within a spec. I mean, I understood the concept — stand-ins for objects and methods to ensure you are only testing a small subset of your code, not related objects or methods — but I found the example code somewhat hard to read. It wasn’t until I committed a little time and had to start writing my own that I realized it isn’t difficult at all, it’s just that most of the resources out there that attempt to explain them do a crappy job.
So here’s the deal: a double is a dummy object that acts as a stand-in for a real object. I’d say to think of it like a stunt double but it’s more of a crash test dummy, a mannequin object without methods of its own. It’s a shell, a placeholder, and you use stubs to create its methods and responses.
Let’s pretend we want to test a method, `get_score`. This method exists in a mixin and performs some sort of magic on User objects to return the user’s score. The thing is, since your module doesn’t actually retrieve the score, you don’t want to test anything on User, you just want to make sure that the proper call is made. To do that, you create a double to stand in for a user.
```ruby
describe MyModule::ScoreGetter do
  let(:clazz) do
    Class.new do
      include MyModule::ScoreGetter # contains the get_score method
    end
  end

  describe 'method get_score' do
    let(:user) { double("a user object") }
    let(:obj)  { clazz.new }

    it 'returns a score' do
      expect(obj.get_score(user)).to eq 50
    end
  end
end
```
Note that we created a double to represent a user instead of calling User.new. Easy so far, but now what? Now we start stubbing some methods.
RSpec’s documentation says a stub is “an instruction to an object (real or test double) to return a
known value in response to a message.” In practice, that really just means that a stub is a stand-in for calling any actual method. You write the expected response of a method call rather than actually calling the method. WHY, though? Imagine our code for `get_score` looks something like this:
```ruby
module MyModule
  module ScoreGetter
    def get_score(user)
      complicated_private_method(user)
    end

    private

    def complicated_private_method(user)
      # all sorts of wacky shit; it's preparing stuff on your calling
      # object or something, ultimately ending in...
      user.score_generation(self)
    end
  end
end

class User
  # code

  def score_generation(caller)
    # sophisticated score generation algorithm
  end
end
```
We stub because we don’t want to test `score_generation`. It goes outside the scope of this unit test, maybe it involves a database and we don’t want to perform queries, maybe it’s slow, maybe it uses someone else’s code and it’s unstable. If it fails, your `get_score` method will fail, even if `get_score` isn’t broken! All we want to test is:
- our class calls `get_score`
- `complicated_private_method` calls `user.score_generation` and returns a score
That’s it. Doing this is easy.
```ruby
it 'returns a score' do
  expect(user).to receive(:score_generation).with(obj).and_return(50)
  expect(obj.get_score(user)).to eq 50
end
```
You are expecting `user` to receive the method `score_generation` with `obj` as a parameter and you decided it will return `50`. The syntax here might be a little misleading: you aren’t expecting it to return 50, you are saying IT WILL RETURN 50. Your expectation is on the method and its parameter.
Alternatively, if you don’t want to declare it as an expectation and just want a single expectation for your test, you can use the `stub` method with a very similar syntax.
```ruby
it 'returns a score' do
  user.stub(:score_generation).and_return(50)
  expect(obj.get_score(user)).to eq 50
end
```
All we’re saying is, “when the user double receives a call to `score_generation`, return 50.” This is not an expectation, though, so your spec won’t fail if this doesn’t happen unless another expectation relies on it. In other words, we could have this:
```ruby
it 'returns a score' do
  user.stub(:score_generation).and_return(50)
  user.stub(:do_this_thing).and_return(:foo)
  expect(obj.get_score(user)).to eq 50
end
```
…and your spec would still pass because `do_this_thing` is acting as a double’s method definition, not an expectation. Change that to `expect` and your spec’s passing depends on it.
Again, we have to do this because our double doesn’t have a `score_generation` method and even if it did, we wouldn’t want it to actually be called because we don’t want to rely on that class or its damned dirty methods. An important thing to remember is to declare your stubs before the method is actually called; if you don’t, the method will be called before the stub is in place and it won’t know how to behave.
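To see why ordering matters, here’s a plain-Ruby sketch of the same principle with no RSpec involved. `define_singleton_method` stands in for `stub` here, and all the names are illustrative, not the post’s real code:

```ruby
# Plain-Ruby sketch (no RSpec) of why a stub must exist before the call.
user = Object.new

def get_score(user)
  user.respond_to?(:score_generation) ? user.score_generation : :no_stub_yet
end

before = get_score(user)  # "stub" not declared yet, so the call can't succeed
user.define_singleton_method(:score_generation) { 50 }
after = get_score(user)   # "stub" in place, the call now returns 50
```

The first call falls through because the object has no idea how to respond; only after the method is defined does the call behave the way you intended. RSpec stubs work the same way.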
For me, one of the trickier parts of working with this stuff comes when I’m looking at someone else’s code and they’re using a lot of stubs to unfamiliar methods. Effective stubbing expects you to understand the flow of messages within the code. There have been occasions where I’ll add or modify a method and not have the test show the expected change, only to discover that a method further up the chain has been stubbed and my method isn’t actually being called! Read carefully, understand completely.
That’s it for now. Hopefully this will help someone along!
Neo4j Lessons Learned Followup
It’s been just over two months since my last post, where I wrote about building and self-hosting a site on a budget with Rails and Neo4j. Since then, I haven’t made many changes to the site itself — a couple styling tweaks, some new admin controls — but I’m happy to report that it’s been rock solid, extremely fast, and yielded unanimously positive feedback. Go team! There are two updates to that article that I want to address.
First, while researching the optimization of Cypher queries, I came across a mention of a simple way to improve performance when using the embedded database. I checked against the code in neo4j-core and found that it wasn’t following this best practice, so I made the change. It’s been committed but since there hasn’t been (and won’t be) a new version of the gem pushed on the v2.x branch, you should update your Gemfile to reflect it.
```ruby
gem 'neo4j-core', git: 'git@github.com:andreasronge/neo4j-core.git', ref: '99644256a6'
```
I do so much caching in Rails that I can’t claim I saw a huge change on my server, but my tests in the console confirmed that it worked. It changes the way the query engine works and lets the database cache queries correctly. It certainly can’t hurt.
Second, I neglected to mention a really important lesson from my experience:
Never trust relationship order to be consistent and predictable.
Phillymetal.com uses a node for shows, additional nodes for bands, with a relationship (“playing_shows” or something) between them. The ORDER of the bands is very important. When I tested, I found that it was consistently returning the bands in the order that they were specified after I imported the data.
Unfortunately, a few days after the site launched, I discovered that on my production server, new show submissions would be in the wrong order immediately after the form was submitted, then they’d flip to the right order after the show was modified in any way. It was odd. A little while later, I found the problem, but it doesn’t matter because I was wrong to assume it would just be in the right order in the first place.
If you want a series of relationships to appear in the right order, put a property on the relationship that you can use to sort. In my case, I actually had been saving the index from the submitted form just in case, I just wasn’t actually using it to sort, so my fix was easy. Don’t make the same mistake.
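As a sketch of the fix, assuming a `position` property saved on each relationship (the relationship type, property name, and band names here are illustrative, not the site’s actual schema):

```ruby
# Never trust insertion order; sort on an explicit property instead.
# The equivalent Cypher would be something along the lines of:
#   MATCH (s:Show)-[r:PLAYING_SHOW]->(b:Band) RETURN b ORDER BY r.position
playing = [
  { band: "Closer",  position: 2 },
  { band: "Opener",  position: 0 },
  { band: "Support", position: 1 },
]
lineup = playing.sort_by { |rel| rel[:position] }.map { |rel| rel[:band] }
```

The point is simply that the order you get back is only guaranteed if you ask for it explicitly.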
If you’re in NYC and want to talk about any of this stuff (or at least listen to me talk about it), I’ll be presenting at the NYC Neo4j Meetup on July 21, 2014. Info here. It’s just going to be a discussion and Q&A. I expect we’ll cover a lot of the same stuff as these blog posts but considering how smart the crowd at the last Meetup was, I can only imagine that I’ll learn a bit from it, too.
That’s it for now. Lots more to write about later, hope someone finds this stuff useful!
Hosting My First Rails/Neo4j Project: Tips and Lessons Learned
After months of hard work, the new phillymetal.com is online. Go check it out. Lauren and I are extremely proud of it and feedback so far has been great. We haven’t made an official announcement about it through the email list or even on our personal Facebook pages because we want to make sure all the bugs are ironed out and the server is stable, but it’s not like it’s a secret so there’s no harm in writing about the experience.
As I detailed in my last post, the site used to be a giant mess of PHP code drawing from MySQL. It was, to put it professionally, a total clusterfuck and every time I go through the code or the database, I think it’s a marvel that it was so solid and actually worked. Of course, a big part of that was the simplicity of the backend and a lack of anything even remotely complicated as far as storing data was concerned. On the Shows side, almost nothing was truly relational other than an individual show and the bands booked to play it; however, each show’s bands were independent from one another, so a band booked on 50 shows had their information in the database 50 times. This made for some fast queries — the site was blazing — but that’s about all I can say about it.
We decided to move to Rails and Neo4j.rb because that’s what we’re using for a bigger project. In a sense, Phillymetal.com is our exhibition round. I was especially curious about the hosting. Would I be able to get away with modest hardware or would I need a beast of a machine (or machines?) just to function? Would backup and restore be tricky? Would it be stable? Would updates be a hassle? All of the material I could find on Neo4j.rb and Rails dealt with usage focused on development. I still haven’t found anything indicating that anyone else is even using this combination in a production website!
I learned all these things and more. In fact, I learned so much that I want to share this experience in case anyone else is interested in doing something similar. To be clear, this wasn’t my first Rails project, just my first app that uses Neo4j.
First, some basics. We are running Rails 4.0.3, JRuby 1.7.10, and Neo4j.rb 2.3. It’s very important that you use the latest commit to the Rails 4 branch from the Neo4j.rb Github in your gemfile, not the latest official release. Also make sure to explicitly include the latest commit to the Neo4j wrapper at its Github page for a critical bugfix that kills certain Rails form submissions. (I fixed it, you’re welcome.)
Our database is very small, only about 50MB and a couple hundred thousand nodes. As a result, we were able to get away with an absolutely tiny server from DigitalOcean with 2GB of RAM. Performance tuning was a huge part of that, though, and we’ll get to that in a minute, but your mileage may vary when it comes to what kind of hardware you’ll be able to get away with.
Now, without further ado, my lessons learned from migrating from PHP/MySQL to Rails/Neo4j.
1: Test everything
All Rails articles you read tell you how important testing is. Of course, coming from PHP, I had a lot of bad habits that came from having no standards, no guidelines… no rails, basically. I was also able to just pop into my server and fix code on the fly whenever something went wrong. With Rails, especially using JRuby, pushing updates is a bit of a pain because of the compilation and Java app deployment processes, so testing is not just good for the site’s stability but also for your own time. I don’t have the time to be unsure, so I need tests.
2: Just because some queries are easy, it doesn’t mean those queries are always cheap
If you’re on a small (or nonexistent, in my case) budget, you need to be very concerned with squeezing performance out of cheap hardware. Neo4j makes it very easy to go wild with relational queries, since its whole mantra focuses on blazing fast retrieval and logical organization of data and all that stuff, but that doesn’t mean you should always try to use those abilities.
Case in point: the “last post” and “replies” columns on this dumb page. When I first put this together, I was calculating those fields dynamically for each post. Rails and Neo4j make it easy to do that: last post was a matter of `topic.posts.to_a.last.poster.username` and replies was just `topic.posts.count - 1`. Did it work? Of course. Was it fast? Yes… sort of… in small doses. Counting posts wasn’t much of a problem, Neo4j and Rails do that easily, but figuring out the last poster’s username got sort of expensive on budget hardware, and that is really the key here: budget hardware. In my tests, the more power I gave it, the better it worked, but I didn’t want to spend any more money on this than I had to.
More importantly, why did I think it was necessary to calculate those things on the fly? Just because I could? Those things changed so infrequently that, in the end, I decided to store them as properties on the Topic model itself and just update them when they changed. It may be a bit less high tech but it’s better for performance, and at the end of the day, I decided to prioritize user experience over exploiting every possible capability of my technology.
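The denormalization can be sketched like this in plain Ruby (the class and property names are hypothetical; in a real Rails app the update would live in the Post model’s save callback):

```ruby
# Sketch: store last_poster and reply_count on the topic and update them
# at write time, instead of computing them on every page view.
class Topic
  attr_reader :last_poster, :reply_count

  def initialize
    @posts = []
    @reply_count = 0
  end

  def add_post(username)
    @posts << username
    # Refresh the cached properties now, while the data is changing...
    @last_poster = username
    @reply_count = @posts.size - 1
  end
end

# ...so reads are just property lookups, no traversal needed.
topic = Topic.new
topic.add_post("original_poster")
topic.add_post("first_reply")
```

Reads dominate writes for a page like this, so paying the small cost at write time is the right trade.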
3: Cache to save cash
When I started building this, I was using Rails 3.2. Rails 4 seemed kind of interesting but nothing really made me feel like I had to upgrade immediately. It also didn’t help that I wasn’t the biggest fan of moving mass-assignment security out of models and into controllers. The protected_attributes gem didn’t work with Neo4j.rb’s Rails 4 branch so I’d have been forced to use strong_params, which I wasn’t ready to do.
What changed all that? Rails 4’s cache improvements. Even though I could have included the gem that provided that functionality to Rails 3, I figured it was a good enough reason to make the jump to Rails 4. After a failed attempt at making protected_attributes work with Neo4j.rb’s Rails 4 branch, I even upgraded that side of my code, too. (I still don’t love it.)
One of the problems that took me far too long to actually diagnose was that Neo4j.rb 2.3 models didn’t have the cache_key method required for cache digests to work properly. The reason I said to use the latest commit to the Rails 4 branch at the beginning of this post is that it includes my fix for this, which is based off the Mongoid implementation. With that in place, cache digests work great with Rails 4 and Neo4j.rb!
I cache all of the show information at https://phillymetal.com/shows. It is critical to the performance of the page. In development, I found that it wasn’t the queries themselves that were terrible — Neo4j really does move quickly — it was drawing the partial. I store a lot of show information in the relationships themselves; in particular, the band descriptions and links go in the relationship if the show promoter wants information different from the bands’ defaults in the database. Because of that, the server has to look at each band, compare its relationship description to its default description, and present whichever is appropriate. This is fast when you’re looking at a single show, less fast when you’re looking at dozens of shows, and even worse when it’s on a public page that has multiple concurrent sessions. More importantly, it is information that is the same literally every time it is pulled up, so it belongs in a cache.
If you are using Torquebox, as I recommend through the rest of this post, you can enable the Torquebox cache in production.rb by simply setting `config.cache_store = :torquebox_store` and calling it a day.
4: Build solid admin tools
I was spoiled by PHPMyAdmin. Because Phillymetal.com is a small site that doesn’t do very much and isn’t very needy, I got used to performing certain tasks directly from the database. For instance, on the rare occasion that a discussion topic needed to be deleted, I’d do it from there. User needed to be banned? Database. IP lookup for a problem user? Database. Owner of a show? Database. At the worst, there was actually no password reset function built into the site. I would change a dummy user’s password, copy the password hash to the user requesting the reset, inform the user, and then change my dummy password back. Wow.
Neo4j makes that impossible. Not only is there a bug that prevents the Neo4j admin from working with my combination of Neo4j.rb, JRuby, and Rails, I wouldn’t be able to make changes as easily as I had in the past even if I wanted to because of the way data is organized. This is fine by me, though, since the site really did need admin tools (and a freaking password reset… holy shit, man! It’s 2014, come on!) and Rails made it easy enough to build them. Still, if you’re a solo admin running a small site, set aside some time to build admin tools for your management tasks that used to be handled directly in the database. You will not have the easy access to the database that you are used to.
5: Bone up on Linux, you’re going to be doing everything yourself
There is no Heroku, there is only you. Neo4j.rb 2.3 uses Neo4j embedded and is therefore incompatible with the most popular PaaS out there. Torquebox and JBoss are supported by OpenShift but if you’re going to take the time to learn that and you have budget concerns, you might as well save some money and get smarter by learning to do it yourself.
There is an excellent, quick walkthrough on AmberBit that takes you through installing Torquebox and deploying with Capistrano. Some changes you should make:
Use the latest version of Torquebox. As of the writing of this post, it was 3.1.0. I had some issues with that version on my first deployment, email me if your site doesn’t load — you may need to make a custom .knob file.
The upstart task needs to be modified for Ubuntu 12.04. Open /etc/init/torquebox and do this:
```
#start on started network-services
#stop on stopped network-services
start on runlevel [2345]
stop on runlevel [016]
```
Also do this for Neo4j:
```
#limit nofile 4096 4096
limit nofile 40000 40000
```
Make a folder called db in /home/torquebox/shared and in deploy.rb, modify your :finalize_update task with this:
```ruby
run "rm -r #{release_path}/db"
run "ln -nfs #{shared_path}/db #{release_path}/"
```
I also modified it so the log file would use the shared path instead of the release path.
5b: Set up your backup
This is part of knowing Linux but it’s so important that I want to highlight it separately.
You need to configure a backup script for Neo4j since it can’t be copied while the server is running. This is actually extremely easy as long as you’re running the Enterprise version, which you can legally do as long as you have a license. If you’re a solo or small team of developers that meet the criteria, you can get a license for free, just register.
Include the neo4j-advanced and neo4j-enterprise gems in Gemfile.
Add the following lines to application.rb:
```ruby
config.neo4j['online_backup_enabled'] = true
config.neo4j['online_backup_server'] = '127.0.0.1:6362'
```
The first one is self-explanatory. The second one is necessary because if you don’t explicitly tell it what IP to listen on, it will bind to 0.0.0.0 and allow literally anyone to run backups of your database over the internet. This feels like a terrible default, I hope it gets cleared up in the future!
When you register, Neo will send you a link for the latest version but we don’t want that, we want the version that matches the Neo4j embedded in our app. Change the filename in their link to 1.9.5, the same version Neo4j.rb is running, and you can download that version. Save it to your server and inside of /bin/ you’ll find neo4j-backup.sh. All you need to do now is write a script that will perform your backup. I find that there’s some sort of bug that prevents incremental backups from working correctly so for now, you’ll need to clear the directory every time the script runs. Here’s my very barebones script:
```shell
rm -r pm-backup
/root/neo4j-enterprise-1.9.5/bin/neo4j-backup -from single://127.0.0.1 -to /root/pm-backup
rm pm-backup.zip
zip -r pm-backup.zip /root/pm-backup
```
Cron runs it nightly, DigitalOcean takes a snapshot of my server nightly, I sleep soundly.
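For reference, the nightly run can be wired up with a crontab entry along these lines. The script path, log path, and time are assumptions for illustration, not my actual setup:

```shell
# Hypothetical crontab entry: run the backup script nightly at 3:30 AM
30 3 * * * /root/pm-backup.sh >> /var/log/pm-backup.log 2>&1
```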
6: You must take your site down to update code
This is easily my least favorite part of this entire setup. Torquebox supports no-downtime updates of Java apps but since all releases share the same database and only one app can have the embedded DB open at a time, you have no choice but to stop your entire site every time you want to update. The way around this is with a Neo4j cluster, but if our goal here is to minimize hosting cost, this isn’t an option. Plan your maintenance windows carefully and get your management tools in place to minimize reboots.
If you do need to reboot, you can do it quickly by managing it carefully. I do mine in stages.
Every update starts by deploying with Capistrano. I modified my deploy.rb so it doesn’t try to restart the server, meaning Capistrano is basically staging the update files. This is useful if you want to automate this process, maybe by having a cron job that restarts your application server nightly. You can deploy at any time and know your updates will be processed later. (But that’s not what I do, so let’s keep going.)
Next, I have an Nginx site defined in /etc/nginx/sites-available that just loads a basic Offline message. When I’m ready to restart Torquebox, I run a script that unlinks the live site conf, links the maintenance site conf in Nginx, reloads the Nginx conf, and stops Torquebox. After that, I manually run `service torquebox start` and then `tail /var/log/torquebox/torquebox.log -f` until I see that it’s fully started. Then I run the pm-online.sh script to unlink my maintenance site, link the production site, reload Nginx, and I’m back up! The whole process, excluding the deployment, takes less than a minute. I had two occasions in the past week where Torquebox gave an error at load, so following the log file is something I always recommend just in case.
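The offline half of that toggle might look something like this hypothetical pm-offline.sh. Every path and site name here is an assumption for illustration, not the site’s real config:

```shell
#!/bin/bash
# Hypothetical pm-offline.sh: swap Nginx over to the maintenance page,
# then stop Torquebox so the next deploy can take the embedded DB.
rm /etc/nginx/sites-enabled/phillymetal
ln -s /etc/nginx/sites-available/maintenance /etc/nginx/sites-enabled/maintenance
service nginx reload
service torquebox stop
```

The matching pm-online.sh just reverses the symlink swap and reloads Nginx after Torquebox is confirmed up.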
Of course, I recognize that this is far from ideal. In a busier site, a Neo4j cluster would be crucial to prevent downtime. Thankfully, for small sites like this, 60 seconds of downtime in the middle of the night or for emergency maintenance (I discovered the public Neo4j backup port issue while writing this article on a Sunday afternoon and had to patch it immediately!) is acceptable.
7: Familiarize yourself with Java application deployment and management concepts
You may think of yourself as a Rails developer or a Linux administrator but as soon as you start using Torquebox, you are also a Java app server admin. Aside from the server restarting process, dealing with this was my least favorite part of this entire project, mostly because there are not many resources out there for people who are getting started with it. Spend some time reading through all of the Torquebox documentation very carefully. You should also buy Deploying with JRuby by Joe Kutner, available here. Even though it’s a little out of date at this point, Joe’s book makes a clear case for why Torquebox is the way to go. Be careful when reading through his sections on TorqueBox jobs, since newer versions handle them quite a bit differently than his instructions. The basics remain the same and as an introduction to the benefits of Torquebox, it’s a great start.
One crucial part of tuning your environment is giving TorqueBox enough RAM to work with. Out of the box, its RAM defaults are very low, so tune them for your system by opening /opt/torquebox/jboss/bin/standalone.conf and finding this line:
```
# Specify options to pass to the Java VM.
```
A few lines below that, you’ll see `JAVA_OPTS="-Xms[SOMETHING]m -Xmx[SOMETHING]m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true"`. I don’t have the default one sitting around so I can’t tell you exactly what it says but you’re going to want to set those numbers to values appropriate for your environment. There is A LOT of information out there on setting these values for Java apps but in my testing, I found that setting Xms and Xmx to the same value gives me the best performance. In my environment, that line looks like this:
```shell
JAVA_OPTS="-Xms1792m -Xmx1792m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true"
```
You want to make sure that Xmx + MaxPermSize does not exceed the maximum amount of RAM in your system. I have 2GB of RAM, so I’m probably cutting it a little close here. Neo4j will consume more RAM the longer it has been running and Xmx will define the upper limit for Torquebox and the processes it spawns. The larger your database is, the more RAM it will require. Despite my DB only consuming 50MB on disk, I find that Torquebox consistently uses at least 1.2GB of RAM. Testing also revealed that giving it much more than that is a waste, it just doesn’t grow large enough.
If you want to know more about how Java uses RAM, read up on garbage collection… but don’t go crazy with it. I spent days trying to make garbage collection faster and less frequent when the real solution was caching and smarter use of my database, as described above; still, it’s good to know. My favorite resources were here, here, and here.
8: All jobs must be run from within the app
What exactly do I mean? I mean that any cron or Torquebox job must interact with your webserver to perform work, not access the database or Rails directly, because you can only have one running instance of Neo4j embedded without a cluster.
In my case, I have a few different jobs, but one of them syncs Facebook stats for listed events every hour and saves the retrieved information to the Show nodes. If I was using PostgreSQL, I would just have a Torquebox job that executed Show.fb_sync. That is not an option here, though, so I POST a particular form to a particular path that, when received by the controller, executes Show.fb_sync from within the app. I do this sort of thing for any job that requires interaction with the database. Not terrible or truly shocking but it’s the sort of workaround you have to be prepared to implement.
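As a sketch, the cron side of that workaround might build a request like this. The path and token are made up for illustration, not the site’s real endpoint:

```ruby
require "net/http"
require "uri"

# Build (but don't send) the POST that asks the running app to do the work,
# since only the app process holds the embedded Neo4j database open.
uri = URI("https://phillymetal.com/admin/fb_sync")
request = Net::HTTP::Post.new(uri)
request.set_form_data("token" => "SECRET_JOB_TOKEN")

# A real job would then deliver it:
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```

The controller receiving this just authenticates the token and calls `Show.fb_sync` from inside the app, where the database is already available.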
9: Neo4j.rb and Rails are a perfectly reasonable pairing to consider instead of PostgreSQL and Rails for small projects
This really was the most important thing I learned and, really, what I hoped to discover at the end of this journey. My fear was that the lack of small projects discussing their experiences with Neo4j.rb was due to the fact that the hosting requirements in particular were too complicated or costly. In reality, anyone looking for the benefits of NoSQL without losing relational flexibility should consider the Neo4j.rb/Rails combination without fear.
So there you have it! You may have noticed that not much of this was really specific to Neo4j.rb; in fact, most of these lessons were really best-practices web development: don’t overuse queries, cache wherever you can, build proper administrative tools, test everything. Neo4j.rb + Rails + JRuby don’t require you to reinvent the wheel, they just force you to kinda… use a different kind of wheel, I guess. My experience made me feel a lot more confident in the technology going forward. Neo4j.rb 3.0 will allow developers to use the Neo4j REST API, removing the requirement of JRuby and opening the door for deployment to Heroku. I’m looking forward to it reaching the point where I can use it for my projects, though I’m curious to see if there is a noticeable performance tradeoff as a result. Time will tell. Until then, I’ve found that doing things this way isn’t terrible and I’m looking forward to my next project.
Wrapping your PHP Mind Around Rails MVC
(The following contains many extreme simplifications of concepts. Please do not comment to object about this unless I am extremely wrong and need to correct a statement!)
I learned how to use computers when I was very little. My parents’ friends would come over to fix the computer and I would watch them, very carefully, and ask questions about the commands they entered. Early on, I learned a very basic concept: a file is a container for code. I sometimes think that this gets past most computer users, this idea that every program or website you access is made up of any number of pieces that perform different tasks to make something come to life, but it’s always helped me troubleshoot and learn.
I remember realizing that a .TXT file contained text; I remember realizing that HTML files were ALSO just text and they could be opened by Notepad, too. I discovered that going to index.html on a website meant that my browser was receiving the text and then building the page uniquely for me. When I used to work with clients doing tech support, I’d explain that going to a website isn’t like watching TV, where the picture was sent to your screen and presented as a complete object; no, a website is built on the fly for them. When you’d go to index.html, you were accessing a specific file on a server. Somewhere, there was a file sitting in a directory just like the directories on every other computer and it contained the code that made the website appear. It was a blueprint, not a snapshot.
When I started learning PHP six or seven years ago, this concept got a bit more advanced. Yes, index.php was an actual file, but the source code I saw was created by the server and the underlying PHP code was not directly accessible to the browser. OK, easy enough.
And then I tried to learn Django and everything got fucked up. I knew there was an index page, but where was index.py? When I went to the root of a site, what file was it loading? If I wanted to create a variable, why couldn’t I define it in the view? My mind was blown. I felt overwhelmed, I was impatient, so I went back to PHP for many years.
A bit over a year ago, I began a new project, something that I am now doing full-time and will hopefully unveil sometime in the next few months. Raw PHP was not going to cut it, I needed some sort of framework, and I wanted something modern with a large community. I went back to Django and started to make sense of it. I rebuilt woeunholy.com using it but had trouble moving it to production. More importantly, I discovered that the database I wanted to use was not really ready for Django. It was, however, very popular with Ruby on Rails, so I jumped over to that.
Coming from a one-file-one-action background of HTML, PHP, and general tech support, the hardest part of learning Rails was getting my head around its MVC implementation. When I talk to friends who come from the same background, they say the same thing. It is one of the biggest, if not THE biggest, stumbling blocks to learning Rails. The amount of background information needed to go from installing the framework to your first “Hello, World!” is much greater with Rails (or Django or any MVC framework) than it is with PHP, and I’d say that overcoming this one-file-one-action mindset is at the root of the problem. I’m going to attempt to explain it and with any luck, it will help someone else get past the initial hurdles so they can make their lives better by getting away from raw PHP.
A lot of articles describing “MVC” use variations of the same few words but, in my opinion, frequently do a lousy job of really explaining anything. (As an aside, I find that a lot of articles that offer intros to Rails or MVC just use ripped off content from other sites in the interest of drumming up ad revenue, but that’s a story for another day.) Let’s start with some definitions and, more importantly, some examples of how they fit together.
The M in MVC stands for the “model.” Your model might be a book, it might be a page, it might be a sentence. Models define types of things. Phillymetal.com, which I’m rebuilding now, has 7 models that are central to the application: band, message, post, show, topic, user, and venue. Each model defines what those objects look like. In Rails, they also define a lot of logic that deals with interacting with them, like their required properties: for example, you can’t have a band without a name. The models also automate the core interaction between the application and the database, creating the “getters” and “setters” that read and write each record’s attributes. None of that stuff really matters for this discussion; for now, just know that a model defines a thing.
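To make “a model defines a thing” concrete, here’s a plain-Ruby sketch of the idea with no Rails involved; the Band class and its single rule are hypothetical stand-ins for what ActiveRecord’s validations and database plumbing actually give you:

```ruby
# A plain-Ruby sketch of what a model is responsible for: defining what
# a thing looks like and what it requires. In a real Rails app,
# ActiveRecord supplies this machinery (e.g. a presence validation on
# :name) along with the database access.
class Band
  attr_accessor :name, :hometown

  def initialize(name: nil, hometown: nil)
    @name = name
    @hometown = hometown
  end

  # Mimics a Rails validation: you can't have a band without a name.
  def valid?
    !name.nil? && !name.strip.empty?
  end
end
```

With that rule in place, `Band.new(name: "Woe").valid?` is true and a nameless `Band.new.valid?` is false, which is the whole point: the model, not the controller, knows what a legitimate band looks like.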
The V in MVC is the “view.” The view is the part of the application that gets shown to the user. In Rails, your views are templates, where you fill in the static content (you include HTML in a Rails view) and mix in Ruby code to perform basic logic and work with data provided by…
…the controller, the C of MVC. The controller handles the flow of information from model to view. It’s like the central hub and is the most difficult part of this to really grasp, in my opinion. The controller is where we need to really throw the one-file-one-action paradigm out the window.
In Rails, we think in terms of actions. The controller defines actions and prepares data that is sent to the view. Once a request reaches the server, the flow of logic is like this:
The router receives the request for a resource (typically a model) and locates a controller and an action within that controller that is responsible for the resource.
The controller performs whatever activities are defined for that action. It might tell a model to run a database query, and if it does, it might package the result of that query into a variable.
It then initiates a view, usually by rendering a template, and sends this data back to the browser. If there are variables available, the controller injects them into the view.
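The three steps above can be sketched in plain Ruby with no Rails at all; every class, route, and method here is a hypothetical stand-in, just to show data moving from router to controller to model and back out through a “view”:

```ruby
# Step 2's helper: the model stands between the controller and the data.
class ShowModel
  def self.all
    ["Show A", "Show B"] # stands in for a real database query
  end
end

class ShowsController
  # Step 2: the action asks the model for data and packages it in a variable.
  def index
    @shows = ShowModel.all
    render
  end

  # Step 3: "rendering" injects that variable into markup for the browser.
  def render
    "<ul>" + @shows.map { |s| "<li>#{s}</li>" }.join + "</ul>"
  end
end

# Step 1: the router maps a verb and a path to a controller and an action.
ROUTES = { ["GET", "/shows"] => [ShowsController, :index] }

def dispatch(verb, path)
  controller_class, action = ROUTES.fetch([verb, path])
  controller_class.new.public_send(action)
end

puts dispatch("GET", "/shows")
```

Rails does all of this for you with far more sophistication, but the shape of the flow is the same: router to controller, controller to model, model back to controller, controller to markup.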
That’s it, really. It’s not that bad. But it means that for every request, there are multiple files and the flow of information jumps from place to place. A request hits the router (routes.rb) and goes to a controller (shows_controller.rb) and then goes to a model (show.rb) and then back to the controller (shows_controller.rb) and then to a view (maybe index.html.erb). Knowing where to look to find behavior can be tricky, so let’s look at a more specific example.
User browses to phillymetal.com/shows/
Instead of loading index.html or index.php, the Rails router looks at a table to determine what controller is responsible for the “shows” resource. It discovers a controller called “shows” and knows that the default behavior of sending a GET to shows is to execute the “index” action.
The “index action” might look like this:
```ruby
def index
  @shows = Show.all
end
```
That means it creates a variable, @shows, that is the result of asking the Show model to find all objects of type “show.” The key here is that the Show MODEL is doing the actual query, not the controller. All the controller knows is that it is setting a variable, @shows, equal to the result of calling “all” on Show.
When the action completes, Rails knows to look for a view that matches the name of the controller and the specific action. In this case, it looks in the “shows” views for a file called index.html.erb. Why that file? Because the index view is the result of calling the index action.
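Because ERB ships with Ruby itself, we can sketch what a template like index.html.erb does without a Rails app; the Show struct and the sample data here are hypothetical stand-ins for what the controller would prepare:

```ruby
require "erb"

# Hypothetical stand-in for the Show model's results.
Show = Struct.new(:name, :date)

# In Rails, the controller sets this and hands it to the view.
@shows = [Show.new("Woe", "2014-03-01"), Show.new("Bathory Tribute", "2014-03-08")]

# A miniature index.html.erb: static HTML mixed with Ruby that reads @shows.
template = ERB.new(<<~HTML)
  <ul>
  <% @shows.each do |show| %>
    <li><%= show.name %> (<%= show.date %>)</li>
  <% end %>
  </ul>
HTML

html = template.result(binding)
puts html
```

The template never queries anything; it just walks a variable someone else prepared. That separation is the entire point of the view layer.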
We can express this whole transaction through a flowchart.
User requests “shows” -> router determines that “phillymetal.com/shows” maps to the “shows” controller’s index action -> shows controller performs whatever activity is defined in index (at “def index”) -> controller reaches out to the Show model -> the Show model performs its task and sends the result back to the controller -> the controller sends the variable to the view and generates HTML -> HTML is sent to the browser
Take it slowly and it’s not terrible.
The concept of different actions takes some time to really wrap your head around if you’re used to “double-click index.html and the code will be interpreted.” At the end of the day, you really need to just know some basics about how they work. It gets a bit more complicated when you start dealing with REST requests in combination with actions, but we can make that kind of simple, too.
For a controller to execute an action, it uses a specific verb to describe what type of behavior is expected from that action. The verbs are recognized by the server; some might be more familiar than others. Any HTML or PHP developer certainly knows POST and GET requests, but have you ever thought about why they are called this? It is because the server is literally sent the word “POST” or “GET” along with the URL of the requested content. Based on the verb, it knows what you are trying to do. When you tell it “GET index.html” or “GET /shows/”, the expected result is that nothing is changed: something is opened and passed to the browser. With POST, it is expected that you are sending something, that you are creating something. Rails builds these actions in and has default behavior that maps verbs to actions.
GET /shows/ executes the “index” action on shows. You’d expect a listing of events.
GET /shows/1/ executes the “show” action on the show with ID 1. You’d expect information for a specific event.
GET /shows/new executes the “new” action on shows. You’d expect a form for a new event.
POST /shows/ executes the “create” action on shows. The controller would expect a form with data that would be sent to the Show model to create a new event.
GET /shows/1/edit executes the “edit” action on the show with ID 1. You’d expect a form that lets you edit the details for the show with ID 1.
PATCH /shows/1/ executes the “update” action on the show with ID 1. The controller would expect a form with data that tells it how to update that specific event.
DELETE /shows/1/ executes the “destroy” action on the show with ID 1. I bet you can guess what that does.
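You don’t write that verb-to-action table by hand, either; in Rails, one line in the routing file declares all seven mappings at once. A sketch of what the routes file might contain (the exact block wrapper varies a bit by Rails version):

```ruby
# config/routes.rb (sketch). The single `resources` call generates all
# seven RESTful routes listed above: index, show, new, create, edit,
# update, and destroy, each wired to the matching verb and URL.
Rails.application.routes.draw do
  resources :shows
end
```

This is why the router can be a table lookup: you declare the resource once and Rails fills in the whole grid of verbs, paths, and actions.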
In all cases, the word in caps is the verb sent to the server along with the URL. You can have one URL with different actions and Rails will know to behave differently in each case. http://somesite.baz/shows/ can give different results if you tell the server to GET it or POST to it. This table in the official Rails documentation explains this exact concept but without cursing as much.
I found that getting rid of my one-file-one-action mindset was the most difficult part of learning Rails. Once you get past it, it’s a real dream to work with because it automates so many frustrating, time-consuming parts of development.
There’s a lot more to say about this. The Rails MVC concept is not perfect. There’s a phrase Rails people love saying, “Skinny controllers, fat models,” that attempts to remind you to keep all the heavy lifting out of your controllers and just let them shuttle information between models and views, but I think that it encourages some bad behavior. I’ll write about that another time. For now, if you’re just getting started, try to forget about index.php. Remember that it’s index ACTION, not index FILE. Router to controller, controller to model, model back to controller, controller to view, view to browser.
But that’s about it for now. Once you get past this, you will fall into a rhythm and instinctively know where to look when something doesn’t work right. You’ll start writing tests for your models and controllers and realize that the part of your app that interacts with the browser can be handled almost independently of everything else. The learning curve may be steep but don’t give up! It is worth your time and you will be glad you stuck with it.
MetalURL.com: it makes your URL metal
I’ve been working on web stuff for this past year with Lauren, my girlfriend; in fact, if you go to subvertallmedia.com, you’ll find that we’re now using it as our portfolio site. Our first collaboration was Woe’s “Withdrawal” site early last year, we’re finishing up our full rebuild of phillymetal.com, and we have another project that’s occupied all the rest of our time that I’ll be able to announce soon. One other project was metalurl.com, which I’m happy to say is online now.
MetalURL.com is in the class of “URL shorteners,” I guess, in that you give it an address and it spits out a different one, but it does not shorten the address, it just makes it more metal. So, for instance, I can now link someone to http://metalurl.com/315-dragon-infiltrator-bathory instead of blog.subvertallmedia.com.
This site owes a heavy debt to shadyurl.com, which makes any URL look like something your corporate content filter would block. I wanted a way to make metal URLs appropriate for any forum. Don’t think http://www.youtube.com/watch?v=s6_EJ-6WF7g is as ferocious as it should be? Just send them to http://metalurl.com/34-wolf-satan-torture instead.
The backend of the site is quite simple: Rails 3, with jQuery handling the submit and returning the link. Very simple basic auth for my backend, since we only have one user and didn’t want to make it too complicated. It’s on a free Heroku instance plus a $9 database add-on. Lauren handled every part of the interface, including the creation of the three logos and their companion themes, and I took care of the backend. It was a fun project, and I’m happy to see people using it a little.