For a few months, we’ve had a few reports by users from Neo4j.rb users of an odd bug. The story goes, “I try to create a relationship between two nodes but the type checking tells me that one of the nodes is not of the appropriate type but I know that it is.” In code, it could look like this:

student = Student.first
lesson = Lesson.first
# Creates a relationship between the lesson and student
lesson.students && student

More specifically, it was always reported by Rails users dealing with create actions. They’d be loading a node, creating a new node of another class, and associating the new node with the old.

student = Student.create(student_params)
lesson = Lesson.find(lesson_id)
lesson.students && student

An error would be raised saying words to the effect of, “Node type invalid. Expected , found ."

In my arrogance, I assumed that since the errors were not reported that often and I had never personally seen it, these guys must have had some conflict with another gem or messed up code somewhere. After all, if it was a real bug, we’d have seen it ourselves or our tests would have caught it, right? No, wrong — completely wrong. Finally, someone copied/pasted their code to me and I could see everything was done correctly. They also noticed that it only happened after updating a file and restarting their server would fix it. It began to sound like an issue with the Rails automatic reloader, so I set out to track it down.

Classes are Still Instances

The root, or maybe roots, of the problem turned out to be variables and constants hanging onto copies of classes, persisting across Rails reload!. There were two spots where this was happening, but to understand why it’s a problem, we need to talk about what a class is.

In Ruby, a class is an instance of Class. Because of that, it is possible that active Ruby process can have two independent instances of the same class, one a different version of the other. It’s something like this:

container = Container.new
container.enclosed_model = Student
# the Student model is changed, Rails `reload!` is called
# From this point out, our `Student` model can be thought of as Student v2.

# This is an instance of Student v2
student = Student.new

# This is false because our container's `enclosed_model` is Student v1
student.is_a?(container.enclosed_model)
# =>false

# And this hints at why
Student.object_id == container.enclosed_model.object_id

Comparing two objects’ object_id results is always the final answer on whether they are the same. By spitting out the object ids before the error was raised, I saw that, yes, Rails was loading a new instance of the class but somewhere in the bowels of Neo4j.rb was a reference to the old version.

After that, there were two questions remaining: where were references to old models hanging around and how could I keep things current?

The Culprits

The answer to the first question was pretty easy to track down but might be a bit too specific to Neo4j.rb to warrant inclusion in this blog post so I’ll just go over it briefly.

Exhibit A: Node Wrapping

When nodes are returned from the database, we have to figure out which model is responsible for it, and doing that requires us to find the model that has the same combination of labels as the node. It’s not a terribly expensive process but it can be wasteful since you only really need to perform that process once, when a combination of labels is first encountered, and should be able to reuse the results every time after. The results are stored in a hash, MODELS_FOR_LABELS_CACHE, which maps labels => model_constant. In a normal Ruby app, it’s not a big deal that this never gets GCed since you will only have so many possible entries; in Rails development mode, with models constantly being reloaded, that’s not always true, so it was possible for node’s to be instantiated using the wrong versions of models!

Exhibit B: Association models

The associations system and QueryProxy class are arguably the coolest features of Neo4j.rb. You’re able to define associations in models on the fly. Once defined, you can easily create and traverse relationships. It’s possible to build relatively complex queries that jump between models, executed lazily, as you do in ActiveRecord. What makes Neo4j.rb’s association chaining cool is that it will only perform one query, no matter how many models you jump across. So this:

student.lessons.teachers.children.lessons

…will find the lessons of children whose parents are teachers who teach classes taken by one student. In Cypher, it could look something like:

MATCH (s:Student)-[:ENROLLED_IN]->lesson)-[:TAUGHT_BY]->teacher)-[:PARENT_OF]->child)-[:ENROLLED_IN]->result) WHERE ID(s) = {student_id} RETURN result

In a model, the association definitions look like this:

has_many :out, :lessons, type: 'ENROLLED_IN'

That method returns an instance of Neo4j::ActiveNode::HasN::Association, which becomes bound to the model in its @associations hash. This association instance needs to know the model on the other side, so whether you let it infer that from the association name or use model_class to set it, it ends up with a @model_class instance variable that stores — you guessed it — the constant of the other class. When you try to create a relationship, it does a type check in Ruby. If the node given is not an instance of the model saved in @model_class, it raises an error. And there we have it: lesson.students && student will raise an error if student was not borne of the same version of Student held in its @model_class.

Whew! So… now what?

Learning to Let Go… of Models

The answer to the second question, how do we clean things up, was found just last night.

When I first diagnosed the problem, I was eager to get a workaround in so I patched the gem to not use is_a? to determine association compatibility, opting instead to use the node’s labels compared to the model’s labels. This solved the immediate problem but it wasn’t a real fix. Last night, someone commented to an issue on the devise-neo4j library that I had forgotten about, describing the same problem, and I realized that there’s a very good chance this caching was the root. He had done some research into Devise and posted a snippet that included a reference to ActiveSupport::Dependencies::ClassCache, so I looked that up and found a note about the before_remove_const method.

before_remove_const seems to be the solution. When implemented as a class method, it is called by Rails reloader at the start of a reload cycle. I was able to use it to wipe out the constants that were hanging onto models and trigger a refresh of @model_class in each association. You can see the PR here. I say it “seems” to be the solution because I’m still waiting on confirmation of the devise-neo4j issue’s resolution, but I’m reasonably confident. Even if it doesn’t, I think we’ve confirmed that there’s an old reference to a model hanging out somewhere, so we just have to figure out what we missed and queue it for update later on.

So there you have it! An interesting bug squashed and in the process, we saw more proof of Ruby’s “everything-is-an-object” ethos. We learned a bit more about ActiveSupport, some best practices when caching class names, and a crucial reminder to take bug reports seriously, even if they seem impossible to you.