Ruby's Lookups & Scopes
06 Mar 2017This is the third and last article in the Ruby's Dark Corners series.
Lookups in Ruby
The great difficulty in Ruby is knowing what names refer to. Linearization showed us that knowing which method gets called is non-trivial. Unfortunately it's only part of the story.
In Ruby, names can refer to:
- methods:
x.name
orname
- local variables:
name
- class variables:
@@name
- global variables:
$name
- constants:
NAME
- modules:
Name
We also would like to know on which module methods are defined. By the way,
since Class < Module
, when we say modules we mean modules and classes,
unless otherwise specified.
There are five big concepts involved with these lookups (and then some fun sprinkled on top):
- self: what does
self
refer to? - definee: if a method definition appears, on which class is it defined?
- scoping: knowing which constructs create new scope for local variables and which don't
- nesting: what are the surrounding modules? (often called cref)
Let's look at how each type of name is resolved.
Local & Global Variables
Global variables are prefixed with $
and they are easy: they are accessible
everywhere.
Local variables depend on static scopes. New scopes are create when you enter a new method or module body, or when you enter a method call's block. Control flow statements do not create scopes!
Only scopes created by blocks inherit their parent's scope. Scope created by
methods and modules are completely isolated from their parent's scope. It is
sometimes said that module
, class
and def
are scope gates.
x = 1
module Foo
p x # NameError
end
There is however a trick you can use:
x = 1
Foo = Module.new do
p x # ok
end
It is possible to make local variables survive their lexical scope:
def foo
x = 0
[lambda { x += 1 }, lambda { p x }]
end
x, y = foo
x.call()
y.call() # 1
Something else that is peculiar: a variable is "in scope" in an assignment to it has appeared before in the current scope.
p x # NameError
x = 1
# but
x = 1 if false
p x # nil
Reflection can help with local variables too: the binding
method (both
Kernel#binding
and Proc#binding
) returns a Binding
object which describes
the local variables in scope.
Instance Variables
Instance variables are prefixed by @
. They are always looked up on self
.
If an instance variable doesn't exists (it has never been assigned), it
evaluates to nil
.
Note that this means that instance variables are effectively instance-private:
an instance of X
cannot access the instance variables of another instance of
X
. This is more restrictive than private
in Java for instance. However, it's
easy to use reflection to side-step this with
Object#instance_variable_get/set
.
Class Variables
Class variables are prefixed with @@
. They are defined on modules, and can be
accessed both in module bodies and in method bodies. Strangely, they can also be
accessed in the meta-class of a module (and its meta-meta-class, etc).
If a module inherits multiple version of a class variable, it's always the first
inherited version that wins (so the first <
, include
or preprend
).
You can list the class variables "owned" by a module with
Module#class_variables(false)
(false
says not to include class inherited
class variables). Interestingly, class variables can "migrate": if the module is
a class that has a class variable @@a
, and that @@a
becomes defined on one
of its ancestor, the class variable will disappear from the class. This will not
happen for non-class modules however.
class T; @@a = 't'; end
class Object; @@a = 'o'; end
class T
p @@a # o
p class_variables(false) # []
end
If a class variable is accessed before it has been assigned, a NameError
occurs.
Final catch, class variables can only be accessed inside the body of a class
that inherits/includes the class defining the variables. This means you can't
use the @@x
notation with the eval
method that we will see later. However,
you can use reflection: Module#class_variable_get/set
.
Constants (and Modules)
Constants start with an uppercase letter, and modules are actually a kind of constant.
Constants depend on the notion of nesting
: you can access a constant if it is
declared inside a module body that is around you, or in one of the ancestors of
the current module (but not in an ancestor of a surrounding module!). You
can also use ::
to navigate through modules:
X = 0
module A
X = 1
p ::X # 0
module B; module C; Y = 2; end; end
module D
p X # 1
p B::C::Y
end
end
The following exemple illustrates two gotchas:
- You don't have access to constants defined in an ancestor of a surrounding module.
- Constants in surrounding modules take precedence over constants in ancestors.
module A; X = 'a'; end
module B; Y = 'b'; end
module C
include A
p X # ok
Y = 'c'
module D
include B
p Y # 'c'!
p X # NameError
end
end
The module body really has to be around you:
module A::B::C; p Y; end # ok (Y defined in C)
module A::D; p X; end # NameError (X defined in A)
We can access the current nesting with Module::nesting
and clarify the
example:
module A
module D; p Module.nesting; end # [A, A::D]
end
module A::D; p Module.nesting; end # [A::D]
Of course, if you try to access a constant before it has been assigned, a
NameError
ensues. Module::constants
will list the constants defined at the
point of call, while Module#constants
will list the constants defined by the
module (and optionally by its ancestors, depending on the argument).
Methods
Methods are looked up on the receiver (the thing before the dot). If there is no
receiver, self
is assumed. Then it's just a matter to perform the lookup
according to linearization. self
may also have a meta-class which has
priority on all other ancestors.
Refinements
However, there is another subtlety, called refinements. In Ruby, it is common to use monkey patching to open a class and add and redefine methods. However, if every library starts doing this, you can end up with nasty conflicts. Refinements were introduced as a solution to this problem.
Refinements can only be defined in non-class modules, and only classes can be refined:
class C
def foo; 'foo'; end
end
module M
refine C do
def foo; 'bar'; end
end
end
Defining a refinement does nothing by itself. You have to use the Module#using
function to enable some refinements in the current module:
module N
p C.new.foo # 'foo'
using M
p C.new.foo # 'bar'
end
p C.new.foo # 'foo'
using
statements are somewhat peculiar, because they are strictly lexical in
scope. Said otherwise, the introduced refinements have the same visibility as
constants, excepted that the using
is not inherited (via sublassing or
include):
class A; using M; end
class B < A
p C.new.foo # 'foo'
end
However, refinements themselves are inherited:
module O
include M
end
module P
using O
p C.new.foo # 'bar'
end
How prioritary are refinements? Refinements always take priority over everything
else. What if multiple refinements conflicts? The latest innermost using
statement
always wins.
You can get a list of modules whose refinements are visible with
Module::used_modules
.
On which module are methods defined?
Last, but not least, when you use the def
keyword, on which module are the
methods defined? The not-so-helpful answer is that these methods become instance
methods of the definee. Entering certain declarations and calling certain
functions change the definee, as we'll see in the next section.
How self
and the definee change
where | self | definee |
---|---|---|
top-level (file or REPL) | main (an instance of Object ) |
Object |
in a module | module | module |
in class << X |
metaclass of X |
metaclass of X |
in a def method |
receiver | surrounding module |
in a def X.method |
receiver (X ) |
metaclass of X |
X.instance_eval |
receiver (X ) |
metaclass of X |
X.class_eval |
receiver (X , a module) |
receiver (X ) |
Also note that Module#module_eval
is an alias for Module#class_eval
. There
are also variants called class/module/instance_exec
, which do the same thing
as the eval
version but allow passing additional arguments to the block
(useful to make instance variables accessible to the block). This is the
only difference between the exec
and eval
family, despite the
documentation seeming to hint at some bizarre lookup behaviour.
Note that for the two def
rows, the definee is both the module on which the
method itself is being defined, and the definee for nested method definitions.
At first, it may seem that class/module_eval
does the same as a module
or
class
block. However, you cannot open a class inside a method, so class_eval
helps in you need to evaluate something in the context of a class as part of a
method. Note however that it is possible to open metaclasses in a method through
class << X
.
There are also string-based eval
based. But things are complex enough as it
is, if you can avoid to go there, please do.
Finally, if you experiment, you may find out you cannot define singleton method
on the classes Integer
and Symbol
(e.g. def Integer.foo
). I assume for
performance reasons.
Doing Better?
The situation is a bit hairy, to say the least. How could we simplify things?
Once you wrap your head around them, globals, class and instance variables are
rather simple. One caveat: I would allow accessing class variables whenever
self
is the class, removing an exception with little benefit.
Regarding local variables, it's not quite obvious why scope gates are needed. Maybe because they would encourage people to use class-local variables instead of class variables? It doesn't really hurt, but adds yet another thing to remember to the language.
Constants lookup is puzzling. My take is that constants should be looked up first in the ancestors then in the surrounding module and its ancestors, etc. That would be much less surprising. I don't see any reason to do things in the current way (surrounding modules take precedence over the ancestors).
I think a useful heuristic is to preserve the ability to move code vertically. Code that is well-defined in a class should have the same meaning in a subclass or in a superclass, unless some symbols are overridden in the class. Surrounding modules should not affect such code.
Refinements blatantly violate the heuristic above. I can appreciate why: the
idea was to to restrict monkey patching to a well-defined blob of code, and
inheriting using
statements goes against that by forcing monkey patching on
sub-classes. On the other hand, I think it goes against the principle of least
surprise by violating my heuristic above.
Another reason refinements are awkward is that they introduce not one, but two
new forms of lookup: using
statements and refinement definitions (refine
).
I believe it was a mistake to overload modules with another role as "refinement
container". It would probably have been better to merge using
into refine
:
make all refinements active as soon as defined, and combine them through
inheritance (using the ancestor chain).
If it was really desirable to refine a method only within a superclass but not
within its subclasses, the private
function could have been reused. This would
have made refinement lookups similar to method lookup, and would have surprised
no-one.
There is a big problem with this idea: dynamically looking up refinement is expensive: you now need to walk two ancestor chains instead of one for every lookup. A simple countermeasure: forbid adding new refinements dynamically, which is already the case in the current version.
Finally, I would allow opening classes and modules in methods, and allow any
expression in place of the class name. This would remove the need for
class_eval
method. If needed one can add the restriction that only existing
classes and modules can be opened in methods.
I would also remove instance_eval
, however it has no immediate analog. I would
retool the class << X
notation for this purpose. The metaclass can still be
accessed through Object#singleton_class
. This field can be used in conjunction
with class
so that class X.singleton_class
achieves the previous effect of
class << X
.
So in summary:
- Allow class variables to be used anywhere where
self
is the class. - Abolish scope gates?
- Lookup constants first in the ancestors, then in the surround class and their ancestors.
- Merge
using
intorefine
, and make this new kind of refinement inheritable, except when qualified withprivate
. Forbid dynamically adding refinements to avoid dynamic refinement lookup. - Remove the
instance_eval
andclass_eval
methods; allow opening classes in methods; and makeclass << X
take the meaning ofX.instance_eval
. - Add a
metaclass
field to classObject
.
Of course, it's too late for Ruby. But for a language inspired by Ruby, these points are worth considering.