Chapter 10: Modules
A beginning programmer writes her programs like an ant builds her hill, one piece at a time, without thought for the bigger structure. Her programs will be like loose sand. They may stand for a while, but growing too big they fall apart.
Realizing this problem, the programmer will start to spend a lot of time thinking about structure. Her programs will be rigidly structured, like rock sculptures. They are solid, but when they must change, violence must be done to them.
The master programmer knows when to apply structure and when to leave things in their simple form. Her programs are like clay, solid yet malleable.
Every program has a shape. On a small scale, this shape is determined by its division into functions and the blocks inside those functions. Programmers have a lot of freedom in the way they structure their programs. Shape follows more from the taste of the programmer than from the program’s intended functionality.
When looking at a larger program in its entirety, individual functions start to blend into the background. Such a program can be made more readable if we have a larger unit of organization.
Modules divide programs into clusters of code that, by some criterion, belong together. This chapter explores some of the benefits that such division provides and shows techniques for building modules in JavaScript.
Why modules help
There are a number of reasons why authors divide their books into chapters and sections. These divisions make it easier for a reader to see how the book is built up and to find specific parts that they are interested in. They also help the author by providing a clear focus for every section.
The benefits of organizing a program into several files or modules are similar. Structure helps people who aren’t yet familiar with the code find what they are looking for and makes it easier for the programmer to keep things that are related close together.
Some programs are even organized along the model of a traditional text, with a well-defined order in which the reader is encouraged to go through the program and with lots of prose (comments) providing a coherent description of the code. This makes reading the program a lot less intimidating—reading unknown code is usually intimidating—but has the downside of being more work to set up. It also makes the program more difficult to change because prose tends to be more tightly interconnected than code. This style is called literate programming. The “project” chapters of this book can be considered literate programs.
As a general rule, structuring things costs energy. In the early stages of a project, when you are not quite sure yet what goes where or what kind of modules the program needs at all, I endorse a minimalist, structureless attitude. Just put everything wherever it is convenient to put it until the code stabilizes. That way, you won’t be wasting time moving pieces of the program back and forth, and you won’t accidentally lock yourself into a structure that does not actually fit your program.
Namespacing
Most modern programming languages have a scope level between global (everyone can see it) and local (only this function can see it). JavaScript does not. Thus, by default, everything that needs to be visible outside of the scope of a top-level function is visible everywhere.
Namespace pollution, the problem of a lot of unrelated code having to share a single set of global variable names, was mentioned in Chapter 4, where the Math object was given as an example of an object that acts like a module by grouping math-related functionality.
Though JavaScript provides no actual module construct yet, objects can be used to create publicly accessible subnamespaces, and functions can be used to create an isolated, private namespace inside of a module. Later in this chapter, I will discuss a way to build reasonably convenient, namespace-isolating modules on top of the primitive concepts that JavaScript gives us.
Reuse
In a “flat” project, which isn’t structured as a set of modules, it is not apparent which parts of the code are needed to use a particular function. In my program for spying on my enemies (see Chapter 9), I wrote a function for reading configuration files. If I want to use that function in another project, I must go and copy out the parts of the old program that look like they are relevant to the functionality that I need and paste them into my new program. Then, if I find a mistake in that code, I’ll fix it only in whichever program I’m working with at the time and forget to also fix it in the other program.
Once you have lots of such shared, duplicated pieces of code, you will find yourself wasting a lot of time and energy on moving them around and keeping them up-to-date.
Putting pieces of functionality that stand on their own into separate files and modules makes them easier to track, update, and share because all the various pieces of code that want to use the module load it from the same actual file.
This idea gets even more powerful when the relations between modules—which other modules each module depends on—are explicitly stated. You can then automate the process of installing and upgrading external modules (libraries).
Taking this idea even further, imagine an online service that tracks and distributes hundreds of thousands of such libraries, allowing you to search for the functionality you need and, once you find it, set up your project to automatically download it.
This service exists. It is called NPM (npmjs.org). NPM consists of an online database of modules and a tool for downloading and upgrading the modules your program depends on. It grew out of Node.js, the browserless JavaScript environment we will discuss in Chapter 20, but can also be useful when programming for the browser.
Decoupling
Another important role of modules is isolating pieces of code from each other, in the same way that the object interfaces from Chapter 6 do. A well-designed module will provide an interface for external code to use. As the module gets updated with bug fixes and new functionality, the existing interface stays the same (it is stable) so that other modules can use the new, improved version without any changes to themselves.
Note that a stable interface does not mean no new functions, methods, or variables are added. It just means that existing functionality isn’t removed and its meaning is not changed.
A good module interface should allow the module to grow without breaking the old interface. This means exposing as few of the module’s internal concepts as possible while also making the “language” that the interface exposes powerful and flexible enough to be applicable in a wide range of situations.
For interfaces that expose a single, focused concept, such as a configuration file reader, this design comes naturally. For others, such as a text editor, which has many different aspects that external code might need to access (content, styling, user actions, and so on), it requires careful design.
Using functions as namespaces
Functions are the only things in JavaScript that create a new scope. So if we want our modules to have their own scope, we will have to base them on functions.
Consider this trivial module for associating names with day-of-the-week numbers, as returned by a Date object’s getDay method:
var names = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"];
function dayName(number) {
  return names[number];
}
console.log(dayName(1));
// → Monday
The dayName function is part of the module’s interface, but the names variable is not. We would prefer not to spill it into the global scope.
var dayName = function() {
  var names = ["Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday"];
  return function(number) {
    return names[number];
  };
}();
console.log(dayName(3));
// → Wednesday
Now names is a local variable in an (unnamed) function. This function is created and immediately called, and its return value (the actual dayName function) is stored in a variable. We could have pages and pages of code in this function, with 100 local variables, and they would all be internal to our module: visible to the module itself but not to outside code.
We can use a similar pattern to isolate code from the outside world entirely. The following module logs a value to the console but does not actually provide any values for other modules to use:
(function() {
  function square(x) { return x * x; }
  var hundred = 100;
  console.log(square(hundred));
})();
// → 10000
This code simply outputs the square of 100, but in the real world it could be a module that adds a method to some prototype or sets up a widget on a web page. It is wrapped in a function to prevent the variables it uses internally from polluting the global scope.
Why did we wrap the namespace function in a pair of parentheses? This has to do with a quirk in JavaScript’s syntax. If an expression starts with the keyword function, it is a function expression. However, if a statement starts with function, it is a function declaration, which requires a name and, not being an expression, cannot be called by writing parentheses after it. You can think of the extra wrapping parentheses as a trick to force the function to be interpreted as an expression.
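As a small illustration of this quirk, the sketch below shows three ways of putting a function in expression position so that it can be called immediately. The variable names a, b, and c are made up for this example.

```javascript
// Wrapping the function in parentheses makes it an expression,
// so it can be called on the spot.
var a = (function() { return 1; })();

// The call parentheses may also go inside the wrapping pair.
var b = (function() { return 2; }());

// After `=`, the parser already expects an expression,
// so no extra wrapping parentheses are needed.
var c = function() { return 3; }();

console.log(a, b, c);
// → 1 2 3
```

Only a statement that begins with the keyword function needs the wrapping parentheses. The dayName example shown earlier did without them because its function appears after an equal sign.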
Objects as interfaces
Now imagine that we want to add another function to our day-of-the-week module, one that goes from a day name to a number. We can’t simply return the function anymore but must wrap the two functions in an object.
var weekDay = function() {
  var names = ["Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday"];
  return {
    name: function(number) { return names[number]; },
    number: function(name) { return names.indexOf(name); }
  };
}();
console.log(weekDay.name(weekDay.number("Sunday")));
// → Sunday
For bigger modules, gathering all the exported values into an object at the end of the function becomes awkward since many of the exported functions are likely to be big and you’d prefer to write them somewhere else, near related internal code. A convenient alternative is to declare an object (conventionally named exports) and add properties to that whenever we are defining something that needs to be exported. In the following example, the module function takes its interface object as an argument, allowing code outside of the function to create it and store it in a variable. (Outside of a function, this refers to the global scope object.)
(function(exports) {
  var names = ["Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday"];
  exports.name = function(number) {
    return names[number];
  };
  exports.number = function(name) {
    return names.indexOf(name);
  };
})(this.weekDay = {});
console.log(weekDay.name(weekDay.number("Saturday")));
// → Saturday
Detaching from the global scope
The previous pattern is commonly used by JavaScript modules intended for the browser. The module will claim a single global variable and wrap its code in a function in order to have its own private namespace. But this pattern still causes problems if multiple modules happen to claim the same name or if you want to load two versions of a module alongside each other.
With a little plumbing, we can create a system that allows one module to directly ask for the interface object of another module, without going through the global scope. Our goal is a require function that, when given a module name, will load that module’s file (from disk or the Web, depending on the platform we are running on) and return the appropriate interface value.
This approach solves the problems mentioned previously and has the added benefit of making your program’s dependencies explicit, making it harder to accidentally make use of some module without stating that you need it.
For require we need two things. First, we want a function readFile, which returns the content of a given file as a string. (A single such function is not present in standard JavaScript, but different JavaScript environments, such as the browser and Node.js, provide their own ways of accessing files. For now, let’s just pretend we have this function.) Second, we need to be able to actually execute this string as JavaScript code.
Evaluating data as code
There are several ways to take data (a string of code) and run it as part of the current program.
The most obvious way is the special operator eval, which will execute a string of code in the current scope. This is usually a bad idea because it breaks some of the sane properties that scopes normally have, such as being isolated from the outside world.
function evalAndReturnX(code) {
  eval(code);
  return x;
}
console.log(evalAndReturnX("var x = 2"));
// → 2
A better way of interpreting data as code is to use the Function constructor. In the form used here, it takes two arguments: a string containing a comma-separated list of argument names and a string containing the function’s body.
var plusOne = new Function("n", "return n + 1;");
console.log(plusOne(4));
// → 5
This is precisely what we need for our modules. We can wrap a module’s code in a function, with that function’s scope becoming our module scope.
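To make the difference in scoping concrete, here is a small sketch that compares the two approaches. The function names viaEval and viaFunction and the variable secret are made up for this illustration, and the sketch assumes that no global variable named secret exists.

```javascript
function viaEval() {
  var secret = "local";
  // eval runs the string in the current scope, so it can see `secret`.
  return eval("typeof secret");
}

function viaFunction() {
  var secret = "local";
  // A function built with the Function constructor does not close over
  // the scope in which it was created, so `secret` is out of reach.
  return new Function("return typeof secret;")();
}

console.log(viaEval());
// → string
console.log(viaFunction());
// → undefined
```

Code evaluated through the Function constructor can see only its own arguments and the global scope, which is the kind of isolation a module needs.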
Require
The following is a minimal implementation of require:
function require(name) {
  var code = new Function("exports", readFile(name));
  var exports = {};
  code(exports);
  return exports;
}
console.log(require("weekDay").name(1));
// → Monday
Since the new Function constructor wraps the module code in a function, we don’t have to write a wrapping namespace function in the module file itself. And since we make exports an argument to the module function, the module does not have to declare it. This removes a lot of clutter from our example module.
var names = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"];
exports.name = function(number) {
  return names[number];
};
exports.number = function(name) {
  return names.indexOf(name);
};
When using this pattern, a module typically starts with a few variable declarations that load the modules it depends on.
var weekDay = require("weekDay");
var today = require("today");
console.log(weekDay.name(today.dayNumber()));
The simplistic implementation of require given previously has several problems. For one, it will load and run a module every time it is required, so if several modules have the same dependency or a require call is put inside a function that will be called multiple times, time and energy will be wasted.
This can be solved by storing the modules that have already been loaded in an object and simply returning the existing value when one is loaded multiple times.
The second problem is that it is not possible for a module to directly export a value other than the exports object, such as a function. For example, a module might want to export only the constructor of the object type it defines. Right now, it cannot do that because require always uses the exports object it creates as the exported value.
The traditional solution for this is to provide modules with another variable, module, which is an object that has a property exports. This property initially points at the empty object created by require but can be overwritten with another value in order to export something else.
function require(name) {
  if (name in require.cache)
    return require.cache[name];
  var code = new Function("exports, module", readFile(name));
  var exports = {}, module = {exports: exports};
  code(exports, module);
  require.cache[name] = module.exports;
  return module.exports;
}
require.cache = Object.create(null);
We now have a module system that uses a single global variable (require) to allow modules to find and use each other without going through the global scope.
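Since readFile is only pretended to exist, one way to try this out is to substitute an in-memory table of source strings for the file system. The sketch below does that. The names makeRequire, files, reads, and demoRequire are made up for this illustration, and the loader is wrapped in a function so that it does not collide with any require the environment already provides.

```javascript
// Builds a require-style loader on top of whatever readFile it is given.
function makeRequire(readFile) {
  function require(name) {
    if (name in require.cache)
      return require.cache[name];
    var code = new Function("exports, module", readFile(name));
    var exports = {}, module = {exports: exports};
    code(exports, module);
    require.cache[name] = module.exports;
    return module.exports;
  }
  require.cache = Object.create(null);
  return require;
}

// Hypothetical in-memory "files" standing in for a real file system.
var files = {
  greet: "module.exports = function(name) { return 'Hello, ' + name; };"
};

// Count how often a file is actually read.
var reads = 0;
var demoRequire = makeRequire(function(name) {
  reads += 1;
  return files[name];
});

var greet = demoRequire("greet");
console.log(greet("Ada"));
// → Hello, Ada
console.log(demoRequire("greet") === greet, reads);
// → true 1
```

The module replaces module.exports with a function, and the second call is answered from the cache, so the file is read only once.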
This style of module system is called CommonJS modules, after the pseudo-standard that first specified it. It is built into the Node.js system. Real implementations do a lot more than the example I showed. Most importantly, they have a much more intelligent way of going from a module name to an actual piece of code, allowing both pathnames relative to the current file and module names that point directly to locally installed modules.
Slow-loading modules
Though it is possible to use the CommonJS module style when writing JavaScript for the browser, it is somewhat involved. The reason for this is that reading a file (module) from the Web is a lot slower than reading it from the hard disk. While a script is running in the browser, nothing else can happen to the website on which it runs, for reasons that will become clear in Chapter 14. This means that if every require call went and fetched something from some faraway web server, the page would freeze for a painfully long time while loading its scripts.
One way to work around this problem is to run a program like Browserify on your code before you serve it on a web page. This will look for calls to require, resolve all dependencies, and gather the needed code into a single big file. The website itself can simply load this file to get all the modules it needs.
Another solution is to wrap the code that makes up your module in a function so that the module loader can first load its dependencies in the background and then call the function, initializing the module, when the dependencies have been loaded. That is what the Asynchronous Module Definition (AMD) module system does.
Our trivial program with dependencies would look like this in AMD:
define(["weekDay", "today"], function(weekDay, today) {
  console.log(weekDay.name(today.dayNumber()));
});
The define function is central to this approach. It takes first an array of module names and then a function that takes one argument for each dependency. It will load the dependencies (if they haven’t already been loaded) in the background, allowing the page to continue working while the files are being fetched. Once all dependencies are loaded, define will call the function it was given, with the interfaces of those dependencies as arguments.
The modules that are loaded this way must themselves contain a call to define. The value used as their interface is whatever was returned by the function passed to define. Here is the weekDay module again:
define([], function() {
  var names = ["Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday"];
  return {
    name: function(number) { return names[number]; },
    number: function(name) { return names.indexOf(name); }
  };
});
To be able to show a minimal implementation of define, we will pretend we have a backgroundReadFile function that takes a filename and a function and calls the function with the content of the file as soon as it has finished loading it. (Chapter 17 will explain how to write that function.)
For the purpose of keeping track of modules while they are being loaded, the implementation of define will use objects that describe the state of modules, telling us whether they are available yet and providing their interface when they are.
The getModule function, when given a name, will return such an object and ensure that the module is scheduled to be loaded. It uses a cache object to avoid loading the same module twice.
var defineCache = Object.create(null);
var currentMod = null;
function getModule(name) {
  if (name in defineCache)
    return defineCache[name];
  var module = {exports: null,
                loaded: false,
                onLoad: []};
  defineCache[name] = module;
  backgroundReadFile(name, function(code) {
    currentMod = module;
    new Function("", code)();
  });
  return module;
}
We assume the loaded file also contains a (single) call to define. The currentMod variable is used to tell this call about the module object that is currently being loaded so that it can update this object when it finishes loading. We will come back to this mechanism in a moment.
The define function itself uses getModule to fetch or create the module objects for the current module’s dependencies. Its task is to schedule the moduleFunction (the function that contains the module’s actual code) to be run whenever those dependencies are loaded. For this purpose, it defines a function whenDepsLoaded that is added to the onLoad array of all dependencies that are not yet loaded. This function immediately returns if there are still unloaded dependencies, so it will do actual work only once, when the last dependency has finished loading. It is also called immediately, from define itself, in case there are no dependencies that need to be loaded.
function define(depNames, moduleFunction) {
  var myMod = currentMod;
  var deps = depNames.map(getModule);
  deps.forEach(function(mod) {
    if (!mod.loaded)
      mod.onLoad.push(whenDepsLoaded);
  });
  function whenDepsLoaded() {
    if (!deps.every(function(m) { return m.loaded; }))
      return;
    var args = deps.map(function(m) { return m.exports; });
    var exports = moduleFunction.apply(null, args);
    if (myMod) {
      myMod.exports = exports;
      myMod.loaded = true;
      myMod.onLoad.forEach(function(f) { f(); });
    }
  }
  whenDepsLoaded();
}
When all dependencies are available, whenDepsLoaded calls the function that holds the module, giving it the dependencies’ interfaces as arguments.
The first thing define does is store the value that currentMod had when it was called in a variable myMod. Remember that getModule, just before evaluating the code for a module, stored the corresponding module object in currentMod. This allows whenDepsLoaded to store the return value of the module function in that module’s exports property, set the module’s loaded property to true, and call all the functions that are waiting for the module to load.
This code is a lot harder to follow than the require function. Its execution does not follow a simple, predictable path. Instead, multiple operations are set up to happen at some unspecified time in the future, which obscures the way the code executes.
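One way to watch the order of events is to replace backgroundReadFile with a stand-in that merely queues its callbacks until we ask for them to be delivered. The sketch below does that with an in-memory table of module sources. The names makeLoader, pending, flush, and log are made up for this illustration. Unlike the code above, the sketch hands define to the loaded code as an argument instead of relying on a global variable, so that it stays self-contained.

```javascript
function makeLoader(files) {
  var defineCache = Object.create(null);
  var currentMod = null;
  var pending = [];  // callbacks waiting for their "file" to arrive

  // Hypothetical stand-in for backgroundReadFile: it only queues the
  // callback. Nothing is delivered until flush() is called.
  function backgroundReadFile(name, callback) {
    pending.push(function() { callback(files[name]); });
  }

  function getModule(name) {
    if (name in defineCache)
      return defineCache[name];
    var module = {exports: null, loaded: false, onLoad: []};
    defineCache[name] = module;
    backgroundReadFile(name, function(code) {
      currentMod = module;
      // define is passed in explicitly so the sketch needs no globals.
      new Function("define", code)(define);
    });
    return module;
  }

  function define(depNames, moduleFunction) {
    var myMod = currentMod;
    var deps = depNames.map(getModule);
    deps.forEach(function(mod) {
      if (!mod.loaded)
        mod.onLoad.push(whenDepsLoaded);
    });
    function whenDepsLoaded() {
      if (!deps.every(function(m) { return m.loaded; }))
        return;
      var args = deps.map(function(m) { return m.exports; });
      var exports = moduleFunction.apply(null, args);
      if (myMod) {
        myMod.exports = exports;
        myMod.loaded = true;
        myMod.onLoad.forEach(function(f) { f(); });
      }
    }
    whenDepsLoaded();
  }

  // Deliver queued files one at a time, including any queued meanwhile.
  function flush() {
    while (pending.length)
      pending.shift()();
  }

  return {define: define, flush: flush};
}

var log = [];
var loader = makeLoader({
  weekDay: "define([], function() {" +
           "  var names = ['Sunday', 'Monday', 'Tuesday'];" +
           "  return {name: function(n) { return names[n]; }};" +
           "});"
});

loader.define(["weekDay"], function(weekDay) {
  log.push(weekDay.name(1));
});
log.push("define returned");
loader.flush();
console.log(log.join(", "));
// → define returned, Monday
```

The call to define returns before the dependent function has run. That function is called only once the queued file has been delivered and the weekDay module has finished loading.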
A real AMD implementation is, again, quite a lot more clever about resolving module names to actual URLs and generally more robust than the one shown previously. The RequireJS (requirejs.org) project provides a popular implementation of this style of module loader.
Interface design
Designing interfaces for modules and object types is one of the subtler aspects of programming. Any nontrivial piece of functionality can be modeled in various ways. Finding a way that works well requires insight and foresight.
The best way to learn the value of good interface design is to use lots of interfaces—some good, some bad. Experience will teach you what works and what doesn’t. Never assume that a painful interface is “just the way it is”. Fix it, or wrap it in a new interface that works better for you.
Predictability
If programmers can predict the way your interface works, they (or you) won’t get sidetracked as often by the need to look up how to use it. Thus, try to follow conventions. When there is another module or part of the standard JavaScript environment that does something similar to what you are implementing, it might be a good idea to make your interface resemble the existing interface. That way, it’ll feel familiar to people who know the existing interface.
Another area where predictability is important is the actual behavior of your code. It can be tempting to make an unnecessarily clever interface with the justification that it’s more convenient to use. For example, you could accept all kinds of different types and combinations of arguments and do the “right thing” for all of them. Or you could provide dozens of specialized convenience functions that provide slightly different flavors of your module’s functionality. These might make code that builds on your interface slightly shorter, but they will also make it much harder for people to build a clear mental model of the module’s behavior.
Composability
In your interfaces, try to use the simplest data structures possible and make functions do a single, clear thing. Whenever practical, make them pure functions (see Chapter 3).
For example, it is not uncommon for modules to provide their own array-like collection objects, with their own interface for counting and extracting elements. Such objects won’t have map or forEach methods, and any existing function that expects a real array won’t be able to work with them. This is an example of poor composability: the module cannot be easily composed with other code.
One example would be a module for spell-checking text, which we might need when we want to write a text editor. The spell-checker could be made to operate directly on whichever complicated data structures the editor uses and directly call internal functions in the editor to have the user choose between spelling suggestions. If we go that way, the module cannot be used with any other programs. On the other hand, if we define the spell-checking interface so that you can pass it a simple string and it will return the position in the string where it found a possible misspelling, along with an array of suggested corrections, then we have an interface that could also be composed with other systems because strings and arrays are always available in JavaScript.
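As a rough sketch of what such a composable interface could look like, the function below takes a plain string and an array of known words and returns plain objects. The name findMisspellings, the shape of the result objects, and the toy suggestion rule are all assumptions made for this illustration, not a real spell-checking algorithm.

```javascript
// Hypothetical interface: a string and an array in, plain objects out.
function findMisspellings(text, knownWords) {
  var results = [];
  var wordPattern = /[a-z']+/gi, match;
  while (match = wordPattern.exec(text)) {
    var word = match[0].toLowerCase();
    if (knownWords.indexOf(word) == -1) {
      results.push({
        position: match.index,
        word: match[0],
        // Toy heuristic: suggest known words with the same first letter.
        suggestions: knownWords.filter(function(known) {
          return known[0] == word[0];
        })
      });
    }
  }
  return results;
}

var found = findMisspellings("The quick brwn fox",
                             ["the", "quick", "brown", "fox"]);
console.log(found.length, found[0].position,
            found[0].suggestions.join(", "));
// → 1 10 brown
```

Because the input and output are just strings, arrays, and simple objects, a text editor, a command-line tool, or a test script could all use this function without knowing anything about each other.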
Layered interfaces
When designing an interface for a complex piece of functionality—sending email, for example—you often run into a dilemma. On the one hand, you do not want to overload the user of your interface with details. They shouldn’t have to study your interface for 20 minutes before they can send an email. On the other hand, you do not want to hide all the details either—when people need to do complicated things with your module, they should be able to.
Often the solution is to provide two interfaces: a detailed low-level one for complex situations and a simple high-level one for routine use. The second can usually be built easily using the tools provided by the first. In the email module, the high-level interface could just be a function that takes a message, a sender address, and a receiver address and then sends the email. The low-level interface would allow full control over email headers, attachments, HTML mail, and so on.
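The layering can be sketched as follows. Both function names, buildMessage and simpleMessage, are hypothetical, and the sketch stops at building the text of a message. Actually sending it is left out.

```javascript
// Low-level interface (hypothetical): full control over headers and body.
function buildMessage(headers, body) {
  var lines = Object.keys(headers).map(function(name) {
    return name + ": " + headers[name];
  });
  return lines.join("\r\n") + "\r\n\r\n" + body;
}

// High-level interface: covers the routine case by delegating to the
// low-level function with a fixed, minimal set of headers.
function simpleMessage(from, to, text) {
  return buildMessage({From: from, To: to}, text);
}

console.log(simpleMessage("ada@example.com", "bob@example.com", "Hi!"));
```

Code that needs extra headers calls buildMessage directly, while most callers never have to look past simpleMessage.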
Summary
Modules provide structure to bigger programs by separating the code into different files and namespaces. Giving these modules well-defined interfaces makes them easier to use and reuse and makes it possible to continue using them as the module itself evolves.
Though the JavaScript language is characteristically unhelpful when it comes to modules, the flexible functions and objects it provides make it possible to define rather nice module systems. Function scopes can be used as internal namespaces for the module, and objects can be used to store sets of exported values.
There are two popular, well-defined approaches to such modules. One is called CommonJS Modules and revolves around a require function that fetches a module by name and returns its interface. The other is called AMD and uses a define function that takes an array of module names and a function and, after loading the modules, runs the function with their interfaces as arguments.
Exercises
Month names
Write a simple module similar to the weekDay module that can convert month numbers (zero-based, as in the Date type) to names and can convert names back to numbers. Give it its own namespace since it will need an internal array of month names, and use plain JavaScript, without any module loader system.
// Your code here.
console.log(month.name(2));
// → March
console.log(month.number("November"));
// → 10
This follows the weekDay module almost exactly. A function expression, called immediately, wraps the variable that holds the array of names, along with the two functions that must be exported. The functions are put in an object and returned. The returned interface object is stored in the month variable.
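One possible solution along those lines, written as a sketch and not as the only correct answer, could look like this:

```javascript
var month = function() {
  // Internal to the module and invisible to outside code.
  var names = ["January", "February", "March", "April",
               "May", "June", "July", "August",
               "September", "October", "November", "December"];
  return {
    name: function(number) { return names[number]; },
    number: function(name) { return names.indexOf(name); }
  };
}();

console.log(month.name(2));
// → March
console.log(month.number("November"));
// → 10
```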
A return to electronic life
Hoping that Chapter 7 is still somewhat fresh in your mind, think back to the system designed in that chapter and come up with a way to separate the code into modules. To refresh your memory, these are the functions and types defined in that chapter, in order of appearance:
Vector
Grid
directions
directionNames
randomElement
BouncingCritter
elementFromChar
World
charFromElement
Wall
View
WallFollower
dirPlus
LifelikeWorld
Plant
PlantEater
SmartPlantEater
Tiger
Don’t exaggerate and create too many modules. A book that starts a new chapter for every page would probably get on your nerves, if only because of all the space wasted on titles. Similarly, having to open 10 files to read a tiny project isn’t helpful. Aim for three to five modules.
You can choose to have some functions become internal to their module and thus inaccessible to other modules.
There is no single correct solution here. Module organization is largely a matter of taste.
Here is what I came up with. I’ve put parentheses around internal functions.
Module "grid"
  Vector
  Grid
  directions
  directionNames
Module "world"
  (randomElement)
  (elementFromChar)
  (charFromElement)
  View
  World
  LifelikeWorld
  directions [reexported]
Module "simple_ecosystem"
  (randomElement) [duplicated]
  (dirPlus)
  Wall
  BouncingCritter
  WallFollower
Module "ecosystem"
  Wall [duplicated]
  Plant
  PlantEater
  SmartPlantEater
  Tiger
I have reexported the directions array from the grid module from world so that modules built on that (the ecosystems) don’t have to know or worry about the existence of the grid module.
I also duplicated two generic and tiny helper values (randomElement and Wall) since they are used as internal details in different contexts and do not belong in the interfaces for these modules.
Circular dependencies
A tricky subject in dependency management is circular dependencies, where module A depends on B, and B also depends on A. Many module systems simply forbid this. CommonJS modules allow a limited form: it works as long as the modules do not replace their default exports object with another value and start accessing each other’s interface only after they finish loading.
Can you think of a way in which support for this feature could be implemented? Look back to the definition of require and consider what the function would have to do to allow this.
The trick is to add the exports object created for a module to require’s cache before actually running the module. This means the module will not yet have had a chance to override module.exports, so we do not know whether it may want to export some other value. After loading, the cache entry is overwritten with module.exports, which may be a different value.
But if in the course of loading the module, a second module is loaded that asks for the first module, its default exports object, which is likely still empty at this point, will be in the cache, and the second module will receive a reference to it. If it doesn’t try to do anything with the object until the first module has finished loading, things will work.
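A minimal sketch of this approach is shown below, again using an in-memory table instead of real files. The names makeCircularRequire, files, and demoRequire are made up for this illustration, and require is passed to the module code as an argument (an assumption of this sketch) so that the example does not depend on a global variable.

```javascript
function makeCircularRequire(readFile) {
  function require(name) {
    if (name in require.cache)
      return require.cache[name];
    var code = new Function("exports, module, require", readFile(name));
    var exports = {}, module = {exports: exports};
    // Register the default exports object before running the module,
    // so a circular request made while it loads receives this object.
    require.cache[name] = exports;
    code(exports, module, require);
    // The module may have replaced module.exports; store the final value.
    require.cache[name] = module.exports;
    return module.exports;
  }
  require.cache = Object.create(null);
  return require;
}

// Two hypothetical modules that depend on each other. Neither touches
// the other's interface until after both have finished loading.
var files = {
  a: "var b = require('b');" +
     "exports.name = 'A';" +
     "exports.partner = function() { return b.name; };",
  b: "var a = require('a');" +
     "exports.name = 'B';" +
     "exports.partner = function() { return a.name; };"
};

var demoRequire = makeCircularRequire(function(name) {
  return files[name];
});
var a = demoRequire("a");
console.log(a.partner(), demoRequire("b").partner());
// → B A
```

When module b asks for a, it receives the still-empty exports object from the cache. By the time partner is called, module a has filled that object in, so the lookup succeeds.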