Chapter 3: Functions
¶ A program often needs to do the same thing in different places.
Repeating all the necessary statements every time is tedious and
error-prone. It would be better to put them in one place, and have the
program take a detour through there whenever necessary. This is what
functions were invented for: They are canned code that a program can
go through whenever it wants. Putting a string on the screen requires
quite a few statements, but when we have a print
function we can
just write print("Aleph")
and be done with it.
¶ To view functions merely as canned chunks of code doesn't do them justice though. When needed, they can play the role of pure functions, algorithms, indirections, abstractions, decisions, modules, continuations, data structures, and more. Being able to effectively use functions is a necessary skill for any kind of serious programming. This chapter provides an introduction into the subject, chapter 6 discusses the subtleties of functions in more depth.
¶ Pure functions, for a start, are the things that were called functions in the mathematics classes that I hope you have been subjected to at some point in your life. Taking the cosine or the absolute value of a number is a pure function of one argument. Addition is a pure function of two arguments.
¶ The defining properties of pure functions are that they always return the same value when given the same arguments, and never have side effects. They take some arguments, return a value based on these arguments, and do not monkey around with anything else.
¶ In JavaScript, addition is an operator, but it could be wrapped in a function like this (and as pointless as this looks, we will come across situations where it is actually useful):
function add(a, b) { return a + b; } show(add(2, 2));
¶ add
is the name of the function. a
and b
are the names of the
two arguments. return a + b;
is the body of the function.
¶ The keyword function
is always used when creating a new function.
When it is followed by a variable name, the resulting function will be
stored under this name. After the name comes a list of argument
names, and then finally the body of the function. Unlike those
around the body of while
loops or if
statements, the braces around
a function body are obligatory1.
¶ The keyword return
, followed by an expression, is used to
determine the value the function returns. When control comes across a
return
statement, it immediately jumps out of the current function
and gives the returned value to the code that called the function. A
return
statement without an expression after it will cause the
function to return undefined
.
¶ A body can, of course, have more than one statement in it. Here is a function for computing powers (with positive, integer exponents):
function power(base, exponent) { var result = 1; for (var count = 0; count < exponent; count++) result *= base; return result; } show(power(2, 10));
¶ If you solved exercise 2.2, this technique for computing a power should look familiar.
¶ Creating a variable (result
) and updating it are side effects.
Didn't I just say pure functions had no side effects?
¶ A variable created inside a function exists only inside the function.
This is fortunate, or a programmer would have to come up with a
different name for every variable he needs throughout a program.
Because result
only exists inside power
, the changes to it only
last until the function returns, and from the perspective of code that
calls it there are no side effects.
¶ Write a function called absolute
, which returns the absolute value
of the number it is given as its argument. The absolute value of a
negative number is the positive version of that same number, and the
absolute value of a positive number (or zero) is that number itself.
function absolute(number) { if (number < 0) return -number; else return number; } show(absolute(-144));
¶ Pure functions have two very nice properties. They are easy to think about, and they are easy to re-use.
¶ If a function is pure, a call to it can be seen as a thing in itself. When you are not sure that it is working correctly, you can test it by calling it directly from the console, which is simple because it does not depend on any context2. It is easy to make these tests automatic ― to write a program that tests a specific function. Non-pure functions might return different values based on all kinds of factors, and have side effects that might be hard to test and think about.
¶ Because pure functions are self-sufficient, they are likely to be
useful and relevant in a wider range of situations than non-pure ones.
Take show
, for example. This function's usefulness depends on the
presence of a special place on the screen for printing output. If that
place is not there, the function is useless. We can imagine a related
function, let's call it format
, that takes a value as an argument
and returns a string that represents this value. This function is
useful in more situations than show
.
¶ Of course, format
does not solve the same problem as show
, and no
pure function is going to be able to solve that problem, because it
requires a side effect. In many cases, non-pure functions are
precisely what you need. In other cases, a problem can be solved with
a pure function but the non-pure variant is much more convenient or
efficient.
¶ Thus, when something can easily be expressed as a pure function, write it that way. But never feel dirty for writing non-pure functions.
¶ Functions with side effects do not have to contain a return
statement. If no return
statement is encountered, the function
returns undefined
.
function yell(message) { alert(message + "!!"); } yell("Yow");
¶ The names of the arguments of a function are available as variables inside it. They will refer to the values of the arguments the function is being called with, and like normal variables created inside a function, they do not exist outside it. Aside from the top-level environment, there are smaller, local environments created by function calls. When looking up a variable inside a function, the local environment is checked first, and only if the variable does not exist there is it looked up in the top-level environment. This makes it possible for variables inside a function to 'shadow' top-level variables that have the same name.
function alertIsPrint(value) { var alert = print; alert(value); } alertIsPrint("Troglodites");
¶ The variables in this local environment are only visible to the code inside the function. If this function calls another function, the newly called function does not see the variables inside the first function:
var variable = "top-level"; function printVariable() { print("inside printVariable, the variable holds '" + variable + "'."); } function test() { var variable = "local"; print("inside test, the variable holds '" + variable + "'."); printVariable(); } test();
¶ However, and this is a subtle but extremely useful phenomenon, when a function is defined inside another function, its local environment will be based on the local environment that surrounds it instead of the top-level environment.
var variable = "top-level"; function parentFunction() { var variable = "local"; function childFunction() { print(variable); } childFunction(); } parentFunction();
¶ What this comes down to is that which variables are visible inside a function is determined by the place of that function in the program text. All variables that were defined 'above' a function's definition are visible, which means both those in function bodies that enclose it, and those at the top-level of the program. This approach to variable visibility is called lexical scoping.
¶ People who have experience with other programming languages might expect that a block of code (between braces) also produces a new local environment. Not in JavaScript. Functions are the only things that create a new scope. You are allowed to use free-standing blocks like this...
var something = 1; { var something = 2; print("Inside: " + something); } print("Outside: " + something);
¶ ... but the something
inside the block refers to the same variable
as the one outside the block. In fact, although blocks like this are
allowed, they are utterly pointless. Most people agree that this is a
bit of a design blunder by the designers of JavaScript, and ECMAScript
Harmony will add some way to define variables that stay inside blocks
(the let
keyword).
¶ Here is a case that might surprise you:
var variable = "top-level"; function parentFunction() { var variable = "local"; function childFunction() { print(variable); } return childFunction; } var child = parentFunction(); child();
¶ parentFunction
returns its internal function, and the code at the
bottom calls this function. Even though parentFunction
has finished
executing at this point, the local environment where variable
has
the value "local"
still exists, and childFunction
still uses it.
This phenomenon is called closure.
¶ Apart from making it very easy to quickly see in which part of a program a variable will be available by looking at the shape of the program text, lexical scoping also allows us to 'synthesise' functions. By using some of the variables from an enclosing function, an inner function can be made to do different things. Imagine we need a few different but similar functions, one that adds 2 to its argument, one that adds 5, and so on.
function makeAddFunction(amount) { function add(number) { return number + amount; } return add; } var addTwo = makeAddFunction(2); var addFive = makeAddFunction(5); show(addTwo(1) + addFive(1));
¶ To wrap your head around this, you should consider functions to not just package up a computation, but also an environment. Top-level functions simply execute in the top-level environment, that much is obvious. But a function defined inside another function retains access to the environment that existed in that function at the point when it was defined.
¶ Thus, the add
function in the above example, which is created when
makeAddFunction
is called, captures an environment in which amount
has a certain value. It packages this environment, together with the
computation return number + amount
, into a value, which is then
returned from the outer function.
¶ When this returned function (addTwo
or addFive
) is called, a new
environment―-in which the variable number
has a value―-is created,
as a sub-environment of the captured environment (in which amount
has a value). These two values are then added, and the result is
returned.
¶ On top of the fact that different functions can contain variables of
the same name without getting tangled up, these scoping rules also
allow functions to call themselves without running into problems. A
function that calls itself is called recursive. Recursion
allows for some interesting definitions. Look at this implementation
of power
:
function power(base, exponent) { if (exponent == 0) return 1; else return base * power(base, exponent - 1); }
¶ This is rather close to the way mathematicians define exponentiation,
and to me it looks a lot nicer than the earlier version. It sort of
loops, but there is no while
, for
, or even a local side effect to
be seen. By calling itself, the function produces the same effect.
¶ There is one important problem though: In most browsers, this second version is about ten times slower than the first one. In JavaScript, running through a simple loop is a lot cheaper than calling a function multiple times.
¶ The dilemma of speed versus elegance is an interesting one. It not only occurs when deciding for or against recursion. In many situations, an elegant, intuitive, and often short solution can be replaced by a more convoluted but faster solution.
¶ In the case of the power
function above the un-elegant version is
still sufficiently simple and easy to read. It doesn't make very much
sense to replace it with the recursive version. Often, though, the
concepts a program is dealing with get so complex that giving up some
efficiency in order to make the program more straightforward becomes
an attractive choice.
¶ The basic rule, which has been repeated by many programmers and with which I wholeheartedly agree, is to not worry about efficiency until your program is provably too slow. When it is, find out which parts are too slow, and start exchanging elegance for efficiency in those parts.
¶ Of course, the above rule doesn't mean one should start ignoring
performance altogether. In many cases, like the power
function, not
much simplicity is gained by the 'elegant' approach. In other cases,
an experienced programmer can see right away that a simple approach is
never going to be fast enough.
¶ The reason I am making a big deal out of this is that surprisingly many programmers focus fanatically on efficiency, even in the smallest details. The result is bigger, more complicated, and often less correct programs, which take longer to write than their more straightforward equivalents and often run only marginally faster.
¶ But I was talking about recursion. A concept closely related to recursion is a thing called the stack. When a function is called, control is given to the body of that function. When that body returns, the code that called the function is resumed. While the body is running, the computer must remember the context from which the function was called, so that it knows where to continue afterwards. The place where this context is stored is called the stack.
¶ The fact that it is called 'stack' has to do with the fact that, as we saw, a function body can again call a function. Every time a function is called, another context has to be stored. One can visualise this as a stack of contexts. Every time a function is called, the current context is thrown on top of the stack. When a function returns, the context on top is taken off the stack and resumed.
¶ This stack requires space in the computer's memory to be stored. When the stack grows too big, the computer will give up with a message like "out of stack space" or "too much recursion". This is something that has to be kept in mind when writing recursive functions.
function chicken() { return egg(); } function egg() { return chicken(); } print(chicken() + " came first.");
¶ In addition to demonstrating a very interesting way of writing a broken program, this example shows that a function does not have to call itself directly to be recursive. If it calls another function which (directly or indirectly) calls the first function again, it is still recursive.
¶ Recursion is not always just a less-efficient alternative to looping. Some problems are much easier to solve with recursion than with loops. Most often these are problems that require exploring or processing several 'branches', each of which might branch out again into more branches.
¶ Consider this puzzle: By starting from the number 1 and repeatedly either adding 5 or multiplying by 3, an infinite amount of new numbers can be produced. How would you write a function that, given a number, tries to find a sequence of additions and multiplications that produce that number?
¶ For example, the number 13 could be reached by first multiplying 1 by 3, and then adding 5 twice. The number 15 can not be reached at all.
¶ Here is the solution:
function findSequence(goal) { function find(start, history) { if (start == goal) return history; else if (start > goal) return null; else return find(start + 5, "(" + history + " + 5)") || find(start * 3, "(" + history + " * 3)"); } return find(1, "1"); } print(findSequence(24));
¶ Note that it doesn't necessarily find the shortest sequence of operations, it is satisfied when it finds any sequence at all.
¶ The inner find
function, by calling itself in two different ways,
explores both the possibility of adding 5 to the current number and of
multiplying it by 3. When it finds the number, it returns the
history
string, which is used to record all the operators that were
performed to get to this number. It also checks whether the current
number is bigger than goal
, because if it is, we should stop
exploring this branch, it is not going to give us our number.
¶ The use of the ||
operator in the example can be read as 'return the
solution found by adding 5 to start
, and if that fails, return the
solution found by multiplying start
by 3'. It could also have been
written in a more wordy way like this:
else { var found = find(start + 5, "(" + history + " + 5)"); if (found == null) found = find(start * 3, "(" + history + " * 3)"); return found; }
¶ Even though function definitions occur as statements between the rest of the program, they are not part of the same time-line:
print("The future says: ", future()); function future() { return "We STILL have no flying cars."; }
¶ What is happening is that the computer looks up all function definitions, and stores the associated functions, before it starts executing the rest of the program. The same happens with functions that are defined inside other functions. When the outer function is called, the first thing that happens is that all inner functions are added to the new environment.
¶ There is another way to define function values, which more closely
resembles the way other values are created. When the function
keyword is used in a place where an expression is expected, it is
treated as an expression producing a function value. Functions created
in this way do not have to be given a name (though it is allowed to
give them one).
var add = function(a, b) { return a + b; }; show(add(5, 5));
¶ Note the semicolon after the definition of add
. Normal function
definitions do not need these, but this statement has the same general
structure as var add = 22;
, and thus requires the semicolon.
¶ This kind of function value is called an anonymous function, because
it does not have a name. Sometimes it is useless to give a function a
name, like in the makeAddFunction
example we saw earlier:
function makeAddFunction(amount) { return function (number) { return number + amount; }; }
¶ Since the function named add
in the first version of
makeAddFunction
was referred to only once, the name does not serve
any purpose and we might as well directly return the function value.
¶ Write a function greaterThan
, which takes one argument, a number,
and returns a function that represents a test. When this returned
function is called with a single number as argument, it returns a
boolean: true
if the given number is greater than the number that
was used to create the test function, and false
otherwise.
function greaterThan(x) { return function(y) { return y > x; }; } var greaterThanTen = greaterThan(10); show(greaterThanTen(9));
¶ Try the following:
alert("Hello", "Good Evening", "How do you do?", "Goodbye");
¶ The function alert
officially only accepts one argument. Yet when
you call it like this, the computer does not complain at all, but just
ignores the other arguments.
show();
¶ You can, apparently, even get away with passing too few arguments.
When an argument is not passed, its value inside the function is
undefined
.
¶ In the next chapter, we will see a way in which a function body can
get at the exact list of arguments that were passed to it. This can be
useful, as it makes it possible to have a function accept any number
of arguments. print
makes use of this:
print("R", 2, "D", 2);
¶ Of course, the downside of this is that it is also possible to
accidentally pass the wrong number of arguments to functions that
expect a fixed amount of them, like alert
, and never be told about
it.
- Technically, this wouldn't have been necessary, but I suppose the designers of JavaScript felt it would clarify things if function bodies always had braces.
- Technically, a pure function can not use the value of any external variables. These values might change, and this could make the function return a different value for the same arguments. In practice, the programmer may consider some variables 'constant' ― they are not expected to change ― and consider functions that use only constant variables pure. Variables that contain a function value are often good examples of constant variables.