Chapter 4: Data structures: Objects and Arrays
¶ This chapter will be devoted to solving a few simple problems. In the process, we will discuss two new types of values, arrays and objects, and look at some techniques related to them.
¶ Consider the following situation: Your crazy aunt Emily, who is rumoured to have over fifty cats living with her (you never managed to count them), regularly sends you e-mails to keep you up to date on her exploits. They usually look like this:
Dear nephew,
Your mother told me you have taken up skydiving. Is this true? You watch yourself, young man! Remember what happened to my husband? And that was only from the second floor!
Anyway, things are very exciting here. I have spent all week trying to get the attention of Mr. Drake, the nice gentleman who moved in next door, but I think he is afraid of cats. Or allergic to them? I am going to try putting Fat Igor on his shoulder next time I see him, very curious what will happen.
Also, the scam I told you about is going better than expected. I have already gotten back five 'payments', and only one complaint. It is starting to make me feel a bit bad though. And you are right that it is probably illegal in some way.
(... etc ...)
Much love, Aunt Emily
died 27/04/2006: Black Leclère
born 05/04/2006 (mother Lady Penelope): Red Lion, Doctor Hobbles the 3rd, Little Iroquois
¶ To humour the old dear, you would like to keep track of the genealogy of her cats, so you can add things like "P.S. I hope Doctor Hobbles the 2nd enjoyed his birthday this Saturday!", or "How is old Lady Penelope doing? She's five years old now, isn't she?", preferably without accidentally asking about dead cats. You are in possession of a large quantity of old e-mails from your aunt, and fortunately she is very consistent in always putting information about the cats' births and deaths at the end of her mails in precisely the same format.
¶ You are hardly inclined to go through all those mails by hand. Fortunately, we were just in need of an example problem, so we will try to work out a program that does the work for us. For a start, we write a program that gives us a list of cats that are still alive after the last e-mail.
¶ Before you ask, at the start of the correspondence, aunt Emily had only a single cat: Spot. (She was still rather conventional in those days.)
¶ It usually pays to have some kind of clue what one's program is going to do before starting to type. Here's a plan:
- Start with a set of cat names that has only "Spot" in it.
- Go over every e-mail in our archive, in chronological order.
- Look for paragraphs that start with "born" or "died".
- Add the names from paragraphs that start with "born" to our set of names.
- Remove the names from paragraphs that start with "died" from our set.
¶ Where taking the names from a paragraph goes like this:
- Find the colon in the paragraph.
- Take the part after this colon.
- Split this part into separate names by looking for commas.
¶ It may require some suspension of disbelief to accept that aunt Emily always used this exact format, and that she never forgot or misspelled a name, but that is just how your aunt is.
¶ First, let me tell you about properties. A lot of JavaScript values have other values associated with them. These associations are called properties. Every string has a property called length, which refers to a number, the number of characters in that string.
¶ Properties can be accessed in two ways:
var text = "purple haze";
show(text["length"]);
show(text.length);
¶ The second way is a shorthand for the first, and it only works when the name of the property would be a valid variable name ― when it doesn't have any spaces or symbols in it and does not start with a digit character.
¶ The values null and undefined do not have any properties. Trying to read properties from such a value produces an error. Try the following code, if only to get an idea about the kind of error message your browser produces in such a case (which, for some browsers, can be rather cryptic).
var nothing = null;
show(nothing.length);
¶ The properties of a string value cannot be changed. There are quite a few more than just length, as we will see, but you are not allowed to add or remove any.
¶ This is different with values of the type object. Their main role is to hold other values. They have, you could say, their own set of tentacles in the form of properties. You are free to modify these, remove them, or add new ones.
¶ An object can be written like this:
var cat = {colour: "grey", name: "Spot", size: 46};
cat.size = 47;
show(cat.size);
delete cat.size;
show(cat.size);
show(cat);
¶ Like variables, each property attached to an object is labelled by a string. The first statement creates an object in which the property "colour" holds the string "grey", the property "name" is attached to the string "Spot", and the property "size" refers to the number 46. The second statement gives the property named size a new value, which is done in the same way as modifying a variable.
¶ The keyword delete cuts off properties. Trying to read a non-existent property gives the value undefined.
¶ If a property that does not yet exist is set with the = operator, it is added to the object.
var empty = {};
empty.notReally = 1000;
show(empty.notReally);
¶ Properties whose names are not valid variable names have to be quoted when creating the object, and approached using brackets:
var thing = {"gabba gabba": "hey", "5": 10};
show(thing["5"]);
thing["5"] = 20;
show(thing[2 + 3]);
delete thing["gabba gabba"];
¶ As you can see, the part between the brackets can be any expression. It is converted to a string to determine the property name it refers to. One can even use variables to name properties:
var propertyName = "length";
var text = "mainline";
show(text[propertyName]);
¶ The operator in can be used to test whether an object has a certain property. It produces a boolean.
var chineseBox = {};
chineseBox.content = chineseBox;
show("content" in chineseBox);
show("content" in chineseBox.content);
¶ When object values are shown on the console, they can be clicked to inspect their properties. This changes the output window to an 'inspect' window. The little 'x' at the top-right can be used to return to the output window, and the left-arrow can be used to go back to the properties of the previously inspected object.
show(chineseBox);
¶ The solution for the cat problem talks about a 'set' of names. A set is a collection of values in which no value may occur more than once. If names are strings, can you think of a way to use an object to represent a set of names?
¶ Show how a name can be added to this set, how one can be removed, and how you can check whether a name occurs in it.
¶ This can be done by storing the content of the set as the properties of an object. Adding a name is done by setting a property by that name to a value, any value. Removing a name is done by deleting this property. The in operator can be used to determine whether a certain name is part of the set.
var set = {"Spot": true};
// Add "White Fang" to the set
set["White Fang"] = true;
// Remove "Spot"
delete set["Spot"];
// See if "Asoka" is in the set
show("Asoka" in set);
¶ Object values, apparently, can change. The types of values discussed in chapter 2 are all immutable: it is impossible to change an existing value of those types. You can combine them and derive new values from them, but when you take a specific string value, the text inside it cannot change. With objects, on the other hand, the content of a value can be modified by changing its properties.
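¶ A small sketch of that difference. It uses console.log instead of this page's show, so that it runs in any JavaScript environment, and the variable names are made up for the occasion.

```javascript
var greeting = "hello";
// toUpperCase derives a new string; the original value is left untouched.
var shouted = greeting.toUpperCase();
console.log(greeting); // hello
console.log(shouted);  // HELLO

var cat = {name: "Spot"};
// Assigning to a property changes the object value itself.
cat.name = "Spotty";
console.log(cat.name); // Spotty
```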
¶ When we have two numbers, 120 and 120, they can for all practical purposes be considered the precise same number. With objects, there is a difference between having two references to the same object and having two different objects that contain the same properties. Consider the following code:
var object1 = {value: 10};
var object2 = object1;
var object3 = {value: 10};
show(object1 == object2);
show(object1 == object3);
object1.value = 15;
show(object2.value);
show(object3.value);
¶ object1 and object2 are two variables grasping the same value. There is only one actual object, which is why changing object1 also changes the value of object2. The variable object3 points to another object, which initially contains the same properties as object1, but lives a separate life.
¶ JavaScript's == operator, when comparing objects, will only return true if both values given to it are the precise same value. Comparing different objects with identical contents will give false. This is useful in some situations, but impractical in others.
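¶ When a comparison by content is what you need, you have to write it yourself. The following is only a sketch: sameProperties is a hypothetical helper, not something JavaScript provides, and it relies on the in operator and the for/in loop, both of which are used in this chapter. It looks one level deep only, so properties that themselves hold objects are still compared by identity.

```javascript
// Hypothetical helper: true when both objects have the same enumerable
// property names and every property holds the same value (compared with ===).
function sameProperties(a, b) {
  var name;
  for (name in a) {
    if (!(name in b) || a[name] !== b[name])
      return false;
  }
  // Make sure b does not have extra properties that a lacks.
  for (name in b) {
    if (!(name in a))
      return false;
  }
  return true;
}

console.log(sameProperties({value: 10}, {value: 10})); // true
console.log(sameProperties({value: 10}, {value: 15})); // false
```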
¶ Object values can play a lot of different roles. Behaving like a set is only one of those. We will see a few other roles in this chapter, and chapter 8 shows another important way of using objects.
¶ The plan for the cat problem (in fact, call it an algorithm, not a plan; that makes it sound like we know what we are talking about) talks about going over all the e-mails in an archive. What does this archive look like? And where does it come from?
¶ Do not worry about the second question for now. Chapter 14 talks about some ways to import data into your programs, but for now you will find that the e-mails are just magically there. Some magic is really easy, inside computers.
¶ The way in which the archive is stored is still an interesting question. It contains a number of e-mails. An e-mail can be a string, that should be obvious. The whole archive could be put into one huge string, but that is hardly practical. What we want is a collection of separate strings.
¶ Collections of things are what objects are used for. One could make an object like this:
var mailArchive = {"the first e-mail": "Dear nephew, ...",
                   "the second e-mail": "..."
                   /* and so on ... */};
¶ But that makes it hard to go over the e-mails from start to end ― how does the program guess the name of these properties? This can be solved by more predictable property names:
var mailArchive = {0: "Dear nephew, ... (mail number 1)",
                   1: "(mail number 2)",
                   2: "(mail number 3)"};

for (var current = 0; current in mailArchive; current++)
  print("Processing e-mail #", current, ": ", mailArchive[current]);
¶ Luck has it that there is a special kind of object specifically for this kind of use. They are called arrays, and they provide some conveniences, such as a length property that contains the number of values in the array, and a number of operations useful for this kind of collection.
¶ New arrays can be created using brackets ([ and ]):
var mailArchive = ["mail one", "mail two", "mail three"];

for (var current = 0; current < mailArchive.length; current++)
  print("Processing e-mail #", current, ": ", mailArchive[current]);
¶ In this example, the numbers of the elements are not specified explicitly anymore. The first one automatically gets the number 0, the second the number 1, and so on.
¶ Why start at 0? People tend to start counting from 1. As unintuitive as it seems, numbering the elements in a collection from 0 is often more practical. Just go with it for now, it will grow on you.
¶ Starting at element 0 also means that in a collection with X elements, the last element can be found at position X - 1. This is why the for loop in the example checks for current < mailArchive.length. There is no element at position mailArchive.length, so as soon as current has that value, we stop looping.
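¶ A quick illustration of those positions, with console.log standing in for this page's show so it runs anywhere:

```javascript
var mailArchive = ["mail one", "mail two", "mail three"];

// The last element lives at position length - 1.
var last = mailArchive[mailArchive.length - 1];
console.log(last); // mail three

// There is no element at position length; reading it gives undefined.
console.log(mailArchive[mailArchive.length]); // undefined
```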
¶ Write a function range that takes one argument, a positive number, and returns an array containing all numbers from 0 up to and including the given number.
¶ An empty array can be created by simply typing []. Also remember that adding properties to an object, and thus also to an array, can be done by assigning them a value with the = operator. The length property is automatically updated when elements are added.
function range(upto) {
  var result = [];
  for (var i = 0; i <= upto; i++)
    result[i] = i;
  return result;
}
show(range(4));
¶ Instead of naming the loop variable counter or current, as I have been doing so far, it is now called simply i. Using single letters, usually i, j, or k, for loop variables is a widespread habit among programmers. It has its origin mostly in laziness: we'd rather type one character than seven, and names like counter and current do not really clarify the meaning of the variable much.
¶ If a program uses too many meaningless single-letter variables, it can become unbelievably confusing. In my own programs, I try to only do this in a few common cases. Small loops are one of these cases. If the loop contains another loop, and that one also uses a variable named i, the inner loop will modify the variable that the outer loop is using, and everything will break. One could use j for the inner loop, but in general, when the body of a loop is big, you should come up with a variable name that has some clear meaning.
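¶ For a loop this small, i and j are fine. The sketch below pairs every element of one array with every element of another; the arrays are invented for the example, and console.log is used in place of this page's print.

```javascript
var colours = ["grey", "black"];
var sizes = [46, 47, 48];
var pairs = [];

for (var i = 0; i < colours.length; i++) {
  // The inner loop gets its own counter, j, so it does not disturb i.
  for (var j = 0; j < sizes.length; j++)
    pairs[pairs.length] = colours[i] + " " + sizes[j];
}

console.log(pairs.length); // 6
```

¶ Had the inner loop also used i, it would have left i at 3 after its first run, and the outer loop would have stopped after a single round.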
¶ Both string and array objects contain, in addition to the length property, a number of properties that refer to function values.
var doh = "Doh";
print(typeof doh.toUpperCase);
print(doh.toUpperCase());
¶ Every string has a toUpperCase property. When called, it will return a copy of the string, in which all letters have been converted to uppercase. There is also toLowerCase. Guess what that does.
¶ Notice that, even though the call to toUpperCase does not pass any arguments, the function does somehow have access to the string "Doh", the value of which it is a property. How this works precisely is described in chapter 8.
¶ Properties that contain functions are generally called methods, as in 'toUpperCase is a method of a string object'.
var mack = [];
mack.push("Mack");
mack.push("the");
mack.push("Knife");
show(mack.join(" "));
show(mack.pop());
show(mack);
¶ The method push, which is associated with arrays, can be used to add values to an array. It could have been used in the last exercise, as an alternative to result[i] = i. Then there is pop, the opposite of push: it takes off and returns the last value in the array. join builds a single big string from an array of strings. The parameter it is given is pasted between the values in the array.
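¶ For comparison, this is roughly what the range exercise looks like with push. The name rangeWithPush is made up to keep it apart from the earlier solution, and console.log replaces this page's show.

```javascript
function rangeWithPush(upto) {
  var result = [];
  for (var i = 0; i <= upto; i++)
    result.push(i);   // appends i at the end of the array
  return result;
}

// join converts the numbers to strings before pasting them together.
console.log(rangeWithPush(4).join(", ")); // 0, 1, 2, 3, 4
```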
¶ Coming back to those cats, we now know that an array would be a good way to store the archive of e-mails. On this page, the function retrieveMails can be used to (magically) get hold of this array. Going over them to process them one after another is not rocket science anymore either:
var mailArchive = retrieveMails();

for (var i = 0; i < mailArchive.length; i++) {
  var email = mailArchive[i];
  print("Processing e-mail #", i);
  // Do more things...
}
¶ We have also decided on a way to represent the set of cats that are alive. The next problem, then, is to find the paragraphs in an e-mail that start with "born" or "died".
¶ The first question that comes up is what exactly a paragraph is. In this case, the string value itself can't help us much: JavaScript's concept of text does not go any deeper than the 'sequence of characters' idea, so we must define paragraphs in those terms.
¶ Earlier, we saw that there is such a thing as a newline character. These are what most people use to split paragraphs. We consider a paragraph, then, to be a part of an e-mail that starts at a newline character or at the start of the content, and ends at the next newline character or at the end of the content.
¶ And we don't even have to write the algorithm for splitting a string into paragraphs ourselves. Strings already have a method named split, which is (almost) the opposite of the join method of arrays. It splits a string into an array, using the string given as its argument to determine in which places to cut.
var words = "Cities of the Interior";
show(words.split(" "));
¶ Thus, cutting on newlines ("\n") can be used to split an e-mail into paragraphs.
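¶ Applied to a heavily shortened stand-in for one of aunt Emily's e-mails (the string below is abbreviated for illustration, not taken from the real archive), that looks something like this:

```javascript
var email = "Dear nephew,\n" +
            "(... etc ...)\n" +
            "Much love, Aunt Emily\n" +
            "died 27/04/2006: Black Leclère";

// Every newline character marks the border between two paragraphs.
var paragraphs = email.split("\n");

console.log(paragraphs.length); // 4
console.log(paragraphs[3]);     // the "died" paragraph
```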
¶ split and join are not precisely each other's inverse. string.split(x).join(x) always produces the original value, but array.join(x).split(x) does not. Can you give an example of an array where .join(" ").split(" ") produces a different value?
var array = ["a", "b", "c d"];
show(array.join(" ").split(" "));
¶ Paragraphs that do not start with either "born" or "died" can be ignored by the program. How do we test whether a string starts with a certain word? The method charAt can be used to get a specific character from a string. x.charAt(0) gives the first character, x.charAt(1) the second one, and so on. One way to check whether a string starts with "born" is:
var paragraph = "born 15/11/2003 (mother Spot): White Fang";
show(paragraph.charAt(0) == "b" && paragraph.charAt(1) == "o" &&
     paragraph.charAt(2) == "r" && paragraph.charAt(3) == "n");
¶ But that gets a bit clumsy ― imagine checking for a word of ten characters. There is something to be learned here though: when a line gets ridiculously long, it can be spread over multiple lines. The result can be made easier to read by lining up the start of the new line with the first element on the original line that plays a similar role.
¶ Strings also have a method called slice. It copies out a piece of the string, starting from the character at the position given by the first argument, and ending before (not including) the character at the position given by the second one. This allows the check to be written in a shorter way.
show(paragraph.slice(0, 4) == "born");
¶ Write a function called startsWith that takes two arguments, both strings. It returns true when the first argument starts with the characters in the second argument, and false otherwise.
function startsWith(string, pattern) {
  return string.slice(0, pattern.length) == pattern;
}
show(startsWith("rotation", "rot"));
¶ What happens when charAt or slice are used to take a piece of a string that does not exist? Will the startsWith I showed still work when the pattern is longer than the string it is matched against?
show("Pip".charAt(250));
show("Nop".slice(1, 10));
¶ charAt will return "" when there is no character at the given position, and slice will simply leave out the part of the new string that does not exist.
¶ So yes, that version of startsWith works. When startsWith("Idiots", "Most honoured colleagues") is called, the call to slice will, because string does not have enough characters, always return a string that is shorter than pattern. Because of that, the comparison with == will return false, which is correct.
¶ It helps to always take a moment to consider abnormal (but valid) inputs for a program. These are usually called corner cases, and it is very common for programs that work perfectly on all the 'normal' inputs to screw up on corner cases.
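¶ One way to make a habit of this is to write the corner cases down as calls and look at what comes out. The sketch below repeats the startsWith from the exercise so that it can run on its own, and uses console.log instead of this page's show. The particular inputs are just the ones that came to mind.

```javascript
function startsWith(string, pattern) {
  return string.slice(0, pattern.length) == pattern;
}

console.log(startsWith("rotation", "rot")); // true  (the normal case)
console.log(startsWith("rot", "rotation")); // false (pattern longer than string)
console.log(startsWith("", "rot"));         // false (empty string)
console.log(startsWith("rot", "rot"));      // true  (string equals pattern)
console.log(startsWith("rot", ""));         // true  (empty pattern)
```

¶ The last result may look odd at first, but it is defensible: every string starts with zero characters of anything.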
¶ The only part of the cat-problem that is still unsolved is the extraction of names from a paragraph. The algorithm was this:
- Find the colon in the paragraph.
- Take the part after this colon.
- Split this part into separate names by looking for commas.
¶ This has to happen both for paragraphs that start with "died", and paragraphs that start with "born". It would be a good idea to put it into a function, so that the two pieces of code that handle these different kinds of paragraphs can both use it.
¶ Can you write a function catNames that takes a paragraph as an argument and returns an array of names?
¶ Strings have an indexOf method that can be used to find the (first) position of a character or sub-string within that string. Also, when slice is given only one argument, it will return the part of the string from the given position all the way to the end.
¶ It can be helpful to use the console to 'explore' functions. For example, type "foo: bar".indexOf(":") and see what you get.
function catNames(paragraph) {
  var colon = paragraph.indexOf(":");
  return paragraph.slice(colon + 2).split(", ");
}

show(catNames("born 20/09/2004 (mother Yellow Bess): " +
              "Doctor Hobbles the 2nd, Noog"));
¶ The tricky part, which the original description of the algorithm ignored, is dealing with spaces after the colon and the commas. The + 2 used when slicing the string is needed to leave out the colon itself and the space after it. The argument to split contains both a comma and a space, because that is what the names are really separated by, rather than just a comma.
¶ This function does not do any checking for problems. We assume, in this case, that the input is always correct.
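¶ To get a feeling for what such checking could look like, here is a sketch of a more careful variant. The name catNamesChecked and the choice to return an empty array are my own assumptions, not part of the solution above. It relies on the fact that indexOf returns -1 when it does not find what it is looking for.

```javascript
// Hypothetical variant of catNames that tolerates a missing colon.
function catNamesChecked(paragraph) {
  var colon = paragraph.indexOf(":");
  // indexOf gives -1 when there is no colon in the paragraph.
  if (colon == -1)
    return [];
  return paragraph.slice(colon + 2).split(", ");
}

console.log(catNamesChecked("born 20/09/2004 (mother Yellow Bess): " +
                            "Doctor Hobbles the 2nd, Noog").length); // 2
console.log(catNamesChecked("Much love, Aunt Emily").length);        // 0
```

¶ Without the check, a paragraph that lacks a colon would be sliced from position 1 (that is, -1 + 2) and its remainder treated as cat names.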
¶ All that remains now is putting the pieces together. One way to do that looks like this:
var mailArchive = retrieveMails();
var livingCats = {"Spot": true};

for (var mail = 0; mail < mailArchive.length; mail++) {
  var paragraphs = mailArchive[mail].split("\n");
  for (var paragraph = 0; paragraph < paragraphs.length; paragraph++) {
    if (startsWith(paragraphs[paragraph], "born")) {
      var names = catNames(paragraphs[paragraph]);
      for (var name = 0; name < names.length; name++)
        livingCats[names[name]] = true;
    }
    else if (startsWith(paragraphs[paragraph], "died")) {
      var names = catNames(paragraphs[paragraph]);
      for (var name = 0; name < names.length; name++)
        delete livingCats[names[name]];
    }
  }
}

show(livingCats);
¶ That is quite a big dense chunk of code. We'll look into making it a bit lighter in a moment. But first let us look at our results. We know how to check whether a specific cat survives:
if ("Spot" in livingCats)
  print("Spot lives!");
else
  print("Good old Spot, may she rest in peace.");
¶ But how do we list all the cats that are alive? The in keyword has a somewhat different meaning when it is used together with for:
for (var cat in livingCats)
  print(cat);
¶ A loop like that will go over the names of the properties in an object, which allows us to enumerate all the names in our set.
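¶ The same kind of loop can turn a set back into an array, which is handy when you want to use join or length on it. A sketch, in which setToArray is a made-up helper and the set is filled in by hand rather than taken from the e-mail archive:

```javascript
// Hypothetical helper: collects the property names of a set object.
function setToArray(set) {
  var names = [];
  for (var name in set)
    names.push(name);
  return names;
}

var livingCats = {"Spot": true, "White Fang": true};
var names = setToArray(livingCats);

console.log(names.length);     // 2
console.log(names.join(", "));
```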
¶ Some pieces of code look like an impenetrable jungle. The example solution to the cat problem suffers from this. One way to make some light shine through it is to just add some strategic blank lines. This makes it look better, but doesn't really solve the problem.
¶ What is needed here is to break the code up. We already wrote two helper functions, startsWith and catNames, which both take care of a small, understandable part of the problem. Let us continue doing this.
function addToSet(set, values) {
  for (var i = 0; i < values.length; i++)
    set[values[i]] = true;
}

function removeFromSet(set, values) {
  for (var i = 0; i < values.length; i++)
    delete set[values[i]];
}
¶ These two functions take care of the adding and removing of names from the set. That already cuts out the two innermost loops from the solution:
var livingCats = {Spot: true};

for (var mail = 0; mail < mailArchive.length; mail++) {
  var paragraphs = mailArchive[mail].split("\n");
  for (var paragraph = 0; paragraph < paragraphs.length; paragraph++) {
    if (startsWith(paragraphs[paragraph], "born"))
      addToSet(livingCats, catNames(paragraphs[paragraph]));
    else if (startsWith(paragraphs[paragraph], "died"))
      removeFromSet(livingCats, catNames(paragraphs[paragraph]));
  }
}
¶ Quite an improvement, if I may say so myself.
¶ Why do addToSet and removeFromSet take the set as an argument? They could use the variable livingCats directly, if they wanted to. The reason is that this way they are not completely tied to our current problem. If addToSet directly changed livingCats, it would have to be called addCatsToCatSet, or something similar. The way it is now, it is a more generally useful tool.
¶ Even if we are never going to use these functions for anything else, which is quite probable, it is useful to write them like this. Because they are 'self sufficient', they can be read and understood on their own, without needing to know about some external variable called livingCats.
¶ The functions are not pure: They change the object passed as their set argument. This makes them slightly trickier than real pure functions, but still a lot less confusing than functions that run amok and change any value or variable they please.
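¶ For contrast, a pure counterpart is sketched below. It is not used in the rest of the chapter; withAdded is a name invented for this example. Instead of changing the set it is given, it copies the properties into a fresh object and returns that.

```javascript
// Hypothetical pure variant of addToSet: the original set is left alone.
function withAdded(set, values) {
  var result = {};
  // Copy the existing members into the new object.
  for (var name in set)
    result[name] = set[name];
  // Then add the new ones.
  for (var i = 0; i < values.length; i++)
    result[values[i]] = true;
  return result;
}

var before = {"Spot": true};
var after = withAdded(before, ["White Fang"]);

console.log("White Fang" in before); // false
console.log("White Fang" in after);  // true
```

¶ The price is that every call copies the whole set, which is wasteful when the set is big and changes often. For the cat problem, modifying the set in place is the more practical choice.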
¶ We continue breaking the algorithm into pieces:
function findLivingCats() {
  var mailArchive = retrieveMails();
  var livingCats = {"Spot": true};

  function handleParagraph(paragraph) {
    if (startsWith(paragraph, "born"))
      addToSet(livingCats, catNames(paragraph));
    else if (startsWith(paragraph, "died"))
      removeFromSet(livingCats, catNames(paragraph));
  }

  for (var mail = 0; mail < mailArchive.length; mail++) {
    var paragraphs = mailArchive[mail].split("\n");
    for (var i = 0; i < paragraphs.length; i++)
      handleParagraph(paragraphs[i]);
  }
  return livingCats;
}

var howMany = 0;
for (var cat in findLivingCats())
  howMany++;
print("There are ", howMany, " cats.");
¶ The whole algorithm is now encapsulated by a function. This means that it does not leave a mess after it runs: livingCats is now a local variable in the function, instead of a top-level one, so it only exists while the function runs. The code that needs this set can call findLivingCats and use the value it returns.
¶ It seemed to me that making handleParagraph a separate function also cleared things up. But this one is so closely tied to the cat-algorithm that it is meaningless in any other situation. On top of that, it needs access to the livingCats variable. Thus, it is a perfect candidate to be a function-inside-a-function. When it lives inside findLivingCats, it is clear that it is only relevant there, and it has access to the variables of its parent function.
¶ This solution is actually bigger than the previous one. Still, it is tidier and I hope you'll agree that it is easier to read.
¶ The program still ignores a lot of the information that is contained in the e-mails. There are birth-dates, dates of death, and the names of mothers in there.
¶ To start with the dates: What would be a good way to store a date? We could make an object with three properties, year, month, and day, and store numbers in them.
var when = {year: 1980, month: 2, day: 1};
¶ But JavaScript already provides a kind of object for this purpose. Such an object can be created by using the keyword new:
var when = new Date(1980, 1, 1);
show(when);
¶ Just like the notation with braces and colons we have already seen, new is a way to create object values. Instead of specifying all the property names and values, a function is used to build up the object. This makes it possible to define a kind of standard procedure for creating objects. Functions like this are called constructors, and in chapter 8 we will see how to write them.
¶ The Date constructor can be used in different ways.
show(new Date());
show(new Date(1980, 1, 1));
show(new Date(2007, 2, 30, 8, 20, 30));
¶ As you can see, these objects can store a time of day as well as a date. When not given any arguments, an object representing the current time and date is created. Arguments can be given to ask for a specific date and time. The order of the arguments is year, month, day, hour, minute, second, milliseconds. These last four are optional; they become 0 when not given.
¶ The month numbers these objects use go from 0 to 11, which can be confusing. Especially since day numbers do start from 1.
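¶ A small demonstration of that mismatch, with console.log in place of this page's show. The helper humanMonth is hypothetical, written only to show the usual correction of adding one.

```javascript
// Month 1 is February, because months are counted from 0.
var when = new Date(1980, 1, 1);

console.log(when.getMonth()); // 1
console.log(when.getDate());  // 1 (days are counted from 1)

// Hypothetical helper: the month number as people usually write it.
function humanMonth(date) {
  return date.getMonth() + 1;
}

console.log(humanMonth(when)); // 2
```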
¶ The content of a Date object can be inspected with a number of get... methods.
var today = new Date();
print("Year: ", today.getFullYear(), ", month: ",
      today.getMonth(), ", day: ", today.getDate());
print("Hour: ", today.getHours(), ", minutes: ",
      today.getMinutes(), ", seconds: ", today.getSeconds());
print("Day of week: ", today.getDay());
¶ All of these, except for getDay, also have a set... variant that can be used to change the value of the date object.
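¶ For example (a sketch using console.log rather than this page's print, with an arbitrary starting date):

```javascript
var date = new Date(2006, 3, 27);  // 27 April 2006

date.setFullYear(2007);  // now 27 April 2007
date.setMonth(0);        // now 27 January 2007
date.setDate(1);         // now 1 January 2007

console.log(date.getFullYear(), date.getMonth(), date.getDate()); // 2007 0 1
```

¶ Note that these methods modify the existing object instead of producing a new one. Every variable that refers to the same date object will see the change.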
¶ Inside the object, a date is represented by the number of milliseconds it is away from the start of January 1st 1970 (in UTC). You can imagine this is quite a large number.
var today = new Date();
show(today.getTime());
¶ A very useful thing to do with dates is comparing them.
var wallFall = new Date(1989, 10, 9);
var gulfWarOne = new Date(1990, 7, 2);
show(wallFall < gulfWarOne);
show(wallFall == wallFall);
// but
show(wallFall == new Date(1989, 10, 9));
¶ Comparing dates with <, >, <=, and >= does exactly what you would expect. When a date object is compared to itself with == the result is true, which is also good. But when == is used to compare a date object to a different, equal date object, we get false. Huh?
¶ As mentioned earlier, == will return false when comparing two different objects, even if they contain the same properties. This is a bit clumsy and error-prone here, since one would expect >= and == to behave in a more or less similar way. Testing whether two dates are equal can be done like this:
var wallFall1 = new Date(1989, 10, 9),
    wallFall2 = new Date(1989, 10, 9);
show(wallFall1.getTime() == wallFall2.getTime());
¶ In addition to a date and time, Date objects also have to deal with timezones. When it is one o'clock in Amsterdam, it can, depending on the time of year, be noon in London, and seven in the morning in New York. Such times can only be compared when you take their time zones into account. The getTimezoneOffset method of a Date can be used to find out how many minutes the local time of the computer running the program differs from GMT (Greenwich Mean Time).
var now = new Date();
print(now.getTimezoneOffset());
"died 27/04/2006: Black Leclère"
¶ The date part is always in the exact same place of a paragraph. How convenient. Write a function extractDate that takes such a paragraph as its argument, extracts the date, and returns it as a date object.
function extractDate(paragraph) {
  function numberAt(start, length) {
    return Number(paragraph.slice(start, start + length));
  }
  return new Date(numberAt(11, 4), numberAt(8, 2) - 1,
                  numberAt(5, 2));
}

show(extractDate("died 27/04/2006: Black Leclère"));
¶ It would work without the calls to Number, but as mentioned earlier, I prefer not to use strings as if they are numbers. The inner function was introduced to prevent having to repeat the Number and slice part three times.
¶ Note the - 1 for the month number. Like most people, Aunt Emily counts her months from 1, so we have to adjust the value before giving it to the Date constructor. (The day number does not have this problem, since Date objects count days in the usual human way.)
¶ In chapter 10 we will see a more practical and robust way of extracting pieces from strings that have a fixed structure.
¶ Storing cats will work differently from now on. Instead of just putting the value true into the set, we store an object with information about the cat. When a cat dies, we do not remove it from the set, we just add a property death to the object to store the date on which the creature died.
¶ This means our addToSet and removeFromSet functions have become useless. Something similar is needed, but it must also store birth-dates and, later, the mother's name.
function catRecord(name, birthdate, mother) {
  return {name: name, birth: birthdate, mother: mother};
}

function addCats(set, names, birthdate, mother) {
  for (var i = 0; i < names.length; i++)
    set[names[i]] = catRecord(names[i], birthdate, mother);
}
function deadCats(set, names, deathdate) {
  for (var i = 0; i < names.length; i++)
    set[names[i]].death = deathdate;
}
¶ catRecord is a separate function for creating these storage objects. It might be useful in other situations, such as creating the object for Spot. 'Record' is a term often used for objects like this, which are used to group a limited number of values.
¶ So let us try to extract the names of the mother cats from the paragraphs.
"born 15/11/2003 (mother Spot): White Fang"
¶ One way to do this would be...
function extractMother(paragraph) {
  var start = paragraph.indexOf("(mother ") + "(mother ".length;
  var end = paragraph.indexOf(")");
  return paragraph.slice(start, end);
}

show(extractMother("born 15/11/2003 (mother Spot): White Fang"));
¶ Notice how the start position has to be adjusted for the length of "(mother ", because indexOf returns the position of the start of the pattern, not its end.
¶ The thing that extractMother
does can be expressed in a more general
way. Write a function between
that takes three arguments, all of
which are strings. It will return the part of the first argument that
occurs between the patterns given by the second and the third
arguments.
¶ So between("born 15/11/2003 (mother Spot): White Fang", "(mother ",
")")
gives "Spot"
.
¶ between("bu ] boo [ bah ] gzz", "[ ", " ]")
returns "bah"
.
¶ To make that second test work, it can be useful to know that indexOf
can be given a second, optional parameter that specifies at which
point it should start searching.
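¶ A quick sketch of what that second parameter does, using the string from the second test (console.log is used instead of show so the snippet runs anywhere):

```javascript
var text = "bu ] boo [ bah ] gzz";

// Without a start position, the search begins at the front.
console.log(text.indexOf(" ]"));      // 2
// With a start position, everything before it is skipped.
console.log(text.indexOf(" ]", 11));  // 14
// When nothing is found from that point on, the result is -1.
console.log(text.indexOf(" ]", 15));  // -1
```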
function between(string, start, end) {
  var startAt = string.indexOf(start) + start.length;
  var endAt = string.indexOf(end, startAt);
  return string.slice(startAt, endAt);
}
show(between("bu ] boo [ bah ] gzz", "[ ", " ]"));
¶ Having between
makes it possible to express extractMother in a
simpler way:
function extractMother(paragraph) {
  return between(paragraph, "(mother ", ")");
}
¶ The new, improved cat-algorithm looks like this:
function findCats() {
  var mailArchive = retrieveMails();
  var cats = {"Spot": catRecord("Spot", new Date(1997, 2, 5), "unknown")};
  function handleParagraph(paragraph) {
    if (startsWith(paragraph, "born"))
      addCats(cats, catNames(paragraph), extractDate(paragraph),
              extractMother(paragraph));
    else if (startsWith(paragraph, "died"))
      deadCats(cats, catNames(paragraph), extractDate(paragraph));
  }
  for (var mail = 0; mail < mailArchive.length; mail++) {
    var paragraphs = mailArchive[mail].split("\n");
    for (var i = 0; i < paragraphs.length; i++)
      handleParagraph(paragraphs[i]);
  }
  return cats;
}
var catData = findCats();
¶ Having that extra data allows us to finally have a clue about the cats aunt Emily talks about. A function like this could be useful:
function formatDate(date) {
  return date.getDate() + "/" + (date.getMonth() + 1) +
         "/" + date.getFullYear();
}
function catInfo(data, name) {
  if (!(name in data))
    return "No cat by the name of " + name + " is known.";
  var cat = data[name];
  var message = name + ", born " + formatDate(cat.birth) +
                " from mother " + cat.mother;
  if ("death" in cat)
    message += ", died " + formatDate(cat.death);
  return message + ".";
}
print(catInfo(catData, "Fat Igor"));
¶ The first return
statement in catInfo
is used as an escape hatch.
If there is no data about the given cat, the rest of the function is
meaningless, so we immediately return a value, which prevents the rest
of the code from running.
¶ In the past, certain groups of programmers considered functions that
contain multiple return
statements sinful. The idea was that this
made it hard to see which code was executed and which code was not.
Other techniques, which will be discussed in chapter 5, have made the
reasons behind this idea more or less obsolete, but you might still
occasionally come across someone who will criticise the use of
'shortcut' return statements.
¶ The formatDate
function used by catInfo
does not add a zero before
the month and the day part when these are only one digit long. Write a
new version that does this.
function formatDate(date) {
  function pad(number) {
    if (number < 10)
      return "0" + number;
    else
      return number;
  }
  return pad(date.getDate()) + "/" + pad(date.getMonth() + 1) +
         "/" + date.getFullYear();
}
print(formatDate(new Date(2000, 0, 1)));
¶ Write a function oldestCat
which, given an object containing cats as
its argument, returns the name of the oldest living cat.
function oldestCat(data) {
  var oldest = null;
  for (var name in data) {
    var cat = data[name];
    if (!("death" in cat) &&
        (oldest == null || oldest.birth > cat.birth))
      oldest = cat;
  }
  if (oldest == null)
    return null;
  else
    return oldest.name;
}
print(oldestCat(catData));
¶ The condition in the if
statement might seem a little intimidating.
It can be read as 'only store the current cat in the variable oldest
if it is not dead, and oldest
is either null
or a cat that was
born after the current cat'.
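¶ The test oldest.birth > cat.birth works because Date objects are converted to numbers (milliseconds) when they are compared with < or >. The sketch below illustrates this with two birth dates taken from earlier examples; the variable names are invented, and console.log replaces print. Note that == behaves differently: it checks whether both sides are the very same object.

```javascript
var spotBirth = new Date(1997, 2, 5);    // 5 March 1997
var fangBirth = new Date(2003, 10, 15);  // 15 November 2003

// An earlier date counts as 'smaller'.
console.log(spotBirth < fangBirth);  // true
console.log(spotBirth > fangBirth);  // false

// Two separate Date objects for the same day are not == to each other,
// but their millisecond values are equal.
var sameDay = new Date(1997, 2, 5);
console.log(spotBirth == sameDay);                      // false
console.log(spotBirth.getTime() == sameDay.getTime());  // true
```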
¶ Note that this function returns null
when there are no living cats
in data
. What does your solution do in that case?
¶ Now that we are familiar with arrays, I can show you something
related. Whenever a function is called, a special variable named
arguments
is added to the environment in which the function body
runs. This variable refers to an object that resembles an array. It
has a property 0
for the first argument, 1
for the second, and so
on for every argument the function was given. It also has a length
property.
¶ This object is not a real array, though: it does not have methods like
push
, and it does not automatically update its length
property
when you add something to it. Why not, I never really found out, but
this is something one needs to be aware of.
function argumentCounter() {
  print("You gave me ", arguments.length, " arguments.");
}
argumentCounter("Death", "Famine", "Pestilence");
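¶ The differences from a real array can be checked directly. The following is a sketch with an invented function name, inspectArguments. It also shows a widely used idiom that this text has not introduced: borrowing slice from Array.prototype to copy the arguments into a genuine array.

```javascript
function inspectArguments() {
  var info = {};
  // The arguments object has no push method.
  info.hasPush = typeof arguments.push == "function";
  // Storing something past the end leaves length untouched.
  arguments[arguments.length] = "extra";
  info.lengthAfterAdd = arguments.length;
  // Copy the first 'length' elements into a real array.
  info.copy = Array.prototype.slice.call(arguments);
  return info;
}

var info = inspectArguments("Death", "Famine", "Pestilence");
console.log(info.hasPush);         // false
console.log(info.lengthAfterAdd);  // 3
console.log(info.copy.length);     // 3
console.log(typeof info.copy.push);  // function
```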
¶ Some functions can take any number of arguments, like print
does.
These typically loop over the values in the arguments
object to do
something with them. Others can take optional arguments which, when
not given by the caller, get some sensible default value.
function add(number, howmuch) {
  if (arguments.length < 2)
    howmuch = 1;
  return number + howmuch;
}
show(add(6));
show(add(6, 4));
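¶ The other kind, a function that accepts any number of arguments, might look like the sketch below. The name addAll is made up for this illustration, and console.log is used instead of show.

```javascript
// Adds up however many numbers it is given.
function addAll() {
  var total = 0;
  for (var i = 0; i < arguments.length; i++)
    total += arguments[i];
  return total;
}

console.log(addAll());            // 0
console.log(addAll(5));           // 5
console.log(addAll(1, 2, 3, 4));  // 10
```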
¶ Extend the range
function from exercise 4.2 to take a second, optional
argument. If only one argument is given, it behaves as earlier and
produces a range from 0 to the given number. If two arguments are
given, the first indicates the start of the range, the second the end.
function range(start, end) {
  if (arguments.length < 2) {
    end = start;
    start = 0;
  }
  var result = [];
  for (var i = start; i <= end; i++)
    result.push(i);
  return result;
}
show(range(4));
show(range(2, 4));
¶ The optional argument does not work precisely like the one in the
add
example above. When it is not given, the first argument takes
the role of end
, and start
becomes 0
.
¶ You may remember this line of code from the introduction:
print(sum(range(1, 10)));
¶ We have range
now. All we need to make this line work is a sum
function. This function takes an array of numbers, and returns their
sum. Write it; it should be easy.
function sum(numbers) {
  var total = 0;
  for (var i = 0; i < numbers.length; i++)
    total += numbers[i];
  return total;
}
print(sum(range(1, 10)));
¶ Chapter 2 mentioned the functions Math.max
and Math.min
.
With what you know now, you will notice that these are really the
properties max
and min
of the object stored under the name
Math
. This is another role that objects can play: A warehouse
holding a number of related values.
¶ There are quite a lot of values inside Math
. If they had all
been placed directly into the global environment they would, as it is
called, pollute it. The more names have been taken, the more likely
one is to accidentally overwrite the value of some variable. For
example, it is not far-fetched to want to name something max
.
¶ Most languages will stop you, or at least warn you, when you are defining a variable with a name that is already taken. Not JavaScript.
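¶ A small sketch of both points. The names limit, shapes, square, and cube are invented for this illustration, and console.log takes the place of print. The first part shows a variable being redefined without any complaint; the second shows how keeping functions inside a 'warehouse' object leaves the plain names free for other uses.

```javascript
var limit = 10;
var limit = "ten";   // defined a second time: no error, no warning
console.log(limit);  // ten

// A hypothetical warehouse object, in the style of Math.
var shapes = {
  square: function(x) { return x * x; },
  cube: function(x) { return x * x * x; }
};

// The global name square can still be used for something else.
var square = "a1";
console.log(shapes.square(4));  // 16
console.log(shapes.cube(2));    // 8
console.log(square);            // a1
```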
¶ In any case, one can find a whole outfit of mathematical functions and
constants inside Math
. All the trigonometric functions are there:
cos
, sin
, tan
, acos
, asin
, atan
. π and e are there too,
written in all capital letters (PI
and E
), which was, at one
time, a fashionable way to indicate something is a constant. pow
is
a good replacement for the power
functions we have been writing; it
also accepts negative and fractional exponents. sqrt
takes square
roots. max
and min
give the maximum or minimum of two or more values.
round
, floor
, and
ceil
will round numbers to the closest whole number, the whole
number below it, and the whole number above it respectively.
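¶ A few of these in action, as a quick sketch (with console.log instead of show):

```javascript
console.log(Math.pow(2, 10));  // 1024
console.log(Math.pow(2, -1));  // 0.5
console.log(Math.pow(2, 0.5)); // the square root of 2, roughly 1.414
console.log(Math.sqrt(16));    // 4
console.log(Math.max(1, 5));   // 5
console.log(Math.min(1, 5));   // 1
console.log(Math.round(2.7));  // 3
console.log(Math.floor(2.7));  // 2
console.log(Math.ceil(2.1));   // 3
console.log(Math.cos(0));      // 1
console.log(Math.PI);          // roughly 3.14159
```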
¶ There are a number of other values in Math
, but this text is an
introduction, not a reference. References are what you look at when
you suspect something exists in the language, but need to find out
what it is called or how it works exactly. Unfortunately, there is no
single comprehensive reference for JavaScript. This is mostly
because its current form is the result of a chaotic process of
different browsers adding different extensions at different times. The
ECMA standard document that was mentioned in the introduction provides
solid documentation of the basic language, but is more or less
unreadable. For most things, your best bet is the Mozilla Developer
Network.
¶ Maybe you already thought of a way to find out what is available in
the Math
object:
for (var name in Math)
  print(name);
¶ But alas, nothing appears. Similarly, when you do this:
for (var name in ["Huey", "Dewey", "Louie"])
  print(name);
¶ You only see 0
, 1
, and 2
, not length
, or push
, or join
,
which are definitely also in there. Apparently, some properties of
objects are hidden. There is a good reason for
this: All objects have a few methods, for example toString
, which
converts the object into some kind of relevant string, and you do not
want to see those when you are, for example, looking for the cats that
you stored in the object.
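¶ That the hidden properties really are there can be confirmed with the in operator, which does see them. A sketch, with invented variable names and console.log in place of print:

```javascript
var nephews = ["Huey", "Dewey", "Louie"];

// Collect what a for/in loop shows us.
var visible = [];
for (var key in nephews)
  visible.push(key);
console.log(visible.length);  // 3, the names "0", "1", and "2"

// The hidden properties exist all the same.
console.log("length" in nephews);  // true
console.log("push" in nephews);    // true

// The same goes for Math: nothing to loop over, yet max is there.
var mathNames = [];
for (var prop in Math)
  mathNames.push(prop);
console.log(mathNames.length);  // 0
console.log("max" in Math);     // true
```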
¶ Why the properties of Math
are hidden is unclear to me. Someone
probably wanted it to be a mysterious kind of object.
¶ All properties your programs add to objects are visible. There is no
way to make them hidden, which is unfortunate because, as we will see
in chapter 8, it would be nice to be able to add methods to objects
without having them show up in our for
/in
loops.
¶ Some properties are read-only: you can get their value but not change it. For example, the properties of a string value are all read-only.
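¶ A sketch of what trying to change one anyway looks like. Depending on how the code is run, the assignment is either silently ignored or, in what later versions of the language call strict mode, rejected with an error; the try/catch is there so the snippet behaves the same in both cases. console.log is used in place of show.

```javascript
var text = "cat";

try {
  text.length = 10;  // has no effect on the string
  text[0] = "b";     // neither does this
} catch (e) {
  // Strict-mode code ends up here instead; the string is unchanged.
}

console.log(text.length);  // 3
console.log(text);         // cat
```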
¶ Other properties can be 'active'. Changing them causes things to happen. For example, lowering the length of an array causes excess elements to be discarded:
var array = ["Heaven", "Earth", "Man"];
array.length = 2;
show(array);
- There are a few subtle problems with this approach, which will be discussed and solved in chapter 8. For this chapter, it works well enough.