Chapter 4: Data structures: Objects and Arrays

¶ This chapter will be devoted to solving a few simple problems. In the process, we will discuss two new types of values, arrays and objects, and look at some techniques related to them.

¶ Consider the following situation: Your crazy aunt Emily, who is rumoured to have over fifty cats living with her (you never managed to count them), regularly sends you e-mails to keep you up to date on her exploits. They usually look like this:

Dear nephew,

Your mother told me you have taken up skydiving. Is this true? You watch yourself, young man! Remember what happened to my husband? And that was only from the second floor!

Anyway, things are very exciting here. I have spent all week trying to get the attention of Mr. Drake, the nice gentleman who moved in next door, but I think he is afraid of cats. Or allergic to them? I am going to try putting Fat Igor on his shoulder next time I see him, very curious what will happen.

Also, the scam I told you about is going better than expected. I have already gotten back five 'payments', and only one complaint. It is starting to make me feel a bit bad though. And you are right that it is probably illegal in some way.

(... etc ...)

Much love, Aunt Emily

died 27/04/2006: Black Leclère

born 05/04/2006 (mother Lady Penelope): Red Lion, Doctor Hobbles the 3rd, Little Iroquois

¶ To humour the old dear, you would like to keep track of the genealogy of her cats, so you can add things like "P.S. I hope Doctor Hobbles the 2nd enjoyed his birthday this Saturday!", or "How is old Lady Penelope doing? She's five years old now, isn't she?", preferably without accidentally asking about dead cats. You are in possession of a large quantity of old e-mails from your aunt, and fortunately she is very consistent in always putting information about the cats' births and deaths at the end of her mails in precisely the same format.

¶ You are hardly inclined to go through all those mails by hand. Fortunately, we were just in need of an example problem, so we will try to work out a program that does the work for us. For a start, we write a program that gives us a list of cats that are still alive after the last e-mail.

¶ Before you ask, at the start of the correspondence, aunt Emily had only a single cat: Spot. (She was still rather conventional in those days.)

¶ It usually pays to have some kind of clue what one's program is going to do before starting to type. Here's a plan:

Start with a set of cat names that has only "Spot" in it.
Go over every e-mail in our archive, in chronological order.
Look for paragraphs that start with "born" or "died".
Add the names from paragraphs that start with "born" to our set of names.
Remove the names from paragraphs that start with "died" from our set.

¶ Where taking the names from a paragraph goes like this:

Find the colon in the paragraph.
Take the part after this colon.
Split this part into separate names by looking for commas.

¶ It may require some suspension of disbelief to accept that aunt Emily always used this exact format, and that she never forgot or misspelled a name, but that is just how your aunt is.

¶ First, let me tell you about properties. A lot of JavaScript values have other values associated with them. These associations are called properties. Every string has a property called length, which refers to a number: the number of characters in that string.

¶ Properties can be accessed in two ways:

var text = "purple haze";
show(text["length"]);
show(text.length);

¶ The second way is a shorthand for the first, and it only works when the name of the property would be a valid variable name ― when it doesn't have any spaces or symbols in it and does not start with a digit character.

¶ The values null and undefined do not have any properties. Trying to read properties from such a value produces an error. Try the following code, if only to get an idea of the kind of error message your browser produces in such a case (which, for some browsers, can be rather cryptic).

var nothing = null;
show(nothing.length);

¶ The properties of a string value can not be changed. There are quite a few more than just length, as we will see, but you are not allowed to add or remove any.

¶ This is different with values of the type object. Their main role is to hold other values. They have, you could say, their own set of tentacles in the form of properties. You are free to modify these, remove them, or add new ones.

¶ An object can be written like this:

var cat = {colour: "grey", name: "Spot", size: 46};
cat.size = 47;
show(cat.size);
delete cat.size;
show(cat.size);
show(cat);

¶ Like variables, each property attached to an object is labelled by a string. The first statement creates an object in which the property "colour" holds the string "grey", the property "name" is attached to the string "Spot", and the property "size" refers to the number 46. The second statement gives the property named size a new value, which is done in the same way as modifying a variable.

¶ The keyword delete cuts off properties. Trying to read a non-existent property gives the value undefined.

¶ If a property that does not yet exist is set with the = operator, it is added to the object.

var empty = {};
empty.notReally = 1000;
show(empty.notReally);

¶ Properties whose names are not valid variable names have to be quoted when creating the object, and approached using brackets:

var thing = {"gabba gabba": "hey", "5": 10};
show(thing["5"]);
thing["5"] = 20;
show(thing[2 + 3]);
delete thing["gabba gabba"];

¶ As you can see, the part between the brackets can be any expression. It is converted to a string to determine the property name it refers to. One can even use variables to name properties:

var propertyName = "length";
var text = "mainline";
show(text[propertyName]);

¶ The operator in can be used to test whether an object has a certain property. It produces a boolean.

var chineseBox = {};
chineseBox.content = chineseBox;
show("content" in chineseBox);
show("content" in chineseBox.content);
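
¶ The in operator asks whether a property exists, which is not quite the same as asking whether reading it gives something other than undefined. A small sketch of the difference, under a few assumptions: the object box and its property names are made up for illustration, and console.log stands in for this page's show so that the code also runs outside of it.

```javascript
// A made-up object with one property that exists but holds undefined.
var box = {lid: undefined};

// Reading gives undefined in both cases...
console.log(box.lid);     // existing property whose value is undefined
console.log(box.handle);  // property that was never set

// ...but `in` tells the two apart.
var hasLid = "lid" in box;
var hasHandle = "handle" in box;
console.log(hasLid);      // true
console.log(hasHandle);   // false

// After delete, the property is really gone.
delete box.lid;
var hasLidAfterDelete = "lid" in box;
console.log(hasLidAfterDelete);  // false
```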

¶ When object values are shown on the console, they can be clicked to inspect their properties. This changes the output window to an 'inspect' window. The little 'x' at the top-right can be used to return to the output window, and the left-arrow can be used to go back to the properties of the previously inspected object.

show(chineseBox);

Ex. 4.1

¶ The solution for the cat problem talks about a 'set' of names. A set is a collection of values in which no value may occur more than once. If names are strings, can you think of a way to use an object to represent a set of names?

¶ Show how a name can be added to this set, how one can be removed, and how you can check whether a name occurs in it.

¶ This can be done by storing the content of the set as the properties of an object. Adding a name is done by setting a property by that name to a value, any value. Removing a name is done by deleting this property. The in operator can be used to determine whether a certain name is part of the set.

var set = {"Spot": true};
// Add "White Fang" to the set
set["White Fang"] = true;
// Remove "Spot"
delete set["Spot"];
// See if "Asoka" is in the set
show("Asoka" in set);

¶ Object values, apparently, can change. The types of values discussed in chapter 2 are all immutable: it is impossible to change an existing value of those types. You can combine them and derive new values from them, but when you take a specific string value, the text inside it can not change. With objects, on the other hand, the content of a value can be modified by changing its properties.

¶ When we have two numbers, 120 and 120, they can for all practical purposes be considered the precise same number. With objects, there is a difference between having two references to the same object and having two different objects that contain the same properties. Consider the following code:

var object1 = {value: 10};
var object2 = object1;
var object3 = {value: 10};

show(object1 == object2);
show(object1 == object3);

object1.value = 15;
show(object2.value);
show(object3.value);

¶ object1 and object2 are two variables grasping the same value. There is only one actual object, which is why changing object1 also changes the value of object2. The variable object3 points to another object, which initially contains the same properties as object1, but lives a separate life.

¶ JavaScript's == operator, when comparing objects, will only return true if both values given to it are the precise same value. Comparing different objects with identical contents will give false. This is useful in some situations, but impractical in others.
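
¶ When comparing by content is what you want, you have to write that comparison yourself. The following is only a sketch: sameProperties is a hypothetical helper, not something JavaScript provides, and it looks just one level deep. It relies on the for/in loop, which is explained later in this chapter, and it uses console.log where this page would use show.

```javascript
// Hypothetical helper: true when a and b have the same property names,
// and each name holds the same value in both. Nested objects are still
// compared by identity, so this is a shallow comparison only.
function sameProperties(a, b) {
  for (var name in a) {
    if (!(name in b) || a[name] !== b[name])
      return false;
  }
  for (var other in b) {
    if (!(other in a))
      return false;
  }
  return true;
}

var object1 = {value: 10};
var object3 = {value: 10};

console.log(object1 == object3);               // false: two different objects
console.log(sameProperties(object1, object3)); // true: same contents
```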

¶ Object values can play a lot of different roles. Behaving like a set is only one of those. We will see a few other roles in this chapter, and chapter 8 shows another important way of using objects.

¶ In the plan for the cat problem ― in fact, call it an algorithm, not a plan, that makes it sound like we know what we are talking about ― in the algorithm, it talks about going over all the e-mails in an archive. What does this archive look like? And where does it come from?

¶ Do not worry about the second question for now. Chapter 14 talks about some ways to import data into your programs, but for now you will find that the e-mails are just magically there. Some magic is really easy, inside computers.

¶ The way in which the archive is stored is still an interesting question. It contains a number of e-mails. An e-mail can be a string, that should be obvious. The whole archive could be put into one huge string, but that is hardly practical. What we want is a collection of separate strings.

¶ Collections of things are what objects are used for. One could make an object like this:

var mailArchive = {"the first e-mail": "Dear nephew, ...",
                   "the second e-mail": "..."
                   /* and so on ... */};

¶ But that makes it hard to go over the e-mails from start to end ― how does the program guess the name of these properties? This can be solved by more predictable property names:

var mailArchive = {0: "Dear nephew, ... (mail number 1)",
                   1: "(mail number 2)",
                   2: "(mail number 3)"};

for (var current = 0; current in mailArchive; current++)
  print("Processing e-mail #", current, ": ", mailArchive[current]);

¶ Luck has it that there is a special kind of object specifically for this kind of use. They are called arrays, and they provide some conveniences, such as a length property that contains the number of values in the array, and a number of operations useful for this kind of collection.

¶ New arrays can be created using brackets ([ and ]):

var mailArchive = ["mail one", "mail two", "mail three"];

for (var current = 0; current < mailArchive.length; current++)
  print("Processing e-mail #", current, ": ", mailArchive[current]);

¶ In this example, the numbers of the elements are not specified explicitly anymore. The first one automatically gets the number 0, the second the number 1, and so on.

¶ Why start at 0? People tend to start counting from 1. As unintuitive as it seems, numbering the elements in a collection from 0 is often more practical. Just go with it for now, it will grow on you.

¶ Starting at element 0 also means that in a collection with X elements, the last element can be found at position X - 1. This is why the for loop in the example checks for current < mailArchive.length. There is no element at position mailArchive.length, so as soon as current has that value, we stop looping.
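
¶ To make the off-by-one relation concrete, here is a short sketch that reads the first element, the last element, and the position just past the end. It reuses the three-mail array from above and uses console.log instead of this page's print.

```javascript
var mailArchive = ["mail one", "mail two", "mail three"];

var first = mailArchive[0];
var last = mailArchive[mailArchive.length - 1];
var pastTheEnd = mailArchive[mailArchive.length];

console.log(first);       // "mail one"
console.log(last);        // "mail three"
console.log(pastTheEnd);  // undefined: there is no element at position 3
```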

Ex. 4.2

¶ Write a function range that takes one argument, a positive number, and returns an array containing all numbers from 0 up to and including the given number.

¶ An empty array can be created by simply typing []. Also remember that adding properties to an object, and thus also to an array, can be done by assigning them a value with the = operator. The length property is automatically updated when elements are added.

function range(upto) {
  var result = [];
  for (var i = 0; i <= upto; i++)
    result[i] = i;
  return result;
}
show(range(4));

¶ Instead of naming the loop variable counter or current, as I have been doing so far, it is now called simply i. Using single letters, usually i, j, or k, for loop variables is a widespread habit among programmers. It has its origin mostly in laziness: We'd rather type one character than seven, and names like counter and current do not really clarify the meaning of the variable much.

¶ If a program uses too many meaningless single-letter variables, it can become unbelievably confusing. In my own programs, I try to only do this in a few common cases. Small loops are one of these cases. If the loop contains another loop, and that one also uses a variable named i, the inner loop will modify the variable that the outer loop is using, and everything will break. One could use j for the inner loop, but in general, when the body of a loop is big, you should come up with a variable name that has some clear meaning.

¶ Both string and array objects contain, in addition to the length property, a number of properties that refer to function values.

var doh = "Doh";
print(typeof doh.toUpperCase);
print(doh.toUpperCase());

¶ Every string has a toUpperCase property. When called, it will return a copy of the string, in which all letters have been converted to uppercase. There is also toLowerCase. Guess what that does.

¶ Notice that, even though the call to toUpperCase does not pass any arguments, the function does somehow have access to the string "Doh", the value of which it is a property. How this works precisely is described in chapter 8.

¶ Properties that contain functions are generally called methods, as in 'toUpperCase is a method of a string object'.

var mack = [];
mack.push("Mack");
mack.push("the");
mack.push("Knife");
show(mack.join(" "));
show(mack.pop());
show(mack);

¶ The method push, which is associated with arrays, can be used to add values to an array. It could have been used in the last exercise, as an alternative to result[i] = i. Then there is pop, the opposite of push: it takes off and returns the last value in the array. join builds a single big string from an array of strings. The parameter it is given is pasted between the values in the array.
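
¶ For comparison, this is roughly what the range function from exercise 4.2 looks like when written with push. It is a sketch of the alternative, not a better solution, and it prints with console.log rather than show.

```javascript
// range from exercise 4.2, using push instead of result[i] = i.
function range(upto) {
  var result = [];
  for (var i = 0; i <= upto; i++)
    result.push(i);  // appends at the end; length is updated automatically
  return result;
}

var numbers = range(4);
console.log(numbers.join(", "));  // 0, 1, 2, 3, 4

var last = numbers.pop();         // takes the 4 off the end and returns it
console.log(last);                // 4
console.log(numbers.length);      // 4, since 0, 1, 2, 3 remain
```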

¶ Coming back to those cats, we now know that an array would be a good way to store the archive of e-mails. On this page, the function retrieveMails can be used to (magically) get hold of this array. Going over them to process them one after another is not rocket science anymore either:

var mailArchive = retrieveMails();

for (var i = 0; i < mailArchive.length; i++) {
  var email = mailArchive[i];
  print("Processing e-mail #", i);
  // Do more things...
}

¶ We have also decided on a way to represent the set of cats that are alive. The next problem, then, is to find the paragraphs in an e-mail that start with "born" or "died".

¶ The first question that comes up is what exactly a paragraph is. In this case, the string value itself can't help us much: JavaScript's concept of text does not go any deeper than the 'sequence of characters' idea, so we must define paragraphs in those terms.

¶ Earlier, we saw that there is such a thing as a newline character. These are what most people use to split paragraphs. We consider a paragraph, then, to be a part of an e-mail that starts at a newline character or at the start of the content, and ends at the next newline character or at the end of the content.

¶ And we don't even have to write the algorithm for splitting a string into paragraphs ourselves. Strings already have a method named split, which is (almost) the opposite of the join method of arrays. It splits a string into an array, using the string given as its argument to determine in which places to cut.

var words = "Cities of the Interior";
show(words.split(" "));

¶ Thus, cutting on newlines ("\n") can be used to split an e-mail into paragraphs.
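
¶ A sketch of what that looks like on a shortened e-mail. The text below is a cut-down version of the example mail, assembled by hand for illustration, and console.log replaces show. Note that the blank lines between paragraphs come out as empty strings.

```javascript
// A shortened stand-in for one of Aunt Emily's e-mails.
var email = "Dear nephew,\n" +
            "\n" +
            "Much love, Aunt Emily\n" +
            "\n" +
            "died 27/04/2006: Black Leclère";

var paragraphs = email.split("\n");

console.log(paragraphs.length);  // 5
for (var i = 0; i < paragraphs.length; i++)
  console.log(i + ": '" + paragraphs[i] + "'");
```

¶ The empty strings are harmless for our purpose, since they start with neither "born" nor "died".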

Ex. 4.3

¶ split and join are not precisely each other's inverse. string.split(x).join(x) always produces the original value, but array.join(x).split(x) does not. Can you give an example of an array where .join(" ").split(" ") produces a different value?

var array = ["a", "b", "c d"];
show(array.join(" ").split(" "));

¶ Paragraphs that do not start with either "born" or "died" can be ignored by the program. How do we test whether a string starts with a certain word? The method charAt can be used to get a specific character from a string. x.charAt(0) gives the first character, x.charAt(1) the second one, and so on. One way to check whether a string starts with "born" is:

var paragraph = "born 15/11/2003 (mother Spot): White Fang";
show(paragraph.charAt(0) == "b" && paragraph.charAt(1) == "o" &&
     paragraph.charAt(2) == "r" && paragraph.charAt(3) == "n");

¶ But that gets a bit clumsy ― imagine checking for a word of ten characters. There is something to be learned here though: when a line gets ridiculously long, it can be spread over multiple lines. The result can be made easier to read by lining up the start of the new line with the first element on the original line that plays a similar role.

¶ Strings also have a method called slice. It copies out a piece of the string, starting from the character at the position given by the first argument, and ending before (not including) the character at the position given by the second one. This allows the check to be written in a shorter way.

show(paragraph.slice(0, 4) == "born");

Ex. 4.4

¶ Write a function called startsWith that takes two arguments, both strings. It returns true when the first argument starts with the characters in the second argument, and false otherwise.

function startsWith(string, pattern) {
  return string.slice(0, pattern.length) == pattern;
}

show(startsWith("rotation", "rot"));

¶ What happens when charAt or slice is used to take a piece of a string that does not exist? Will the startsWith I showed still work when the pattern is longer than the string it is matched against?

show("Pip".charAt(250));
show("Nop".slice(1, 10));

¶ charAt will return "" when there is no character at the given position, and slice will simply leave out the part of the new string that does not exist.

¶ So yes, that version of startsWith works. When startsWith("Idiots", "Most honoured colleagues") is called, the call to slice will, because string does not have enough characters, always return a string that is shorter than pattern. Because of that, the comparison with == will return false, which is correct.

¶ It helps to always take a moment to consider abnormal (but valid) inputs for a program. These are usually called corner cases, and it is very common for programs that work perfectly on all the 'normal' inputs to screw up on corner cases.
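
¶ As an illustration, here are a few corner cases for startsWith, tried out one by one. The function is copied from the exercise above so that the snippet stands on its own. The inputs are made up, and console.log is used instead of show.

```javascript
// startsWith as defined in exercise 4.4.
function startsWith(string, pattern) {
  return string.slice(0, pattern.length) == pattern;
}

console.log(startsWith("born 05/04/2006", "born"));  // true: the normal case
console.log(startsWith("bo", "born"));    // false: string shorter than pattern
console.log(startsWith("", "born"));      // false: empty string
console.log(startsWith("born", ""));      // true: every string starts with ""
console.log(startsWith("Born", "born"));  // false: case matters
```

¶ Whether the last two results are what you want depends on the problem. For Aunt Emily's very consistent e-mails they do no harm.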

¶ The only part of the cat-problem that is still unsolved is the extraction of names from a paragraph. The algorithm was this:

Find the colon in the paragraph.
Take the part after this colon.
Split this part into separate names by looking for commas.

¶ This has to happen both for paragraphs that start with "died", and paragraphs that start with "born". It would be a good idea to put it into a function, so that the two pieces of code that handle these different kinds of paragraphs can both use it.

Ex. 4.5

¶ Can you write a function catNames that takes a paragraph as an argument and returns an array of names?

¶ Strings have an indexOf method that can be used to find the (first) position of a character or sub-string within that string. Also, when slice is given only one argument, it will return the part of the string from the given position all the way to the end.

¶ It can be helpful to use the console to 'explore' functions. For example, type "foo: bar".indexOf(":") and see what you get.

function catNames(paragraph) {
  var colon = paragraph.indexOf(":");
  return paragraph.slice(colon + 2).split(", ");
}

show(catNames("born 20/09/2004 (mother Yellow Bess): " +
              "Doctor Hobbles the 2nd, Noog"));

¶ The tricky part, which the original description of the algorithm ignored, is dealing with spaces after the colon and the commas. The + 2 used when slicing the string is needed to leave out the colon itself and the space after it. The argument to split contains both a comma and a space, because that is what the names are really separated by, rather than just a comma.

¶ This function does not do any checking for problems. We assume, in this case, that the input is always correct.
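
¶ To see what that assumption buys us, consider a paragraph without a colon. The sketch below shows what catNames makes of it, next to a hypothetical variant, catNamesChecked, that returns an empty array instead. The variant is only one possible way to handle the problem, and console.log is used in place of show.

```javascript
// catNames as defined in exercise 4.5.
function catNames(paragraph) {
  var colon = paragraph.indexOf(":");
  return paragraph.slice(colon + 2).split(", ");
}

// Hypothetical variant: gives up politely when there is no colon.
function catNamesChecked(paragraph) {
  var colon = paragraph.indexOf(":");
  if (colon == -1)
    return [];
  return paragraph.slice(colon + 2).split(", ");
}

// Without a colon, indexOf returns -1, so slice starts at position 1
// and the 'names' are just pieces of the paragraph itself.
var garbage = catNames("born somewhere, sometime");
console.log(garbage);  // ["orn somewhere", "sometime"]

var nothing = catNamesChecked("born somewhere, sometime");
console.log(nothing.length);  // 0
```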

¶ All that remains now is putting the pieces together. One way to do that looks like this:

var mailArchive = retrieveMails();
var livingCats = {"Spot": true};

for (var mail = 0; mail < mailArchive.length; mail++) {
  var paragraphs = mailArchive[mail].split("\n");
  for (var paragraph = 0;
       paragraph < paragraphs.length;
       paragraph++) {
    if (startsWith(paragraphs[paragraph], "born")) {
      var names = catNames(paragraphs[paragraph]);
      for (var name = 0; name < names.length; name++)
        livingCats[names[name]] = true;
    }
    else if (startsWith(paragraphs[paragraph], "died")) {
      var names = catNames(paragraphs[paragraph]);
      for (var name = 0; name < names.length; name++)
        delete livingCats[names[name]];
    }
  }
}

show(livingCats);

¶ That is quite a big dense chunk of code. We'll look into making it a bit lighter in a moment. But first let us look at our results. We know how to check whether a specific cat survives:

if ("Spot" in livingCats)
  print("Spot lives!");
else
  print("Good old Spot, may she rest in peace.");

¶ But how do we list all the cats that are alive? The in keyword has a somewhat different meaning when it is used together with for:

for (var cat in livingCats)
  print(cat);

¶ A loop like that will go over the names of the properties in an object, which allows us to enumerate all the names in our set.
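
¶ The same loop can be used to collect the names into an array, which is handy when you want to count them or print them on one line. A sketch, assuming a small hand-made set instead of the real result, and console.log instead of print:

```javascript
// A hand-made set, standing in for the real livingCats.
var livingCats = {"Spot": true, "White Fang": true, "Noog": true};

var names = [];
for (var cat in livingCats)
  names.push(cat);  // cat is the property name, a string

console.log(names.length + " cats: " + names.join(", "));
```

¶ The names usually come out in the order in which they were added, but it is better not to build a program that depends on that.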

¶ Some pieces of code look like an impenetrable jungle. The example solution to the cat problem suffers from this. One way to make some light shine through it is to just add some strategic blank lines. This makes it look better, but doesn't really solve the problem.

¶ What is needed here is to break the code up. We already wrote two helper functions, startsWith and catNames, which both take care of a small, understandable part of the problem. Let us continue doing this.

function addToSet(set, values) {
  for (var i = 0; i < values.length; i++)
    set[values[i]] = true;
}

function removeFromSet(set, values) {
  for (var i = 0; i < values.length; i++)
    delete set[values[i]];
}

¶ These two functions take care of the adding and removing of names from the set. That already cuts out the two innermost loops from the solution:

var livingCats = {Spot: true};

for (var mail = 0; mail < mailArchive.length; mail++) {
  var paragraphs = mailArchive[mail].split("\n");
  for (var paragraph = 0;
       paragraph < paragraphs.length;
       paragraph++) {
    if (startsWith(paragraphs[paragraph], "born"))
      addToSet(livingCats, catNames(paragraphs[paragraph]));
    else if (startsWith(paragraphs[paragraph], "died"))
      removeFromSet(livingCats, catNames(paragraphs[paragraph]));
  }
}

¶ Quite an improvement, if I may say so myself.

¶ Why do addToSet and removeFromSet take the set as an argument? They could use the variable livingCats directly, if they wanted to. The reason is that this way they are not completely tied to our current problem. If addToSet directly changed livingCats, it would have to be called addCatsToCatSet, or something similar. The way it is now, it is a more generally useful tool.

¶ Even if we are never going to use these functions for anything else, which is quite probable, it is useful to write them like this. Because they are 'self sufficient', they can be read and understood on their own, without needing to know about some external variable called livingCats.

¶ The functions are not pure: They change the object passed as their set argument. This makes them slightly trickier than real pure functions, but still a lot less confusing than functions that run amok and change any value or variable they please.
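
¶ To show the difference, here is a sketch that puts addToSet next to a hypothetical pure counterpart, withAdded, which copies the set and leaves the original alone. The name withAdded is made up, and the chapter does not use it anywhere. console.log replaces show.

```javascript
// addToSet as defined above: changes the set it is given.
function addToSet(set, values) {
  for (var i = 0; i < values.length; i++)
    set[values[i]] = true;
}

// Hypothetical pure variant: builds and returns a new set.
function withAdded(set, values) {
  var copy = {};
  for (var name in set)
    copy[name] = set[name];
  for (var i = 0; i < values.length; i++)
    copy[values[i]] = true;
  return copy;
}

var original = {"Spot": true};
var extended = withAdded(original, ["Noog"]);
console.log("Noog" in original);  // false: the original is untouched
console.log("Noog" in extended);  // true

addToSet(original, ["Noog"]);
console.log("Noog" in original);  // true: the original itself was changed
```

¶ The pure version costs a copy on every call, which is one reason the impure one is a reasonable choice for this program.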

¶ We continue breaking the algorithm into pieces:

function findLivingCats() {
  var mailArchive = retrieveMails();
  var livingCats = {"Spot": true};

  function handleParagraph(paragraph) {
    if (startsWith(paragraph, "born"))
      addToSet(livingCats, catNames(paragraph));
    else if (startsWith(paragraph, "died"))
      removeFromSet(livingCats, catNames(paragraph));
  }

  for (var mail = 0; mail < mailArchive.length; mail++) {
    var paragraphs = mailArchive[mail].split("\n");
    for (var i = 0; i < paragraphs.length; i++)
      handleParagraph(paragraphs[i]);
  }
  return livingCats;
}

var howMany = 0;
for (var cat in findLivingCats())
  howMany++;
print("There are ", howMany, " cats.");

¶ The whole algorithm is now encapsulated by a function. This means that it does not leave a mess after it runs: livingCats is now a local variable in the function, instead of a top-level one, so it only exists while the function runs. The code that needs this set can call findLivingCats and use the value it returns.

¶ It seemed to me that making handleParagraph a separate function also cleared things up. But this one is so closely tied to the cat-algorithm that it is meaningless in any other situation. On top of that, it needs access to the livingCats variable. Thus, it is a perfect candidate to be a function-inside-a-function. When it lives inside findLivingCats, it is clear that it is only relevant there, and it has access to the variables of its parent function.

¶ This solution is actually bigger than the previous one. Still, it is tidier and I hope you'll agree that it is easier to read.

¶ The program still ignores a lot of the information that is contained in the e-mails. There are birth-dates, dates of death, and the names of mothers in there.

¶ To start with the dates: What would be a good way to store a date? We could make an object with three properties, year, month, and day, and store numbers in them.

var when = {year: 1980, month: 2, day: 1};

¶ But JavaScript already provides a kind of object for this purpose. Such an object can be created by using the keyword new:

var when = new Date(1980, 1, 1);
show(when);

¶ Just like the notation with braces and colons we have already seen, new is a way to create object values. Instead of specifying all the property names and values, a function is used to build up the object. This makes it possible to define a kind of standard procedure for creating objects. Functions like this are called constructors, and in chapter 8 we will see how to write them.

¶ The Date constructor can be used in different ways.

show(new Date());
show(new Date(1980, 1, 1));
show(new Date(2007, 2, 30, 8, 20, 30));

¶ As you can see, these objects can store a time of day as well as a date. When not given any arguments, an object representing the current time and date is created. Arguments can be given to ask for a specific date and time. The order of the arguments is year, month, day, hour, minute, second, milliseconds. These last four are optional; they become 0 when not given.

¶ The month numbers these objects use go from 0 to 11, which can be confusing. Especially since day numbers do start from 1.
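
¶ A short sketch of what this means in practice, using the date from Aunt Emily's "died 27/04/2006" paragraph. The helper humanMonth is a made-up name for illustration, and console.log stands in for show.

```javascript
// 27/04/2006 is the 27th of April: month 4 for people, month 3 for Date.
var death = new Date(2006, 3, 27);

console.log(death.getFullYear());  // 2006
console.log(death.getMonth());     // 3, meaning April
console.log(death.getDate());      // 27, days are counted from 1

// Hypothetical helper that converts back to the human way of counting.
function humanMonth(date) {
  return date.getMonth() + 1;
}

console.log(humanMonth(death));    // 4
```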

¶ The content of a Date object can be inspected with a number of get... methods.

var today = new Date();
print("Year: ", today.getFullYear(), ", month: ",
      today.getMonth(), ", day: ", today.getDate());
print("Hour: ", today.getHours(), ", minutes: ",
      today.getMinutes(), ", seconds: ", today.getSeconds());
print("Day of week: ", today.getDay());

¶ All of these, except for getDay, also have a set... variant that can be used to change the value of the date object.
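
¶ A sketch of the set... methods at work. Note that they change the existing date object rather than producing a new one. The dates are arbitrary, and console.log is used instead of print.

```javascript
var when = new Date(1980, 1, 1);  // 1 February 1980

when.setFullYear(1981);
when.setMonth(11);                // December, since months count from 0
when.setDate(24);

console.log(when.getFullYear(), when.getMonth(), when.getDate());  // 1981 11 24
```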

¶ Inside the object, a date is represented by the number of milliseconds it is away from the start of January 1st 1970 (UTC). You can imagine this is quite a large number.

var today = new Date();
show(today.getTime());
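
¶ Because getTime gives plain numbers, it can be used for arithmetic on dates. The sketch below counts the days between two dates taken from the example e-mail. The variable names are made up, and the rounding is there as a precaution against the hour that daylight saving time can add or remove. console.log replaces show.

```javascript
// The starting point itself is millisecond number zero.
var epoch = new Date(0);
console.log(epoch.getTime());  // 0

var earlier = new Date(2006, 3, 5);   // 05/04/2006
var later = new Date(2006, 3, 27);    // 27/04/2006

var msPerDay = 24 * 60 * 60 * 1000;
var daysApart = Math.round((later.getTime() - earlier.getTime()) / msPerDay);
console.log(daysApart);  // 22
```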

¶ A very useful thing to do with dates is comparing them.

var wallFall = new Date(1989, 10, 9);   // 9 November 1989
var gulfWarOne = new Date(1990, 7, 2);  // 2 August 1990 (months count from 0)
show(wallFall < gulfWarOne);
show(wallFall == wallFall);
// but
show(wallFall == new Date(1989, 10, 9));

¶ Comparing dates with <, >, <=, and >= does exactly what you would expect. When a date object is compared to itself with == the result is true, which is also good. But when == is used to compare a date object to a different, equal date object, we get false. Huh?

¶ As mentioned earlier, == will return false when comparing two different objects, even if they contain the same properties. This is a bit clumsy and error-prone here, since one would expect >= and == to behave in a more or less similar way. Testing whether two dates are equal can be done like this:

var wallFall1 = new Date(1989, 10, 9),
    wallFall2 = new Date(1989, 10, 9);
show(wallFall1.getTime() == wallFall2.getTime());

¶ In addition to a date and time, Date objects also contain information about a time zone. When it is one o'clock in Amsterdam, it can, depending on the time of year, be noon in London, and seven in the morning in New York. Such times can only be compared when you take their time zones into account. The getTimezoneOffset method of a Date can be used to find out how many minutes it differs from GMT (Greenwich Mean Time).

var now = new Date();
print(now.getTimezoneOffset());

Ex. 4.6

"died 27/04/2006: Black Leclère"

¶ The date part is always in exactly the same place in a paragraph. How convenient. Write a function extractDate that takes such a paragraph as its argument, extracts the date, and returns it as a date object.

function extractDate(paragraph) {
  function numberAt(start, length) {
    return Number(paragraph.slice(start, start + length));
  }
  return new Date(numberAt(11, 4), numberAt(8, 2) - 1,
                  numberAt(5, 2));
}

show(extractDate("died 27/04/2006: Black Leclère"));

¶ It would work without the calls to Number, but as mentioned earlier, I prefer not to use strings as if they are numbers. The inner function was introduced to prevent having to repeat the Number and slice part three times.

¶ Note the - 1 for the month number. Like most people, Aunt Emily counts her months from 1, so we have to adjust the value before giving it to the Date constructor. (The day number does not have this problem, since Date objects count days in the usual human way.)

¶ In chapter 10 we will see a more practical and robust way of extracting pieces from strings that have a fixed structure.

¶ Storing cats will work differently from now on. Instead of just putting the value true into the set, we store an object with information about the cat. When a cat dies, we do not remove it from the set. We just add a property death to the object to store the date on which the creature died.

¶ This means our addToSet and removeFromSet functions have become useless. Something similar is needed, but it must also store birth-dates and, later, the mother's name.

function catRecord(name, birthdate, mother) {
  return {name: name, birth: birthdate, mother: mother};
}

function addCats(set, names, birthdate, mother) {
  for (var i = 0; i < names.length; i++)
    set[names[i]] = catRecord(names[i], birthdate, mother);
}

function deadCats(set, names, deathdate) {
  for (var i = 0; i < names.length; i++)
    set[names[i]].death = deathdate;
}

¶ catRecord is a separate function for creating these storage objects. It might be useful in other situations, such as creating the object for Spot. 'Record' is a term often used for objects like this, which are used to group a limited number of values.
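
¶ A sketch of how these three functions work together. The birth comes from the example e-mail, but the date of death is made up purely for illustration, as is the helper isAlive. The functions are copied from above so the snippet runs on its own, with console.log instead of show.

```javascript
function catRecord(name, birthdate, mother) {
  return {name: name, birth: birthdate, mother: mother};
}

function addCats(set, names, birthdate, mother) {
  for (var i = 0; i < names.length; i++)
    set[names[i]] = catRecord(names[i], birthdate, mother);
}

function deadCats(set, names, deathdate) {
  for (var i = 0; i < names.length; i++)
    set[names[i]].death = deathdate;
}

// Hypothetical helper: a cat is alive when it is known and has no death date.
function isAlive(set, name) {
  return name in set && !("death" in set[name]);
}

var cats = {};
addCats(cats, ["Red Lion", "Little Iroquois"],
        new Date(2006, 3, 5), "Lady Penelope");
deadCats(cats, ["Red Lion"], new Date(2006, 5, 1));  // made-up date

console.log(cats["Red Lion"].mother);           // "Lady Penelope"
console.log(isAlive(cats, "Red Lion"));         // false
console.log(isAlive(cats, "Little Iroquois"));  // true
```

¶ Note that deadCats assumes every name it is given is already in the set. Handing it an unknown name produces an error, because undefined has no properties.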

¶ So let us try to extract the names of the mother cats from the paragraphs.

"born 15/11/2003 (mother Spot): White Fang"

¶ One way to do this would be...

function extractMother(paragraph) {
  var start = paragraph.indexOf("(mother ") + "(mother ".length;
  var end = paragraph.indexOf(")");
  return paragraph.slice(start, end);
}

show(extractMother("born 15/11/2003 (mother Spot): White Fang"));

¶ Notice how the start position has to be adjusted for the length of "(mother ", because indexOf returns the position of the start of the pattern, not its end.
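
¶ Printing the intermediate positions makes the adjustment visible. This is just a sketch of the same computation, using console.log where this book would use show.

```javascript
var paragraph = "born 15/11/2003 (mother Spot): White Fang";

var patternAt = paragraph.indexOf("(mother ");
console.log(patternAt);                        // → 16, where the "(" sits

var start = patternAt + "(mother ".length;
console.log(start);                            // → 24, where "Spot" begins

var end = paragraph.indexOf(")");
console.log(end);                              // → 28

// Without the adjustment, the pattern itself ends up in the result.
console.log(paragraph.slice(patternAt, end));  // → (mother Spot
console.log(paragraph.slice(start, end));      // → Spot
```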

Ex. 4.7

¶ The thing that extractMother does can be expressed in a more general way. Write a function between that takes three arguments, all of which are strings. It will return the part of the first argument that occurs between the patterns given by the second and the third arguments.

¶ So between("born 15/11/2003 (mother Spot): White Fang", "(mother ", ")") gives "Spot".

¶ between("bu ] boo [ bah ] gzz", "[ ", " ]") returns "bah".

¶ To make that second test work, it can be useful to know that indexOf can be given a second, optional parameter that specifies at which point it should start searching.

function between(string, start, end) {
  var startAt = string.indexOf(start) + start.length;
  var endAt = string.indexOf(end, startAt);
  return string.slice(startAt, endAt);
}
show(between("bu ] boo [ bah ] gzz", "[ ", " ]"));
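
¶ To see what the second argument to indexOf buys us, the sketch below runs both variants on the test string. The variable names are mine, and console.log is assumed as a stand-in for show.

```javascript
var text = "bu ] boo [ bah ] gzz";

// Searching from the beginning finds the stray " ]" near the start.
var naiveEnd = text.indexOf(" ]");
console.log(naiveEnd);                    // → 2

var startAt = text.indexOf("[ ") + "[ ".length;
console.log(startAt);                     // → 11

// Searching from startAt skips everything before the opening pattern.
var properEnd = text.indexOf(" ]", startAt);
console.log(properEnd);                   // → 14

// slice returns an empty string when the end lies before the start.
console.log(text.slice(startAt, naiveEnd) === "");  // → true
console.log(text.slice(startAt, properEnd));        // → bah
```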

¶ Having between makes it possible to express extractMother in a simpler way:

function extractMother(paragraph) {
  return between(paragraph, "(mother ", ")");
}

¶ The new, improved cat-algorithm looks like this:

function findCats() {
  var mailArchive = retrieveMails();
  var cats = {"Spot": catRecord("Spot", new Date(1997, 2, 5),
              "unknown")};

  function handleParagraph(paragraph) {
    if (startsWith(paragraph, "born"))
      addCats(cats, catNames(paragraph), extractDate(paragraph),
              extractMother(paragraph));
    else if (startsWith(paragraph, "died"))
      deadCats(cats, catNames(paragraph), extractDate(paragraph));
  }

  for (var mail = 0; mail < mailArchive.length; mail++) {
    var paragraphs = mailArchive[mail].split("\n");
    for (var i = 0; i < paragraphs.length; i++)
      handleParagraph(paragraphs[i]);
  }
  return cats;
}

var catData = findCats();

¶ Having that extra data allows us to finally have a clue about the cats Aunt Emily talks about. A function like this could be useful:

function formatDate(date) {
  return date.getDate() + "/" + (date.getMonth() + 1) +
         "/" + date.getFullYear();
}

function catInfo(data, name) {
  if (!(name in data))
    return "No cat by the name of " + name + " is known.";

  var cat = data[name];
  var message = name + ", born " + formatDate(cat.birth) +
                " from mother " + cat.mother;
  if ("death" in cat)
    message += ", died " + formatDate(cat.death);
  return message + ".";
}

print(catInfo(catData, "Fat Igor"));

¶ The first return statement in catInfo is used as an escape hatch. If there is no data about the given cat, the rest of the function is meaningless, so we immediately return a value, which prevents the rest of the code from running.

¶ In the past, certain groups of programmers considered functions that contain multiple return statements sinful. The idea was that this made it hard to see which code was executed and which code was not. Other techniques, which will be discussed in chapter 5, have made the reasons behind this idea more or less obsolete, but you might still occasionally come across someone who will criticise the use of 'shortcut' return statements.

Ex. 4.8

¶ The formatDate function used by catInfo does not add a zero before the month and the day part when these are only one digit long. Write a new version that does this.

function formatDate(date) {
  function pad(number) {
    if (number < 10)
      return "0" + number;
    else
      return number;
  }
  return pad(date.getDate()) + "/" + pad(date.getMonth() + 1) +
             "/" + date.getFullYear();
}
print(formatDate(new Date(2000, 0, 1)));

Ex. 4.9

¶ Write a function oldestCat which, given an object containing cats as its argument, returns the name of the oldest living cat.

function oldestCat(data) {
  var oldest = null;

  for (var name in data) {
    var cat = data[name];
    if (!("death" in cat) &&
        (oldest == null || oldest.birth > cat.birth))
      oldest = cat;
  }

  if (oldest == null)
    return null;
  else
    return oldest.name;
}

print(oldestCat(catData));

¶ The condition in the if statement might seem a little intimidating. It can be read as 'only store the current cat in the variable oldest if it is not dead, and oldest is either null or a cat that was born after the current cat'.

¶ Note that this function returns null when there are no living cats in data. What does your solution do in that case?
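
¶ The comparison oldest.birth > cat.birth works because < and > convert Date objects to numbers (the number of milliseconds since the start of 1970) before comparing them. The == operator does no such thing for objects, as this small sketch shows. The variable names are made up, and console.log stands in for print.

```javascript
var older = new Date(1997, 2, 5);
var younger = new Date(2003, 10, 15);

console.log(older < younger);   // → true
console.log(older > younger);   // → false

// == compares object identity, so two separate Date objects
// for the same day are not considered equal.
var sameDay = new Date(1997, 2, 5);
console.log(older == sameDay);                      // → false
console.log(older.getTime() == sameDay.getTime());  // → true
```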

¶ Now that we are familiar with arrays, I can show you something related. Whenever a function is called, a special variable named arguments is added to the environment in which the function body runs. This variable refers to an object that resembles an array. It has a property 0 for the first argument, 1 for the second, and so on for every argument the function was given. It also has a length property.

¶ This object is not a real array though. It does not have methods like push, and it does not automatically update its length property when you add something to it. Why not, I never really found out, but this is something one needs to be aware of.

function argumentCounter() {
  print("You gave me ", arguments.length, " arguments.");
}
argumentCounter("Death", "Famine", "Pestilence");
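
¶ The ways in which arguments falls short of a real array can be observed with a sketch like the following. describeArguments is a hypothetical helper written only for this illustration, and console.log replaces this book's print.

```javascript
function describeArguments() {
  var hasPush = typeof arguments.push == "function";

  // Adding a value past the end does not change length.
  arguments[arguments.length] = "extra";
  var lengthAfterAdding = arguments.length;

  // Copying the values into a real array gives us push and friends.
  var copy = [];
  for (var i = 0; i < arguments.length; i++)
    copy.push(arguments[i]);

  return {hasPush: hasPush, lengthAfterAdding: lengthAfterAdding, copy: copy};
}

var info = describeArguments("Death", "Famine", "Pestilence");
console.log(info.hasPush);            // → false
console.log(info.lengthAfterAdding);  // → 3
console.log(info.copy.length);        // → 3
info.copy.push("War");
console.log(info.copy.length);        // → 4
```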

¶ Some functions can take any number of arguments, like print does. These typically loop over the values in the arguments object to do something with them. Others can take optional arguments which, when not given by the caller, get some sensible default value.

function add(number, howmuch) {
  if (arguments.length < 2)
    howmuch = 1;
  return number + howmuch;
}

show(add(6));
show(add(6, 4));
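
¶ The other kind of function, one that accepts any number of arguments, could be sketched as follows. largest is a hypothetical example of my own, not something defined elsewhere in this book, and console.log is used where the book would use show.

```javascript
// Returns the largest of however many numbers it is given.
function largest() {
  var result = -Infinity;
  for (var i = 0; i < arguments.length; i++) {
    if (arguments[i] > result)
      result = arguments[i];
  }
  return result;
}

console.log(largest(3, 9, 4));  // → 9
console.log(largest(7));        // → 7
console.log(largest());         // → -Infinity
```

¶ Starting from -Infinity means that the first argument, whatever it is, always replaces the initial value. With no arguments at all, the loop never runs and the starting value is returned.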

Ex. 4.10

¶ Extend the range function from exercise 4.2 to take a second, optional argument. If only one argument is given, it behaves as earlier and produces a range from 0 to the given number. If two arguments are given, the first indicates the start of the range, the second the end.

function range(start, end) {
  if (arguments.length < 2) {
    end = start;
    start = 0;
  }
  var result = [];
  for (var i = start; i <= end; i++)
    result.push(i);
  return result;
}

show(range(4));
show(range(2, 4));

¶ The optional argument does not work precisely like the one in the add example above. When it is not given, the first argument takes the role of end, and start becomes 0.

Ex. 4.11

¶ You may remember this line of code from the introduction:

print(sum(range(1, 10)));

¶ We have range now. All we need to make this line work is a sum function. This function takes an array of numbers and returns their sum. Write it; it should be easy.

function sum(numbers) {
  var total = 0;
  for (var i = 0; i < numbers.length; i++)
    total += numbers[i];
  return total;
}

print(sum(range(1, 10)));

¶ Chapter 2 mentioned the functions Math.max and Math.min. With what you know now, you will notice that these are really the properties max and min of the object stored under the name Math. This is another role that objects can play: A warehouse holding a number of related values.

¶ There are quite a lot of values inside Math. If they had all been placed directly into the global environment they would, as it is called, pollute it. The more names have been taken, the more likely one is to accidentally overwrite the value of some variable. For example, it is not far-fetched to want to name something max.

¶ Most languages will stop you, or at least warn you, when you are defining a variable with a name that is already taken. Not JavaScript.
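
¶ A minimal sketch of this silence, using a variable called max as in the example above. console.log is assumed in place of this book's show.

```javascript
var max = 10;

// A second definition under the same name is accepted without complaint,
// and the old value is simply gone.
var max = "no longer a number";
console.log(max);              // → no longer a number

// The function tucked away inside Math is not affected.
console.log(Math.max(1, 2));   // → 2
```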

¶ In any case, one can find a whole outfit of mathematical functions and constants inside Math. All the trigonometric functions are there: cos, sin, tan, acos, asin, atan. So are π and e, which are written in all capital letters (PI and E), which was, at one time, a fashionable way to indicate that something is a constant. pow is a good replacement for the power functions we have been writing; it also accepts negative and fractional exponents. sqrt takes square roots. max and min give the maximum or minimum of two or more values. round, floor, and ceil will round numbers to the closest whole number, the whole number below it, and the whole number above it respectively.
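
¶ A few of these in action, as a quick sketch. The input values are arbitrary, and console.log stands in for this book's show.

```javascript
console.log(Math.pow(2, 10));   // → 1024
console.log(Math.pow(2, -1));   // → 0.5
console.log(Math.sqrt(16));     // → 4

console.log(Math.max(3, 8));    // → 8
console.log(Math.min(3, 8));    // → 3

console.log(Math.round(2.6));   // → 3
console.log(Math.floor(2.6));   // → 2
console.log(Math.ceil(2.1));    // → 3
console.log(Math.floor(-2.5));  // → -3, the whole number below, not toward zero

console.log(Math.PI > 3.14 && Math.PI < 3.15);  // → true
```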

¶ There are a number of other values in Math, but this text is an introduction, not a reference. References are what you look at when you suspect something exists in the language, but need to find out what it is called or how it works exactly. Unfortunately, there is no single comprehensive reference for JavaScript. This is mostly because its current form is the result of a chaotic process of different browsers adding different extensions at different times. The ECMA standard document that was mentioned in the introduction provides solid documentation of the basic language, but is more or less unreadable. For most things, your best bet is the Mozilla Developer Network.

¶ Maybe you already thought of a way to find out what is available in the Math object:

for (var name in Math)
  print(name);

¶ But alas, nothing appears. Similarly, when you do this:

for (var name in ["Huey", "Dewey", "Louie"])
  print(name);

¶ You only see 0, 1, and 2, not length, or push, or join, which are definitely also in there. Apparently, some properties of objects are hidden. There is a good reason for this: All objects have a few methods, for example toString, which converts the object into some kind of relevant string, and you do not want to see those when you are, for example, looking for the cats that you stored in the object.
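
¶ That the hidden properties really are there can be confirmed with the in operator, which does see them. The following is a sketch with variable names of my own choosing, and console.log instead of this book's print.

```javascript
var ducks = ["Huey", "Dewey", "Louie"];

// Collect what a for/in loop gets to see.
var visible = [];
for (var property in ducks)
  visible.push(property);
console.log(visible.join(", "));   // → 0, 1, 2

// The in operator also finds the hidden properties.
console.log("length" in ducks);    // → true
console.log("push" in ducks);      // → true
console.log("max" in Math);        // → true
```

¶ Note that the names produced by the loop are strings, even for an array.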

¶ Why the properties of Math are hidden is unclear to me. Someone probably wanted it to be a mysterious kind of object.

¶ All properties your programs add to objects in the ordinary way are visible. The JavaScript described in this book has no way to make them hidden (ECMAScript 5 later added Object.defineProperty, which can do this), which is unfortunate because, as we will see in chapter 8, it would be nice to be able to add methods to objects without having them show up in our for/in loops.

¶ Some properties are read-only: you can get their value but not change it. For example, the properties of a string value are all read-only.
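
¶ A short sketch of what happens when you try anyway. The try block is there because code running in strict mode raises an error on such an assignment, while ordinary code silently ignores it. console.log is assumed in place of this book's show.

```javascript
var text = "cat";

try {
  text.length = 10;   // ignored, or a TypeError in strict mode
} catch (e) {
  // Strict-mode code ends up here instead of ignoring the assignment.
}

console.log(text.length);  // → 3
console.log(text);         // → cat
```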

¶ Other properties can be 'active'. Changing them causes things to happen. For example, lowering the length of an array causes excess elements to be discarded:

var array = ["Heaven", "Earth", "Man"];
array.length = 2;
show(array);

There are a few subtle problems with this approach, which will be discussed and solved in chapter 8. For this chapter, it works well enough.