Chapter 20
Node.js

A student asked ‘The programmers of old used only simple machines and no programming languages, yet they made beautiful programs. Why do we use complicated machines and programming languages?’. Fu-Tzu replied ‘The builders of old used only sticks and clay, yet they made beautiful huts.’

Master Yuan-Ma, The Book of Programming

So far, we have learned the JavaScript language and used it within a single environment: the browser. The following two chapters will briefly introduce you to Node.js, a program that lets you apply your JavaScript skills outside of the browser, so that you can build anything from simple command-line tools to dynamic HTTP servers.

These chapters aim to teach you the important ideas that Node.js builds on and give you enough information to write some useful programs for it. They do not try to be a complete, or even a thorough, treatment of Node.

If you want to follow along and run the code in this chapter, start by going to nodejs.org, and following the installation instructions for your operating system. Also refer to that website for further documentation on Node and its built-in modules.

Whereas you could run the code in previous chapters directly on these pages, since it was either raw JavaScript or written for the browser, the code samples in this chapter are written for Node, and won’t run in the browser.

Background

One of the more difficult problems in writing systems that communicate over the network is managing in- and output—that is, the reading and writing of data to and from the network, the hard disk, and other such devices. Such transfers tend to take time, and scheduling them cleverly can make a huge difference in how quickly a system can respond to requests.

The traditional way to do in- and output is to have a function, for example readFile, start reading the file and return only when the file has been fully read. This is called synchronous I/O (I/O stands for input/output).

Node was initially conceived for the purpose of making asynchronous I/O easy and convenient. We have seen that a browser’s XMLHttpRequest interface, discussed in Chapter 17, supports both a synchronous mode, where the script stops until the result comes in, and an asynchronous mode, where the script continues running while the request is in progress, and a function is called later, when the request finishes.

That latter model, in which the program can continue to do other things while the operation is in progress, is the way Node does all its I/O. The interfaces it provides for this are based around callback functions, just like XMLHttpRequest.
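
To make the contrast concrete, here is a small sketch using Node’s file-reading functions (both are discussed later in this chapter); it assumes a file.txt exists in the current directory:

var fs = require("fs");

// Synchronous: the program stops at this line until the
// whole file has been read.
var content = fs.readFileSync("file.txt", "utf8");
console.log(content);

// Asynchronous: readFile returns immediately, and the
// callback function is called later, when the data is in.
fs.readFile("file.txt", "utf8", function(error, text) {
  if (error)
    throw error;
  console.log(text);
});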

JavaScript lends itself well to a system like Node. It is one of the few programming languages that does not have a built-in way to do I/O, which made it easy to integrate it with Node’s rather odd approach. In 2009, when Node was being designed, people were already doing callback-based I/O in the browser, so the community around the language was already used to that programming style.

Asynchronicity

I’ll try to illustrate synchronous versus asynchronous I/O with a small example, where a program needs to fetch two resources from the Internet, and then do some simple processing with the result.

In a synchronous environment, the obvious way to do this is to make the requests one after the other. This has the drawback that the second request will be initiated only when the first has finished, meaning that the total time taken is at least the sum of the two response times. This is not an efficient use of the machine, which will sit mostly idle while it is transmitting and receiving data over the network.

In a synchronous system, the solution to this problem is to start additional threads of control (refer back to Chapter 14 for a previous discussion of threads). A second thread could start the second request, and then both threads wait for their results to come back, after which they somehow re-synchronize to combine their results.

In the following diagram, the thick lines represent time the program spends running normally, whereas the thin lines represent time spent waiting for I/O. In the synchronous model, the time taken by I/O is part of the timeline for a given thread of control. In the asynchronous model, initiating an I/O action conceptually causes a split in the timeline. The thread that initiated the I/O continues running, and the I/O itself is done alongside it, finally calling a callback function when it is finished.

Control flow for synchronous and asynchronous I/O

Another way to express this difference is that waiting for I/O to finish is implicit in the synchronous model, whereas it is explicit, directly under our control, in the asynchronous one. This cuts both ways. Asynchronicity makes it easier to express programs that do not fit the straight-line model of control, but it also makes it more awkward to express the programs that do.

In Chapter 17, I already touched on the fact that all those callbacks do add quite a lot of noise and indirection to a program. Whether this style of asynchronicity is a good idea in general is debatable. In any case, it will look bizarre at first, and takes some getting used to.

But for a JavaScript-based system, I would argue that callback-style asynchronicity is a sensible choice. One of the strengths of JavaScript is its simplicity, and trying to add multiple threads of control to it would add a lot of complexity. Though they don’t tend to lead to simple code, callbacks as a concept are pleasantly simple, but powerful enough to write high-performance Web servers.

The node command

When Node.js is installed on a system, it provides a program called node, which is used to run JavaScript files. Say you have a file hello.js, containing this code:

var message = "Hello world";
console.log(message);

You can then run node from the command line like this to execute the program:

# node hello.js
Hello world

The console.log method in Node does something similar to what it does in the browser: it prints out a piece of text. But in Node, the text will go to the process’s standard output stream, rather than to a browser’s JavaScript console.

If you run node without giving it a file, it provides you with a prompt at which you can type JavaScript code, and immediately see the result.

# node
> 1 + 1
2
> [-1, -2, -3].map(Math.abs)
[1, 2, 3]
> process.exit(0)
#

The process variable, just like the console variable, is available globally in Node. It provides various ways to inspect and manipulate the current program. The exit method ends the process, and can be given an exit status code, which tells the program that started node (in this case, the command line shell) whether the program completed successfully (code zero) or encountered an error (any other code).

To find the command line arguments given to your script, you can read process.argv, which is an array of strings. Note that it also includes the name of the node command and the name of your script, so the actual arguments start at index 2. If showargv.js simply contains the statement console.log(process.argv), you could run it like this:

# node showargv.js one --and two
["node", "/home/marijn/showargv.js", "one", "--and", "two"]

All the standard JavaScript global variables, like Array, Math, and JSON, are also present in Node’s environment. Browser-related functionality, like document and alert, is absent.

The global scope object, which is called window in the browser, has the more sensible name global in Node.

Modules

Beyond the few variables I mentioned, such as console and process, Node puts very little functionality in the global scope. If you want to access other built-in functionality, you have to ask the module system for it.

The CommonJS module system, based on the require function, was described in Chapter 10. This system is built into Node, and is used to load anything from built-in modules to downloaded libraries to files that are part of your own program.

When require is called, Node has to resolve the given string to an actual file to load. Path names that start with "/", "./", or "../" are resolved relative to the current module’s path, where "./" stands for the current directory, "../" for one directory up, and "/" for the root of the file system. So if you ask for "./world/world" from the file /home/marijn/elife/run.js, Node will try to load the file /home/marijn/elife/world/world.js. The .js extension may be omitted.

When a string that does not look like a relative or absolute path is given to require, it is assumed to refer to either a built-in module, or a module installed in a node_modules directory. For example, require("fs") will give you Node’s built-in filesystem module, and require("elife") will try to load the library found in node_modules/elife/. A common way to install such libraries is by using NPM, which I will discuss in a moment.

To illustrate the use of require, let’s set up a simple project consisting of two files. The first one is called main.js, which defines a script that can be called from the command line to garble a string.

var garble = require("./garble");

// Index 2 holds the first actual command line argument
var argument = process.argv[2];

console.log(garble(argument));

The file garble.js defines a library for garbling strings, which can be used both by the command line tool defined above and by other scripts that need direct access to a garbling function.

module.exports = function(string) {
  return string.split("").map(function(ch) {
    return String.fromCharCode(ch.charCodeAt(0) + 5);
  }).join("");
};

Remember that replacing module.exports, rather than adding properties to it, allows us to export a specific value from a module. In this case, we make the result of requiring our garble file the garbling function itself.

The function splits the string it is given into single characters by splitting on the empty string, and then replaces each character with the character whose code is five points higher. Finally, it joins the result together again into a string.

We can now call our tool like this:

# node main.js JavaScript
Of{fXhwnuy

Installing with NPM

NPM, which was briefly discussed in Chapter 10, is an online repository of JavaScript modules, many of which are specifically written for Node. When you install Node on your computer, you also get a program called npm, which provides a convenient interface to this repository.

For example, one module you will find on NPM is figlet, which can convert text into “ASCII art”, drawings made out of text characters. The transcript below shows how to install and use it:

# npm install figlet
npm GET https://registry.npmjs.org/figlet
npm 200 https://registry.npmjs.org/figlet
npm GET https://registry.npmjs.org/figlet/-/figlet-1.0.9.tgz
npm 200 https://registry.npmjs.org/figlet/-/figlet-1.0.9.tgz
figlet@1.0.9 node_modules/figlet
# node
> var figlet = require("figlet");
> figlet.text("Hello world!", function(error, data) {
    if (error) {
      console.error(error);
      process.exit(1);
    }
    console.log(data);
  });
  _   _      _ _                            _     _ _
 | | | | ___| | | ___   __      _____  _ __| | __| | |
 | |_| |/ _ \ | |/ _ \  \ \ /\ / / _ \| '__| |/ _` | |
 |  _  |  __/ | | (_) |  \ V  V / (_) | |  | | (_| |_|
 |_| |_|\___|_|_|\___/    \_/\_/ \___/|_|  |_|\__,_(_)

After running npm install, NPM will have created a directory node_modules, with a figlet directory inside of it that contains the library itself. When we run node and call require("figlet"), this library is loaded, and we can call its text method to draw some big letters.

Perhaps somewhat unexpectedly, instead of simply returning the string that makes up the big letters, figlet.text takes a callback function that it passes its result to, along with another argument, error, which will hold an error object when something went wrong, or null when everything went well.

This is a common pattern in Node code. Rendering something with figlet requires the library to read a file from disk that defines the way the letters look. Since reading from disk is an asynchronous operation in Node, figlet.text can’t immediately return its result. Asynchronicity is “infectious”, in a way—every function that calls an asynchronous function must itself become asynchronous.
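
To see that infectiousness in action, here is a sketch of a hypothetical banner helper that wraps figlet.text. Because figlet.text delivers its result through a callback, the wrapper cannot simply return the rendered text, and must take a callback itself:

var figlet = require("figlet");

// This function cannot return the banner directly, because
// figlet.text produces its result asynchronously.
function banner(text, callback) {
  figlet.text(text, function(error, data) {
    if (error)
      callback(error);
    else
      callback(null, data);
  });
}

banner("Hi", function(error, art) {
  if (error)
    throw error;
  console.log(art);
});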

There is much more to NPM than npm install. It reads package.json files, which contain JSON-encoded information about a program or library, such as which other libraries it depends on. Doing npm install in a directory that contains such a file will automatically install all dependencies, as well as their dependencies. The npm tool is also used to publish libraries to the online repository, so that other people can find, download, and use them.
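
For example, a minimal, hypothetical package.json for the garble project from earlier in this chapter might look like this (the figlet dependency is included purely as an illustration):

{
  "name": "garble",
  "version": "0.1.0",
  "description": "A command-line string garbler",
  "dependencies": {
    "figlet": "1.0.9"
  }
}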

This book won’t delve further into the details of NPM usage. Refer to npmjs.org for further documentation, and for an easy way to search for libraries.

The file system module

One of the most commonly used built-in modules that come with Node is the "fs" module, which stands for “file system”. This module provides a number of functions for working with files and directories.

A useful function from this module is readFile, which reads a file and calls a callback with the file’s contents when it is done.

var fs = require("fs");
fs.readFile("file.txt", "utf8", function(error, text) {
  if (error)
    throw error;
  console.log("The file contained:", text);
});

The second argument to readFile indicates the character encoding used to decode the file into a string. There are several ways in which text can be encoded to binary data, but most modern systems use UTF-8 to encode text, so unless you have reasons to believe another encoding is used, passing "utf8" when reading a text file is a safe bet. If you do not pass an encoding, Node will assume you are interested in the binary data itself, and will give you a Buffer object instead of a string. This is an array-like object that contains numbers representing the bytes in the file.

var fs = require("fs");
fs.readFile("file.txt", function(error, buffer) {
  if (error)
    throw error;
  console.log("The file contained", buffer.length, "bytes.",
              "The first byte is:", buffer[0]);
});

A similar function, writeFile, is used to write a file to disk.

var fs = require("fs");
fs.writeFile("graffiti.txt", "Node was here", function(err) {
  if (err)
    console.log("Failed to write file:", err);
  else
    console.log("File written.");
});

Here it was not necessary to specify the encoding, since writeFile will assume that if it is given a string to write, rather than a Buffer object, it should write it out as text using its default text encoding, which is UTF-8.

The "fs" module contains many other useful functions, such as readdir, which will return the files in a directory as an array of strings, exists, which checks whether a file exists, rename to rename a file, unlink to remove one, and so on. See the documentation at nodejs.org for specifics.

Many of the functions in "fs" come in two variants, a synchronous one and an asynchronous one. For example, there is a synchronous version of readFile which is called readFileSync.

var fs = require("fs");
console.log(fs.readFileSync("file.txt", "utf8"));

These require less ceremony to use, and can be useful in simple scripts, where the extra speed provided by asynchronous I/O is not important. But note that, while such a synchronous operation is being performed, your program will be stopped entirely. If it should be responding to the user or to other machines on the network, being stuck on synchronous I/O might cause undesirable delays.

The HTTP module

Another central module is called "http", which provides functionality for running HTTP servers and making HTTP requests.

This is all it takes to start a simple HTTP server:

var http = require("http");
var server = http.createServer(function(request, response) {
  response.writeHead(200, {"Content-Type": "text/html"});
  response.write("<h1>Hello!</h1><p>You asked for <code>" +
                 request.url + "</code></p>");
  response.end();
});
server.listen(8000);

If you run this script on your own machine, you can point your web browser at http://localhost:8000/hello to make a request to your server. It will respond with a very small HTML page.

The function passed as an argument to createServer is called every time a client tries to connect to the server. The request and response variables are objects representing the incoming and outgoing data. The first contains information about the incoming request; its url property, for example, tells us to what URL the request was made.

To send something back, you call methods on the response object. The first, writeHead, will write out the response headers (see Chapter 17). You give it the status code (200 for “OK” in this case), and an object that contains header values. Here we tell the client that we will be sending back an HTML document.

Next, the actual response body, the document itself, is sent with response.write. You are allowed to call this method multiple times, if you want to send the response piece by piece (possibly streaming data to the client as it becomes available). Finally, response.end signals the end of the response.

The call to server.listen causes the server to start waiting for connections on port 8000. This is the reason you have to connect to localhost:8000, rather than just localhost (which would use the default port, 80), to speak to this server.

To stop running a Node script like this, which doesn’t finish automatically because it is waiting for further events (in this case, network connections), press Control-C.

A real web server would probably need to do a lot of other things, such as analyzing the URL to figure out which resource the request is interested in, and looking at the request’s method (the method property) to see what action the client is trying to perform.

To act as an HTTP client, we can use the request function in the "http" module.

var http = require("http");
var request = http.request({
  hostname: "eloquentjavascript.net",
  path: "/20_node.html",
  method: "GET",
  headers: {Accept: "text/html"}
}, function(response) {
  console.log("Server responded with status code",
              response.statusCode);
});
request.end();

The first argument to request configures the request, telling Node what server to talk to, what path to request from that server, which method to use, and so on. The second argument is the function that should be called when a response comes in. It is given an object that allows us to inspect the response, for example to find out its status code.

The object returned by request, just like the response object we saw in the server, allows us to stream data into the request with the write method, and finish the request with the end method. The example does not use write, because requests using the GET method should not contain data in their request body.

To make requests to HTTPS (secure HTTP) URLs, Node provides a separate module, https, which contains its own request function that behaves in a similar way.
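
For example, a sketch of such a secure request (the same shape as the earlier example, only the module differs):

var https = require("https");
var request = https.request({
  hostname: "eloquentjavascript.net",
  path: "/20_node.html"
}, function(response) {
  console.log("Server responded with status code",
              response.statusCode);
});
request.end();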

Streams

We have seen two examples of writable streams in the HTTP examples—namely the response object that the server could write to, and the request object that was returned from http.request. These are a widely used concept in Node interfaces. All writable streams have a write method, which can be passed a string or a Buffer object, and an end method, which can also optionally be passed a piece of data, which it will write out before closing the stream.

Both of these methods can also be given a callback, as an additional argument, which they will call when the writing to or closing of the stream has finished.

It is possible to create a writable stream that points at a file with the fs.createWriteStream function, and then use the write method on the resulting object to write the file one piece at a time, rather than in one shot, as with fs.writeFile.
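
A small sketch that writes a hypothetical log file in two pieces, using the optional callbacks mentioned above:

var fs = require("fs");
var output = fs.createWriteStream("log.txt");
output.write("First line\n", function() {
  console.log("First chunk written.");
});
output.end("Last line\n", function() {
  console.log("Stream closed.");
});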

Readable streams are a little more involved. Both the request variable that was passed to the HTTP server’s callback function and the response variable passed to the HTTP client are readable streams (a server reads requests and then writes responses, whereas a client first writes a request and then reads a response). Reading from a stream is done using event handlers, rather than simple methods.

Objects that emit events in Node have a method called on that is very similar to the addEventListener method in the browser. You give it an event name and then a function, and it will register that function to be called whenever the given event occurs.

Readable streams have "data" and "end" events. The first is fired every time some data comes in, and the second is called whenever the stream is at its end. This model is most suited for “streaming” data, which can be immediately processed, even when the whole document isn’t available yet. In the case where you want to see the whole document before you start to do something with it, you have to listen for "data" events and collect their content, and then use the built-up document when the "end" event occurs.

A file can be read as a readable stream by using the fs.createReadStream function.
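
For example, this sketch collects a whole file from such a stream, following the pattern just described:

var fs = require("fs");
var input = fs.createReadStream("file.txt");
var content = "";
input.on("data", function(chunk) {
  content += chunk; // chunk is a Buffer; += converts it to a string
});
input.on("end", function() {
  console.log("Read", content.length, "characters.");
});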

Let’s set up a server that reads request bodies, and streams them back to the client as all-uppercase text.

var http = require("http");
http.createServer(function(request, response) {
  response.writeHead(200, {"Content-Type": "text/plain"});
  request.on("data", function(chunk) {
    response.write(chunk.toString().toUpperCase());
  });
  request.on("end", function() {
    response.end();
  });
}).listen(8000);

The chunk variable passed to the data handler will be a binary Buffer, which we can convert to a string by calling toString on it, which will decode it using the default encoding (UTF-8).

The following piece of code, if run while the upcasing server is running, will send a request to it and write out the response it gets.

var http = require("http");
var request = http.request({
  hostname: "localhost",
  port: 8000,
  method: "POST"
}, function(response) {
  response.on("data", function(chunk) {
    process.stdout.write(chunk.toString());
  });
});
request.end("Hello server");

The example writes to process.stdout (the process’s standard output, which is a writable stream) instead of using console.log, because console.log adds a newline character after each piece of text it writes. Since the response may arrive as more than one chunk, causing multiple calls to the "data" handler, that would produce undesirable line breaks in the output.

A simple file server

Let’s combine our newfound knowledge about HTTP servers and talking to the file system, and create a bridge between them: an HTTP server that allows remote access to a file system. Such a server has many uses—it could be used by a web application to store data, or even to provide direct access to some shared files, for example to allow users to edit their website remotely.

HTTP resources can cleanly be represented as files. The HTTP verbs GET, PUT, and DELETE can be used to read, write, and delete files, respectively. We will interpret the path from the request URL as the path of the file that the request refers to.

Because we probably don’t want to share our whole file system, we’ll interpret these paths as starting in the server’s working directory, which is the directory in which it was started. If I ran the server from /home/marijn/public/ (or c:\Users\marijn\public\ on Windows), then a request for /file.txt should refer to /home/marijn/public/file.txt (or c:\Users\marijn\public\file.txt).

We’ll build the program in small pieces again, using an object (methods) to store the functions that handle the various HTTP methods.

var http = require("http"), fs = require("fs");

var methods = Object.create(null);

http.createServer(function(request, response) {
  function respond(code, body, type) {
    if (!type) type = "text/plain";
    response.writeHead(code, {"Content-Type": type});
    if (body && body.pipe)
      body.pipe(response);
    else
      response.end(body);
  }
  if (request.method in methods)
    methods[request.method](urlToPath(request.url),
                            respond, request);
  else
    respond(405, "Method " + request.method +
            " not allowed.");
}).listen(8000);

function urlToPath(url) {
  var path = require("url").parse(url).pathname;
  return "." + decodeURIComponent(path);
}

This starts a server that just returns 405 error responses (which is the code used to indicate that a given method isn’t understood by the server).

The respond function is passed to the functions that handle the various methods, and acts as a callback to finish the request. It takes an HTTP status code, a body, and optionally a content type. If the value passed as the body is a readable stream, it will have a pipe method, which is used to forward a readable stream to a writable stream. If not, it is assumed to be either null (no body) or a string, and is passed directly to the response’s end method.

To get a path from the URL in the request, the urlToPath function uses Node’s built-in "url" module to parse the URL. It takes its pathname, which will be something like /file.txt, decodes that to get rid of the %20-style escape codes, and prefixes a single dot to produce a path relative to the current directory.
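
For example, some hypothetical inputs and the paths they produce:

urlToPath("/file.txt")        // → "./file.txt"
urlToPath("/hello%20world")   // → "./hello world"
urlToPath("/dir/name.js")     // → "./dir/name.js"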

(If you are worried about the security of the urlToPath function, you are right. We will come back to it in the exercises.)

We will set up the GET method to return a list of files when reading a directory, and the file’s content when reading a regular file.

One tricky question is what kind of Content-Type header we should add when returning a file’s content. Since these files could be anything, our server can’t simply return the same type for all of them. But NPM comes to the rescue. The mime package (content type indicators like text/plain are also called MIME types) knows the correct type for a huge number of file extensions.

If you run the following command in the directory where the server script lives, you’ll be able to use require("mime") to get access to the library.

# npm install mime
npm http GET https://registry.npmjs.org/mime
npm http 304 https://registry.npmjs.org/mime
mime@1.2.11 node_modules/mime
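
Once installed, a quick test at node’s interactive prompt shows it at work (hypothetical file names):

# node
> require("mime").lookup("file.txt")
'text/plain'
> require("mime").lookup("picture.png")
'image/png'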

When a requested file does not exist, the correct HTTP error code to return is 404. We will use fs.stat, which looks up information on a file, to find out both whether the file exists and whether it is a directory.

methods.GET = function(path, respond) {
  fs.stat(path, function(error, stats) {
    if (error && error.code == "ENOENT")
      respond(404, "File not found");
    else if (error)
      respond(500, error.toString());
    else if (stats.isDirectory())
      fs.readdir(path, function(error, files) {
        if (error)
          respond(500, error.toString());
        else
          respond(200, files.join("\n"));
      });
    else
      respond(200, fs.createReadStream(path),
              require("mime").lookup(path));
  });
};

Because it has to touch the disk, and thus might take a while, fs.stat is asynchronous. When the file does not exist, fs.stat will pass an error object with a code property of "ENOENT" to its callback. It would have been nice if Node defined different sub-types of Error for different types of error, but it doesn’t, and instead just puts obscure, Unix-inspired codes in there.

We are going to report any errors we didn’t expect with status code 500, which indicates that the problem exists in the server, as opposed to codes starting with 4 (such as 404) which refer to bad requests. There are some situations in which this is not entirely accurate, but for a small example program like this, it will have to be good enough.

The stats object returned by fs.stat tells us a number of things about a file, such as its size (size property) and its modification date (mtime property). Here we are interested in the question of whether it is a directory or a regular file, which the isDirectory method tells us.
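
A small sketch that inspects a hypothetical file.txt shows these properties in use:

var fs = require("fs");
fs.stat("file.txt", function(error, stats) {
  if (error)
    throw error;
  console.log("Size:", stats.size, "bytes");
  console.log("Last modified:", stats.mtime);
  console.log("Is a directory?", stats.isDirectory());
});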

We use fs.readdir to read the list of files in a directory, and, in yet another callback, return it to the user. For normal files, we create a readable stream with fs.createReadStream, and pass it to respond, along with the content type that the "mime" module gives us for the file’s name.

The code to handle DELETE requests is slightly simpler.

methods.DELETE = function(path, respond) {
  fs.stat(path, function(error, stats) {
    if (error && error.code == "ENOENT")
      respond(204);
    else if (error)
      respond(500, error.toString());
    else if (stats.isDirectory())
      fs.rmdir(path, respondErrorOrNothing(respond));
    else
      fs.unlink(path, respondErrorOrNothing(respond));
  });
};

function respondErrorOrNothing(respond) {
  return function(error) {
    if (error)
      respond(500, error.toString());
    else
      respond(204);
  };
}

When an HTTP response does not contain any data, the status code 204 (“no content”) can be used to indicate this. Since we need to provide callbacks that either report an error or return a 204 response in a few different situations, I wrote a respondErrorOrNothing function that creates such a callback.

You may be wondering why trying to delete a non-existent file returns a 204 status, rather than an error. When the file that is being deleted is not there, you could say that the request’s objective is already fulfilled. The HTTP standard encourages people to make requests idempotent, which means that applying them multiple times does not produce a different result.

And finally, here is the handler for PUT requests:

methods.PUT = function(path, respond, request) {
  var outStream = fs.createWriteStream(path);
  outStream.on("error", function(error) {
    respond(500, error.toString());
  });
  outStream.on("finish", function() {
    respond(204);
  });
  request.pipe(outStream);
};

Here we don’t need to check whether the file exists—if it does, we’ll just overwrite it. We again use pipe to move data from a readable stream to a writable one, in this case from the request to the file. If creating the stream fails, an "error" event is raised on it, which we report in our response. When the data is transferred successfully, pipe will close both streams, which will cause a "finish" event to be fired on the writable stream. When that happens, we can report success to the client with a 204 response.

The full script for the server can be found at eloquentjavascript.net/code/file_server.js. You can download it and run it with Node to start your own file server. And of course, you can modify and extend it, to solve this chapter’s exercises or to experiment.

The command-line tool curl, widely available on Unix-like systems, can be used to make HTTP requests. The following session briefly tests our server. Note that -X is used to set the request’s method, and -d to include a request body.

# curl http://localhost:8000/file.txt
File not found
# curl -X PUT -d hello http://localhost:8000/file.txt
# curl http://localhost:8000/file.txt
hello
# curl -X DELETE http://localhost:8000/file.txt
# curl http://localhost:8000/file.txt
File not found

The first request for file.txt fails, since the file does not exist yet. The PUT request creates the file, and behold, the next request successfully retrieves it. After deleting it with a DELETE request it can no longer be found, as you would expect.

Error handling

In the code for the file server, there are six places where we are explicitly routing exceptions that we don’t know how to handle to error responses. Because exceptions aren’t automatically propagated to callbacks, but rather passed to them as arguments, they have to be handled explicitly every time. This completely defeats the advantage of exception handling, namely the ability to centralize the handling of exceptions.

What happens when something actually throws an exception in this system? Since we are not using any try blocks, the exception will propagate to the top of the call stack. In Node, that causes the program to be aborted, and information about the exception (including a stack trace) is written to the program’s standard error stream.

This means that our server will crash whenever a problem is encountered in the server’s code itself, as opposed to asynchronous problems which will be passed as arguments to the callbacks. If we wanted to handle all exceptions raised during the handling of a request, to make sure we send back a response, we would have to add try/catch blocks to every callback.
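
This sketch shows why a single try block around the asynchronous call does not help: the try/catch has already finished by the time the callback runs.

var fs = require("fs");
try {
  fs.readFile("file.txt", "utf8", function(error, text) {
    // This callback runs later, after the surrounding try/catch
    // has already finished, so this exception is not caught there
    // and will crash the program instead.
    throw new Error("Exploded while handling the file");
  });
} catch (e) {
  // Never reached for the exception thrown in the callback.
  console.log("Caught:", e);
}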

This is hardly workable. Many Node programs are written to make as little use of exceptions as possible, with the assumption that if an exception is raised, it is not something the program can handle, and crashing is the right response.

Another approach is to use promises, which were introduced in Chapter 17. Those catch exceptions raised by the callback functions they call, and propagate exceptions raised as failures. It is possible to load a promise library in Node, and use that to manage your asynchronous control flow. Very few Node libraries integrate promises, but it is often trivial to wrap them. The excellent "promise" module from NPM contains a function denodeify, which takes an asynchronous function like fs.readFile and converts it into a promise-returning function.

var Promise = require("promise");
var fs = require("fs");

var readFile = Promise.denodeify(fs.readFile);
readFile("file.txt", "utf8").then(function(content) {
  console.log("The file contained: " + content);
}, function(error) {
  console.log("Failed to read file: " + error);
});

For comparison, I’ve written another version of the file server based on promises, which you can find at eloquentjavascript.net/code/file_server_promises.js. It is slightly cleaner, because functions can now return their results, rather than having to call callbacks, and the routing of exceptions is implicit, rather than explicit.

I’ll list a few lines from the promise-based file server, to illustrate the difference in the style of programming.

The fsp object contains promise-style variants of a number of fs functions, wrapped by Promise.denodeify. The object returned from the callback, with code and body properties, will become the final result of the chain of promises, and be used to determine what kind of response to send to the client.
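
As a sketch, such an fsp object might be built with the denodeify function shown before (the exact set of wrapped functions here is an assumption):

var Promise = require("promise");
var fs = require("fs");

var fsp = {
  stat: Promise.denodeify(fs.stat),
  readdir: Promise.denodeify(fs.readdir),
  rmdir: Promise.denodeify(fs.rmdir),
  unlink: Promise.denodeify(fs.unlink)
};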

methods.GET = function(path) {
  return inspectPath(path).then(function(stats) {
    if (!stats) // Does not exist
      return {code: 404, body: "File not found"};
    else if (stats.isDirectory())
      return fsp.readdir(path).then(function(files) {
        return {code: 200, body: files.join("\n")};
      });
    else
      return {code: 200,
              type: require("mime").lookup(path),
              body: fs.createReadStream(path)};
  });
};

function inspectPath(path) {
  return fsp.stat(path).then(null, function(error) {
    if (error.code == "ENOENT") return null;
    else throw error;
  });
}

The inspectPath function is a simple wrapper around fs.stat, which handles the case where the file is not found, replacing the failure with a success that yields null in that case. All other errors are allowed to propagate implicitly. When the promise that is returned from these handlers fails, the HTTP server responds with a 500 status code.

Summary

Node is a nice, straightforward system that lets us run JavaScript in a non-browser context. It was originally designed for network tasks, to play the role of a “node” in a network. But it lends itself to all kinds of scripting tasks, and if writing JavaScript is something you enjoy, automating everyday tasks with Node is an attractive option.

That option is made more attractive by the fact that NPM provides libraries for everything you can think of (and quite a few things you’d probably never think of), right at your fingertips. The npm command can be used to install modules.

Node also comes with a number of built-in modules, among them the "fs" module, which contains functionality related to the file system, and the "http" module, which gives you tools for running HTTP servers and making HTTP requests.

All I/O in Node is done asynchronously, unless you explicitly use a synchronous variant of a function, such as fs.readFileSync. You pass callback functions, and Node will call them at the appropriate time, when the I/O you asked for is completed.

Exercises

Content negotiation, again

In Chapter 17, the first exercise was to make several requests to eloquentjavascript.net/author, asking for different types of content by passing different Accept headers.

Do this again, using Node’s http.request function. Ask for at least the media types text/plain, text/html, and application/json. Remember that headers to a request can be given as an object, in the headers property of http.request’s first argument.

Write out the content of the responses to each request.

Don’t forget to call the end method on the object returned by http.request, in order to actually fire off the request.

The response object passed to http.request’s callback is a readable stream. This means that it is not entirely trivial to get the whole response body out of it. The following utility function reads a whole stream and calls a callback function with the result, using the usual pattern of passing any errors it encounters as the first argument to the callback.

function readStreamAsString(stream, callback) {
  var data = "";
  stream.on("data", function(chunk) {
    data += chunk;
  });
  stream.on("end", function() {
    callback(null, data);
  });
  stream.on("error", function(error) {
    callback(error);
  });
}
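
For example, one of the requests might look like this sketch, which uses the utility above inside the response callback:

var http = require("http");
http.request({
  hostname: "eloquentjavascript.net",
  path: "/author",
  headers: {Accept: "text/plain"}
}, function(response) {
  readStreamAsString(response, function(error, content) {
    if (error)
      throw error;
    console.log("As text/plain:\n" + content);
  });
}).end();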

Fixing a leak

For easy remote access to some files, I might get into the habit of having the file server defined in this chapter running on my machine, in the /home/marijn/public directory. Then, one day, I find that someone has gained access to all the passwords I stored in my browser.

What happened?

If it isn’t clear to you yet, think back to the urlToPath function, defined like this:

function urlToPath(url) {
  var path = require("url").parse(url).pathname;
  return "." + decodeURIComponent(path);
}

Now consider the fact that paths passed to the "fs" functions can be relative—they may contain "../" to go up a directory. What happens when a client sends requests to URLs like the ones below?

http://myhostname:8000/../.config/google-chrome/Default/Web%20Data
http://myhostname:8000/../.ssh/id_dsa
http://myhostname:8000/../../../etc/passwd

Change urlToPath to fix this problem. Take into account the fact that Node on Windows allows both forward slashes and backslashes to separate directories.

Also, meditate on the fact that as soon as you expose some half-baked system on the Internet, the bugs in that system can often be used to do bad things to the machine it is running on.

It is enough to strip out all occurrences of two dots that have a slash, a backslash, or the end of the string on both sides. Using the replace method with a regular expression is the easiest way to do this. Do not forget the g flag on the expression, or replace will only replace a single instance, and people could still get around this safety measure by including additional double dots in their paths! Also make sure you do the replace after decoding the string, or it would be possible to foil the check by encoding a dot or a slash.

Another potentially worrying case is paths starting with a slash, which are interpreted as absolute paths. But because urlToPath puts a dot character in front of the path, it is impossible to create requests that result in such a path. Multiple slashes in a row, inside of the path, are odd, but will be treated as a single slash by the file system.
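
Putting those hints together, one possible fix might look like this sketch (the exact regular expression is a choice, not the only correct one):

function urlToPath(url) {
  var path = require("url").parse(url).pathname;
  var decoded = decodeURIComponent(path);
  // Repeatedly strip ".." elements that sit between slashes,
  // backslashes, or the end of the string, so that sequences
  // like "../../" are also fully cleaned up.
  var previous;
  do {
    previous = decoded;
    decoded = decoded.replace(/(\/|\\)\.\.(\/|\\|$)/g, "$1");
  } while (decoded != previous);
  return "." + decoded;
}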

Creating directories

Though the DELETE method is wired up to delete directories (using fs.rmdir) when applied to one, the file server currently does not provide any way to create a directory.

Add support for a method MKCOL, which should create a directory by calling fs.mkdir. MKCOL is not one of the basic HTTP methods, but it does exist, for this same purpose, in the WebDAV standard, which specifies a set of extensions to HTTP that make it suitable for writing resources, not just reading them.

You can use the function that implements the DELETE method as a blueprint for methods.MKCOL. When no file is found, try to create a directory with fs.mkdir. When a directory exists at that path, you can return a 204 response, so that directory creation requests are idempotent. If a non-directory file exists here, return an error code. 400 (“bad request”) would be appropriate here.
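
Following that blueprint, a sketch of the handler might look like this:

methods.MKCOL = function(path, respond) {
  fs.stat(path, function(error, stats) {
    if (error && error.code == "ENOENT")
      fs.mkdir(path, respondErrorOrNothing(respond));
    else if (error)
      respond(500, error.toString());
    else if (stats.isDirectory())
      respond(204); // Already exists, so the request is idempotent
    else
      respond(400, "File exists");
  });
};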

A public space on the web

Since the file server serves up any kind of files, and even includes the right Content-Type header, you can use it to serve a website. Since it allows everybody to delete and replace files, it would be an interesting kind of website: one that can be modified, vandalized, and destroyed by everybody who takes the time to create the right HTTP request. Still, it would be a website.

Write a simple HTML page that includes a simple JavaScript file, put both in a directory served by the file server, and open the page in your browser.

Next, as an advanced exercise, or even a weekend project, combine all the knowledge you gained from this book to build a more user-friendly interface for modifying the website, inside of the website itself.

Include HTML forms (Chapter 18) to edit the content of the files that make up the website, allowing the user to update them on the server (using HTTP requests, as described in Chapter 17).

Start by making only a single file editable. Then try to extend the code to allow the user to select a file to edit, using the fact that our file server returns lists of files when reading a directory.

Don’t work directly in the code on the file server, since if you make a mistake, you are likely to damage the files there. Instead, keep your work outside of the publicly accessible directory, and copy it in to test it.

If your computer is directly connected to the internet, without a firewall, router, or other interfering device in between, you might be able to invite a friend to use your website. To check, go to whatismyip.com, copy the IP address it gives you into the address bar of your browser, and add :8000 after it to select the right port. If that brings you to your site, it is online for everybody to see.

You can create a <textarea> element to hold the content of the file that is being edited. A GET request, using XMLHttpRequest, can be used to get the current content of the file. You can use relative URLs like index.html, instead of http://localhost:8000/index.html, to refer to files on the same server as the running script.

Then, when the user clicks a button (you can use a <form> element and "submit" event, or simply a "click" handler), make a PUT request to the same URL, with the content of the <textarea> as request body, to save the file.

You can then add a <select> element that contains all the files in the server’s root directory, by adding <option> elements containing the lines returned by a GET request to the URL /. When the user selects another file (a "change" event on the field), the script must fetch and display that file. Also make sure that when saving a file, you use the currently selected file name.

Unfortunately, the server is too simplistic to be able to reliably read files from subdirectories, since it does not tell us whether the thing we fetched with a GET request is a regular file or a directory. Can you think of a way to extend the server to address this?