Nonsense

Why would I use a Webpack?

Question: I am having a hard time grasping why I would use Webpack, and what it is really for. […] It seems like I can't just drop a script in my page anymore.

Loading JavaScript

Instead of focusing just on the particularities of Webpack, I think it's more interesting and useful to extend the question to all JavaScript Packers. Frequently called bundlers, there are a number of tools, such as Webpack, Rollup, Browserify, Parcel, JSPM, and a lot more, which offer a similar function. It can also be achieved, to a fair extent, with custom scripts.

In any case, let us be clear that you sure can just drop a script tag in your page. It's a totally valid solution and it is, in fact, still used in certain situations.

That said, let's try to understand what “the problem” is and walk all the way to what a generic solution (such as Browserify, Webpack, etc.) looks like. Notice, though, that some of these tools can also process other types of assets (CSS, images, fonts…) but we will only focus on JavaScript.

Origins

So you have a single script for your page. We could imagine that the whole script may be, say, 500 lines of JS. Life's mostly ok, the script is manageable, not excessively big. But, as time goes by, the project grows and you start adding more functionalities to the page (or application). It may also happen that your company hires some more developers to work with you on that JavaScript.

You might just continue with your current configuration. A single script file and you (and maybe others) start adding functionality to it. You're bound to encounter 2 main problems:

  • First, your single script file will grow. It will get bigger with each functionality you add. This is bad because large files are harder to maintain, it is clumsier to find particular sections in them and it's also harder to have many developers working on the same file at the same time.
  • Second, you might want to use some external libraries. This is not a problem in itself, but in a moment we'll see how it may add up to the whole situation.

Splitting code

With this problem in mind, we obviously decide to split this large file into several smaller files. This is not only because of the large file problem, of course; it is also good practice to separate things into independent parts to keep our code better organized and clean. So we might want to separate functions that deal with DOM manipulation from others that deal with XHR calls or data processing, etc.

Say we originally had reached some 10kLOCs and we quickly break the file into smaller pieces and we get, e.g., 20 smaller, 500LOC files. Not great but definitely better.

But how do we manage these files? The simplest approach is, well, simply drop 20 script tags in our page. Well, or maybe 20 plus some other 5-6 libraries we've added in the process. (Say jQuery and a couple of plug-ins, or whatever you may think of.)

There's one thing we have to take into account, though: order. We have to be careful to insert these tags in a particularly chosen order because, of course, we have to make sure that things (some utility function, for example) are available before other parts try to use them.

Now life is better in some aspects. We have smaller files which helps a lot. But not everything is nice. We've added these other concerns we have to take care of. And the problem we've added is not only the order of those 30 <script> tags. There's also the fact that when developers add new functions, they have to think where they put them. In what file should they put it so that it is available when needed? Will it be better to add one more file and one more <script> tag?

Also, now that we've learned about separating concerns, we seem to be producing a much larger number of smaller files and we probably want to keep them organized into folders and such.

We quickly reach 50, 80, a hundred JavaScript files. We can see the code is much better now but we can also see these new problems.

New problems

More problems appear. We have the number of <script> tags, the order we have to maintain… and now something else: Those files define a lot of names. Names of functions, names of variables. And when you're editing a file you don't really see what names other files have already used. Collisions appear. Development becomes a tedious task of remembering a lot of used names or finding some convention or… Maybe we can look for a solution for this?
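To make the collision concrete, here is a minimal sketch. The two "files" are simulated in a single listing, and the names (globalScope, format, dates.js, prices.js) are made up for illustration; in a real page the shared scope would be the browser's global object.

```javascript
// A stand-in for the page's global scope (hypothetical; in a browser this would be `window`).
const globalScope = {};

// --- dates.js (loaded first) ---
globalScope.format = function (date) {
    return date.toISOString().slice(0, 10);
};

// --- prices.js (loaded later, written by someone else) ---
globalScope.format = function (amount) {
    return amount.toFixed(2) + ' EUR';
};

// --- back in the code that expected the date formatter ---
let result;
try {
    result = globalScope.format(new Date(0));
} catch (error) {
    // The price formatter silently replaced the date formatter,
    // and a Date has no toFixed() method.
    result = error.constructor.name;
}
console.log(result); // TypeError
```

Nothing warns us when the second definition replaces the first. The failure only shows up later, and far away from the line that caused it.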

So we reach out (to our own knowledge, some book, tutorial, ask someone on reddit, whatever) and we discover “Module Patterns” (probably the “Revealing Module Pattern”). I'm not going to explain this pattern. If you need to, you can read about it all over the web, but suffice it to say that it is a structure like so…

    let something = (function() {
        // "so called private" code here
        // ...

        // and then...
        return {
            publicOne: ...,
            publicTwo: ...
        };
    })();

…which basically provides you, through a closure, with some encapsulation. The thing returned and assigned to something has some visible methods and/or properties, and those methods have access to the local stuff defined inside the function expression which no one else has access to. So, to some extent, it is a structure that allows you to write some encapsulated blocks with “private” visibility. Why is this good for your problem?

It avoids name collisions.

What you do is in each of those 80 or now 100 files you have, you create this structure, engulfing all the content of the file. And at the end you only return the things you really want to be visible.
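As a rough illustration, this is what two such files might look like once wrapped. The file names and functions (dates, prices, pad, format) are invented for the example; only the shape matters.

```javascript
// --- dates.js ---
const dates = (function () {
    // Private: invisible outside this function expression.
    function pad(n) {
        return String(n).padStart(2, '0');
    }

    function format(date) {
        return date.getUTCFullYear() + '-' + pad(date.getUTCMonth() + 1) + '-' + pad(date.getUTCDate());
    }

    // Only what we return is visible to the rest of the page.
    return { format: format };
})();

// --- prices.js ---
const prices = (function () {
    // Same private names as in dates.js, but each closure has its own scope.
    function pad(amount) {
        return amount.toFixed(2);
    }

    function format(amount) {
        return pad(amount) + ' EUR';
    }

    return { format: format };
})();

console.log(dates.format(new Date(0))); // 1970-01-01
console.log(prices.format(3.5));        // 3.50 EUR
```

Each file still leaks one name (dates, prices) into the shared scope, so the pattern reduces collisions to one name per file rather than eliminating them.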

This is a huge gain, because you can now split the files as small as you want without concern for name collisions. On the other hand, you now have 180 <script> tags, or two hundred. And that thing with the order, ouch. Life's not nice at all on that front.

The first script

Let's recap a bit:

  1. We've successfully solved the problem of having a single huge file. It was a problem, because it was really huge and because you needed to have various people working on it at the same time and that's nasty.
  2. But you now have these problems:
    1. You have 220 <script> tags in your page.
    2. They need to be kept in a certain order.
    3. Oh, and some people have been complaining that it takes a lot longer having to load 230 different files. We'll have to deal with this sooner or later. Just keep it in mind.

Wanting to progress and solve problems, we try to solve these new problems.

And there's a really simple solution. It won't solve everything, but it is simple and it is helpful: We could have a shell script or some similar tool that simply concatenates all the script files in the correct order. That way, the first problem is clearly solved. Our sources are now in 273 files, but the script included in the page is back to being just this one file. So just one <script> tag again.
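I talked about a shell script, but to stay in one language here is the same idea sketched in JavaScript. The sources are kept in memory so the listing runs on its own (a real script would read them from disk), and the names sources, order and concatenate are assumptions of this sketch.

```javascript
// Hypothetical sources, keyed by file name.
const sources = {
    'app.js': 'console.log(greet("world"));',
    'utils.js': 'function greet(name) { return "Hello, " + name; }',
};

// The order is maintained by hand: utils.js must come before app.js.
const order = ['utils.js', 'app.js'];

function concatenate(sources, order) {
    return order
        .map(function (name) {
            // A comment banner makes the combined file easier to debug.
            return '// ---- ' + name + ' ----\n' + sources[name];
        })
        .join('\n;\n'); // a lone ";" guards against files that omit a trailing semicolon
}

const bundleText = concatenate(sources, order);
console.log(bundleText);
```

The hand-written order array is the weak point, and it is exactly the second problem discussed next.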

This is great progress, for sure. But the second problem remains. For the shell script to correctly concatenate the JS files, we have to tell it what's the correct order.

We may go through a number of naive solutions. Some may even work to some extent.

We might go for naming our files under a certain pattern, like 00100-somefile.js, 001300-anotherfile.js… and then just concatenate following that number order. It sort of works. Not pretty to maintain, but it sort of works. At first we used sequential numbers, but after having to rename 80 files one day, we settled on leaving gaps in the numbering to be filled as needed.

Or we could keep somewhere in the shell script the ordered list of files, or some other ideas.

Any solution along those lines is still a kludge and doesn't really solve the problem, it just tries to make it a bit less painful.

So instead, we reach out again to our knowledge and resources. Maybe there is a pattern a bit more sophisticated than RMP or something else we could add to it.

First approximation to modules

We think about it for some time and decide that it would be very nice if we could add some way for a particular file to say what other files it needs to be available before it can run… its dependencies if you will.

I don't want to write what that could look like in a naive approach, because it would take too much space here and I don't want to deviate too much from our original goal, but you can look into RequireJS for an approach that is somewhat close to what it could look like. (RequireJS is predated by Dojo's module system, but I won't inflict the pain of referring you to Dojo's documentation.)

    // my/shirt.js now has some dependencies, a cart and inventory
    // module in the same directory as shirt.js
    define(["./cart", "./inventory"], function(cart, inventory) {
            // return an object to define the "my/shirt" module.
            return {
                color: "blue",
                size: "large",
                addToCart: function() {
                    inventory.decrement(this);
                    cart.add(this);
                }
            }
        }
    );
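Once every file declares what it needs, as in the sample above, a tool no longer has to be told the order: it can work it out. A minimal sketch of that idea follows, using a depth-first topological sort. This is not how RequireJS does it; the dependency table and the loadOrder function are hypothetical.

```javascript
// Hypothetical dependency declarations: file name -> files it needs first.
const dependencies = {
    'shirt.js': ['cart.js', 'inventory.js'],
    'cart.js': ['inventory.js'],
    'inventory.js': [],
};

function loadOrder(dependencies) {
    const ordered = [];
    const state = {}; // undefined = unvisited, 'visiting' = in progress, 'done' = placed

    function visit(name, trail) {
        if (state[name] === 'done') return;
        if (state[name] === 'visiting') {
            throw new Error('Circular dependency: ' + trail.concat(name).join(' -> '));
        }
        state[name] = 'visiting';
        (dependencies[name] || []).forEach(function (dep) {
            visit(dep, trail.concat(name));
        });
        state[name] = 'done';
        ordered.push(name); // pushed only after everything it needs
    }

    Object.keys(dependencies).forEach(function (name) {
        visit(name, []);
    });
    return ordered;
}

console.log(loadOrder(dependencies)); // [ 'inventory.js', 'cart.js', 'shirt.js' ]
```

The result could feed a concatenation step directly, replacing numbered file names or hand-kept lists.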

Then again, while we're doing all this, some folks publish NodeJS and it becomes popular.

This is relevant as it precisely includes a mechanism to do exactly what we want: define “modules” which can have some private/local parts, can export some public parts, and can require other modules.

NodeJS's mechanism (actually based on a CommonJS spec), with that particular syntax and all, becomes very popular too. Note that, later, the ES standard decides on a different, more extensive syntax and mechanism, but that doesn't really matter much; the important bit is that there are some particular syntaxes that become popular, and so it's a good idea to follow them.

But of course, the syntax in NodeJS works well for NodeJS. And while ECMAScript finally standardizes on another syntax, module support is still not generally available in all browsers. Besides, these systems are designed around really having multiple independent files, and you don't want to serve your files separately.

In any case, the syntax leaves out most of the “weird” boilerplate about closures and simply allows us to write our modules in a manner such as…

    // One of these to require dependencies:
    let a = require('./a.js'); // NodeJS modules
    import a from './a.js';    // ES modules

    // ...your code here...

    // One of these (or other similar variations) to make things visible out of your module:
    module.exports = something; // NodeJS
    export default something;   // ES

We see that this is good and decide to go with this.

The second script

But since we still need to bundle it all up into a single file, and we need to make that file work in a browser (not in NodeJS), we need to revise that shell script we had that simply concatenated all our JavaScript files, and turn it into something a bit more sophisticated.

Note that we could opt for a number of approaches here.

One approach we can try is, on the one hand, we make it so that:

  1. Before any one of our files, the script will insert a number of generic utility functions. This is mainly because, as mentioned, in the browser we don't have support for those require(…) and module.exports = … things, so we'll need to provide that.
  2. Each file's contents are wrapped into a little bit of boilerplate. Think of this wrap as, approximately, the code that we eliminated when going from the RequireJS code sample to the NodeJS module code sample. Not exactly as is, but the general idea is the same. What this does is put your module's code inside a function expression (as with the Revealing Module Pattern) and instead of calling it directly, it passes that expression to one of the “general utility functions” we added at the beginning.

All this will result in this effect: Each module's code is written in that comfortable way we've come to appreciate. But then, when the time comes to execute it, we will manage how (and when) it gets executed. This is so that, when our code calls require('./a.js'), our added functions will be able to provide that other module's code or, if it hasn't loaded yet, delay our module's execution until a.js is indeed available.

Generally, this is done by keeping some sort of registry of identifiers that allows the system to reference the modules correctly, as if it were referencing files or something similar.
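A minimal sketch of such a revised script could look like the following. It is only an illustration under some loud assumptions: the names (files, runtime, bundle, load) are made up, the sources live in memory instead of on disk, and every require uses exactly the same string as the key in files, so no path resolution is attempted. Rather than delaying a module until its dependency has loaded, this version registers everything first and executes each module the first time it is required.

```javascript
// Hypothetical sources, as a build script might have read them from disk.
// Note that app.js comes before the module it depends on.
const files = {
    './app.js': 'const greet = require("./greet.js");\nmodule.exports = greet("world");',
    './greet.js': 'module.exports = function (name) { return "Hello, " + name; };',
};

// Runtime prepended to every bundle: runs a module on first use, then caches it.
const runtime = [
    'const cache = {};',
    'function load(name) {',
    '    if (!cache[name]) {',
    '        cache[name] = { exports: {} };',
    '        modules[name](load, cache[name], cache[name].exports);',
    '    }',
    '    return cache[name].exports;',
    '}',
].join('\n');

function bundle(files, entry) {
    const wrapped = Object.keys(files).map(function (name) {
        // Wrap each file's text in a function expression, keyed by its name.
        return JSON.stringify(name) + ': function (require, module, exports) {\n' + files[name] + '\n}';
    });
    return [
        '(function () {',
        'const modules = {\n' + wrapped.join(',\n') + '\n};',
        runtime,
        'return load(' + JSON.stringify(entry) + ');',
        '})()',
    ].join('\n');
}

const output = bundle(files, './app.js');
console.log(output);
```

The printed text is a single self-contained script. Evaluating it runs app.js, which pulls in greet.js through the generated load function, regardless of the order in which the two files were listed.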

Our own bundler

Let's recap again.

  1. We have a manageable code base, where the sources are split functionally into small modules. This is a very good thing.
  2. We have a process or tool that:
    1. Puts all the small files into one big file
    2. Adds some completely generic utility functions that:
      1. Take care of providing each module with any other modules it asks for
      2. Solve the problem of order.

How is the problem of order solved? Did I miss explaining that? Simple: As I mentioned, execution of our modules is now managed and we can delay a module's execution if its dependencies haven't loaded yet. So we don't care about the problem anymore. We can just load all the modules and then execute what we want.

Note that it's not only this. Really the dependency system can work with our code, whether it is encapsulated and bundled into one single file or whether it is left as separate files and loaded on demand when they are required or imported. As long as the system or the tool provides the mechanisms and understands the same syntax we gain that ability for our code without our code noticing.

Now, this tool is something we can build ourselves, as I've been imagining. But it would be much “better” if the many people making similar tools made them work in the same way, or simply used the same tool. That way, we could treat external libraries in the same way we treat our own code.

So, instead of actually building yourself such a tool, you use an existing one. These tools are Browserify, Webpack, Parcel, etc.

Some of these tools, as others have mentioned, tend to take advantage of the fact that we are already doing all those code transformations and bundling, to offer other tasks too. Tasks such as minifying our code (compressing it so that it's smaller and loads faster). Or maybe some tools are sophisticated enough that they can even avoid including files (or even parts of files) which are not actually used (tree-shaking). Or they can also process other assets such as CSS and/or images. Once we've agreed to have that tool or process as a step in our workflow, well, why not make the most of it?

Part 2 - Additional Details

The following extends the explanation above, but it is not needed to follow it. It doesn't provide more insight into why we use packers, the problems they serve to solve, or how we arrived at them.

Generated code

What follows is only a bit more of detail on how it is actually done. I'll focus in particular on Browserify, mostly because it's simpler, but all bundlers work in a similar fashion and produce similar results. Do not mind the code too much, as it is not a complete example.

Say one of our files (something called linkLoader.js) looks sort of like this:

    const xhr = require('../lib/xhr.js');
    const dom = require('../lib/domUtils.js');

    function loader(container) {
        const output = dom.printTo(container);

        xhr.get(href, function(content) {
            var { content, js } = dom.parse(content);
            // ...
        });
    }

    module.exports = loader;

I've removed most of the code, but the interesting bits are still there (importing and exporting). So, we run Browserify on our code, and it spits out a bundled file. I won't show all the result of that here because it's too big and noisy. But this particular file gets transformed into something like:

    {
        1: ...,
        2:[
                function(require,module,exports){
                    const xhr = require('../lib/xhr.js');
                    const dom = require('../lib/domUtils.js');

                    function loader(container) {
                        const output = dom.printTo(container);

                        xhr.get(href, function(content) {
                            var { content, js } = dom.parse(content);
                        // ...
                        });
                    }
                    module.exports = loader;
                },
                {"../lib/domUtils.js":4,"../lib/fnbasics.js":5,"../lib/xhr.js":6}
        ],
        3: ...
    }

So it gets thrown into an object. This object will be passed to the function I mentioned above that will execute each of those modules/function-expressions. As you can see, the transformation is mainly just wrapping the original code and extracting the dependencies that each module requires, i.e., it reads each require we make and puts all the names together, just for easier management later.
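The "reads each require we make" step can be approximated with a regular expression, as in the sketch below. This is a deliberately naive stand-in: real bundlers generally parse the code properly instead, and the function name findRequires and the numbering are assumptions of this example.

```javascript
// Naive extraction of require() calls with a regular expression.
// It would be fooled by comments, strings, or computed paths such as require(name).
function findRequires(source) {
    const pattern = /\brequire\(\s*(['"])(.+?)\1\s*\)/g;
    const found = [];
    let match;
    while ((match = pattern.exec(source)) !== null) {
        if (!found.includes(match[2])) {
            found.push(match[2]);
        }
    }
    return found;
}

const linkLoaderSource = [
    "const xhr = require('../lib/xhr.js');",
    "const dom = require('../lib/domUtils.js');",
    'module.exports = function loader() {};',
].join('\n');

// Assign each path a number, much like the map that follows the wrapped function.
// The numbers here are arbitrary and will not match the listing above.
const ids = {};
findRequires(linkLoaderSource).forEach(function (path, index) {
    ids[path] = index + 1;
});

console.log(ids); // { '../lib/xhr.js': 1, '../lib/domUtils.js': 2 }
```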

It is interesting to note that our code then gets executed in an environment where we have access to three things:

  • a require function
  • module and exports references.

This is basically all we need for our code to work and it is interesting that it doesn't really matter much what these are or how they work at a detailed level. Just that they do what you expect. This is what allows what I mentioned earlier: the actual loading can happen like it is done here, in a single bundled file or it could happen in some other way (e.g. by loading it on demand through XHR or from the file-system or whatever).

If you want to actually see what those things look like or what the general function at the start of the bundle looks like, you can have a look at the browser-pack package. But a general idea might be doing something like this.

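    // I have that object there with all those functions (call it `modules`), so...
    // `execute` and `getDeps` stand for helpers that are not shown here.
    const registry = {};
    for (const [key, [funct, dependencies]] of Object.entries(modules)) {
        registry[key] = execute(funct, getDeps(dependencies, registry));
    }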

This, of course, is a very naive approach. A real solution needs to take into account the availability of dependencies before they are used. It also doesn't really work like this with regard to that registry, because your code does not return anything. Instead, you add things to the passed module.exports or exports, but that's just an unimportant implementation detail.

Now, I've used Browserify because it is much simpler than Webpack. The output Webpack generates is similar in spirit. Webpack builds an array instead of an object, and wraps our modules into something like this:

    /* 1 */
    /***/
    (function(module, exports, __webpack_require__) {
                    const xhr = __webpack_require__(0);
                    const dom = __webpack_require__(3);
                    // ...
    }),

(The comments are there just for human debugging purposes, as far as I know.)

As you can see, the main difference is that Webpack applies some transformations to your code while it is generating the bundle. The main transformation is the one you see with __webpack_require__. Not only does it change the name of require to that, which is a superficial change; while it is doing that, it also removes the references to actual filenames and substitutes a simpler index number for them. In any case, the result is similar: All the benefits explained previously are there.
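The substitution itself can be imitated with a plain text replacement, which is enough to see what changes. This is only a sketch of the idea and not how Webpack is implemented; the moduleIndex table and the rewriteRequires function are hypothetical, and the index numbers are simply copied from the listing above.

```javascript
// Hypothetical table built while walking the dependency graph: file path -> index.
const moduleIndex = {
    '../lib/xhr.js': 0,
    '../lib/domUtils.js': 3,
};

function rewriteRequires(source, moduleIndex) {
    return source.replace(/\brequire\(\s*(['"])(.+?)\1\s*\)/g, function (whole, quote, path) {
        if (!(path in moduleIndex)) {
            throw new Error('No index assigned to ' + path);
        }
        return '__webpack_require__(' + moduleIndex[path] + ')';
    });
}

const original = [
    "const xhr = require('../lib/xhr.js');",
    "const dom = require('../lib/domUtils.js');",
].join('\n');

console.log(rewriteRequires(original, moduleIndex));
// const xhr = __webpack_require__(0);
// const dom = __webpack_require__(3);
```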

Also, as already mentioned, Webpack does more than this. This is all in relation to modules. But Webpack also includes other tasks which you might do with other software. Like compressing (minifying) the output file, or managing CSS alongside JS, or running a transpiler… Or a common one: As I mentioned, there are mainly 2 different syntaxes for importing and exporting: CommonJS (what NodeJS uses) and ESM (the ECMAScript standard), i.e. require('bla.js') vs import bla from 'bla.js'. While Browserify only supports CommonJS, Webpack supports both by transforming those imports into requires “at pack time”. (Note that this isn't strictly correct. Webpack 1 didn't support import either, but Webpack 2 and 3 do. Also, you can get the same result by combining Browserify with other tools, such as Babel, so that they do the transformation and then Browserify does the packing.)

Future? Support of Modules in Browsers?

Now, there is just one remaining thing you may be wondering about. It could be something like: “Well, now that there is a standard way to load modules, can't we just use that and forget all this about bundling it all into one file and just let the browsers load what they need?”

The answer to that is not completely straightforward. Let's just say that…

  • While there is a standard (mostly, some details still being discussed), there has been no available implementation of it in any browser until… well, very recently. The very latest versions of some browsers are just now starting to ship with (some) support for ES modules. (See the warning at the top here).
  • In the future it may be a way or the way, but for now, needing to support current browsers, the solution does seem to inevitably go through a bundled file or some similar solution that offers the functionality browsers don't.
  • There are also some other things that affect usage of all this. In particular, performance concerns and HTTP/2 support may or may not favor going back to loading multiple independent files. This is a bit hard to determine yet, but it may mean that in some scenarios bundling all files into one (or a few) file(s) might still perform better.

So the answer to that is a classic it depends. Or, if you prefer, it could be something like: “For now, bundling is a good idea in many cases. In time, we'll see”.