Tinselcity

This is not really about shuffling. But then again, it is, to some extent. It's also not about algorithms; but again it is a bit about them. It is totally not about pointing fingers at some stranger on the internet. I mean, who cares? Is it about testing? Well, in part it is. But only in a fairly specific sense. It is, I guess, about programming. Not so much about code, yes, I'd say this is about programming…

Programming is hard; let's go shopping

Anyway, some time ago I wrote this piece here showing how to absolutely not shuffle an array in JavaScript. It is an extreme case but a very common one. The wrong knowledge is spread far and wide through the internet. It is tempting. A simple thing, a one liner, it seems to work ok… But it is wrong.

I'm not going to insist on that. I mean, just read the link. Run the examples, notice the problems.

I'll just remind everyone of the conclusion: Don't be tempted by apparently easy solutions. Do make the effort to reach correct solutions.

Ok, so there's a proven algorithm. You search for how to shuffle an array and it does come up. It's there, on WikiPedia. In S.O. too. In various thousand algorithm websites. In most of them you don't just get the algorithm on pseudo-code, but also an explanation and even implementations in various languages.

Engineering is hard; let's go shopping

So… it's there. It's a solved thing. Just grab it and go.

Really.

I mean, sure, if you insist, do write your own implementation… for learning purposes. Yes, for learning it's a nice exercise. Not terribly interesting but nice. It does have its quirks and tricky indexes and whatever.

But, really, this is what engineering is about. If you ever call yourself a software engineer, at work, in your resume, in casual conversations, wherever, then you need to understand this is exactly what engineering is about. It is a solved problem. It is also not your problem. You want to shuffle an array to do some other thing. I'm pretty sure your business model is not shuffling arrays. And so, being a solved problem, you don't need to solve it again. You just take the known solution and apply it. This is the kind of code that you should just copy into your project. Not that 3000-lines boilerplate template which you will need to pick apart, twist, deconstruct and reconstruct again as if you were practising sculpture.

This is engineering. Building on proven solutions to progress further.

Being cool is hard; let's go shopping

And with all that said, you will inevitably find this sort of person in your career. Chances are, you will be that person.

This is someone who, by their own account, has “been programming for almost 20 years”. And what do they do? They see “a pretty non-standard for loop” -which just means that it loops backwards from the end of the array towards the beginning- and they decide to refactor the code.

Notice that there isn't any other reason given for this refactoring other than that. They didn't quite like a loop going backwards. There is no claim of the code being wrong, inefficient, or less performant. Just that, for them, the backward-going loop makes it harder to understand. Ok, the reason is valid to some extent. We should indeed strive to write code which is easier to read so that it is easier to maintain. But then again… We should not be maintaining this code at all! It is correct, it is proven, it has been verified and explained through countless eyes and countless uses in many decades.

In any case, not able to stop themselves from re-writing it in a more fashionable way and brag about it on the internet, they simply do that:

    const myArray = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    for (let i = 0; i < myArray.length-2; i++) {
        const select = i + parseInt(Math.random() * (myArray.length - i)); 
        const swap = myArray[i];
        myArray[i] = myArray[select];
        myArray[select] = swap;
    }
    // myArray is now shuffled

Verifying is hard; let's go shopping

The problem is… the new implementation is broken. Worse, it is subtly broken, like the naive approach using sort but maybe more subtly.

Now, did I just look at that code and see that it was broken? No, I'm not that good. I did however just looked at the code and found it suspicious.

I found it suspicious because I trust the original F-Y implementation. I have used it. I have run tests on it in the past1). I know it works. I know, not just that it apparently works. I know that it does. That a correct implementation is fair and balanced. That it produces uniform shuffles with equal chances for all the results.

I found it suspicious because I never trust some random code that claims to be a refactor of F-Y. I haven't run it. I haven't tested it. I do not know if it is really balanced. Also, if the original reddit commentor tends to distrust a backward loop, I tend to distrust a lot more a loop with myArray.length - 2. Uhm, - 2? Maybe it's right, of course, but… why? And later… i + parseInt(Math.random() * (myArray.length - i)). Uh… is that really right? I mean, maybe it is, but if we're going to rant about things being hard to read… In any case, that index juggling makes me suspect that there may be something going on with the last element in the array.

Anyhow, I found it suspicious but I didn't know it was wrong. How do I know now that it is? Well, I just ran it and saw the results.

And yes, running it and reading the results can mean a couple of different things. The refactoring author most probably ran it too and watched the results. But most probably too, they didn't run it a hundred thousand times, then plotted all the results and looked for the absence of patterns to verify the uniformity.

I did. And it turns out that it does present a problem at the end of the array. In fact, it presents a problem in the last 2 positions of the array. Given a starting array with n items, the last item will both have an unusually high chance of remaining in its last position, and also a zero probability of ending up in the last-but-one position. The last-but-one item in the starting array suffers a similar fate, having most chances of remaining in its position and having zero chance of going into the last position.

Oh, uh, weird

Talking... well, talking is very easy

Talking is easy. Anyone can discuss about this code or that code being easier to read. Anybody can justify a decision to refactor on some more or less acceptable arguments. Getting things “apparently correct” is also fairly easy. I mean, just use items.sort((a,b) ⇒ Math.random() - 0.5), right? Who cares anyway.

Then again, I don't really care much about that. As I said at the beginning this is not about pointing fingers at anyone; that person is a lot of persons. What this is about is about doing things correctly.

Again it is not about going back to that code and running those tests on it. It is not about spotting the subtle bug. It is not about fixing it. It's about never ending up in this situation. It's about engineering. It's about not solving something which has been solved already. About not touching perfectly good code just because you sort of didn't like it.

1)
Just like the one in the first link above