How to send a self-correcting message (Hamming codes)

A discovery-oriented introduction to error correction codes.
Part 2: • Hamming codes part 2, the elegance of…
Ben Eater’s take: • What is error correction? Hamming cod…
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to simply share some of the videos.
Special thanks to these supporters: https://3b1b.co/hamming-thanks

Heavily related is the chessboard puzzle I did with Matt Parker:
   • The almost impossible chessboard puzzle  

You can read Hamming’s own perspective on his discovery of these codes in chapter 12 of “The Art of Doing Science and Engineering”.
https://amzn.to/3lwcnmh
The viewer Harry Li made an interactive on this topic:
https://harryli0088.github.io/hamming

------------------

These animations are largely made using manim, a scrappy open-source python library: https://github.com/3b1b/manim

If you want to check it out, I feel compelled to warn you that it’s not the most well-documented tool, and it has many other quirks you might expect in a library someone wrote with only their own use in mind.

Music by Vincent Rubinetti.
Download the music on Bandcamp:
https://vincerubinetti.bandcamp.com/a…

Stream the music on Spotify:
https://open.spotify.com/album/1dVyjw…

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube, if you want to stay posted on new videos, subscribe: http://3b1b.co/subscribe

Various social media links:
Website: https://www.3blue1brown.com
Twitter: https://twitter.com/3blue1brown
Reddit: https://www.reddit.com/r/3blue1brown
Instagram: https://www.instagram.com/3blue1brown…
Patreon: https://patreon.com/3blue1brown
Facebook: https://www.facebook.com/3blue1brown


Content

4.009 -> Have you ever wondered how it’s possible to scratch a CD or DVD and still have it play
8.78 -> back whatever it’s storing?
10.86 -> The scratch really does affect the 1’s and 0’s on the disc, so it reads off different
15.23 -> data from what was stored, but unless it’s really scratched up, the bits it reads are
20.57 -> decoded into precisely the same file that was encoded on to it, a bit-for-bit copy,
26.1 -> despite all those errors.
27.86 -> There’s a whole pile of mathematical cleverness that allows us to store data, and just as
32.78 -> importantly to transmit it, in a way that’s resilient to errors.
37.05 -> Well, actually, it doesn’t take too much cleverness to come up with a way to do this.
42.199 -> Any file, whether it’s a video, or sound, or text, code, an image, whatever, is ultimately
48.51 -> stored as a sequence of 1’s and 0’s, and a simple strategy to correct any bit that
53.079 -> gets flipped would be to store three copies of each bit.
57.739 -> Then the machine reading the file could compare these three copies and always take the best
62.089 -> two out of three whenever there’s a discrepancy.
67.42 -> But what that means is using two thirds of your space for redundancy.
71.03 -> And even then, for all that space given up, there’s no strong guarantee about what happens
75.65 -> if more than 1 bit gets flipped.
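The triple-copy strategy just described can be sketched in a few lines of Python (a minimal illustration of my own, not code from the video):

```python
# A minimal sketch of the naive repetition scheme: store each bit
# three times, and let the reader take a best-two-out-of-three vote.
def encode_repetition(bits):
    return [b for b in bits for _ in range(3)]

def decode_repetition(stored):
    # Majority vote over each group of three copies.
    return [1 if sum(stored[i:i + 3]) >= 2 else 0
            for i in range(0, len(stored), 3)]

message = [1, 0, 1, 1]
stored = encode_repetition(message)
stored[4] ^= 1  # flip one of the copies in transit
assert decode_repetition(stored) == message
```

Note that two thirds of `stored` is redundancy, and a second flip landing in the same group of three would defeat the vote.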
78.1 -> The much more interesting question is how to make it so that errors can be corrected
81.92 -> while giving up as little space as possible.
84.71 -> For example, using the method you’ll learn about in this video, you could store your
88.18 -> data in 256-bit blocks, where each block uses nine bits –Nine!– to act as a kind of
94.77 -> redundancy, the other 247 bits are free to carry whatever meaningful message or data
99.85 -> you want.
100.85 -> And it will still be the case that if a bit gets flipped, just by looking at this block
105.25 -> and nothing more, a machine will be able to identify that there was an error and precisely
110.24 -> where it was, so that it knows how to correct it.
112.78 -> And honestly, that feels like magic.
115.52 -> And for this particular scheme, if two bits get flipped, the machine will at least be
119.409 -> able to detect that there were two errors, though it won’t know how to fix them.
123.08 -> We’ll talk later about how this scales for blocks of a different size.
128.03 -> Methods that let you correct errors like this are known, reasonably enough, as “error
131.98 -> correction codes.”
133.599 -> For the better part of the last century, this field has been a really rich source of surprisingly
138.189 -> deep math being incorporated into devices we all use every day.
142.969 -> The goal here is to give you a very thorough understanding of one of the earliest examples,
147.72 -> known as a Hamming code.
149.61 -> And by the way, the way I’m thinking about the structure of this video is less about
152.58 -> explaining it as directly as possible, and more a matter of prompting you to invent it
157.15 -> yourself, with a little gentle guidance here and there.
160.2 -> So when you feel like you see where it’s going at some point, take that moment to pause,
163.989 -> actively predict what the scheme is going to be before I tell you.
167.18 -> Also, if you want that understanding to get down to the hardware level, Ben Eater has
171.469 -> made a video in conjunction with this one showing you how to actually implement Hamming
175.42 -> codes on breadboards, which is extremely satisfying.
178.579 -> Now, you should know, Hamming codes are not as widely used as more modern codes, like
182.98 -> the Reed Solomon algorithm.
184.769 -> But there is a certain magic to the contrast between just how impossible the task feels
188.87 -> at the start, and how utterly reasonable it is once you learn about Hamming.
194.2 -> The basic principle of error correction is that in a vast space of all possible messages,
199.049 -> only some subset are going to be considered valid messages.
202.829 -> As an analogy, think of correctly spelled words vs incorrectly spelled words.
209.2 -> Whenever a valid message gets altered, the receiver is responsible for correcting what
213.299 -> they see back to the nearest valid neighbor, as you might do with a typo.
218.359 -> Coming up with a concrete algorithm to efficiently categorize messages like this, though, takes
223.12 -> a certain cleverness.
227.129 -> The story begins in the 1940s, when a young Richard Hamming was working for Bell Labs
231.93 -> and some of his work involved using a very big very expensive punchcard computer that
235.819 -> he had only limited access to.
237.209 -> And the programs he kept putting through it kept failing because every now and then a
241.159 -> bit would get misread.
243.109 -> Frustration being the crucible for invention, he got so fed up that he invented error correction
248.03 -> codes.
249.249 -> There are many different ways to frame Hamming codes, but as a first pass, we're going to
253.159 -> go through it the way that Hamming himself thought about them.
255.84 -> Let’s use an example that’s simple, but not too simple: A block of 16 bits.
261.53 -> We’ll number the positions of these bits from 0 up to 15.
265.66 -> The actual data we want to store is only going to make up 12 of these bits, while 4 of the
270.89 -> positions will be reserved as a kind of redundancy.
273.9 -> The word “redundant” here doesn’t simply mean copy, after all those four bits don’t
278.08 -> give us enough room to blindly copy the data.
280.25 -> Instead, they'll need to be a much more nuanced and clever kind of redundancy, not adding
285.09 -> any new information, but adding resilience.
288.72 -> You might expect these four special bits to come nicely packaged together, maybe at the
292.31 -> end or something like that, but as you’ll see, having them sit in positions which are
296.06 -> powers of 2 allows for something that's really elegant by the end.
299.88 -> It also might give you a little hint about how this scales for larger blocks.
304.73 -> Also, technically it ends up being only 11 bits of data; you’ll find there’s a mild
309.17 -> nuance for what goes on with position 0, but don’t worry about that for now.
314.23 -> Like any error correction algorithm, this will involve two players: A sender who is
318.44 -> responsible for setting these four special bits, and then a receiver who is responsible
322.59 -> for performing some checks and correcting any errors.
326.32 -> Of course, the words “sender” and “receiver” really refer to machines or software that's
329.8 -> doing all these checks, and the idea of a message is meant really broadly, to include
333.79 -> things like storage.
335.46 -> After all, storing data is the same thing as sending a message, just from the past to the future
340.19 -> instead of from one place to another.
342.61 -> So that’s the setup, but before we can dive in, we need to talk about a related idea which
346.91 -> was fresh on Hamming’s mind in the time of his discovery, a method which lets you
351.16 -> detect any single-bit error, but not correct it, known in the business as a “parity
356.4 -> check”.
357.4 -> For a parity check, we separate out only a single bit that the sender is responsible
361.26 -> for tuning, and the rest are free to carry a message.
364.91 -> The only job of this special bit is to make sure the total number of 1’s in the message
370.33 -> is an even number.
372.3 -> For example, right now that total number of 1’s is 7, that's odd, so the sender needs
376.89 -> to flip that special bit to be a 1, making the count even.
380.94 -> But if the block had already started off with an even number of 1’s, then this special
385.29 -> bit would have been kept at a 0.
387.44 -> This is pretty simple, deceptively simple, but it’s an incredibly elegant way to distill
391.63 -> the idea of change anywhere in a message to be reflected in a single bit of information.
397.13 -> Notice, if any bit of this message gets flipped, either from 0 to 1 or 1 to 0, it changes the
403.65 -> total count of 1’s from being even to being odd.
408.1 -> So if you're the receiver, you look at this message, and you see an odd number of 1’s,
411.85 -> you can know, for sure, that some error occurred, even though you might have no idea where it
417.11 -> was.
418.67 -> In the jargon, whether a group of bits has an even or odd number of 1’s is known as
422.59 -> its “parity”.
424.92 -> You could also use numbers and say that the parity is 0 or 1, which is typically more
428.77 -> helpful once you start doing math with the idea.
431.29 -> And this special bit that the sender uses to control the parity is called a “parity
436.13 -> bit”.
437.13 -> And actually, we should be clear, if the receiver sees an odd parity, it doesn't necessarily
442.43 -> mean there was just 1 error.
443.7 -> There might have been 3 errors, or 5, or any other odd number, but they can know for sure
448.42 -> that it wasn’t 0.
449.53 -> On the other hand, if there had been 2 errors, or any even number of errors, that final count
455.11 -> of 1’s would still be even, so the receiver can’t have full confidence that an even
459.55 -> count necessarily means the message is error-free.
463.36 -> You might complain that a method which gets messed up by only two bit flips is pretty
467.17 -> weak, and you would be absolutely right!
469.22 -> Keep in mind, though, there is no method for error detection, or correction, that could
474.33 -> give you 100% confidence that the message you receive is the one the sender intended.
479.73 -> After all, enough random noise can always change one valid message into another valid
483.75 -> message just by pure chance.
485.9 -> Instead, the goal is to come up with a scheme that's robust up to a certain maximum number
490.52 -> of errors or maybe to reduce the probability of a false positive like this.
496.46 -> Parity checks on their own are weak, but by distilling the idea of change across a full
501.31 -> message down to a single bit, what they give us is a powerful building block for more sophisticated
506.73 -> schemes.
507.98 -> For example, as Hamming was searching for a way to identify where an error happened,
512.8 -> not just that it happened, his key insight was that if you apply some parity checks,
517.07 -> not to the full message, but to certain carefully selected subsets, you can ask a more refined
522.09 -> series of questions that pin down the location of any single-bit error.
526.84 -> The overall feeling is a bit like playing a game of 20 questions, asking yes or no queries
531.3 -> that chop the space of possibilities in half.
534.3 -> For example, let's say we do a parity check on just these 8 bits, all the odd-numbered
539.21 -> positions.
540.39 -> Then if an error is detected, it gives the receiver a little more information about where
544.55 -> specifically the error is; namely that it’s in an odd position.
549.14 -> If no error is detected among those 8 bits, it either means there's no error at all, or
554.59 -> it sits somewhere in the even position.
557.22 -> You might think that limiting a parity check to half the bits makes it less effective,
561.54 -> but when it’s done in conjunction with other well-chosen parity checks, it counterintuitively
565.43 -> gives us something more powerful.
568.41 -> To actually set up that parity check, remember, it requires earmarking some special bit that
574.4 -> has control for the parity of that full group.
577.56 -> Here let's just choose position number 1.
579.54 -> So for the example shown, the parity of these 8 bits is currently odd, so the sender is
583.94 -> responsible for toggling that parity bit, and now it's even.
588.2 -> This is only one out of four parity checks we’ll do.
591 -> The second check is going to be among the 8 bits on the right half of the grid, at least
595.06 -> as we've drawn it here.
596.86 -> This time, we might use position number 2 as a parity bit.
599.85 -> So these 8 bits already have an even parity, and the sender can feel good leaving that
604.46 -> bit number 2 unchanged.
606.6 -> Then, on the other end, if the receiver checks the parity of this group and they find that
611 -> it’s odd, they’ll know that the error is somewhere among these 8 bits on the right.
615.6 -> Otherwise, it means either there's no error, or the error is somewhere on the left half.
620.5 -> Or I guess there could have been two errors, but for right now we’re going to assume
624.18 -> that there’s at most one error in the entire block; things break down completely for more
628.35 -> than that.
629.35 -> Here, before looking at the next two checks, take a moment to think about what these first
632.89 -> two allow us to do when you consider them together.
635.29 -> Let's say you detect an error among the odd columns, and among the right half.
640.23 -> It necessarily means the error is somewhere in the last column.
643.9 -> If there was no error in the odd column, but there was one in the right half, well that
648.06 -> tells you it's in the second to last column.
650.14 -> Likewise, if there is an error in the odd columns but not the right half, you know that
654.7 -> it's somewhere in the second column.
656.07 -> And then, if neither of those two parity checks detects anything, it means the only place
660.79 -> an error could be is in that leftmost column, but it also might simply mean there’s no
665.2 -> error at all.
666.46 -> Which is all a rather belabored way to say that two parity checks let us pin down the
670.54 -> column.
671.62 -> From here, you can probably guess what follows.
673.98 -> We do basically the same thing, but for the rows.
676.26 -> There’s going to be a parity check on the odd rows, using position 4 as a parity bit.
681.07 -> So in this example, that group already has even parity, so bit 4 would be set to a 0.
686.6 -> And finally there's a parity check on the bottom two rows, using position 8 as a parity
691.25 -> bit.
692.25 -> In this case, it looks like the sender needs to turn bit 8 on in order to give that
695.85 -> group an even parity.
697.6 -> Just as the first two checks let us pin down the column, these next two let you pin down
701.32 -> the row.
702.96 -> As an example, imagine that during transmission there’s an error at, say position 3.
707.92 -> Well, this affects the first parity group, and it also affects the second parity group,
712.22 -> so the receiver knows that there's an error somewhere in that right column.
716.2 -> But it doesn’t affect the third group, and it doesn't affect the fourth group.
721.35 -> And that lets the receiver pinpoint the error up to the first row, which necessarily means
725.44 -> position 3, so they can fix the error.
728.62 -> You might enjoy taking a moment to convince yourself that the answers to these four questions
732.6 -> really will always let you pin down a specific location, no matter where they turn out to
736.839 -> be.
737.839 -> In fact, the astute among you might even notice a connection between these questions and binary
742.88 -> counting.
743.88 -> And if you do, again let me emphasize, pause, try for yourself to draw the connection before
747.81 -> I spoil it.
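For those who want to check that connection concretely, here is a small Python sketch (my own illustration): check k covers exactly the positions whose k-th binary digit is 1, so reading the four yes/no answers as binary digits names a position.

```python
# The four parity checks, each over the positions whose k-th binary
# digit is 1 (odd columns, right half, odd rows, bottom half).
def four_checks(block):
    return [sum(block[i] for i in range(16) if (i >> k) & 1) % 2
            for k in range(4)]

# Start from a block whose four subset parities are all even
# (all zeros is the simplest such block), then flip position 3.
block = [0] * 16
block[3] ^= 1
answers = four_checks(block)

# Interpreting the answers as binary digits pins down the position.
error_position = sum(bit << k for k, bit in enumerate(answers))
assert error_position == 3
```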
750.85 -> If you’re wondering what happens if a parity bit itself gets affected…well you can just
755.53 -> try it.
756.55 -> Take a moment to think about how any error among these four special bits is going to
760.7 -> be tracked down just like any other, with the same game of 4 questions.
767.43 -> It doesn’t really matter, since at the end of the day what we want is to protect the
770.49 -> message bits, the error correction bits are just riding along, but protecting those bits
774.55 -> as well is something that naturally falls out of this scheme as a byproduct.
779.24 -> You might also enjoy anticipating how this scales.
782.37 -> If we used a block of size 256 bits, for example, in order to pin down a location, you need
788.029 -> only 8 yes or no questions to binary search your way down to some specific spot.
795.72 -> And remember, each question requires giving up only a single bit to set the appropriate
799.7 -> parity check.
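The arithmetic behind the 256-bit example quoted at the start works out as follows (a quick check of my own, counting the whole-block parity bit described later among the redundant bits):

```python
# Scaling check: a block of 2**n bits needs n yes/no parity questions
# to binary-search down to one position, plus one more bit for the
# whole-block parity check, giving n + 1 redundant bits in total.
n = 8
block_size = 2 ** n            # 256
redundant = n + 1              # the "nine bits" quoted at the start
assert block_size == 256 and redundant == 9
assert block_size - redundant == 247  # bits left free for the message
```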
803.35 -> Some of you may already see it, but we’ll talk later about the systematic way to find
806.96 -> what these questions are in just a minute or two.
809.77 -> Hopefully, this sketch is enough to appreciate the efficiency of what we’re developing
813.46 -> here.
814.48 -> Everything except for those 8 highlighted parity bits, can be whatever you want it to
818.51 -> be, carrying whatever message or data you want.
821.73 -> The 8 bits are “redundant” in the sense that they’re completely determined by the
825.27 -> rest of the message, but it’s in a much smarter way than simply copying the message
829.26 -> as a whole.
833.65 -> And still, for so little given up, you would be able to identify and fix any single bit
839.07 -> error.
840.07 -> Well...almost.
841.07 -> Okay, so, the one problem here is that if none of the four parity checks detect an error,
845.98 -> meaning that the specially selected subsets of eight bits all have even parities just
850.11 -> like the sender intended, then it either means there was no error at all, or it narrows us
855.61 -> down into position 0.
857.75 -> You see, with four yes or no questions, we have 16 possible outcomes for our parity checks.
862.77 -> And at first, that feels perfect for pinpointing 1 of the 16 positions in the block.
867.91 -> But you also need to communicate a 17th outcome, the “no error” condition.
873 -> The solution here is actually pretty simple: Just forget about the 0th bit entirely.
877.87 -> So when we do our four parity checks, and we see that they're all even, it unambiguously
881.92 -> means that there is no error.
884.32 -> What that means, is that rather than working with a 16-bit block, work with a 15-bit block,
889.24 -> where 11 of the bits are free to carry a message, and 4 of them are there for redundancy.
894.05 -> And with that, we now have what people in the business would refer to as a “(15, 11)
898.36 -> Hamming code”.
899.91 -> That said, it is nice to have a block size that's a clean power of 2, and there’s a clever
904.12 -> way that we can keep that 0th bit around and get it to do a little extra work for us.
908.85 -> If we use it as a parity bit across the whole block, it lets us actually detect, even though
913.55 -> we can't correct, 2-bit errors.
915.67 -> Here’s how it works.
917.41 -> After setting those four special error-correcting bits, we set that 0th bit so that the parity
921.709 -> of the full block is even, just like a normal parity check.
925.66 -> Now if there’s a single-bit error, then the parity of the full block toggles to be
929.741 -> odd, but we would catch that anyway thanks to our four error-correcting checks.
933.97 -> However, if there are two errors, then the overall parity is going to toggle back to
938.06 -> being even, but the receiver will still see that there’s been at least some error because
942.62 -> of what's going on with the four usual parity checks.
945.79 -> So if they notice an even parity overall, but something nonzero happening with the other
950.24 -> checks, it tells them there were at least two errors.
952.86 -> Isn’t that clever?
954.41 -> Even though we can’t correct those two-bit errors, just by putting that one little bothersome
958.63 -> 0th bit to work, it lets us detect them.
962.33 -> This is pretty standard, it's known as an “extended” Hamming code.
966.81 -> Technically speaking, you now have a full description of what a Hamming code does, at
970.86 -> least for the example of a 16-bit block.
973.58 -> But I think you'll find it more satisfying to check your understanding and solidify everything
977.3 -> up to this point by doing one full example from start to finish yourself.
981.399 -> I’ll step through it with you, though, so you can check yourself.
984.99 -> To set up a message, whether that’s a literal message that you’re transmitting over space,
989.18 -> or some data that you want to store over time, the first step is to divide it up into 11-bit
994.649 -> chunks.
995.68 -> Each chunk is going to get packaged into an error-resistant 16-bit block, so let’s take
1000.77 -> this one as an example and actually work it out.
1003.269 -> Go ahead, actually do it, pause and try putting together this block
1006.74 -> ....okay, you ready?
1011.07 -> Remember, position 0 along with all the powers of 2 are reserved for error-correction duty,
1019.21 -> so you start by placing the message bits in all of the remaining spots, in order.
1025.709 -> You need this group to have even parity, which it already does, so you should have set the
1030.13 -> parity bit in position 1 to be a 0.
1033.14 -> The next group starts off with odd parity, so you should have set its parity bit to 1.
1039.28 -> The group after that starts with odd parity, so again you should have set its parity
1043.549 -> bit to 1.
1044.549 -> And the final group also has odd parity, meaning we set that bit in position 8 to be a 1.
1051.5 -> And then, as the final step, the full block now has even parity, meaning that you can
1056.3 -> set that bit number 0, the overarching parity bit, to be 0.
1061.49 -> So as this block is sent off, the parity of the four special subsets, and the block as
1065.76 -> a whole, will all be even, or zero.
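The sender's procedure just walked through can be collected into one function (a sketch of my own, assuming the block is a list indexed 0 through 15 with positions 0, 1, 2, 4, and 8 reserved):

```python
# Encode 11 message bits into an extended (16-bit) Hamming block.
def hamming_encode(message):
    assert len(message) == 11
    block = [0] * 16
    # Fill the non-reserved positions with the message, in order.
    data_positions = [i for i in range(1, 16) if i not in (1, 2, 4, 8)]
    for pos, bit in zip(data_positions, message):
        block[pos] = bit
    # Set each parity bit so its subset (the positions whose k-th
    # binary digit is 1) ends up with even parity.
    for k in range(4):
        p = 1 << k
        block[p] = sum(block[i] for i in range(16)
                       if (i >> k) & 1 and i != p) % 2
    # Finally, bit 0 makes the parity of the whole block even.
    block[0] = sum(block[1:]) % 2
    return block

block = hamming_encode([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
assert sum(block) % 2 == 0  # whole-block parity is even
for k in range(4):          # and so is every subset parity
    assert sum(block[i] for i in range(16) if (i >> k) & 1) % 2 == 0
```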
1068.94 -> As the second part of the exercise, let’s have you play the role of the receiver.
1073.61 -> Of course, that would mean you don’t already know what this message is; maybe some of you
1077.59 -> memorized it but, let’s assume that you haven’t.
1080.16 -> What I’m going to do is change either zero, one or two of the bits in that block, and
1085.32 -> then ask you to figure out what it is that I did.
1088.12 -> So, again, pause and try working it out.
1095.62 -> ...okay, so you as the receiver now check the first parity group and you can see that
1103.88 -> it’s even, so any error that exists would have to be in an even column.
1109.799 -> The next check gives us an odd number, telling us both that there’s at least one error,
1114.45 -> and narrowing us down into this specific column.
1118.73 -> The third check is even, chopping down the possibilities even further.
1121.95 -> And the last parity check is odd, telling us there’s an error somewhere in the bottom,
1126.78 -> which by now we can see must be in position number 10.
1130.49 -> What’s more, the parity of the whole block is odd, giving us confidence that there was
1136.11 -> one flip and not two (if it’s three or more, all bets are off).
1141.57 -> After correcting that bit number 10, pulling out the 11 bits that were not used for correction
1146.49 -> gives us the relevant segment of the original message, which if you rewind and compare,
1151.41 -> is indeed exactly what we started the example with.
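Having done the checks by hand, here is one compact way they can be carried out in code (a sketch of my own, not necessarily the exact form shown in part 2): XOR-ing together the positions of all the 1s in the block performs all four checks at once, because binary digit k of that XOR is exactly the parity of check k.

```python
from functools import reduce
from operator import xor

# XOR the positions of all the 1s: digit k of the result is the
# parity of subset check k, so for a single flip in a correctly
# encoded block the result IS the error position (0 means either
# "no error" or position 0; the whole-block parity disambiguates).
def find_error_position(block):
    return reduce(xor, [i for i, bit in enumerate(block) if bit], 0)

# A valid block has all subset parities even (all zeros is the
# simplest example); flip one bit and the receiver recovers where.
block = [0] * 16
block[10] ^= 1
assert find_error_position(block) == 10
```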
1156.12 -> And now that you know how to do all this by hand, I’d like to show you how you can carry
1159.23 -> out the core part of all of this logic with a single line of python code.
1163.91 -> You see, what I haven’t told you yet is just how elegant this algorithm really is,
1168.26 -> how simple it is to get a machine to point to the position of an error, how to systematically
1172.35 -> scale it, and how we can frame all of this as one single operation rather than multiple
1177.65 -> separate checks.
1179.549 -> To see what I mean, come join me in part 2.

Source: https://www.youtube.com/watch?v=X8jsijhllIA