How to send a self-correcting message (Hamming codes)

A discovery-oriented introduction to error correction codes.
Part 2: • Hamming codes part 2, the elegance of…
Ben Eater’s take: • What is error correction? Hamming cod…
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to simply share some of the videos.
Special thanks to these supporters: https://3b1b.co/hamming-thanks

Heavily related is the chessboard puzzle I did with Matt Parker:
   • The almost impossible chessboard puzzle  

You can read Hamming’s own perspective on his discovery of these codes in chapter 12 of “The Art of Doing Science and Engineering”.
https://amzn.to/3lwcnmh
The viewer Harry Li made an interactive on this topic:
https://harryli0088.github.io/hamming

------------------

These animations are largely made using manim, a scrappy open-source python library: https://github.com/3b1b/manim

If you want to check it out, I feel compelled to warn you that it’s not the most well-documented tool, and it has many other quirks you might expect in a library someone wrote with only their own use in mind.

Music by Vincent Rubinetti.
Download the music on Bandcamp:
https://vincerubinetti.bandcamp.com/a…

Stream the music on Spotify:
https://open.spotify.com/album/1dVyjw…

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube, if you want to stay posted on new videos, subscribe: http://3b1b.co/subscribe

Various social media links:
Website: https://www.3blue1brown.com
Twitter: https://twitter.com/3blue1brown
Reddit: https://www.reddit.com/r/3blue1brown
Instagram: https://www.instagram.com/3blue1brown…
Patreon: https://patreon.com/3blue1brown
Facebook: https://www.facebook.com/3blue1brown


Content

4.009 -> Have you ever wondered how it’s possible to scratch a CD or DVD and still have it play
8.78 -> back whatever it’s storing?
10.86 -> The scratch really does affect the 1’s and 0’s on the disc, so it reads off different
15.23 -> data from what was stored, but unless it’s really scratched up, the bits it reads are
20.57 -> decoded into precisely the same file that was encoded on to it, a bit-for-bit copy,
26.1 -> despite all those errors.
27.86 -> There’s a whole pile of mathematical cleverness that allows us to store data, and just as
32.78 -> importantly to transmit it, in a way that’s resilient to errors.
37.05 -> Well, actually, it doesn’t take too much cleverness to come up with a way to do this.
42.199 -> Any file, whether it’s a video, or sound, or text, code, an image, whatever, is ultimately
48.51 -> stored as a sequence of 1’s and 0’s, and a simple strategy to correct any bit that
53.079 -> gets flipped would be to store three copies of each bit.
57.739 -> Then the machine reading the file could compare these three copies and always take the best
62.089 -> two out of three whenever there’s a discrepancy.
67.42 -> But what that means is using two thirds of your space for redundancy.
71.03 -> And even then, for all that space given up, there’s no strong guarantee about what happens
75.65 -> if more than 1 bit gets flipped.
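The triple-copy strategy just described can be sketched in a few lines of Python (a minimal illustration of my own, not code from the video):

```python
# A minimal sketch of the naive repetition scheme: store each bit
# three times, and let the reader take a best-two-out-of-three vote.
def encode_repetition(bits):
    return [b for b in bits for _ in range(3)]

def decode_repetition(stored):
    # Majority vote over each group of three copies.
    return [1 if sum(stored[i:i + 3]) >= 2 else 0
            for i in range(0, len(stored), 3)]

message = [1, 0, 1, 1]
stored = encode_repetition(message)
stored[4] ^= 1  # flip one of the copies in transit
assert decode_repetition(stored) == message
```

Note that two thirds of `stored` is redundancy, and a second flip landing in the same group of three would defeat the vote.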
78.1 -> The much more interesting question is how to make it so that errors can be corrected
81.92 -> while giving up as little space as possible.
84.71 -> For example, using the method you’ll learn about in this video, you could store your
88.18 -> data in 256-bit blocks, where each block uses nine bits –Nine!– to act as a kind of
94.77 -> redundancy, the other 247 bits are free to carry whatever meaningful message or data
99.85 -> you want.
100.85 -> And it will still be the case that if a bit gets flipped, just by looking at this block
105.25 -> and nothing more, a machine will be able to identify that there was an error and precisely
110.24 -> where it was, so that it knows how to correct it.
112.78 -> And honestly, that feels like magic.
115.52 -> And for this particular scheme, if two bits get flipped, the machine will at least be
119.409 -> able to detect that there were two errors, though it won’t know how to fix them.
123.08 -> We’ll talk later about how this scales for blocks of a different size.
128.03 -> Methods that let you correct errors like this are known, reasonably enough, as “error
131.98 -> correction codes.”
133.599 -> For the better part of the last century, this field has been a really rich source of surprisingly
138.189 -> deep math being incorporated into devices we all use every day.
142.969 -> The goal here is to give you a very thorough understanding of one of the earliest examples,
147.72 -> known as a Hamming code.
149.61 -> And by the way, the way I’m thinking about the structure of this video is less about
152.58 -> explaining it as directly as possible, and more a matter of prompting you to invent it
157.15 -> yourself, with a little gentle guidance here and there.
160.2 -> So when you feel like you see where it’s going at some point, take that moment to pause,
163.989 -> actively predict what the scheme is going to be before I tell you.
167.18 -> Also, if you want that understanding to get down to the hardware level, Ben Eater has
171.469 -> made a video in conjunction with this one showing you how to actually implement Hamming
175.42 -> codes on breadboards, which is extremely satisfying.
178.579 -> Now, you should know, Hamming codes are not as widely used as more modern codes, like
182.98 -> the Reed Solomon algorithm.
184.769 -> But there is a certain magic to the contrast between just how impossible the task feels
188.87 -> at the start, and how utterly reasonable it is once you learn about Hamming.
194.2 -> The basic principle of error correction is that in a vast space of all possible messages,
199.049 -> only some subset are going to be considered valid messages.
202.829 -> As an analogy, think of correctly spelled words vs incorrectly spelled words.
209.2 -> Whenever a valid message gets altered, the receiver is responsible for correcting what
213.299 -> they see back to the nearest valid neighbor, as you might do with a typo.
218.359 -> Coming up with a concrete algorithm to efficiently categorize messages like this, though, takes
223.12 -> a certain cleverness.
227.129 -> The story begins in the 1940s, when a young Richard Hamming was working for Bell Labs
231.93 -> and some of his work involved using a very big very expensive punchcard computer that
235.819 -> he had only limited access to.
237.209 -> And the programs he kept putting through it kept failing because every now and then a
241.159 -> bit would get misread.
243.109 -> Frustration being the crucible for invention, he got so fed up that he invented error correction
248.03 -> codes.
249.249 -> There are many different ways to frame Hamming codes, but as a first pass, we're going to
253.159 -> go through it the way that Hamming himself thought about them.
255.84 -> Let’s use an example that’s simple, but not too simple: A block of 16 bits.
261.53 -> We’ll number the positions of these bits from 0 up to 15.
265.66 -> The actual data we want to store is only going to make up 12 of these bits, while 4 of the
270.89 -> positions will be reserved as a kind of redundancy.
273.9 -> The word “redundant” here doesn’t simply mean copy, after all those four bits don’t
278.08 -> give us enough room to blindly copy the data.
280.25 -> Instead, they'll need to be a much more nuanced and clever kind of redundancy, not adding
285.09 -> any new information, but adding resilience.
288.72 -> You might expect these four special bits to come nicely packaged together, maybe at the
292.31 -> end or something like that, but as you’ll see, having them sit in positions which are
296.06 -> powers of 2 allows for something that's really elegant by the end.
299.88 -> It also might give you a little hint about how this scales for larger blocks.
304.73 -> Also, technically it ends up being only 11 bits of data; you’ll find there’s a mild
309.17 -> nuance for what goes on with position 0, but don’t worry about that for now.
314.23 -> Like any error correction algorithm, this will involve two players: A sender who is
318.44 -> responsible for setting these four special bits, and then a receiver who is responsible
322.59 -> for performing some checks and correcting any errors.
326.32 -> Of course, the words “sender” and “receiver” really refer to machines or software that's
329.8 -> doing all these checks, and the idea of a message is meant really broadly, to include
333.79 -> things like storage.
335.46 -> After all, storing data is the same thing as sending a message, just from the past to the future
340.19 -> instead of from one place to another.
342.61 -> So that’s the setup, but before we can dive in, we need to talk about a related idea which
346.91 -> was fresh on Hamming’s mind in the time of his discovery, a method which lets you
351.16 -> detect any single-bit error, but not correct it, known in the business as a “parity
356.4 -> check”.
357.4 -> For a parity check, we separate out only a single bit that the sender is responsible
361.26 -> for tuning, and the rest are free to carry a message.
364.91 -> The only job of this special bit is to make sure the total number of 1’s in the message
370.33 -> is an even number.
372.3 -> For example, right now that total number of 1’s is 7, that's odd, so the sender needs
376.89 -> to flip that special bit to be a 1, making the count even.
380.94 -> But if the block had already started off with an even number of 1’s, then this special
385.29 -> bit would have been kept at a 0.
387.44 -> This is pretty simple, deceptively simple, but it’s an incredibly elegant way to distill
391.63 -> the idea of change anywhere in a message to be reflected in a single bit of information.
397.13 -> Notice, if any bit of this message gets flipped, either from 0 to 1 or 1 to 0, it changes the
403.65 -> total count of 1’s from being even to being odd.
408.1 -> So if you're the receiver, you look at this message, and you see an odd number of 1’s,
411.85 -> you can know, for sure, that some error occurred, even though you might have no idea where it
417.11 -> was.
418.67 -> In the jargon, whether a group of bits has an even or odd number of 1’s is known as
422.59 -> its “parity”.
424.92 -> You could also use numbers and say that the parity is 0 or 1, which is typically more
428.77 -> helpful once you start doing math with the idea.
431.29 -> And this special bit that the sender uses to control the parity is called a “parity
436.13 -> bit”.
437.13 -> And actually, we should be clear, if the receiver sees an odd parity, it doesn't necessarily
442.43 -> mean there was just 1 error.
443.7 -> There might have been 3 errors, or 5, or any other odd number, but they can know for sure
448.42 -> that it wasn’t 0.
449.53 -> On the other hand, if there had been 2 errors, or any even number of errors, that final count
455.11 -> of 1’s would still be even, so the receiver can’t have full confidence that an even
459.55 -> count necessarily means the message is error-free.
463.36 -> You might complain that a method which gets messed up by only two bit flips is pretty
467.17 -> weak, and you would be absolutely right!
469.22 -> Keep in mind, though, there is no method for error detection, or correction, that could
474.33 -> give you 100% confidence that the message you receive is the one the sender intended.
479.73 -> After all, enough random noise can always change one valid message into another valid
483.75 -> message just by pure chance.
485.9 -> Instead, the goal is to come up with a scheme that's robust up to a certain maximum number
490.52 -> of errors or maybe to reduce the probability of a false positive like this.
496.46 -> Parity checks on their own are weak, but by distilling the idea of change across a full
501.31 -> message down to a single bit, what they give us is a powerful building block for more sophisticated
506.73 -> schemes.
507.98 -> For example, as Hamming was searching for a way to identify where an error happened,
512.8 -> not just that it happened, his key insight was that if you apply some parity checks,
517.07 -> not to the full message, but to certain carefully selected subsets, you can ask a more refined
522.09 -> series of questions that pin down the location of any single-bit error.
526.84 -> The overall feeling is a bit like playing a game of 20 questions, asking yes or no queries
531.3 -> that chop the space of possibilities in half.
534.3 -> For example, let's say we do a parity check on just these 8 bits, all the odd-numbered
539.21 -> positions.
540.39 -> Then if an error is detected, it gives the receiver a little more information about where
544.55 -> specifically the error is; namely that it’s in an odd position.
549.14 -> If no error is detected among those 8 bits, it either means there's no error at all, or
554.59 -> it sits somewhere in the even position.
557.22 -> You might think that limiting a parity check to half the bits makes it less effective,
561.54 -> but when it’s done in conjunction with other well-chosen parity checks, it counterintuitively
565.43 -> gives us something more powerful.
568.41 -> To actually set up that parity check, remember, it requires earmarking some special bit that
574.4 -> has control for the parity of that full group.
577.56 -> Here let's just choose position number 1.
579.54 -> So for the example shown, the parity of these 8 bits is currently odd, so the sender is
583.94 -> responsible for toggling that parity bit, and now it's even.
588.2 -> This is only one out of four parity checks we’ll do.
591 -> The second check is going to be among the 8 bits on the right half of the grid, at least
595.06 -> as we've drawn it here.
596.86 -> This time, we might use position number 2 as a parity bit.
599.85 -> So these 8 bits already have an even parity, and the sender can feel good leaving that
604.46 -> bit number 2 unchanged.
606.6 -> Then, on the other end, if the receiver checks the parity of this group and they find that
611 -> it’s odd, they’ll know that the error is somewhere among these 8 bits on the right.
615.6 -> Otherwise, it means either there's no error, or the error is somewhere on the left half.
620.5 -> Or I guess there could have been two errors, but for right now we’re going to assume
624.18 -> that there’s at most one error in the entire block; things break down completely for more
628.35 -> than that.
629.35 -> Here, before looking at the next two checks, take a moment to think about what these first
632.89 -> two allow us to do when you consider them together.
635.29 -> Let's say you detect an error among the odd columns, and among the right half.
640.23 -> It necessarily means the error is somewhere in the last column.
643.9 -> If there was no error in the odd column, but there was one in the right half, well that
648.06 -> tells you it's in the second to last column.
650.14 -> Likewise, if there is an error in the odd columns but not the right half, you know that
654.7 -> it's somewhere in the second column.
656.07 -> And then, if neither of those two parity checks detects anything, it means the only place
660.79 -> an error could be is in that leftmost column, but it also might simply mean there’s no
665.2 -> error at all.
666.46 -> Which is all a rather belabored way to say that two parity checks let us pin down the
670.54 -> column.
671.62 -> From here, you can probably guess what follows.
673.98 -> We do basically the same thing, but for the rows.
676.26 -> There’s going to be a parity check on the odd rows, using position 4 as a parity bit.
681.07 -> So in this example, that group already has even parity, so bit 4 would be set to a 0.
686.6 -> And finally there's a parity check on the bottom two rows, using position 8 as a parity
691.25 -> bit.
692.25 -> In this case, it looks like the sender needs to turn bit 8 on in order to give that
695.85 -> group an even parity.
697.6 -> Just as the first two checks let us pin down the column, these next two let you pin down
701.32 -> the row.
702.96 -> As an example, imagine that during transmission there’s an error at, say position 3.
707.92 -> Well, this affects the first parity group, and it also affects the second parity group,
712.22 -> so the receiver knows that there's an error somewhere in that right column.
716.2 -> But it doesn’t affect the third group, and it doesn't affect the fourth group.
721.35 -> And that lets the receiver pinpoint the error up to the first row, which necessarily means
725.44 -> position 3, so they can fix the error.
728.62 -> You might enjoy taking a moment to convince yourself that the answers to these four questions
732.6 -> really will always let you pin down a specific location, no matter where they turn out to
736.839 -> be.
737.839 -> In fact, the astute among you might even notice a connection between these questions and binary
742.88 -> counting.
743.88 -> And if you do, again let me emphasize, pause, try for yourself to draw the connection before
747.81 -> I spoil it.
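For those who want to check that connection concretely, here is a small Python sketch (my own illustration): check k covers exactly the positions whose k-th binary digit is 1, so reading the four yes/no answers as binary digits names a position.

```python
# The four parity checks, each over the positions whose k-th binary
# digit is 1 (odd columns, right half, odd rows, bottom half).
def four_checks(block):
    return [sum(block[i] for i in range(16) if (i >> k) & 1) % 2
            for k in range(4)]

# Start from a block whose four subset parities are all even
# (all zeros is the simplest such block), then flip position 3.
block = [0] * 16
block[3] ^= 1
answers = four_checks(block)

# Interpreting the answers as binary digits pins down the position.
error_position = sum(bit << k for k, bit in enumerate(answers))
assert error_position == 3
```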
750.85 -> If you’re wondering what happens if a parity bit itself gets affected…well you can just
755.53 -> try it.
756.55 -> Take a moment to think about how any error among these four special bits is going to
760.7 -> be tracked down just like any other, with the same game of 4 questions.
767.43 -> It doesn’t really matter, since at the end of the day what we want is to protect the
770.49 -> message bits, the error correction bits are just riding along, but protecting those bits
774.55 -> as well is something that naturally falls out of this scheme as a byproduct.
779.24 -> You might also enjoy anticipating how this scales.
782.37 -> If we used a block of size 256 bits, for example, in order to pin down a location, you need
788.029 -> only 8 yes or no questions to binary search your way down to some specific spot.
795.72 -> And remember, each question requires giving up only a single bit to set the appropriate
799.7 -> parity check.
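The arithmetic behind the 256-bit example quoted at the start works out as follows (a quick check of my own, counting the whole-block parity bit described later among the redundant bits):

```python
# Scaling check: a block of 2**n bits needs n yes/no parity questions
# to binary-search down to one position, plus one more bit for the
# whole-block parity check, giving n + 1 redundant bits in total.
n = 8
block_size = 2 ** n            # 256
redundant = n + 1              # the "nine bits" quoted at the start
assert block_size == 256 and redundant == 9
assert block_size - redundant == 247  # bits left free for the message
```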
803.35 -> Some of you may already see it, but we’ll talk later about the systematic way to find
806.96 -> what these questions are in just a minute or two.
809.77 -> Hopefully, this sketch is enough to appreciate the efficiency of what we’re developing
813.46 -> here.
814.48 -> Everything except for those 8 highlighted parity bits, can be whatever you want it to
818.51 -> be, carrying whatever message or data you want.
821.73 -> The 8 bits are “redundant” in the sense that they’re completely determined by the
825.27 -> rest of the message, but it’s in a much smarter way than simply copying the message
829.26 -> as a whole.
833.65 -> And still, for so little given up, you would be able to identify and fix any single bit
839.07 -> error.
840.07 -> Well...almost.
841.07 -> Okay, so, the one problem here is that if none of the four parity checks detect an error,
845.98 -> meaning that the specially selected subsets of eight bits all have even parities just
850.11 -> like the sender intended, then it either means there was no error at all, or it narrows us
855.61 -> down into position 0.
857.75 -> You see, with four yes or no questions, we have 16 possible outcomes for our parity checks.
862.77 -> And at first, that feels perfect for pinpointing 1 of the 16 positions in the block.
867.91 -> But you also need to communicate a 17th outcome, the “no error” condition.
873 -> The solution here is actually pretty simple: Just forget about the 0th bit entirely.
877.87 -> So when we do our four parity checks, and we see that they're all even, it unambiguously
881.92 -> means that there is no error.
884.32 -> What that means, is that rather than working with a 16-bit block, work with a 15-bit block,
889.24 -> where 11 of the bits are free to carry a message, and 4 of them are there for redundancy.
894.05 -> And with that, we now have what people in the business would refer to as a “(15, 11)
898.36 -> Hamming code”.
899.91 -> That said, it is nice to have a block size that's a clean power of 2, and there’s a clever
904.12 -> way that we can keep that 0th bit around and get it to do a little extra work for us.
908.85 -> If we use it as a parity bit across the whole block, it lets us actually detect, even though
913.55 -> we can't correct, 2-bit errors.
915.67 -> Here’s how it works.
917.41 -> After setting those four special error-correcting bits, we set that 0th bit so that the parity
921.709 -> of the full block is even, just like a normal parity check.
925.66 -> Now if there’s a single-bit error, then the parity of the full block toggles to be
929.741 -> odd, but we would catch that anyway thanks to our four error-correcting checks.
933.97 -> However, if there are two errors, then the overall parity is going to toggle back to
938.06 -> being even, but the receiver will still see that there’s been at least some error because
942.62 -> of what's going on with the four usual parity checks.
945.79 -> So if they notice an even parity overall, but something nonzero happening with the other
950.24 -> checks, it tells them there were at least two errors.
952.86 -> Isn’t that clever?
954.41 -> Even though we can’t correct those two-bit errors, just by putting that one little bothersome
958.63 -> 0th bit to work, it lets us detect them.
962.33 -> This is pretty standard, it's known as an “extended” Hamming code.
966.81 -> Technically speaking, you now have a full description of what a Hamming code does, at
970.86 -> least for the example of a 16-bit block.
973.58 -> But I think you'll find it more satisfying to check your understanding and solidify everything
977.3 -> up to this point by doing one full example from start to finish yourself.
981.399 -> I’ll step through it with you, though, so you can check yourself.
984.99 -> To set up a message, whether that’s a literal message that you’re transmitting over space,
989.18 -> or some data that you want to store over time, the first step is to divide it up into 11-bit
994.649 -> chunks.
995.68 -> Each chunk is going to get packaged into an error-resistant 16-bit block, so let’s take
1000.77 -> this one as an example and actually work it out.
1003.269 -> Go ahead, actually do it, pause and try putting together this block
1006.74 -> ....okay, you ready?
1011.07 -> Remember, position 0 along with all the powers of 2 are reserved for error-correction duty,
1019.21 -> so you start by placing the message bits in all of the remaining spots, in order.
1025.709 -> You need this group to have even parity, which it already does, so you should have set the
1030.13 -> parity bit in position 1 to be a 0.
1033.14 -> The next group starts off with odd parity, so you should have set its parity bit to 1.
1039.28 -> The group after that starts with odd parity, so again you should have set its parity
1043.549 -> bit to 1.
1044.549 -> And the final group also has odd parity, meaning we set that bit in position 8 to be a 1.
1051.5 -> And then, as the final step, the full block now has even parity, meaning that you can
1056.3 -> set that bit number 0, the overarching parity bit, to be 0.
1061.49 -> So as this block is sent off, the parity of the four special subsets, and the block as
1065.76 -> a whole, will all be even, or zero.
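The sender's procedure just walked through can be collected into one function (a sketch of my own, assuming the block is a list indexed 0 through 15 with positions 0, 1, 2, 4, and 8 reserved):

```python
# Encode 11 message bits into an extended (16-bit) Hamming block.
def hamming_encode(message):
    assert len(message) == 11
    block = [0] * 16
    # Fill the non-reserved positions with the message, in order.
    data_positions = [i for i in range(1, 16) if i not in (1, 2, 4, 8)]
    for pos, bit in zip(data_positions, message):
        block[pos] = bit
    # Set each parity bit so its subset (the positions whose k-th
    # binary digit is 1) ends up with even parity.
    for k in range(4):
        p = 1 << k
        block[p] = sum(block[i] for i in range(16)
                       if (i >> k) & 1 and i != p) % 2
    # Finally, bit 0 makes the parity of the whole block even.
    block[0] = sum(block[1:]) % 2
    return block

block = hamming_encode([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
assert sum(block) % 2 == 0  # whole-block parity is even
for k in range(4):          # and so is every subset parity
    assert sum(block[i] for i in range(16) if (i >> k) & 1) % 2 == 0
```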
1068.94 -> As the second part of the exercise, let’s have you play the role of the receiver.
1073.61 -> Of course, that would mean you don’t already know what this message is; maybe some of you
1077.59 -> memorized it but, let’s assume that you haven’t.
1080.16 -> What I’m going to do is change either zero, one or two of the bits in that block, and
1085.32 -> then ask you to figure out what it is that I did.
1088.12 -> So, again, pause and try working it out.
1095.62 -> ...okay, so you as the receiver now check the first parity group and you can see that
1103.88 -> it’s even, so any error that exists would have to be in an even column.
1109.799 -> The next check gives us an odd number, telling us both that there’s at least one error,
1114.45 -> and narrowing us down into this specific column.
1118.73 -> The third check is even, chopping down the possibilities even further.
1121.95 -> And the last parity check is odd, telling us there’s an error somewhere in the bottom,
1126.78 -> which by now we can see must be in position number 10.
1130.49 -> What’s more, the parity of the whole block is odd, giving us confidence that there was
1136.11 -> one flip and not two (if it’s three or more, all bets are off).
1141.57 -> After correcting that bit number 10, pulling out the 11 bits that were not used for correction
1146.49 -> gives us the relevant segment of the original message, which if you rewind and compare,
1151.41 -> is indeed exactly what we started the example with.
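Having done the checks by hand, here is one compact way they can be carried out in code (a sketch of my own, not necessarily the exact form shown in part 2): XOR-ing together the positions of all the 1s in the block performs all four checks at once, because binary digit k of that XOR is exactly the parity of check k.

```python
from functools import reduce
from operator import xor

# XOR the positions of all the 1s: digit k of the result is the
# parity of subset check k, so for a single flip in a correctly
# encoded block the result IS the error position (0 means either
# "no error" or position 0; the whole-block parity disambiguates).
def find_error_position(block):
    return reduce(xor, [i for i, bit in enumerate(block) if bit], 0)

# A valid block has all subset parities even (all zeros is the
# simplest example); flip one bit and the receiver recovers where.
block = [0] * 16
block[10] ^= 1
assert find_error_position(block) == 10
```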
1156.12 -> And now that you know how to do all this by hand, I’d like to show you how you can carry
1159.23 -> out the core part of all of this logic with a single line of python code.
1163.91 -> You see, what I haven’t told you yet is just how elegant this algorithm really is,
1168.26 -> how simple it is to get a machine to point to the position of an error, how to systematically
1172.35 -> scale it, and how we can frame all of this as one single operation rather than multiple
1177.65 -> separate checks.
1179.549 -> To see what I mean, come join me in part 2.

Source: https://www.youtube.com/watch?v=X8jsijhllIA