How to send a self-correcting message (Hamming codes)
Aug 15, 2023
A discovery-oriented introduction to error correction codes.
Part 2: • Hamming codes part 2, the elegance of…
Ben Eater's take: • What is error correction? Hamming cod…
Help fund future projects: https://www.patreon.com/3blue1brown
An equally valuable form of support is to simply share some of the videos.
Special thanks to these supporters: https://3b1b.co/hamming-thanks
Heavily related is the chessboard puzzle I did with Matt Parker: • The almost impossible chessboard puzzle
You can read Hamming's own perspective on his discovery of these codes in chapter 12 of "The Art of Doing Science and Engineering": https://amzn.to/3lwcnmh
The viewer Harry Li made an interactive on this topic: https://harryli0088.github.io/hamming …

------------------

These animations are largely made using manim, a scrappy open-source python library: https://github.com/3b1b/manim

If you want to check it out, I feel compelled to warn you that it's not the most well-documented tool, and it has many other quirks you might expect in a library someone wrote with only their own use in mind.

Music by Vincent Rubinetti.
Download the music on Bandcamp: https://vincerubinetti.bandcamp.com/a…
Stream the music on Spotify: https://open.spotify.com/album/1dVyjw…

------------------

3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube, if you want to stay posted on new videos, subscribe: http://3b1b.co/subscribe

Various social media links:
Website: https://www.3blue1brown.com
Twitter: https://twitter.com/3blue1brown
Reddit: https://www.reddit.com/r/3blue1brown
Instagram: https://www.instagram.com/3blue1brown…
Patreon: https://patreon.com/3blue1brown
Facebook: https://www.facebook.com/3blue1brown
Content
4.009 -> Have you ever wondered how it’s possible
to scratch a CD or DVD and still have it play
8.78 -> back whatever it’s storing?
10.86 -> The scratch really does affect the 1’s and
0’s on the disc, so it reads off different
15.23 -> data from what was stored, but unless it’s
really scratched up, the bits it reads are
20.57 -> decoded into precisely the same file that
was encoded on to it, a bit-for-bit copy,
26.1 -> despite all those errors.
27.86 -> There’s a whole pile of mathematical cleverness
that allows us to store data, and just as
32.78 -> importantly to transmit it, in a way that’s
resilient to errors.
37.05 -> Well, actually, it doesn’t take too much
cleverness to come up with a way to do this.
42.199 -> Any file, whether it’s a video, or sound,
or text, code, an image, whatever, is ultimately
48.51 -> stored as a sequence of 1’s and 0’s, and
a simple strategy to correct any bit that
53.079 -> gets flipped would be to store three copies
of each bit.
57.739 -> Then the machine reading the file could compare
these three copies and always take the best
62.089 -> two out of three whenever there’s a discrepancy.
67.42 -> But what that means is using two thirds of
your space for redundancy.
71.03 -> And even then, for all that space given up,
there’s no strong guarantee about what happens
75.65 -> if more than 1 bit gets flipped.
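The three-copies strategy just described can be sketched in a few lines of Python (my own toy illustration, not code from the video):

```python
def encode_repetition(bits):
    """Sender: store three copies of each bit."""
    return [b for bit in bits for b in (bit, bit, bit)]

def decode_repetition(coded):
    """Receiver: take the best two out of three for each triple."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

message = [1, 0, 1, 1]
coded = encode_repetition(message)
coded[4] ^= 1                        # a single bit gets flipped in transit
assert decode_repetition(coded) == message
```

Note how two thirds of the stored bits are pure redundancy here, which is exactly the inefficiency discussed next.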
78.1 -> The much more interesting question is how
to make it so that errors can be corrected
81.92 -> while giving up as little space as possible.
84.71 -> For example, using the method you’ll learn
about in this video, you could store your
88.18 -> data in 256-bit blocks, where each block uses
nine bits –Nine!– to act as a kind of
94.77 -> redundancy; the other 247 bits are free to
carry whatever meaningful message or data
99.85 -> you want.
100.85 -> And it will still be the case that if a bit
gets flipped, just by looking at this block
105.25 -> and nothing more, a machine will be able to
identify that there was an error and precisely
110.24 -> where it was, so that it knows how to correct
it.
112.78 -> And honestly, that feels like magic.
115.52 -> And for this particular scheme, if two bits
get flipped, the machine will at least be
119.409 -> able to detect that there were two errors,
though it won’t know how to fix them.
123.08 -> We’ll talk later about how this scales for
blocks of a different size.
128.03 -> Methods that let you correct errors like this
are known, reasonably enough, as “error
131.98 -> correction codes.”
133.599 -> For the better part of the last century, this
field has been a really rich source of surprisingly
138.189 -> deep math being incorporated into devices
we all use every day.
142.969 -> The goal here is to give you a very thorough
understanding of one of the earliest examples,
147.72 -> known as a Hamming code.
149.61 -> And by the way, the way I’m thinking about
the structure of this video is less about
152.58 -> explaining it as directly as possible, and
more a matter of prompting you to invent it
157.15 -> yourself, with a little gentle guidance
here and there.
160.2 -> So when you feel like you see where it’s
going at some point, take that moment to pause,
163.989 -> actively predict what the scheme is going
to be before I tell you.
167.18 -> Also, if you want that understanding to get
down to the hardware level, Ben Eater has
171.469 -> made a video in conjunction with this one
showing you how to actually implement Hamming
175.42 -> codes on breadboards, which is extremely satisfying.
178.579 -> Now, you should know, Hamming codes are not
as widely used as more modern codes, like
182.98 -> the Reed-Solomon algorithm.
184.769 -> But there is a certain magic to the contrast
between just how impossible the task feels
188.87 -> at the start, and how utterly reasonable it
is once you learn about Hamming.
194.2 -> The basic principle of error correction is
that in a vast space of all possible messages,
199.049 -> only some subset are going to be considered
valid messages.
202.829 -> As an analogy, think of correctly spelled
words vs incorrectly spelled words.
209.2 -> Whenever a valid message gets altered, the
receiver is responsible for correcting what
213.299 -> they see back to the nearest valid neighbor,
as you might do with a typo.
218.359 -> Coming up with a concrete algorithm to efficiently
categorize messages like this, though, takes
223.12 -> a certain cleverness.
227.129 -> The story begins in the 1940s, when a young
Richard Hamming was working for Bell Labs
231.93 -> and some of his work involved using a very
big very expensive punchcard computer that
235.819 -> he had only limited access to.
237.209 -> And the programs he kept putting through it
kept failing because every now and then a
241.159 -> bit would get misread.
243.109 -> Frustration being the crucible for invention,
he got so fed up that he invented error correction
248.03 -> codes.
249.249 -> There are many different ways to frame Hamming
codes, but as a first pass, we're going to
253.159 -> go through it the way that Hamming himself
thought about them.
255.84 -> Let’s use an example that’s simple, but
not too simple: A block of 16 bits.
261.53 -> We’ll number the positions of these bits
from 0 up to 15.
265.66 -> The actual data we want to store is only going
to make up 12 of these bits, while 4 of the
270.89 -> positions will be reserved as a kind of redundancy.
273.9 -> The word “redundant” here doesn’t simply
mean copy, after all those four bits don’t
278.08 -> give us enough room to blindly copy the data.
280.25 -> Instead, they'll need to be a much more nuanced
and clever kind of redundancy, not adding
285.09 -> any new information, but adding resilience.
288.72 -> You might expect these four special bits to
come nicely packaged together, maybe at the
292.31 -> end or something like that, but as you’ll
see, having them sit in positions which are
296.06 -> powers of 2 allows for something that's really
elegant by the end.
299.88 -> It also might give you a little hint about
how this scales for larger blocks.
304.73 -> Also, technically it ends up being only 11
bits of data; you’ll find there’s a mild
309.17 -> nuance for what goes on with position 0, but
don’t worry about that for now.
314.23 -> Like any error correction algorithm, this
will involve two players: A sender who is
318.44 -> responsible for setting these four special
bits, and then a receiver who is responsible
322.59 -> for performing some checks and correcting
any errors.
326.32 -> Of course, the words “sender” and “receiver”
really refer to machines or software that's
329.8 -> doing all these checks, and the idea of a
message is meant really broadly, to include
333.79 -> things like storage.
335.46 -> After all, storing data is the same thing as sending
a message, just from the past to the future
340.19 -> instead of from one place to another.
342.61 -> So that’s the setup, but before we can dive
in, we need to talk about a related idea which
346.91 -> was fresh on Hamming’s mind in the time
of his discovery, a method which lets you
351.16 -> detect any single-bit error, but not correct
it, known in the business as a “parity
356.4 -> check”.
357.4 -> For a parity check, we separate out only a
single bit that the sender is responsible
361.26 -> for tuning, and the rest are free to carry
a message.
364.91 -> The only job of this special bit is to make
sure the total number of 1’s in the message
370.33 -> is an even number.
372.3 -> For example, right now that total number of
1’s is 7, that's odd, so the sender needs
376.89 -> to flip that special bit to be a 1, making
the count even.
380.94 -> But if the block had already started off with
an even number of 1’s, then this special
385.29 -> bit would have been kept at a 0.
387.44 -> This is pretty simple, deceptively simple,
but it’s an incredibly elegant way to distill
391.63 -> the idea of change anywhere in a message to
be reflected in a single bit of information.
397.13 -> Notice, if any bit of this message gets flipped,
either from 0 to 1 or 1 to 0, it changes the
403.65 -> total count of 1’s from being even to being
odd.
408.1 -> So if you're the receiver, you look at this
message, and you see an odd number of 1’s,
411.85 -> you can know, for sure, that some error occurred,
even though you might have no idea where it
417.11 -> was.
418.67 -> In the jargon, whether a group of bits has
an even or odd number of 1’s is known as
422.59 -> its “parity”.
424.92 -> You could also use numbers and say that the
parity is 0 or 1, which is typically more
428.77 -> helpful once you start doing math with the
idea.
431.29 -> And this special bit that the sender uses
to control the parity is called a “parity
436.13 -> bit”.
437.13 -> And actually, we should be clear, if the receiver
sees an odd parity, it doesn't necessarily
442.43 -> mean there was just 1 error.
443.7 -> There might have been 3 errors, or 5, or any
other odd number, but they can know for sure
448.42 -> that it wasn’t 0.
449.53 -> On the other hand, if there had been 2 errors,
or any even number of errors, that final count
455.11 -> of 1’s would still be even, so the receiver
can’t have full confidence that an even
459.55 -> count necessarily means the message is error-free.
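As a concrete sketch (my own naming, not code from the video), the sender and receiver sides of a parity check look like this:

```python
def parity(bits):
    """0 if the number of 1s is even, 1 if it is odd."""
    return sum(bits) % 2

def add_parity_bit(message):
    """Sender: append one bit so the total number of 1s is even."""
    return message + [parity(message)]

def looks_clean(received):
    """Receiver: even parity means no error was detected."""
    return parity(received) == 0

block = [1, 0, 1, 1, 0, 1, 1, 1, 1]    # an odd number of 1s
sent = add_parity_bit(block)           # so the parity bit gets set to 1
assert looks_clean(sent)
sent[3] ^= 1                           # any single flip...
assert not looks_clean(sent)           # ...flips the parity and is caught
```

A second flip cancels the first and slips past this check, which is precisely the weakness just described.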
463.36 -> You might complain that a method which gets
messed up by only two bit flips is pretty
467.17 -> weak, and you would be absolutely right!
469.22 -> Keep in mind, though, there is no method for
error detection, or correction, that could
474.33 -> give you 100% confidence that the message
you receive is the one the sender intended.
479.73 -> After all, enough random noise can always
change one valid message into another valid
483.75 -> message just by pure chance.
485.9 -> Instead, the goal is to come up with a scheme
that's robust up to a certain maximum number
490.52 -> of errors or maybe to reduce the probability
of a false positive like this.
496.46 -> Parity checks on their own are weak, but by
distilling the idea of change across a full
501.31 -> message down to a single bit, what they give
us is a powerful building block for more sophisticated
506.73 -> schemes.
507.98 -> For example, as Hamming was searching for
a way to identify where an error happened,
512.8 -> not just that it happened, his key insight
was that if you apply some parity checks,
517.07 -> not to the full message, but to certain carefully
selected subsets, you can ask a more refined
522.09 -> series of questions that pin down the location
of any single-bit error.
526.84 -> The overall feeling is a bit like playing
a game of 20 questions, asking yes or no queries
531.3 -> that chop the space of possibilities in half.
534.3 -> For example, let's say we do a parity check
on just these 8 bits, all the odd-numbered
539.21 -> positions.
540.39 -> Then if an error is detected, it gives the
receiver a little more information about where
544.55 -> specifically the error is; namely that it’s
in an odd position.
549.14 -> If no error is detected among those 8 bits,
it either means there's no error at all, or
554.59 -> it sits somewhere in the even positions.
557.22 -> You might think that limiting a parity check
to half the bits makes it less effective,
561.54 -> but when it’s done in conjunction with other
well-chosen parity checks, it counterintuitively
565.43 -> gives us something more powerful.
568.41 -> To actually set up that parity check, remember,
it requires earmarking some special bit that
574.4 -> has control for the parity of that full group.
577.56 -> Here let's just choose position number 1.
579.54 -> So for the example shown, the parity of these
8 bits is currently odd, so the sender is
583.94 -> responsible for toggling that parity bit,
and now it's even.
588.2 -> This is only one out of four parity checks
we’ll do.
591 -> The second check is going to be among the
8 bits on the right half of the grid, at least
595.06 -> as we've drawn it here.
596.86 -> This time, we might use position number 2
as a parity bit.
599.85 -> So these 8 bits already have an even parity,
and the sender can feel good leaving that
604.46 -> bit number 2 unchanged.
606.6 -> Then, on the other end, if the receiver checks
the parity of this group and they find that
611 -> it’s odd, they’ll know that the error
is somewhere among these 8 bits on the right.
615.6 -> Otherwise, it means either there's no error,
or the error is somewhere on the left half.
620.5 -> Or I guess there could have been two errors,
but for right now we’re going to assume
624.18 -> that there’s at most one error in the entire
block; things break down completely for more
628.35 -> than that.
629.35 -> Here, before looking at the next two checks,
take a moment to think about what these first
632.89 -> two allow us to do when you consider them
together.
635.29 -> Let's say you detect an error among the odd
columns, and among the right half.
640.23 -> It necessarily means the error is somewhere
in the last column.
643.9 -> If there was no error in the odd columns, but
there was one in the right half, well that
648.06 -> tells you it's in the second to last column.
650.14 -> Likewise, if there is an error in the odd
columns but not the right half, you know that
654.7 -> it's somewhere in the second column.
656.07 -> And then, if neither of those two parity checks
detects anything, it means the only place
660.79 -> an error could be is in that leftmost column,
but it also might simply mean there’s no
665.2 -> error at all.
666.46 -> Which is all a rather belabored way to say
that two parity checks let us pin down the
670.54 -> column.
671.62 -> From here, you can probably guess what follows.
673.98 -> We do basically the same thing, but for the
rows.
676.26 -> There’s going to be a parity check on the
odd rows, using position 4 as a parity bit.
681.07 -> So in this example, that group already has
even parity, so bit 4 would be set to a 0.
686.6 -> And finally there's a parity check on the
bottom two rows, using position 8 as a parity
691.25 -> bit.
692.25 -> In this case, it looks like the sender needs
to turn bit 8 on in order to give that
695.85 -> group an even parity.
697.6 -> Just as the first two checks let us pin down
the column, these next two let you pin down
701.32 -> the row.
702.96 -> As an example, imagine that during transmission
there’s an error at, say position 3.
707.92 -> Well, this affects the first parity group,
and it also affects the second parity group,
712.22 -> so the receiver knows that there's an error
somewhere in that right column.
716.2 -> But it doesn’t affect the third group, and
it doesn't affect the fourth group.
721.35 -> And that lets the receiver pinpoint the error
up to the first row, which necessarily means
725.44 -> position 3, so they can fix the error.
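In code, these four subsets have a very tidy description: check k covers exactly the positions whose index has a 1 in binary digit k. A sketch (my own function names, assuming the 16-bit block is a flat list indexed 0 to 15):

```python
def parity_groups(n_checks=4):
    """Check k covers every position whose index has binary bit k set."""
    return [[pos for pos in range(2 ** n_checks) if pos >> k & 1]
            for k in range(n_checks)]

def locate_error(block):
    """Read the four check results as the binary digits of the error position."""
    return sum((sum(block[p] for p in group) % 2) << k
               for k, group in enumerate(parity_groups()))

block = [0] * 16      # trivially valid: every check sees even parity
block[3] ^= 1         # the error at position 3 from the example above
assert locate_error(block) == 3
```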
728.62 -> You might enjoy taking a moment to convince
yourself that the answers to these four questions
732.6 -> really will always let you pin down a specific
location, no matter where they turn out to
736.839 -> be.
737.839 -> In fact, the astute among you might even notice
a connection between these questions and binary
742.88 -> counting.
743.88 -> And if you do, again let me emphasize, pause,
try for yourself to draw the connection before
747.81 -> I spoil it.
750.85 -> If you’re wondering what happens if a parity
bit itself gets affected…well you can just
755.53 -> try it.
756.55 -> Take a moment to think about how any error
among these four special bits is going to
760.7 -> be tracked down just like any other, with
the same game of 4 questions.
767.43 -> It doesn’t really matter, since at the end
of the day what we want is to protect the
770.49 -> message bits, the error correction bits are
just riding along, but protecting those bits
774.55 -> as well is something that naturally falls
out of this scheme as a byproduct.
779.24 -> You might also enjoy anticipating how this
scales.
782.37 -> If we used a block of size 256 bits, for example,
in order to pin down a location, you need
788.029 -> only 8 yes or no questions to binary search
your way down to some specific spot.
795.72 -> And remember, each question requires giving
up only a single bit to set the appropriate
799.7 -> parity check.
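The arithmetic of that scaling is worth pinning down: a block of 2**n bits needs just n of these yes/no checks, so the fraction given up to redundancy shrinks rapidly as blocks grow. A quick sketch (the helper name is my own):

```python
from math import log2

def check_bits_needed(block_size):
    """A block of 2**n bits takes n parity checks to binary-search
    its way down to any single error position."""
    return int(log2(block_size))

assert check_bits_needed(16) == 4    # the running example
assert check_bits_needed(256) == 8   # 8 checks; with the extra bit at
                                     # position 0, 247 message bits remain
```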
803.35 -> Some of you may already see it, but we’ll
talk about the systematic way to find
806.96 -> what these questions are in just a minute
or two.
809.77 -> Hopefully, this sketch is enough to appreciate
the efficiency of what we’re developing
813.46 -> here.
814.48 -> Everything except for those 8 highlighted
parity bits can be whatever you want it to
818.51 -> be, carrying whatever message or data you
want.
821.73 -> The 8 bits are “redundant” in the sense
that they’re completely determined by the
825.27 -> rest of the message, but it’s in a much
smarter way than simply copying the message
829.26 -> as a whole.
833.65 -> And still, for so little given up, you would
be able to identify and fix any single bit
839.07 -> error.
840.07 -> Well...almost.
841.07 -> Okay, so, the one problem here is that if
none of the four parity checks detect an error,
845.98 -> meaning that the specially selected subsets
of eight bits all have even parities just
850.11 -> like the sender intended, then it either means
there was no error at all, or it narrows us
855.61 -> down into position 0.
857.75 -> You see, with four yes or no questions, we
have 16 possible outcomes for our parity checks.
862.77 -> And at first, that feels perfect for pinpointing
1 of the 16 positions in the block.
867.91 -> But you also need to communicate a 17th outcome,
the “no error” condition.
873 -> The solution here is actually pretty simple:
Just forget about the 0th bit entirely.
877.87 -> So when we do our four parity checks, and
we see that they're all even, it unambiguously
881.92 -> means that there is no error.
884.32 -> What that means, is that rather than working
with a 16-bit block, work with a 15-bit block,
889.24 -> where 11 of the bits are free to carry a message,
and 4 of them are there for redundancy.
894.05 -> And with that, we now have what people in
the business would refer to as a “(15, 11)
898.36 -> Hamming code”.
899.91 -> That said, it is nice to have a block size that's
a clean power of 2, and there’s a clever
904.12 -> way that we can keep that 0th bit around and
get it to do a little extra work for us.
908.85 -> If we use it as a parity bit across the whole
block, it lets us actually detect, even though
913.55 -> we can't correct, 2-bit errors.
915.67 -> Here’s how it works.
917.41 -> After setting those four special error-correcting
bits, we set that 0th bit so that the parity
921.709 -> of the full block is even, just like a normal
parity check.
925.66 -> Now if there’s a single-bit error, then
the parity of the full block toggles to be
929.741 -> odd, but we would catch that anyway thanks
to our four error-correcting checks.
933.97 -> However, if there are two errors, then the
overall parity is going to toggle back to
938.06 -> being even, but the receiver will still see
that there’s been at least some error because
942.62 -> of what's going on with the four usual parity
checks.
945.79 -> So if they notice an even parity overall,
but something nonzero happening with the other
950.24 -> checks, it tells them there were at least
two errors.
952.86 -> Isn’t that clever?
954.41 -> Even though we can’t correct those two-bit
errors, just by putting that one little bothersome
958.63 -> 0th bit to work, it lets us detect them.
962.33 -> This is pretty standard; it's known as an
“extended” Hamming code.
966.81 -> Technically speaking, you now have a full
description of what a Hamming code does, at
970.86 -> least for the example of a 16-bit block.
973.58 -> But I think you'll find it more satisfying
to check your understanding and solidify everything
977.3 -> up to this point by doing one full example
from start to finish yourself.
981.399 -> I’ll step through it with you, though, so
you can check yourself.
984.99 -> To set up a message, whether that’s a literal
message that you’re transmitting over space,
989.18 -> or some data that you want to store over time,
the first step is to divide it up into 11-bit
994.649 -> chunks.
995.68 -> Each chunk is going to get packaged into an
error-resistant 16-bit block, so let’s take
1000.77 -> this one as an example and actually work it
out.
1003.269 -> Go ahead, actually do it, pause and try putting
together this block
1006.74 -> ....okay, you ready?
1011.07 -> Remember, position 0 along with all the powers
of 2 are reserved for error-correction duty,
1019.21 -> so you start by placing the message bits in
all of the remaining spots, in order.
1025.709 -> You need this group to have even parity, which
it already does, so you should have set the
1030.13 -> parity bit in position 1 to be a 0.
1033.14 -> The next group starts off with odd parity,
so you should have set its parity bit to 1.
1039.28 -> The group after that starts with odd parity,
so again you should have set its parity
1043.549 -> bit to 1.
1044.549 -> And the final group also has odd parity, meaning
we set that bit in position 8 to be a 1.
1051.5 -> And then, as the final step, the full block
now has even parity, meaning that you can
1056.3 -> set that bit number 0, the overarching parity
bit, to be 0.
1061.49 -> So as this block is sent off, the parity of
the four special subsets, and the block as
1065.76 -> a whole, will all be even, or zero.
1068.94 -> As the second part of the exercise, let’s
have you play the role of the receiver.
1073.61 -> Of course, that would mean you don’t already
know what this message is; maybe some of you
1077.59 -> memorized it but, let’s assume that you
haven’t.
1080.16 -> What I’m going to do is change either zero,
one or two of the bits in that block, and
1085.32 -> then ask you to figure out what it is that
I did.
1088.12 -> So, again, pause and try working it out.
1095.62 -> ...okay, so you as the receiver now check
the first parity group and you can see that
1103.88 -> it’s even, so any error that exists would
have to be in an even column.
1109.799 -> The next check gives us an odd number, telling
us both that there’s at least one error,
1114.45 -> and narrowing us down into this specific column.
1118.73 -> The third check is even, chopping down the
possibilities even further.
1121.95 -> And the last parity check is odd, telling
us there’s an error somewhere in the bottom,
1126.78 -> which by now we can see must be in position
number 10.
1130.49 -> What’s more, the parity of the whole block
is odd, giving us confidence that there was
1136.11 -> one flip and not two (if it’s three or more,
all bets are off).
1141.57 -> After correcting that bit number 10, pulling
out the 11 bits that were not used for correction
1146.49 -> gives us the relevant segment of the original
message, which if you rewind and compare,
1151.41 -> is indeed exactly what we started the example
with.
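The whole worked exercise can be automated. Here is a sketch of an extended (16, 11) encoder and decoder following the steps above (the function names and the None convention for detected double errors are my own choices):

```python
RESERVED = (0, 1, 2, 4, 8)   # bit 0 plus the powers of 2

def hamming_encode(data11):
    """Pack 11 data bits into an error-resistant 16-bit block."""
    block = [0] * 16
    data_positions = [p for p in range(16) if p not in RESERVED]
    for pos, bit in zip(data_positions, data11):
        block[pos] = bit
    for k in range(4):   # set each power-of-2 bit so its group is even
        group = [p for p in range(16) if p >> k & 1]
        block[1 << k] = sum(block[p] for p in group) % 2
    block[0] = sum(block) % 2      # extended parity over the whole block
    return block

def hamming_decode(block):
    """Fix up to one flipped bit, then extract the 11 data bits.
    Returns None when two errors are detected (uncorrectable)."""
    error_pos = 0
    for k in range(4):   # each failed check contributes one binary digit
        group = [p for p in range(16) if p >> k & 1]
        error_pos |= (sum(block[p] for p in group) % 2) << k
    if error_pos and sum(block) % 2 == 0:
        return None      # checks fired but overall parity is even: 2 errors
    if error_pos:
        block = block[:]
        block[error_pos] ^= 1
    return [block[p] for p in range(16) if p not in RESERVED]

data = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1]
block = hamming_encode(data)
block[10] ^= 1                        # a single error, like the exercise
assert hamming_decode(block) == data
```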
1156.12 -> And now that you know how to do all this by
hand, I’d like to show you how you can carry
1159.23 -> out the core part of all of this logic with
a single line of Python code.
1163.91 -> You see, what I haven’t told you yet is
just how elegant this algorithm really is,
1168.26 -> how simple it is to get a machine to point
to the position of an error, how to systematically
1172.35 -> scale it, and how we can frame all of this
as one single operation rather than multiple
1177.65 -> separate checks.
1179.549 -> To see what I mean, come join me in part 2.
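For the curious, the single line being teased collapses all four checks into one XOR over the positions of the 1 bits; a sketch of the idea (part 2 of the video derives why this works):

```python
from functools import reduce

def error_position(block):
    """XOR the indices of every 1 bit. For a valid block the result is 0;
    a single flipped bit makes it the position of that flip."""
    return reduce(lambda x, y: x ^ y,
                  [pos for pos, bit in enumerate(block) if bit], 0)

block = [0] * 16     # trivially valid: all parity checks come out even
block[13] ^= 1       # one error
assert error_position(block) == 13
```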
Source: https://www.youtube.com/watch?v=X8jsijhllIA