Data Structures: Hash Tables

Data Structures: Hash Tables


Data Structures: Hash Tables

Learn the basics of Hash Tables, one of the most useful data structures for solving interview questions. This video is a part of HackerRank’s Cracking The Coding Interview Tutorial with Gayle Laakmann McDowell. http://www.hackerrank.com/domains/tut


Content

0 -> Hi, I'm Gayle Laakmann McDowell, author of Cracking the Coding Interview. Today's topic
3.929 -> is one that you really don't want to miss - hash tables. A hash table is
8.189 -> possibly the most useful data structure for interview questions. It comes up all the
12.809 -> time both in interviews and in real life. In fact one technique I often tell
17.52 -> people is just, for any problem, have hash tables at the top of your mind as a
22.02 -> possible technique to solve the problem.
24.39 -> So let's talk a bit about what a hash table is. At a high level a hash table is
29.099 -> a key value
30.33 -> look up. So it gives you a way of, given a key, associating a value with it for very
36.75 -> very quick lookups. So suppose you had some situation where you needed to
40.68 -> associate somebody's name with some set of information about them. A hash table would be
45.6 -> the perfect solution for this problem because you can just put this into the
49.92 -> hash table and then you can say, okay give me the data associated with Mary
53.789 -> and then boom we can get that information immediately. So in a
57.36 -> hash table the key as well as the value can be basically any type of data
60.93 -> structure. A string is often used but it could be a circle, a square, a person,
67.5 -> pretty much anything, as long as you have a hash function. So what does that mean?
72.54 -> Well let's turn to the implementation to talk about that.
75.57 -> So at a high-level we do want to store the objects in an array. So let's picture
80.04 -> that. But how do we actually jump from a string to a particular index in the
84.869 -> array?
85.56 -> Well that's what a hash function does. So a hash function takes in a string,
89.549 -> converts into some sort of integer, and then remaps that integer into an index
95.49 -> into that array. And then that's where we can find the person we're looking for. So
101.28 -> it's important to understand that the hash code is not actually the index inn
104.46 -> this array. We map actually from the key to the hashcode and then over to the
110.61 -> index. And one of the reasons for this is that the array that actually stores the
114.96 -> data from the hash table might be much much smaller than all the available
119.159 -> potential hash codes. And so we don't want to have an array of three billion
124.17 -> just because there's three billion potential hash codes.
126.89 -> We actually remap it into something smaller. Now note here that two different
131.63 -> strings could actually have the same hash code and that's because there are
135.38 -> an infinite number of strings out there but a finite number of hash codes. So
140.48 -> it's theoretically possible for Alex and Sarah to actually have the same hash
143.84 -> code.
144.44 -> Additionally since we're remapping the hashcode into an even smaller index, two
149.27 -> things with different hash codes could actually wind up mapped to the same index.
153.32 -> So what do we do
155.3 -> when this happens, which is called the collision? There are different ways of
158.03 -> resolving collisions and I really do encourage you to look this up on your
161.15 -> own time, but I'll just talk about one of them which is called chaining. And
165.62 -> chaining is possibly the most common one and it's very simple. But it basically
169.13 -> means is just, hey when there is collisions just store them in a linked
173.66 -> list. So rather than this being an array of people it's actually going to
178.4 -> be an array of a linked list of people. And so that means though that when you
184.28 -> get a call to say hey get the value of Alex, you actually need to look through
189.44 -> all the values in that linked list, and pull out the value for Alex. Now as
195.44 -> you'll note here this linked list contains not just the actual person objects but
200.39 -> the actual original keys as well. And the reason for that is that if you only
204.38 -> store the person object you'd see all these people who mapped to this index but
208.22 -> you wouldn't know which one they are. And so you actually need to store the keys with
211.28 -> them. So when you get a call to say get me the value for Alex, then you actually
215.959 -> call this hash function, you get a hash code back, you map over to this index and
221.72 -> then you walk through and look for the thing with the key of Alex. So that's the
225.829 -> basics of how a hash table operates. So let's walk through this from start to
230.72 -> end. We're going to have this hash table class that underneath it has a array
236.93 -> that actually holds the data. And the array isn't going to be an array of the actual
241.82 -> values it's going to be an array of the linked list of the values. Then when
247.28 -> someone says put the value of Alex, put the value Alex mapping to this person, we
253.43 -> call this hash code function that gets us back the hashcode. Then we map from
259.58 -> this hash
260.15 -> code over to an index and that gets us over to this index in the array and then we
265.43 -> put it into this linked list. If there's nothing else,
268.34 -> great, then we're just going to have a one element linked list. But
271.94 -> there could be multiple values in there, in which case we walk through it. Now
276.199 -> what if there is already a value of Alex in there?
279.289 -> Well then we just fix that value immediately. So let's talk now about the
285.08 -> runtime of a hash table.
287.57 -> Well it really depends on what we're talking about and what assumptions we we
290.27 -> make. In many cases we assume that we have a good hash table and a good hash
295.22 -> function that really distributes our values well and so for
299.3 -> the purpose of an interview, we often just summarize this and say okay,
305.21 -> getting and setting in a hash table is constant time. In reality if something
311.03 -> weird happens we have a bad hash function blah blah blah, we could also
314.63 -> say it is linear time in the very worst case. But for the purpose of most
319.639 -> problems we generally talk about constant time because in the real world
323.24 -> we're going to make pretty sure hopefully that we have a good hash table.
326.12 -> So this is a bit of a high level view of what a hash table is and how it works.
330.8 -> There's a whole lot of complexity you can dive into here about different ways
335.06 -> of handling collisions and exactly what happens when a hash table, you start to
340.55 -> put a ton of elements in the hash table, how does it grow with new elements?
344.9 -> Can it be resized? All that sort of stuff. I really do encourage you
349.07 -> to go look at this stuff on your own time.
351.32 -> It's a fantastic thing to really understand a lot of the complexity here.
355.61 -> But for the purpose of a lot of interview questions we just sort of
359.389 -> simplify all this down and summarize things as, let's assume we have a good
363.68 -> hash table and so we get this beautiful constant-time insert find and delete. So
369.44 -> now that you understand the basics,
371.36 -> go ahead and do try to learn these more complex details of hash tables, but also go
375.979 -> practice some of these basics on your own time on your own problems.
379.97 -> Good luck.

Source: https://www.youtube.com/watch?v=shs0KM3wKv8