
Data Structures: Hash Tables
Data Structures: Hash Tables
Learn the basics of Hash Tables, one of the most useful data structures for solving interview questions. This video is a part of HackerRank’s Cracking The Coding Interview Tutorial with Gayle Laakmann McDowell. http://www.hackerrank.com/domains/tut…
Content
0 -> Hi, I'm Gayle Laakmann McDowell, author of Cracking the Coding Interview. Today's topic
3.929 -> is one that you really don't want to
miss - hash tables. A hash table is
8.189 -> possibly the most useful data structure
for interview questions. It comes up all the
12.809 -> time both in interviews and in real life.
In fact one technique I often tell
17.52 -> people is just, for any problem, have hash
tables at the top of your mind as a
22.02 -> possible technique to solve the problem.
24.39 -> So let's talk a bit about what a hash
table is. At a high level a hash table is
29.099 -> a key value
30.33 -> look up. So it gives you a way of, given a
key, associating a value with it for very
36.75 -> very quick lookups. So suppose you had
some situation where you needed to
40.68 -> associate somebody's name with some set
of information about them. A hash table would be
45.6 -> the perfect solution for this problem
because you can just put this into the
49.92 -> hash table and then you can say, okay
give me the data associated with Mary
53.789 -> and then boom we can get that
information immediately. So in a
57.36 -> hash table the key as well as the value
can be basically any type of data
60.93 -> structure. A string is often used but it
could be a circle, a square, a person,
67.5 -> pretty much anything, as long as you have
a hash function. So what does that mean?
72.54 -> Well let's turn to the implementation to
talk about that.
75.57 -> So at a high-level we do want to store the
objects in an array. So let's picture
80.04 -> that. But how do we actually jump from
a string to a particular index in the
84.869 -> array?
85.56 -> Well that's what a hash function does. So
a hash function takes in a string,
89.549 -> converts into some sort of integer, and
then remaps that integer into an index
95.49 -> into that array. And then that's where we
can find the person we're looking for. So
101.28 -> it's important to understand that the
hash code is not actually the index inn
104.46 -> this array. We map actually from the key
to the hashcode and then over to the
110.61 -> index. And one of the reasons for this is
that the array that actually stores the
114.96 -> data from the hash table might be much
much smaller than all the available
119.159 -> potential hash codes. And so we don't
want to have an array of three billion
124.17 -> just because there's three billion
potential hash codes.
126.89 -> We actually remap it into something
smaller. Now note here that two different
131.63 -> strings could actually have the same
hash code and that's because there are
135.38 -> an infinite number of strings out there
but a finite number of hash codes. So
140.48 -> it's theoretically possible for Alex and
Sarah to actually have the same hash
143.84 -> code.
144.44 -> Additionally since we're remapping the
hashcode into an even smaller index, two
149.27 -> things with different hash codes could
actually wind up mapped to the same index.
153.32 -> So what do we do
155.3 -> when this happens, which is called the
collision? There are different ways of
158.03 -> resolving collisions and I really do
encourage you to look this up on your
161.15 -> own time, but I'll just talk about one of
them which is called chaining. And
165.62 -> chaining is possibly the most common one
and it's very simple. But it basically
169.13 -> means is just, hey when there is
collisions just store them in a linked
173.66 -> list. So rather than this being an
array of people it's actually going to
178.4 -> be an array of a linked list of people.
And so that means though that when you
184.28 -> get a call to say hey get the value of
Alex, you actually need to look through
189.44 -> all the values in that linked list,
and pull out the value for Alex. Now as
195.44 -> you'll note here this linked list contains
not just the actual person objects but
200.39 -> the actual original keys as well. And the
reason for that is that if you only
204.38 -> store the person object you'd see all
these people who mapped to this index but
208.22 -> you wouldn't know which one they are. And
so you actually need to store the keys with
211.28 -> them. So when you get a call to say get
me the value for Alex, then you actually
215.959 -> call this hash function, you get a hash
code back, you map over to this index and
221.72 -> then you walk through and look for the thing
with the key of Alex. So that's the
225.829 -> basics of how a hash table operates. So
let's walk through this from start to
230.72 -> end. We're going to have this hash table
class that underneath it has a array
236.93 -> that actually holds the data. And the array
isn't going to be an array of the actual
241.82 -> values it's going to be an array of the
linked list of the values. Then when
247.28 -> someone says put the value of Alex, put the
value Alex mapping to this person, we
253.43 -> call this hash code function that gets
us back the hashcode. Then we map from
259.58 -> this hash
260.15 -> code over to an index and that gets us over
to this index in the array and then we
265.43 -> put it into this linked list. If there's
nothing else,
268.34 -> great, then we're just going to
have a one element linked list. But
271.94 -> there could be multiple values in there, in
which case we walk through it. Now
276.199 -> what if there is already a value of
Alex in there?
279.289 -> Well then we just fix that value
immediately. So let's talk now about the
285.08 -> runtime of a hash table.
287.57 -> Well it really depends on what we're
talking about and what assumptions we we
290.27 -> make. In many cases we assume that we
have a good hash table and a good hash
295.22 -> function that really distributes our
values well and so for
299.3 -> the purpose of an interview, we often
just summarize this and say okay,
305.21 -> getting and setting in a hash table is
constant time. In reality if something
311.03 -> weird happens we have a bad hash
function blah blah blah, we could also
314.63 -> say it is linear time in the very worst
case. But for the purpose of most
319.639 -> problems we generally talk about
constant time because in the real world
323.24 -> we're going to make pretty sure
hopefully that we have a good hash table.
326.12 -> So this is a bit of a high level view of
what a hash table is and how it works.
330.8 -> There's a whole lot of complexity you
can dive into here about different ways
335.06 -> of handling collisions and exactly what
happens when a hash table, you start to
340.55 -> put a ton of elements in the hash table,
how does it grow with new elements?
344.9 -> Can it be resized? All that sort of stuff. I really do encourage you
349.07 -> to go look at this stuff on your own
time.
351.32 -> It's a fantastic thing to really
understand a lot of the complexity here.
355.61 -> But for the purpose of a lot of
interview questions we just sort of
359.389 -> simplify all this down and summarize
things as, let's assume we have a good
363.68 -> hash table and so we get this beautiful
constant-time insert find and delete. So
369.44 -> now that you understand the basics,
371.36 -> go ahead and do try to learn these more
complex details of hash tables, but also go
375.979 -> practice some of these basics on your
own time on your own problems.
379.97 -> Good luck.
Source: https://www.youtube.com/watch?v=shs0KM3wKv8