Data Structures: Hash Tables

Aug 15, 2023

Data Structures: Hash Tables

Learn the basics of Hash Tables, one of the most useful data structures for solving interview questions. This video is a part of HackerRank’s Cracking The Coding Interview Tutorial with Gayle Laakmann McDowell. http://www.hackerrank.com/domains/tut…

Content

0 -> Hi, I'm Gayle Laakmann McDowell, author of Cracking the Coding Interview. Today's topic

3.929 -> is one that you really don't want to miss - hash tables. A hash table is

8.189 -> possibly the most useful data structure for interview questions. It comes up all the

12.809 -> time both in interviews and in real life. In fact one technique I often tell

17.52 -> people is just, for any problem, have hash tables at the top of your mind as a

22.02 -> possible technique to solve the problem.

24.39 -> So let's talk a bit about what a hash table is. At a high level a hash table is

29.099 -> a key value

30.33 -> look up. So it gives you a way of, given a key, associating a value with it for very

36.75 -> very quick lookups. So suppose you had some situation where you needed to

40.68 -> associate somebody's name with some set of information about them. A hash table would be

45.6 -> the perfect solution for this problem because you can just put this into the

49.92 -> hash table and then you can say, okay give me the data associated with Mary

53.789 -> and then boom we can get that information immediately. So in a

57.36 -> hash table the key as well as the value can be basically any type of data

60.93 -> structure. A string is often used but it could be a circle, a square, a person,

67.5 -> pretty much anything, as long as you have a hash function. So what does that mean?

72.54 -> Well let's turn to the implementation to talk about that.

75.57 -> So at a high-level we do want to store the objects in an array. So let's picture

80.04 -> that. But how do we actually jump from a string to a particular index in the

84.869 -> array?

85.56 -> Well that's what a hash function does. So a hash function takes in a string,

89.549 -> converts into some sort of integer, and then remaps that integer into an index

95.49 -> into that array. And then that's where we can find the person we're looking for. So

101.28 -> it's important to understand that the hash code is not actually the index inn

104.46 -> this array. We map actually from the key to the hashcode and then over to the

110.61 -> index. And one of the reasons for this is that the array that actually stores the

114.96 -> data from the hash table might be much much smaller than all the available

119.159 -> potential hash codes. And so we don't want to have an array of three billion

124.17 -> just because there's three billion potential hash codes.

126.89 -> We actually remap it into something smaller. Now note here that two different

131.63 -> strings could actually have the same hash code and that's because there are

135.38 -> an infinite number of strings out there but a finite number of hash codes. So

140.48 -> it's theoretically possible for Alex and Sarah to actually have the same hash

143.84 -> code.

144.44 -> Additionally since we're remapping the hashcode into an even smaller index, two

149.27 -> things with different hash codes could actually wind up mapped to the same index.

153.32 -> So what do we do

155.3 -> when this happens, which is called the collision? There are different ways of

158.03 -> resolving collisions and I really do encourage you to look this up on your

161.15 -> own time, but I'll just talk about one of them which is called chaining. And

165.62 -> chaining is possibly the most common one and it's very simple. But it basically

169.13 -> means is just, hey when there is collisions just store them in a linked

173.66 -> list. So rather than this being an array of people it's actually going to

178.4 -> be an array of a linked list of people. And so that means though that when you

184.28 -> get a call to say hey get the value of Alex, you actually need to look through

189.44 -> all the values in that linked list, and pull out the value for Alex. Now as

195.44 -> you'll note here this linked list contains not just the actual person objects but

200.39 -> the actual original keys as well. And the reason for that is that if you only

204.38 -> store the person object you'd see all these people who mapped to this index but

208.22 -> you wouldn't know which one they are. And so you actually need to store the keys with

211.28 -> them. So when you get a call to say get me the value for Alex, then you actually

215.959 -> call this hash function, you get a hash code back, you map over to this index and

221.72 -> then you walk through and look for the thing with the key of Alex. So that's the

225.829 -> basics of how a hash table operates. So let's walk through this from start to

230.72 -> end. We're going to have this hash table class that underneath it has a array

236.93 -> that actually holds the data. And the array isn't going to be an array of the actual

241.82 -> values it's going to be an array of the linked list of the values. Then when

247.28 -> someone says put the value of Alex, put the value Alex mapping to this person, we

253.43 -> call this hash code function that gets us back the hashcode. Then we map from

259.58 -> this hash

260.15 -> code over to an index and that gets us over to this index in the array and then we

265.43 -> put it into this linked list. If there's nothing else,

268.34 -> great, then we're just going to have a one element linked list. But

271.94 -> there could be multiple values in there, in which case we walk through it. Now

276.199 -> what if there is already a value of Alex in there?

279.289 -> Well then we just fix that value immediately. So let's talk now about the

285.08 -> runtime of a hash table.

287.57 -> Well it really depends on what we're talking about and what assumptions we we

290.27 -> make. In many cases we assume that we have a good hash table and a good hash

295.22 -> function that really distributes our values well and so for

299.3 -> the purpose of an interview, we often just summarize this and say okay,

305.21 -> getting and setting in a hash table is constant time. In reality if something

311.03 -> weird happens we have a bad hash function blah blah blah, we could also

314.63 -> say it is linear time in the very worst case. But for the purpose of most

319.639 -> problems we generally talk about constant time because in the real world

323.24 -> we're going to make pretty sure hopefully that we have a good hash table.

326.12 -> So this is a bit of a high level view of what a hash table is and how it works.

330.8 -> There's a whole lot of complexity you can dive into here about different ways

335.06 -> of handling collisions and exactly what happens when a hash table, you start to

340.55 -> put a ton of elements in the hash table, how does it grow with new elements?

344.9 -> Can it be resized? All that sort of stuff. I really do encourage you

349.07 -> to go look at this stuff on your own time.

351.32 -> It's a fantastic thing to really understand a lot of the complexity here.

355.61 -> But for the purpose of a lot of interview questions we just sort of

359.389 -> simplify all this down and summarize things as, let's assume we have a good

363.68 -> hash table and so we get this beautiful constant-time insert find and delete. So

369.44 -> now that you understand the basics,

371.36 -> go ahead and do try to learn these more complex details of hash tables, but also go

375.979 -> practice some of these basics on your own time on your own problems.

379.97 -> Good luck.

Source: https://www.youtube.com/watch?v=shs0KM3wKv8