Algorithms Course - Graph Theory Tutorial from a Google Engineer
Aug 15, 2023
This full course provides a complete introduction to Graph Theory algorithms in computer science. Knowledge of how to create and design excellent algorithms is an essential skill required in becoming a great programmer. You will learn how many important algorithms work. The algorithms are accompanied by working source code in Java to solidify your understanding.

💻 Code: https://github.com/williamfiset/algor …
🔗 Slides: https://github.com/williamfiset/Algor …
🎥 Course created by William Fiset. Check out his YouTube channel: / @williamfiset-videos

⭐️ Course Contents ⭐️
⌨️ (0:00:00) Graph Theory Introduction
⌨️ (0:13:53) Problems in Graph Theory
⌨️ (0:23:15) Depth First Search Algorithm
⌨️ (0:33:18) Breadth First Search Algorithm
⌨️ (0:40:27) Breadth First Search grid shortest path
⌨️ (0:56:23) Topological Sort Algorithm
⌨️ (1:09:52) Shortest/Longest path on a Directed Acyclic Graph (DAG)
⌨️ (1:19:34) Dijkstra's Shortest Path Algorithm
⌨️ (1:43:17) Dijkstra's Shortest Path Algorithm | Source Code
⌨️ (1:50:47) Bellman Ford Algorithm
⌨️ (2:05:34) Floyd Warshall All Pairs Shortest Path Algorithm
⌨️ (2:20:54) Floyd Warshall All Pairs Shortest Path Algorithm | Source Code
⌨️ (2:29:19) Bridges and Articulation points Algorithm
⌨️ (2:49:01) Bridges and Articulation points source code
⌨️ (2:57:32) Tarjan's Strongly Connected Components algorithm
⌨️ (3:13:56) Tarjan's Strongly Connected Components algorithm source code
⌨️ (3:20:12) Travelling Salesman Problem | Dynamic Programming
⌨️ (3:39:59) Travelling Salesman Problem source code | Dynamic Programming
⌨️ (3:52:27) Existence of Eulerian Paths and Circuits
⌨️ (4:01:19) Eulerian Path Algorithm
⌨️ (4:15:47) Eulerian Path Algorithm | Source Code
⌨️ (4:23:00) Prim's Minimum Spanning Tree Algorithm
⌨️ (4:37:05) Eager Prim's Minimum Spanning Tree Algorithm
⌨️ (4:50:38) Eager Prim's Minimum Spanning Tree Algorithm | Source Code
⌨️ (4:58:30) Max Flow Ford Fulkerson | Network Flow
⌨️ (5:11:01) Max Flow Ford Fulkerson | Source Code
⌨️ (5:27:25) Unweighted Bipartite Matching | Network Flow
⌨️ (5:38:11) Mice and Owls problem | Network Flow
⌨️ (5:46:11) Elementary Math problem | Network Flow
⌨️ (5:56:19) Edmonds Karp Algorithm | Network Flow
⌨️ (6:05:18) Edmonds Karp Algorithm | Source Code
⌨️ (6:10:08) Capacity Scaling | Network Flow
⌨️ (6:19:34) Capacity Scaling | Network Flow | Source Code
⌨️ (6:25:04) Dinic's Algorithm | Network Flow
⌨️ (6:36:09) Dinic's Algorithm | Network Flow | Source Code

— Learn to code for free and get a developer job: https://www.freecodecamp.org
Read hundreds of articles on programming: https://www.freecodecamp.org/news
Content
1.03 -> Hello and welcome. My name is William and
I'm super excited to bring to you this video
5.979 -> series focused on graph theory. Graph theory
is one of my absolute favorite topics in computer
12.98 -> science; we're going to see a lot of very
awesome algorithms. The whole field is very
19.43 -> diverse and hugely applicable to real world
applications. I think everybody should be
25.54 -> able to learn, love and enjoy graph theory.
These first few videos are going to be a ramp-up
31.73 -> to introduce the topics of how
we store, represent and traverse graphs on
38.47 -> a computer. By the way, this whole video series
will be taking on a computer science point
44.539 -> of view of graph theory rather than a mathematical
one. So we won't be covering proofs, and so
51.789 -> on per se. Instead, we'll be looking at algorithm
implementation details and code. So what is
59.51 -> graph theory? In essence, it is the study
of properties and applications of graphs,
66.32 -> which common folk or non mathematical folks
call networks. This is a very broad topic,
73.73 -> and my goal with this video series is to teach
you how to apply graph theory to real world
79.75 -> situations. Graphs can be used to represent
almost any problem which makes them so interesting,
89.2 -> because they pop up absolutely everywhere.
A simple problem that can be phrased as a
95.219 -> graph theory problem might be given the constraints
in this picture, how many different sets of
102.119 -> clothes can I make, choosing an article from
each category? Of course, this could be phrased
109.2 -> and solved using only mathematics. But the
advantage to graph theory is that it allows
115.369 -> us to visualize the problem using nodes to
represent an article of clothing and edges
121.14 -> to represent relationships between them. Another
canonical example of a graph theory problem
127.649 -> is a social network of friends. A graph representation
enables us to answer interesting questions
134.37 -> such as how many friends does person X have,
or how many degrees of separation are there
140.62 -> between person X and person Y. Next, let's
talk about different types of graphs. There
147.22 -> are many different types of graphs.
And it's really important, I mean really important
153.76 -> to be able to recognize what type of graph
you're working with, and especially when you're
158.42 -> programming and trying to solve a particular
problem. This first type of graph is an undirected
165.5 -> graph. It's the most simple kind of graph
you'll encounter. And it is where edges have
170.959 -> no orientation. That is, if there's an edge
from node u to node v, it is identical to
177.78 -> the edge from V to U. For instance, in the
following graph, nodes are cities and edges
185.04 -> represent bi directional roads. Since if you
drive from one city to another, you can always
190.629 -> retrace your steps by driving the other way.
In contrast, to undirected graphs, there are
198.129 -> also directed graphs, sometimes called digraphs.
In these graphs, you've guessed it,
205.26 -> the edges are directed. So if we have an edge
from u to v, then you can only go from node u
212.189 -> to node v, not the other way around. In this
graph, you can see that the edges are directed
217.989 -> because of the arrowheads on the edges between
nodes. This graph could represent people who
224.61 -> bought each other gifts. So an incoming edge
represents receiving a gift and an outgoing
231.13 -> edge represents giving a gift. Therefore,
person E in this graph bought person D a gift,
239 -> Person A bought themselves and Person B a
gift, and person F bought nobody any gifts
246.01 -> and received none. So far, we've only seen
unweighted graphs, but edges on graphs can
253.87 -> contain weights to represent arbitrary values
such as cost, distance, quantity, you name
262 -> it. Weighted graphs come in both directed
and undirected flavors. As a side note, I
269.13 -> will usually denote an edge of a graph as
a triplet u, v, w to indicate where an edge
276.68 -> is coming from, where it's going to and what
its weight is. Of course, with this notation,
283.08 -> I also need to specify whether the graph is
directed or undirected.
290.23 -> Next up, I just want to have a quick chat
about special types of graphs in graph theory.
296.19 -> There are so many different types of graphs
that I had to select only a few which will
301.36 -> be most relevant for this upcoming video series.
The most important type of special graph is
308.21 -> definitely the tree. A tree is simply an undirected
graph with no cycles. There are several equivalent
316.729 -> definitions of a tree such as a graph with
n nodes and n minus one edges. All the graphs
324.65 -> below are trees. Yes, even the leftmost one
since it has no cycles. A related but totally
333.14 -> different type of graph is a rooted tree.
The distinction here is that a rooted tree
339.21 -> has a designated root node, where every edge
either points away from or towards the root
345.77 -> node. When edges point away from the root
node. The graph is called an arborescence,
352.78 -> or an out-tree, and an anti-arborescence, or
in-tree, otherwise. Out-trees are by far more
359.56 -> common than in-trees, from what I've observed.
It is also fairly common for people to refer
365.66 -> to a rooted tree simply as a tree instead
of an in or out tree. But there is an important
372.639 -> distinction there. Next are directed acyclic
graphs. These are graphs with directed edges
380.889 -> and no cycles. These graphs are very important
and fairly common in computer science, actually,
388.12 -> since they often represent structures with
dependencies, such as a scheduler, a build
395.25 -> system, a compiler, maybe, or perhaps more
relatable, university class prerequisites.
402.599 -> There are several efficient algorithms that
we'll be looking at to deal specifically with
407.77 -> directed acyclic graphs, such as how to find
the shortest path and produce a topological
413.47 -> ordering of nodes. A topological ordering
of nodes is an ordering of nodes that tells
418.72 -> you how to process the nodes of the graph
so you don't perform a task before first having
425.15 -> completed all its dependencies. For example,
a topological ordering of class prerequisites
431.73 -> would tell you to take intro biology and intro
chemistry before taking a class on, say, genomics.
440.55 -> This next type of special graph is a bipartite
graph. It is one whose vertices can be split
447.83 -> into two independent groups, u and v such
that every edge connects between u and v.
456.02 -> This is just a fancy way of saying that the
graph is two-colorable, or that there are no
462.11 -> odd length cycles in the graph. Often, a problem
we like to ask is: what is the maximum matching
471.07 -> we can create on a bipartite graph? Suppose
white nodes are jobs and red nodes are people
478.31 -> then we can ask how many people can be matched
to jobs. In this case, there are a lot of
484.979 -> edges in each graph. So I think the answer
for both is four. But in general, it's not
490.86 -> so easy if there are fewer edges, tougher constraints
and more conflicts. Bipartite graphs also
498.11 -> play a critical role in the field of network
flow, which we will talk about later. This
505.331 -> last type of graph is a complete graph, which is
one where there is a unique edge between every
511.4 -> pair of nodes in the graph. A complete graph
with n vertices is denoted as the graph K
519.47 -> sub n. I have listed K1 through K6 on
the bottom, and you can easily see how this
527.33 -> scales when we add more nodes. Complete graphs
are often seen as the worst case possible
533.87 -> graph you can possibly encounter, because
of how many edges there are. So if you want
538.77 -> to test your algorithm for performance, a
complete graph is an easy way to start. One
544.29 -> thing we're going to have to be really cognizant
about is how we're actually representing our
551.05 -> graphs on the computer. This isn't just what
type of graph it is, but what type of data
557.84 -> structure we are representing
our graph with. And this can have a huge impact
568.66 -> on performance. The simplest way is with
a 2D adjacency matrix. The idea is that the
578.34 -> cell m[i][j] represents the edge weight of going
from node i to node j. So in the graph below,
587.31 -> there are four nodes. So I create a four by
four matrix and populate the graph with the
594.25 -> edge weights. If you look at the edge weight
from node C to node D, you'll see that it has
600.52 -> a value of two. So in row three and column
four of the matrix there is a value of two.
607.79 -> Note that it is often assumed that the edge
going from a node to itself has a cost of
614.59 -> zero, which is why the diagonal of the matrix
has all zero values. This matrix form has
622.27 -> several advantages. First, it's very
space efficient for dense graphs, those graphs
629.76 -> with a lot of edges. Second, the edge weight lookup can
be done in constant time, which is quite
636.91 -> nice. And lastly, I would argue that it is
the simplest form of graph representation
642.82 -> you can have. On the downside however, the
main reason people don't go for the adjacency
648.68 -> matrix as their first pick, is because it
requires v squared space, which is, well a
656.1 -> lot of space. In practice, graphs with 10,000
nodes or more start to become infeasible
663.21 -> very quickly. The other issue with the adjacency
matrix is that it requires v squared work
670.31 -> to iterate over all the edges of your graph.
This is fine for dense graphs with lots of
675.9 -> edges. But it isn't so great for sparse graphs,
since most cells will be empty. The main alternative
683.26 -> to the adjacency matrix is the adjacency list,
which is a way to represent a graph as a map
691.27 -> of nodes to list of edges. The idea is that
each node tracks all of its outgoing edges.
698.95 -> For example, node C has three outgoing edges.
So the map entry for C will track the edge
706.47 -> from C to A with cost four, the edge from C
to B with cost one, and the edge from C to D
714.69 -> with cost two. Notice that, in the list of edges,
we only need to track two things, the node
723.53 -> we're going to and the cost to get there,
we don't need to keep track of where we came
729.1 -> from, because that's already implicitly known.
The nice thing about adjacency lists is that
737.06 -> it is great for sparse graphs, because it
only tracks the edges that you have, and doesn't
741.93 -> allocate additional memory that you might
not use like the adjacency matrix does. This
747.181 -> also means it's efficient when iterating over
all the edges. The main disadvantage to using
753.74 -> an adjacency list is that it is less space
efficient on denser graphs. Another subtle
760.66 -> disadvantage is that it takes Big O of E time
to access a specific edge's weight, although
768.38 -> in practice, you rarely, if ever, actually
need to do this.
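To make these two representations concrete, here's a minimal Java sketch (my own illustration, not the course's repository code) using the example above, where node C has edges to A with cost four, to B with cost one, and to D with cost two, and nodes A through D are numbered 0 through 3:

```java
import java.util.*;

public class GraphRepresentations {
  public static void main(String[] args) {
    int n = 4; // nodes 0..3 stand for A, B, C, D

    // Adjacency matrix: matrix[i][j] is the weight of the edge i -> j.
    // The diagonal stays 0, following the convention that going from
    // a node to itself has zero cost.
    double[][] matrix = new double[n][n];
    matrix[2][0] = 4; // C -> A with cost 4
    matrix[2][1] = 1; // C -> B with cost 1
    matrix[2][3] = 2; // C -> D with cost 2

    // Adjacency list: each node maps to a list of (destination, cost)
    // pairs, so only the edges that actually exist take up memory.
    Map<Integer, List<double[]>> adjList = new HashMap<>();
    for (int i = 0; i < n; i++) adjList.put(i, new ArrayList<>());
    adjList.get(2).add(new double[] {0, 4}); // C -> A, cost 4
    adjList.get(2).add(new double[] {1, 1}); // C -> B, cost 1
    adjList.get(2).add(new double[] {3, 2}); // C -> D, cost 2

    // Constant-time edge weight lookup with the matrix:
    System.out.println("C -> D costs " + matrix[2][3]);
    // Linear scan over C's outgoing edges with the list:
    for (double[] edge : adjList.get(2))
      System.out.println("C -> " + (int) edge[0] + " costs " + edge[1]);
  }
}
```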
773.9 -> The last representation
775.08 -> I want to talk about is the edge list. An
edge list is a way to represent a graph simply
780.72 -> as an unordered list of edges. Basically,
it's exactly what it sounds like: a list of
786.29 -> edges. Assume that the notation for any triplet
u, v, w means the cost from node u to node
795.89 -> v is w. So for this graph, the edge list is
simply a list of six edges represented as
805.65 -> those triplets. This representation is very
simple. However, it lacks structure. And that's
813.24 -> why it is seldomly used. An advantage of the
edge list is that it's great for sparse graphs: iterating
820.59 -> over all the edges is super easy, and the
structure is simple. The downside is that
826.41 -> edge lookup can be slow, and you can run into
memory issues on large graphs.
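And as a rough Java sketch of the edge list idea (again my own illustration; the Edge record assumes Java 16 or newer):

```java
import java.util.*;

public class EdgeListDemo {
  // One edge stored as a (u, v, w) triplet: from u to v with weight w.
  record Edge(int u, int v, double w) {}

  public static void main(String[] args) {
    // The entire graph is just an unordered list of edges.
    List<Edge> edges = List.of(
        new Edge(2, 0, 4), // C -> A with cost 4
        new Edge(2, 1, 1), // C -> B with cost 1
        new Edge(2, 3, 2)  // C -> D with cost 2
    );

    // Iterating over every edge is trivial...
    for (Edge e : edges)
      System.out.println(e.u() + " -> " + e.v() + " costs " + e.w());

    // ...but finding one specific edge requires a linear scan.
    for (Edge e : edges)
      if (e.u() == 2 && e.v() == 3)
        System.out.println("Found C -> D with cost " + e.w());
  }
}
```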
832.34 -> Today, I'm going to talk about common problems in graph
theory. A lot of problems you will encounter
837.9 -> can often be reduced to a famous or well known
problem or some variant thereof. So it's important
845.03 -> to be able to familiarize ourselves with common
graph theory problems and their solutions.
853.68 -> Just before getting started, following on from
what we learned in the last video about representing
859.77 -> graphs, I want you to think about how you
would store and represent the graphs in
866.16 -> the upcoming problems I'm going to describe.
In particular: is the graph in the problem
872.02 -> I'm describing directed or undirected? Are
the edges of the graph weighted or unweighted?
878.87 -> Is the common use case a graph that is likely
to be sparse or dense with edges? And lastly,
887.74 -> should I use an adjacency matrix, an adjacency
list, an edge list or some other structure
893.46 -> to represent my graph efficiently? So one
of the most common, if not the most common,
901.01 -> problems in graph theory is the shortest path
problem. Given a weighted graph, find the
906.66 -> shortest path of edges from node A to node
B. So if we pretend this graph represents
a road system, and we're at node A and want
921.02 -> to get to node H, our shortest path algorithm
921.02 -> should be able to find us a list of edges
to follow that will lead us from A to H with
926.38 -> a minimal cost. Lucky for us, many algorithms
exist to solve the shortest path problem,
934.38 -> including a breadth first search for unweighted
graphs, Dijkstra's algorithm, Bellman-Ford,
940.16 -> A*, and many more. As simple as it sounds,
connectivity is a big issue in graph theory.
946.98 -> The problem can also be simplified to: does
there exist a path from node A to node B? In
953.55 -> this scenario, we don't care about the minimum
cost; we just want to know: can one node
960.41 -> reach another node? A typical solution to
this problem is to use a union find data structure,
967.93 -> or do a very basic search algorithm such as
a depth first search or a breadth first search.
975.84 -> Another common problem is detecting negative
cycles in a directed graph. Sometimes we're
981.67 -> dealing with graphs that have negative edge
weights. And we need to know if a negative cycle
987.8 -> exists, because if one does, it can throw
everything off. In this graph, nodes one, two
994.1 -> and three form a negative cycle. Because if
you cycle through all the nodes, you end up
999.98 -> with a cost of negative one if you add up
all the edge weights, in fact, you can cycle
1006.67 -> endlessly getting smaller and smaller costs.
In the context of finding the shortest path
1013.1 -> a negative cycle is like a trap that you can
never escape. However, there are some contexts
1020.05 -> where negative cycles are beneficial. Suppose
we're trading currencies across an exchange
1026.14 -> or multiple exchanges. Currency prices try
to remain consistent throughout the day across
1032.92 -> exchanges, such as trading USD to Euros, or
Canadian dollars to Yen. But sometimes there are
1039.27 -> inconsistencies in the currency exchange prices.
This makes it possible to do something called
1045.05 -> an arbitrage, which cycles through multiple
currencies exchanging one currency for another
1050.38 -> and coming back to the original currency with
1056.14 -> more money than you originally started with;
a risk-free gain. This is something we can
1062.76 -> use graph theory for, because it involves detecting
1062.76 -> negative cycles. There are two well known
algorithms that can detect negative cycles.
1067.76 -> And those are Bellman-Ford and Floyd-Warshall.
Something that comes up now and again is
1074.9 -> finding strongly connected components within
a graph. This is analogous to finding connected
1081.05 -> components of an undirected graph. But for
directed graphs, when looking at strongly
1088.21 -> connected components, we're looking for self
contained cycles within the graph where every
1093.76 -> vertex in a given cycle should be able to
reach every other vertex in that same cycle.
1099.82 -> This is very useful in many algorithms as
usually an intermediate step. So it's important
1105.86 -> to know how to find these strongly connected
components. And there are many very elegant
1111.57 -> algorithms to do so, such as Tarjan's algorithm.
You probably won't go through your computer
1117.65 -> science career without hearing about the traveling
salesperson problem. The TSP problem is the
1125.49 -> problem of having n cities and the distances
between each of them and finding the shortest
1131.11 -> path that visits each city and comes back
to the original city at minimum cost. For
1138.26 -> example, if your graph is the one on the left,
a possible TSP solution is the graph on the
1143.8 -> right, which has a cost of nine. The TSP problem
is NP-hard, meaning it is a computationally
1150.77 -> challenging problem. This is unfortunate, because
the TSP problem has several very important
1158.3 -> applications. Some famous algorithms we can
use to actually solve this problem are the
1164.06 -> Held-Karp algorithm with dynamic programming,
doing some kind of branch and bound
1168.9 -> algorithm, or you can use one of many, many
approximation algorithms such as the ant colony
1176.07 -> optimization. This next problem I want to
talk about is finding bridges in the graph,
1182.3 -> which is something of a fascination to me.
Bridges are edges which, if removed, increase
1188.28 -> the number of connected components in a graph.
In this graph, the edges highlighted in pink
1194.49 -> are bridges.
1195.93 -> Bridges are
1196.93 -> important in graph theory because they often
hint at weak points, bottlenecks or vulnerabilities
1202.74 -> in a graph. Think of your graph as a telephone
network or a set of bridges between islands,
1209.01 -> you can immediately see the usefulness of
detecting bridges. Related to bridges, but
1214.44 -> not the same, are articulation points, which are
nodes that, if removed, increase the number
1219.92 -> of connected components in the graph. In this
same graph, you can see the three articulation
1226.16 -> points highlighted in pink. Next problem is
finding the minimum spanning tree of a graph.
1234.33 -> A minimum spanning tree is a subset of the
edges that connects all the vertices together
1239.79 -> without any cycles and with minimal possible
cost. So in summary, it's a tree meaning it
1246.19 -> has no cycles, and it spans the graph at a
minimal cost. Hence why we give it the name
1253.5 -> minimum spanning tree. For example, in the
graph below, one of the possible minimum spanning
1259.92 -> trees is this graph, with a least cost of 12.
Note that all minimum spanning trees of a
1267.27 -> graph have the same minimal cost, but are
not necessarily identical. minimum spanning
1274 -> trees are seen in lots of different applications
in computer science, including designing a
1279.27 -> least-cost network, circuit design, transportation
networks, you name it. There are also several
1285.96 -> approximation algorithms which rely on minimum
spanning trees, which is pretty interesting.
1291.97 -> If you want to find a minimum spanning tree
of a graph, I recommend using Kruskal's,
1296.87 -> Prim's, or Borůvka's algorithm. This
last problem, I think, is the most fascinating
1303.23 -> and it is about finding the maximum flow
1306.06 -> through
1307.06 -> a special type of graph called a flow network.
Flow networks are networks where edge weights
1313.66 -> represent capacities in some sense. Capacities
might be things like the maximum number of
1320.29 -> cars that fit on a road, or the maximum amount
of volume that can flow through a pipe, or
1326.81 -> even the number of boats a river can sustain
without destroying the environment. And these
1332.11 -> types of flow networks, we often find ourselves
asking the question: with an infinite input
1339.39 -> source, that is cars, water, boats, whatever,
how much flow can I push through the network,
1347.85 -> assuming we start at some source and try and
make it to some sink node? This question is
1353.8 -> important, because at some point, there is
bound to be a bottleneck somewhere in our
1358.93 -> flow graph that limits the amount of stuff
we can have traveling on the network, making
1366.01 -> it from point A to point B, the maximum flow
would then represent things like
1371.55 -> the volume
1372.76 -> of water allowed to flow through the network
of pipes, the number of cars the roads can sustain
1379.02 -> in traffic, or the maximum number of boats
allowed on the river. With these maximum flow
1385.83 -> problems, we can identify the bottlenecks
that slow down the whole network and fix the
1391.55 -> edges that have lower capacities. We're moving
on to talking about the depth first search
1397.65 -> algorithm, which plays a central role in several
graph theory algorithms. So what is the depth
1404.71 -> first search? A depth first search is a core
algorithm in graph theory, that allows you
1409.99 -> to explore nodes and edges of a graph. So
it's a form of traversal algorithm. The nice
1416.75 -> thing about a depth first search is that it's
really easy to code. And it runs in time complexity
1424.75 -> of Big O of V plus E,
1428.18 -> that is vertices plus edges, which is directly
proportional to the size of your graph. By
1435.16 -> itself, a depth first search isn't all that
useful. But when augmented to perform other
1441.45 -> tasks, such as counting connected components,
determining connectivity between nodes, or finding
1449.01 -> bridges and articulation points, the depth
first search algorithm really shines. So let's
1455.75 -> look at an example. As the name suggests,
a depth first search plunges depth first into
1462.74 -> a graph without regard for which edge it selects
next, until it cannot go any further at which
1469.7 -> point it backtracks and continues its exploration.
So a depth first search has to start on a
1477.94 -> node. And I'm going to start our depth first
search on node zero. And now we arbitrarily
1484.75 -> pick a node to go to some from node zero,
we're going to go and do nine. Then from no
1492.19 -> nine, we only have one choice, which is to
go to node eight, at node eight arbitrarily
1497.99 -> picking edge. So we're going to To go outwards
to node seven, node seven, we have plenty
1504.11 -> of edges to choose from. So let's go to node
10, node 10, to node 11, and 11 to seven.
1511.14 -> So we don't want to revisit already visited
nodes or nodes that are currently being visited.
1519.49 -> So we have to backtrack to indicate backtracking,
I'm going to label edges and nodes as gray,
1526.73 -> so backtrack all the way back to node seven.
So we're not finished exploring node seven,
1536.2 -> because there are still edges to be picked.
So I'm going to go to node three, and node
1542.67 -> three, I'm going to go node to node two is
a dead end. So we backtrack, then go to node
1549.58 -> four, node four is also a dead end. So backtrack
from node four, back to node three, then pick
1556.55 -> node threes last edge to go in Node five,
five to six, and six to seven, can't go to
1564.07 -> seven, because we're visiting seven currently,
so backtrack all the way back to node eight.
1570.32 -> From node eight, we still need to visit its
last edge, which goes to node one, node one
1578.37 -> back to node zero, we can't go to node zero,
because we're currently exploring it, then
1584.309 -> backtrack all the way to zero, which completes
our depth first search traversal of this graph.
1592.28 -> So this was one particular depth first search
traversal. But as you saw, it could have gone
1596.39 -> a lot of different ways. So now let's look
at some pseudocode for this depth first search,
1606.47 -> to get a deeper understanding of how it works.
The first thing we need to do is initialize
1615.45 -> these three variables, which are n, the number
of nodes in our graph; g, the adjacency list
1624.56 -> representing the graph; and visited, a Boolean
array containing true or false at index i
1631.05 -> depending on whether or not node i has been
visited. In the beginning, this array should
1637.77 -> have all false values because we have not
visited any nodes in the graph. Once that
1644.07 -> is set up. At the bottom, I define our starting
node to be node zero and then call the depth
1650.57 -> first search method to do the exploration.
The depth first search itself has one argument,
the current node we are at, which I have conveniently
1657.11 -> named 'at'. This method is recursive, so I check
the base case, which is whether we have already
1664.54 -> visited this node. If so, we have no business
here and can return. Otherwise, let's visit
1671.08 -> this node by marking it as visited and exploring
all of its neighbors. To explore all the neighbors
of the node, reach into the adjacency list
1685.179 -> and pull out all the neighbors of this node
and explore them depth first by looping over
1690.679 -> each and recursively calling the depth first
search method. And that's all a depth first
1696.85 -> search really is in a nutshell, let's look
at another simple use case. For a depth first
1703.2 -> search. I want you to discuss finding connected
components in a graph. First, let's understand
1709.85 -> what we mean by connected component. Sometimes
the graph is split into multiple disjoint
1716.98 -> components, and it's useful to be able to
identify and count these components. One way
1723.26 -> to identify these components might be to color
them so we can tell them apart. But what does
1728.7 -> coloring nodes really mean to a computer?
1731.87 -> Coloring nodes
1732.87 -> is equivalent to labeling each node in a component
with the same ID value. For example, every
1739.97 -> node in the purple component gets an ID of
one, and every node in the green component
1746.05 -> gets an ID of three, we can use a depth first
search to identify components this way. First,
1754.26 -> make sure all the nodes are labeled from zero
to n non inclusive, where n is the number
1759.57 -> of nodes, the basic algorithm is to start
a depth first search at every node, except
1766.09 -> if that node has already been visited, and
mark all reachable nodes as being part of
1771.11 -> the same component using the same ID. So if
we start at node zero, then we do a depth
1777.82 -> first search here and then every node in this
component gets an ID of zero. So we go to
1784.35 -> eight, giving it an ID of zero; 14 gets zero;
13 also, so label it with a zero; then backtrack
1792.48 -> like you do in a depth first search, then explore
node four, giving it an ID of zero, and then finish
1799.82 -> exploring that component and then move on
to the next node in order. So go to node one
1806.73 -> next, and at node one, start a depth first search there.
So go to node five, label it with a one; five
1814.91 -> goes to 17, label it with a one; backtrack,
go to 16, also label it with a one. We're
1821.901 -> finished exploring this component, then we
would go on to node two, wherever node two
1826.929 -> is, then explore that component, then node
three, explore node three's component unless
1832.27 -> node three has already been visited, and so
on. So you do this for every component. Eventually,
1838.15 -> we get to label all the components, and we
use a depth first search to do that. Awesome.
1844.2 -> So that's how we find connected components
using a depth first search.
1848.08 -> Now let's look
1849.08 -> at some pseudocode for how we do this. First,
we'll need a couple of things. We'll need
1853.85 -> everything from the previous code we looked
1860.53 -> at: n, the number of nodes in our graph,
1860.53 -> G, our adjacency list, and our visited array,
but additionally, we'll also need a variable
1867.46 -> called count that tracks the number of connected
components and components an integer array
1873.9 -> that holds the integer value of which component
a node belongs to. Inside the find components
1881.17 -> method, we loop over every node and check
if the current node has been visited or not,
1886.84 -> and then execute some logic. This depth first
search variant differs slightly from the previous
in that we execute a depth first search for
1901.22 -> every unvisited node. When we actually do
the depth first search, we visit nodes and
1907.14 -> mark them as visited, so we never revisit
the same node more than once. We either skip
1913.429 -> over a node because it's been visited in this
for loop, or start a depth first search there.
If we start a new depth first search, we increment
1919.92 -> the count variable and keep track of how many
depth first searches we have done. Inside
1925.791 -> the depth first search method itself, the
two things we do are mark the current node
1929.94 -> as visited, and set the current node to be
part of the component equal to the value of
1935.11 -> count, then simply iterate over every neighboring
node that has not yet been visited, and call
1941.799 -> the depth first search method to explore them
as well. Back inside the find components method,
1947.809 -> simply return the number of components and
the components array that contains the information
1953.1 -> about which component each node belongs to.
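Here's how that find components pseudocode might look as runnable Java (a sketch in the spirit of the slides; the disjoint sample graph in main is my own assumption):

```java
import java.util.*;

public class ConnectedComponents {
  static int n;                  // number of nodes in the graph
  static List<List<Integer>> g;  // adjacency list
  static boolean[] visited;
  static int count;              // number of connected components found
  static int[] components;       // components[i] = id of the component node i is in

  static int[] findComponents() {
    for (int i = 0; i < n; i++) {
      if (!visited[i]) {
        count++;                 // each fresh DFS discovers one new component
        dfs(i);
      }
    }
    return components;           // count holds the number of components
  }

  static void dfs(int at) {
    visited[at] = true;
    components[at] = count;      // label this node with its component id
    for (int next : g.get(at))
      if (!visited[next]) dfs(next);
  }

  public static void main(String[] args) {
    n = 5;
    g = new ArrayList<>();
    for (int i = 0; i < n; i++) g.add(new ArrayList<>());
    // Two disjoint components (assumed sample graph): {0,1,2} and {3,4}.
    g.get(0).add(1); g.get(1).add(0);
    g.get(1).add(2); g.get(2).add(1);
    g.get(3).add(4); g.get(4).add(3);

    visited = new boolean[n];
    components = new int[n];
    findComponents();
    System.out.println(count + " components: " + Arrays.toString(components));
  }
}
```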
So we've covered two of the things you can
1959.73 -> use the depth first search for: doing a simple
traversal and determining connected components.
1966.16 -> But we can augment a depth first search to
do so much more, such as compute a graph's
1973.11 -> minimum spanning tree, detect and find cycles
in a graph, check if a graph is bipartite,
1979.429 -> find strongly connected components, topologically
sort your graph, find bridges and articulation
1984.59 -> points, find augmenting paths in a flow network,
generate mazes, and many, many more applications.
1990.14 -> So a depth first search is super versatile, and
can be extended to do a whole ton of things.
1997.67 -> Today's topic is the breadth first search
graph traversal algorithm. Alongside the depth
2004.179 -> first search the breadth first search algorithm
is another one of those fundamental search
2008.99 -> algorithms used to explore nodes and edges
of a graph. It runs in a time complexity of
2015.429 -> big O of v plus e that is vertices plus edges,
and is often used as a building block in other
2022.97 -> algorithms. It differs from a depth first
2030.309 -> search in the way that it explores the graph.
2030.309 -> The breadth first search algorithm is particularly
useful for one thing, finding the shortest
2035.61 -> path on an unweighted graph. A breadth first
search starts at a node in the graph and explores
2042.37 -> its neighbor nodes first before moving on
to explore the next level of neighbors. In
2048.02 -> this sense, a breadth first search explores
nodes in layers. So if we start a breadth first
2053.429 -> search at zero, we would visit zero first,
then we would
2060.29 -> visit all of zero's neighbors, the nodes in yellow,
before moving on to the next layer of nodes;
2066.55 -> then we would visit all their neighbors and
so on.
2072.149 -> So as you saw, a breadth first search explores
a graph in a layered fashion. It does this
2077.8 -> by maintaining a queue of which node it should
visit next, this is most easily seen with
2084.159 -> an example. Let's begin a breadth first search
at node zero once more. So let's add zero
2090.7 -> to the queue on the left. I will denote the
2090.7 -> current node in red. So zero is the current
node, and we want to explore all of zero's
2098.24 -> unvisited neighbors and add them to the queue.
2104.96 -> So we would add nine to the queue, seven to
the queue, and 11 to the queue. Now zero has
2110.27 -> no more unvisited neighbors, so we move on.
So nine is next up in the queue. So we add
2117.619 -> all of nine's unvisited neighbors to the queue.
So that is 10, and eight, then there are no
2124.45 -> more neighbors of nine to visit. So we move
on to the next node in our queue, which is
seven, then we add all of seven's unvisited
2136.69 -> neighbors to the queue. So we try and visit
node 11. But node 11 is already in the queue,
so we don't want to add it again. So we skip
2142.44 -> it, then we will add six to the queue and
three to the queue. Then this process goes
2148.369 -> on and on until we run out of nodes in the
queue. So I will let the animation play.
2174.7 -> And that's how you do a breadth first search
in a nutshell. In the previous animation,
2181.02 -> we relied on a queue to help us track which
node we should visit next. Upon reaching a
2188.15 -> new node, the algorithm adds it to the queue
2193.86 -> to visit it later. The queue data structure
works like a real world queue, such as a waiting
2201.08 -> line in a restaurant: people can either enter
the waiting line, that is, get enqueued, or
2210.73 -> get seated, dequeued. Let's look at some pseudocode
2210.73 -> for the breadth first search. First things
2217.72 -> first, we'll need two variables: n, the number
of nodes in our graph, and g, the adjacency
2225.97 -> list representing our unweighted graph. This
2225.97 -> breadth first search function takes two arguments
2233.869 -> s and e, the start and end node indices of
2233.869 -> the search. The return value for this function
is the shortest path of nodes from S to E.
2241.82 -> I've divided the function into two methods
for simplicity. First, we solve the problem
2247.69 -> by executing the breadth first search and
then we reconstruct the path from S to E.
2253.84 -> So let's take a look at the solve method.
So here we are inside the solve method. The
2259.38 -> first thing I do is initialize the queue data
structure that we'll need and add the starting
2265.04 -> node to it. This queue should support at minimum
2275.109 -> the enqueue and dequeue operations I just talked about,
2275.109 -> then initialize a Boolean array with all false
values and mark the starting node as visited.
2284.34 -> This array tracks whether or not node i has
been visited. If the value at index i is true,
2291.2 -> then the node has either been visited or is
being visited and is on the queue. In the
2297.57 -> animation, this corresponds to the gray and
yellow nodes. The last thing we'll need is
2303.54 -> an array called prev, which will help us reconstruct
the shortest path from the start to the end
2310.19 -> node. Initially, this array should be initialized
with all null values. This array tracks who
2317.02 -> the parent of node i was, so we can reconstruct
the path later. Let's loop while the queue
2324.18 -> is not empty and poll the top node from the
queue by issuing a dequeue operation, then reach
2332.01 -> inside the adjacency list and get all the
neighbors of this node, looping over each unvisited
2338.47 -> node. Once we find the next unvisited node,
enqueue it to the queue, mark it as visited, and
2346.66 -> keep track of the parent node of the next
node in the prev array. Once the queue is
2352.82 -> empty, and our breadth first search is complete,
simply return the prev array. Back inside
2359.59 -> the breadth first search method take the output
of the solve method which gave us the prev
2365.79 -> array and call the reconstruct path method.
Here we are inside the reconstruct path method.
2373.25 -> The first thing we do is actually reconstruct
the path by looping backwards from the end
2378.52 -> node and making our way back to the start
node. That is assuming we can reach it. The
2384.85 -> reason the prev array had to be initialized
to all null values is because that is the
2391.8 -> way of checking whether or not the for loop
should stop. Since we loop through the prev
2397.99 -> array backwards, starting with the end node,
we need to reverse the order of the nodes
2403.55 -> so that the path starts at the start node
and ends at the end node. Last but not least,
2410.56 -> we actually have to make sure the path between
nodes S and E exists; it might not be possible
2417.29 -> to reach node E from node S if the graph
is disjoint. If this is the case, then simply
2424.33 -> return an empty path.
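Assembled as runnable Java, the solve and reconstruct path methods might look like this (a sketch mirroring the pseudocode; the small sample graph in main is an assumption):

```java
import java.util.*;

public class BfsShortestPath {
  static int n;                  // number of nodes
  static List<List<Integer>> g;  // adjacency list of an unweighted graph

  // Returns the shortest path of nodes from s to e, or an empty list if none exists.
  static List<Integer> bfs(int s, int e) {
    Integer[] prev = solve(s);
    return reconstructPath(s, e, prev);
  }

  static Integer[] solve(int s) {
    Deque<Integer> queue = new ArrayDeque<>();
    queue.offer(s);                        // enqueue the start node

    boolean[] visited = new boolean[n];
    visited[s] = true;                     // mark the start node as visited

    Integer[] prev = new Integer[n];       // prev[i] = parent of node i, null by default
    while (!queue.isEmpty()) {
      int node = queue.poll();             // dequeue the next node
      for (int next : g.get(node)) {
        if (!visited[next]) {
          visited[next] = true;
          prev[next] = node;               // remember how we reached 'next'
          queue.offer(next);
        }
      }
    }
    return prev;
  }

  static List<Integer> reconstructPath(int s, int e, Integer[] prev) {
    List<Integer> path = new ArrayList<>();
    for (Integer at = e; at != null; at = prev[at]) path.add(at); // walk backwards from e
    Collections.reverse(path);             // so the path runs start -> end
    if (path.get(0) == s) return path;     // s and e are connected
    return new ArrayList<>();              // otherwise return an empty path
  }

  public static void main(String[] args) {
    n = 5;
    g = new ArrayList<>();
    for (int i = 0; i < n; i++) g.add(new ArrayList<>());
    // Undirected sample graph (assumed): 0-1, 1-2, 2-3, with node 4 isolated.
    int[][] es = {{0, 1}, {1, 2}, {2, 3}};
    for (int[] e : es) { g.get(e[0]).add(e[1]); g.get(e[1]).add(e[0]); }

    System.out.println(bfs(0, 3)); // [0, 1, 2, 3]
    System.out.println(bfs(0, 4)); // [] -- node 4 is unreachable
  }
}
```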
Today, we're going to talk about using a breadth first search to
2429.109 -> find the shortest path on a grid. This is
going to be a really fun video because we're
2433.3 -> going to solve a problem. And I'm going to
teach you a bunch of handy tricks when doing
2438.34 -> graph theory on grids. The motivation behind
why we're learning about grids in this video
2444.369 -> is that a surprising number of problems can
easily be represented using a grid, which
2450.8 -> a lot of the times turns into a graph theory
2456.25 -> problem. Grids are interesting because they're
2456.25 -> a form of implicit graph, which means that
we can determine a node's neighbors based on
2462.48 -> our location within the grid. For instance,
finding a path through a maze is a form of
a grid problem: you're trying to get from one
2473.1 -> side of the maze to the other. Well, you need
to find a path; that's a pathfinding problem.
Or perhaps you're a person trying to navigate
2479.19 -> your way through obstacles such as trees,
rivers, and rocks to get to a particular location.
2484.88 -> And this can be modeled using a grid, and
in turn, we end up using graph theory to navigate
2490.21 -> around. A common approach to solving graph
theory problems on grids is to first convert
2496 -> the grid to a familiar format, such as an
adjacency list or an adjacency matrix, so
2502.14 -> we can easily work with them. However, this
isn't always the most efficient technique,
2507.51 -> but we'll get to that. Suppose we have a grid
on the left, and we want to represent it as
both an adjacency list and an adjacency
2519.52 -> matrix, what do we do first? First, you should
2519.52 -> label all the cells in the grid with the numbers
zero through n non inclusive, where n is the
2527.24 -> product of the number of rows and columns.
So in this grid, on the left, there are six
2533.46 -> cells. So I labeled each cell with the numbers
zero through six not inclusive, then we actually
2540.85 -> want to construct an adjacency list and an
adjacency matrix. Based off this grid, the
2547.02 -> adjacency list doesn't require any setup because
it's simply a map that we initialize, but
2551.81 -> the adjacency matrix requires us to initialize
a matrix of size six by six to represent our
2558.05 -> graph, there are six rows and six columns
in the new adjacency matrix, because it's
2563.73 -> how many nodes there are in the grid we're
trying to represent. Assuming edges are unweighted,
2569.82 -> and cells connect left, right, up and down:
Node zero connects with node one and node
2576.09 -> two, which we reflect in the adjacency list,
and adjacency matrix on the right, then node
2583.8 -> one connects to node zero and node three,
2591.66 -> node two to nodes zero, three, and four, and node three
with nodes one, two, and five,
2595.76 -> and so on. And that's basically how you convert
a grid to an adjacency list or an adjacency
2606.119 -> matrix.
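Here's a minimal Java sketch of that conversion for the three-row by two-column grid from the example, labeling each cell id = row * numCols + col (the helper method connect is my own):

```java
import java.util.*;

public class GridToGraph {
  public static void main(String[] args) {
    int rows = 3, cols = 2;   // six cells, labeled 0 through 5
    int n = rows * cols;

    // Adjacency list: just one empty list per node to start.
    List<List<Integer>> adjList = new ArrayList<>();
    for (int i = 0; i < n; i++) adjList.add(new ArrayList<>());

    // Adjacency matrix: must be initialized to n x n up front.
    boolean[][] adjMatrix = new boolean[n][n];

    // Cells connect left, right, up and down, so it's enough to link
    // each cell to its right and down neighbors in both directions.
    for (int r = 0; r < rows; r++) {
      for (int c = 0; c < cols; c++) {
        int id = r * cols + c;
        if (c + 1 < cols) connect(adjList, adjMatrix, id, id + 1);    // right
        if (r + 1 < rows) connect(adjList, adjMatrix, id, id + cols); // down
      }
    }
    System.out.println("Node 0 connects with " + adjList.get(0)); // [1, 2]
  }

  static void connect(List<List<Integer>> list, boolean[][] matrix, int a, int b) {
    list.get(a).add(b); list.get(b).add(a); // unweighted, undirected edge
    matrix[a][b] = true; matrix[b][a] = true;
  }
}
```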
Once we have an adjacency list or an adjacency matrix, we are able to easily
2612.05 -> run whatever specialized graph algorithm we
need to solve our problems such as finding
2617.18 -> the shortest path finding connected components,
etc. However, transformations between graph
2623.5 -> representations can usually be avoided due
to the structure and the nature of a grid.
2629.72 -> Let me explain. Suppose we're the red ball
in the middle and we know we can move left,
2636.19 -> right up and down to reach adjacent cells.
Well, mathematically, if we're the red ball
2642.64 -> at the row-column coordinate (r, c), we
can add the row vectors (-1, 0), (1, 0),
2652.32 -> (0, 1) and (0, -1) to reach
all the adjacent cells. If the problem you're
2659.82 -> trying to solve allows moving diagonally,
then you can also use the row vectors
2665.74 -> (-1, -1), (-1, 1), (1, 1) and (1, -1).
Using row vectors makes it easy to access
2674.04 -> neighboring cells from the current row column
position. First, define the direction vectors
2679.55 -> for north, south, east and west, broken down
into their row-column components. Then what
2687 -> we want to do is loop over each direction
vector and add it to the current position
2692.79 -> here I iterate i from zero to four non-inclusive
because we only have four directions, then
2699.57 -> add the row direction to the current row to
make
2703.24 -> rr,
2704.24 -> the variable representing the new row, and
then add the column direction to the current
2710 -> column to make cc, the new column position.
So the new position on the grid, (rr,
2716.98 -> cc), is an adjacent cell. However, it might
not be an adjacent cell if we're on the border
2724.22 -> of the grid, and the new position is out of
bounds. So we check that the new coordinate
2729.42 -> is within our grid by making sure that the
new row column position is greater than or
2735.57 -> equal to zero and doesn't exceed the number
of rows and columns of our grid respectively.
2742.91 -> So if those two checks pass, then we know
that the new position (rr, cc) is a neighboring
cell of our current position (r, c), where the red
2756.74 -> ball was. So in summary, this
2756.74 -> technique is really nice, really easy to code
and actually naturally extends to higher dimensions.
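In Java, that direction vector loop might be packaged like this (a sketch; the method name neighbors is my own):

```java
import java.util.*;

public class DirectionVectors {
  // Direction vectors for north, south, east and west,
  // broken down into their row and column components.
  static final int[] dr = {-1, 1, 0, 0};
  static final int[] dc = {0, 0, 1, -1};

  // Returns the in-bounds cells adjacent to (r, c) on an R x C grid.
  static List<int[]> neighbors(int r, int c, int R, int C) {
    List<int[]> cells = new ArrayList<>();
    for (int i = 0; i < 4; i++) {       // four directions
      int rr = r + dr[i];               // new row position
      int cc = c + dc[i];               // new column position
      if (rr < 0 || cc < 0) continue;   // off the top or left edge
      if (rr >= R || cc >= C) continue; // off the bottom or right edge
      cells.add(new int[] {rr, cc});    // (rr, cc) is a valid neighbor
    }
    return cells;
  }

  public static void main(String[] args) {
    // A corner cell of a 3x3 grid has only two in-bounds neighbors.
    for (int[] cell : neighbors(0, 0, 3, 3))
      System.out.println("(" + cell[0] + ", " + cell[1] + ")"); // (1, 0) and (0, 1)
  }
}
```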
2765.08 -> So let's solve a shortest path problem on
2771.63 -> a grid using the direction vector technique
2771.63 -> we just learned about. So here's an abridged
problem statement that you might encounter
2779.13 -> during an interview or in a programming competition.
And it goes as follows suppose you're trapped
2786.43 -> inside a 2d dungeon and need to find the quickest
way out. The dungeon is composed of unit cubes,
2793.24 -> which may or may not be filled with a rock.
It takes one minute to move one unit north,
2799.06 -> south, east, or west, you cannot move diagonally
and the maze is surrounded by solid rock on
2805.14 -> all sides. This problem statement is an easier
version of the problem Dungeon Master on the
2811.94 -> Kattis online judge; see the problem link in
the description. The dungeon is a grid of
2818.369 -> size R by C and you start at the node with
an S character. And there's an exit at the
2827.35 -> cell with an 'E'. A cell full of rock is indicated
by a pound sign, or a hashtag, and empty cells
2834.78 -> are represented using a '.'. In this particular
setup, it's possible to escape the dungeon
2840.92 -> using this particular route highlighted in
green. Remember that we want the shortest
2846.34 -> path to escape the dungeon, not just any path.
Our approach is going to be to do a breadth
2851.369 -> first search from the start node until we
reach the end node and count the number of
2855.869 -> cells we traverse during that process. However,
it might not be possible to exit the dungeon
2861.49 -> if we cannot reach the exit, so we'll have
to be mindful of that. So like in any breadth
2867.16 -> first search, we need to start by visiting
our start node and adding it to the queue.
2872.88 -> Assuming we've already found the coordinate
of our starting node within the grid, we've
2876.89 -> added to the queue. Then we visit the adjacent
unvisited neighbors and add them to the queue
2882.52 -> as well. And continue this process all the
while avoiding adding rock cells to the queue.
2889.27 -> So I'll let the animation play. And meanwhile,
try and predict which cells will be visited
2894.88 -> next. All right, after we find our end cell,
we know how many steps it takes to get from
2919.25 -> the start to the end. Notice that we didn't
even visit all the cells in the grid. The
2924.39 -> bottom right cell is still unvisited, so it's
possible that we terminate early. If you're
2931.31 -> interested in actually finding the path itself
rather than just the number of steps it takes
2935.79 -> to escape the dungeon, then you'll need to
keep track of the previously visited node
2940.89 -> for each node. Go in and re watch the last
video. If you need a refresher on how to do
2946.39 -> that. I want to talk a little bit about the
way we are representing states in our breadth
2953.01 -> first search. So far, we have been storing
the next x y position in the queue as an XY
2961.339 -> pair. This works well but requires an array
or an object wrapper to store the coordinate
2967.38 -> values. In practice, this can require a lot
of packing and unpacking of values to and
2974.14 -> from our queue. Let's look at an alternative
approach which also scales well in higher
2979.95 -> dimensions, and in my opinion requires less
setup and effort. So the alternative approach
2986.369 -> I'm suggesting is to use one queue for each
dimension. So in a three dimensional grid,
2993.69 -> you would have one queue for each of the x, y
and z dimensions. Suppose we're enqueueing the
3000.96 -> coordinate (x1, y1, z1); then we would
simply place each coordinate in their respective
3007.619 -> queues. So the x coordinate goes in the x
queue, the y goes in its own y queue, and so on.
3015.25 -> As we need to keep enqueueing different positions,
we simply keep filling up these queues this
3020.94 -> way. This contrasts the approach of simply
having one queue with each of the components
3027.89 -> packed away inside an object. The one thing
we have to be mindful about, however, is that
3033.56 -> when we either enqueue or dequeue elements, we
need to enqueue and dequeue elements from each of the
3041.29 -> queues all at the same time. So when I dequeue
or pull elements from the queue, I need to
3047.37 -> remove an element from each of these queues.
I prefer this representation when working
3055.64 -> with multi dimensional coordinates, which
is why I want to share it, try it out and
3061.099 -> see if it works for you. So now that we have
all the knowledge we need, we can solve the
3066.02 -> dungeon problem, let's look at some pseudocode.
Assume that I have already read in the input
3071.109 -> matrix into memory and did some pre processing
to find the coordinate of the starting node.
3078.05 -> The first two variables are the constants
R and C the number of rows and columns in
3083.32 -> the input matrix following this is m, the
input character matrix of size R by C. Next
3090.53 -> are two variables, sr and
3094.18 -> sc,
3095.18 -> the row-column position of the starting node.
We'll need this to start our breadth first
3100.18 -> search. rq and cq are two queue data structures
that represent the row queue and the column queue
3107.71 -> we will be enqueueing and dequeuing elements from
during the breadth first search. This next
3114.51 -> set of variables is to keep track of the number
of steps taken to reach the exit. moveCount
3120.43 -> will actually track the number of steps taken;
nodesLeftInLayer tracks how many nodes
3127.34 -> we need to dequeue before taking a step; and
nodesInNextLayer tracks how many nodes we added
3134.55 -> in the breadth first search expansion, so that
we can update nodesLeftInLayer accordingly.
3140.55 -> In the next iteration, this will make more
sense soon. reached tracks whether or not
3146.849 -> we have reached the end cell marked with an
'E'. We're also going to make use of a visited
3152.9 -> matrix the same size as the input grid to
track whether or not a cell has been visited
3158.609 -> since we do not want to visit a cell multiple
times. And lastly, I define the north, south,
3164.89 -> east and west direction vectors. To solve
the dungeon problem. This is all the code
3170.49 -> we'll need to execute our breadth first search
and reach the exit. The first thing I do is
3175.88 -> add the start cells row and column values
to the row Q and column Q, then don't forget
3183.02 -> to mark the start cell as visited because
we don't want to go there again. We're not
3188.01 -> done with our breadth first search until both of
our queues are empty. I check that the size of
3193.54 -> the row queue is greater than zero, but you can
also check that the size of the column queue is greater
3200.93 -> than zero, since their sizes should always
be in sync. Then, since I know the queues aren't
3206.9 -> empty, I can dequeue the current position from
the queues as the row position r and the column
3213.18 -> position c, then I check if we've reached
the dungeon exit by checking if the current
3218.72 -> character in the grid is an 'E'. If it is,
then mark that we've reached the exit and
3223.78 -> break out early. Otherwise, we're not done
exploring and we want to add all the valid
3229.39 -> neighbors of the current node to the queue,
I wrote a function called explore neighbors
3234.76 -> that'll do just that. Let's have a look. Here
we are inside the Explore neighbors method.
3241.11 -> This is where we'll be using the direction
vector technique we learned about earlier.
3248.089 -> Since cells have four directions we care about,
north, south, east and west, I loop i from zero
3254.7 -> to four non-inclusive, compute the new coordinate
(rr, cc) by adding the direction vector
3262.37 -> to the current position, make sure the new
position is actually within the grid because
3267.109 -> we could end up with positions like (0, -1),
which is out of bounds. Even if
3274.33 -> the new position is within the grid that does
not guarantee that it is a valid position. The
3279.36 -> position might already have been visited previously,
or it could be a blocked off cell such as
3285.89 -> a cell that isn't traversable and full of
rock. If neither of those conditions holds,
3291.65 -> then we can enqueue the new position to visit
it later. When enqueueing a new position we are
3298.28 -> going to visit, make sure to mark it as visited
now, so that it doesn't get added to the queue
3305.07 -> multiple times in the future. Also increment
the number of nodes in the next layer, which
3311.31 -> we'll be needing shortly. This next block
of code is used to track the number of steps
3319.19 -> we took getting to the dungeon exit. Every
time we finish a layer of nodes, we increment
3325.4 -> the number of steps taken, we know how many
nodes are in each layer. Because we kept track
3330.43 -> of that in the Explore neighbors method. When
the number of nodes in the current layer reaches
3336.13 -> zero, we know to increment the move count.
At the end, if we are able to reach the exit,
3343.19 -> we return the move count; otherwise, we return
minus one to indicate that the dungeon exit
3349.8 -> was not reached.
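Putting the pieces together, a compact runnable Java sketch of the whole dungeon solver could look like this (the sample dungeon in m is my own; any R x C grid of 'S', 'E', '#' and '.' characters should work):

```java
import java.util.*;

public class DungeonSolver {
  // Example dungeon (an assumption for the demo): S = start, E = exit,
  // '#' = rock, '.' = empty floor.
  static char[][] m = {
      {'S', '.', '.', '#'},
      {'.', '#', '.', '.'},
      {'.', '.', '#', '.'},
      {'#', '.', '.', 'E'},
  };
  static int R = m.length, C = m[0].length;
  static int sr, sc;                             // row/column of the start node

  static boolean[][] visited = new boolean[R][C];
  static Deque<Integer> rq = new ArrayDeque<>(); // row queue
  static Deque<Integer> cq = new ArrayDeque<>(); // column queue

  // North, south, east and west direction vectors.
  static int[] dr = {-1, 1, 0, 0};
  static int[] dc = {0, 0, 1, -1};

  static int moveCount = 0;
  static int nodesLeftInLayer = 1;  // nodes to dequeue before taking a step
  static int nodesInNextLayer = 0;  // nodes added during the current expansion
  static boolean reached = false;

  static int solve() {
    rq.offer(sr); cq.offer(sc);              // enqueue the start cell...
    visited[sr][sc] = true;                  // ...and mark it as visited
    while (!rq.isEmpty()) {                  // cq always has the same size
      int r = rq.poll(), c = cq.poll();      // dequeue the current position
      if (m[r][c] == 'E') { reached = true; break; }
      exploreNeighbors(r, c);
      if (--nodesLeftInLayer == 0) {         // finished a layer of nodes...
        nodesLeftInLayer = nodesInNextLayer; // ...so move on to the next layer
        nodesInNextLayer = 0;
        moveCount++;                         // and count one more step taken
      }
    }
    return reached ? moveCount : -1;         // -1 means the exit is unreachable
  }

  static void exploreNeighbors(int r, int c) {
    for (int i = 0; i < 4; i++) {
      int rr = r + dr[i], cc = c + dc[i];
      if (rr < 0 || cc < 0 || rr >= R || cc >= C) continue; // out of bounds
      if (visited[rr][cc] || m[rr][cc] == '#') continue;    // seen, or rock
      rq.offer(rr); cq.offer(cc);
      visited[rr][cc] = true;                // mark now so it's enqueued only once
      nodesInNextLayer++;
    }
  }

  public static void main(String[] args) {
    // Pre-processing: find the coordinate of the starting node.
    for (int r = 0; r < R; r++)
      for (int c = 0; c < C; c++)
        if (m[r][c] == 'S') { sr = r; sc = c; }
    System.out.println("Steps to escape: " + solve()); // 6 for this grid
  }
}
```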
So in summary, the things we learned in this video are how to represent
3357.079 -> a grid as an adjacency list and an adjacency
matrix, how to use direction vectors to visit
3365.78 -> neighboring cells, we explored an alternative
way of representing multi dimensional coordinates
3372.96 -> with multiple queues. And lastly, we looked
at how to use a breadth first search on a
3378.46 -> grid to find the shortest path between two
cells. Today's topic is topological sort,
3386.31 -> also called
3387.31 -> top sort for short. We're going to discuss
what top sort is, where it's used, and how
3393.18 -> to find a topological ordering with some animation.
The motivation for top sort is that many real
3401.88 -> world situations can be modeled as some graph
of nodes, and directed edges where some events
3409.3 -> have to occur before others. Some simple examples
include school class prerequisites, program
dependencies, event scheduling, assembly
instruction ordering, and much, much more.
3423.82 -> Let's begin with an example. Suppose you're
a university student, and you really want
3429.619 -> to take class Ah, well, before you can enroll
in class H, you must first take classes D
3437.42 -> and E. But before taking Class D, you must
also take classes A and B which have no prerequisites.
3448.02 -> So in some sense, there appears to be an ordering
on the nodes of the graph. If we needed to
3455.03 -> take all the classes, the top sort algorithm
would be capable of telling us the order in
3461.41 -> which we should enroll in classes, such that
we never enroll in a course which we do not
3468.77 -> have prerequisites for. Another canonical example
of an application of top sort is for program
3477.22 -> build dependencies. A program cannot be built
unless all its dependencies are first built.
3484.87 -> For example, consider this graph where each
node represents a program. And the edges represent
3492.74 -> that one program depends on another to run.
Well, if we're trying to build program j on
3500.45 -> the right hand side, then we must first build
programs H and G. But to build those, we also
3507.26 -> need E and F. But to build those, we also need
others, and so on. The idea is to first build the
3514.06 -> programs without dependencies and then move
on from there. How do we find a valid
3519.97 -> ordering in which to build all the programs?
3525.73 -> Well, this is where top sort comes into play.
3525.73 -> One possible ordering might be to start by
3538.06 -> building A, then building C, B, D, F, E, G,
3538.06 -> H, and then J. Notice that there are unused
dependencies in this case, and that will happen
3545.38 -> from time to time which is fine. So in conclusion,
top sort is an algorithm which will give us
3553.93 -> a topological ordering on a directed graph.
A topological ordering is an ordering of nodes
3561.71 -> such that, for each edge from node A to node B,
node A appears before node B in the ordering.
3571.91 -> If it helps, this is just a fancy way of saying
that we can align all the nodes in a line
3578.63 -> and have all the edges pointing to the right.
An important note to make is that topological
3585.82 -> orderings are not unique. As you can imagine
there are multiple valid ways to enroll in
3592.17 -> courses, such that you can still graduate
or to compile a program and its dependencies
3598.88 -> in a different order than you previously
did. Sadly, not every type of graph has a
3605.391 -> topological ordering. For example, any graph
with a directed cycle cannot have a valid
3612.39 -> ordering. Well think of why this might be
true. There cannot be an order if there is
3619.48 -> a cyclic dependency, since there is nowhere
to start, every node in a cycle depends on
3625.45 -> another. So any graph with a directed cycle
is therefore forbidden. The only graphs that
3634.56 -> have valid topological orderings are called
directed acyclic graphs, that is, graphs with directed
3643.089 -> edges and no cycles. So a natural question
to ask is, how do I verify that my graph does
3651.66 -> not contain a directed cycle? One method is
to use Tarjan's strongly connected component
3659.44 -> algorithm which can detect these cycles. Another
neat thing definitely worth mentioning is
3666.92 -> that every tree has a topological ordering,
since by definition, trees do not have any
3674.28 -> cycles.
3676.61 -> An easy way to find a topological ordering
with trees is to iteratively pick off the
3682.91 -> leaf nodes. It is like you're cherry picking
from the bottom; it doesn't matter what order
3688.63 -> you do it in. Once the root of a subtree has
all grayed out children, then it becomes available.
3696.89 -> This procedure continues until there are no
more nodes left. So we know how it works with
3708.19 -> trees. But how about general directed acyclic
graphs? Well, the algorithm is also simple:
3716.65 -> just repeat the following steps. First, find an
unvisited node; it doesn't matter which. From
3723.839 -> this node, do a depth first search, exploring
only reachable unvisited nodes. On the recursive
3731.92 -> callback, add the current node to the topological
ordering, in reverse order. And that's it.
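Before we trace through an example, here is a minimal Java sketch of those steps (the names mirror the pseudocode discussed later in this section; this is an illustration, not the repository's code):

```java
import java.util.*;

public class TopSort {
  // Returns a topological ordering of a DAG given as an adjacency list.
  static int[] topSort(List<List<Integer>> graph) {
    int n = graph.size();
    boolean[] visited = new boolean[n];
    int[] ordering = new int[n];
    int i = n - 1; // insertion position, filled from the back
    for (int at = 0; at < n; at++) {
      if (!visited[at]) {
        List<Integer> visitedNodes = new ArrayList<>();
        dfs(at, graph, visited, visitedNodes);
        for (int nodeId : visitedNodes) ordering[i--] = nodeId;
      }
    }
    return ordering;
  }

  // Depth first search that records each node on the recursive callback.
  static void dfs(int at, List<List<Integer>> graph, boolean[] visited, List<Integer> out) {
    visited[at] = true;
    for (int to : graph.get(at))
      if (!visited[to]) dfs(to, graph, visited, out);
    out.add(at); // added only once all of at's descendants are finished
  }
}
```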
3739.54 -> Let's do an example. And things will become
much clearer. Here's a directed acyclic graph
3746.71 -> that we want to find one of many topological
orderings for. As the algorithm executes, I'll
3753.109 -> be keeping track of the call stack on the
left hand side. And in case you're curious,
3758.43 -> I will also be posting the current topological
ordering at the bottom of the screen. The
first step is going to be to pick an unvisited
node. I'm going to pick node H arbitrarily.
3773.119 -> Now we do a depth first search outwards
from H in all possible directions, exploring
where we can. Let's go to node J. Now that
I'm at node J, I'm going to keep exploring.
3789.99 -> And so let's go to
3791.74 -> M.
3793.05 -> Now that we're at M, there's nowhere left to
go, so we backtrack and add M as the last element
3800.77 -> to the topological
3803.06 -> ordering.
3804.119 -> We're still at J and we still need to explore L.
Now we're at L. Now backtrack, because there's
3812.859 -> nowhere left to go. Also backtrack to J and add
it to the ordering. Notice the stack
3820.849 -> frames getting popped off the call stack as
I recurse. Now we're at H and we still need
3825.99 -> to visit node I. So now we're at node I, and
from node I, we try and visit node L. But
3835.74 -> then we figure out that node L is already
visited, so we don't go there. Backtrack, backtrack
3843.339 -> again, add I to the ordering and mark it as
explored. And finally we're back at H.
3851.81 -> As you saw,
3852.81 -> selecting a random unvisited node made us
visit a subsection of the graph. We continue
3858.991 -> this process until all nodes are visited.
The next node I'm going to randomly pick is
3865.63 -> going to be node E. In the interest of time
and simplicity, I will let the animation run
3872.16 -> and you can follow along. Note that if you
try and predict the next few values and topological
3878.819 -> ordering, he may not get the same values as
me. Because topological orderings are not
3884.319 -> unique. However, this does not mean you are
incorrect. All right, I will let the animation
3890.18 -> play and try and follow along
3919.02 -> So that's it for that sub section of the graph.
The next node I'm going to pick is going to
3923.839 -> be node C. So we start at node C and
explore this subsection of the graph.
3936.1 -> Now that all nodes are visited, we have a
valid topological ordering at the bottom of
3941.06 -> the screen. So now that we understand how
the algorithm works, what does the code actually
3946.64 -> look like? Here's some pseudocode for top
sort. Let's walk through it real quick. The
3953.33 -> first thing I do is get the number of nodes
from the graph, which I assume is passed in
3958.54 -> as an adjacency list to the function. Then
I declare an array called V, short for visited,
which tracks whether a node has been visited
or not. The next array, called orderings, is
3972.85 -> the result that we'll be returning from this
function. This is equivalent to the ordering
3979.23 -> at the bottom of the screen in the last slides.
Associated with the orderings array is the
3986.339 -> index i, which tracks the insertion position
of the next element in the topological ordering.
3993.54 -> As you have been seeing in the slides, we
insert elements backwards, which is why I
3998.88 -> start at n minus one. Next, we're ready to
enter a for loop to iterate over all the nodes
4007.68 -> in our graph. The loop variable called at
tracks the ID of the node we're currently
4015.18 -> processing. I then check if we're on an unvisited
node, because those are the only ones
4021.369 -> we care about. Then I start a depth first
search. Notice that before I do, I initialize
4029.109 -> an array called visited nodes, which I pass
into the depth first search method to add
4037.319 -> nodes as we find them. Then, after that's
done, after the depth first search is finished,
4044.91 -> I look at the nodes we found in our visited
nodes array and then add them to the ordering.
4052.45 -> Now the last bit we need to look at is the
depth first search method itself. The depth
4059.74 -> first search method is very simple. All I
do is mark the node we're currently at as
visited. Then for each edge going outwards
4072.82 -> from the node we're at, I make sure the destination
node is unvisited, then call the method again,
but this time on the destination node. On
4080.869 -> the callback, when the method returns, this
is when we're stuck and need to backtrack.
4086.619 -> So this is where I add the current node to
the visited nodes array, which is essentially
4092.49 -> the output for this method. Back to the top
sort method: now that we understand how
4099.79 -> the top sort algorithm works, there's a neat
optimization we can do to improve the performance
4106.199 -> in terms of both time and space. Notice that
every time we enter the inner if statement
4113.959 -> block, we need to allocate memory for an array,
that array gets filled with node IDs and then
4120.469 -> we iterate over them to place them inside
the orderings array. But how about we just
4125.889 -> directly insert found nodes inside the orderings
array, instead of allocating memory and doing
4132.25 -> this additional work? Well, that's exactly
what we're going to do. Here I got rid of
4139.049 -> the unnecessary array and modified the depth
first search method to return the next valid
4144.4 -> insertion position in the orderings array.
Now we need to pass in the index i and the
4150.139 -> orderings array so that it can be filled directly
inside the depth first search method. Inside
4157.059 -> the new depth first search method, one thing
that changed is that now we have a return
4162.579 -> value, and we're passing in some additional
variables. Notice that instead of adding the
4168.139 -> current node to the visited nodes array
as we were doing before, now we simply insert
4173.739 -> that node directly inside the orderings array.
The last thing to do is to return i minus
4180.589 -> one, because the index of the current insertion
position is no longer index i, but index i minus one.
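In Java, the optimized version might look like this (a sketch; only the loop body and the depth first search change from the earlier version):

```java
// The DFS now writes each node straight into the orderings array on the
// callback and returns the next insertion position, so no helper list or
// extra copying is needed.
static int dfs(int i, int at, boolean[] visited, int[] ordering, List<List<Integer>> graph) {
  visited[at] = true;
  for (int to : graph.get(at))
    if (!visited[to]) i = dfs(i, to, visited, ordering, graph);
  ordering[i] = at; // insert on the callback, filling from the back
  return i - 1;     // next valid insertion position
}
```

The outer loop in topSort then becomes `if (!visited[at]) i = dfs(i, at, visited, ordering, graph);` for each node.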
4190.589 -> So related to the topic of topological
orderings is the topic of shortest and longest
4199.849 -> paths on directed acyclic graphs. Recall
that a directed acyclic graph is a graph with
4209.09 -> directed edges and no cycles. By definition,
this means that all trees are automatically
4216.969 -> directed acyclic graphs, since they do not
contain any cycles. Here is a graph. My question
4224.429 -> to you is, is this graph a directed acyclic
graph? And the answer is yes. But what about
4233.539 -> this structure? I'll give you a moment to
think about it.
4238.679 -> The answer is no, because this graph has undirected
edges as opposed to directed edges. The graph
4246.31 -> may be a tree, but directed edges are a requirement
for a directed acyclic graph. What's really
4256.949 -> great about working with directed acyclic
graphs is that the single source shortest
4262.409 -> path problem can be solved very efficiently,
in fact, in linear time. The next fastest
4269.179 -> single source shortest path algorithm is Dijkstra's
algorithm, which may not work if there are
4276.699 -> negative edge weights. This algorithm I'm
about to show you is faster and doesn't care
4283.01 -> about positive or negative edge weights. The
essence of the algorithm is that it finds
4289.019 -> a topological ordering on the graph using
the top sort algorithm we saw in the last
4294.65 -> video, and processes each node sequentially
to get the shortest path by relaxing each
4301.989 -> edge as it is seen. Relaxing an edge simply means
updating the destination node to a better value if a shorter path
4310.55 -> can be obtained using the current edge. Suppose
this is the graph we're working with, you
4318.449 -> can verify that it is in fact a directed acyclic
graph. What we wish to do is find the shortest
4326.309 -> path from node A to all other nodes in the
graph. In order to do this, the first thing
4333.619 -> we'll want to do is generate a topological
ordering of the nodes of this graph using
4340.039 -> the top sort algorithm. Below I have selected
an arbitrary topological ordering, which
is the order we will process the nodes of
this graph in. I'm also displaying the current
4353.889 -> best distance to each node at the bottom of the screen,
which are all currently set to infinity. The
4360.84 -> first step of the algorithm is to set the
distance to the starting node to be zero.
4366.479 -> In this case, since A is the starting node,
its initial distance is zero, because we're
4372.07 -> already there. From A we want to visit all
reachable nodes, starting with node B, and
4380.599 -> update the value to B if it is better than
what was already there. This is the edge relaxation
4388.33 -> step of the algorithm. We notice that
4391.3 -> a value of three
4392.3 -> is better than infinity, so we update the
best value of B to be three, then the best
4399.84 -> value to C to be six. And now we've explored
all of A's edges and want to move on to the next
node in the topological ordering, which is B,
and explore all of its edges. So the first
4414.84 -> edge brings us to node E and we update its
best value to 14, because the best value at
node B was three, plus the edge weight to get to
4421.539 -> E, which was 11, for a total of 14. Notice that edges
4431.019 -> get grayed out as they're being processed.
Next, we update the best value to D to be seven.
4442.36 -> Now, we've reached the first instance where
it is best not to update the value of
4448.3 -> the destination node, since a better path
already exists to where we want to update.
4456.46 -> Now we move on to the next node in the topological
ordering and keep repeating the same steps,
4462.199 -> where we explore each node, trying to relax
each edge, and then move on to the next node
4468.73 -> in the topological ordering. If we repeatedly
do this, the array at the bottom of the screen
4473.88 -> will contain the shortest path from node A
to each node. I will let the animation play
4481.829 -> and you can try and determine the shortest
path to all the remaining nodes which have
4486.86 -> not yet been processed. Okay, we're done processing
all the nodes and know the shortest distance
4510.039 -> to every node. Let's verify that our algorithm
computed the correct values by finding the
4517.11 -> shortest path to node H. Indeed, if we look
at the path and sum the values along the
4525.739 -> edges, you will find that they do indeed sum
up to 11, which is the shortest path in our
4532.84 -> array for node H. There's a similar problem
to the shortest path problem, which is finding
4541.61 -> the longest path in the graph. This problem
is actually NP hard on general graphs, but
4548.909 -> can actually be solved in linear time on a
directed acyclic graph. The trick is going
4556.34 -> to be to multiply each edge by minus one,
find the shortest path, and then multiply
4565.03 -> all the edge values by minus one again. To
take the previous graph: if we had to find the
4571.449 -> longest path, simply negate all the edges,
then find the shortest path, and multiply the
4578.269 -> answer by minus one. And there we go. That's
all you need to do.
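Putting the last two ideas together, here is a minimal Java sketch of shortest and longest paths on a DAG (the Edge class and method names are illustrative; topSort is the routine from the previous section, adapted to weighted edges):

```java
import java.util.*;

public class DagShortestPath {
  // A weighted directed edge: 'to' is the destination node, 'cost' the weight.
  static class Edge { int to; double cost; Edge(int to, double cost) { this.to = to; this.cost = cost; } }

  // Shortest distance from 'start' to every node of a DAG: find a topological
  // ordering, then relax each edge in that order. Runs in O(V + E).
  static double[] dagShortestPath(List<List<Edge>> g, int start) {
    int n = g.size();
    int[] order = topSort(g);
    double[] dist = new double[n];
    Arrays.fill(dist, Double.POSITIVE_INFINITY);
    dist[start] = 0;
    for (int at : order)
      if (dist[at] != Double.POSITIVE_INFINITY)        // skip unreachable nodes
        for (Edge e : g.get(at))
          dist[e.to] = Math.min(dist[e.to], dist[at] + e.cost); // relaxation
    return dist;
  }

  // Longest path via the negation trick: negate every edge, run the shortest
  // path, then negate the answers back.
  static double[] dagLongestPath(List<List<Edge>> g, int start) {
    List<List<Edge>> neg = new ArrayList<>();
    for (List<Edge> edges : g) {
      List<Edge> row = new ArrayList<>();
      for (Edge e : edges) row.add(new Edge(e.to, -e.cost));
      neg.add(row);
    }
    double[] dist = dagShortestPath(neg, start);
    for (int i = 0; i < dist.length; i++)
      if (dist[i] != Double.POSITIVE_INFINITY) dist[i] = -dist[i]; // unreachable stays infinite
    return dist;
  }

  // DFS-based topological sort from the previous section.
  static int[] topSort(List<List<Edge>> g) {
    int n = g.size();
    boolean[] visited = new boolean[n];
    int[] ordering = new int[n];
    int i = n - 1;
    for (int at = 0; at < n; at++)
      if (!visited[at]) i = dfs(i, at, g, visited, ordering);
    return ordering;
  }

  static int dfs(int i, int at, List<List<Edge>> g, boolean[] visited, int[] ordering) {
    visited[at] = true;
    for (Edge e : g.get(at))
      if (!visited[e.to]) i = dfs(i, e.to, g, visited, ordering);
    ordering[i] = at;
    return i - 1;
  }
}
```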
Okay, now let's have a
4586.369 -> look at some source code. You can find the
code I'm about to show you at
github.com/williamfiset/algorithms. Here
I am on GitHub, and we're looking at some
4600.8 -> code for the shortest path on a directed acyclic
graph. Here's our method directed acyclic
4609.709 -> graphs shortest path, and it returns the distance
to each node, stored in an integer array,
4619.789 -> for some starting node. As input, we give
it the graph we're working with as an adjacency
4628.309 -> list, of course the starting node, and lastly
the number of nodes in our graph. So what
4637.7 -> we do is find the topological ordering for
our nodes, I covered this in the last video,
4646.13 -> then initialize our distance array, and then
set the starting nodes distance to be zero.
4652.199 -> And all we do is we loop through each node,
starting at the first node, looking at what
4660.989 -> our node index is from top sort. So this is
the first node we need to visit, and then we check
4669.07 -> that that node is not equal to null, and then
grab all the edges for that node; so we reach
4678.239 -> into our graph, and then pull out all the edges
for the node index.
4684.88 -> Make sure there actually are some edges. And
then for each edge, in the edges that we got,
4692.21 -> which were the adjacent edges, then all we
do is the relaxation step, which is just this.
4698.289 -> So we compute the new distance: this is
the distance to the node we're currently
4705.11 -> at, plus the edge weight. So this is like the
competing distance, the distance we are trying
4712.929 -> to improve upon. Then we check: okay, has
there ever been a distance set for where we
4721.28 -> want to go? No value there is basically the equivalent
of infinity, and if so, then we just want to
4727.59 -> assign the new distance. Otherwise, we're going
to take the minimum of the distance that's
4733.57 -> already there and our new competing distance,
which is
4737.679 -> this over here, and then we just do this over
and over again, processing nodes in topological
4746.249 -> order, because we're pulling them out of the
top sorted array. And at the end, we just
4752.389 -> return that distance array. And we can get
the distance from our starting node to any
4759.34 -> other node in the graph, just through a lookup
in this array. And guys, this is a super simple
4767.59 -> algorithm. And that's all there is to shortest
paths on directed acyclic graphs. Today, we're
4774.03 -> going to tackle Dijkstra's shortest path algorithm.
This is one of the most important algorithms
4779.989 -> in graph theory for finding the shortest path
on a graph. So without further ado, let's
4785.97 -> dive right in. The first thing to mention
about Dijkstra's algorithm is that it is a single
4792.689 -> source shortest path algorithm for graphs.
This means that at the beginning of the algorithm
4799.409 -> you need to specify a starting node to indicate
a relative starting point for the algorithm.
4805.269 -> Once you specify the starting node and execute
the algorithm, Dijkstra's can tell you the shortest
4811.84 -> path between that node and all other nodes
in your graph, which is pretty sweet. So depending
4818.119 -> on how you implement your Dijkstra's and what
data structures you use, the time complexity
4823.199 -> is typically big O of E log V, which is fairly
competitive against other shortest path algorithms
4831.619 -> we see around. However, before we go crazy,
trying to find shortest paths on various graphs,
you need to know which graphs we are allowed
to run Dijkstra's algorithm on. The one main
4843.809 -> constraint for Dijkstra's is that all edges of
the graph need to have a non negative edge
4849.76 -> weight. This constraint is imposed to ensure
that once a node has been visited, its optimal
4855.8 -> distance from the start node cannot be improved
any further by finding a shorter path taking an
4862.15 -> edge with a negative weight. This property
is especially important because it enables
4867.67 -> Dijkstra's algorithm to act in a greedy manner
by always selecting the next most promising
4873.119 -> node. For this slide deck, my goal is to help
you understand how to implement Dijkstra's algorithm
4879.719 -> and also how to implement it very efficiently.
We're going to start by looking at the lazy
4886.099 -> implementation because it's by far the most
common and then we'll look at the eager implementation
4891.599 -> of Dijkstra's algorithm, which uses an indexed
priority queue alongside the decrease key
4897.159 -> operation. And lastly, I want to briefly mention
how we can use other types of heaps in particular
4903.329 -> the D-ary heap to further boost the performance
of the algorithm. At a high level, these are
4908.929 -> the steps required in executing Dijkstra's algorithm.
Note that there are two bits of information
4915.099 -> we'll need. The first is an array called dist
that keeps track of the shortest distance
4921.919 -> to every node from the start node. Initially,
this array can be populated with the value
4927.979 -> of positive infinity, except for the index
of the starting node, which should be initialized
4934.09 -> to zero. Additionally, we'll also need to
maintain a priority queue of key value pairs,
4940.04 -> the key value pairs will be node index distance
pairs, which tells us which node to visit
4946.699 -> next, based on a minimum sorted value. At
the start of the algorithm, we will begin
4952.209 -> by inserting the key value pair s comma zero
into the priority queue, then we'll loop while
4959.15 -> the priority queue is not empty, pulling out
the next most promising node index distance
4964.449 -> pair as we go. After that, for each node we
visit, we will want to iterate over all the
4970.039 -> outwards edges and relax each edge appending
a new node index distance key value pair to
4976.699 -> the priority queue upon every successful relaxation.
We do this until our priority queue is empty,
4983.289 -> at which point the shortest distance to each
node will be stored in the dist array we are
4988.94 -> maintaining. So that explanation may have
sounded a little bit abstract. Now let's look
4994.919 -> at an example with some animation to put all
the pieces together. In all these examples,
5002.3 -> assume node zero is always the starting node.
Although any node is perfectly fine. Boxed
5010.559 -> in red is the distance array; I will be using it
to track the optimal distance from the start
5016.84 -> node to every node in the graph. In the beginning,
the distance to every node is initialized
5022.32 -> to have the value of positive infinity, since
we assume that every node is unreachable. If
at the end of the algorithm, there's still
a value of infinity at a certain index, then
5033.32 -> we know that that node is unreachable. On
the right I will be maintaining key value
5038.519 -> pairs corresponding to a nodes index and the
best distance to get to that node. This priority
5044.829 -> queue will tell us which node we should visit
next, based on which key value pair has the
5051.159 -> lowest value. Internally priority queues are
usually implemented as heaps, but I'm not
5056.77 -> going to show that visualization here. To
start with assign a distance of zero to the
5062.409 -> start nodes index, which is index zero in
the distance array. Also insert the key value
5068.239 -> pair zero comma zero into the priority queue
to indicate that we intend on visiting node
5073.36 -> zero with a best distance of zero, then the
algorithm actually starts and we look inside
5079.429 -> the priority queue for the first time and
we discover that we should visit node zero.
5083.749 -> From node zero we can visit node one by using
the edge with a cost of four. This gives us
5088.699 -> a best distance of four so we can update the
best distance from infinity to four and the
5093.679 -> dist array. Also add this information to the
priority queue. Next, we can visit node two
5099.179 -> from node zero. Just like the last node, we
can update the optimal distance to reach node
5104.099 -> two from infinity to one. Additionally, add
that node two is reachable with a distance
5109.83 -> of one to the priority queue. So that concludes
visiting all the edges for node zero. To decide
which node we should visit next, Dijkstra's
5121.989 -> always selects the next most promising node
5121.989 -> in the priority queue. To do this, simply
pull the next best key value pair from the
5126.829 -> priority queue. node two is the next most
promising node because it has a distance of
5131.559 -> one from the start node, while node one has
a greater value of four. from node two, if
5138.239 -> we take the upwards edge, we can improve the
best distance to node one by taking the current
5142.579 -> best distance from node two, which is one
plus the edge cost of two to get to node one
5148.51 -> for a total cost of three, this is better
than the previous value of four. Every
5153.539 -> time we find a better distance like this,
we insert that information into the priority
5157.76 -> queue, then we improve the best distance to
node three to be six.
5163.849 -> The next most promising node is node one,
we can improve the best distance to node three
5170.44 -> by taking the edge from node one to node three
with a cost of one. The next most promising
5180.559 -> node is node one with value four, but we have
already found a better route to get to node
5185.919 -> one, since the dist array at index one has
a value of three. Therefore we can ignore
5192.409 -> this entry in the priority queue. Having these
duplicate key entries in the priority queue
5197.9 -> is what makes this implementation
of Dijkstra's the lazy implementation, because
5203.57 -> we lazily delete outdated key value pairs.
Next up is node three: update the best distance
5211.15 -> to node four to be seven. We already found
a better route to node three, so skip this
5221.4 -> entry in the priority queue. Finally, visit
node four. And that's all for the lazy implementation
5228.579 -> of dynatrace. There are only a few moving
parts, but enlarge the only things to really
5234.099 -> keep track of is the distance array, which
contains the best distance so far from the
5240.15 -> start node to every other node and the priority
queue which tells us which node we should
5245.239 -> visit next, based on the best value found
so far. Let's look at some pseudocode for
5252.099 -> how this works. I'll be covering the real
source code in the next video for those interested.
5257.809 -> This pseudocode runs Dijkstra's algorithm from
a start node and returns the distance array
5265.289 -> which tells us the shortest distance to every
node in the graph. However, it will not tell
5271.269 -> you which sequence of edges to follow. To
achieve that optimal distance, this is something
5276.969 -> that we will need to maintain some additional
information for which I will cover as well.
5282.829 -> So in terms of the variables we'll need in
the function definition, I specify three things
5288.03 -> first is G, the adjacency list of the weighted
graph and the number of nodes in the graph.
5294.76 -> And s the index of the start node inside the
function I begin by initializing two arrays
5300.219 -> to keep track of the information we'll need
first is a Boolean array I called V is short
5305.58 -> for visited which tracks whether node AI has
been visited or not. Then I initialize dist
the distance array, which will be the output
5317.219 -> of the function. Make sure you fill the distance
array with positive infinity, except for the
start node, which should be set to zero. After
5323.03 -> this, initialize a priority queue that will
store the node index best distance pairs sorted
5329.15 -> by minimum distance. You should be able
to use the built in priority queue in whatever
5333.88 -> programming language you're using. Remember
to insert the start nodes index paired with
5339.619 -> a distance of zero into the priority queue
to kickstart the algorithm. If you're wondering
5344.289 -> why there are two sets of brackets, that's
because the pair (s, 0) is meant to
5350.01 -> represent a tuple, or an object with two values,
a key and a value. So while the priority queue
5356.659 -> is not empty, remove the next most promising
index minimum distance pair and mark that
node as visited, then loop over all the neighbors
of the current node and skip visited neighbors
5368.63 -> so that we don't visit them again. Then simply
perform the edge relaxation operation. First,
compute the distance to the new node, which
5379.749 -> is calculated by adding the best distance
from the start node to the current node, which
is found in the distance array, plus the edge
5383.869 -> cost of getting to the next node. Once you
know that, compare it against the best distance
5388.59 -> for the next node and update the value if it's
better. Then finally insert a new key value
5394.019 -> pair inside the priority queue, so we visit
that node in the future. So in practice, most
5398.86 -> standard priority queues do not support a
decrease key operation, including the built-in priority queue.
5405.44 -> You can think of a decrease key operation
as an operation which updates the value of
a key in the priority queue. A way to get around
this is to add a new node index best distance
5416.559 -> pair every time we need to update the distance
to a node. As a result, it's possible to
5421.469 -> have duplicate node indices in the priority
queue like we saw in the animation. This is
5426.94 -> not ideal. But inserting a new key value pair
in logarithmic time is much faster than searching
5433.159 -> for the key, we want to update in the priority
queue, which actually takes linear time. Yes,
5439.07 -> searching for a key in a priority queue takes
linear time because the heap is sorted by
5444.11 -> the keys values, not the keys themselves.
So effectively, it's like searching in an
5449.619 -> unordered list for a particular element.
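Assembled into code, a minimal Java sketch of this lazy version might look as follows (the names are illustrative; it also includes the stale-pair skip discussed next):

```java
import java.util.*;

public class LazyDijkstra {
  static class Edge { int to; double cost; Edge(int to, double cost) { this.to = to; this.cost = cost; } }

  // Lazy Dijkstra's: duplicate (node, distance) pairs are allowed in the
  // priority queue and stale ones are skipped as they are polled.
  static double[] dijkstra(List<List<Edge>> g, int n, int s) {
    boolean[] vis = new boolean[n];
    double[] dist = new double[n];
    Arrays.fill(dist, Double.POSITIVE_INFINITY);
    dist[s] = 0;
    // Pairs are (node index, best distance so far), ordered by distance.
    PriorityQueue<double[]> pq = new PriorityQueue<>(Comparator.comparingDouble(p -> p[1]));
    pq.offer(new double[] {s, 0});
    while (!pq.isEmpty()) {
      double[] pair = pq.poll();
      int at = (int) pair[0];
      vis[at] = true;
      if (dist[at] < pair[1]) continue; // stale pair: a better path was already found
      for (Edge e : g.get(at)) {
        if (vis[e.to]) continue;        // never revisit a finished node
        double newDist = dist[at] + e.cost;
        if (newDist < dist[e.to]) {     // edge relaxation
          dist[e.to] = newDist;
          pq.offer(new double[] {e.to, newDist});
        }
      }
    }
    return dist;
  }
}
```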
5453.639 -> A neat optimization we can do, which ignores
stale, outdated index minimum distance pairs in
5462.959 -> our priority queue, is to skip them immediately
as we pull them from the priority queue. We
5469.26 -> can do this by checking if the value in the
distance array is better than the value in
5474.9 -> the priority queue. And if it is, then we
know we have already found a better path routing
5479.849 -> through other nodes before we got to processing
this node. I'll let that sink in for a little
5484.929 -> bit. But this is definitely a neat optimization
you'll want to keep around. Now I want to
5491.34 -> talk about finding the shortest path itself
and not just the shortest distance to get
5497.09 -> there. And to do that, we'll need to keep
track of some additional information. In particular,
5502.01 -> we'll want to keep track of the index of the
previous node we took to get to the current
5507.76 -> node. The way to do this is to maintain a
previous array I call prev. In this slide,
5514.229 -> this array tracks the index of the node you
took to get to node I initially the previous
5520.269 -> array should be filled with a sentinel value
such as now or minus one. But as you perform
5527.559 -> edge relaxation operations, you want to update
the previous array to say that the node you're
5532.86 -> going to came from the node you're currently
at, then at the end instead of returning the
5539.139 -> distance right also return the previous array
which we will use soon. In another method
5546.25 -> of perhaps called find shortest path provide
all the same arguments with the addition of
5552.219 -> the end node index and execute Dykstra has
to obtain the distance array in the previous
5557.449 -> array with these two bits of information,
we can reconstruct the shortest path first
5561.969 -> check that the end node is reachable by checking
that its value in the distance array is not
5566.86 -> infinity, then start at the end node and loop
backwards through the previous array until
5572.539 -> you make it back the start node. You know
you made it back to the start node when the
5577.07 -> value of null is reached. Since the start
node does not have a parent node index from
5581.699 -> which came from the resulting path of node
indices to follow for the shortest path from
5587.09 -> the start node to the end node will be in
a reverse order because we started at the
5590.959 -> end node and worked backwards. Therefore,
we want to make sure we reverse this array
5596.349 -> before returning the result.
5600.07 -> Now I want to talk about a few optimizations
we can use to make dexterous algorithm more
5605.659 -> efficient. Sometimes we know the index of
our destination node and don't necessarily
5610.849 -> need to know the optimal distance to every
node in the graph, just that one particular
5615.469 -> node. So the question is do we still have
to visit every node in the graph just to figure
5620.829 -> out the distance of that one particular node
we want to get to? The answer is yes, we do.
5627.03 -> But only in the worst case, which depending
on your graph can be somewhat rare the key
5632.84 -> realization will need to make is that it is
possible to stop early once we have finished
5637.63 -> visiting the destination node. The main idea
for stopping early is that tech shows algorithm
5642.829 -> processes each next most promising node in
order. So if the destination node has already
5648.699 -> been visited, its shortest distance will not
change as more future nodes are visited. In
5654.659 -> terms of code, all we have to do to stop early
is check if the current node index is the
5660.73 -> end node and return early. This can prove
to be a very substantial speed up depending
5666.019 -> on how early you encounter the end node while
processing the graph. Our current implementation
5672.76 -> of dices is what we call the lazy implementation
because it inserts duplicate key value pairs
5679.349 -> and leisurely deletes them. This is done because
it's more efficient to insert a new key value
5686.239 -> pair in logarithmic time into the priority
queue than it is to update an existing key's
5690.959 -> value in linear time. The lazy approach works
but it is inefficient for dense graphs because
5696.709 -> we end up with all these stale, outdated key
value pairs in our priority queue. The eager
5701.969 -> version of Dijkstra's aims to solve this by
avoiding duplicate key value pairs and supporting
5707.63 -> efficient value updates in logarithmic time
using an indexed priority queue. An indexed
5715.849 -> priority queue is a priority queue variant
which allows access to key value pairs within
5721.539 -> the priority queue in constant time, and updates
in logarithmic time if you're using a binary heap.
5728.3 -> This type of priority queue is extremely useful
in many applications, and I highly recommend
5732.649 -> you watch my video on the index priority queue
to become enlightened. I'll make sure I leave
5737.949 -> a link in the description. But in the meantime,
we'll just assume that we have access to an
5743.55 -> indexed priority queue. Now we're going to
take a look at the eager version of Dijkstra's
5748.28 -> algorithm, where we don't have duplicate keys
in the priority queue. So to start with, assign
5753.469 -> a distance of zero to the start node at index
zero in the distance array. Also insert the
5759.34 -> key value pair zero comma zero into the priority
queue to indicate that we intend on visiting
5763.88 -> node zero with a best distance of zero, then
the algorithm starts and we look inside the
5768.679 -> priority queue for the first time and we discover
we should visit node zero. From node zero,
5774.03 -> we can visit node one by taking the edge with
cost five, this gives us a distance of five
5779.32 -> so we update the best distance from infinity
to five in the distance array. Also add this
5784.619 -> information to the priority queue. Next, we
can visit node two from node zero. Just like the
5790.329 -> last node, we can update the optimal distance
to reach node two from infinity to one. Additionally
5795.969 -> add node two to the priority queue with a
distance of one. That concludes visiting all
5802.289 -> the edges for node zero. To decide which node
to visit next, Dijkstra's selects the next most
5808.769 -> promising node in the priority queue. So pull
the next best key value pair from the priority
5814.219 -> queue. Node two is the next most promising
node because it has a distance of one from
5820.01 -> the start node, which is better than five.
From node two, we can take the sideways edge
5826.559 -> to improve the best distance to node four
to be 13 by taking the current best distance
5831.599 -> from node two, which is one plus the edge
cost of 12 to get to node four, for a total
5836.429 -> cost of 13. We can update the best distance
to node one by taking the upwards edge from
5843.949 -> node two. Notice that I did not insert a new
key value pair with a value of one comma four
5850.989 -> inside the priority queue, but rather simply
updated the existing value in the priority queue
5856.61 -> from five to four. This is the main difference
between the lazy and the eager version.
5865.409 -> The next most promising node is node one.
When taking the downwards edge from node one
5871.78 -> to node two, we discover that node two has
already been visited, so we cannot improve
5876.929 -> its already best distance. We also cannot
improve the best distance to node four by taking
5883.59 -> the diagonal downwards edge, since the total
cost of 24 outweighs the best distance of
5889.159 -> 13, which is already known for that node.
However, we can improve the best distance to
5895.09 -> node three by taking the edge from node one
to node three with a cost of three. I'll let
5900.499 -> the animation play. And as it does try and
predict what the next move for the algorithm
5906.01 -> will be.
5927.949 -> So that's the eager version of Dijkstra's algorithm,
which I would say is the more proper way of
5933.4 -> implementing Dijkstra's algorithm. Now let's
look at some pseudocode and see what needs
5938.409 -> to change. First, notice that we're using
an indexed priority queue instead of a regular
5944.039 -> priority queue. Also, notice that we no longer
need to wrap our key value pairs as tuples
5949.96 -> in an object, because indexed priority queues have
first class support for key value pairs, as
5956.809 -> opposed to a priority queue, which you would
find in your programming language's standard
5962.309 -> library. The other thing that needs to change
is how we insert key value pairs into the
queue. If the key, or node index, does not yet
5972.809 -> exist in the indexed priority queue, insert it;
otherwise invoke the decrease key operation
to update the best distance to that node in
5977.83 -> the priority queue. The operation is called decrease
key instead of update, because it only updates
5984.249 -> the value if it is strictly less than the
current value in the priority queue.
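Here is a minimal sketch of that insert-or-decrease-key logic; the method names contains, insert, and decreaseKey are placeholders for whatever your indexed priority queue actually exposes:

```java
// Relaxation step in the eager version: at most one entry per node in the
// indexed priority queue, updated in place rather than duplicated.
double newDist = dist[at] + edge.cost;
if (newDist < dist[edge.to]) {
  dist[edge.to] = newDist;
  if (!ipq.contains(edge.to)) ipq.insert(edge.to, newDist);
  else ipq.decreaseKey(edge.to, newDist);
}
```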
5991.26 -> All right, we've looked at several Dijkstra's optimizations
already, but there's one key last optimization
5997.929 -> I want to talk about, and that is improving
the heap we're using. Currently, we're probably
6004.079 -> using an indexed binary heap for our priority
queue. But we can do better. The thing to
notice is that when executing Dijkstra's, there
are a lot more update operations, especially
6015.349 -> on dense graphs, than there are removal operations.
A D-ary heap is a heap variant in which each
6022.739 -> node has at most D children instead of two,
this speeds up the decrease key operation
6028.699 -> at the expense of more costly removals. So
if we look at an example, real quick, this
6035.989 -> is a D-ary heap with D equals four. Suppose
we want to perform an update operation, say
6043.649 -> we want to perform decrease key for the node
at index six with a value of one. Then we
6050.489 -> can do the update. And then we reposition
the key value pair inside the heap. So we
6057.28 -> bubble it up, and we bubble it up again. And
now it's in the correct position. So that
6062.13 -> only took a total of two operations. While
in contrast, suppose we want to remove the
6068.499 -> root node, then we swap it with the bottom
right node. And then we need to reposition
6075.209 -> the purple node so that it's in position.
So we need to look at all the children, find
the one with the least value, and swap it. And
the purple node is still not in its correct
6086.389 -> position. So again, we need to look at all
the children, find the one with the smallest value,
6090.44 -> and swap. So that took a total of eight
operations, which is clearly more expensive.
6097.21 -> But remember, there are a lot more decrease
key operations in Dijkstra's than there are
6102.191 -> removals, so this might be beneficial overall.
So the question then becomes: what is the optimal
6109.909 -> D-ary heap degree to actually use to maximize
the performance of Dijkstra's algorithm? And
6116.65 -> the answer, in general, is that the value of
D should be equal to the number of edges divided
6121.81 -> by the number of nodes. This is the best degree
to use to balance removals against decrease
key operations. In turn, this improves Dijkstra's
6128.429 -> time complexity to be big O of E times log
6136.13 -> base E/V of V, which is better, especially for
dense graphs, which have a lot of decrease key operations.
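In code, picking that degree is a one-liner; a minimal sketch, assuming e and n hold the edge and node counts, with a clamp at 2 (my own guard so very sparse graphs don't produce a degenerate heap):

```java
// Degree for the D-ary heap: roughly edges per node; clamp to at least 2 so
// the heap still branches when E/V would round down to 0 or 1.
int degree = Math.max(2, e / n);
```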
The last thing I want to mention is the current state
6149.579 -> of the art when it comes to choosing the right
heap for Dijkstra's algorithm. And right now,
6156.87 -> the best heap we know of is the Fibonacci
heap, which gives Dykstra has algorithm Believe
6163.329 -> it or not a time complexity of big O of E
plus v log V, which is extremely fast. However,
6171.269 -> in practice, the Fibonacci heap is very difficult
to implement, and also has a large constant
6178.079 -> amortized overhead. So that makes it slightly
impractical in practice, because your graph
6184.119 -> has to be very large for you to see the benefit.
I have yet to implement one of these. So I
6190.26 -> cannot say whether they're that good, but
this is just what I've read from other sources.
6196.829 -> Today we're going to have a look at some source
code for Dijkstra's shortest path algorithm.
6201.739 -> All right, here we are in the source code
for Dijkstra's shortest path algorithm, implemented
6208.019 -> in the Java programming language, let's have
a quick run through. So in this class, I define
6216.149 -> an edge class which represents a directed
edge, you'll notice that this directed edge
6221.809 -> has a certain cost of taking this edge. And
it also has a destination node, which I call
6229.369 -> 'to'. The node which this edge comes from will
be implicitly represented in our adjacency
6236.849 -> list. So we don't need to take care of that.
So when you go and create an instance of this
6242 -> class, you need to specify the number of nodes
that are going to be in this graph. And that's
6249.32 -> the variable n. Once you know the number of
nodes in your graph, you can go ahead and
6254.03 -> create an empty graph, this simply initializes
our adjacency lists. So as you see here, I
6262.149 -> create an empty ArrayList with n nodes. And
then for each position in the list, I create
6271.78 -> another list. So this is just an empty adjacency
list. This will help us add edges to our graph,
6278.84 -> which you can do by calling this add edge
method. So when you want to add an edge to the
6284.459 -> graph, you specify the node the edge starts
at, the node the edge ends at, and the cost
6290.63 -> of taking that edge. Remember that the cost
of taking an edge cannot be a negative value.
6296.98 -> All right, and then there's just this
convenience method to retrieve the constructed
6303.07 -> graph, if ever you want to have a look at
that. Then here comes the interesting method,
6307.94 -> which actually executes Dijkstra's shortest
path algorithm. So in this implementation,
6314.28 -> I provide a start node and the end node. This
means we're going to try and go from a starting
6322.09 -> node index to an end node index. Note that
we can also modify Dijkstra's to give us the
6328.07 -> shortest distance to every node and not just
a specific end node. So there is a way we
6334.439 -> can just remove this parameter, we don't really
need it. But providing the end node allows
6339.099 -> us to do a small optimization, which is to
stop early if we know we've reached that end
6345.699 -> node. So let's keep it in for now. So in the
slides, in the last video, I mentioned that
6351.75 -> we can use an indexed priority queue to speed
up Dijkstra's algorithm. And this is exactly
6357.439 -> what I'm doing below, I have an implementation
of a min index Dr. heap, which allows us to
6364.32 -> avoid duplicate nodes in our priority queue,
I won't be going over the details of the min
6371.32 -> indexed D-ary heap per se, because I already have
another video on that in my data structure
6377.13 -> series, I'll make sure to have a link to that
in case you want to check it out. But to construct
6382.59 -> a min indexed D-ary heap, I compute the degree
of how many children each node should have
6389.989 -> in the heap by dividing the edge count by
the number of nodes. And finally, I insert
6396.189 -> that the optimal distance to the start node
at the beginning of the algorithm has a distance
6401.63 -> of zero, which makes sense. Then I just initialize
a few arrays. So this is the distance array,
6409.69 -> which is going to keep track of the minimum distance
to each node. So initially, I fill that with
6415.77 -> a value of positive infinity, and I set the
optimal distance to the start node to a value
6421.969 -> of zero, perfect. And then these are just
two supporting arrays that track whether node
6428.749 -> i has been visited. And this prev array is
going to be used to reconstruct the shortest
6436.69 -> path should we ever need to. Alright, let's
look at this while loop which contains the
6442.039 -> bulk of the Dijkstra's algorithm implementation.
So while the priority queue is not empty,
6448.55 -> we're going to first get the ID of the node
with the shortest distance. And we're also
6456.499 -> going to get the minimum value associated
with that node. So while we're at it, we're
6463.329 -> going to mark that this node is visited so
that we don't revisit it again in the future.
6468.639 -> This line right here says that if
the minimum value we got from
6474.309 -> the priority queue is greater than the already
optimal distance in the distance array for
6480.869 -> the node we currently pulled out of the queue,
then we can continue. This is because we already
6486.969 -> found a better path routing through another
set of nodes before we got to processing
6493.159 -> this node, which is fine. The next thing we
want to do is get all the edges going outwards
6499.169 -> from this node. So we can reach into our adjacency
list and get all the edges coming out of the
6505.479 -> current node. Then we check if the node this
edge wants to go to has already been visited;
6513.09 -> if so, we can skip it, since we don't want to revisit
an already visited node. Then we compute the
6520.57 -> new distance of going from the current node
to the destination node. And we do this by
6526.84 -> reaching into the distance array grabbing
the already optimal distance for that node
6531.86 -> and adding the edge cost then we try and relax
the edge. So we check if the new distance
6537.63 -> is better than the distance already in the
distance array at the node we want to go to.
6543.82 -> Remember that originally, all the indices
in the distance array are set to positive
6548.51 -> infinity. So the first time we visit any node,
this condition will always be true, then we
6553.91 -> just do some housekeeping stuff. So mark that
the optimal distance to get to a certain node
6560.76 -> came from the current node we're at, and also
update the distance array to have the new optimal
6566.969 -> distance. Then we update our indexed priority
queue. We do this by inserting the cost of
6574.909 -> going to a node for the first time or we try
and employ a decrease key operation to update
6581.969 -> the current best distance to that node to
be even better. Then after that loop, we can
6589.239 -> check if we've reached our end node and if
we have we can return the optimal distance
6595.749 -> to it. So this is the optimization of returning
early. Otherwise, if we've reached the end
6604.34 -> of the algorithm, and the while loop has terminated
and the priority queue is empty, then return
6610.59 -> positive infinity. The rest of this class
contains the reconstruct path method in the
6618.229 -> event that you want to actually reconstruct
the shortest path from the start node to the
6623.999 -> end node. And this is pretty straightforward,
simply give it the start node you want to
6628.26 -> start at and the end node index, then run Dijkstra's
algorithm, make sure that the end node is
6636.19 -> actually reachable from the start node, then
simply loop through the previous array,
6642.429 -> reverse the path, and return it. As simple as
that. Of all the shortest path algorithms in
6649.019 -> graph theory. Bellman Ford is definitely one
of the simplest, yet I struggled as an undergrad
6655.469 -> trying to learn this algorithm, which is
part of the reason I'm making this video.
6661.539 -> So what is the Bellman Ford algorithm? In
short, it's a single source shortest path
6666.949 -> algorithm. This means that it can find the
shortest path from a starting node to all
6672.679 -> other nodes in the graph. As you can imagine,
this is very useful. However, Bellman Ford
is not ideal for most single source shortest path
problems, because it has a much worse time
6686.879 -> complexity than Dijkstra's algorithm. In
practice, Bellman Ford runs in a time complexity
6693.499 -> proportional to the product of the number
of edges and the number of vertices, while
6698.059 -> Dijkstra's can do much better, at around big O
of E plus V log V with a binary heap.
6707.599 -> So when would we ever use the Bellman Ford
algorithm? The answer is when Dykstra does
6712.88 -> fails. And this can happen when the graph
has negative edge weights. When a graph has
6719.51 -> negative edge weights, is possible that a
negative cycle can manifest itself. And when
6726.05 -> it does, it is of critical importance that
we are able to detect it. If this happens,
6731.959 -> and we're using Dijkstra's to find the shortest
path, we'll get stuck in an infinite loop
6736.669 -> because the algorithm will keep finding better
and better paths. A neat application of Bellman
6743.59 -> Ford and negative cycles is in finance and
economics when performing an arbitrage between
6749.649 -> two or more markets, I'm not an expert. But
this is when prices between different markets
6755.61 -> are such that you can cycle through each market
with a security, such as a stock or a currency,
6761.749 -> and end up with more profit than you originally
started with, essentially getting risk free
6767.26 -> gains. Let's look at how negative cycles can
arise. Because this seems important. Here
6775.809 -> is a graph I made with directed edges, some
of which are negative. I've labeled our starting
6783.3 -> node to be node zero. And our goal would be
to find the distance from zero to every other
6789.29 -> node in a single source shortest path context.
But right now, we are only interested in detecting
6797.78 -> negative cycles, I will label blue nodes as
regular nodes, red nodes as nodes directly
6804.469 -> involved in a negative cycle, and yellow nodes
as those reachable by a negative cycle. One
6812.54 -> way negative cycles can emerge is through
negative self loops. What happens is that
6818.869 -> once we reach a self loop, we can stay in
that loop for a near infinite amount of time
6824.749 -> before exiting. As a result, everywhere reachable
by the cycle has a best cost of negative infinity,
6832.719 -> which depending on your problem may either
be good or bad. In this graph, nodes two, three, four, and
6841.599 -> five are all reachable from node one. So they
all have a best cost of negative infinity
6848.419 -> with regards to the single source shortest
path problem. Let's look at another example.
6855.499 -> In this graph, a negative cycle manifests
itself but not as the result of a negative
6862.579 -> self loop. Instead, through a cycle of nodes
whose net gain is less than zero. If you add
6869.3 -> up the edge values one, four, and minus six,
attached to the nodes one, two, and three,
6876.929 -> the net change is minus one. If we look at
where this cycle can reach, we can see that
6883.199 -> the entire right side of the graph is affected.
So hardly any nodes are safe from this negative
6891.19 -> cycle. Now let's look at the actual steps
involved in the Bellman Ford algorithm. First,
6896.709 -> we'll need to define a few variables. Let
E be the number of edges of the graph. Let
6903.71 -> V be the number of vertices. Let S be the
ID of the starting node. In this case, S is
6911.809 -> short for start. And lastly, let D be an
array of size V that tracks the best distance
6921.07 -> from S to each node. The first thing we'll
want to do is set every entry in D to positive
6928.78 -> infinity. This is because the distance to
each node is initially unknown, and we have
6935 -> no idea how far each node is. Next, we'll
want to set the distance to the starting node
6941.09 -> to be zero, because we're already there. The
last part of the algorithm is to relax each
6947.84 -> edge V minus one times. Relaxing an edge simply
means taking an edge and trying to update
6954.479 -> the value from where the edge starts to where
it ends. In terms of code, this is all we
6960.479 -> need to do. We loop V minus one times, then
for each edge, we relax the edge. In the relaxation
6969.209 -> step, what we do is we look at the value of
where the edge starts, add the edge cost, and
6974.439 -> see if that's better than where we're trying
to go. And if so, update with the shorter
6978.809 -> path value. To actually detect negative cycles,
we don't do anything too special; all we do
6987.19 -> is run the algorithm a second time. What we're
doing in the second pass is checking for any
6993.53 -> nodes that update to a better value than the
known best value. And if they do, then they're
7000.51 -> part of a negative cycle, and we want to mark
that node as having a cost of negative infinity.
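Here is a minimal Java sketch of those steps over an edge list (field names are illustrative):

```java
import java.util.*;

public class BellmanFord {
  static class Edge { int from, to; double cost; Edge(int f, int t, double c) { from = f; to = t; cost = c; } }

  // Runs Bellman-Ford from 'start' over an edge list; nodes on or reachable
  // from a negative cycle end up with a distance of negative infinity.
  static double[] bellmanFord(Edge[] edges, int v, int start) {
    double[] dist = new double[v];
    Arrays.fill(dist, Double.POSITIVE_INFINITY);
    dist[start] = 0;
    // Main pass: relax every edge V - 1 times.
    for (int i = 0; i < v - 1; i++)
      for (Edge e : edges)
        if (dist[e.from] + e.cost < dist[e.to])
          dist[e.to] = dist[e.from] + e.cost;
    // Second pass: anything that still improves is part of, or reachable
    // from, a negative cycle, so pin it to negative infinity.
    for (int i = 0; i < v - 1; i++)
      for (Edge e : edges)
        if (dist[e.from] + e.cost < dist[e.to])
          dist[e.to] = Double.NEGATIVE_INFINITY;
    return dist;
  }
}
```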
7009.36 -> Let's look at a full example. Here is a graph
I made. Again, we will start on node zero,
7015.439 -> and find the shortest path to every other
node. On the right, I have illustrated the
7020.999 -> distance array D. Watch the values in this
array change as the algorithm executes. Right
7028.331 -> now, all the values in the array are set to
positive infinity, as per the first step of
7034.849 -> the algorithm. In the second step, we set
the starting nodes value to zero. Now the
7040.78 -> algorithm starts and we are on the first iteration
where we attempt to relax each edge. But before
7048.019 -> we continue, I have an important note at the
bottom of the screen, which says that the
7053.439 -> edges do not need to be processed in any particular
order, I may process the edges with one ordering,
7060.869 -> and you process them with another ordering.
And we may not end up with the same values
7065.86 -> on each iteration. But we will get the same
result in the distance right at the very end,
7072.86 -> I will highlight the edge currently being
processed in orange and update the distance
7078.429 -> array whenever appropriate. Right now, the
best value to node one is five, because a
7085.189 -> distance of five is better than a distance
of positive infinity. Then node two gets its
7092.599 -> value updated to 25, because node one had
a value of five from the last step, and the
7099.999 -> edge from node one to node two is 20 for a
total of 25. Then a similar thing happens
7107.219 -> to node five, and node six as well. By the
way, an edge is dark gray if it has already
7114.729 -> been processed in this iteration. Next up,
node three gets its value updated from infinity
to 35, because the best value at node two
7122.869 -> so far, 25, plus the edge cost of 10, is 35.
7132.499 -> Then the edge from two to four updates to
a best value of 100. Up next is an interesting
7139.599 -> edge because it was able to update node two's
best value from 25 to 20 by taking the value
7147.489 -> in Node three, which is currently 35, adding
a weight of minus 15 and giving us a better
7154.479 -> value of 20. So this is all the algorithm
does, it processes each edge performing relaxation
7162.17 -> operations. I'll let the animation play for
the rest of this iteration.
7184.3 -> So iteration one is over and there are eight
more iterations to go. But for simplicity,
7189.84 -> I'll just play one more iteration to give
you an idea of how the algorithm works. We
7196.09 -> reset all the edges and start processing the
edges again. You'll notice that a lot less
7202.979 -> updating happens in the distance array this
round, particularly because I unintentionally
7209.079 -> selected the edges to be processed in a more
or less optimal way. So that's the end of
7217.26 -> the second iteration. If we fast forward to
the end, here's the resulting distance array.
7223.709 -> However, we're not done, we still need to
find the negative cycles. Let's execute the
7229.629 -> algorithm a second time, same procedure, as
usual, just relax each edge. But when we are
7235.659 -> able to relax an edge, update the node's value
to negative infinity instead. Let's process
7242.439 -> some edges until something interesting happens.
So it appears that when we went from node
7252.309 -> two to node three, we are able to relax the
edge and obtain a better value for node three
7258.53 -> than was previously there. So node three is
part of some negative cycle; therefore, I
7264.489 -> will mark it as red. Similarly, node four
is connected to a negative cycle, although
7272.081 -> indirectly. In the distance table, I do not
distinguish between nodes which are reachable
7278.92 -> by a negative cycle, and those which are primarily
involved in one. So there's no way to tell them
7285.489 -> apart; feel free to add some logic in the
Bellman Ford algorithm if you need to make
7290.05 -> this distinction. Continuing on, node two is
also trapped in the cycle. And the last node
7297.3 -> also affected by the cycle is node nine on the
right. Let's finish up with this iteration
7303.389 -> by processing the rest of the edges.
7313.099 -> So that's it for the first iteration; there
are another eight iterations to perform. In
7318.67 -> this example, we happen to detect all cycles
on the first iteration. But this was a coincidence.
7326.38 -> In general, you really do need another eight
iterations. This is because you want the negative
7331.67 -> cycle minus infinity values to propagate throughout
the graph. The propagation is highly dependent
7338.559 -> on the order in which the edges are being
processed. But having v minus one iterations
7344.26 -> ensures that this propagation occurs correctly.
Alright, now I want to have a look at some
7350.749 -> source code. You can find a link in the description
below, or you can go to
7357.789 -> github.com/williamfiset/algorithms. Here we are on GitHub
in my algorithms repository. Now if you scroll
7365.769 -> down and look for Bellman Ford, under the
graph theory section, you can see that currently
7373.38 -> there are two different implementations, one
for graph represented as an edge list, another
7380.639 -> one for a graph represented as an adjacency
list. Today, we'll have a look at the edge
7387.559 -> list implementation. So in the edge list
implementation, the first thing I do is define
7398.659 -> a directed edge. And a directed edge simply
consists of an edge that goes from a node
7407.429 -> to a node with a certain cost. Next, let's
have a look at the actual algorithm itself.
7418.05 -> So from Bellman Ford, what we need is, well,
a graph. So since this is an edge list, we
7425.129 -> just pass in all the edges. I'll also need
the number of vertices in the graph and some
7431.359 -> starting node. And what we're returning is
that distance array. All right, so let's initialize
7440.26 -> the distance array, and then populate it with
the special value Double.POSITIVE_INFINITY,
7449.17 -> then set dist of start to be zero. And then,
just as the pseudocode said, loop V minus
7457.86 -> one times, and for each edge, relax
the edge. So that's what we're doing
7466.011 -> here. Now, the second pass of the algorithm
is to detect negative cycles. So run the algorithm
7476.419 -> a second time: loop V minus one times, and
for each edge, relax the edge, but this time,
7483.929 -> instead of updating the edge to a better value, we
set the value to Double.NEGATIVE_INFINITY.
7491.86 -> This is a special value defined in Java
that represents negative infinity, and no matter
7497.959 -> what value you add to Double.NEGATIVE_INFINITY,
it will still be negative infinity,
7504.119 -> unless you add Double.POSITIVE_INFINITY,
which I believe gives you Double.NaN, not a number.
7511.28 -> And that's the entire
algorithm; then we just return the distance
7516.71 -> array. If you look in the main method, it
shows you how to actually create a graph,
7524.249 -> add some edges, and then run Bellman Ford
and find the distance from a starting node
7530.139 -> to all other nodes in the graph. And that
is Bellman Ford.
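Putting the pieces of that walkthrough together, here is a minimal, self-contained sketch of the edge-list Bellman-Ford just described. The class and method names (Edge, bellmanFord) are illustrative choices of mine, not necessarily the exact names used in the repository:

```java
import java.util.Arrays;
import java.util.List;

public class BellmanFordSketch {

  // A directed edge from one node to another with a certain cost.
  static class Edge {
    int from, to;
    double cost;
    Edge(int from, int to, double cost) { this.from = from; this.to = to; this.cost = cost; }
  }

  // edges: the graph as an edge list; v: number of vertices; start: the start node.
  static double[] bellmanFord(List<Edge> edges, int v, int start) {
    double[] dist = new double[v];
    Arrays.fill(dist, Double.POSITIVE_INFINITY);
    dist[start] = 0;

    // First pass: V-1 rounds of relaxing every edge.
    for (int i = 0; i < v - 1; i++)
      for (Edge e : edges)
        if (dist[e.from] + e.cost < dist[e.to])
          dist[e.to] = dist[e.from] + e.cost;

    // Second pass: anything that still relaxes is touched by a negative
    // cycle, so propagate Double.NEGATIVE_INFINITY instead of a value.
    for (int i = 0; i < v - 1; i++)
      for (Edge e : edges)
        if (dist[e.from] + e.cost < dist[e.to])
          dist[e.to] = Double.NEGATIVE_INFINITY;

    return dist;
  }
}
```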
7535.699 -> Today's topic is the Floyd-Warshall all-pairs shortest path algorithm;
we will be covering how the algorithm works,
7541.879 -> how to reconstruct shortest paths, the handling
of negative cycles, followed by some code.
7547.55 -> So let's get started. In graph theory, the
Floyd-Warshall algorithm is an all-pairs shortest
7553.8 -> path algorithm. This means it can find the
shortest path between all pairs of nodes.
7559.999 -> This is very important for many applications
across several fields. The time complexity
7565.97 -> to run Floyd-Warshall is big O of V cubed,
V being the number of vertices in the graph.
7573.269 -> This makes the algorithm ideal for graphs
with no more than a couple hundred nodes.
7580.849 -> Before we dive too deeply into the Floyd-Warshall
algorithm, I want to address when you should
7586.619 -> and should not use this algorithm. This table
gives information about various types of graphs
7592.619 -> and/or constraints in the leftmost column,
and the performance or outcome of common shortest
7599.63 -> path algorithms. For example, you can see
in the second row that a breadth first search
7605.8 -> and Dijkstra's can handle large graphs with
lots of nodes, while Bellman Ford and Floyd-Warshall,
7611.53 -> not so much. I suggest you pause
the video and really go through this table
7618.079 -> and make sure you understand why each cell
has the value it does. What I want to highlight
7623.969 -> is the rightmost column, since we're talking
about the Floyd-Warshall algorithm. The Floyd-Warshall
7629.039 -> algorithm really shines in three places,
and those are on small graphs, solving the
7635.38 -> all-pairs shortest path problem, and detecting
negative cycles. You can use the algorithm
7641.219 -> for other tasks, but there are likely better
algorithms out there than Floyd-Warshall.
7647.05 -> The optimal way to represent our graph is
with a two dimensional adjacency matrix, which
7652.289 -> I will denote as the letter M. The cell M[i][j]
represents the edge weight of going from
7660.719 -> node i to node j. So in the image below, I
transformed the graph with nodes A, B, C,
7667.869 -> and D into an adjacency matrix on the right.
An important note I should mention is that
7675.949 -> I assumed that the distance from a node to
itself is zero, which is usually the case.
7682.55 -> This is why the diagonal has all zero values.
When there is no edge between nodes i and
7691.75 -> j, set the value in the matrix M[i][j] to be
positive infinity. This indicates that two
7699.34 -> nodes are not directly connected to each other.
A very important note to make is that if your
7705.73 -> programming language doesn't support a special
constant in its standard library for positive
7711.23 -> infinity, such that infinity plus infinity
equals infinity, and infinity plus x equals
infinity, then you should avoid using
7724.729 -> 2^31 - 1 as infinity.
If you do so, then you will likely get integer
overflow; simply use a large constant instead.
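As a quick, hypothetical illustration of that pitfall in Java (the constant name INF is my own):

```java
public class InfinityOverflowDemo {
  public static void main(String[] args) {
    int badInf = Integer.MAX_VALUE;     // 2^31 - 1
    System.out.println(badInf + 1);     // overflows to -2147483648
    // A large constant that survives an addition without overflowing:
    final int INF = Integer.MAX_VALUE / 2;
    System.out.println(INF + INF > 0);  // true: no overflow after one add
  }
}
```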
7731.489 -> As we will see, the main idea behind the Floyd-Warshall
algorithm builds off the notion that
7738.389 -> you want to compute all intermediate routes
between two nodes to find the optimal path.
7745.119 -> Suppose our adjacency matrix tells us the
distance from a node A to a node B is 11.
7752.57 -> Now suppose there exists a third node C, if
the distance from A to C and then C to B is
7761.749 -> less than the distance from A to B, then it
is better to go through node C. Again, the
7769.059 -> goal is to consider all possible intermediate
paths between triplets of nodes. This means
7775.869 -> we can have something like this where the
optimal path from A to B is first going to
7782.979 -> C, then going from C to B, but in the process,
we actually route through another node, which
7788.53 -> I labeled with a question mark, because we've
already computed the optimal path from C to
7794.189 -> B and I know that it involves an intermediate
node. Similarly, we can go through longer
7800.201 -> paths with more intermediate nodes between
A and C, and C and B, with a smaller cost. We
7809.629 -> are also not just limited to one intermediate
node in between A and C, and C and B, we can
7816.699 -> have several like in the graph below.
7818.789 -> Now the question comes up, how do we actually
compute all intermediate paths? The answer
7826.059 -> is we will use dynamic programming to cache
previous optimal solutions. Let dp be a three
7834.11 -> dimensional matrix of size n by n by n, which
acts as our memory table, we're going to say
7842.159 -> that the cell dp[k][i][j] in our table gives
us the shortest path from node i to node j,
7851.459 -> routing through nodes zero through k. What
we'll do is start by computing k equals zero,
7859.289 -> then k equals one, then k equals two, and so
on. This gradually builds up the optimal solution
7864.789 -> routing through zero, then all optimal solutions
routing through zero and one, then all optimal
7870.48 -> solutions routing through zero, one, and two,
etc., up until we've covered all nodes, at which
7878.57 -> point we have solved the all pairs shortest
path problem. Let's talk a bit more about
how to populate the dp table. In the beginning,
7890.699 -> the optimal solution from i to j is simply
the distance given to us in the adjacency
7899.749 -> matrix. So when k equals zero, dp[k][i][j]
is equal to m[i][j], the value of the edge from
7910.179 -> i to j. Otherwise, in general, dp[k][i][j]
7910.179 -> can be summed up with the following recurrence
relation, I'm going to break it down so that
7915.369 -> we can understand all its components. Because
this may look scary to some people. The left
7921.96 -> hand side of the recurrence simply says, reuse
the best distance so far from itj, routing
7928.579 -> through nodes, zero to k minus one, it's important
to note that the solution using nodes, zero
7936.349 -> to k minus one is a partial solution. It is
not the whole picture. This is part of the
7942.46 -> dynamic programming aspect of the Floyd-Warshall
algorithm. The right hand side of the recurrence
7948.32 -> finds the best distance from i to j, but routing
through node k, reusing the best solutions
7955.659 -> from zero to k minus one. If we analyze the
right side of the min function in English,
7963.28 -> it basically says: go from i to k, then go from
k to j. Visually, this is what it looks like.
7971.729 -> You start at i, route through some nodes and
get to k, and then from k route back to j.
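Written out, the recurrence just described is, using dp as the memo table and m as the adjacency matrix:

```latex
dp[k][i][j] =
  \begin{cases}
    m[i][j] & \text{if } k = 0 \\
    \min\big(dp[k-1][i][j],\; dp[k-1][i][k] + dp[k-1][k][j]\big) & \text{otherwise}
  \end{cases}
```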
7977.781 -> Currently, our algorithm uses big O of V cubed
memory. Since our memo table dp has one dimension
7988.989 -> for each of k, i and j. This isn't particularly
great. Notice that we will be looping over
7997.129 -> k starting from zero, then one, then two,
and so forth. The important thing to note
8002.51 -> here is that previous results build off the
last, since we need the state of k minus one
8010.189 -> to compute state k. That being said, it
is possible to compute the solution for k
8018.149 -> in place, saving us a dimension of memory
and reducing the space complexity to big O
8024.409 -> of v squared. Now we have a new recurrence
relation which no longer involves the K dimension.
8032.219 -> This has been replaced by the fact that we're
computing the k plus one solution in place
8038.329 -> inside our matrix.
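Here is a minimal sketch of that in-place version, assuming the dp matrix has already been initialized from the adjacency matrix (the class and method names are my own):

```java
public class FloydWarshallCore {
  // dp starts as a copy of the adjacency matrix: Double.POSITIVE_INFINITY
  // where no edge exists and 0 on the diagonal. It is updated in place.
  static void floydWarshall(double[][] dp) {
    int n = dp.length;
    // k must be the outermost loop so that solutions gradually build up
    // for k = 0, then k = 1, then k = 2, and so on.
    for (int k = 0; k < n; k++)
      for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
          if (dp[i][k] + dp[k][j] < dp[i][j])
            dp[i][j] = dp[i][k] + dp[k][j];
  }
}
```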
8040.92 -> Okay, that's all the theory we need for now.
Let's get our hands dirty and look at some
8045.96 -> pseudocode. Below is the function that actually
solves, or rather executes, the Floyd-Warshall
8052.84 -> algorithm. But before
we get into that, let's look at some of the
8059.769 -> variables I have defined in the global or
class scope, which I will be using throughout
8065.55 -> these functions. The first variable is the
number of nodes in our graph, then comes the
8071.959 -> 2D memo table that will contain our all-pairs
shortest path solution. Last is the next
8079.09 -> 2D table that we will use to reconstruct our
shortest paths. Now moving on to the Floyd-Warshall
8085.939 -> function, you see that it takes one
parameter. This is the 2d adjacency matrix
8091.789 -> representing our graph. The first thing I
do in the method is call the setup function.
8097.769 -> So let's take a look at that real quick. So
here we are inside the setup function, the
8101.88 -> first thing I do is I allocate memory for
our tables, the DP matrix should have the
8108.139 -> same type as the input adjacency matrix. What
I mean by this is if your edges in your input
8114.199 -> matrix are represented as real numbers, then
your dp matrix should also hold real numbers,
8121.34 -> the next matrix will contain indexes of nodes
to reconstruct the shortest paths found from
8128.869 -> running the Floyd-Warshall algorithm. It is
important that initially this matrix be populated
8135.269 -> with null values. Inside the for loops, all
I do is copy the input matrix into the dp
8142.53 -> matrix. Think of this as the base case, or
rather the k equals zero case. For the next
8149.28 -> matrix, if the distance from i to j is not
positive infinity, then the next node you
8155.269 -> want to go to from node i is node j by default.
Now we're back inside the Floyd-Warshall function.
8164.32 -> In here, after the setup, loop over k on the
exterior loop. It's important that k is on
8170.71 -> the exterior loop, since we want to gradually
build up the best solutions for k equals zero,
8176.78 -> then k equals one, then k equals two, and so
on. That's followed by this loop over all pairs of
8182.999 -> nodes i and j. The main body actually
tests for our condition to improve the shortest
8190.05 -> path from i to j going through k, and updates the
value at dp[i][j] if there's a better route
8198.88 -> through k. Also inside here, update the next
array at [i][j] to point to the next index at
8207.429 -> next[i][k]. The last thing I want to do is to
detect and propagate negative cycles. This
8216.29 -> is an optional step if you know that negative
cycles will not manifest themselves within
8221.61 -> your graph. Although I still recommend you
keep this function around. But before we get
8227.34 -> too far, I want to discuss negative cycles
and what they entail because it isn't entirely
8233.04 -> obvious. So consider the following graph.
There are basically two types of nodes to
8239.26 -> consider here. Nodes directly involved in
negative cycles, and nodes unaffected by negative
8246.03 -> cycles. This red node is the cause of a negative
cycle because it can endlessly loop on itself
8253.52 -> and obtain smaller and smaller costs. While
these blue nodes are not directly in a negative
cycle. This, however, doesn't mean they're
8265.28 -> necessarily safe from negative cycles.
8265.28 -> As we will see, negative cycles can also manifest
themselves as groups of nodes working together
8273.219 -> like the following. So an important thing
to ask ourselves is does the optimal path
8280.28 -> from node i to node j go through a red node.
If so, the path is affected by the negative
8288.251 -> cycle and is compromised. For example, the
shortest path from zero to five is negative
8295.58 -> infinity. Because I can go from zero to node
two, and indefinitely loop in the negative
8302.2 -> cycle consisting of nodes, one, two and three,
obtaining better and better costs before eventually
8310.099 -> going to five. This is a consequence of traversing
a red node on the way to five. Some shortest
8316.58 -> paths however, avoid red nodes altogether,
consider the shortest path from four to six.
8325.11 -> This doesn't involve any red nodes, so we
can safely conclude that the shortest path
from four to six is indeed two. So to identify
whether or not the optimal path from i to
8339.17 -> j is affected by a negative cycle, rerun the
Floyd-Warshall algorithm a second time. If the
8345.1 -> best distance is better than the already known
best distance stored in our table dp, then
8353.05 -> set the value in the matrix from i to j to be
negative infinity. Also mark the index at
8360.71 -> [i][j] in the next matrix with a minus one to
indicate that the path is affected by a negative
8368 -> cycle. We will use this shortly.
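To make that detection pass concrete, a sketch might look like this (the constant name REACHES_NEG_CYCLE and the method name are mine; next is the 2D reconstruction table described earlier):

```java
static final int REACHES_NEG_CYCLE = -1;

// Rerun the Floyd-Warshall triple loop: anything that can still be
// improved is affected by a negative cycle, so mark it.
static void propagateNegativeCycles(double[][] dp, Integer[][] next) {
  int n = dp.length;
  for (int k = 0; k < n; k++)
    for (int i = 0; i < n; i++)
      for (int j = 0; j < n; j++)
        if (dp[i][k] + dp[k][j] < dp[i][j]) {
          dp[i][j] = Double.NEGATIVE_INFINITY;
          next[i][j] = REACHES_NEG_CYCLE;
        }
}
```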
Back in the Floyd-Warshall function, all we need to do
8373.679 -> is return the matrix dp which contains the
shortest distance from any node to any other
8381.05 -> node. This is the solution to the all pairs
shortest path problem. The last thing I want
8386.04 -> to cover is how to reconstruct the shortest
path between any two pairs of nodes. This
8391.2 -> method returns the shortest path between the
start and end nodes specified, or null if there
8398.05 -> is a negative cycle. First, check if the distance
between the start and end nodes is positive
8404.46 -> infinity; if so, then return an empty path.
Then to reconstruct the path, I create a variable
8410.97 -> called at to track the current node. And
then I loop through the next array, adding
8416.55 -> the current node to the path as I go. During
this process, I check if the current node
8421.93 -> has the value minus one. If it does, then
this means that the optimal path encountered
8428.11 -> a red node and is trapped in a negative cycle.
So return null. Notice that in reality, this
8435.29 -> method has three key return values: an
empty path, which means that the start and
8440.85 -> end nodes are not connected; a null value,
meaning a negative cycle was encountered;
8446.88 -> and lastly, a non-empty path of node indices,
meaning an actual shortest path was found.
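A sketch of that reconstruction, under the same assumptions as the snippets above:

```java
import java.util.ArrayList;
import java.util.List;

// Returns an empty list if end is unreachable from start, null if the
// optimal path runs into a negative cycle, and the node sequence otherwise.
static List<Integer> reconstructPath(double[][] dp, Integer[][] next,
                                     int start, int end) {
  List<Integer> path = new ArrayList<>();
  if (dp[start][end] == Double.POSITIVE_INFINITY) return path; // not connected
  for (int at = start; at != end; at = next[at][end]) {
    if (next[at][end] == REACHES_NEG_CYCLE) return null; // trapped in a cycle
    path.add(at);
  }
  if (next[end][end] == REACHES_NEG_CYCLE) return null; // end node compromised
  path.add(end);
  return path;
}
```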
8453.07 -> Today, we're going to be looking at some source
code for the Floyd-Warshall all-pairs shortest
8459 -> path algorithm. Here we are in the source
code for the Floyd-Warshall algorithm. So
8466.48 -> let's get started. Let's start by looking
at an example of how to use this Floyd-Warshall
8474.36 -> solver class to actually find the all-pairs shortest
path. So here in the main method, the first
8482.37 -> thing I do is I actually initialize a graph
with n nodes, where n is set to seven. And
8489.94 -> I create our adjacency matrix by calling the
createGraph method. And if we look up
8496.58 -> here, this is the createGraph method. And
all it does is it initializes a matrix of
8502.36 -> size n by n, it fills the matrix with the
special constant positive infinity. And it
8510.55 -> also sets the diagonal to all zero values
by default, because I assume that this is
8516.771 -> the behavior you want. If it's not, then that's
not an issue, because you can just override
8522.25 -> it when you add some edge values to your adjacency
matrix.
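For reference, a sketch of such a createGraph setup (the name is illustrative):

```java
// Initialize an n-by-n adjacency matrix: positive infinity everywhere,
// zeros on the diagonal (distance from a node to itself is zero).
static double[][] createGraph(int n) {
  double[][] matrix = new double[n][n];
  for (int i = 0; i < n; i++) {
    java.util.Arrays.fill(matrix[i], Double.POSITIVE_INFINITY);
    matrix[i][i] = 0;
  }
  return matrix;
}
```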
8529.88 -> Alright, so we created a matrix and added
some edge weights. And then what you'll
want to do is create an instance of the solver,
8537.17 -> give it our adjacency matrix, and then call
the get-all-pairs-shortest-path-matrix function,
8544.98 -> which will return the all-pairs shortest path
matrix as a matrix of distances.
8552.42 -> And then here, all I do is I loop over all
pairs of nodes i and j. And I print what the
8557.46 -> shortest path from node i to j is. Here's
a sample output of what that looks like. So
8566.33 -> there can be roughly three different kinds
of outcomes: we get a concrete shortest path;
8572.41 -> there does not exist a path between the
two nodes, so that'll be infinity; or we encounter
8578 -> a negative cycle, so that's negative infinity.
Similarly, if we want to reconstruct the paths,
8585.521 -> this is how we're going to do it. Don't be
scared by any of this, it's just text being
8589.66 -> printed on the screen. So here, I want to
reconstruct the shortest path between all
8593.96 -> pairs of nodes. So I loop through all pairs
of nodes i and j. And then on the solver,
8600.38 -> again, I call reconstructShortestPath from
i to j. And that returns a list of nodes. And
8608.41 -> here, I just print three different options
depending on what I get back. If the path
8615.44 -> is null, then there exists an infinite number
of solutions. If the path has zero length, there
8620.19 -> is no solution. And otherwise, I just do a
8627.58 -> pretty formatting of the output. And this
is what that would look like. So just prints
8634.89 -> what the path would be between all pairs of
nodes. So for instance, the shortest path
8641.85 -> from node zero to node two in our
graph goes through nodes zero, one, and two.
8649.3 -> And it just prints all this information
for all pairs of nodes in our graph, which is really
useful. Okay, so what is this Floyd-Warshall
8654.16 -> solver actually doing? That's what we're
8662.37 -> going to look at right now.
8665.05 -> So inside that class, we have four instance
variables: the number of nodes in our
8672.11 -> adjacency matrix; a boolean value called solved,
which just tracks whether we've solved the
8678.16 -> all-pairs shortest path problem or not; our
dp matrix; and a next matrix, which is used
8688.131 -> to reconstruct the paths. And, oh, there's
also this constant, which I just initialize
8696.131 -> to minus one so we can identify when we've
reached negative cycles. Okay, so looking at
8703.1 -> the constructor, you just pass in the input
adjacency matrix, and then I do some initialization.
8709.57 -> So simply allocate memory for our matrices
that we're going to need, and then populate
8715.811 -> the DP matrix with whatever is given to us
for our input. And also make sure to initialize
8726.311 -> the next matrix to contain j as the next value
going from i to j. And that's all you need
8735.21 -> to do for the setup, nothing too complicated.
Let's look at some of the methods that are
8742.1 -> provided in this class. The first one is get
all pair shortest path matrix, which is the
8747.851 -> first method we called. And what that does
is it looks at whether we've solved the all-pairs shortest
8754.05 -> path problem already, and if not, it calls
the solver. The reason I do this is so that
8760.73 -> if we want to get the all-pairs shortest path
matrix multiple times, we don't
8766.34 -> run the solve method several times. So
the solve method is what actually solves or
8774.59 -> rather executes the Floyd-Warshall algorithm.
And here's what we're going to do to compute
8780.271 -> all pairs of shortest paths. First, we iterate
through k on the exterior loop. And then we
8788.96 -> loop through all pairs of nodes, and then
we check for our condition. So if the path
8796.01 -> going from i to k and then k back to j is
less than the path from i to j, then update
8803.38 -> the value of i to j to route through that
node k. And while doing this, also update
8811.65 -> the next matrix so that we can reconstruct
the path later on. It is now shorter
8817.771 -> to go through i, k than i to j, so update the
indices for [i][j].
8828.35 -> This next loop is if you want to identify
negative cycles. Identifying negative cycles
8834.03 -> means that we need to propagate the value
of negative infinity throughout our graph
8839.73 -> for every part of the graph that reaches a
negative cycle. So basically, if we can improve
8847.09 -> upon the already optimal solution, then we
know that we are reaching a negative cycle
8854.64 -> somehow, and that that particular edge is
compromised, so simply mark it with negative
8861.61 -> infinity. That is, again, one of the special
constants provided by Java. Similarly, update
8867.83 -> the next matrix to also mark the node as being
contaminated by a negative cycle. But since
8875.3 -> next stores integer values, we can't give
it the value negative infinity, which is a
8881.37 -> double. So give it the value minus one stored
in the reaches-negative-cycle constant. And once that is
8889.57 -> done, we have fully executed the Floyd-Warshall
algorithm, and we can mark our boolean value
8895.71 -> of solved as true. Now, if we look at reconstructing
the shortest path, from the start node to
8903.13 -> some ending node, what we want to do is if
we haven't done so already, run the solver
8910.26 -> and then initialize a value called path to
an empty ArrayList. Look at if it's even possible
8917.7 -> to reach the end node from the start node.
And if it's not return an empty path. Otherwise,
8925.41 -> populate the path with the current node, which
I denoted as at. And for each
8933.22 -> current node, check if we've reached into a negative
cycle. And if we have, return null, because the
8940.17 -> shortest path doesn't
exist; there are an infinite number
8945.581 -> of shortest paths. And also make sure to check
the edge case where the last node is part
8953.07 -> of an infinite loop, and simply return the
shortest path. Today we're going to talk about
8960.2 -> how to develop an algorithm to find bridges
and articulation points in an undirected graph
8966.76 -> from a computer science perspective. For starters,
let's talk about what a bridge is in a graph.
8975.96 -> Bridges are sometimes also called cut edges.
Essentially, if you have a graph, which is
8981.511 -> a connected component, a bridge is an edge
which if removed, increases the number of
8988.051 -> connected components in the graph. The name
bridge makes sense because if you think about
8993.881 -> connected components as islands, then a bridge
is what separates them. So, for example, in
9001.09 -> this graph below, there would be three possible
bridges, which are those edges in pink, because
9008.561 -> if you remove any of them, the graph is divided
into two components. An articulation point,
9016.83 -> also called a cut vertex, is very similar to
a bridge, in that the criterion for being
9023.7 -> an articulation point is that it needs to
be any node whose removal will increase the
9030.24 -> number of connected components. As an example,
on this graph, there would be three articulation
9038.4 -> points, since removing any of these vertices
will divide the graph in two. As we start
9046.44 -> to think more about bridges and articulation
points, we realize how important they are
in graph theory. In real world situations,
bridges and articulation points often hint
9060.03 -> at bottlenecks, or vulnerabilities or weak
points in a graph. Therefore, it's important
9070.47 -> to be able to quickly find and detect where
these occur. We'll begin by investigating
9076.881 -> how to find bridges and then slightly modify
that algorithm to also find articulation points.
9086.16 -> In the simplest way I can explain it, this
is the algorithm we'll be following to
9091.181 -> find bridges in an undirected graph. First,
start at any node in the graph and begin doing
9097.431 -> a depth first search traversal labeling nodes
with an increasing ID as you encounter them.
9104.53 -> During the traversal, you will need to keep
track of two variables. The first is the node's
9109.49 -> ID, which I just mentioned, and the other
is the node's low link value. During the depth
9115.92 -> first search, bridges will be found where
the ID of the node your edge is coming from
9122.3 -> is less than the low link value of the node
the edge is going to.
9129.54 -> The low link value of a node is defined as the
smallest node ID reachable from the node you're
9137.771 -> currently at when doing the depth first search,
including the ID of the node itself. This
9143.171 -> is an interesting concept we'll get back to
later. For now, let's look at an example.
9150.69 -> Suppose we have the following graph we've
been looking at and we want to find out where
9155.29 -> all the bridges are. Let's begin our depth
first search on the node at the top left corner.
9163.671 -> As we do our first search, we're going to
label each node with a unique ID which I will
place inside the node. I will also mark nodes
9177.341 -> which are visited as yellow, and the nodes
which are unvisited as blue. So let's finish
9184.38 -> off our depth first search. So explore all
nodes, transforming undirected edges into directed
9192.11 -> ones, and marking off nodes
as visited.
9204.92 -> So that will conclude our depth first search,
I want to take a moment to think about what
9210.84 -> all the low link values would be for these
nodes. As a reminder, the low link value of
9217.54 -> a node is defined as the smallest ID reachable
from that node. For now, initialize all low link
9227.92 -> values to be equal to each node's ID. I placed
the low link value of each node on the exterior
9236.63 -> of that node. If you inspect node one, you
will notice that its low link value should
9244.36 -> be zero because there exists a path of edges
going from node one to node zero and node
9252.8 -> zero has an ID of zero. So we can update node
one's low link value to zero. Similarly, node
9264.55 -> two's low link value should also be zero,
because from node two to node zero there exists
9272.08 -> a path. However, nodes three, four, and five
are already at their optimal low link value,
9279.83 -> because there's no other node they can reach
with a lower ID. However, node six's low link
9291.04 -> value can be updated to five, since there is
a path from node six to node five via this
9300.021 -> sequence of edges. And we can also update
node seven and eight's low link values by the
9309.36 -> same logic. So in general, when we look at
all the directed edges we have traversed,
9317.341 -> the ones which form bridges in our graph are
9322.4 -> the ones where the ID of the node you started
at is less than the low link value of the
9327.85 -> node you're going to. Take a moment to think
about why this is true. Let's look at where
9336.8 -> these bridges actually occur. In each instance,
the ID of the node the directed edge
9342.34 -> started at is less than the low link value
of the node it's going to. Rephrasing that
in another way, it means there was no edge
9350.32 -> connecting back to the start of the component,
which is really the definition of what a bridge
9357.41 -> is. Otherwise, if there was an edge connecting
9365.5 -> backwards to the start of the component, the
low link value of where the edge is pointing
9373.28 -> to would be at least as low as the ID of
the node you started at, because it would
9379.65 -> be reachable. For example, if I have an edge
from node eight to node two, suddenly the
9387.05 -> edge from node two to node five is no longer
a bridge, because the low link value on node
9395.61 -> five got updated to two, and our bridge property
9395.61 -> highlighted in teal no longer holds. Let's
take an aside and think of the time complexity
9403.47 -> of the algorithm I just presented. Right now
9409.42 -> we're doing a depth first search to label
all the nodes, plus V more depth first searches
9418.42 -> to find all the low link values, for roughly
O(V(V+E)) in the worst case, if you're
really pessimistic and careless about your
9427.61 -> programming.
9428.98 -> Luckily, however, we can do much better than
this, and instead update all the low link values
9436.45 -> in one pass for a linear time complexity.
Let's look at some pseudocode on how to do
9445.86 -> this in linear time. I'll show you some actual
code in the video that follows. But let's
9451.87 -> get started. In the global or class scope,
I define three variables. The first is ID,
9459.63 -> which I use to label each node with a unique
ID number, then I have an undirected graph
9468.311 -> G. The last is n, which is the number of nodes
9478.24 -> in the graph. Following the top level variables
are three arrays, which track information
9485.801 -> about each node in the graph; index i in each
of these arrays represents node i in the graph.
9494.07 -> So the first array tracks the ID of node i,
the second array tracks the low link value
9499.7 -> of node i, and the visited array keeps track
of whether or not we have visited node i.
Moving on, the findBridges function is what
9506.69 -> actually finds the bridges. In the method,
I iterate over all the nodes which have not
9512.46 -> yet been visited by our depth first search.
This is to ensure that we find all bridges
9517.92 -> in our graph, even if our graph consists of
multiple connected components. Let's dive
9524.78 -> into the depth first search method which is
where the real work is happening. The first
9530.13 -> argument is the current node you're at, which
9538.081 -> is node i. Then is the parent node, which I
9538.081 -> set to minus one because there is no previous
node. And last is the bridges array which
9545.67 -> we are populating. So here we are in the depth
first search method itself, the arguments
to the method are just as I described them to
you. The first variable is at which is the
9559.01 -> current node ID, then comes parent, the previous
node ID, and the array bridges, which stores
9569.62 -> pairs of nodes which form bridges, in a flat
array. In the first three lines of the method,
9577.67 -> I simply do some housekeeping stuff which
is mark the current node as visited, increment
9583.221 -> the ID variable, and assign the current
node a default ID and low link value.
9590.61 -> Then we get into the actual depth first search
traversal bit. So we iterate over each edge
9596.82 -> from the node we're at and attempt to go
to each node, which I've labeled to. Since
9604.98 -> this is an undirected graph, there is bound
to be an edge that directly returns to the
9612.3 -> node we were just previously at, which is
the parent node, which we want to avoid doing.
9618.96 -> So we continue on those cases. If the next
node we're going to is not visited, then we
9627.1 -> recursively call the depth first search method.
The two key lines in this method are the min
9634.81 -> functions, which differ ever so slightly. The
first one happens on the callback, and is
9640.67 -> what propagates the low link values, while
the second one is when you try to visit an
9647.46 -> already visited node, which has a chance of
having a lower ID than your current low link
9654.38 -> value. Then the last bit just checks if our
bridge condition is met, and appends a pair
9665.36 -> of node IDs to the bridges array.
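Collected into one place, a compact sketch of that one-pass bridge-finding DFS might look like this (the class name and structure are my own; the repository's version may differ in details):

```java
import java.util.ArrayList;
import java.util.List;

public class BridgesSketch {
  private int id;
  private int[] ids, low;
  private boolean[] visited;
  private List<List<Integer>> graph; // undirected adjacency list

  public List<Integer> findBridges(List<List<Integer>> g, int n) {
    graph = g;
    id = 0;
    ids = new int[n];
    low = new int[n];
    visited = new boolean[n];
    List<Integer> bridges = new ArrayList<>();
    for (int i = 0; i < n; i++)
      if (!visited[i]) dfs(i, -1, bridges); // cover every connected component
    return bridges; // flat list: each adjacent pair of ints is a bridge
  }

  private void dfs(int at, int parent, List<Integer> bridges) {
    visited[at] = true;
    low[at] = ids[at] = id++;
    for (int to : graph.get(at)) {
      if (to == parent) continue;             // don't walk straight back
      if (!visited[to]) {
        dfs(to, at, bridges);
        low[at] = Math.min(low[at], low[to]); // propagate on the callback
        if (ids[at] < low[to]) {              // the bridge condition
          bridges.add(at);
          bridges.add(to);
        }
      } else {
        low[at] = Math.min(low[at], ids[to]); // already-visited neighbor
      }
    }
  }
}
```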
All right, now let's look at an example of all this in
9673.54 -> action. Suppose we have the following graph
again, and we start our depth first search
9678.44 -> somewhere. Let's start at node zero and explore
from there. So now is the first instance of
9689.1 -> something interesting happening, we're trying
to visit an already visited node. Since node
9694.99 -> two is able to reach node zero from where
it is, we can update its low link value. And
9704.37 -> that was the second min statement executing.
Continuing on our depth first search, which
9713.851 -> takes us downward. Now we get to explore the
other branch we have not visited.
9725.64 -> Again, we have an edge which reaches out
to find a node with a lower ID, so we need
9734.891 -> to update our low link value for the current
node, which is node eight. Now we can update
9742 -> node seven's low link value to five, on the callback
of the depth first search method. This is an
9748.59 -> instance of the first min function actually
doing something just to put everything into
9755.59 -> context. The red box is the line which was
just invoked. And now that statement we just
9766.3 -> saw gets executed for every node on the
callback, all the way back to the root node.
9779.5 -> And now we have the same result as before,
but we did it with just one pass. So again,
9787.88 -> here are all the bridges that we found.
9794.17 -> Perfect. Now let's move away from bridges
and start discussing how we can find articulation
9803.49 -> points by modifying the algorithm for bridges.
A first simple observation we can make about
9811.46 -> articulation points is that on a connected
component with three or more vertices, if
9818.5 -> an edge UV is a bridge, then either u or V
is an articulation point. This is a good starting
9827.74 -> point because it allows us to easily find
where articulation points occur. For example,
9835.4 -> consider the following graph, you will notice
that there is a bridge between nodes zero
9841.921 -> and one, meaning that either node zero or
node one is an articulation point. Unfortunately,
9851.511 -> this condition is not sufficient to capture
all articulation points. There exists cases
9858.56 -> where there is an articulation point, but
there is no bridge nearby. For example, in
9864.42 -> the following graph, node two is an articulation
point because its removal would cause the
9870.21 -> graph to split into two components. So the
new question is, when do these cases occur?
9877.91 -> And the short answer is that it has to do
with cycles in the graph. To understand why,
9883.58 -> let's look at an example. Suppose you're traversing
a graph and eventually, you somehow arrive
9891.67 -> at node zero. Initially, suppose node zero
has a low link value of zero, and like in
9899.33 -> any depth first search, you would continue on
to explore the graph. And eventually, if you
9908.56 -> ever encounter the node that started the
cycle with an edge, its ID gets propagated
9914.381 -> throughout the cycle during the callback
of the depth first search. This is the case
9919.92 -> because we're reassigning the new low link value
to equal the min of the current low link value
9926.071 -> and the ID of the node we were just visiting.
You see now that node five has a low link value
9933.88 -> of zero, acquired from the ID of node zero.
This gets spread or propagated as I like to
9942.601 -> say, throughout the cycle.
9948 -> Now, what you'll notice is that the ID of
the node you started at is equal to the
9956.01 -> low link value of where it's going to. This
indicates that there is a cycle. What is key
9963.28 -> here is that the presence of a cycle implies
that the node is an articulation point. This
is because a cycle in a graph corresponds
9977.59 -> to a strongly connected component, and removing
the node which started the cycle, which is also
9983.28 -> connected to another component, will sever
the graph in two. However, there's just one
exception to this. And this is when the starting
9991.3 -> node you choose has either no outgoing edges,
or is part of a cycle and only has one outgoing
9998.98 -> edge. This is because either the node is a
singleton, standalone node (that is the case
10006.56 -> with zero outgoing edges), or the node is trapped
in a cycle where it only has one outgoing
10013.131 -> edge. To be an articulation point, you need
to have more than one outgoing edge. For example,
10021.42 -> in the graph on the right, we start at node
zero, the green node, and it is not an articulation
10028.33 -> point, despite our condition of the ID equaling
the low link value. However, as soon as we
10036.99 -> add another edge to our starting node, it
becomes an articulation point. So this is
10043.53 -> something to watch out for, and it is unique to
the starting node. Let's now take a quick
10048.851 -> look at the changes we need to do to our finding
bridges algorithm to find articulation points.
10055.99 -> To begin with, we'll need a way to track the
number of outgoing edges the starting node
10063.22 -> has, so I define a new variable called
outEdgeCount. Next, I define a boolean array
10069.82 -> called isArt, which holds true or false depending
on whether or not node i is an articulation
10076.75 -> point. Ultimately, this will be the return
value of the findArtPoints function. In
10084.16 -> the body of the findArtPoints function,
I reset the out edge count variable for every
10089.68 -> connected component. And after the depth first
search, mark the starting node as either an
10097.11 -> articulation point or not, based on how many
outgoing edges were found. Inside the depth
10104.92 -> first search method, all I added was an if
statement to increment the number of outgoing
10110.67 -> edges from the starting node. Besides that,
I added the equals case to track articulation
10118.23 -> points found via cycles, and kept the less-than
case to find articulation points found
10124.771 -> via bridges. In a real implementation, you
can merge these two if statements into a single
10131.44 -> clause. However, I wanted to distinguish articulation
points found via bridges from those found via cycles.
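As a sketch of those modifications, here is a self-contained articulation-points variant of the bridges DFS shown earlier. The names (outEdgeCount, isArt, findArtPoints) follow the video's description, and I've merged the two if statements into one less-than-or-equal-to, as the video suggests you can:

```java
import java.util.List;

public class ArtPointsSketch {
  private int id, outEdgeCount;
  private int[] ids, low;
  private boolean[] visited, isArt;
  private List<List<Integer>> graph; // undirected adjacency list

  public boolean[] findArtPoints(List<List<Integer>> g, int n) {
    graph = g;
    id = 0;
    ids = new int[n];
    low = new int[n];
    visited = new boolean[n];
    isArt = new boolean[n];
    for (int i = 0; i < n; i++)
      if (!visited[i]) {
        outEdgeCount = 0;            // reset for every connected component
        dfs(i, i, -1);
        isArt[i] = outEdgeCount > 1; // the root needs more than one out edge
      }
    return isArt;
  }

  private void dfs(int root, int at, int parent) {
    if (parent == root) outEdgeCount++; // count DFS edges leaving the root
    visited[at] = true;
    low[at] = ids[at] = id++;
    for (int to : graph.get(at)) {
      if (to == parent) continue;
      if (!visited[to]) {
        dfs(root, to, at);
        low[at] = Math.min(low[at], low[to]);
        // < finds articulation points via bridges, == finds them via cycles
        if (ids[at] <= low[to]) isArt[at] = true;
      } else {
        low[at] = Math.min(low[at], ids[to]);
      }
    }
  }
}
```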
10139.7 -> In today's video, we're going
to look at the algorithm to find articulation
10145.03 -> points and bridges, but this time with actual
source code. All right, here we are in the
10149.59 -> source code to find bridges; we will look at
the source code to find articulation points
10154.71 -> shortly. So this source code is written in
the Java programming language. And here I
10161.51 -> have a class which will find all the bridges
in an undirected graph stored as an adjacency
10169.46 -> list. But before I get into the details of
the code, I actually want to show you how the
10174.83 -> code works and how we're supposed to use it.
So this is the main method that will set up
10179.101 -> the graph. But before we even do that, I'm
just going to scroll down here and look at
10185.22 -> some of the methods used to actually create
the graph and make something useful. So this
10191.7 -> first method will create a graph with n nodes.
So I create a list of lists of type Integer,
10201.46 -> which is basically an adjacency list with
directed edges. So all I do is I create a
10208.59 -> new ArrayList. And then fill that list of
lists with empty lists, and then return the
10214.94 -> graph. That's our graph for now. And then
later on, what we'll do is we'll call this
10220.34 -> add edges method to add directed edges into
the graph. So you see, first we add an edge
10229.42 -> from one node to another, and then back from
that node to the first. The naming is a little confusing,
10236.53 -> from and to; I use from to mean the node
the edge starts at, and to to be the node
10243.22 -> the edge is going to.
10247.05 -> So in this example, I have a graph with nine
nodes. So I initialize n to be nine, then
10256.05 -> I create the graph and then add all my edges,
you will notice that this graph is actually
10261.75 -> the graph from the slides in the last video.
And then what we're going to do is we're going
10269.21 -> to pass this graph and the number of nodes
into the class above, which is going to be
10278.771 -> our solver. And then the solver is going to
be able to find all the bridges and return
10284.64 -> all the bridges as a list of integers. Then
once you have this list of bridges, and bridges
10294.28 -> are going to be stored as pairs. So every
two adjacent integers form a pair, and those pairs
10302.69 -> are going to be bridges. So I pull those out,
and I print them, and this is the result you
10311.72 -> would expect for this graph. Alright, great.
Now you're wondering how the magic is happening
10318.76 -> here. So let's scroll up to the constructor.
And, actually, let's look at the instance
10325.13 -> variables. So we have n, which is the number
of nodes in the graph, ID, which is that ID
10332.1 -> number to label each node. So we're gonna
give each node a unique ID, and we need to
10337.12 -> keep track of, well, what the last ID was.
then I have two arrays, which track information
10344.17 -> about the nodes. So low is for the low link
values and IDs is to track the ID of each
10352.31 -> node, which we assigned using the ID variable,
then just a Boolean array to track whether
10358.71 -> or not the node was visited. And finally,
the graph. So in the constructor, of course,
10364.771 -> we get the graph and the number of nodes, and it
checks some conditions to make sure the graph
10370.2 -> is legit. Okay, so now we've constructed the
object, or the solver object. And the method
10378.65 -> we're interested in is findBridges. So
findBridges just initializes all of our
10386.741 -> variables. So set ID to be zero, initialize
or allocate some memory for the low link values
and the ID values and the visited array. It's
10398.66 -> good practice not to do this work in the
constructor, just because if you create
10404.57 -> a bunch of these objects but never use them,
you might surprise the person initializing
10410.84 -> the object with how much memory they're
using. Then initialize the bridges array to
be initially empty. And then we pass that
10418.24 -> into the depth first search method. It gets
populated and then returned afterwards. So
10426.521 -> for each node ID, loop
through all the nodes, and if that node hasn't
10432.42 -> been visited yet, start a depth first search
on that node: call the depth first search
10437.941 -> method with i as the first argument (the
current node), minus one for the parent, and
10444.54 -> then pass in the bridges array. So some housekeeping
stuff, like any usual depth first search,
10452.34 -> visit the node, and then what we're going to do
is initialize the low link
10457.37 -> value and the ID of that node to just be
a unique ID, which we increment. All right,
10466.63 -> then we visit, from the current node,
all the nodes we can reach, and skip
10476.92 -> the node that we were just at, that is,
the parent node. We don't want to do our
10482.671 -> depth first search and then immediately return
to the node we just visited, so continue on
10486.61 -> those cases. And we do this because we
have an undirected graph, remember. So if
10492.6 -> you haven't visited the node yet, then recursively
call our depth first search method and keep
10498.1 -> probing. While if you're trying
to visit a node you've already visited, then
10505.01 -> you want to take the minimum of the current low
link value and the ID of the node you're going
10510.811 -> to.
10512.85 -> Otherwise, on the callback of the depth first
search method is the other low link min
10520.46 -> statement, which differs from this one slightly
in that we're taking the minimum
10524.94 -> now not of the ID of the node, but the low
link of the other node. And, as you saw in
the slides, the condition for a bridge is: if
10537.25 -> the ID of the node we're at is less than the
low link of the node we're going to, this
10543.78 -> means we have a bridge, and removing that bridge
will cause the number of connected components
10551.8 -> to increase. So append both at and to, which
are the node IDs of the bridge, and put them
10559.85 -> in the bridges array, and fill that up, and
10559.85 -> then eventually return that down here. So
that is all for bridges. Now let's look at
10565.72 -> articulation points, which is really almost
the same algorithm. So if we look at this,
10575.181 -> the only thing that's really different is
we have a variable to track the number of
10579.03 -> outgoing edges from the start node, or what
I call the root node in this script. Other
10586.97 -> than that, the differences are that we have another
boolean array, called isArticulationPoint,
10594.73 -> instead of the bridges array, to track articulation points,
and that we have to reset the number of outgoing
10601.44 -> edges for every depth first search we do.
That makes sense. What else is different?
10608.271 -> Oh, yes, we have a less-than-or-equal-to,
as opposed to just a less-than,
10615.12 -> to track cycles as well and mark those off
as articulation points. And I think those
10625.32 -> are the major differences, if I didn't forget
anything, between articulation points and bridges.
10630.021 -> Oh, of course, we have to count the number
of outgoing edges from the root. That's pretty
10636.28 -> important. And here's the same graph as before,
but instead of printing bridges, it prints
10642.12 -> articulation points. So some very subtle differences
between finding articulation points and bridges,
10648.15 -> but still very important ones. Today, I want
to talk about a fascinating topic, and that
10654.97 -> is strongly connected components, and how
we can use Tarjan's algorithm to find them.
10662.561 -> So what are strongly connected components
or SCCs? I like to think of them as self
10669.44 -> contained cycles within a directed graph,
where for every vertex in a given cycle, you
10676.601 -> can reach every other vertex in the same cycle.
For example, in the graph below, there are
10682.47 -> four strongly connected components. I've outlined
them here in different colors. If you inspect
10689.86 -> each strongly connected component, you'll
notice that each has its own self contained
10696.53 -> cycle and that for each component, there's
no way to find a path that leaves a component
10702.22 -> and comes back. Because of that property,
we can be sure that strongly connected components
10708.4 -> are unique within a directed graph. To understand
10716.33 -> Tarjan's strongly connected components algorithm,
we're going to need to understand the concept
10723.31 -> of a low link value. Simply put, a low link value
10723.31 -> is the smallest node ID reachable from that
node including itself. For that, to make sense,
10729.87 -> we're going to need to label the nodes in
our graph using a depth first search. Suppose
10737.351 -> we start at the top left corner and label
that node with an ID of zero. Now we continue
10743.15 -> exploring that graph until we visit all the
edges and have labeled all the nodes.
10752.43 -> Alright, now that we're done labeling the
nodes, inspect the graph and try to determine
10760.6 -> the low link value of each node. Again the
low link value of a node is the smallest node
10766.81 -> ID reachable from that node including itself.
For example, the low link value of node one
10774.23 -> should be zero, since node zero is reachable
from node one via some series of edges. Similarly,
10781.851 -> node four's low link value should be three,
since node three is the lowest node that is
10787.28 -> reachable from node four. So if we assign
all the low link values, we get the following
10794.53 -> setup. From this view, you realize that all
nodes which have the same low link value
10800.58 -> belong to the same strongly connected
component. If I now assign colors to each
10807.17 -> strongly connected component, we can clearly
see that for each component, all the low link
10812.66 -> values are the same. This seems too easy,
right? Well, you're not wrong, there is a
10818.62 -> catch. The flaw with this technique is that
it is highly dependent on the traversal order
10826.58 -> of the depth first search, which for our purposes,
is at random. For instance, in the same graph,
10834.79 -> I rearranged the node IDs, as though the depth
first search started at the bottom middle
10841.98 -> node. In such an event, the low link values
will be incorrect. In this specific case,
10849.04 -> all the low link values are the same, but
there clearly are multiple strongly connected
10855.21 -> components. So what is going on? Well, what's
happening is that the low link values are highly
10861.73 -> dependent on the order in which the nodes
are explored in our depth first search. So
we might not end up with a correct arrangement
10873.95 -> of node IDs for our low link values to tell
us which nodes are in which strongly connected
component. This is where Tarjan's algorithm
10880.84 -> kicks in with its stack invariant to prevent
strongly connected components from interfering
10887.39 -> with each other's low link values. So to cope
with the random traversal order of the depth
10895.181 -> first search, Tarjan's algorithm maintains
a set, often as a stack, of valid nodes from
10903.63 -> which to update low link values. How the
stack works is that nodes are added to the
10910.34 -> stack as nodes are explored for the first
time, and nodes are removed from the stack
10915.34 -> each time a strongly connected component is
found. Taking a step back, if the variables
10923.04 -> u and v are nodes in our graph, and we are
currently exploring node u, then our new
10930.87 -> low link update condition is that to update
node u's low link value to node v's low link
10938.53 -> value, there has to be a path of edges from
u to v, and node v must be on the stack. Another
10948.21 -> small difference we're going to make to finding
the correct low link values is that instead
10953.3 -> of finding all the low link values after the
fact, we're going to update them as we do
10959.93 -> our depth first search on the fly, if you
will. This will allow us to obtain a linear
10966.39 -> time complexity.
10969.4 -> We'll be doing an example in the following
slides, but this is Tarjan's algorithm in a nutshell.
10975.591 -> Start out and mark each node as unvisited,
start the depth first search somewhere, and
10981.4 -> don't stop until all the nodes are visited.
Upon visiting a node, assign it an ID and
10987.37 -> a low link value. Additionally, also mark
the node as visited and add it to the seen
10993.56 -> stack. On the depth first search callback,
after the recursion comes back, if the previous
11000.23 -> node is on the stack, then min the current node's
low link value with the last node's low
11006.19 -> link value. This is essentially what will allow
low link values to propagate throughout cycles.
11013.47 -> After visiting all of a node's neighbors, if the
current node started the strongly connected
11019.95 -> component, then pop off all nodes from the
stack which are in the strongly connected
11025.61 -> component. You know a node started a strongly
connected component if its ID is equal to
11031.3 -> its low link value. I'll let you think about
that a bit more, and it'll start making sense.
11037.83 -> Let's do an example. I'm going to mark unvisited
nodes as blue nodes for which the depth first
11044.37 -> search is still exploring some neighbors as
orange and nodes, which the depth first search
11050.83 -> has explored all of its neighbors as gray.
Note that if a node is orange, or gray, then
11058.21 -> it is on the stack and we can update its low link
value. I will also be tracking the nodes which
11064.271 -> are on the stack in the left column. So keep
your eyes peeled on that as well. So let's
11070.48 -> start our depth first search. So just randomly
pick a node and start there. as we explore
11078.31 -> unvisited nodes give each node an ID and a
low link value equal to the ID. So now we're
11087.23 -> at node two and our only option is to now
visit node zero. Since node zero is already
11095.01 -> visited, we don't want to visit it again.
So now we backtrack. All the backtracking.
11102.851 -> Since node zero is on the stack, we take the
minimum of the current nodes, low link value
11109.33 -> and node zeros low link value. Similarly,
now min, the low link value of the node we
11116.42 -> were just at, which is node one with node
two. And also the same for node zero. Upon
11128.15 -> returning back to node zero, we realize that
we've actually finished a strongly connected
11133.48 -> component. Since we visited all the neighbors
have node zero and its ID is equal to its
11140 -> low link value. This means we need to remove
all the nodes associated with a strongly connected
11145.61 -> component from the stack. However, we're not
done exploring the graph, so pick another
11154.38 -> node at random. Let's start at node three.
And go right. Now, our only option is to go
11166.42 -> down. Now we're at node five, let's take the
edge to node zero. So node zero is already
11173.24 -> visited. So we can't go there. On the callback,
we notice that node zero is not on the stack
11180.86 -> at the moment. So we can't min node five's
low link value against node zero's. This is actually
11187.461 -> very, very good, because if we did, then we
would contaminate the strongly connected component
11194.19 -> node five is part of with the lower low link
value which node zero has to offer. So let's
11201.11 -> go to node six. So now we have three edges
to choose from. Let's take the one on the
right. Node two is not on the stack, so don't
11216.4 -> min with its low link value. Now let's take
the left edge to node four; node four is on
11222.88 -> the stack, so we can min with its low link value,
giving node six also a low link value of four.
11229.94 -> Then the last edge we need to visit is the
11229.94 -> one going to node zero. This is a situation
where node zero is not on the stack, so we
11236.62 -> can't min with its low link value. On the
callback, node five can min with node six's
11243.311 -> low link value, because it is on the stack.
Similarly, for node four. Coming back to node
11250.19 -> four, we've visited all its neighbors and
its ID is equal to its low link value. So it
11255.48 -> marks the start of a strongly connected component.
So we now have to remove all associated nodes
11262.01 -> in this strongly connected component from
the stack, these would be all of the purple
11267.44 -> nodes.
11273.37 -> Now coming back to node three, we cannot min
its low link value with node four's, because we
11278.98 -> just removed node four from the stack. You
will also notice that node three's ID is equal
11285.48 -> to its low link value, so it should be the start
of a strongly connected component. However,
11293.17 -> we have not finished visiting all of node
three's neighbors, so we cannot make that assessment
11299.67 -> just yet. Now take the downward edge to visit
node seven. Now take the edge to node five.
11310.92 -> On the callback, notice that node five is
not on the stack, so we don't min with its
11315.13 -> low link value. Now back up to node three. On the
callback, we can min with node three's low link,
11322.92 -> since node three is on the stack. Also min
with node seven. So now we've finished with
11331.01 -> the last strongly connected component, all
we need to do is remove all associated nodes
from the stack. And that's how Tarjan's
11342.73 -> algorithm works to find strongly connected
components. Very beautiful, isn't it? Let's
11350.921 -> look at some pseudocode for how this works;
11350.921 -> I think it will solidify your understanding.
To get started in the global or class scope,
11358.21 -> I define a few variables that we'll need.
The first is a constant to represent unvisited
11365.5 -> nodes, then comes n the number of nodes in
the graph, and G an adjacency list of directed
11374.17 -> edges. Both n and g are inputs to this algorithm.
11384.14 -> Then come two variables: id, to give each node
an ID, and sccCount, to track the number of
11391.811 -> strongly connected components. After, I define
a few arrays which store auxiliary information
11398.46 -> about the nodes in our graph. The first array
is ids, which stores the ID of each node; then
11406.51 -> is low, to store the low link values; and finally,
onStack, to track whether or not a node is
11413.791 -> on the stack. Last is the stack data structure
itself, which should at minimum support push
11422.95 -> and pop operations. Inside the findSccs
method, the first thing I do is assign the
11431.32 -> ID of each node to be unvisited. The ids array
will be serving to track whether or not a
11437.53 -> node has been visited, as well as what a node's
ID is. In the following loop, I iterate through
11446.47 -> all the nodes in the graph. There, I start
a depth first search on node i if node i
11453.43 -> has not yet been visited. At the end, I return
the low array, which
will be the final output of the algorithm.
11460.79 -> Now let's look at what's happening inside
the depth first search method which is really
11464.75 -> where all the magic happens. So this is the
inside of the depth first search method. The
11471.56 -> input argument to the depth first search method
is a variable called at which I use to denote
11477.27 -> the ID of the node we are currently at. On
the first three lines, I do some housekeeping
11483.18 -> stuff, which is add the current node to the
stack, mark the current node as being on the
11489.14 -> stack, and give an ID and a little link value
to the current note thing comes to the part
11496.351 -> where I visit all the neighbors of the current
node. To do this, I reach into our graph store
11502.47 -> as an adjacency list and loop over a variable
called two which represents the ID of the
11509.86 -> node we're going to the next line says that
if the node we're going to is unvisited, then
11517.56 -> visit ID. Remember, the IDS array tracks the
ID of note I, but also whether or not node
11525.69 -> AI has been visited. This next slide is very
important. In fact, it's probably the most
11532.99 -> important line on the slide. The first thing
to notice is that this line happens after
11539.13 -> the recursive call to the depth first search
method, meaning that this line gets called
11544.52 -> on the call back from the depth first search
line says that if the node we just came from
11551.19 -> is on stack, than men, the current loling
value with a node we were just at this is
11557.72 -> what allows the loling values to propagate
throughout a cycle. So after we finish the
11564.4 -> for loop that visited all the neighbors of
the current node, we need to check if we're
11569.65 -> at the start of a strongly connected component.
To check if we're at the start of a strongly
11574.15 -> connected component check if the ID of the
current node is equal to the low link value
11579.54 -> for that node. After we have identified that
we're at the beginning of a completed strongly
11586.33 -> connected component, pop off all the nodes
inside the stripe connected component from
11591.29 -> the stack. As we're popping nodes from the
stack also mark off nodes as no longer being
11597.68 -> on stack. One more critical thing we need
to do while we're removing nodes from our
11603.66 -> stack is make sure that all nodes which are
part of the same strongly connected component
11609.06 -> have the same ID. So here I just assigned
each note have the same ID as the ID of the
11616.7 -> node which started the strongly connected
component. Last things are to start popping
11622.101 -> off nodes from the stack once we reach the
start of the strongly connected component,
11628.01 -> and also increment the strongly connected
component count. If you want to track the
11632.96 -> number of connected components that were found.
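To make the pseudocode concrete, here is a minimal Java sketch of Tarjan's algorithm as just described. The names (findSccs, UNVISITED, and so on) follow the slides, but the class itself is my own illustration, not the course's exact source:

```java
import java.util.*;

// Minimal sketch of Tarjan's strongly connected components algorithm,
// following the pseudocode above.
public class TarjanSccSketch {
  static final int UNVISITED = -1;

  private final int n;                 // number of nodes
  private final List<List<Integer>> g; // adjacency list of directed edges
  private int id = 0, sccCount = 0;
  private int[] ids, low;
  private boolean[] onStack;
  private final Deque<Integer> stack = new ArrayDeque<>();

  public TarjanSccSketch(List<List<Integer>> graph) {
    g = graph;
    n = graph.size();
  }

  public int[] findSccs() {
    ids = new int[n];
    low = new int[n];
    onStack = new boolean[n];
    Arrays.fill(ids, UNVISITED);
    for (int i = 0; i < n; i++)
      if (ids[i] == UNVISITED) dfs(i);
    return low; // nodes sharing a low-link value share an SCC
  }

  private void dfs(int at) {
    // Housekeeping: push onto the stack, give an ID and a low-link value.
    stack.push(at);
    onStack[at] = true;
    ids[at] = low[at] = id++;

    for (int to : g.get(at)) {
      if (ids[to] == UNVISITED) dfs(to);
      // On the callback, min with the node we were just at if it's on the stack.
      if (onStack[to]) low[at] = Math.min(low[at], low[to]);
    }

    // If our ID equals our low-link value, this node started an SCC:
    // pop the whole component off the stack and give it one shared ID.
    if (ids[at] == low[at]) {
      for (int node = stack.pop(); ; node = stack.pop()) {
        onStack[node] = false;
        low[node] = ids[at];
        if (node == at) break;
      }
      sccCount++;
    }
  }
}
```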
Today, we will be looking over some source
11637.95 -> code for Tarjan's strongly connected components
algorithm. Here we are in the source code
11644.84 -> for Tarjan's algorithm to find strongly connected
components. You'll notice that this source
11650.12 -> code is written in the Java programming language.
To get started, let's have a look at the
11656.68 -> constructor for the class; you'll notice
that it takes a graph, as an adjacency list,
11662.67 -> as an argument. But before we get into the
details of the actual algorithm, I want to
11667.9 -> show you how the algorithm works
in practice if you're going to execute it.
11673.47 -> If we look at the main method, you'll notice
that this is how the algorithm is meant to
11680.42 -> be used: you set up the graph, and then you
run the solver. To begin with, you declare
11686.721 -> a variable called n, which is the number of
nodes that are going to be in your graph.
11692.311 -> Then you create the graph; this initializes
the adjacency list for n nodes. If we look
11699.62 -> at the createGraph method up here, all it
does is initialize the adjacency list
11705.87 -> and populate it with empty lists,
so that we are ready to add edges to our
11714.12 -> directed graph. If you want to add an edge
to the graph, you call this method and
11719.1 -> give it the graph and the directed edge,
from one node to another node, and it
11726.11 -> will add that edge to the graph. I believe
this graph is the one from the slides, the
11733.9 -> very last graph, if I recall correctly.
11737.84 -> To actually find the strongly connected
components, you create an instance of the solver,
11743.6 -> give it the graph, and then run the
solver on the graph. This is what actually
11750.21 -> finds the strongly connected components, and
it returns the array of low-link
11756.181 -> values. Then what I do is dump all of these
into a multimap, so we can know, for each
11765.3 -> connected component, which nodes are associated
with that connected component. And then all
11771.36 -> I do is print out which nodes are part of
which groups. You'll notice that I print that
11781.02 -> there are three connected components, and
here are the nodes and which connected components
11787.39 -> they belong to. So that's how you use the algorithm.
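As a rough sketch of that usage, reusing the TarjanSccSketch class from above (the edges below are illustrative placeholders, not the exact graph from the slides):

```java
// Hypothetical usage sketch; the edges are placeholders.
int n = 8;
List<List<Integer>> graph = new ArrayList<>();
for (int i = 0; i < n; i++) graph.add(new ArrayList<>());
graph.get(6).add(0); // a directed edge 6 -> 0
graph.get(0).add(1);
graph.get(1).add(6);
// ... remaining edges of the example graph would go here ...

TarjanSccSketch solver = new TarjanSccSketch(graph);
int[] sccs = solver.findSccs();

// Group node indices by their shared component ID (the "multimap" step).
Map<Integer, List<Integer>> multimap = new HashMap<>();
for (int i = 0; i < n; i++)
  multimap.computeIfAbsent(sccs[i], k -> new ArrayList<>()).add(i);
multimap.forEach((id, nodes) ->
    System.out.println("Nodes " + nodes + " form a strongly connected component"));
```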
11796.811 -> Let's see what it's doing. We already went over
the constructor, which takes in the graph,
11803.97 -> extracts the size of the graph, and caches the
adjacency list. As other instance
11814.21 -> variables, we have a boolean variable which
tracks whether or not we have already solved
11822.91 -> the problem; then two variables to count the
number of strongly connected components and
11829.05 -> assign an ID to each node; an array to track
whether or not a node is on the stack; and
11834.551 -> then two integer arrays to track the ID of
each node and the low-link value of each
11840.26 -> node. And finally, a stack. If we look
at the sccCount method, it runs the solver
11848.6 -> if the problem has not yet been solved, and simply returns
the number of strongly connected components.
11855.79 -> The getSccs method also simply runs the solver
if it has not yet been run, and returns the
11862.76 -> low-link values array. Now let's look at the
solver itself. It returns early if the problem has already
11868.252 -> been solved, because we don't want to do more
work than we need to. Inside the solve method,
11876.311 -> I initialize all our arrays, and I fill the
ids array with the unvisited token, so we
11882.45 -> can know whether or not a node has been
visited. Recall that the ids array keeps track
11889.4 -> of the ID of a node, but also of whether or not
a node has been visited.
11895.681 -> Then iterate through each node, and if node i
is unvisited, start a depth first search
11902.82 -> at node i. Finally, mark that we have solved the
strongly connected components for this
graph. Inside the depth first search method,
11910.85 -> it's almost exactly like the slides: do
the housekeeping, which is to push
11917.27 -> the current node onto the stack, mark the current
node as being on the stack, and give the current
11923.9 -> node an ID and a low-link value, since this is
the first time we're visiting it. Then iterate
11930.25 -> over all the neighbors of this node, doing a
depth first search if the node we're going
11938.57 -> to is unvisited; on the callback, check if
it's on the stack, and min its low-link
11945.64 -> value with that of the node we were just at. Back
here, after we've visited all the neighbors
11954.18 -> of the node, we check if we're at the
start of a strongly connected component, and
11961.31 -> if we are, we want to pop off all the nodes
associated with that strongly connected component
11966.92 -> which are on the stack. I start with the
first node, and my loop condition pops until
11974.19 -> I return back to the start of that strongly
connected component. As I'm popping
11980.8 -> nodes off the stack, I mark each node as no
longer being on the stack, and I also assign
11987.19 -> every node that is part of that strongly
connected component the same ID as the node which
11993.39 -> started the strongly connected component,
just so that we know, after the fact, which
11998.811 -> nodes belong to which strongly connected component.
Finally, increment the number of strongly
12004.211 -> connected components, in case we are interested
in that. And that's basically Tarjan's
12009.521 -> algorithm in a nutshell. Hello, and welcome
to this tutorial on how to solve the Traveling
12017.46 -> Salesman Problem with dynamic programming.
12023.44 -> First is how to find the cost of the best
tour, and then how to actually find that tour.
12029.98 -> All right, so let's get started. What is the
Traveling Salesman Problem? In a nutshell,
12036.7 -> it's when you're given a list of cities and
the distances between each pair of cities,
12042.11 -> and you want to find the shortest possible
route that visits each city exactly once and
12048.58 -> then returns to the city of origin. In some
other words, we can say that the problem is
12055.97 -> given a complete graph with weighted edges,
what is the Hamiltonian cycle of minimum cost?
12064.93 -> A Hamiltonian cycle is simply a path which
visits each node exactly once. In practice,
12072.35 -> you will probably want to represent whatever
graph you have as an adjacency matrix for
12079.771 -> simplicity, if an edge between two nodes does
not exist, simply set the edges value to be
12086.95 -> positive infinity. So in the graph I had,
you can see that one optimal tour consists
12095.02 -> of going from A to D to C to B, and then finally,
back to a, with a minimum cost of nine. Note
12103.481 -> that it is entirely possible that there are
many possible valid optimal tours, but they
12110.98 -> will all have the same minimum cost. As it
turns out, solving the Traveling Salesman
12118.27 -> Problem is extremely difficult. In fact, the
problem has been proven to be NP complete,
12124.51 -> meaning it's very difficult to find an optimal
solution for large inputs. However, numerous
12130.551 -> approximation algorithms exists, if you want
to get an algorithm that runs very quickly,
12137.351 -> even for large inputs. So the brute force
way to solve this problem is to actually compute
12145.86 -> all possible tours. And this means we have
to try all permutation of node orderings,
12152.17 -> which will take big O of n factorial time,
which is very slow. But as you can see, I've
12159.28 -> listed all the permutation of nodes and highlighted
the ones which yield the optimal solution.
12167.6 -> The dynamic programming
12168.811 -> solution we're going to develop today is able
to improve on this naive approach, by reducing
12175.71 -> the complexity to big O of n squared times
to the end. At first glance, this may not
12182.34 -> seem like a substantial improvement. However,
it now makes graphs with roughly 23 nodes
12190.661 -> give or take feasible for modern home computers.
Here's a table of n factorial versus n squared
12199.44 -> to the N. At first, you notice that n factorial
is optimal for small numbers. But this quickly
12206.72 -> changes favor to n squared to the n, which
can give a significant improvement over n
12213.061 -> factorial. You can already see that how large
the numbers get for n factorial when we hit
12220.14 -> n equals 15 versus the n squared to the N.
All right, time to talk about how to solve
12229.65 -> this problem using dynamic programming. The
main idea is that to compute the optimal
12236.65 -> solution for paths of length n, we will
reuse information from paths of length
12244.06 -> n - 1. But before we get started,
there's some setup information we need to
12250.49 -> talk about. The first thing we need
to do is pick a starting node s. It doesn't
12257.81 -> matter which node is picked; just make sure
that this node's index is between zero and
12264.48 -> n, non-inclusive. Suppose we have this graph
with four nodes, and we choose our starting
12273.36 -> node to be node zero. The next thing we need
to do is store the optimal value from s, the
12281.86 -> starting node, to every other node. This
solves the Traveling Salesman Problem for all
12289.431 -> paths with exactly two nodes. The optimal
value for paths with two nodes is given in
12297.16 -> the input through the adjacency matrix, and
this is all the setup we need to do. Visually,
12304.52 -> you can see that we store the value from zero
to one, zero
12313.69 -> to two, and finally zero to three. In the
last slide, I talked about storing the solution
12322.36 -> for n = 2, but what is it we really
need to store? There are two key things. The
12331.3 -> first is obvious, and that's the set of visited
nodes in the partially completed tour. The
12337.49 -> other is the index of the last visited node
in the path. For each partially completed
12344.55 -> tour, we need to save which node was the last
node we were on, so that we can continue extending
12352.85 -> that partially completed tour from the node
we were on, and not from some other node. This
12360.22 -> is very important. Together, these two things,
the set of visited nodes and the index of
12367.47 -> the last visited node, form what I call the
dynamic programming state. Since there are
12376.94 -> n possible last nodes and 2^n possible
node subsets, our storage space is bounded
12385.15 -> by O(n * 2^n). An issue we're
going to face when trying to store the DP
12394.101 -> state is representing the set of visited nodes.
The way to do this
12402.57 -> is to use a single 32-bit integer. The main
idea is that if the i-th node has been visited,
12411.74 -> we flip on the i-th bit in the
binary representation of the integer. The
12417.982 -> advantage of this representation is that a
32-bit integer is compact, quick, and allows
12424.63 -> for easy caching in a memo table. For example,
in the leftmost graph, we have visited the
12434.1 -> zeroth and first nodes, so the binary representation
is 0011, if the least significant bit is on
12443.36 -> the right. Similarly, the binary representation
of the middle graph is 1001, or just the number
12452.33 -> nine in decimal, since nodes zero and three
12460.58 -> have been visited.
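As a quick Java illustration of this bit manipulation (the variable names here are mine):

```java
// Representing the set of visited nodes as bits of a single integer.
int state = 0;                    // empty set: no nodes visited yet
state |= (1 << 0);                // visit node 0 -> binary 0001
state |= (1 << 1);                // visit node 1 -> binary 0011 (the leftmost graph)

boolean node2Visited = (state & (1 << 2)) != 0;  // false: bit 2 is off

int middle = (1 << 0) | (1 << 3); // nodes 0 and 3 visited -> binary 1001 = 9
```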
Now, suppose we're trying to expand on our previous state. One particular
12467.83 -> instance of a two-node partial tour is shown
below.
12469.23 -> What we want to do, from our last node, which
in this graph is node three, is expand to visit
12476.73 -> all other unvisited nodes. These are the gray
nodes, one and two; we make our partial tour
12484.1 -> a little longer, with three nodes. For this
particular state, we were able to generate
12492.311 -> an additional two states, but we would also
need to do this for all states with two nodes,
12499.42 -> not just this one with nodes zero and three. In
total, this process would result in six new
12507.98 -> states for partial tours with three nodes.
This process continues with gradually longer
12516.12 -> and longer paths, until all paths have length n.
The last thing we need to do to wrap up
12525.061 -> the Traveling Salesman Problem is to reconnect
the tour to the designated starting node s.
12534.44 -> To do this, loop over the end state in the memo
table for all possible end positions, excluding
12540.76 -> the start node, and minimize the lookup value
plus the cost of going back to s. Note that
12549.12 -> the end state is the one whose binary
representation is composed of all ones, meaning
12556.39 -> each node has been visited. It's finally time
to look at some pseudocode
12562.76 -> for the Traveling Salesman Problem.
Just a heads up to everyone
who's still a beginner: the following slides
12568.55 -> make use of advanced bit manipulation techniques,
so make sure you're comfortable with how binary
12575.83 -> shifts, ANDs, ORs, and XORs work. Here's the
function that solves the Traveling Salesman
12583.49 -> Problem. It takes two inputs. The first is
a two-dimensional adjacency matrix representing
12593.36 -> the input graph, and the second is s, the index of the starting
node. The first thing we do is get the size of
12600.82 -> the matrix and store it in a variable called
n, which tells us how many nodes there are.
12607.381 -> Then we initialize the two-dimensional memo
table; the table should have size n by 2 to
12614.271 -> the power of n. I recommend filling the table
with null values, so that programming errors
12620.95 -> throw runtime exceptions. Then we're going
to call four functions: setup, solve, findMinCost,
12629.11 -> and findOptimalTour.
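Assembled from the narration, the top-level function might look roughly like this in Java; the helper signatures are my own guesses, not the course's exact code:

```java
// Sketch of the top-level TSP function described above.
public static List<Integer> tsp(double[][] m, int s) {
  int n = m.length;
  // n by 2^n memo table; null entries make programming errors fail fast.
  Double[][] memo = new Double[n][1 << n];
  setup(m, memo, s, n);
  solve(m, memo, s, n);
  System.out.println("Min tour cost: " + findMinCost(m, memo, s, n));
  return findOptimalTour(m, memo, s, n);   // the tour itself
}
```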
Let's begin by looking at what happens inside the setup
12637.41 -> method. The setup method is very easy: it
simply does what I illustrated a few slides
12644.341 -> ago, by storing the optimal value from the
start node to every other node. You loop through
12651.89 -> each node, skipping over the start node, and
then you cache the optimal value from s to
12659.261 -> i, which can be found in the distance matrix.
The DP state you store is the end node, as
12668.36 -> i, and the mask with bits s and i set to one,
hence the double bit shift. Visually, the
12679.05 -> green node is the start node and the orange
node is node i, which changes with every iteration.
12686.52 -> You'll notice that the orange node is never
on top of the green node, which is why I have
12693.041 -> a continue statement to skip that case.
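A minimal Java sketch of that setup step, under the same assumptions as the skeleton above:

```java
// Cache the optimal value of every two-node path starting at s.
static void setup(double[][] m, Double[][] memo, int s, int n) {
  for (int i = 0; i < n; i++) {
    if (i == s) continue;                  // the orange node never sits on s
    // Store distance s -> i under the mask with bits s and i turned on.
    memo[i][(1 << s) | (1 << i)] = m[s][i];
  }
}
```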
Now let's look at how the solve method works.
12704.23 -> The solve method is by far the most complicated,
but I've broken it down to be easy to understand.
12711.931 -> The first line in the method loops over r,
from three up to n, inclusive. Think of r
12718.9 -> as the number of nodes in a partial tour,
so we're increasing this number one at a time.
12727.04 -> The next line says: for each subset in combinations.
The combinations function generates all bit
12735.11 -> sets of size n with exactly r bits set
to one. For example, as seen in the comments,
12743.98 -> when calling the combinations function with
r = 3 and n = 4, we get four
12750.24 -> different bit sets, each distinct and with
three ones turned on. These are meant to represent
12760.39 -> subsets of visited nodes.
12764.52 -> Moving on, notice that I enforce the node
s to be part of the generated subset. Otherwise,
12772.421 -> the subset of nodes is not valid, since the tour
could not have started at our designated starting
12778.97 -> node. Notice that this if statement calls
the notIn function defined at the bottom
12787.261 -> of the slide; all it does is check
whether the bit in the subset is a zero. Then we
12797.24 -> loop over a variable called next, which represents
the index of the next node. The next node
12805.16 -> must be part of the current subset. This may
sound strange, but know that the subset variable
12812.81 -> generated by the combinations function has
a bit which is meant for the next node. This
12819.73 -> is why the variable state on the next line
represents the subset excluding the next node:
12828.2 -> it lets us look up in our memo table
what the best partial tour value
12835.72 -> was when the next node was not yet in the subset.
Being able to look back and reuse parts of
12842.72 -> other partially completed tours is essential
to the dynamic programming aspect of this
12850.17 -> algorithm. The next variable to consider
is e, short for end node (because I ran out
12858.36 -> of room). This variable is quite important, because
while the next node is temporarily fixed in
12866.86 -> the scope of the inner loop, we try all possible
end nodes of the current subset and
12874.601 -> see which end node best optimizes this partial
tour. Of course, the end node cannot be
12883.51 -> the start node or the next node,
12886.73 -> and it must
12888.78 -> be part of the current subset we're
considering, so we skip all those possibilities.
12895.16 -> Then we compute the new distance and compare
it to the minimum distance. If the new distance
12900.66 -> is better than the minimum distance, then
we update the best minimum distance. Afterwards,
12907.23 -> once we've considered all possible end nodes
to connect to the next node, we store the
12913.88 -> best partial tour in the memo table. And this
concludes the solve method.
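Here is a Java sketch of the solve method as narrated, using the combinations and notIn helpers discussed on this slide:

```java
// Build up optimal partial tours of increasing length r.
static void solve(double[][] m, Double[][] memo, int s, int n) {
  for (int r = 3; r <= n; r++) {                     // r = nodes in the partial tour
    for (int subset : combinations(r, n)) {
      if (notIn(s, subset)) continue;                // tour must contain the start node
      for (int next = 0; next < n; next++) {
        if (next == s || notIn(next, subset)) continue;
        int state = subset ^ (1 << next);            // the subset without the next node
        double minDist = Double.POSITIVE_INFINITY;
        for (int e = 0; e < n; e++) {                // try every possible end node
          if (e == s || e == next || notIn(e, subset)) continue;
          double newDist = memo[e][state] + m[e][next];
          if (newDist < minDist) minDist = newDist;
        }
        memo[next][subset] = minDist;                // best partial tour ending at next
      }
    }
  }
}

// True if the i-th bit is a zero in the subset.
static boolean notIn(int i, int subset) {
  return ((1 << i) & subset) == 0;
}
```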
12921.48 -> The only unanswered question on this slide is how the combinations
method works, and I do not mean to leave it
12930.89 -> unanswered, so let's see how it gets done.
This method is actually far simpler than you
12939.88 -> might imagine, for what it does. What the
first combinations method does is fill
12946.32 -> up the subsets array using the second combinations
method, and then return that result. So what
12953.3 -> does the second combinations method do?
I already covered this in more detail in my
12960.07 -> tutorial on backtracking the power set, if you
want more detail, but I'll give you a quick
12966.14 -> rundown of what this recursive method does.
Basically, starting with the empty set, which
12971.93 -> is zero, you want to set r out of n bits to
one, for all possible combinations. You
12980 -> keep track of which index position you're
currently at, then try setting the bit at that
12986.17 -> position to a one, and keep moving forward,
hoping that at the end you have exactly r
12993.25 -> bits set. If you don't, you backtrack, flip
off the bit you flipped on, and then move
12999.73 -> to the next position. This is a classic backtracking
problem you might want to research, and this
13007.39 -> is how you solve it. But I don't want to focus
on it in this video, per se; I want to
13014.65 -> get back to the Traveling Salesman Problem. Watch
my backtracking video on the power set for
13020.17 -> more guidance if you're lost. I'll try to
remember to put a link in the description.
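A compact Java sketch of that backtracking helper, matching the description above:

```java
// Generate all n-bit integers with exactly r bits set to one.
static List<Integer> combinations(int r, int n) {
  List<Integer> subsets = new ArrayList<>();
  combinations(0, 0, r, n, subsets);
  return subsets;
}

static void combinations(int set, int at, int r, int n, List<Integer> subsets) {
  if (r == 0) {                 // exactly r bits flipped on: a valid subset
    subsets.add(set);
    return;
  }
  for (int i = at; i < n; i++) {
    set ^= (1 << i);            // flip the i-th bit on
    combinations(set, i + 1, r - 1, n, subsets);
    set ^= (1 << i);            // backtrack: flip it back off
  }
}
```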
13027.49 -> So right now, our memo table holds the
optimal value for each partial tour with n
13033.91 -> nodes. Let's see how we can reuse that information
to find the minimum tour value. The trick
13043.56 -> is to construct a bitmask for
the end state and use that to do a lookup
13049.66 -> in our memo table. The end state is the bitmask
with n bits set to one, which we can obtain
13057.55 -> by doing a bit shift and then subtracting
one.
13062.21 -> Then
13063.21 -> what we do is look at each end node candidate
and minimize over the tour cost, by looking
13070.45 -> at what's in our memo table plus the distance
from the end node back to the start node s.
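In Java, that lookup might be sketched like this, continuing with the same assumed helpers:

```java
// Minimize over all tours that end at e and then hop back to s.
static double findMinCost(double[][] m, Double[][] memo, int s, int n) {
  int endState = (1 << n) - 1;              // bit shift then subtract one: all ones
  double minTourCost = Double.POSITIVE_INFINITY;
  for (int e = 0; e < n; e++) {
    if (e == s) continue;
    double tourCost = memo[e][endState] + m[e][s];  // close the tour back to s
    if (tourCost < minTourCost) minTourCost = tourCost;
  }
  return minTourCost;
}
```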
13076.41 -> The last method we need to look at is the
findOptimalTour function, because what good
13083.78 -> is our algorithm if it cannot actually tell
you what the optimal tour is? For this method,
13092.23 -> to find the actual tour, we work backwards from the end state
13097.43 -> and do lookups in our memo table to find the
next optimal node. We keep track of the
13104.49 -> last index we were at, and the current state,
which begins with all nodes visited. Then
13111.83 -> we loop over i from n - 1 down to 1, which
tracks the index position in the tour. To
13120.921 -> actually find the next optimal node, going
backwards, we use a variable called
13126.13 -> index, which tracks the best node. The inner
loop loops over j, which represents all possible
13135.69 -> candidates for the next node. We must ensure
that j is not the starting node and that it is part
13143.311 -> of the state, meaning it has not yet been processed.
If this is the first valid iteration, the
13151.729 -> variable index will be set to minus one, so
set it to j; otherwise, compare the optimal
13158.88 -> values of the best distances between nodes
index and j, and update index if node j is
13166.67 -> better. Once the optimal index is found, store
it as part of the tour and flip off the
13173.53 -> bit in the state which represents the index
node. Finally, set the first and last nodes
13180.311 -> of the tour to be s, the starting node, because
the tour needs to start and end on that node;
13188.67 -> then simply return the tour. And that is how
you solve the Traveling Salesman Problem with
13196.74 -> dynamic programming.
13196.74 -> dynamic programming. Hello and welcome to
This video on the Traveling Salesman Problem
13202.311 -> with dynamic programming. Today we're going
to have a look at some source code. All right,
13207.88 -> here we are in the source code for the Traveling
Salesman Problem with dynamic programming.
13213.63 -> This is the iterative implementation. If you
look in the repository, you should see that
13220 -> there is also a recursive implementation if
you are interested in that this implementation
13226.33 -> is in the Java programming language, but you
should be able to translate it pretty easily
13230.47 -> to any programming language. So let's get
started. So if we want to solve this problem,
13236.99 -> we're going to have to create this object
called TSB dynamic programming iterative and
13244.78 -> it has two constructors, one with a distance
matrix as an input. And the other optional
13253.95 -> constructor is the distance matrix, but also
with a designated starting node. So by default,
13264.88 -> I have the starting node to be zero, but you
can set that to be whichever node you like.
13271.48 -> And then I simply store how many nodes are
in the graph. And then check for some edge
13278.79 -> cases, I haven't supported n equals two yet,
but that should be pretty trivial to do. And
13288.31 -> then just check some edge cases, make sure
the matrix is square, you know, just that
13293.02 -> kind of stuff. And then I cache the start
position and the distance in these instance
13300.811 -> variables. And then here are the two methods
that you will be interested in the first called
13307.811 -> Get a tour, and it returns a list of integers
representing the optimal tour for the input
13316.021 -> graph. And this other method called get tour
cost returns the minimum tour cost. And notice
13325.82 -> that they both call the solve method if the
solver has not been run yet. I could call
13332.93 -> the solve method in the constructor. But that's
generally considered bad practice to do work
13339.07 -> in the constructor. So I leave it up to the
methods to call the solve method. Or you can
13345.18 -> explicitly call it yourself doesn't matter.
So the solid method is what basically solves
13351.37 -> the traveling salesman person problem.
13355.14 -> The first thing I do is initialize a
variable called the end state, and this is
13358.86 -> the state with all nodes visited, so all bits
are set to one. Then I initialize a memo table
13368.78 -> of size n by 2^n, with element type Double, so
initially this entire table
13378.02 -> is filled with null values. Then I do
an initialization step, where I add all edges
13387.86 -> from the starting node to every other node
which is not the start node. This is like the
13396.271 -> first step in the slides, if you remember
correctly: you set each entry equal to the
13403.06 -> value in the adjacency matrix. Then we start
the phase where we try to create tours
13413.36 -> of paths that are one node longer. So r is, once
again, the number of nodes in the partially
13423.24 -> completed tour. Then we loop through all subsets
with r bits set, produced by our combinations
13434.96 -> function, which is below. I guess I'll jump
to that right now. So, that's right here, and
13442.62 -> this method basically generates all the bit
sets of size n where r bits are set to
13448.46 -> one. You can see that the result
is returned in this variable called subsets.
13455.1 -> So this is the combinations method, and it
calls the private combinations method
13461.04 -> down here. Ignoring this part, which is
just an optimization: if r is zero, meaning
13470.15 -> we've selected exactly r elements, then
we have found a valid subset and
13475.59 -> add it to our subsets array. Otherwise, we
flip on the i-th bit, recursively call the method,
13483.88 -> and then backtrack and flip off the i-th bit.
All right, going back over here, we now make
13496.26 -> sure that the starting node is inside the
subset; otherwise, we're not going to be able
13503.31 -> to create a valid tour.
13506.58 -> Next, we loop over the variable called next,
13508.34 -> from zero to n. The next node is going
to be our next target node, the one
13516.729 -> we're trying to expand to, if you will. So
we make sure that the next node is not the
13524.271 -> starting node, and we also make sure that
it is in the subset produced by the combinations
13530.41 -> function; otherwise, we're not interested
in it. Then we generate the mask, which is
13537.12 -> called subsetWithoutNext, and this is the
state, or the partially completed tour,
13546.9 -> without that next node: we basically flip
off the next node's bit and set it to zero. This
13556.62 -> allows us to do a lookup in our memo
table later on, so we can compute the new
13564.66 -> distance. But before that, we initialize a
variable called minDistance, which I initialize
13573.81 -> to positive infinity; this is the value
we're trying to minimize for the next node.
13579.11 -> Then, for every possible
13589.69 -> end node which is not the start node or
the next node, and which is part of our subset, we
13595.951 -> calculate the new distance from the memo table
lookup using subsetWithoutNext, plus the distance from
13606.66 -> the end node to the next node. And then, if
that new distance is less than the
13612.91 -> min distance we declared
up here, we just update the min distance,
13620.79 -> and finally cache that in the memo table.
So this is the bulk of the algorithm right
13628.07 -> here. But we're not done yet: we still want
to calculate the minimum cost, that is, the overall
13636.74 -> minimum cost of the optimal tour. To do
that, we simply loop over i from zero to n, skipping
13646.01 -> over the starting node, and then do a
lookup in our table for that end node i
13656.601 -> and the end state. So we have finished a
tour, and the tour ended on node i; then we
13665.27 -> go from i, which we ended on, back to the start
node, and that's the tour cost. Now we just
13672.28 -> minimize over this quantity and update the
minimum tour cost, which, if we go back, you can
13679.5 -> see was one of our instance variables,
initialized to positive infinity. So we're minimizing
13685.771 -> this, and this is why it gets returned by
the getTourCost function. All right, so
13694.979 -> this finds the minimum tour cost.
And this section you see right here finds what the
13704 -> actual tour is, which is really useful,
and it does that by looking inside the memo
13710.96 -> table at the values we've computed. So we
initialize a variable called lastIndex,
13718.771 -> initialized to the starting node, because
that's essentially the very last node, if you
13724.99 -> will: when we do the tour, we end up at the
start node again. And the state is the end
13730.07 -> state. So we're working our way backwards:
we start at the end state, and then we're
13734.37 -> going to slowly, I guess, reduce our tour
until we're back at the starting node.
13744.16 -> So, in our tour, we add that starting node,
and then we loop n - 1 times.
13755.86 -> This variable i is just a counter;
it's not used anywhere in here.
13767.14 -> So we loop n - 1 times, and this
index variable is the node
13775.69 -> we want to go to next; that is, it's the
index of the best next node. To find
13785.34 -> that next best node, we need to look at where
we were last, which is lastIndex, and
13793.86 -> go to the next best node, which is going to
be j. So we loop over all possible j
13803.12 -> nodes, if you will: start j at zero, loop
up to n, and skip over j when it is equal
13811.4 -> to the start node or is not in the state, because
we would have already visited that node otherwise.
13819.65 -> If index is minus one, then this is the first
valid node we encounter, so set index equal
13826.881 -> to j. Otherwise, look at the previous distance,
for the node at index versus the node j,
13839.08 -> and if selecting node j gives us a smaller
value, then we know we want to update index
13847.93 -> to be j. Doing this for all of the
nodes finds us the next best node going
13858.12 -> backwards. Then we add that node's
index to the tour, toggle that
13867.88 -> bit off, and set lastIndex to
the current index. So we're going backwards,
13876.28 -> basically starting from a fully completed
tour and shrinking the tour down to just
13884.28 -> the starting node again. At the very end,
we add that starting node to the tour
13892.02 -> and then reverse the order of the tour. This
is because we were going backwards: we started
13897.79 -> at the end state and worked our way
backwards, so our tour is, in effect, in reverse
13905.96 -> order, and we want to reverse the tour order.
Then we can mark the solver as completed.
13915.08 -> And tour, if we look up here, is just a list
of integers; tour is the variable we return
13922.76 -> when we call getTour. The only thing I did
not cover was this notIn function, which
13932.96 -> just checks whether a bit, or the element, was not
set in the subset: you check if that bit is
13944.21 -> equal to zero.
13951.05 -> Today we're going to talk about
Eulerian paths and circuits from a computer
science perspective. We're going to start
13956.12 -> by discussing what Euler paths and circuits
are, how to determine their existence, and how
13961.33 -> to find them. And lastly, we're going to look
at some code to wrap things up. Let's begin
13967.161 -> with what an Eulerian path
13968.311 -> is. An Euler path, also called an Eulerian
trail, is a path of edges in a graph that visits
13975.479 -> every edge exactly once. Suppose we have the
undirected graph below, and we want to find
13983.03 -> an Eulerian path. First off, not every
graph has an Eulerian path. This one does,
13989.55 -> but even still, we need to be careful about
which node we start our path at. Suppose we
13995.741 -> begin the path at the middle-right node and
decide to follow the path left, down, up, up
14002.55 -> again, and finally left. This completes the
Eulerian path. However, suppose we start
14009.45 -> at the top node. What happens if we try
to find a path from this node? If we take
14016.41 -> the edge going down, you'll notice that we
are now stuck: we cannot go anywhere else
14022.51 -> from this node, since there are no edges left
to follow. More importantly, the issue is
14028.53 -> that we have unvisited edges that we still
have not traversed. We'll see how
14035.67 -> to resolve, or rather avoid, this issue altogether
14043.51 -> later, so that we always find an Eulerian
path when we know one exists.
14050.42 -> Moving on, let's talk about Eulerian circuits, also called
Eulerian cycles. An Eulerian circuit is
14057.229 -> an Eulerian path which starts and ends
on the same vertex. Similar to Eulerian
14066.021 -> paths, not every graph has an Eulerian circuit,
but the following graph does. If you know
14071.75 -> your graph has an Eulerian circuit, then
you can begin the circuit at any node. I'm
14078.16 -> going to begin the circuit on the orange node
and also end it on the orange node.
14091.77 -> And that's the full circuit. If your graph
does not contain an Eulerian circuit, you
14096.931 -> may not be able to return to the start node,
14100 -> or you will not be able to visit all the edges of the graph.
For example, let's trace another circuit, starting
14105.83 -> from the same node, on this slightly modified
graph.
14116.15 -> By randomly selecting edges to traverse,
we weren't able to make it back to the starting
14121.521 -> node. Furthermore, we also have unvisited
edges, so that's doubly bad. Luckily for us,
14128.53 -> we don't have to guess whether or not a graph
contains an Eulerian path or an Eulerian
14134.17 -> circuit: we can inspect the graph we're dealing
with by counting the in and out degrees of
14139.86 -> each node, to determine whether or not the
graph meets one of the conditions in this
14146.14 -> table. There are four flavors of Eulerian
paths and circuits that we care about,
14154.07 -> depending on whether the graph is directed or
undirected, and whether we want to
14158.86 -> find an Eulerian path or an Eulerian circuit.
All of these variants talk about node degrees,
14165.55 -> so I want to have a quick look at that before
coming back to this table.
14169.979 -> The degree of a node means different things depending on whether
the graph we're dealing with is directed or
14175.44 -> undirected. In an undirected graph, the node
degree is simply how many edges are attached
14181.66 -> to a particular node. The blue node in this
picture has three edges attached to it, so
14186.69 -> its degree is three. In a directed graph, there
are two forms of node degree: there are in
14192.62 -> degrees and out degrees, because the edges
are directed. The in degree is the number
14198.66 -> of incoming edges to a node, and the out degree
of a node is the number of outgoing edges
14204.4 -> from that node. So in the example on the right,
the in degree of the node is two, while the
14209.43 -> out degree is one. Pretty simple. Coming back
to the table, you should now be able to understand
14215.52 -> the constraints required for each variant
of the Eulerian path and Eulerian circuit
14222.021 -> problem. However, let's go over them one by
one anyway. The simplest case is when we have
14227.48 -> an undirected graph and we want to find an
Eulerian circuit. The requirement for this
14232.9 -> is that every node in the graph has an even
degree. The Eulerian path problem on an undirected
14239.36 -> graph is very similar, except that instead
of every vertex having an even degree, you can
14246.75 -> also have exactly two vertices with
an odd degree. Those two vertices, if they
14253.03 -> exist, would be the start and end nodes of
the Eulerian path. In a directed graph, you
14258.98 -> can have an Eulerian circuit if every vertex
has equal in and out degrees. This is the
14264.77 -> counterpart to the undirected graph version.
The last variant is finding an Euler path
14271.21 -> on a directed graph. For there to exist an
Eulerian path on a directed graph, at
14276.42 -> most one vertex has an out degree minus in
degree equal to one, at most
14282.58 -> one vertex has an in degree minus out degree
equal to one, and all other vertices
have equal in and out degrees.
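As a minimal Java sketch of the directed-graph conditions just described, assuming in[] and out[] degree arrays have already been computed (the single-connected-component requirement mentioned in the quiz below is checked separately):

```java
// Directed-graph existence check for an Eulerian path (a circuit is the
// special case with zero unbalanced nodes).
static boolean graphHasEulerianPath(int[] in, int[] out, int n) {
  int startNodes = 0, endNodes = 0;
  for (int i = 0; i < n; i++) {
    if (out[i] - in[i] > 1 || in[i] - out[i] > 1) return false;
    else if (out[i] - in[i] == 1) startNodes++;   // one extra outgoing edge
    else if (in[i] - out[i] == 1) endNodes++;     // one extra incoming edge
  }
  return (startNodes == 0 && endNodes == 0) ||    // Eulerian circuit case
         (startNodes == 1 && endNodes == 1);      // unique start and end
}
```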
14293.95 -> So it's now quiz time, and I'm going to make sure you've been
paying attention. I'm going to present to
14299.53 -> you various graphs, and you need to determine
whether the following graph has an Eulerian
14305.27 -> path, an Eulerian circuit, or both.
We'll start with undirected graphs, and
14310.251 -> then later move on to directed graphs. Please
feel free to pause the video to think things
14319.97 -> over. So, this graph has no Eulerian path
or circuit. You can tell because there are
14326.79 -> too many nodes with an odd degree. How about
this graph? Again, feel free to pause the
14336.04 -> video. This graph has an Eulerian path, and
the green nodes represent the valid start
14340.19 -> and end nodes
14341.19 -> for the Eulerian path. What about this graph?
This graph has both an Eulerian path and
14351.15 -> an Eulerian circuit. As a side question,
true or false: if a graph has an Eulerian
14358.53 -> circuit, it also has an Eulerian path. I'll
give you a moment to think about it. The answer
14366.65 -> is true: any Eulerian circuit is also an Eulerian path.
Here's another one: are there any paths or
14373.8 -> circuits in this graph? This one is a bit
of a trick question, but there are no Eulerian
14384.521 -> paths or circuits here. The additional
requirement I have not yet mentioned is that,
14389.81 -> when finding paths and circuits, all
vertices with nonzero degree need to belong
14397.08 -> to a single connected component, and here we
have two connected components, so we cannot
14403.25 -> have an Eulerian path or circuit. Now
let's have a look at an example with a directed
14409.82 -> graph. Does the following graph have any
Eulerian paths or circuits? I'll give you
14415.1 -> a moment to think about it. Yes, this graph
has both an Eulerian path and an Eulerian circuit,
14425.229 -> because all in and out degrees are equal.
What about this graph? This graph has no
14437.01 -> Eulerian paths or circuits. The red nodes either
have too many incoming or outgoing edges for
14443.78 -> an Eulerian path or circuit to exist. What
about this graph? I'll give you a bit more
14448.931 -> time, because there are a lot of edges.
14458.761 -> This graph only has an Eulerian path, but no
Eulerian circuit, and it also has a unique start
14465.2 -> and end node for the path. Note that the singleton
node has no incoming or outgoing edges, so
14473.561 -> it doesn't impact whether or not we have an
Eulerian path.
14480.13 -> Today we're talking about
how to algorithmically find Eulerian paths
and circuits on graphs. Finding Eulerian
14489.229 -> paths and Eulerian circuits are actually
very similar problems, for both directed and
14494.51 -> undirected graphs. If you have an algorithm
that finds an Eulerian path, finding an Eulerian
14500.4 -> circuit comes for free: all you need
to do is feed the graph with the Eulerian
14505.41 -> circuit into the Eulerian path algorithm,
and out comes the Eulerian circuit. For that
14511.761 -> reason, today we'll be looking at an algorithm
that finds an Eulerian path on a directed
14517.229 -> graph. The first step to finding an Eulerian
path is to verify that one exists, because
14524.02 -> maybe it's impossible to find an Eulerian
path that traverses all the edges of your
14529.34 -> graph, and it's good to know that before you
actually search for the path. So recall
14534.981 -> that for an Eulerian path to exist,
at most one vertex has out degree minus in degree
14541.41 -> equal to one, at most one vertex has in
degree minus out degree equal to one, and all
14546.8 -> other vertices have equal in and out degrees.
We're going to count the in and out degrees
14552.37 -> of each node by looping through all the edges.
We'll be needing two arrays, which I've called
14558.18 -> in and out, to track the in and out degrees
of each node. So for each edge, increment
14566.02 -> the in degree of a node if the node has an incoming
edge, and increment the out degree if it has
14572.83 -> an outgoing edge,
14582.08 -> and so on for all the other edges.
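That counting pass might be sketched like this in Java, feeding the graphHasEulerianPath check from earlier:

```java
// One pass over every edge to fill the in and out degree arrays.
static void countInOutDegrees(List<List<Integer>> g, int[] in, int[] out) {
  for (int from = 0; from < g.size(); from++) {
    for (int to : g.get(from)) {
      in[to]++;     // 'to' gains an incoming edge
      out[from]++;  // 'from' gains an outgoing edge
    }
  }
}
```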
Once we've verified that no node has too many outgoing
14589.261 -> edges or too many incoming edges, and there
are just the right number of start and end
14596.08 -> nodes, we can be certain that an Eulerian
path exists. The next step is to find a valid
14603.54 -> starting node, because we can't necessarily start the
algorithm at any node we choose.
14610.181 -> Node one is the only node with exactly one
extra outgoing edge, so it's our only valid
14616.94 -> starting node. Similarly, node six is the
only node with exactly one extra incoming
14622.93 -> edge, so it will end up being our end node.
Note that if all in and out degrees are equal,
14630.51 -> then we have an Eulerian circuit, and we
can choose to start the algorithm at any node
14636.27 -> which has a nonzero degree. So we have everything
we need to find an Eulerian path. Let's see
14642.81 -> what happens if we try to do a naive depth
first search to traverse as many edges as
14649.5 -> possible until we get stuck. Let's begin at
our starting node and execute a random depth
14656.41 -> first search. Let's take a right, another right,
up, down, diagonally up, diagonally right, and
14665.229 -> right again. You'll notice that even though
we started at the correct starting node, and
14671.16 -> we knew an Eulerian path existed, and,
furthermore, we did end up at the correct
14677.8 -> end node, we still did not find a valid
Eulerian path, since we didn't traverse all
14683.26 -> the edges. So what's going on? Well, what's
happening is that we're doing our depth first
14689.87 -> search wrong: we need to modify the depth
first search algorithm to force it
14695.479 -> to visit all the edges of our
graph. To illustrate this, consider this simpler,
14702.32 -> smaller graph. Suppose we start our depth
first search at node zero and try to find
14708.13 -> an Eulerian path. Suppose we take the edge
to the right; then suppose the depth first search
14713.86 -> takes us right again. This causes us to accidentally
skip the edges going to node two and back,
14721.32 -> which we know will need to be part of the
Eulerian path solution. For now, let's not
14726.36 -> worry about it and keep executing our depth
first search. Once we get stuck, meaning
14732.12 -> the current node has no unvisited outgoing
edges, we backtrack and add the current node
14738.671 -> to the solution. So four gets added to the
solution, and we return to the node we were
14743.021 -> just at. We are stuck again, because node three
has no outgoing edges that are unvisited,
14749.06 -> so we add three to the front of the solution
and backtrack. When backtracking, if the current
14755.351 -> node has any remaining unvisited edges, that
is, white edges, we follow any of them, calling
14762.57 -> our depth first search method recursively
to extend the Eulerian path. So we follow
14768.01 -> the edge up to node two, and then there's
still another edge going downwards, so we
14772.51 -> take that one too. Now we're stuck again,
because there aren't any unvisited edges anymore;
14778.49 -> what we do is backtrack and add the current
node to the front of the solution. Effectively,
14783.38 -> we do this until we return to the start node
and the recursion unwinds. So, in summary,
14792.561 -> the way we force the depth first search to take
all the edges is to keep taking unvisited
14797.89 -> edges on the recursive callback, until no
unvisited edges remain.
14803.82 -> Coming back to the previous example, let's restart the algorithm,
but this time, let's track the number of unvisited
14809.971 -> edges we still have left to take at each
node. In fact, we have already computed the
14815.6 -> number of outgoing edges for each node in
the out array, which we can reuse. We won't
14822.09 -> be needing the in array anymore once we've
validated that an Eulerian path exists, so
14828.341 -> we can ignore it. Let's begin at the starting
node once again. Now, one thing we're going
14834.46 -> to do slightly differently is that every time
an edge is taken, we'll reduce the outgoing
14841.08 -> edge count for that node. Doing this will
enable us to know when a certain node has
14846.66 -> no more unvisited edges. So let's just follow
the same path we had last time until we get
14852.7 -> stuck.
14860.561 -> So now we are where we were last time, but
we're not going to terminate the algorithm
14864.78 -> just yet. Instead, we're going to backtrack,
because we're stuck and there are no more
14869.979 -> outgoing edges to take from node six. One
way to know this without looking at the graph
14876.189 -> is to check whether the out array at index
six has a value of zero, and it does. So let's
14883.65 -> backtrack and add six to the front of our
solution. Now we are at node four, and node
14889.59 -> four has remaining unvisited edges; those
are the white edges, which we still need to
14894.83 -> take. So we call our depth first search method
recursively and follow all the unvisited edges
14901.75 -> from node four. Similar situation at node three,
node one, and node two. For node two, we're
14909.81 -> going to take the edge going to the right,
which brings us back to node four. But this
14915.551 -> time there are no more unvisited edges at
node four. So what do we do? We backtrack
14922.29 -> and add four to the front of our solution.
Now we're at node two, and node two still has
14927.681 -> an unvisited edge, since the out array at
index two is not equal to zero. So what we
14934.01 -> do is follow that unvisited edge, which
brings us back to node two, and node two now
14940.52 -> has no more unvisited edges. So we backtrack
and add two to the solution. We're back at
14945.95 -> node two again, so we backtrack; now we're at node
one, and we backtrack from node one. Now we're
14951.95 -> at node three, and so on. Since all the edges
have been visited, at this point we're
14957.391 -> just going to unwind the stack, adding the
current node to the front of the solution.
14962.18 -> I'll let the animation play.
14972.08 -> And that's how you find an Eulerian path
on a graph. In terms of the time complexity
14977.321 -> required to find an Eulerian path, we know
that it has to be O(E). The reason is
14984.02 -> that the calculations we're doing to compute
the Eulerian path are all linear in the number
14989.521 -> of edges. Think about computing the in and out
degrees, or the depth first search: both of those
14995.97 -> only take O(E) time, so the whole thing
is linear in the number of edges. Now
15002.02 -> let's have a look at some pseudocode. To find
an Eulerian path, let's have a look at some
15008.04 -> of the variables we're going to need. The
first three here are inputs to the algorithm,
15014.19 -> which are n, the number of nodes in the graph;
m, the number of edges in the graph; and, lastly,
15022.29 -> g, the graph itself, stored as an adjacency
list. Then there are the in and out arrays I
15030.15 -> talked about earlier, to track the in and out
degrees of every node. Lastly, there's a variable
15036.32 -> called path, which is a linked list that
is going to store the Eulerian path solution.
15043.49 -> You can also use an array or some other data
structure to store the solution, but I find
15050.33 -> that a linked list simplifies the code. To
actually find an Eulerian path on our graph
15056.75 -> g, we're going to call the findEulerianPath
method. The first thing we want to do is verify
15064.4 -> that an Eulerian path exists. To do that,
we first need to count the in and out degrees
15070.241 -> of each node, and once we know those, we can
15076.2 -> verify that the graph is a good candidate
for an Eulerian path.
15082.08 -> So here we are, looking at the methods which count the in and out
degrees of each node and verify that an Eulerian
15088.26 -> path can exist. The countInOutDegrees
method is very simple: simply loop over all
15095.77 -> the edges in the graph and increment the in and out
degree arrays for incoming and outgoing edges.
15103.08 -> The graphHasEulerianPath method checks
all the preconditions for an Eulerian path.
15109.87 -> We're going to keep track of the number of
start nodes and end nodes that we encounter.
15116.091 -> A start node is a node with one extra outgoing
edge, and an end node is a node with one extra
15124.07 -> incoming edge. If at any point we encounter
a node which either has more than one extra
15131.03 -> outgoing edge or more than one extra incoming
edge, we know that this graph is not Eulerian,
15137.07 -> and we can return false immediately.
Because of symmetry, I believe you only need
15143.13 -> one of these checks, but to be explicit, I
put both conditions there. Next up, I check
15149.55 -> whether the current node is a start node or an
end node. A node cannot be both a start node and an
15158.25 -> end node, which is why this is an else-if clause.
The last thing to do is check that we have a
15163.26 -> valid number of start nodes and end nodes
for our path: either there are no designated
15168.74 -> start and end nodes, which is the Eulerian
circuit case (an Eulerian circuit is also an Euler path),
15174.77 -> or there are exactly one start node and one
end node. Coming back to the main method.
15181.07 -> The next step is to find the starting node
and perform a depth first search to actually
15186.729 -> find the Eulerian path. Let's begin with
finding the starting node. We're going to
15191.76 -> start by assuming that the start node is node
zero, although this will likely change. Since
15198.021 -> we know that, at this point, our graph is
an Eulerian graph, this
15204.04 -> means that if we encounter a node with one
extra outgoing edge, that node must be
15209.82 -> the unique starting node, and we can return
that node's index immediately. Otherwise, we
15216.67 -> just want to ensure that we begin on a node
with an outgoing edge; our default node, node
15223.47 -> zero, might not have an outgoing edge. In fact,
this check prevents us from starting the depth
15230 -> first search on a singleton node. Then return
the start node after the loop.
first search on a singleton node, then return
15230 -> the start node after the loop. The depth first
search method is where things start to get
15235.05 -> interesting. This depth first search method
takes one argument and that is the current
15239.851 -> node. We're at the while loop in the depth
first search loops while the current node
15245.17 -> still has outgoing unvisited edges. It does
this by looking in the outer array at the
15252.101 -> current node and checking if there are still
outgoing edges. The next line selects the
15257.29 -> next unvisited outgoing edge from the current
node from our adjacency list. It also decrements
15264.32 -> the number of outgoing unvisited edges from
the current node. So if you haven't caught
15269.5 -> on already, the outer array is currently serving
two purposes. one purpose is to track whether
15275.67 -> or not there are still outgoing edges and
the other is to index into the adjacency list
15281.62 -> to select the next outgoing edge. This assumes
the adjacency list stores edges in a data
15288.08 -> structure that is indexable and constant time
just like me, right? If not, say you're using
15294.09 -> an adjacency list composed of linked lists,
then you can use an iterator to iterate over
15300.06 -> For all the edges once we've selected the
next unvisited edge, we visit that edge by
15306.15 -> calling the depth first search method recursively.
Once we exit the loop, append the current
15312.18 -> node to the front of the solution. Returning
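In Java, that modified DFS might be sketched as follows (the parameters mirror the variables described on the slide):

```java
// The out array doubles as both the remaining-edge counter and the
// index into the adjacency list for the next unvisited edge.
static void dfs(int at, List<List<Integer>> g, int[] out, LinkedList<Integer> path) {
  while (out[at] != 0) {                   // while unvisited outgoing edges remain
    int next = g.get(at).get(--out[at]);   // select and consume the next edge
    dfs(next, g, out, path);
  }
  path.addFirst(at);                       // on the callback, prepend to the solution
}
```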
Returning back to the main method, the last thing we
15318.23 -> need to do is check that we have actually
found the correct number of edges for an
15326.52 -> Eulerian path. It might be the case that our
graph is disconnected, and we found an
15332.18 -> Eulerian path on one of the many connected components
of our graph, in which case it's impossible
15338.5 -> to actually have an Eulerian path, so we return
null in that case. Otherwise, we simply return
15345.39 -> our path.
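Putting the pieces together, the main method might look like this sketch, reusing the helpers sketched earlier:

```java
// Sketch of the overall findEulerianPath flow: verify, search, validate.
static LinkedList<Integer> findEulerianPath(List<List<Integer>> g, int n, int m) {
  int[] in = new int[n], out = new int[n];
  countInOutDegrees(g, in, out);
  if (!graphHasEulerianPath(in, out, n)) return null;

  LinkedList<Integer> path = new LinkedList<>();
  dfs(findStartNode(in, out, n), g, out, path);

  // On a disconnected graph the DFS may cover only one component;
  // a genuine Eulerian path must contain exactly m + 1 nodes.
  if (path.size() != m + 1) return null;
  return path;
}
```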
source code for the oil arian path algorithm.
15351.76 -> Awesome. Here we are in the source code for
the oil arian path algorithm. This code works
15358.54 -> by first instantiating this oil Larian path
solver class and then calling a method to
15365.551 -> fetch the oil arian path itself should it
exist. Let's begin by taking a look at the
15372.52 -> class constructor in the constructor, what
you do is you pass in a directed graph to
15377.391 -> the algorithm as input and then the constructor
verifies that you actually passed in a graph
15384.02 -> that's not know and it also initializes a
few variables including n the number of nodes
15389.27 -> in the graph and the path linked list. Before
we go too far. Let's have a look at some of
15394.781 -> the instance variables for this class. We
already talked about n the number of nodes
15399.69 -> in the graph. Next we have edge count which
we will compute dynamically from the input
15404.771 -> graph followed by
15406.03 -> in
15407.03 -> and out which are integer arrays to track
the in and out degree of each node. Then we
15412.601 -> have path which is the oil arian path solution,
as well as a reference to the input graph.
15419.591 -> So once you create an instance of this class,
there's only one public method and that's
getEulerianPath, which does exactly what
15425.189 -> it says: it will return to you an integer path
consisting of the nodes you need to traverse
15432.18 -> to get a valid Eulerian path, or null if no
path exists. So there's a few things that
15438.181 -> getEulerianPath does which we'll cover
step by step. The first thing in the
15443.979 -> getEulerianPath method is the setup method. So let's
15449.59 -> have a look at that first. All this method
does is loop through all the edges and increment
15457.72 -> the in and out array degrees, as well as compute
the number of edges in the graph, which is
15463.801 -> being tracked by the edge count variable.
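In sketch form, a setup method like the one described might look like this (the field names are assumptions for illustration):

```java
// Hypothetical sketch of the setup step: compute in/out degrees and the
// total edge count from the adjacency list 'graph' of n nodes.
private void setUp() {
  for (int from = 0; from < n; from++) {
    for (int to : graph.get(from)) {
      in[to]++;    // one more edge entering 'to'
      out[from]++; // one more edge leaving 'from'
      edgeCount++; // track the total number of edges
    }
  }
}
```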
Back to the getEulerianPath method. The
15472.11 -> next thing is to check if the edge count is
zero and return null if we don't have any
15478.74 -> edges to work with. Following this, I call
the graphHasEulerianPath method, which
15484.65 -> verifies that our graph actually has an Eulerian
path, because most graphs don't. The
15492.189 -> graphHasEulerianPath method is also fairly simple.
What we want to do is make sure that no node
15498.57 -> has too many outgoing edges or too many incoming
edges as well as ensure that there's the correct
15504.61 -> amount of start and end nodes for an Eulerian
path to exist. The variables start nodes and
15511.81 -> end nodes keep track of how many nodes have
either exactly one extra outgoing edge or
15517.52 -> one extra incoming edge. For an Eulerian path
to exist, there has to be at most one start
15523.65 -> and end node. So when we're inside the for
loop, we have three conditions. The first
15528.27 -> is to identify if the current node has too
many incoming or outgoing edges, which mathematically
15534.71 -> means that the difference between the in and
out degree or vice versa is greater than one.
15540.479 -> In this case return false because the path
is impossible, there will be no Eulerian
15545.36 -> path in such an event. The other conditions
we care about are whether the current node
15550.22 -> might be a start node or an end node. And
if it is, then we increment the start node
15555.27 -> and end node counters respectively. The last step
is to actually check that we have the correct
15560.54 -> number of start nodes and end nodes and return
the boolean value. Returning back to the
15568 -> getEulerianPath method. The next thing in the
algorithm is to actually find the Eulerian
15573.14 -> path, now that we know one exists. To
do this, we find a valid starting node and
15578.69 -> feed that as the first node to the depth first
search method. So let's have a look at both
15583.82 -> of those. We don't want to start our Eulerian
path just anywhere, as we saw in the first video,
15591.189 -> because this doesn't ensure that we find an
Eulerian path even though we know one exists.
15596.77 -> The findStartNode method does exactly what
it sounds like: it looks for a node which is
15601.89 -> a valid starting node, meaning a node with
exactly one extra outgoing edge, or in the
15607.479 -> case of an Eulerian circuit, just any node
with an outgoing edge. It's important that
15612.87 -> we start at a node with an outgoing edge because
our graph might contain singleton nodes that
15618.52 -> have no outgoing edges, but another component
in the graph might have outgoing edges, which
15624.52 -> is where we really want to start if we are
to find an Eulerian path.
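Here is a hedged sketch of the two helper checks just described, again with illustrative names:

```java
// Hypothetical sketches of the validation and start-node selection logic.
private boolean graphHasEulerianPath() {
  int startNodes = 0, endNodes = 0;
  for (int i = 0; i < n; i++) {
    // Too many outgoing or incoming edges at node i: no Eulerian path.
    if (out[i] - in[i] > 1 || in[i] - out[i] > 1) return false;
    else if (out[i] - in[i] == 1) startNodes++; // candidate start node
    else if (in[i] - out[i] == 1) endNodes++;   // candidate end node
  }
  // Either an Eulerian circuit (0 and 0) or an Eulerian path (1 and 1).
  return (startNodes == 0 && endNodes == 0)
      || (startNodes == 1 && endNodes == 1);
}

private int findStartNode() {
  int start = 0;
  for (int i = 0; i < n; i++) {
    if (out[i] - in[i] == 1) return i; // unique start of an Eulerian path
    if (out[i] > 0) start = i;         // otherwise, any node with an outgoing edge
  }
  return start;
}
```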
15630.29 -> Next up is the depth first search method where
things get interesting. It turns out the depth
15635.39 -> first search method is really short and could
even be shorter, but at the expense of readability.
15641.15 -> Remember that when calling this method, the
first node is the starting node, which is
15645.65 -> the at variable in this method, which if you
haven't guessed it yet is the current node
15651.28 -> index we're currently at. In essence, what's
happening in this method is that while the
15656.71 -> current node still has unvisited edges, we're
going to select the next node to explore and
15662.78 -> call the depth first search method recursively.
Each time we take an edge, we decrease the
15668.79 -> number of outgoing edges for that node, which
means that eventually there will be no more
15674.85 -> outgoing edges for the current node and the
loop will terminate. Once this happens, we
15680.101 -> can add the current node to the front of the
solution. The key realization in this method,
15684.9 -> I think, is that you have to notice that the
out array is being used as both a way of determining
15692.76 -> if there are any unvisited edges left at the
current node as well as an index for reaching
15699.22 -> into the adjacency list to grab the next node
to visit. Let's go back up to the
15706.55 -> getEulerianPath method. Once we've finished executing
the depth first search, the next thing to
15711.97 -> do is ensure that we found an Eulerian path.
It could be the case that the graph is disconnected
15718.19 -> into multiple components, in which case the
correct thing to do is to return null because
15723.26 -> no Eulerian path exists. Checking that the
graph is disconnected is not something the
15729.18 -> graphHasEulerianPath method verifies.
And this is intentional, because it's easier
15734.39 -> to do after running the depth first search
by ensuring that the solution actually has
15739.84 -> a size equal to edge count plus one. The next
thing I do before returning the solution,
15747.36 -> which is optional, is simply to empty the
contents of the linked list into a primitive
15751.78 -> integer array, just for convenience. I do
this because it's easier for the caller to
15756.45 -> index an array than it is a linked list. The
rest of this file are just helper methods
15764.07 -> for creating a directed graph and adding directed
edges to the graph. I also provide two examples,
15771.35 -> one from the previous slides and another that
I made up. I encourage you to look them over
15776.729 -> to understand how this program works. Today
we're talking about minimum spanning trees.
15782.479 -> And in particular, we're talking about Prim's
algorithm and how it is used to find minimum
15788.74 -> spanning trees. So what is a minimum spanning
tree? On a weighted graph, a minimum spanning
15796.03 -> tree, or just MST for short, is a tree which
spans the whole graph, connecting all nodes
15803.53 -> together while minimizing the total edge cost.
It's important to note that your spanning
15810.03 -> tree cannot contain cycles. Otherwise, it's
not a tree. Here's a weighted graph with nodes
15816.85 -> labeled zero through six with various edges
of different costs. One possible minimum spanning
15824.15 -> tree is the following edges highlighted in
green, whose edge costs sum to nine. There's
15830.78 -> no way to connect all the nodes together and
get a lower cost than this. Note that even
15836.94 -> though the minimum spanning tree in this graph
is unique, in general, it's possible for a
15842.94 -> graph to have multiple MSTs of equal cost.
Alright, hopefully you've been paying attention
15850.8 -> because now it's your turn. I'm going to present
to you some weighted graphs, and your job
15856.561 -> is to identify any possible minimum spanning
tree you can find. Let's begin with this graph.
15863.591 -> Take a moment, pause the video and find any
minimum spanning tree you can. So one possible
15873.53 -> minimum spanning tree is the following with
a cost of 14. Again, minimum spanning trees
15879.83 -> are not unique. So there could be another
valid minimum spanning tree here, but they'll
15885.95 -> all have a cost of 14. Let's do another one.
Can you find a minimum spanning tree here?
15892.01 -> I'll give you a moment. Here's one possible
answer, with the minimum spanning tree highlighted
15900.41 -> in green with a cost of 39. All right, one
last graph, I promise. This one is a bit of
15910.58 -> a trick question. Because there is no minimum
spanning tree, all the nodes must be connected
15916.27 -> on a single component for a spanning tree
to exist.
15922.95 -> Let's change focus and start talking about
Prim's algorithm. Prim's is one of my favorite
15928.1 -> minimum spanning tree algorithms because of
how simple and how intuitive it is. By nature,
15933.93 -> it's a greedy algorithm, which always selects
the next best edge and adds it to the minimum
15939.189 -> spanning tree. So it works very well on dense
graphs, which have a lot of edges. However,
15945.67 -> a few downsides to Prim's are that it's not
easily parallelizable, or at least not as
15951.66 -> parallelizable as other well known minimum
spanning tree algorithms. And it's slightly
15957.22 -> harder but not impossible to find the minimum
spanning forest of a graph. There are two
15963.82 -> well known versions of Prim's I want to discuss.
The first is the common lazy version,
15970.601 -> which runs in O(E log E). And then there's
the improved eager version, which runs in
15976.32 -> O(E log V), but requires a slightly
different data structure. We're going to have
15982.4 -> a look at both, but this video is primarily
going to focus on the lazy version. Let's
15987.979 -> start by looking at the lazy version, just
because it's slightly easier to implement.
15993.25 -> Here's the general idea, maintain a priority
queue that sorts edges based on minimum edge
15998.11 -> cost. This priority queue is used to tell you
which node to go to next and which edge
16005.04 -> was used to get there. Then the algorithm begins
and we start on any starting node s and mark
16012.82 -> s as visited and iterate over all the edges
of s and add them to the priority queue. From
16019.939 -> this point on, while the priority queue is
not empty, and the minimum spanning tree has
16024.689 -> not been formed, dequeue the next best edge from
the priority queue. If the dequeued edge is not
16031.65 -> outdated, which it could be if we visited the
node that edge points to via another path
16038.9 -> before getting to the edge we just pulled,
then we want to mark the current node as visited
16044.66 -> and add the selected edge to the minimum spanning
tree. If you selected a stale, outdated edge,
16050.12 -> then you can simply pull again, then repeat
the process of iterating over the current
16055.25 -> node's edges, adding them to the priority
queue. And while doing all this, take care
16060.42 -> not to add edges which already point to
visited nodes. This will reduce the number
16065.43 -> of outdated edges in the priority queue. Let's
have a look at an example. Suppose we have
16071.77 -> this weighted undirected graph, and we want
to find any minimum spanning tree. An important
16078.41 -> thing to keep in mind is that while the graph
above represents an undirected graph, our
16084.32 -> internal adjacency list representation has
each undirected edge stored as two directed
16091.65 -> edges. So the actual internal representation
typically looks something like this, which
16097.57 -> is a lot easier to work with. Along with a
graph I will also be keeping track of the
16103.96 -> edges currently in the priority queue on the
right, I will be representing edges as triplets
16111.11 -> containing the start node of the edge, the
end node of the edge and the edge cost. Lastly,
16118.54 -> I will be coloring nodes as either blue for
unvisited, orange for visiting, or gray for visited.
16127.44 -> So let's begin Prim's on node zero. So iterate
over all the outgoing edges and add them to
16135.65 -> the priority queue. The first edge we're going to
add to the priority queue is the edge from
16139.65 -> zero to one with a cost of 10. Then the edge
from zero to two with a cost of one, and finally
16147.39 -> the edge from zero to three with a cost of
four. Now we look inside our priority queue
16155.54 -> and we pull the next most promising edge and
add it to the minimum spanning tree the edge
16161.06 -> from zero to two with a cost of one has the
lowest value in the priority queue. So it
16166.91 -> gets added to the minimum spanning tree. This
also means that the next node we process is
16172.91 -> node two. So next we iterate through all the
edges of node two and add them to
16179.979 -> the priority queue. While iterating over the
16189.56 -> outgoing edges of node two, realize that we
may encounter edges which point to already
16195.29 -> visited nodes. We do not want to add these
to the priority queue because they are of no
16201.45 -> use. The reason we don't include edges which
16201.45 -> already point to visited nodes is that either
they overlap with an edge already part of
16208.04 -> the minimum spanning tree, as is the case
with the edge on the slide. Or they would
16212.82 -> introduce a cycle in the minimum spanning
tree if included, which is forbidden. So the
16219.53 -> next best edge in the priority queue is the
edge from two to three with a cost of two,
16225.47 -> so it gets added to the minimum spanning tree.
This also means that the next node we process
16230.689 -> is node three. The same process of adding
edges to the priority queue and pulling the
16236.74 -> smallest edge continues until the minimum
spanning tree is complete. I'll let the animation
16242.28 -> play until something interesting happens.
16251.24 -> All right, notice that the next best edge
we pull from the priority queue is an edge
16277.39 -> which already points to a visited node, node
one. This means that the edge is outdated
16285.32 -> and stale, because we found a cheaper path
to node one. So we can safely ignore this
16292.67 -> edge and pull again. The next edge is also
stale. So let's keep pulling.
16308.66 -> So what happens when we have two edges with
the same cost in the priority queue? Which
16314.729 -> one gets pulled first? In practice, this doesn't
matter, so we can assume that the edge (2, 5, 8) gets
16322.58 -> pulled first because it was added first.
16336.1 -> We can now stop Prim's because the minimum
spanning tree is complete. We know the minimum
16341.96 -> spanning tree is complete because the number
of edges in the tree is one less than the
16347.18 -> number of nodes in the graph. This is precisely
the definition of a tree. If we collapse the
16353.36 -> graph back into the undirected edge view,
it becomes clear which edges are included
16359.11 -> in the minimum spanning tree. To find the
cost of the minimum spanning trees simply
16363.88 -> sum up the cost of all the edges which were
selected to be part of the minimum spanning
16369.59 -> tree and this totals to 20. Great, we now
understand the gist of the lazy implementation
16377.479 -> of Prim's. Let's have a look at some pseudocode.
Let me first define a few variables that we
16384.439 -> will need. First is n, the number of nodes
in the graph. The variable pq represents the
16392.398 -> priority queue data structure, it stores the
edge objects based on minimum edge cost. Again,
16400.479 -> each edge object consists of a start node
an end node, and an edge cost. Next is G, which
16408.139 -> represents the graph we're working with. g
represents an adjacency list of weighted edges
16415.699 -> in G every undirected edge is represented
as two directed edges. As a side note, if
16423.619 -> your graph is extremely dense, meaning it
has numerous edges, you should probably prefer
16430.34 -> using an adjacency matrix instead of an adjacency
list for efficiency and space gains. And lastly,
16439.639 -> a visited Boolean array of size n, which keeps
track of whether node i has been visited
16446.539 -> or not. So on this slide is the whole algorithm
for the lazy implementation of Prim's. Let's
16455.139 -> go over it one step at a time. The function
takes one argument s, which is the start node
16462.539 -> index, and by default s is set to node zero.
Then I define a few more variables that we'll
16469.648 -> need just inside this function. M is a constant
representing the number of expected edges
16476.049 -> in the minimum spanning tree. Edge count is
the number of edges we currently have included
16481.289 -> in the minimum spanning tree. This variable
is to make sure that the tree spans the whole
16487.238 -> graph. mstCost tracks the total cost of the
minimum spanning tree, and finally mstEdges
16496.299 -> is an array which holds the edges which we have
included in the minimum spanning tree.
16503.799 -> The first actual bit of logic we do is add
all the outgoing edges from S to the priority
16510.078 -> queue with the Add edges method. So let's
look at this method and see what's going on
16515.85 -> in there. Alright, here we are at the Add
edges function, the first thing I do is mark
16521.779 -> the current node as visited. Next, I iterate
through all the outgoing edges of the current
16528.16 -> node. And if the destination node is unvisited
add the edge to the priority queue. So that's
16535.059 -> all this method does is it goes through all
the edges of a node and adds them to the priority
16540.469 -> queue, if appropriate. Once we've added the
first set of edges to the priority queue,
16545.77 -> the algorithm really begins and we enter a
while loop while the priority queue is not
16550.639 -> empty and the minimum spanning tree is not
complete, keep iterating. Then inside the loop,
16556.801 -> we pull the next best edge out of the priority
queue and grab a reference to the destination
16563.109 -> node index. This is the node the edge is pointing
at. This next line is very important: it's
16570.23 -> the logic that skips the edge we pulled from the
priority queue if that edge points to an
16575.539 -> already visited node. Again, edges can become
stale or outdated in the priority queue if
the node they're pointing at becomes visited
16587.629 -> via another path.
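Putting that loop together with the remaining steps described next (adding the edge to the MST and relaxing the new node's edges), here is a hedged Java sketch of lazy Prim's. The field and method names are illustrative assumptions, not necessarily the course's exact code:

```java
// Hypothetical sketch of lazy Prim's. Assumes fields: int n; boolean[] visited;
// List<List<Edge>> graph; and an Edge(from, to, cost) that is Comparable by cost.
long lazyPrims(int s) {
  int m = n - 1, edgeCount = 0; // an MST of n nodes has n - 1 edges
  long mstCost = 0;
  Edge[] mstEdges = new Edge[m];
  PriorityQueue<Edge> pq = new PriorityQueue<>();

  addEdges(s, pq); // enqueue the start node's outgoing edges

  while (!pq.isEmpty() && edgeCount != m) {
    Edge edge = pq.poll();
    int nodeIndex = edge.to;
    if (visited[nodeIndex]) continue; // stale edge: skip it and poll again

    mstEdges[edgeCount++] = edge;     // include this edge in the MST
    mstCost += edge.cost;
    addEdges(nodeIndex, pq);          // enqueue the new node's outgoing edges
  }
  // If edgeCount != m here, the graph is disconnected and no MST exists.
  return mstCost;
}

private void addEdges(int nodeIndex, PriorityQueue<Edge> pq) {
  visited[nodeIndex] = true;          // mark the node as visited
  for (Edge e : graph.get(nodeIndex))
    if (!visited[e.to]) pq.add(e);    // skip edges pointing to visited nodes
}
```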
16587.629 -> Next,
16588.809 -> actually add the edge to the minimum spanning
tree by adding it to the MST edges array.
16594.229 -> And while adding the edge to the tree also
sum over the edge costs. The last thing we
16600.68 -> want to do is call the Add edges method with
the new current node. Recall that this will
16607.039 -> add all the outgoing edges pointing to unvisited
nodes to the priority queue. And the very
16612.809 -> last thing is we make sure that we have actually
found a minimum spanning tree that spans the
16619.16 -> entire graph. And we return the edges along
with the MST cost. Today we're talking about
16625.898 -> finding minimum spanning trees with prims
algorithm. The lazy implementation of prims
16632.43 -> inserts edges into a priority queue. This
results in each poll operation on the priority
16638.969 -> queue being O(log E). In the eager version,
we maintain the idea that instead of adding
16646.209 -> edges to the priority queue, which can later
become stale, we should instead track
16652.408 -> node-edge key value pairs that can easily
be updated and pulled to determine the next
16659.189 -> best edge we should add to the minimum spanning
tree. For this to all make sense, there's a key
16664.85 -> realization that needs to happen. And that
is for any MST with directed edges, each node
16671.939 -> is paired with exactly one of its incoming
edges. That is except for the start node.
16678.398 -> One way to see this is on a minimum spanning
tree with multiple edges leaving a node but
16683.719 -> only ever one edge entering a node. Let's
have a closer look at what I mean. Suppose
16689.658 -> we have this undirected graph. The equivalent
directed version of this graph looks like
16696.479 -> this. A possible minimum spanning tree starting
at node zero might be the following highlighted
16703.459 -> in green. Now notice that on this directed
MST, each node is paired with exactly one
16712.84 -> edge except for the starting node. So in a
sense, there seems to be a relationship we
16719.129 -> can take advantage of here, which is that
each node is paired with exactly one incoming
16726.219 -> edge. In the eager version, we are trying
to determine which of a node's incoming edges
16733.779 -> we should select to include in the MST. The
main difference coming from the lazy version
16740.459 -> is that instead of adding edges to a priority
queue, as we iterate over the edges of a node,
16747.049 -> we're going to relax, that is, update the
destination node's most promising incoming
16754.01 -> edge. So you might be asking yourself the
question, how are we going to efficiently
16760.539 -> update and retrieve these node edge pairs?
Well, one solution is to use an index priority
16768.209 -> queue, or simply IPQ for short, which
can efficiently update and pull key value
16774.289 -> pairs. You can think of an IPQ as the
data structure you would get if a hash table
16780.578 -> and a priority queue had a baby together.
It supports sorted key value pair updates
16786.95 -> and poll operations in logarithmic time.
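For the sketches that follow, assume an indexed priority queue with roughly this interface. It is hypothetical, but it mirrors the contains, insert, and decrease operations the walkthrough refers to later:

```java
// Hypothetical interface for an indexed priority queue keyed by node index.
// All operations run in logarithmic time or better.
interface MinIndexedPQ<V extends Comparable<V>> {
  boolean contains(int keyIndex);       // is there a value for this node?
  void insert(int keyIndex, V value);   // add a (node, value) pair
  void decrease(int keyIndex, V value); // update only if the new value is better
  int peekMinKeyIndex();                // node index of the best pair
  V pollMinValue();                     // remove the best pair, return its value
  boolean isEmpty();
}
```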
Using this new approach would reduce the overall
16793.228 -> time complexity from O(E log E) to
O(E log V), since there can only
16800.16 -> be V node-edge pairs in the IPQ. If you're
interested in learning more about the index
16808.27 -> priority queue data structure and how it's
implemented, I would highly recommend my data
16812.498 -> structures video on the subject. I will link
it in the description below if you need to
16817.478 -> catch up. The implementation for the eager
version is slightly different and the algorithm
16823.68 -> goes as follows: maintain an IPQ of size
V that sorts vertex-edge pairs (v, e) based
16832.978 -> on minimum edge cost. Start the algorithm
on any node s. Mark s as visited and relax
16841.068 -> all the edges of s. Relaxing in this context
refers to updating the entry for node v in
16848.398 -> the IPQ from (v, old edge) to (v, new edge) if
the new edge has a better cost than the old
16857.958 -> edge. Then, while the index priority queue
is not empty and a minimum spanning tree
16864.09 -> has not been formed, dequeue the next best
vertex-edge pair (v, e) from the IPQ, mark node v as
16873.418 -> visited and add edge e to the MST. Lastly,
relax all edges of v while making sure not
16881.378 -> to relax any edge pointing to a node which
has already been visited.
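Using the hypothetical indexed priority queue interface sketched above, the core of the eager version might look like this (names are again illustrative assumptions):

```java
// Hypothetical sketch of eager Prim's. Assumes fields: int n; boolean[] visited;
// List<List<Edge>> graph; Edge[] mstEdges; and MinIndexedPQ<Edge> ipq keyed by
// destination node index.
long eagerPrims(int s) {
  int m = n - 1, edgeCount = 0;
  long mstCost = 0;

  relaxEdgesAt(s);

  while (!ipq.isEmpty() && edgeCount != m) {
    int nodeIndex = ipq.peekMinKeyIndex(); // destination node of the best pair
    Edge edge = ipq.pollMinValue();        // remove the (node, edge) pair

    mstEdges[edgeCount++] = edge;          // add the selected edge to the MST
    mstCost += edge.cost;
    relaxEdgesAt(nodeIndex);               // relax the new node's edges
  }
  // Only a valid MST if edgeCount == m when the loop exits.
  return mstCost;
}

private void relaxEdgesAt(int currentNodeIndex) {
  visited[currentNodeIndex] = true;        // mark the current node as visited
  for (Edge edge : graph.get(currentNodeIndex)) {
    int destNodeIndex = edge.to;
    if (visited[destNodeIndex]) continue;  // never relax edges into visited nodes

    if (!ipq.contains(destNodeIndex))
      ipq.insert(destNodeIndex, edge);     // first incoming edge for this node
    else
      ipq.decrease(destNodeIndex, edge);   // keep only the cheaper incoming edge
  }
}
```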
16885.748 -> All right, I think
16888.26 -> it's time to see an example. Suppose we have
the following weighted undirected graph and
16894.048 -> we want to find any minimum spanning tree.
One thing to remember is that while we're
16899.36 -> dealing with an undirected graph, we will
be internally representing it as a directed
16904.828 -> graph, where each undirected edge is stored
as two directed edges, I will be keeping track
16912.548 -> of all node edge key value pairs on the right
and update them accordingly as the algorithm
16919.28 -> executes. So you can think of the red box
as the contents of the index priority queue.
16926.85 -> Let's begin the algorithm on node zero, start
by iterating over all the edges of zero and
16932.578 -> relax them. During the relaxing process, add
a node edge pair to the index priority queue
16939.04 -> if it does not exist yet, otherwise update
the value if the new edge has a better cost
16945.458 -> than what already exists. The first node edge
pair we add is node two with the incoming
16952.138 -> edge from zero to two with a cost of zero.
And similarly for the rest of zeros edges.
16965.25 -> The next best node edge pair based on the
minimum edge cost is node two with the incoming
16972.93 -> edge from node zero. Now iterate through all
the edges of node two and relax all edges,
16980.62 -> taking care to ignore edges pointing to already
visited nodes, like the one on this slide.
16986.84 -> The edge (2, 5, 6) has a better cost going to node
five than the edge from node zero to node
16994.968 -> five with a cost of seven. So update the index
priority queue with this new edge. I will denote
17001.058 -> IPQ updates with a purple box around the
edge being updated. The next best node edge
17008.6 -> pair is node three with the edge coming from
node zero with a cost of five. Now iterate
17015.36 -> through all the edges of node three and relax
all edges. The Edge coming from node three
17023.69 -> offers a better value. So I update the value
for node one in the index priority queue with
17029.51 -> the new edge. Add a new key value pair entry
for node six, since node six is not yet inside
17038.29 -> the index priority queue. Update the value
for node five with the new better edge we
17045.568 -> just found. And from this point on, I will
let the animation play. Please try and follow
17051.748 -> along.
17062.878 -> All right, and that's the algorithm. You can
see that the minimum spanning tree we found
17089.02 -> consists of the edges highlighted in green.
If we collapse the graph back into its undirected
17096.98 -> edge view it becomes clear which edges are
included in the minimum spanning tree. You
17103.02 -> can also get the MST cost by adding the values
of all the edges in the spanning tree for
17109.628 -> a cost of nine. Let's have a look at some
pseudocode for the eager implementation of
17115.068 -> Prim's. You'll notice that it's almost identical
to the lazy version except for a few key details
17121.45 -> which I will highlight. First is n, which
is still the number of nodes in the graph,
17127.1 -> followed by the variable ipq, which represents
the index priority queue. Instead of a traditional
17133.85 -> priority queue, it stores node index and edge
object pairs; edge objects are still represented
17140.878 -> as start node, end node, and edge cost triplets,
with the node index being an integer. G is
17149.988 -> once again our graph adjacency list of weighted
edges. Remember that in G every undirected
17157.818 -> edge is represented as two directed edges.
There's also the whole story about whether
17163.388 -> we should be using an adjacency list or using
an adjacency matrix to represent our graph
17169.158 -> when running Prim's, because we know that this
can greatly impact performance. I was curious
17175.568 -> and did some analysis comparing the adjacency
list versus the adjacency matrix. And the
17180.798 -> results I got were interesting. This dotted
line graph shows the performance of using
17186.85 -> an adjacency list in blue versus an adjacency
matrix in green, and the x axis represents
17193.628 -> the graph edge density percentage, and the
y axis indicates performance measured in milliseconds.
17201.54 -> As you can see, for graphs with fewer edges,
the adjacency list outperforms the adjacency
17207.738 -> matrix. But as the edge density increases,
the adjacency matrix becomes the obvious choice.
17215.718 -> You may be wondering why the adjacency matrix's
performance starts to increase after the
17221.31 -> midpoint, where the graph starts to become
more and more dense. This is an excellent
17227.44 -> question. And my guess is that the denser
the graph, the fewer relaxation operations
17233.718 -> need to be performed, which is an expensive
part of Prim's algorithm. Since the time to
17239.668 -> iterate over all the edges of a node is constant,
but fewer relaxation operations are needed,
17245.82 -> performance should increase as a result, but
I may be wrong. Even still, the results are
17251.648 -> interesting. And the takeaway is that the
graph representation you choose can greatly
17256.77 -> impact the performance of your algorithm depending
on whether your graph is sparse
17262.09 -> or dense. All right, back to the pseudocode.
The last variable is the visited Boolean array
17269.138 -> of size n, which tracks whether node AI has
been visited or not. Now let's have a look
17275.1 -> at the actual algorithm for eager prims. In
the first block, I define a few more variables
17281.638 -> that will need m the number of expected edges
in the MST edge count the number of edges
17288.208 -> we currently have included in the MST, this
variable is used to make sure the tree spans
17294.51 -> the whole graph, then is MST cost which tracks
the total cost of our minimum spanning tree.
17301.7 -> And finally, MST edges, which is an array
that holds the edges we have included in the
17308.728 -> MST. After this, I call the relaxed edges
at node method passing in the start node as
17316.638 -> an argument. Let's have a look at the relax
edges that node method to understand what's
17321.578 -> happening in there. Alright, here we are,
you'll notice that this method takes a single
17327.54 -> argument which is the current node we care
about. The first thing we do is mark the current
17333.558 -> node as visited so we don't visit again in
the future. Then I reach into our graph adjacency
17340.43 -> list and get all the edges going outwards
from the current node. As we enter the loop
17346.17 -> and start iterating over all the outgoing
edges. The first thing I do inside the loop
17350.44 -> is grab a reference to the destination node
index. This is the node the edges pointing
17355.79 -> at next skip edges which point at already
visited nodes. Now here's the bit where we
17363.128 -> actually relax the edge first check if the
IP q contains the key with the value of the
17370.378 -> destination node. If it doesn't, then add
the edge to the IP queue for the first time.
17377.92 -> Otherwise try and improve the cheapest edge
at desk node index with the current edge in
17384.958 -> the priority queue back inside the main method.
Next up keep looping while the IP queue is
17391.238 -> not empty. And we have not yet completed the
MST after extract the next best node index,
17398.54 -> edge object pair From the IP queue based on
minimum edge cost, include the selected edge
17406.328 -> as part of the MST and some over the edge
costs. Lastly, relax all edges of the current
17413.7 -> node and repeat until the loop breaks outside
the main loop check if we have successfully
17419.638 -> created a spanning tree. This might not be
the case if some of the nodes are unreachable.
17425.418 -> But assuming that that is not the case, return
the MST cost and the edges which make up the
17432.07 -> spanning tree. And that concludes the pseudocode
for Prim's algorithm. All right, here we are
17438.738 -> in the source code for Prim's implemented in
Java. At the top here, I posted some instructions
17444.908 -> on how to download and run the script in case
you want to play around with it a little bit.
17451.34 -> Let's begin by taking a look at the main method
right over here. The first thing I do is set
17458.28 -> up a graph we want to find the minimum spanning
tree of. In fact, it's the same graph we had
17464.69 -> in the slides in the previous video. To create
the graph I call the helper method
17471.01 -> createEmptyGraph to initialize an adjacency list
of size n, and afterwards add various undirected
17478.578 -> edges of different weights to the graph. Once
the graph is set up, I create a minimum spanning
17484.228 -> tree solver and pass in the graph we just
created. The solver is able to tell us whether
17489.898 -> a minimum spanning tree exists, what the cost
of the MST is, as well as get all the edges,
17497.79 -> which make up the MST. The output of running
the script is illustrated below right here,
17505.52 -> you can see that this particular minimum spanning
tree has a cost of nine and it has these
17512.34 -> six edges. If you were curious as to how the
adjacency list gets initialized and how I
17521.328 -> add edges to the graph, here's the code that
does exactly that. Next up is a class
17532.44 -> which represents a directed edge used in the
graph. One important thing to note about this
17538.458 -> class is that it implements the Comparable
interface and overrides the compareTo method.
17544.638 -> This simply means that edges are able to be
sorted in reference to one another based
17549.398 -> on the minimum edge cost. This is important
for the index priority queue because it needs
17555.12 -> to know how to compare edge objects with one
another to sort them.
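In sketch form, such an edge class might look like this (illustrative, not necessarily the exact course code):

```java
// Hypothetical sketch of the directed, weighted edge used by Prim's.
static class Edge implements Comparable<Edge> {
  int from, to, cost;

  Edge(int from, int to, int cost) {
    this.from = from;
    this.to = to;
    this.cost = cost;
  }

  // Order edges by minimum edge cost so the (indexed) priority queue
  // always surfaces the cheapest edge first.
  @Override
  public int compareTo(Edge other) {
    return Integer.compare(cost, other.cost);
  }
}
```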
17562.988 -> After the edge class is the minimum spanning
tree solver, where all the interesting logic
17568.1 -> happens. In this class, I store a whole bunch
of variables to help us out. The first two
17574.718 -> inputs are n, the number of nodes in the graph,
which I get from the constructor, and the
17581.658 -> graph adjacency list itself. Internally I
store a Boolean solved variable to track whether
17587.968 -> we have already computed the minimum spanning
tree so that we don't need to do it again.
17594.09 -> Once we've already solved the problem, the
MST exists variable tells you whether a minimum
17600.78 -> spanning tree was found in the input graph.
It's important to note that by default, this
17606.59 -> value is false. The Boolean visited array
is used to keep track of whether node i has
17612.84 -> been visited or not. And lastly is the variable
IP queue, which is short for indexed priority
17620.69 -> queue which is a data structure I have defined
below. The outputs to this class include the
17627.328 -> minimum spanning tree cost and the edges which
make up the minimum spanning tree if one exists.
17638.77 -> After the constructor initialization there
are two important methods to know about: there
17643.908 -> is the getMst method for retrieving the MST
edges, and getMstCost, which gets the spanning
17652.238 -> tree cost. Both of these methods work in the
same manner: they both call the solve method
17657.61 -> and then check whether the minimum spanning
tree exists and return a value or null. Therefore,
17663.76 -> the real method we care about is the solve
method. So let's have a look at that.
17675.09 -> The solve method is only ever executed once
because we mark the solved boolean value
17681.37 -> as true the first time solve is called, and
on subsequent calls the method returns early.
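That run-once behavior is plain memoization; a minimal sketch, assuming hypothetical field names:

```java
// Hypothetical sketch of the run-once guard around solve().
private boolean solved;

private void solve() {
  if (solved) return; // subsequent calls return early
  solved = true;
  // ... compute the MST once, filling mstExists, the MST cost and edges ...
}
```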
17687.61 -> The first thing I do in the solve method is
initialize some more variables and allocate
17692.818 -> some memory for the arrays we will be using.
M is the expected number of edges in a minimum
17699.26 -> spanning tree, and edgeCount is the number
of edges we have currently included in the
17706.18 -> minimum spanning tree so far. Next I initialize
an indexed priority queue of size n. This
17712.9 -> particular indexed priority queue is implemented using
a D-ary heap, so we need to provide a node
17719.29 -> degree for the underlying supporting heap
structure. I arbitrarily choose the base two
17725.818 -> logarithm of the number of nodes, which actually
seems to give pretty good performance, although
17731.52 -> typically this is an implementation detail
that you do not need to worry about. The first
17737.158 -> actual bit of logic we're going to do is call
relaxEdgesAtNode for node zero; this
17744.19 -> adds the initial set of edges to the priority
queue. Let's scroll down and take a closer
17749.628 -> look at that method, which is right here.
The first thing we do is mark the current
17755.25 -> node as visited so that we don't visit it
again in the future. Then I reach into the
17759.828 -> adjacency list and get all the outgoing edges
from the current node. As we enter the loop
17765.548 -> and start iterating over all the outgoing
edges, the first thing I do inside the loop
17770.468 -> is grab a reference to the destination node
17776.148 -> index; this is the node that edge is pointing
at. Next, skip edges which point to already
visited nodes because we know that we don't
17782.54 -> want to process those. Now here's the bit
where we actually relax the edge. First check
17789.458 -> if the index priority queue contains the key
with the value of the destination node. If
17794.82 -> it doesn't, then add the edge to the index
priority queue for the first time. Otherwise,
17799.738 -> try and improve the cheapest edge at the destination
node index with the current edge in the priority
17805.03 -> queue by calling the decrease function. So
that's all for the relax edges at node method.
17812.568 -> Let's scroll back up to the main implementation
right here. So after we add the initial set
17819.6 -> of edges to the index priority queue, we enter
a while loop, and loop while the index priority
17825.95 -> queue is not empty and a minimum spanning
tree has not been formed. Inside the loop, pull
17831.49 -> out the next best node index edge pair. The
destination node can also be found by checking
17838.148 -> which node the directed edge we just pulled
17838.148 -> out of the queue is pointing at. After that,
add the pulled edge to the minimum spanning
17844.558 -> tree by placing it inside the mstEdges array
17850.93 -> and sum over the edge costs. Finally, relax
all the edges of the new current node. This
17857.86 -> process continues and we keep pulling the
next best edge and slowly start building our
17864.03 -> minimum spanning tree until eventually the
loop breaks. The last thing we need to do
17869.318 -> is set the MST exists variable to check if
we have actually found a minimum spanning
17875.878 -> tree. If the edge count is equal to m, then
we have successfully computed a minimum spanning
17881.87 -> tree Otherwise, the graph is disconnected
in some way and no spanning tree exists. So
17887.968 -> that's all for the eager implementation of
17893.318 -> Prim's. The only piece of the puzzle that might
still be unclear is how the index priority queue
implementation works.
17903.488 -> Here's the index priority queue implementation.
However, this data structure merits a video
17908.77 -> on its own. Today we're going to start tackling
the field of network flow by understanding
17915.11 -> what max flow is and in particular how we
can use the Ford Fulkerson method to find
17921.45 -> it. Finding the maximum flow begins with having
what's called a flow graph. This is a graph
17928.62 -> where edges have a certain maximum capacity
which cannot be exceeded. edges also have
17936.27 -> a flow value, which is how many units of flow
are passing through that edge. Initially,
17943.828 -> the flow is zero for all edges everywhere
until we run a max flow algorithm on it. There
17950.201 -> are also two special types of nodes in the
17956.478 -> flow graph: the source node and the sink node,
17956.478 -> usually denoted as s and t respectively. The
maximum flow problem asks: with an infinite
17963.878 -> input source, how much flow can we push through
the network without exceeding the capacity
17970.458 -> of any edge? And it's not at all obvious how
one should figure that out. Maximum flow can
17976.92 -> be used in numerous situations where edges
and nodes can represent any number of things.
17984.328 -> For instance, suppose the edges are roads,
cars or pipes of water, wires with electric
17990.568 -> current, and so on. Each of those has a certain
capacity value we can associate with it. The maximum
17997.218 -> flow, on the other hand, would represent the
volume of water that can flow through the
18001.75 -> pipes, the number of cars the roads can
sustain in traffic, or the net electric current
18008.248 -> that your system can sustain. Effectively,
the maximum flow is a bottleneck value for
18015.11 -> the amount of traffic your network can handle
when going from the source to the sink,
18021.86 -> under all those constraints. The maximum flow
for this particular network is seven. And
18027.29 -> you can tell because after running the maximum
flow algorithm, the sum of the flows attached
18033.998 -> to the sink node is seven. Running a maximum
flow algorithm is used to determine how much
18039.79 -> flow each edge should receive to achieve the
overall maximum flow. Note that there might
18046.658 -> be multiple ways of achieving the maximum
flow by giving each edge different flow values,
18054.648 -> but overall solutions will have the same maximum
flow value. Let's dig deeper into how to find
18061.28 -> the maximum flow. To begin with, you will
need a flow graph which consists of directed
18067.068 -> edges, which are also called arcs. Each directed
edge has a certain capacity which can receive
18073.53 -> a certain amount of flow at all times the
flow running through an edge must be less
18079.14 -> than or equal to the capacity. This intuitively
makes sense. Because if we allow more flow
18084.808 -> than what the capacity permits, it means something
has to go wrong. When an edge becomes overcapacity
18092.328 -> in some manner, it means that we've pushed
the system past its limit. In the context
18097.718 -> of edges representing pipes with water it
means your pipe broke or it leaked. If your
18102.898 -> edge is a wire with electric current, it means
your wire literally fried, melted, exploded,
18108.82 -> or something bad happened to it because there
was too much electric current. This is not
18113.298 -> good. So this is why we don't allow more flow
than capacity. Each edge in the flow graph
18118.79 -> has a certain flow and capacity, specified
by the two values separated by a slash adjacent
18126.37 -> to each edge. Originally, the flow through
each edge is zero and the capacity is a non-
18131.899 -> negative value. To find the maximum flow, and
also the min cut as a byproduct, the Ford
18138.48 -> Fulkerson method repeatedly finds augmenting
paths through the residual graph and augments
18145.988 -> the flow until no more augmenting paths can
be found. So you're probably asking yourself
18152.02 -> at this moment, what is an augmenting path?
What the heck is a residual graph? And what
18157.2 -> do you mean by augment the flow? All right,
let me explain. We'll do them one by one.
18162.79 -> an augmenting path is a path of edges in the
residual graph with capacity greater than
zero from the source s to the sink t. In orange,
18175.718 -> here I have highlighted a possible augmenting
18175.718 -> path. The key thing to remember about an augmenting
path is that it can only flow through edges
18181.988 -> which aren't fully saturated yet. In fact,
you know you've achieved the maximum flow
18187.738 -> when there are no more augmenting paths left
to be found. How to actually find an augmenting
18193.738 -> path is a detail left unspecified by the Ford
Fulkerson method for flexibility. For now
18199.708 -> let's assume that we're using a depth first
search. Something else to know is that every
18205.238 -> augmenting path will have what I call a bottleneck
value, which is the smallest edge along the
18211.828 -> path,
18212.828 -> you can find the value of the bottleneck by
taking the difference between the capacity
18216.898 -> and the current flow of an edge. For this
augmenting path, the bottleneck value is six,
18222.86 -> we can use the bottleneck value to augment
the flow along the path. Augmenting the flow
18229.668 -> simply means to update the flow values of
the edges along the augmenting path. Here
18235.229 -> you can see that I've increased the flow of
each edge along the augmenting path by exactly
18241.238 -> six units. However, we're not done augmenting
the flow, we not only need to increase the
18247.11 -> flow along the forward edges, but also decrease
the flow along the backwards edges, which
are called residual edges. The residual edges
18258.86 -> are the dotted edges going backwards in the
18258.86 -> reverse order of the augmenting path. The
logic behind having residual edges is to undo
18266.498 -> bad choices of augmenting paths which do not
lead to a maximum flow. Effectively, we don't
18273.32 -> know which are the best or even correct augmenting
paths to take. So this mechanism enables us
18280.03 -> to freely find any augmenting paths without
having to worry about whether or not we'll
18285.048 -> be able to achieve the maximum flow. It should
be mentioned that residual edges become valid
18291.728 -> edges to take when finding an augmenting path
in later iterations. So if we take a step
18298.28 -> back, you can think of every edge in the original
graph as having a residual edge with a flow
18305.668 -> and capacity of zero, which is not usually
18311.628 -> shown. Now that we know what residual edges
18311.628 -> are. The term residual graph simply means
the graph which also contains residual edges,
18318.908 -> not just the original edges given in the flow
graph. So generally speaking, when I mentioned
18325.798 -> the flow graph, I usually mean the residual
graph. So, here's a good question you might
18331.18 -> have at this point, the residual edges shown
have a capacity of zero, aren't those forbidden?
18337.66 -> How does that work? So here's the thing. With
this method of augmenting the flow, you have
18343.409 -> to think of the remaining capacity of an edge
18351.04 -> (i.e., residual or not) as the difference between
18351.04 -> the capacity and the flow of that edge. That
is the difference between the capacity and
18357.19 -> the flow is the true remaining capacity for
that edge. This ensures that the remaining
18363.328 -> capacity of an edge is always non negative,
even if the flow can be negative. For example,
18368.878 -> in the residual edges we have right now, zero
minus negative six is six, so we know that all
18375.36 -> our residual edges actually have a remaining
capacity of six. So the algorithm proceeds
18381.45 -> and the Ford Fulkerson method continues to
repeatedly find augmenting path after augmenting
18387.798 -> path and to augment the flow until no more
augmenting paths from s to t can be found.
18393.578 -> The key realization to make at this point is that
the sum of the bottleneck values that we
18398.87 -> acquire with each augmenting path will result
in the maximum flow. And that's the whole
18405.03 -> premise of this algorithm. It doesn't matter
so much how you find augmenting paths. But
18411.12 -> so long as you keep summing the bottleneck
values which they produce, you'll find the
18416.808 -> maximum flow. So let's keep finding augmenting
paths. Remember that we can only select edges
18423.7 -> whose remaining capacity is greater than zero
to be part of the augmenting path. So the
18430.158 -> bottleneck for this augmenting path is four
since four is the minimum of all the remaining
18437.18 -> capacities along this augmenting path. Here's
another augmenting path from the source to
18442.238 -> the sink, you'll notice that we're actually
using one of the residual edges we created
18447.85 -> earlier in this path. You'll also notice that
there are two purple edges in this slide.
18452.988 -> This is just a coincidence, since both of
those edges have the same bottleneck value
18457.998 -> of six. Then we augment the flow as we usually do.
18462.09 -> I'll let the animation play for this next
one. And at the end, we can see that if we
sum all our bottleneck values (6, 4, 6, and 4),
we're able to achieve the maximum flow which
18483.638 -> is 20. In terms of the time complexity, the
Ford Fulkerson method derives its complexity
18490.19 -> from how we actually find those augmenting
paths, which as we know is left as an unspecified
18496.2 -> detail. If you assume that augmenting
paths are found by doing a depth first search,
18503.37 -> then the algorithm runs in a time complexity
of O(fE), with f being the maximum flow and
18510.29 -> E the number of edges in the graph. Here's
a graph where we can derive the time complexity.
18517.36 -> Suppose that the side edges have very high
capacity values of 100. And the middle edge
18524.398 -> has a capacity of one, you can clearly tell
that the maximum flow should be 200. Because
18530.638 -> you can run two augmenting paths with the
flow values of 100 on the top and the bottom
18538.09 -> of the graph from the source to the sink.
However, recall that a depth first search traversal
18544.29 -> is essentially random. So it's possible for
you to pick that middle edge with a capacity
18550.398 -> of one every single time. And what that'll
do is it'll limit the flow you can push from
18556.64 -> the source to the sink to be one, so one is always
going to be your bottleneck value, so you're
18561.738 -> never going to be able to augment the flow
by more than one unit. This results in flipping
18569.09 -> back and forth between the same two alternating
paths for 200 iterations, which really kills
18577.138 -> your time complexity. Luckily, much faster
algorithms and better heuristics exist to
18583.238 -> find the maximum flow value. One example is
18589.77 -> Edmonds Karp, which is Ford Fulkerson, but
instead of using a depth first search, it uses
18594.648 -> a breadth first search to find the shortest
18594.648 -> augmenting path from the source to the sink
in every iteration. There's also capacity
18600.068 -> scaling, which is the idea of picking larger
paths first, to reduce the number of paths
18606.298 -> you need to find overall. And this turns out
to work really well, at least from my empirical
tests. Then there's Dinic's, which uses a combination
18618.408 -> of a breadth first search to first find a
layered graph that guides edges towards the
18624.5 -> sink, on which you then use a depth first search
18624.5 -> to actually find the augmenting paths. There's
also the idea of push-relabel algorithms,
18630.398 -> which work differently than the algorithms
we've discussed here, which try and find augmenting
18635.898 -> paths. Instead, push-relabel algorithms maintain
the concept of a preflow, if you will, to
18641.878 -> find the maximum flow of a network. Please
be mindful that the time complexities posted
18647.238 -> here are very pessimistic; in practice, running
maximum flow with any of these is much
18654.12 -> faster. So it's very hard to compare the performance
of two flow algorithms solely based on the
18660.15 -> complexity. Today, we're taking a look at the
source code for the Ford Fulkerson method
18665.728 -> implemented with a depth first search. The
goal of this video is to show you how to set
18672.148 -> up the following flow graph and find the maximum
flow through it. So after we run the maximum
18678.68 -> flow algorithm, we should get a graph similar
to this one with flow running through some
18685.468 -> but not all of the edges and achieving the
maximum flow of 23. The source code and the
18691.78 -> example I have lined up for you today can
both be found on GitHub. There's a link in
18696.238 -> the description for today, I encourage you
to check that out and also play along as we're
18702.34 -> going over the source code. All right, here
we are in the source code written in Java.
18706.94 -> This program has three main supporting classes,
an edge class, a network flow solver base,
18715.11 -> and the Ford Fulkerson depth first search solver.
However, before we get into any of those,
18720.148 -> I want to take a look at the main method where
I actually use the classes above to solve
18726.298 -> the flow problem we just saw. I know a lot
of people struggle setting up the flow graph,
18732.51 -> which is usually somewhat of a mystery. So
I want to clear that up. The first thing I
18737.44 -> recommend you do every time you set up a flow
18743.53 -> problem is initialize three variables: n,
the number of nodes in your graph, that is,
18748.458 -> including the source and the sink nodes. And
then what I recommend you do is you actually
18753.84 -> label the source and the sink nodes and assign
them indices. And what I usually end up doing
18761.1 -> is I say the source node s equals index n minus
one and the sink t equals n minus two. The
rest of the nodes in your graph should then
18767.56 -> have indices between zero and n minus three
inclusive, I've always found this to be the
18774.248 -> easiest way to set up your flow graph. Next,
I create the flow solver by providing the
18779.168 -> three variables n, s and t as inputs to the
solver so it knows how many nodes there are
18785.908 -> and which nodes are labeled the source and
the sink. Then I use the solver to actually
18791.218 -> create the flow graph by adding edges with
different capacities. The next step is to
18796.79 -> hook up the edges to the source, those would
be the ones shown in this picture. Then I
18804.558 -> carefully hook up all the middle edges.
18809.388 -> And lastly, the edges leading into the sink.
It's usually always these three steps. And
18818.878 -> for most of the time, your graph is bipartite.
So the middle edges are even simpler to set
18824.64 -> up. After this I call the get max flow method
on the solver which actually runs the Ford
18829.69 -> Fulkerson max flow depth first search and
returns an integer value. For this graph, we're
18835.31 -> expecting a maximum flow of 23. After printing
the max flow, I also display all the interesting
18842.398 -> edges of the residual graph. First, I get
the residual graph from the solver after executing
18848.52 -> the max flow and iterate over all the edges
and just display the flow on each edge. Let's
18853.828 -> actually run this program and see what the
output looks like. So I just popped open a
18859.24 -> terminal. And for those of you who also have
a terminal open and want to play along first,
18863.86 -> you can just clone the GitHub repo by typing
git clone followed by the repo URL, which
18870.388 -> is github.com/williamfiset/Algorithms.
You see that I've already cloned the repo
18877.67 -> so I don't need to do it again. Then just
change directory into the algorithms folder.
18885.388 -> So the file we're working with is called the
FordFulkersonExample.java file, and
18891.068 -> it's in the graph theory network flow examples
package. And luckily for us, it doesn't have
18896.04 -> any dependencies yet, so we can just compile
it on its own with javac. So if you type
18902.058 -> javac followed by com williamfiset algorithms
graphtheory networkflow examples, and then you
18908.319 -> find that file, FordFulkersonExample dot
java, and compile it, it will produce a dot
18914.01 -> class file in that directory. So you can execute
it by typing java and then the name of the
18921.03 -> class and then pressing Enter, and then you
get this beautiful output. So this prints
18925.59 -> a lot of interesting information. Notably,
it prints the max flow of 23, and all of the
18932.158 -> edges plus four columns. The first column
represents the start and end nodes of the
18938.468 -> directed edge, then the amount of flow running
through the edge, the capacity of the edge.
18943.59 -> And lastly, a boolean value indicating whether
the edge is a residual edge or not, which
18949.49 -> is quite handy for debugging. So let's go
back to the code. So let's scroll back up
18954.668 -> the code and take a look at the first of the
three classes, which is the edge class. The
18960.7 -> edge class is composed of a few instance variables;
in particular, every edge has a start node
18967.9 -> called from and an end node called to. Each
edge in the flow graph has a certain amount
18975.69 -> of flow and capacity. The capacity of the
edge is constant and does not change; the flow
18982.068 -> is dynamic and adjusts as we augment the
flow. When you create a new edge, it should
18988.229 -> have a start and end node plus an initial
capacity; the flow defaults to zero. You might
18993.898 -> notice that the residual edge instance variable
does not get initialized here or through the
19000.6 -> constructor. The reason is that I initialize
the residual edge together with the forward
19006.318 -> edge and hook them up together in a helper
method, which we'll see later. The next method
19010.898 -> is the is residual method, which determines
whether an edge is a residual edge or not.
19018.238 -> Because forward edges are not permitted to
have a capacity of zero, you know an edge
19023.128 -> is residual if the capacity is zero.
19025.808 -> Pretty easy.
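Putting this together with the remainingCapacity and augment methods described next, the edge class might be sketched like this (illustrative names and types, not necessarily the exact course code):

```java
// Hypothetical sketch of a flow-graph edge paired with its residual edge.
static class Edge {
  int from, to;
  Edge residual;       // the companion edge pointing the opposite way
  long flow;           // defaults to zero
  final long capacity; // constant; zero only for residual edges

  Edge(int from, int to, long capacity) {
    this.from = from;
    this.to = to;
    this.capacity = capacity;
  }

  boolean isResidual() {
    return capacity == 0; // forward edges must have capacity > 0
  }

  long remainingCapacity() {
    return capacity - flow; // works whether the flow is positive or negative
  }

  void augment(long bottleNeck) {
    flow += bottleNeck;          // increase flow on the forward edge
    residual.flow -= bottleNeck; // decrease flow on the residual edge
  }
}
```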
19026.808 -> There is also the remaining capacity method
which can be used to determine the maximum
19033.03 -> amount of flow that we can push through this
edge. This method works whether the flow is
19038.398 -> positive or negative. Next is the augment
method which augments the flow for this edge
19045.068 -> alone. All it does is it increases the flow
on the forward edge by the bottleneck value
19051.35 -> we found along the augmenting path and it
also decreases the flow along the residual
19056.958 -> edge. Last is the to string method which is
responsible for displaying those nice columns
19064.798 -> we saw in the terminal. The next class we're
going to take a look at is the network flow
19070.95 -> solver base. This class is a generic base
for max flow solvers, which all solvers should
extend to gain access to reusable variables and
setup methods and so on. For example, a simple
19083.62 -> task like adding an edge to a flow graph should
be the same whether the max flow algorithm
is Edmonds Karp, Dinic's, or some capacity scaling
19096.26 -> algorithm; it shouldn't matter. Therefore,
19096.26 -> it makes sense to abstract that behavior and
capture it in a base class. So there are many
19102.708 -> variables in this class. The first one is
INF, short for infinity, which is just a handy
19108.648 -> large constant that doesn't overflow if you
add numbers to it, or at least can handle
19115.36 -> having large numbers added to it. Then there
are the three input variables and the number
19120.93 -> of nodes in the graph is the index of the
source node and T the index of the sync followed
19127.208 -> by this are two special variables I usually
end up using because they greatly help boost
19134.11 -> performance. So the rationale behind using
the visited token in combination with an integer
19140.468 -> array that tracks the visited state of a node
is that when we are finding augmenting paths,
19148.1 -> whether via depth first search or breadth
first search or whatever graph traversal method
19153.2 -> you want to use, you generally want to ensure
that your augmenting path doesn't visit the
19159.138 -> same node twice. Otherwise, that could result
in a cycle which we don't want. The way to
19164.1 -> check if node i is visited is to check if
the state in the visited array at index i is
19172.878 -> equal to the visited token. This is super
handy, because in the next iteration, when
19178.978 -> we yet again want to find another augmenting
path, we can simply reset all the visited
19185.548 -> states of every node simultaneously by simply
incrementing the visited token. I know it's
19191.558 -> kind of hacky, but it's super efficient and
really handy to have. The alternative is actually
19197.29 -> to maintain a boolean visited array and
fill that with false values every time just
19203.818 -> before you find an augmenting path. That's
not great because it requires an additional
19209.208 -> O(n) of work every time you want to find
an augmenting path. A minimal sketch of this visited-token trick, with assumed names, might be:
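// Hedged sketch of the visited-token technique described above.
private int visitedToken = 1;
private int[] visited; // one entry per node, allocated in the constructor

// Marks node i as visited for the current iteration.
public void visit(int i) { visited[i] = visitedToken; }

// Returns whether node i was visited during the current iteration.
public boolean visited(int i) { return visited[i] == visitedToken; }

// Resets the visited state of every node simultaneously in O(1).
public void markAllNodesAsUnvisited() { visitedToken++; }

Next is a boolean variable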
19215.048 -> called solved, which indicates whether or
not we have actually run the network flow
solver. The solver only needs to run once,
because that always yields the same result.
19227.37 -> So for example, if the user calls the get
max flow method multiple times the solver
19234.738 -> only needs to run once. The next value that
we have right here is the max flow variable,
19243.238 -> which is the value we're actually trying to
calculate. And finally is the adjacency list
19249.27 -> representing the flow graph itself. Looking
at the constructor, we require the user to
19254.568 -> specify the number of nodes along with the
index of the source and the sink nodes. Then
19260.728 -> inside this method, I also take the opportunity
to initialize the flow graph, and as well
19267.408 -> allocate some memory for the visited array
we'll be making use of later. In the initialize
19274.318 -> empty flow graph method, what I do is initialize
an empty array list of edges for each node
19281.978 -> index so that we don't get a NullPointerException
when we try and add an edge to the graph.
19286.838 -> Talking about adding edges to the graph, let's
have a look at the addEdge method. Here,
19292.87 -> we need to provide the start node and the
end node of the directed edge and also provide
19299.398 -> a positive capacity for that edge. If the
capacity is negative or zero, we throw an
19307.37 -> exception because that is an illegal argument.
Then what we do is we actually create
19312.45 -> the forward edge and the residual edge. You'll
notice that the residual edge actually has
19317.728 -> a capacity of zero. Then what we do is we
make the forward edge's residual edge the
19323.62 -> residual edge, and the residual edge's
19325.798 -> residual edge the forward edge. And finally
we add them both to the flow graph. So in
19330.76 -> effect, each edge is the other's inverse. A condensed sketch of this addEdge method, with assumed names, might be:
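// Hedged sketch of the addEdge hookup described above.
public void addEdge(int from, int to, long capacity) {
  if (capacity <= 0)
    throw new IllegalArgumentException("Forward edge capacity must be > 0");
  Edge e1 = new Edge(from, to, capacity); // forward edge
  Edge e2 = new Edge(to, from, 0);        // residual edge has capacity zero
  e1.residual = e2; // each edge points at the other's inverse
  e2.residual = e1;
  graph[from].add(e1);
  graph[to].add(e2);
}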
And this is exactly what we want, we want
19336.638 -> a pointer that we can simply access when we
need to access and edges residual edge. The
19343.958 -> remaining methods here are simply client facing
methods, they can be used to get the residual
19350.838 -> graph after the solver has been executed.
And to obtain the maximum flow of the graph.
19357.54 -> You'll notice that there's also this one special
method down here, which is the solve method.
19363.54 -> And this is the method that the subclass needs
to override. This is the method which actually
19370.1 -> solves the network flow problem and actually
pushes the flow through the network. And you
19376.12 -> can see that every time the client goes and
calls one of these methods, like getGraph or
19382.29 -> getMaxFlow, it calls the execute method, and
the execute method will run the solver. So
19388.818 -> we will only call the solve method if it hasn't been
executed already. So it's got that smart logic
19394.11 -> built in. A minimal sketch of this lazy-execution pattern, with assumed names, is:
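// Hedged sketch of the run-once logic described above.
public long getMaxFlow() {
  execute();
  return maxFlow;
}

private void execute() {
  if (solved) return; // the solver only ever needs to run once
  solved = true;
  solve();
}

// The method subclasses override to actually push flow through the network.
public abstract void solve();

Now let's take a look at the Ford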
Fulkerson depth first search solver, which you
19399.12 -> can see actually extends the network flow
solver base, so we know it actually implements
19403.86 -> the solve method that we need for the get
max flow method. Awesome. Let's have
19409.308 -> a look at this. So the first thing you'll
notice is that this constructor also takes the
19414.218 -> inputs n, s and t, and all we do here is call
the superclass constructor in the network
19420.61 -> flow solver base, which does all that nice
initialization that we know about. Next is
19425.94 -> the most important method, which is that solve
method I was talking to you about. And you
19430.308 -> can see that it actually overrides the method
in the superclass. So in this method, you
19435.068 -> can see that I'm repeatedly calling the depth
first search method, which returns as output the
19441.208 -> bottleneck value found along the augmenting
path. I store that value as f and increase
19447.7 -> the max flow by f in each iteration, because
we know that the sum of the bottleneck values
19453.888 -> equals the max flow, we do this until the
bottleneck value is zero, at which point we
19459.2 -> know that no more augmenting paths exist and
the algorithm can terminate. In between finding
19465.318 -> each augmenting path, you can see that I increment
the visited token; this is used to make the
19472.1 -> state of every node unvisited. The depth first
search method itself takes two arguments,
19478.748 -> the node ID and the flow. Initially, the starting
node is passed in as the node index and the
19486.738 -> flow is set to be infinity. As we progress
through the flow graph, the flow value eventually
19492.498 -> becomes the bottleneck value as we find smaller
and smaller edges with more restricting capacities
19498.498 -> and we stop the algorithm once the node index
equals the sink, so that's actually our base
19504.84 -> case right here. Afterwards, since we know
that the current node is not the sink, what
19510.86 -> we do is we explore it by marking the current
node as visited. We do this by assigning the
19517.068 -> current index or the current node index to
be equal to the visited token, then comes
19522.76 -> the interesting part. First, we get all the
outgoing edges of this node residual otherwise,
19528.798 -> and then loop over them. If the remaining
capacity is greater than zero, meaning we
19533.968 -> can push flow through that edge, and the next
node that we're going to is unvisited, meaning
19540.578 -> we don't risk creating a cycle, then we can
enter this inner if block right here, inside
19547.77 -> the if block. The first thing I do is call
the depth first search method recursively.
19552.218 -> What I do is I pass in the index of the next
node we want to go to and the new flow value
19559.568 -> which should equal the minimum of the current
flow or the current edge's remaining capacity.
19566.66 -> Remember that that flow parameter is trying
to capture the bottleneck value that intuitively
19573.19 -> makes sense. It's saying either keep the previously
found bottleneck value or if this new edge
19579.628 -> is even smaller than it should be the new
bottleneck value, this process continues recursively
19586.26 -> until a base case is hit and the sink was
reached. This returns the bottleneck of the
19591.149 -> augmenting path, we can then use that value
to augment the flow of our augmenting path.
19597.95 -> However, first check that the bottleneck value
is greater than zero, it could be the case
19603.51 -> that we never actually made it to the sink,
and we hit a dead end. Assuming that's not
19609.2 -> the case, simply augment the flow by increasing
the flow in the forward edge by the bottleneck
19615.998 -> value and decreasing the flow in the residual
19618.818 -> edge by the bottleneck value. After that,
simply return the bottleneck value. This propagates
19625.158 -> it up the stack so that all the other edges
along the augmenting path can also be augmented.
19631.888 -> This also ensures that the bottleneck value
is returned to the solve method where the
19636.378 -> max flow is actually calculated. So that's
19639.638 -> about everything I want to cover for the Ford
Fulkerson method implemented with a depth
19644.418 -> for search. Today we're going to start diving
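// Hedged sketch of the recursive depth first search described above
// (names assumed; mirrors the walkthrough rather than the repository).
private long dfs(int node, long flow) {
  if (node == t) return flow; // base case: reached the sink
  visited[node] = visitedToken; // mark the current node as visited
  for (Edge edge : graph[node]) {
    // Take the edge only if it has remaining capacity and its
    // destination node has not been visited this iteration.
    if (edge.remainingCapacity() > 0 && visited[edge.to] != visitedToken) {
      long bottleneck = dfs(edge.to, Math.min(flow, edge.remainingCapacity()));
      if (bottleneck > 0) { // we actually made it to the sink
        edge.augment(bottleneck); // adjust forward and residual edge flows
        return bottleneck; // propagate the bottleneck value up the stack
      }
    }
  }
  return 0; // dead end: this path never reached the sink
}

Today we're going to start diving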
a little deeper into network flow, we're going
19649.59 -> to talk about unweighted bipartite graph matching,
and specifically how we can use max flow to
19657.6 -> find a matching for us. Before we get started,
though, I should mention what a bipartite
19663.32 -> graph is, a bipartite graph is one whose vertices
can be split into two independent groups,
U and V, such that every edge connects a vertex in
19671.468 -> U to one in V. Other definitions exist, such as:
the graph is two-colorable, or there is no
19680.03 -> cycle with an odd length. Bipartite graphs
19686.43 -> often arise when we're trying to match one
group of items to another in some way. Think
19692.139 -> of situations such as matching suitable candidates
to jobs. There could be multiple jobs or
19698.43 -> multiple candidates, but not every candidate
is suitable for each job. If jobs are red
19705.33 -> nodes and candidates are white nodes, then
there would be an edge between the two if
19710.168 -> the candidate is good fit. Another situation
could be matching surfers to surfboards. Suppose
19717 -> there are multiple servers and multiple surfboards.
But the surfers have preferences and requirements
19723.148 -> for the boards, such as color, size and so
on. Then the same thing happens we placed
19729.068 -> an edge between the surfer and the surfboard
to indicate that they are able to be matched.
19735.54 -> Generally when we're setting up a bipartite
graph, we're interested in what's called a
19741.61 -> maximum cardinality bipartite. Matching. This
is when we've maximized the pairs that can
19747.728 -> be matched with each other. For example, we've
maximize the number of candidates that can
19753.668 -> be matched to jobs or the number of servers
to surfboards. Finding a matching is not unique
19760.478 -> to bipartite graphs. However, you can also
have a matching on a non bipartite graph,
19766.968 -> this variant is a lot harder to solve and
also much less common. Another variant is
19772.498 -> finding a maximum matching on a weighted graph
where you can either maximize or minimize
19780.04 -> the cost of the matching. This variant is
also much harder to solve than the unweighted
19785.998 -> version in the unweighted version, no edge
is better in any sense than any other edge.
19792.25 -> So it makes finding a matching much much easier.
We're mostly going to focus on the top left
19798.878 -> box which is the easiest of the four variants,
but hopefully we'll get to poke around in some
19804.048 -> of the other boxes as well. So if you want
to find a maximum matching on an unweighted
19809.86 -> bipartite graph, you have lots of options,
you can either set up the graph as a flow problem
19815.838 -> and push flow through it, which is what we'll
look at in this video. But you can also repeatedly
19821.54 -> find augmenting paths which maximize the matching
using a depth first search. Or you can use
19827.28 -> the specialized Hopcroft Karp algorithm to
do the same thing a lot faster. If your edges
19834.068 -> are weighted, and your graph is still bipartite,
you also have a lot of options, you can use
19840.048 -> a min cost max flow algorithm, or you can
run the Hungarian algorithm. And lastly, there's
19846.021 -> the more sophisticated network simplex algorithm
which uses linear programming. If however,
your graph is not bipartite, but your edges
19860.398 -> are unweighted, you can use Edmonds' blossom
19860.398 -> algorithm. And lastly, the hardest of the
four variants is when your graph is non bipartite.
19866.838 -> And the edges are weighted. I didn't find
much information about this one online. But
19871.458 -> the recommendation seems to be to use dynamic
programming on smart graphs. Now let's look
19877.28 -> at an example. This is going to be for the
unweighted bipartite case, the easiest of
19882.888 -> the four variants. So I want you to imagine
that there are five people and five books
19888.808 -> in the library and that some people express
interest in some of the books. This results
19895.318 -> in a bipartite graph with people on one side
and books on the other. So far, so good. Now
19902.338 -> suppose we want to find the maximum cardinality
bipartite matching, or in other words, we
19907.718 -> want to match as many people with as many
books as we can. Let's try the greedy approach
19913.898 -> to this matching problem. Let's start with
person green. Their first edge connects to
19920.489 -> the second book on the right side. The second
book is unallocated so person green is matched
19926.78 -> with what is now book green. Next up is person
orange. The first book they want is the same
19935.548 -> book as person green, which is already matched,
so we cannot select person greens book. Their
19943.998 -> next choice is the third book which is unallocated,
so they get matched to that one. Next up is
19950.718 -> person purple, they instantly matched to an
unallocated book on the right hand side. Now
19957.479 -> person red. Red only has one edge, meaning
that they're only willing to read that one
19964.378 -> book. However, that book has already been
allocated to person orange, so person red
19971.19 -> cannot have it. Next up is person Brown. They
also want person oranges book, but they also
19978.86 -> cannot have it. Fortunately, they have other
options of books they're willing to read.
19984.328 -> So person brown gets one of those. So in the
end, the greedy approach only found a matching
19991.228 -> of four, only four people were able to be
matched with books. But can we do any better?
19997.658 -> Is this the true maximum cardinality matching?
Turns out that it's not: a greedy approach
20004.488 -> to the maximum matching problem will not work.
As we just saw, we need a more sophisticated
20011.638 -> approach to ensure that we are able to get
that maximum matching. So we're going to solve
20017.738 -> this maximum matching problem by turning our
problem into a network flow problem and finding
20025.158 -> the max flow. The first thing we're going
to do is make every edge directed and add
20031.408 -> one unit of capacity to each edge. The zero slash
one beside each edge means zero flow and
20040.62 -> a maximum capacity of one. Next we're going
to introduce two new nodes, the source and
20048.908 -> the sink, and hook up edges outwards from the
source to the people with a capacity of one
20055.488 -> and hook up edges from books to the sink also
with a capacity of one. Once that's all set
20061.87 -> up, use any max flow algorithm to push flow through
the network. What this will do is show us
20069.308 -> what edges get populated with flow with that
information, we will be able to reconstruct
20075.86 -> the maximum matching. Here's a graph after
the flow algorithm has run. You can see that
20083.12 -> some of the edges have one unit of flow. Those
were the edges selected by the max flow algorithm.
20090.418 -> The most interesting edges are the middle
edges with one unit of flow. These are the
20095.77 -> edges which formed the maximum cardinality
matching if We call her in the middle edges,
20102.238 -> which have one unit of flow, you can see that
this time everybody goes home with a book
20108.488 -> and no one is left empty handed. Okay, so
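// Hedged sketch: n people occupy nodes 0..n-1, n books occupy nodes
// n..2n-1, plus a source s and a sink t. Indexing and names are assumptions.
int n = 5;
int numNodes = 2 * n + 2;
int s = numNodes - 1, t = numNodes - 2;
NetworkFlowSolverBase solver = new FordFulkersonDfsSolver(numNodes, s, t);

for (int person = 0; person < n; person++)
  solver.addEdge(s, person, 1); // each person can be matched with at most one book
for (int book = n; book < 2 * n; book++)
  solver.addEdge(book, t, 1); // each book may be selected at most once

// Middle edges, one per (person, book) interest pair; for example,
// person 0 expressing interest in book 1:
solver.addEdge(0, n + 1, 1);
// ... add the remaining interest edges the same way.

// The middle edges that end up with one unit of flow form the matching.
long maximumMatching = solver.getMaxFlow();

Okay, so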
now we understand how this basic setup works
20114.708 -> and how it leads to a matching. Let's play
around with this model a little bit to truly
20119.65 -> understand what all the weights here mean.
We originally set the capacity of each edge
20126.45 -> from the source to each person to be one.
But what constraint is that really enforcing?
20133.08 -> I'll let you pause the video and think about
that for a second because it's so important.
20139.978 -> The answer is that that capacity of one ensures
that each person can get up to one book and
20147.488 -> no more. If we increase this number. For some
people, we can allow them to possibly pick
20154.498 -> up more than one book. If we rerun the max
flow algorithm through this network, we see
20162.828 -> that it's now possible for one person to be
matched with multiple books. The next thing
20170.338 -> we want to do is change the flow network to
allow a book to be selected multiple times.
20177.29 -> Pause the video and think about how we can
modify this flow graph to support having multiple
20185.03 -> copies of the same book in the library. I'll
give you a short moment. The number of copies
20194.218 -> of a book is controlled by the capacity of
the edges leading to the sink T. Increasing
20201.57 -> this value will allow more flow to run through
those designated edges. This effectively limits
20209.668 -> or controls the number of copies of a book,
let's change the capacity of those edges leading
20215.668 -> into the sink to allow having multiple copies
of the same book and see what happens. If
20223.138 -> we rerun the max flow algorithm. Once again
through the network, we see that we now have
20228.79 -> people matched with the same book multiple
times because multiple copies exist. For example,
20235.658 -> Book Three and book five, both have two people
grabbing a copy of them, the actual assignment
20242.458 -> of people to books would be as follows. I'll
let the animation play.
20252.85 -> After the flow algorithm has run, if you want
to know how many copies of each book were
20258.738 -> actually given out, you can inspect the flow
value on the edge leading to the sink. Currently,
20266.068 -> each person is only allowed to pick up one
copy of each book, even though there are multiple
20270.968 -> copies of each book. How can we modify the
flow network to support this? You've guessed
20278.69 -> it, we need to modify the edge capacity between
a person and the book to allow that person
20286.638 -> to pick up multiple copies of that book. Today,
we're going to look at how to use network
20293.83 -> flow to actually solve a useful problem. The
problem we're going to tackle is what I call
20299.27 -> the mice and owls problem, which is a slightly
harder variation of another competitive programming
20306.228 -> problem, which I'll link in the description.
I love this problem because of its simple
20310.068 -> and elegant solution, but also it's realistic
real world application. Let's have a look.
20315.248 -> Suppose there are m mice out on a field and
there's a hungry owl about to make a move.
20322.09 -> Assume that the owl can reach every single
one of these mice. Further suppose that there
20328.988 -> are h holes scattered across the ground, and
that each hole has a certain capacity for
20335.558 -> a number of mice they can hide in it. We also
happen to know that every mouse is capable
20341.59 -> of running a distance of r in any direction
before being caught by the owl. The question
20347.838 -> asks: what is the maximum number of mice that
can hide safely before being caught? If you
20353.708 -> want to give this problem a try, now's a good
time to pause the video and try and write
20359.128 -> some code.
20361.558 -> The first step is to figure out which holes
each mouse can reach. Visualize this by drawing
20367.75 -> a radius of r around each mouse. And if
inside the radius, there's a hole or the circle
20374.548 -> touches a hole will assume that the mouse
can make it to the hole safely. So if we draw
20379.828 -> an edge between a mouse and a hole, if the
mouse can make it to that hole, we get the
20385.228 -> following graph. The next step is to actually
match mice to holes to maximize the overall
20392.87 -> safety of the group. By doing a simple quick
inspection, it's clear that not every mouse
20399.03 -> should be matched to any hole, for example,
this orange mouse should probably not try
20406.33 -> and run to the hole with a capacity of three,
because it's the only mouse that can reach
20411.32 -> the hole behind it with a capacity of one,
making any bad decision like this has the
20417.26 -> chance to jeopardize the maximum number of
overall mice, they can hide safely. The key
20424.17 -> realization with this problem is that the
graph is actually bipartite. And once we know
20429.888 -> that, it actually becomes a much simpler problem,
because we can set up a flow graph and run
20435.748 -> a maximum flow algorithm to maximize the overall
number of nice, which can hide safely, here
20442.01 -> are the steps I would do to set the flow graph
and run a max flow. First, I would create
20447.728 -> m mice nodes labeled zero through m minus
one inclusive. Then on the other side, I would
20455.49 -> create h nodes, each representing a hole.
I would label or index these nodes from m
20463.458 -> to m plus h minus one inclusive to give them
a different ID than the mouse nodes, then
20470.218 -> I would place an edge with a capacity of one
between the mouse and the hole. If the mouse
20476.19 -> can reach that particular hole in time. After
that, I would connect an edge with a capacity
20481.718 -> of one from the source to each mouse to indicate
that each mouse node can hold at most one mouse.
20487.378 -> And lastly, connect an edge from each hole
node to the sink node with the capacity of
20493.628 -> the hole. The problem has now been transformed
into a maximum flow problem, we can run any
20499.91 -> maximum flow algorithm to get the maximum
number of mice that can be safe. This is really
20505.42 -> neat. And it's worth looking at some source
code to really understand how this setup works.
20511.4 -> All right, here we are in the source code.
I have laid out some instructions on the top
20515.53 -> here in case you wanted to download the code
and actually play around with it on your machine.
20520.968 -> This program also uses the Ford Fulkerson
flow solver we saw two videos ago. So I highly
20527.708 -> recommend you go and watch that video before
continuing. I'll link to it in the description
20532.308 -> below, just in case you haven't seen it. So
let's get started. The first thing I do here
20537.248 -> is I create a mouse class, which is essentially
a wrapper around a point object. Effectively,
20545.248 -> a mouse is just the point on a plane. I also
do the same thing with the whole class except
20551.29 -> that the whole class with addition to having
a 2d point object, it also has a certain capacity
20557.29 -> because we know that holes can only contain
a certain number of mice. Next up in the main
20565.9 -> method, I create a bunch of mouse objects
and place them in an array. I scatter the mice
20573.048 -> more or less randomly across the field. And
then I do the same thing with holes. The last
20577.818 -> thing I do in the main method is call the
solve method which actually takes as input
20582.78 -> the two arrays we just created and a radius.
The radius is how far a mouse can run from
20589.318 -> its current position before being caught by
the owl. The solve method is where things
20595.61 -> really start to get interesting. Let's define
some constants that will make our lives a
20601.1 -> lot easier. First is m, which is just the number
of mice; then h, the number of holes we have.
20609.058 -> Following that I compute n the number of nodes,
which is the number of mice plus the number
20615.328 -> of holes plus two. The plus two is to account
for the source and the sink node. And as per
20621.668 -> convention, I always index s and t, the source and
the sink, to indices n minus one and n minus
20628.648 -> two to ensure that they are unique. After
that I initialize the network flow solver
20635.86 -> base by providing n, s and t. The solver class is
defined below. It's the exact same one from
20643.668 -> the Ford Fulkerson source code video. In short,
the solver lets you add edges with various
20649.51 -> capacities to the flow graph and then find
the max flow once it's all set up. The goal
20654.238 -> of this video is not to explain to you how
the max flow itself is found, or how the solver
20660.03 -> works. I already discussed that previously.
What I really want to focus on in this video
20665.11 -> is how to set up the flow graph for this problem
and push some flow through it when the graph
20670.85 -> is bipartite. Like it is in this problem.
The setup is actually pretty straightforward.
20677.03 -> The first step is to hook up edges from the
source s to each mouse with a capacity
20683.18 -> of one. Intuitively, this limits each mouse
node to represent at most one mouse. This
20690.36 -> is necessary because we don't want a mouse
node to represent more than one mouse that
20694.62 -> doesn't really make sense. The next part is
to hook up mouse nodes with hole nodes in
20701.29 -> the flow graph. This is the middle section
of the flow graph where we add an edge between
20707.94 -> a mouse node and a hole if the distance from
a mouse to the hole is less than the radius.
20714.988 -> In other words, if the mouse can make it to
the hole on time, add an edge connecting the
20720.36 -> mouse and the hole. The last step is also
important, you need to remember to hook up
20726.888 -> the edges between the holes and the sink.
These edges are slightly different, because
20732.588 -> their capacity represents the number of mice
which can fit into each particular hole, say
20737.77 -> a hole has a capacity of three. But there
are five mice which can make it to that hole.
20744.04 -> Well, we cannot allow more than three of those
mice to fit in the hole. So we set the capacity
of the edge to the sink to be three; those
20754.67 -> two leftover mice will need to find another
hole, or get scooped up by the owl. The
very last thing we need to do to actually
20760.638 -> get the max flow is to run the solver, which
will return the number of safe mice, which
20767.398 -> for this configuration happens to be four. Putting the pieces together, a hedged sketch of this solve method (class, field and helper names assumed) could look like:
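// Hedged sketch of the flow-graph setup for the mice and owls problem.
static void solve(Mouse[] mice, Hole[] holes, double radius) {
  int m = mice.length, h = holes.length;
  int n = m + h + 2;        // plus two for the source and the sink
  int s = n - 1, t = n - 2; // unique indices for the source and sink
  NetworkFlowSolverBase solver = new FordFulkersonDfsSolver(n, s, t);

  for (int i = 0; i < m; i++) {
    solver.addEdge(s, i, 1); // each mouse node represents at most one mouse
    for (int j = 0; j < h; j++) {
      // Middle edge only if the mouse can make it to the hole in time.
      if (mice[i].point.distance(holes[j].point) <= radius)
        solver.addEdge(i, m + j, 1);
    }
  }
  // The capacity to the sink is the hole's capacity, limiting how many mice fit.
  for (int j = 0; j < h; j++)
    solver.addEdge(m + j, t, holes[j].capacity);

  System.out.println("Number of safe mice: " + solver.getMaxFlow());
}

We're still talking about network flow. And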
20772.81 -> today's topic is one of my all time favorite
flow problems, which is the elementary math
20778.85 -> problem. This problem is so interesting, because
its solution uses flow. But it really doesn't
20787.158 -> hit you as a flow problem to begin with. One
of the hardest things about network flow,
20792.28 -> I believe is actually identifying that a problem
is a flow problem, and then setting it up
20799.85 -> as such. This is why I'm spending so many videos
solving some flow problems, so you really
20806.388 -> start understanding how to approach them and
how they are solved. So let's dive into the
20812.478 -> elementary math problem. This problem is actually
on Kattis. Should you want to attempt it,
20817.94 -> the link is at the bottom of this slide and
also in the description. Here's the problem
20822.93 -> statement. Ellen is a math teacher who is
preparing n questions for her exam. In each
20829.718 -> question, the students have to add, subtract
or multiply a pair of numbers. Ellen has already
20836.468 -> chosen the n pairs of numbers; all that remains
is to decide for each pair which of the
20843.29 -> three possible operations the students should
perform. To avoid students getting bored.
20850.16 -> Ellen wants to make sure that the n correct
answers on her exam are all different. For
20856.628 -> each pair of numbers (a, b), in the same order
as the input, output a line containing a valid
20863.85 -> equation. Each equation should consist of
five parts: a, one of the three operators,
20871.87 -> b, an equals sign, and the result of the expression.
All n expression results must be different.
20879.978 -> If there are multiple valid solutions, output
any of them; if there are no valid answers,
20885.92 -> output a single line with the string "impossible"
instead. Let's have a look at an example.
20892.958 -> So Ellen goes and picks four pairs of numbers,
say one and five, three and three, four and
20899.498 -> five. And lastly minus one and minus six.
She wants to assign operators either plus
20906.11 -> minus or multiply to yield the unique answers
on the right hand side of the equation. One
20912.62 -> assignment of operators might be the following.
However, this assignment of operators doesn't
20918.1 -> quite work because the answers are not unique
on the right hand side. Here's another way
20923.61 -> of assigning operators to the pairs of numbers.
This assignment does work because the answers
20929.718 -> are unique on the right hand side, which is
one of the requirements for the problem. So
20934.6 -> we just saw that not any arbitrary assignment
of operators yields a valid solution. But
20941.988 -> it's also possible for no answer to exist,
consider the following pairs of numbers. In
20947.83 -> this case, there can be no solution because
there are not enough unique answers that can
20952.818 -> be produced using the operators plus minus
and multiply. As it happens, this problem
20960.36 -> presents itself as a network flow problem,
even though that might not be obvious at first.
20966.298 -> So take a moment and attempt to set up a flow
graph that can actually solve this problem.
20972.79 -> It's actually a really great exercise along
the way. While you're doing this, there are
20978.76 -> a few questions you should ask yourself, or
at least that I asked myself when I first
20983.548 -> did this. The first is there a way that this
problem can be simplified into a bipartite
20990.59 -> graph. I asked myself this because I know
that solving a flow problem when it's a bipartite
20996.668 -> graph can be done very efficiently and also
because by Titan graphs are very easy to set
21002.6 -> up, then I asked myself, how am I going to
detect impossible sets of parents? Will my
21009.708 -> flow graph be able to handle that? Or do I
need to do some pre or post processing to
21014.37 -> actually figure that out? And lastly, I'm
thinking about edge cases. So how, how do
21020.17 -> I handle multiple repeated input pairs? And
how is I going to change the flow graph? These
21025.771 -> are all super important questions you need
to ask yourself when solving this problem,
21029.94 -> this slide deck explains the first two. And
the third is somewhat left as an exercise,
21035.75 -> I don't want to give away the full solution
to this really awesome problem. So thinking
21039.798 -> about how we're going to solve this problem
a little more, a key realization to make is
21044.87 -> that for every input pair, at most three unique
solutions are produced, think of the input
21051.978 -> pair, two and three. Well, for that pair,
we can either add two and three, subtract
21058.69 -> two and three, or multiply two and three.
So there can be at most three unique results,
21064.53 -> there may be less if there are collisions,
think of the input pair (0, 0): 0 plus 0 is 0, 0 multiplied
21071.058 -> by 0 is 0, and 0 subtracted by 0 is also
zero. So we may end up with less than three
21077.74 -> unique solutions, and that's fine. The great
thing about this is that we can easily set
21082.58 -> up a bipartite flow graph from this because
we can have input pairs on one side and solutions
on the other side. Let's see if we can set up
21095.02 -> the flow graph and solve this set of input
pairs: we have the pairs (1, 5), (3, 3), (-1, -6),
21103.068 -> and finally (2, 2). So how we're going
21103.068 -> to set up this bipartite graph is we're going
to have input nodes on the left side and answer
21109.148 -> nodes on the right side, for our first input
pair one, five, if we compute one minus five,
21116.568 -> one plus five, and one multiplied by five,
we get minus four, six, and five, which become
21124.468 -> answer nodes on the right hand side, then
we want to attach an edge between that input
21129.228 -> pair and the answer. Do the same thing for
the next input pair, make an input pair node
21135.15 -> and attach edges to the answer nodes. However,
don't create another answer node if there
21140.809 -> already exists one with the value we need.
In this example, you'll see that three plus
21146.658 -> three equals six, and we already have an answer
node for six. So simply attach an edge from
21153.128 -> three three to six do not create another answer
node. This is to ensure that our answers remain
21159.83 -> unique. And do the same thing for the other
two remaining input pairs, you'll notice that
21166.45 -> the last input pair only produced two outgoing
edges and not three. This is because there
21172.69 -> was a collision in particular, two plus two
equals four, but also two multiplied by two
21179.338 -> equals four, and this is fine. Just put one
edge don't put two edges. Then like every
21185.02 -> bipartite graph you're trying to find a matching
for, you'll want to add the source s and the
21190.748 -> sync T. And the matching is really what we're
after. Here, we want to match input pairs
21196.35 -> to answers. And then we've actually solved
the problem. The next step, after adding the
21201.728 -> source and the sink is to actually assign
capacities to the edges of the flow graph.
21207.75 -> Let's start on the right side, the capacities
from the answer nodes to the sink should all
21213.208 -> have a capacity of one since the answers need
to be unique and limiting the edge capacity
21218.808 -> to one ensures that. Capacities from the input
pairs to answers should also have a capacity
21225.29 -> of one, since only one of plus, minus or multiply
should actually be matched. And the capacities
21233.298 -> from the source to the input pairs should
reflect the frequency of the input pair. In
21240.088 -> this example, all frequencies are one, but
as we know, that's not always the case. A rough sketch of building this bipartite flow graph in code, with assumed helper names, could be:
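// Hedged sketch using java.util.{Map, HashMap, Set, HashSet}. Pair nodes
// occupy indices 0..numPairs-1; answer nodes are created lazily via a map.
Map<Long, Integer> answerNode = new HashMap<>();
int nextNodeId = numPairs;
for (int i = 0; i < numPairs; i++) {
  long a = pairs[i][0], b = pairs[i][1];
  solver.addEdge(s, i, 1); // in general, use the frequency of the pair (a, b)
  Set<Long> seen = new HashSet<>(); // avoid duplicate edges after a collision
  for (long result : new long[] { a + b, a - b, a * b }) {
    if (!seen.add(result)) continue; // e.g. 2 + 2 == 2 * 2, so only one edge
    Integer node = answerNode.get(result);
    if (node == null) { // reuse an existing answer node to keep answers unique
      node = nextNodeId++;
      answerNode.put(result, node);
      solver.addEdge(node, t, 1); // each answer may be used at most once
    }
    solver.addEdge(i, node, 1);
  }
}

Now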
21245.818 -> the flow graph is set up, let's run a max
flow algorithm on it. The flow algorithm does
21252.468 -> its thing and some edges are filled with flow.
These are the edges that were selected to
21257.728 -> be part of the maximum flow. From this, we
can derive what the matching was. More specifically,
21265.558 -> we're interested in the middle edges. Those
are the edges which give us information about
21270.59 -> the matching. every edge in the middle with
one unit of flow represents a matching from
21276.418 -> an input pair a B to its answer. For example,
the input pair one five was matched to the
21283.04 -> answer node six because there's one unit of
flow going through that edge. From this we
21288.488 -> can even deduce the operator used for each
matching, which is actually needed for the
final output. This can be done by trying which
21301.158 -> of the plus, minus or multiply operators
results in the found matching. Basically,
21307.51 -> we first solve the problem by figuring out which
21307.51 -> answers we get and then working backwards
to figure out which operator was used. In
21312.988 -> theory, we could tag each middle edge with
what operator was used, but I didn't bother
doing that; it's more work. Let's wrap
up this problem, the last thing we need to
21323.468 -> do is look at the matchings and figure out
what operators were used. The first matching
21329.12 -> is the input pair, one five matched to six.
So we ask ourselves, which of the three operators
plus, minus or multiply results in one
21342.818 -> and five equaling six? So we try all three options,
21342.818 -> and we figure out that, hey, one plus five
is six. So the operator is the plus, then
21349.148 -> we move on to the next pair, and then we do
the same thing.
21356.26 -> If there are multiple operators that result
in the right answer, pick any of them. And
21362.26 -> that's basically it, we can verify that all
our operators yield the correct result and
21367.93 -> that all our answers are unique. I didn't
go into great detail on how to support multiple
21373.7 -> repeated pairs, but I'll leave that as an
exercise to the listener. Today, we're going
21378.888 -> to probe even further into network flow. We're
going to be talking about a specific implementation
21385.37 -> of the Ford Fulkerson method, which is the
Edmonds Karp algorithm. Edmonds Karp is another
21392.218 -> maximum flow algorithm, which uses a different
technique to find augmenting paths through
21397.98 -> the flow graph. Before we get started, let
me give you a refresher on what we're trying
21404.128 -> to do. We are trying to find the maximum flow
on a flow graph because we know that finding
21411.02 -> the maximum flow is really useful for finding
bipartite matchings and also to solve a whole
21418.7 -> host of problems. So far, we've looked at
one other technique to find the maximum flow,
21424.93 -> which is to use the Ford Fulkerson method
with a depth first search. At a high level,
21430.228 -> it says that all we want to do is repeatedly
find augmenting paths from the source to the
21436.02 -> sink, augmenting the flow, and then repeat
this process until no more paths exist. The
21442.148 -> key takeaway here is that the Ford Fulkerson
method does not specify how to actually find
21449.2 -> these augmenting paths. So this is where we
can optimize the algorithm. A few videos ago,
21456.36 -> we saw that the Ford Fulkerson method can
be implemented with a depth first search to
21461.62 -> find the maximum flow. However, the pitfall
with that technique was that the time complexity
21467.728 -> depended on the capacity values of the edges
in the graph. This is because the depth first
21475.19 -> search picks edges to traverse in such a way
that we might only ever be able to push one
21481.208 -> unit of flow in each iteration. This is really
bad and can kill the time complexity even
21487.5 -> though it's highly unlikely to happen in practice,
but it's absolutely something we want to avoid,
21495.18 -> should it happen. Right now the time complexity
of Ford Fulkerson with a depth first search
21500.43 -> is big O of E times f, where E is the number
of edges and f is the maximum flow. The idea
21509.02 -> behind Edmonds Karp says that instead of using
a depth first search to find augmenting paths,
21516 -> we should use a breadth first search instead,
to get a better time complexity. big O of
21521.798 -> V times E squared may not look like a better
time complexity, but it actually is. What's
21528.68 -> different is that the time complexity while
it might not look great, does not depend on
21534.12 -> the capacity value of any edge in the flow
graph, which is crucial. Recall that such an algorithm,
21540.42 -> one that doesn't depend on the actual input values,
is called a strongly polynomial algorithm, and that's
21546.548 -> exactly what Edmonds Karp is and why it was
so revolutionary at the time. Edmonds Karp
21552.99 -> can also be thought of as an algorithm which
finds the shortest augmenting path from s
to t, that is, in terms of the number of edges
21564.888 -> used, in each iteration. Using a breadth first
21564.888 -> search during Edmonds Karp ensures that we
find the shortest path This is a consequence
21570.668 -> of each edge being unweighted. When I say
unweighted, I mean that as long as the edge
21576.79 -> has a positive capacity, we don't distinguish
it between one edge being a better or worse
21583.568 -> than any other edge. Now, let's look at why
we might care about using Edmonds Karp. Suppose
21590.628 -> we have this flow graph and we want to find
what the maximum flow is. If we're using a
21595.478 -> depth first search, we might do something
like this. Start at the source Do a random
21601.54 -> depth first search forwards.
21611.748 -> So after a lot of zigzagging through the flow
graph, we are able to find the sink. As we
21617.159 -> just saw a depth first search has the chance
to cause long augmenting paths and longer
21623.248 -> paths are generally undesirable because the
longer the path, the higher the chance for
21628.478 -> a small model neck value, which results in
a longer run time. Finding the shortest path
21634.28 -> from s to t, again in terms of number of edges
is a great approach to avoid the depth first
21640.808 -> search worst case scenario and reduce the
length of augmenting paths to find the shortest
21646.578 -> path from s to t do a breadth first search
starting at the source and end to get the
21651.388 -> sink while exploring the flow graph. Remember
that we can only take an edge if the remaining
21658.54 -> capacity of that edge is greater than zero.
In this example, all edges outwards from s
21664.95 -> have a remaining capacity greater than zero.
So we can add all the neighbors to the queue
21670.668 -> when we're doing the breadth first search
step. And then we keep going forwards, so
21677.808 -> add all reachable neighbors to the queue and
continue. And now the breadth first search
21686.128 -> has reached the sink, so we can stop. In the
real algorithm, we would stop as soon as any
21693.648 -> of the edges reached the sink. But just for
symmetry, I show three edges here entering
21698.498 -> the sink, while in reality, we would stop
as soon as one of them reaches the sink. If
21704.218 -> we assume that the bottom edge made it to
the sink first, and we retrace the path, we
21708.908 -> get the following augmenting path. But we
didn't just find any augmenting path, we found
21714.1 -> a shortest length augmenting path. So to augment
the flow, do the usual find the bottleneck
21720.908 -> value by finding the smallest remaining capacity
of all the edges along the path, then augment
21726.111 -> the flow values along the path by the
bottleneck. So that was the first path; however,
21731.748 -> we're not done yet. Let's continue finding
paths until the entire graph is saturated.
21739.02 -> Recall that while exploring the flow graph,
we can only reach a node if the remaining
21744.62 -> capacity of the edge to get to that node is
greater than zero. For instance, all the reachable
21751.34 -> neighbors of the source node in this case
does not include the bottom left node because
21757.06 -> the edge from the source to the bottom left
node has a remaining capacity of zero. All
21763.2 -> right, keep exploring until the sink is reached.
And now we've reached the sink once more.
21775.1 -> So find the bottleneck value along this path.
Then use the bottleneck value to update the
21781.84 -> flow along the augmenting path. Don't forget
to update the residual edges. And we're still
21788.968 -> not done because there still exists another
augmenting path. So now there only exists
21796.53 -> one edge outwards from the source with a capacity
greater than zero, so it's the only edge we
21802.51 -> can take. So we follow it. There's also only
one edge to follow from the second node because
21807.61 -> the other edges have a remaining capacity
of zero.
21812.99 -> And now the breadth first search has reached
the sink, we can trace back the edges that
21818.048 -> were used. We can find the bottleneck by finding
the minimum capacity along the path and also
21825.26 -> augment the flow. And now you can see that
there are no more augmenting paths left to
21830.51 -> be found because all the edges leading outwards
from the source have a remaining capacity
21835.718 -> of zero. However, more generally, we know
to stop Edmonds Karp, when there are no more
21841.818 -> augmenting paths from s to t, because we know
we cannot increase the flow anymore. If this
21847.95 -> is the case, the maximum flow we get from
running Edmonds Karp is the sum of the bottleneck
21853.241 -> values. If you recall in the first iteration,
we were able to push five units of flow in
21859.498 -> the second iteration 10 units and in the last
iteration, five units for a total of 20 units
21865.93 -> of flow. Another way to find the maximum flow
is the sum the capacity values going into
21872.12 -> the sink, which I have circled in red. In
summary, this is what we learned using depth
21878.748 -> first search on a flow graph can sometimes
find a long, windy path from the source to
21884.12 -> the sink. This is usually undesirable because
the longer the path, the smaller the bottleneck
21888.86 -> value, and the longer the runtime. Edmonds
Karp tries to resolve this problem by finding
21894.628 -> the shortest length augmenting paths from
the source to the sink using a breadth first
21899.318 -> search. However, more importantly, the big
achievement of Edmonds Karp is that its time
21905.28 -> complexity of big O of V times E squared
is independent of the max flow. So it doesn't
21911.408 -> depend on the capacity values of the flow
graph. And that's Edmonds Karp in a nutshell.
21917.17 -> Today, we're going to have a look at some
source code for the Edmonds Karp algorithm.
21922.26 -> Alright, here we are in the source code written
in Java, I've laid out some instructions here
21927.668 -> in the header in case you wanted to download
the code, play around with it and run it yourself.
21933.418 -> If I scroll down, you can see that we still
have the same setup as before with the edge
21938.218 -> class, right here and the network flow base
solver. But there is one important change
21946.668 -> I have made since the Ford Fulkerson video.
And that is I have added three new methods.
21952.588 -> If we scroll down, you can see that three
new methods are right here. The three methods
21959.79 -> I added abstract away visiting nodes and marking
all known says unvisited now, this is all
21966.048 -> done efficiently internally, through the network
flow based solver class using a visited token
21972.18 -> you don't have to worry about it also helps
readability for anybody who's new to the code.
21977.308 -> Alright, now let's have a look at the Edmonds
Karp solver which is the only thing different
21982.85 -> in this file. First, notice that the Edmonds
Karp solver extends the network flow solver
21989.828 -> base. In doing so we get a whole bunch of
things for free, including the ability to
21995.218 -> construct a flow graph, before we push flow
through it. In the constructor for the Edmonds
22000.718 -> Karp solver, all I do is call the superclass
constructor. This performs various initializations,
22006.908 -> including allocating memory for the flow graph
and registering which nodes are the source
22010.94 -> and sink. The most important method in the
Edmonds Karp solver is the solve method right
22019.7 -> here. The solve method is called just before
we get the maximum flow, this method is really
22026.968 -> short. All we do is repeatedly find augmenting
paths from the source to the sink until the
22034.24 -> flow we get is zero, at which point we know
that the graph is fully saturated and no more
22039.578 -> augmenting paths can be found. Line by line,
what we do is mark all nodes as unvisited
22045.99 -> before each iteration, run a breadth first
search and get the bottleneck value, and then
22052.12 -> sum the overall bottleneck values to calculate
the maximum flow. Now let's take a closer
22057.898 -> look at the breadth first search method. The
first thing I do is initialize an empty queue
22064.44 -> data structure. Because I know that we're
going to need one to do a breadth first search
22068.76 -> after the creation of the queue. What I do
is I visit the source node and add it to the
22073.638 -> queue so that the breadth first search starts
at the source.
22080.41 -> Then do your standard breadth first search
loop. While there are still nodes in the queue,
22085.308 -> remove the first node found in the queue.
If it's the sink, stop, otherwise iterate
22090.02 -> through all valid adjacent neighbors, we can
add a node to the queue if it is not already
22095.718 -> visited, and the edge leading to the node
has a capacity greater than zero. However,
22101.718 -> before we add the node to the queue, we visit
it and track where it came from by placing
22107.878 -> an edge in the prev array to rebuild the augmenting
path later on. Alright, so moving on, we know
22115.52 -> that the breadth first search did not actually
make it to the sink if we have no entry at
22121.228 -> the index of the sink in the prev array, so
we can return early. After this point, we know
22127.04 -> that there exists an augmenting path. Since
we know an augmenting path exists, we can
22132.408 -> find the bottleneck value that is the smallest
remaining edge capacity along the path. We
22138.468 -> do that by starting at the sink and reconstructing
the augmenting path going backwards by repeatedly
22144.12 -> reaching into the prev array until we are
back at the source, then we need to update
22149.85 -> the flow along the augmenting path to adjust
the flow values. So once again loop through
22154.77 -> the edges forming the augmenting path, then
the augment method takes care of increasing
22159.888 -> the flow along the forward edges and decreasing
the flow along the residual edges. The very
22165.35 -> last thing to do is to return the bottleneck
value so that we can sum the max flow in
22171.558 -> the solve method. A condensed sketch of the solve method and this BFS, mirroring the walkthrough with assumed names, might be:
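// Hedged sketch of the solve loop and BFS described above; uses
// java.util.Queue and java.util.ArrayDeque, with assumed helper names.
@Override
public void solve() {
  long flow;
  do {
    markAllNodesAsUnvisited();
    flow = bfs();
    maxFlow += flow; // the sum of the bottleneck values is the max flow
  } while (flow != 0); // stop once the graph is fully saturated
}

private long bfs() {
  Queue<Integer> queue = new ArrayDeque<>();
  visit(s);
  queue.offer(s); // start the breadth first search at the source
  Edge[] prev = new Edge[n]; // tracks the edge used to reach each node
  while (!queue.isEmpty()) {
    int node = queue.poll();
    if (node == t) break; // reached the sink, so stop
    for (Edge edge : graph[node]) {
      if (edge.remainingCapacity() > 0 && !visited(edge.to)) {
        visit(edge.to);
        prev[edge.to] = edge;
        queue.offer(edge.to);
      }
    }
  }
  if (prev[t] == null) return 0; // sink unreachable: no augmenting path
  // Retrace the path backwards from the sink to find the bottleneck.
  long bottleneck = Long.MAX_VALUE;
  for (Edge edge = prev[t]; edge != null; edge = prev[edge.from])
    bottleneck = Math.min(bottleneck, edge.remainingCapacity());
  // Augment the flow along the path, forward and residual edges alike.
  for (Edge edge = prev[t]; edge != null; edge = prev[edge.from])
    edge.augment(bottleneck);
  return bottleneck;
}

And that's basically it for Edmonds Karp. To actually build a flow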
22177.818 -> graph. Have a look at the example right here
in the main class. It sets up the flow graph
22183.1 -> from the previous video and pushes flow through
it. You can see down here where we actually
22189.078 -> create the solver and run the solver to get
the maximum flow and then finally display
22194.978 -> the resulting graph after the maximum flow
has been pushed through it. So this is really
22200.37 -> handy to understand. So please have a look
at this in more detail. If you're struggling
22205.888 -> to understand it and Karp, today, we're still
talking about network flow. And in particular,
22211.558 -> we're going to cover something called capacity
scaling, which is really more of a heuristic
22217.328 -> than it is an algorithm. Capacity scaling
is a heuristic which says that we shouldn't
22223.218 -> attempt to push flow only through the largest
edges first, and then allow using edges which
22228.78 -> have smaller capacities and do this to achieve
the maximum flow more rapidly. Just before
22235.2 -> we dive into capacity scaling, I want to quickly
revisit finding the max flow using a depth
22241.17 -> first search and the issues surrounding that.
I keep coming back to this because I think
22245.628 -> it's important that we understand the intuition
behind why all these new max flow algorithms
22251.978 -> were developed and why they came about. When
we're looking at finding augmenting paths.
22258.16 -> The worst case is when we can only augment
the flow by one unit, and it goes something
22264.28 -> like this, we start at the source node, we
take any edge with a remaining capacity greater
22271.27 -> than zero. And we just keep going until we
reach the sink. And once we've reached the
sink, we find the bottleneck value, that is,
the edge with the smallest remaining capacity
22284.588 -> along our augmenting path, which in this case
happens to be one. Then we argument or update
22291.628 -> the flow by adding the bottleneck value to
the flow along the forward edges and subtracting
22297.588 -> flow by the bottleneck value along the residual
edges. However, we're not done. So we're going
22303.968 -> to start once again at the source and start
finding another path. Suppose this time we
22310.15 -> take the edge going down, then take the residual
edge going up and sideways, and then down
22317.088 -> again. And now we have found another augmenting
path, and we can find its bottleneck value.
22323.968 -> Recall that the remaining capacity of an edge
is calculated as the capacity minus the flow.
22330.74 -> This allows residual edges with a negative
flow to have a positive remaining capacity.
22337.78 -> Notice that yet again, the bottleneck value
for this path is only one unit of flow. Now
22345.138 -> update or augment the flow. Do this by adding
the bottleneck value to the flow along the
22350.77 -> forward edges and subtracting the flow by
the bottleneck value along the residual edges.
22358.7 -> You could imagine the depth first search algorithm
repeatedly taking an edge with a capacity
22364.478 -> value of one each time, which would ultimately
limit how much flow we can push through the
22370.238 -> network in each iteration, as shown in the
next few slides. So it would look like this
22375.558 -> we just keep alternating between the forward
and the residual edge with a capacity of one.
22383.158 -> Capacity scaling is the idea that we should
prioritize taking edges with larger capacities
22388.828 -> first to avoid ending up with a path with
a small bottleneck. If we adjust the size
22394.798 -> of each edge based on its capacity value,
then we can more easily visualize which edges
22401.04 -> we should give more attention to the capacity
scaling algorithm is pretty straightforward.
22408.83 -> But first, we need to define two variables
that we will need. Let u equal the value of
22415.248 -> the largest edge capacity in the initial flow
graph. And also let Delta be the largest power
22424.27 -> of two which is less than or equal to the
value of u. The capacity scaling heuristic
22430.66 -> says that we should always take edges whose
remaining capacity is greater than or equal
22435.85 -> to delta in order to achieve a better runtime.
But that's not everything to the algorithm.
22443.4 -> The algorithm will repeatedly find augmenting
paths through the flow graph which have a
22449.058 -> remaining capacity greater than or equal to
delta, until no more paths satisfy this criterion.
22456.708 -> Once this criterion is no longer met, what
we do is decrease the value of delta by dividing
22463.498 -> it by two. And then we repeat this process
while delta is greater than zero. A hedged sketch of this outer loop, assuming a dfs helper that only takes edges with a remaining capacity of at least delta, might be:
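// Hedged sketch of the capacity scaling loop; assumes a field u holding
// the largest initial edge capacity, and a dfs that respects delta.
@Override
public void solve() {
  delta = Long.highestOneBit(u); // largest power of two <= u
  while (delta > 0) {
    long flow;
    do {
      markAllNodesAsUnvisited();
      flow = dfs(s, INF); // only takes edges with remainingCapacity >= delta
      maxFlow += flow;
    } while (flow != 0);
    delta /= 2; // halve the threshold and repeat
  }
}

So the reason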
22470.998 -> you would want to implement capacity scaling
is because it's very easy to code up and it
22476.94 -> works very, very well in practice. In terms
of time complexity, capacity scaling with
22484.37 -> a depth first search runs in big O of E squared
log Q. And in big O of E times v log you if
22492.838 -> the shortest augmenting path is found, which
is basically Edmonds Karp but with capacity
22498.54 -> scaling, although I have found that to be
much slower, so I would recommend the depth
22503.728 -> for search if you are going to implement this.
Let's do an example, let's find the maximum
22509.488 -> flow of the following flow graph using capacity
scaling. First, compute you as the maximum
22516.62 -> scaling. First, compute u as the maximum
of all initial capacity values. In this example,
22528.208 -> u is the maximum over the initial edge capacities,
which happens to be 14. Next, compute the
22533.61 -> starting value for delta, which is the largest
power of two less than or equal to u, which
22539.77 -> of delta is eight since the next power of
two after eight is 16. But 16 is larger than
22546.298 -> 14. Now that we have delta, we can start finding
paths from s to t, which have a remaining
22553.488 -> capacity greater than or equal to eight, start
our search at the source from the source,
22559.7 -> there's only one edge which has a remaining
capacity of eight or more, which is the edge
22564.838 -> with the capacity of 14 going downwards, then
there's the edge sideways with a remaining
22571.53 -> capacity of 10 we can take, and finally an
edge with the remaining capacity of 12 going
22577.04 -> upwards, which we can also take. Now we've
reached the sink. So we can find the bottleneck
22582.548 -> value which is 10. Because 10 is the smallest
remaining capacity along the found path. Next,
22589.578 -> augment the flow along the path, I scaled
down the size of each edge to reflect how
22594.94 -> much remaining capacity they have left, you
can analyze the flow graph, but there are
22600.07 -> no more augmenting paths from s to t which
have a remaining capacity greater than or
22604.898 -> equal to eight. So the delta value is halved,
and is now equal to four. One path we
22611.488 -> can take with all remaining capacities of
four or more is the following. Start at the
22617.908 -> source. Go up, sideways and sideways again,
then do the usual find the bottleneck and
22627.6 -> augment the flow. There is also another path
with all remaining capacities greater than
22633.44 -> four which we can take from s to t, which is
down to node one, diagonally up to node two and
22646.09 -> to the sink. Again, find the bottleneck value,
22646.09 -> which we know to be four, because four is
the smallest remaining capacity along the path,
22651.718 -> then we can augment the flow. If you now inspect
the flow graph, there are no more paths with
22657.738 -> all remaining capacity values greater
than or equal to four from s to t, so halve
22664.54 -> the value of delta to two. However, there
are also no paths with remaining capacities
22669.888 -> of all two or more. So we need to halve the
value of delta again. So now delta is equal
22675.35 -> to one. I believe there is one remaining path
we can take before the graph is fully saturated.
22682.048 -> Let's start at the source and find it.
22687 -> Alright, now we found the path. And we can
also find the bottleneck which has a value
22694.718 -> of one. And now the last step is to augment
the flow. And now there are no more paths
22701.298 -> from s to t which have a remaining capacity
greater than or equal to one. So the new value
22706.468 -> of delta is zero, which terminates the algorithm,
we can compute the maximum flow by summing
22712.5 -> up all the bottleneck values we found in each
iteration, which we know to be 10, 5, 4
22719.191 -> and 1, for a total of 20. We can also compute
the maximum flow by summing the flow values
22726.798 -> going into the sink t, highlighted in red. So in
summary of what we have learned, we
22733.19 -> know that Ford Fulkerson implemented with
a depth first search can result in having
22737.83 -> a bottleneck value of one in each iteration,
which kills the time complexity. Capacity
22744.25 -> scaling is when we push flow only through
larger edges first, to try and achieve a better
22750.69 -> runtime. One approach to capacity scaling
is to maintain a decreasing parameter Delta,
22757.2 -> which acts as a threshold for which edges
should be accepted and which should be rejected
22764.228 -> based on their remaining capacity. This is
a pretty simple but extremely powerful idea
22769.818 -> that greatly speeds up finding the maximum
flow. Today we're going to have a look at
22774.79 -> some source code for the capacity scaling
algorithm. Okay, here we are in the source
22780.468 -> code written in Java. I've laid out some instructions
here in the header in case you actually want
22785.958 -> to get the code play around with it and run
it yourself. Scrolling down you can see we
22791.17 -> have the familiar edge class here. This is
the class used to represent an edge that connects
22797.378 -> two nodes with a certain capacity. If I scroll
a little further down, we have the network
22804.088 -> flow solver base, which acts as a template
for all the different flow algorithms we have
22809.24 -> been implementing. I have already covered
how these two classes work in the previous
22814.308 -> videos linked below. So please have a look
at those before continuing. However, the class
22819.11 -> we're really interested in is the capacity
scaling solver. Right here. The capacity scaling
22826.408 -> solver is an implementation of the network
flow solver base, which uses capacity scaling
22832.35 -> to find the maximum flow. You will notice
that I have defined one new instance variable
22838.398 -> in this class, which is called Delta. This
is the same Delta that we saw from the slides,
22844.35 -> it's the parameter we use to determine whether
an edge should be accepted or rejected based
22849.95 -> on the remaining capacity relative to the
value of delta. The constructor for this class
22855.798 -> simply calls the super class's constructor
to initialize the flow graph and allocate
22859.95 -> some memory that we will need to actually push
flow through the network. Just below is the
22865.078 -> Add edge method. The Add edge method is particularly
interesting. For capacity scaling to work,
22870.738 -> we need to know the value of the edge with
the largest capacity in our flow graph. Since
22875.43 -> we also need to construct the flow graph to
actually do anything interesting, we can capture
22880.498 -> the largest capacity value as we build the
graph. The implementation of the Add edge
22885.441 -> method is defined in the network flow solver
base, which we don't actually want to change
22891.18 -> the functionality of. So inside this add edge
method, which I'm overriding here, what I do is
22897.18 -> call the super class's add edge method.
And I also initialize delta to be the largest
22902.42 -> capacity we encounter as edges come through.
Simple enough.
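Sketched from that description (reconstructed, not copied from the repository), the override might look like this:

@Override
public void addEdge(int from, int to, long capacity) {
  super.addEdge(from, to, capacity); // build the graph exactly as the base class does
  delta = Math.max(delta, capacity); // track the largest capacity seen so far
}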
22911.578 -> Inside the solve method, which gets called to compute the maximum flow, the
first thing we do is initialize delta to be
22917.03 -> the largest power of two less than or equal
to the largest capacity. And one way to do
22921.93 -> this is to find the floor of the base two
logarithm and then raise two to that power,
22927.668 -> or in Java, you can simply use the
built-in function Long.highestOneBit to do that
22933.508 -> for you more efficiently. Following that we
repeatedly find augmenting paths from the
22938.4 -> source to the sink using only edges with the
remaining capacity greater than or equal to
22943.36 -> delta. After each iteration, we halve the value
of delta to allow taking smaller edges and
22949.61 -> being able to find more augmenting paths from
the source to the sink until the graph is
22954.62 -> fully saturated. Inside the inner loop, we
mark all the nodes as unvisited, then we do
22959.75 -> a depth first search and sum over the bottleneck
values to calculate the maximum flow. We repeatedly
22965.04 -> do this until delta is equal to zero.
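To illustrate that delta initialization, here is a tiny comparison; the variable names are mine, not the repository's:

long maxCapacity = 14; // e.g. the largest edge capacity from the earlier example
// Floor of the base-two logarithm, raised back to a power of two:
long d1 = (long) Math.pow(2, (int) (Math.log(maxCapacity) / Math.log(2)));
// The faster built-in bit trick, which isolates the highest set bit:
long d2 = Long.highestOneBit(maxCapacity);
// Both d1 and d2 are 8, the largest power of two <= 14.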
22969.5 -> Now let's have a look at the depth first search
method. The depth first search method takes
22974.328 -> two arguments: the current node and the minimum
flow found along the path so far. When we
22980.941 -> initially call this method, the source node
is the current node and the flow is set to
22986.69 -> positive infinity. This method performs the
depth first search recursively. And we know
22992.09 -> we can stop searching when we have reached
the sink node t. If the current node is not
22998.548 -> the sink node, then visit the current node
and iterate through all the neighbors of the
23003.748 -> current node. However, here's the catch:
we cannot take an edge going to a neighboring
23008.49 -> node if the remaining capacity of that edge
is smaller than delta because this violates
23013.718 -> the capacity scaling heuristic. We must also
ensure that the node we're going to has not
23018.958 -> already been visited. We do this to avoid
cycles in the flow graph. Inside the inner
23024.03 -> if statement we call the depth first search
method recursively, passing the node we're going
23028.748 -> to as the current node and the new flow as
the minimum of the current flow and this edge's
23034.888 -> remaining capacity. The depth first search returns
the bottleneck value along the augmenting
23039.818 -> path. So after the depth first search call,
we are unwinding the call stack from the sink
23045.7 -> back to the source. This is a perfect time
to augment the flow of each edge along the
23050.558 -> augmenting path since we have the bottleneck
value right there. So if the bottleneck value
23055.508 -> is greater than zero, this means we have found
a valid augmenting path and we want to augment
23060.53 -> the flow, which is remember adding flow along
forward edges and subtracting flow along residual
23067.328 -> edges. This is all done through the augment
method in the edge class. And finally, we return
23073.29 -> the bottleneck value.
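Here is a compact sketch of the depth first search just described; names like graph, visit, visited and delta mirror the walkthrough, but the code itself is a reconstruction, not the repository's exact source:

private long dfs(int node, long flow) {
  if (node == t) return flow; // reached the sink; flow holds the bottleneck so far
  visit(node);                // hypothetical helper: mark the current node as visited
  for (Edge edge : graph[node]) {
    long cap = edge.remainingCapacity();
    // Capacity scaling heuristic: only take edges whose remaining capacity >= delta.
    if (cap >= delta && !visited(edge.to)) {
      long bottleneck = dfs(edge.to, Math.min(flow, cap));
      if (bottleneck > 0) {       // found an augmenting path; augment while unwinding
        edge.augment(bottleneck); // adds flow forward and subtracts along the residual edge
        return bottleneck;
      }
    }
  }
  return 0; // no augmenting path through this node
}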
If we scroll down even more, you can see that this is the main method
23079.2 -> right here. In here I set up an example of
how to set up a flow graph. Specifically,
23084.75 -> this is the flow graph from the slides. So
I create a flow solver. I add all the edges
23090.548 -> and then I push some flow through it and get
the maximum flow. I also display the resulting
23095.308 -> flow graph after the flow algorithm has been
executed so you can see what happened. Awesome.
23101.418 -> That's all I wanted to cover for capacity
scaling. Today, we're still talking about
23105.158 -> network flow. And in particular, we're looking
at finding the maximum flow and a new, very
23111.35 -> efficient method of solving the unweighted
bipartite matching problem. Dinic's algorithm
23116.52 -> is one of those extremely fast and revolutionary
algorithms, which really push the field of
23121.6 -> network flow forwards. It was one of, if not
the first algorithm to introduce a bunch of
23127.17 -> new concepts like building a level graph,
combining multiple graph traversal techniques
23132.28 -> together and the concept of a blocking flow,
all of which we'll get into. So what is the
23138.918 -> Dinic's algorithm? It's a fast, strongly polynomial
maximum flow algorithm. The fact that it's
23144.931 -> strongly polynomial is important, it means
that the run time doesn't depend on the capacity
23150.12 -> values of the flow graph, which for all we
know could be very large. What's remarkable
23156.068 -> about Dinic's is that not only is it fast in
practice for general graphs, but it boasts
23161.908 -> performance on bipartite graphs, running in
a time complexity of big O of square root of
23167.578 -> V times E. The importance of this cannot be
overstated. It makes it possible to handle
23173.76 -> bipartite graphs of a ridiculous size. If
you're doing competitive programming, Dinic's
23180.11 -> is the de facto standard algorithm to solve
maximum flow problems. The algorithm was
23185.328 -> conceived in 1969 by Yefim Dinitz and published
in 1970. The algorithm was later modified
23192.458 -> slightly and popularized by Shimon Even,
who mispronounced Dinitz's algorithm as Dinic's
23198.988 -> algorithm. Let's start by talking about the
algorithm itself. But first, beginning with
23205.17 -> an analogy. Suppose you and a friend are planning
to meet up at the coffee shop a few streets
23211.248 -> east of where you are. You've never been to
this coffee shop, and you don't exactly know
23216.1 -> where it is, but
23217.248 -> you know, it's somewhere east. So how would
you get there with the information you have?
23222.718 -> Would it make sense to head south? What about
Northwest? The only sensible directions are
23229.68 -> East, Northeast and Southeast. This is because
you know that those directions guarantee
23235.958 -> that you make positive progress towards
the coffee shop. This form of heuristic ensures
23242.95 -> that we continuously make progress towards
whatever place of interest we desire to go.
23248.86 -> So how can we apply this concept to solving
the maximum flow? In this analogy, you were
23255.85 -> the source node and the coffee shop is the
sink. The main idea behind Dinic's algorithm
23261.898 -> is to guide augmenting paths from the source
to the sink using the level graph. And in
23268.509 -> doing so, greatly reduce the runtime. The
way Dinic's determines what edges make progress
23274.248 -> towards the sink T and which do not is by
building what's called a level graph. The
23280.628 -> levels of a graph are those obtained by doing
a breadth first search from the source. Furthermore,
23287.5 -> an edge is only part of the level graph, if
it makes progress towards the sink, that is
23293.158 -> an edge must go from a node at level l to
another node at level l plus one, the requirement
23300.149 -> that edges must go from L to L plus one prunes
backwards edges and what I call sideways edges. Those
23307.068 -> are all the gray edges in the slide. So ask
yourself if you're trying to get from s to
23314.29 -> t as quickly as possible, does it make sense
to take the red edge going in the backwards
23320.148 -> direction on the slide? No, taking the red
edge doesn't bring you any closer to the sink,
23326.28 -> so it should only be taken if a detour is
required. This is why backwards edges are
23332.53 -> omitted from the level graph. The same
thing can be said about edges which cut
23337.638 -> sideways across the same level since no progress
is made. It's also worth mentioning that residual
23345.308 -> edges can be made part of the level graph
but they must have a remaining capacity greater
23351.67 -> than zero. So that's the level graph.
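In code, that membership test reduces to a one-line check. This helper is purely illustrative, assuming a level array produced by the breadth first search:

// Does edge e, leaving node 'at', belong to the level graph?
boolean inLevelGraph(int at, Edge e) {
  return e.remainingCapacity() > 0 && level[e.to] == level[at] + 1;
}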
The actual steps to executing Dinic's are as follows: First,
23359.228 -> construct a level graph by doing a breadth
first search from the source to label all
23363.67 -> the levels of the current flow graph. Then,
if the sink was never reached, while building
23369.37 -> the level graph, you know you can stop and
return the value of the maximum flow. Then
23374.62 -> using only valid edges in the level graph
do multiple depth first searches from the
23380.85 -> source to the sink until a blocking flow is
reached and sum over the bottleneck values
23386.86 -> of all augmenting paths to calculate the maximum
flow as you do this. Repeat steps 1 to 3. A blocking
23393.978 -> flow is when we cannot find any more paths
from the source to the sink because too many
23400.04 -> edges in the level graph have been saturated.
This will all become clear with an example,
23406.2 -> let's use Dinic's algorithm to find the maximum
flow of this flow graph. If this were a bipartite
23412.18 -> graph, we would also be able to get a maximum
matching as a result. All right, step one
23417.43 -> is to figure out which edges are part of the
current level graph. You don't need to think
23422.248 -> of the level graph as a totally separate
graph, you can think of it rather as a subset
23427.53 -> of the edges. So we start at the source and
do a breadth first search outwards. The first
23433.12 -> layer includes all the red nodes, then this
is the second layer, and so on until we reach
23440.338 -> the sink. Now, if we focus on the edges, which
form the level graph, we can see that they
23446.498 -> are all edges which go from L to L plus one
in level and have a remaining capacity greater
23452.6 -> than zero. Step two of the algorithm is to
find paths from s to t until a blocking flow
23459.89 -> is reached. That is, we cannot find any more
paths through the level graph. So we start
23465.548 -> at the source and do a depth first search
on the edges of the level graph until the sink
23469.838 -> is reached. So we've found our first augmenting
path and the bottleneck value along this path
23479.04 -> is five since five is the smallest remaining
capacity, so update the flow values along
23484.978 -> the path by five. If you inspect the graph,
the blocking flow has not yet been reached,
23490.758 -> since there still exist paths from s to t.
Start once again at the source and do a depth
23498.468 -> first search forwards. Now we found another
path, this one has a bottleneck value of 15.
23508.1 -> So augment the flow along the path by 15 units.
Now let's try and find another path from s
23514.408 -> to t.
23519.398 -> What happens now is that we get stuck performing
the depth first search. There are no edges
23525.128 -> in the level graph with a remaining capacity
greater than zero, which can lead us to the
23530.218 -> sink. So the blocking flow has been reached,
we just finished the first blocking flow iteration.
23536.988 -> Now we reset and rebuild the level graph.
This time it should look different because
23542.218 -> the remaining capacities of multiple edges
have changed. Start at the source and expand outwards,
23548.14 -> taking all edges with a remaining capacity
greater than zero, which in this case is only
23553.398 -> the middle edge leading us to the red node,
the top edge going outwards from the source
23558.01 -> is saturated and so is the one going downwards.
We keep doing this and building the level
23563.148 -> graph layer by layer. Awesome. So this is
our new level graph. You can see that this
23572.61 -> time we have one extra layer to play with.
Let's try and find a path from s to t. Once
23579.83 -> again, we start at the source and probe forwards
using only edges part of the level graph.
23587.91 -> Oops, we have now reached a dead end in our
depth first search because we can no longer
23595.68 -> go forwards. What we need to do is backtrack
and keep going until we reach the sink.
23609.77 -> Perfect, we made it to the sink. The current
path has a bottleneck value of 10. Now augment
23615.2 -> the flow by 10 units. And now if you inspect
the flow graph, you will notice that the blocking
23621.378 -> flow has once again been reached. Now no more
flow can be pushed through the network when
23627.84 -> we build the level graph, which means the
algorithm terminates. The maximum flow is
23633.68 -> the sum of all the bottleneck values which,
if you recall, were 5, 15 and 10, for a maximum
23641.298 -> flow of 30. The maximum flow can also be calculated
by looking at the flow values of the edges
23649.548 -> leading into the sink highlighted in red on
the slide. However, one of the pitfalls of
23657.17 -> the current implementation of Dinic's algorithm
at the moment is that it may encounter multiple
23663.738 -> dead ends during a depth first search phase.
This is especially bad if the same dead end
23671.888 -> is taken multiple times during a blocking
flow iteration. To resolve this issue in his
23679.27 -> original paper Dinitz suggested cleaning
the level graph and getting rid of all the
23686.008 -> dead ends before each blocking flow phase.
Then later in 1975, Shimon Even suggested
23694.048 -> pruning dead ends when backtracking during
the depth first search phase, effectively getting
23699.918 -> rid of dead ends on the fly as the algorithm
executes. This trick greatly speeds up and
23706.35 -> simplifies the algorithm because dead ends
are only ever encountered once. Awesome. So
23713.09 -> that's basically everything you need to know
about Dinic's. So let's summarize everything
23717.908 -> that we've learned. First, we talked about
the motivation behind Dinic's, and why having
23723.128 -> a guiding heuristic can greatly speed up our
algorithm. Then we talked about the intuition
23729.29 -> and practicality behind having a level graph
that directs edges towards the sink. Then
23735.76 -> we talked about the concept of a blocking
flow, which is achieved by doing multiple
23739.998 -> depth first searches on the level graph until
the graph is saturated.
23745.03 -> Afterwards, we looked at the process of rebuilding
the level graph, and finding the blocking
23751.53 -> flow and doing this process repeatedly until
no more augmenting paths exist and the maximum
23757.578 -> flow is found. And lastly, we talked about
a critical optimization of Dinic's algorithm,
23764.26 -> which is pruning dead ends so that we do not
encounter them again. Today, we're going to
23769.738 -> have a look at some source code for Dinic's
algorithm. Okay, let's get started. Here we
23776.079 -> are in the source code written in Java. I
laid out some instructions here in the header
23781.578 -> in case you wanted to get the code, play around
with it and run it yourself. Scrolling down
23788.298 -> as before, you can see the familiar edge class.
This class is used to represent an edge that
23794.79 -> connects two nodes with a certain capacity.
It also has two important methods: remaining
23801.818 -> capacity, which returns the true remaining
capacity of an edge, along with the augment
23807.85 -> method, which updates the flow along this
edge and also the residual edge by a certain
23813.628 -> amount.
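Condensed down, an Edge class with those two methods might look roughly like this (a sketch based on the description, not the file's verbatim contents):

public static class Edge {
  public int from, to;
  public Edge residual; // the paired reverse edge
  public long flow;
  public final long capacity;
  public Edge(int from, int to, long capacity) {
    this.from = from; this.to = to; this.capacity = capacity;
  }
  public long remainingCapacity() { return capacity - flow; }
  public void augment(long bottleneck) {
    flow += bottleneck;          // add flow along the forward edge
    residual.flow -= bottleneck; // subtract flow along the residual edge
  }
}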
A little further down is also the network flow solver base, which acts as a
23820.52 -> template for all the different flow algorithms
we have been implementing. I already covered
23825.308 -> how this class and the edge class work
in previous videos linked below, so I won't
23830.208 -> spend too much time here. But what you need
to know is that this class initializes the
23834.568 -> flow graph, it allows edges to be added to the flow
graph, and it lets us call the get max flow method,
23841.218 -> which is somewhere down here, right here.
Internally, the get max flow method calls
23846.79 -> the abstract solve method, which we need to
implement by subclassing the network flow
23852.238 -> solver base. So the part that we are really
interested in is this Dinic's solver right here.
23859.738 -> You will notice that the Dinic's solver class
extends the network flow solver base. The network
23864.628 -> flow solver base gets initialized when we call
super by feeding it the three inputs n, s and
23870.94 -> t. n is the number of nodes in our graph, s
is the index of the source node and t is the
23876.87 -> index of the sink node. Just after that
I initialize an array instance variable called
23883.61 -> level, to be of size n. The level instance
variable keeps track of the level of each
23890.27 -> node in our level graph. Moving on, the
following method is the solve method. Recall
23897.009 -> that this is the method that we need to override
23903.02 -> and compute the maximum flow in. Remember
what we're trying to do: Dinic's algorithm begins
by building a level graph using a breadth
first search; that is the outer loop. And for
each level graph, we need to find the blocking
flow by repeatedly doing multiple depth first
searches from the source to the sink until
23918.968 -> the level graph is saturated, and the blocking
flow is reached. Once that happens, rebuild
23924.568 -> the level graph and repeat the process until
23929.85 -> the graph is truly saturated.
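A minimal sketch of that solve loop, assuming bfs and dfs helpers shaped like the ones walked through next (a reconstruction, not the exact source):

@Override
public void solve() {
  int[] next = new int[n]; // per-node edge indices used for dead-end pruning
  while (bfs()) {          // rebuild the level graph; false once the sink is unreachable
    java.util.Arrays.fill(next, 0); // allow previously pruned edges again
    // Find augmenting paths until the blocking flow is reached.
    for (long f = dfs(s, next, Long.MAX_VALUE); f != 0; f = dfs(s, next, Long.MAX_VALUE)) {
      maxFlow += f; // sum the bottleneck values
    }
  }
}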
Let's have a look at the breadth first search method. So
the breadth first search method really serves
23935.758 -> two purposes. One is to build the level graph
and assign a level to each node in the level
23941.138 -> array. And the other purpose is captured by
the return value of the function. And that
23946.648 -> is to determine if we are able to reach the
sink during the breadth first search phase.
23951.92 -> And if not, this means that the graph is fully
saturated and the algorithm can stop. The
23957.558 -> first thing I do in this method is mark each
23962.548 -> node as unvisited by setting each entry in
23962.548 -> the level array to be
23964.058 -> minus one. Then I initialize a queue data
structure that we will need when performing
23968.798 -> the breadth first search. After that I immediately
add the source node to the queue. That's because
23974.248 -> we're starting the breadth first search at
the source node. Since we're already at the
23978.52 -> source node, we can mark the distance to the
source node to be zero. Then we start the
23983.068 -> breadth first search loop: while the queue
is not empty, each iteration we remove the
23988.218 -> first node index we find in the queue and
iterate through all the adjacent edges of
23993.27 -> that node. When building the level graph,
we want to ensure two things: first, that the
23998.238 -> remaining capacity of the edges we take are
greater than zero and that we are selecting
24003.37 -> unvisited nodes. If both those cases hold,
then we can compute the level for that node
24008.758 -> we're about to visit and add it to the queue.
This process continues until the queue is
24013.258 -> empty and the entire level graph is built.
The last thing we do is return whether we were
24018.18 -> able to reach the sink node during the breadth
first search phase.
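Putting those steps together, the breadth first search might be sketched as follows; field names follow the walkthrough, but the code is a reconstruction rather than the repository's exact method:

private boolean bfs() {
  java.util.Arrays.fill(level, -1); // mark every node as unvisited
  java.util.Deque<Integer> queue = new java.util.ArrayDeque<>();
  queue.offer(s);
  level[s] = 0; // the distance from the source to itself is zero
  while (!queue.isEmpty()) {
    int node = queue.poll();
    for (Edge edge : graph[node]) {
      // Only take edges with remaining capacity > 0 that lead to unvisited nodes.
      if (edge.remainingCapacity() > 0 && level[edge.to] == -1) {
        level[edge.to] = level[node] + 1;
        queue.offer(edge.to);
      }
    }
  }
  return level[t] != -1; // true if the sink was reached while building the level graph
}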
24023.82 -> Okay, coming back to the solve method, now we understand how the breadth
first search method works and how the level
24028.888 -> graph is constructed. Now let's have a look
at the depth first search method. However, before
24034.16 -> we do that, there's a key piece of information
you need to know about and that is the next
24038.66 -> array in this method. The next array is part
of the Shimon Even optimization, and it is
24044.19 -> how we are able to prune dead ends efficiently.
The idea is that since our graph is stored
24050.17 -> as an adjacency list, the list of edges going
outwards from each node is indexed. And we
24056.34 -> can use this to our advantage to get the next
edge to traverse and skip all the edges,
24062.548 -> which we know lead to dead ends. Say we're
at node i and we take the first edge in our
24067.87 -> adjacency list for node i. Suppose that this
turns out to lead us to a dead end. Well,
24073.388 -> next time, as in the next depth first search,
in which we encounter the same node, we should
24078.738 -> not take the first edge in the adjacency list
for that node, because we know it will lead
24083.27 -> us to a dead end. The next array is a way
of tracking for each node, which edge we should
24088.89 -> take next. Each iteration, you want to reset
the next array to allow taking previously
24094.548 -> forbidden edges. All right, so we call the
depth first search method and we pass in three
24100.408 -> arguments: the current node being the source,
the next array and the minimum flow along
24105.62 -> the path, which starts at positive infinity,
then for each augmenting path that we find
24111.008 -> sum over the bottleneck values to compute
the maximum flow. All right, let's have a
24115.43 -> look at the depth first search method itself.
The depth first search method takes three
24120.958 -> arguments, the current node, the next array,
and the minimum flow along
24125.708 -> the path so far. This method performs a depth
first search recursively. And we know we can
24131.03 -> stop searching when we have reached the sink
node t. Then I capture the number of edges
24136.75 -> going out of this node. The for loop loops
through all the edges. While we have not tried
24142.54 -> taking each edge for the current node, the
next edge to take is the next outgoing edge
24147.798 -> from this node at the index in the next array.
The thing we have to watch out for is that
24153.27 -> we must ensure that the selected edge has
a remaining capacity greater than zero, and
24158.69 -> that it goes up a level. Remember that we're
always trying to make progress towards the
24163.36 -> sink and taking an edge at the next level
guarantees that unless of course, it leads
24168.738 -> to a dead end. But we end up pruning those
so it doesn't really matter. So if all goes
24174.77 -> well, we get to enter the inner if statement.
Inside the inner if statement, we call the
24179.53 -> depth first search method recursively, passing
in the node we're going to as the current
24184.918 -> node and the next array and the flow as the
minimum of the current flow and the edge's
24190.36 -> remaining capacity. The depth first search returns
the bottleneck value along the augmenting
24195.168 -> path. After this depth first search call, we
are unwinding the call stack, if you will,
24201.48 -> and we're going from the sink back
towards the source. This is a perfect time to
24206.568 -> augment the flow for each edge along the augmenting
path, since we already know what the bottleneck
24212.078 -> value is. So if the bottleneck value is greater
than zero, meaning we actually found an augmenting
24217.43 -> path, augment the flow, which means to add
flow along the forward edge and subtract flow
24222.878 -> along the residual edge. And once all that
is done, simply return the bottleneck value.
24228.398 -> So assuming we were not able to take the selected
edge from the current node, because it did
24234.568 -> not have enough remaining capacity, or didn't
increase in level or we hit a dead end or
24239.628 -> whatever reason, we need to mark the selected
edge as invalid so we can prune it in future
24246.498 -> iterations. This is exactly what the next[at]++
line does, which gets executed
24253.02 -> after each iteration of the loop. It increments
the index of the next edge to take at the current
24259.62 -> node.
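Tying those pieces together, here is a sketch of the depth first search with the next-array pruning just described (reconstructed under the walkthrough's naming, not the exact file):

private long dfs(int at, int[] next, long flow) {
  if (at == t) return flow; // reached the sink
  final int numEdges = graph[at].size();
  for (; next[at] < numEdges; next[at]++) { // next[at]++ permanently skips dead-end edges
    Edge edge = graph[at].get(next[at]);
    long cap = edge.remainingCapacity();
    // Take the edge only if it has spare capacity and goes up exactly one level.
    if (cap > 0 && level[edge.to] == level[at] + 1) {
      long bottleneck = dfs(edge.to, next, Math.min(flow, cap));
      if (bottleneck > 0) {     // augment the flow while unwinding the call stack
        edge.augment(bottleneck);
        return bottleneck;      // note: next[at] is not advanced on success
      }
    }
  }
  return 0; // dead end: every outgoing edge from this node has been exhausted
}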
If we scroll down to the main method, you see that I show you how to set up a flow
24265.44 -> graph by initializing the flow solver and
pushing some flow through the graph. In particular,
24271.748 -> this is the flow graph from the slides in the last
video. So you can verify that the maximum
24277.43 -> flow we get should be 31.
Source: https://www.youtube.com/watch?v=09_LlHjoEiY