Learniverse

Data Structures in Python

00:00

This is a beginner-friendly introduction to common data structures and algorithms in Python.

00:06

This course is taught by Aakash N S, the co-founder and CEO of Jovian.

00:12

Data structures and algorithms in Python is a practical, beginner-friendly,

00:16

coding-focused online course that will help you improve your programming skills,

00:21

solve coding challenges and ace technical interviews.

00:25

You can also earn a verified certificate of accomplishment by completing this course.

00:30

Learn more and register at pythondsa.com.

00:33

The course runs over six weeks, with two-hour video lectures every week featuring live interactive

00:39

coding using the Python programming language. You will get a chance to practice and improve your

00:44

coding skills with weekly programming assignments consisting of real interview questions, and

00:50

you will also build a course project that you can showcase on your resume or LinkedIn profile.

00:55

This is a beginner-friendly course, and some basic programming knowledge will help you follow along

01:00

with the course. Don't worry if you're new to programming; you can learn it as you work on this

01:04

course with a little extra effort. You will also get access to the course community forum, where you

01:10

can ask questions, participate in discussions and share what you're working on during the course.

01:15

The course is created by Jovian, a platform for learning data science and machine learning

01:20

with a global community of tens of thousands of learners from over 150 countries.

01:26

I'm your instructor, Aakash, co-founder and CEO of Jovian, and I'm really excited to

01:30

kick off this course with you. Register now and invite your friends to join the course at pythondsa.com.

01:44

Hello and welcome to Data Structures and Algorithms in Python.

01:48

This is an online certification course brought to you by Jovian, and today we are at Lesson 1:

01:55

Binary Search, Linked Lists and Complexity Analysis. My name is Aakash; I'm the CEO and co-founder of

02:04

Jovian, and I will be your instructor. You can find me on Twitter at @aakashns.

02:11

This course runs over six weeks, and over those six weeks, if you enroll for the course,

02:16

work on four programming assignments and build a course project, you can earn a certificate of

02:21

accomplishment. Along the way you will also learn about common data structures and algorithms

02:26

in Python and how to use these skills to ace coding interviews and technical assessments.

02:33

So let's get started. To begin, we need to go to the course website, pythondsa.com.

02:40

If you open up pythondsa.com in your browser, it will bring you to this page. This is the

02:45

course page, and you can watch an introductory video about the course here. You can enroll in the

02:49

course for free. You will need to sign in to Jovian; you can use your Google, GitHub or email

02:56

account to sign in. Once you're enrolled in the course, you can also invite your

03:01

friends to join. The course is still open for enrollment, so please invite your friends

03:06

and colleagues. This course is a beginner-friendly introduction to common data structures and algorithms,

03:14

and it will help you prepare for coding interviews. We have coding-focused, hands-on

03:19

video tutorials every week. You can either follow along with this video, pausing to run

03:25

the code as we speak and practicing coding on the cloud, or you can watch the video right now

03:31

and practice later. In this course we will solve questions from real programming

03:38

interviews, and you can earn a verified certificate of accomplishment. So let's go to Lesson 1: Binary

03:46

Search, Linked Lists and Complexity. On the Lesson 1 page you can see a recording of the lesson

03:53

once it is completed, and you will also be able to see a Hindi version here. All the code used in

04:00

this lesson is linked below. The first set of code that we will look at today is called Linear

04:08

and Binary Search, so let's open it up. This is the first tutorial that we will work through

04:14

in this lesson, and you will be able to work through it as well. This is part one, and there are

04:21

a total of 12 notebooks, or tutorials, that we will go through. Now, the course assumes very little

04:30

background in programming and mathematics, but you still need to know a little. For instance,

04:35

you need to know basic programming with Python: things like variables, data types,

04:39

loops and functions. Don't worry if you don't know them already; you can click through and follow

04:44

these links. Each of these is a separate tutorial that will take you about

04:49

half an hour or so, and you can learn basic programming with Python in just a

04:54

couple of hours. You will also need to know some high school mathematics, and if you want to brush

05:00

up on things like polynomials, vectors, matrices and probability, you can click through and read

05:05

these. But no prior knowledge of data structures or algorithms is required, and you do not need to have

05:11

an extensive coding background. We will cover any additional mathematical and theoretical concepts

05:17

as we need them along the way. So, how do you run the code? What you will see here is some

05:25

explanations, and then you will also see some code. You can see here that there is some code written

05:29

here: a library is imported, and a function from the library is used.
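As a rough sketch of such a cell, assuming it uses Python's standard `math` module (the transcript later mentions `math.sqrt` and `math.ceil`; the input values here are made up for illustration):

```python
# A typical notebook cell: import a library, then use a function from it.
import math

# math.sqrt returns the square root as a float.
root = math.sqrt(49)
print(root)        # 7.0

# Swapping in another function from the same module:
# math.ceil rounds a number up to the nearest integer.
rounded_up = math.ceil(2.3)
print(rounded_up)  # 3
```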

05:36

Now, to run the code you have two options: you can either run this code using free online resources,

05:42

which is what we recommend, or you can run it locally on your computer by following these instructions.

05:47

I'm going to use the free online resources provided by Jovian, so we just scroll up to the top of

05:53

this page, click Run and then click Run on Binder. This will take a second or two,

06:00

and what we're doing here, essentially, is setting up a machine for you on the cloud using

06:05

software called Binder. It's open-source software. Now, what you were looking at here

06:10

was actually not a blog post; it's something called a Jupyter notebook. A Jupyter

06:17

notebook can contain not only explanations but also code, and you can

06:25

look at the code and its outputs right here in an interactive fashion. If I scroll down here,

06:30

you can see that we have all the same content that we were looking at, except this time we can actually

06:35

run this code. We can click the Run button here, and it will run the code. Here we

06:41

click the Run button again, and that is going to run the second cell of code. Now, we will be using

06:47

Jupyter notebooks extensively throughout this course because they are a great way to

06:52

do interactive programming. You can change the code: for example, instead of math.sqrt, you can

06:58

use math.ceil, and you can change the value here. So Jupyter notebooks are great for experimenting

07:05

with code. Now, just a couple of tips. As soon as you run a Jupyter notebook,

07:12

you can click on Kernel and then Restart & Clear Output. What this will do is

07:19

remove all the pre-executed outputs from the notebook. You can now see that the output of the

07:25

function is gone and the numbers here go away. Now you can execute the code

07:30

cell by cell yourself and discover the output. One other thing you can do,

07:39

if you want to hide the UI a little, is to toggle the header and the toolbar.

07:45

You might still need the toolbar for the Run button, but here's a tip: instead of pressing

07:50

the Run button, you can use Shift + Enter. Pressing Shift + Enter will execute

07:55

a cell, and that's a pretty handy shortcut. So once again, you go to the lesson page,

08:06

where you will find a link to the notebook called Linear and Binary Search.

08:12

On the Linear and Binary Search page you can read the explanations, but you can't run the code there. To run the code,

08:16

you need to click Run and then select Run on Binder. Clicking Run on Binder will

08:22

set up a cloud machine for you and all the code that you see here will get executed on the cloud.

08:29

So you do not need to set up anything on your computer or download anything;

08:33

we've done all that for you. So let's get started. This course takes a

08:40

coding-focused approach to learning, and in each notebook or tutorial we will focus on solving

08:45

one problem and then learn the techniques, algorithms and data structures to devise

08:50

an efficient solution for that specific problem. We will then generalize the technique and apply it

08:55

to other problems. In this specific tutorial, we will focus on solving this problem, and here's

09:01

the problem we're solving. This is a typical problem that you will come across in a coding

09:06

challenge or a coding interview. Here's how the problem goes: Alice has some cards with numbers written

09:12

on them. She arranges the cards in decreasing order and lays them out face down in a

09:17

sequence on a table. This is what it looks like: these are the cards, and each card has a number

09:23

below it, and the numbers are in decreasing order. She challenges Bob to pick out the card

09:28

containing a given number. For example, she could say, "Bob, I want you to pick out the number 7

09:34

by turning over as few cards as possible." This is the puzzle that's given to us, and we're not

09:40

told how many cards Alice has. We need to write a function to help Bob locate the card.

09:45

Alice can put down any number of cards, and the target number that Bob has to pick out could be

09:51

anything. So we have to give Bob not the solution to one specific instance but a general

09:57

strategy that he can use to turn over as few cards as possible. For instance, look at these

10:04

7 cards, imagine some numbers written below them, and try to figure

10:10

out a strategy; start thinking about the problem. This may seem like a simple problem,

10:16

especially if you're familiar with the concept of binary search, but the strategy and technique

10:20

that we're learning here will be widely applicable and we will soon use it to solve harder problems.

10:27

Now, before we think about the problem and start solving it, I just want to talk about

10:32

why you should learn data structures and algorithms. Whether you're pursuing a career in

10:36

software development or data science, it's almost certain that you will be asked programming

10:40

problems like reversing a linked list or balancing a binary tree in a technical interview or

10:45

coding assessment. Now, you will rarely face these exact problems in your job as a software

10:50

developer, so it's fair to wonder why such problems are asked in interviews. They're asked

10:55

because they demonstrate the following traits, which are very important for a programmer.

11:01

Number one, you can think about a problem systematically and solve it systematically,

11:06

step by step. Number two, you can envision the different inputs, outputs

11:11

and edge cases for your problem, because programs, once you put them out in the wild as part of software,

11:16

can encounter any kind of input, and when you have thousands or millions of users you will encounter

11:23

any and every possible input. Often this has security implications: it can take down the

11:29

server, it can take down your application, or it can cause a loss of data or property.

11:36

Number three, you can communicate your ideas clearly to co-workers, which is a very important part of

11:40

problem solving. And number four, most importantly, you can convert your thoughts and ideas into working code

11:46

that is also readable to other people. So it's not really the knowledge of specific

11:51

data structures or algorithms that's tested in an interview, but your approach to

11:55

the problem. You may fail to solve the problem but still clear the interview, or

12:00

vice versa: you may solve the problem and still not clear the interview. So in this course we will

12:06

focus on the skills to both solve the problem and clear the interview successfully. That's why

12:13

you need to learn data structures and algorithms. Coming back to the problem at hand, now you

12:20

have the problem, you may have been thinking about it, and maybe you have some ideas on how to

12:24

solve it. Your first instinct might be to just start writing the code for it, but that is not

12:29

the optimal strategy: you may actually end up spending longer on the problem due

12:34

to coding errors, or you may not be able to solve it at all. So what we are going to cover

12:39

here is a systematic strategy that you should apply in interviews, on problems

12:45

in coding assessments, or in general whenever you're faced with a problem like this.

12:51

So here's the strategy that we will apply. Step one: state the problem clearly and

12:56

identify the input and output formats. Step two: come up with some example inputs and outputs,

13:02

and try to cover all the edge cases. Step three: come up with a correct solution for the problem.

13:09

It can be as simple as possible; state it in plain English. Step four, which is sometimes

13:14

optional: implement the solution and test it using the example inputs,

13:19

and fix any bugs. Step five: analyze the algorithm's complexity

13:26

and identify any inefficiencies. And finally, step six: apply the right technique to overcome

13:32

the inefficiency, then go back to step three, which is to come up with a new correct solution

13:37

that is also efficient, then implement that solution and analyze the algorithm's complexity.

13:42

This is the technique that we will apply over and over during the six weeks to many

13:47

different problems, and applying the right technique is where knowledge of common data structures

13:52

and algorithms comes in handy. So let's jump into the solution.

13:58

Step one: state the problem clearly. You will often encounter detailed word problems

14:03

in coding challenges and interviews that go on for paragraphs. For

14:06

instance, here we are talking about Alice having a deck of cards, arranging them,

14:11

putting them out on a table, talking to Bob, and so on. The first step is to state the problem

14:18

clearly and precisely in abstract terms, because computers don't understand people and

14:23

don't understand cards; computers understand numbers. So in this case, we can represent the sequence

14:29

of cards as a list of numbers. A list is a basic data structure in Python.

14:35

Turning over a specific card is equivalent to accessing the value of the number

14:41

at a certain position in the list. For instance, think of this set of cards as being represented

14:48

by this list; you can see here that the list is sorted in decreasing order. Then turning over a

14:53

certain card is equivalent to accessing that specific element from the list. Take turning over card

14:58

number two, or, as we say in computer science, card number one, because the first card is card number zero.

15:04

This is something you might want to get into your head: whenever you're counting positions,

15:09

always start counting from zero; otherwise you may run into many off-by-one errors.

15:14

So this is position zero and this is position one. If you turn over the card at position one,

15:19

it is as good as accessing the element at position one of the list, which in this case turns out to be 11.
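A small sketch of the zero-based indexing described here. The card values are an assumption chosen to match what the transcript states: the list is in decreasing order, the card at position 1 is 11, and 7 sits at position 3.

```python
# Hypothetical list of cards, sorted in decreasing order (values assumed for illustration).
cards = [13, 11, 10, 7, 4, 3, 1, 0]

# "Turning over" a card is the same as reading the element at that position.
first_card = cards[0]   # position 0 -> 13
second_card = cards[1]  # position 1 -> 11 (the "second" card, counting from zero)
print(first_card, second_card)  # 13 11

# The last valid position is len(cards) - 1, not len(cards);
# cards[len(cards)] would raise an IndexError, a classic off-by-one error.
last_card = cards[len(cards) - 1]
print(last_card)  # 0
```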

15:25

So these are the positions in the list, starting from zero. Now, what we have to figure out

15:30

is how many elements we need to access. We want to access the minimum number of elements

15:36

needed to get to a particular element. So the problem can now be stated as follows: we need to write a

15:42

program to find the position of a given number in a list arranged in decreasing order.

15:48

We also need to minimize the number of times we access elements from the list.

15:53

Here we're finding the position of the given number 7, and the position in this case is 3.

15:58

We want to make the number of accesses as small as possible. If we go in

16:03

this direction, for example, we would need to access 13, 11, 10, and finally we discover 7.

16:09

If we come from the other end, we turn over the smaller numbers one by one before we finally discover 7. So

16:13

in this case, coming from the left is better than coming from the right. But is that the best we can do?

16:19

That's what we're solving.
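To make the left-versus-right comparison concrete, here is a sketch that counts how many cards are turned over when scanning from each end. It only illustrates the counting argument, not the solution developed in the course; the helper names `accesses_from_left` and `accesses_from_right` and the card values are hypothetical.

```python
# Hypothetical cards (decreasing order) and the query from the example.
cards = [13, 11, 10, 7, 4, 3, 1, 0]
query = 7

def accesses_from_left(cards, query):
    """Count cards turned over scanning positions 0, 1, 2, ... until query is found."""
    count = 0
    for position in range(len(cards)):
        count += 1  # turning over one card
        if cards[position] == query:
            return count
    return count  # query not present: every card was turned over

def accesses_from_right(cards, query):
    """Count cards turned over scanning from the last position backwards."""
    count = 0
    for position in range(len(cards) - 1, -1, -1):
        count += 1
        if cards[position] == query:
            return count
    return count

print(accesses_from_left(cards, query))   # 4 (13, 11, 10, 7)
print(accesses_from_right(cards, query))  # 5 (0, 1, 3, 4, 7)
```

In the worst case (query missing or at the far end), both scans turn over every card, so the count grows with the length of the list.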

16:20

Once we've defined the problem, what you should do is try to write down the

16:24

problem in your own words; primarily, this is to make it clear to yourself.

16:29

Either speak it out loud to the interviewer or write it down in your own words,

16:35

as short as possible but as long as necessary, so that you clearly understand it, and then

16:40

identify the inputs and the outputs. There are two inputs here. The first input is cards,

16:45

which is a list of numbers sorted in decreasing order. The second input is query,

16:50

which is the number whose position in the list is to be determined. There is one output,

16:55

position, which is simply the position of query in the list cards. For

17:00

example, 7 is at position 3, counting from zero, of course. As soon as you've written

17:05

down the input and output, you can write what is called the signature of our function, which is

17:11

the structure of our function without any actual code inside it. So we can write def

17:15

locate_card(cards, query) with a single statement inside it, pass, because a function

17:21

in Python cannot have an empty body; you need to put in at least one statement. So you always put

17:26

in the pass statement first, because it doesn't do anything. There you go. So now we have framed our

17:33

problem in abstract terms, and we have a function signature to work with. Now, a couple of tips

17:41

here. This is something that interviewers specifically look for, and it matters in coding assessments too,

17:47

because your code is also shared with the company. So you want to name your functions properly

17:52

and think carefully about the signature. For example, here, you should not call your function f1

17:58

or func1 or f or something like that. It's better to call it locate_card, because that's what it

18:03

is doing. The same is true for variable names. Use descriptive variable

18:09

names: first, because it's good coding practice, and second, because as you work on the problem,

18:14

you may lose track of what a variable represents. For example, if you call this a and you call this

18:18

b, then 20 minutes down the line, after talking about the problem and writing different lines of code,

18:25

you may forget what a and b represent. So please name them after what they represent, even if the names

18:30

get a little long. And finally, if you're unable to come up with a function signature

18:34

or a simple description, or you're unsure how to frame the problem in abstract terms,

18:38

discuss it with the interviewer. Keep that in mind. This is really the

18:43

first and most important step: clarifying the problem statement and stating it

18:49

clearly. Do not start coding before you have done this. Otherwise, you may get halfway into the

18:55

code and realize that you have not understood the problem at all. So, step two: now we will come up with

19:03

some example inputs and outputs, and our goal will be to cover all the edge cases.

19:12

Before we start implementing a function, we want to have some examples, so that once we

19:17

implement it, we can answer the first question: is it correct? Often the answer is no,

19:22

because coding, especially when you're getting started, is hard: you have to think about

19:27

many different scenarios. And interviews or coding assessments are also

19:33

stressful situations, so you may not be able to focus and think about all the different things that

19:38

you need to keep in mind. The simplest way to reduce the risk of going wrong is to use

19:45

test cases. Here's one test case that we came up with. What we've done is taken the

19:52

information that we listed above in the inputs and outputs and written it as code.
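A minimal sketch of the function signature together with this first test case. The card values are an assumption; the query 7 and the expected output 3 come from the example.

```python
def locate_card(cards, query):
    # Function signature only: the body is a placeholder for now.
    pass

# First test case, written as plain variables (card values assumed).
cards = [13, 11, 10, 7, 4, 3, 1, 0]
query = 7
output = 3

# Call the (empty) function and compare against the expected output.
result = locate_card(cards, query)
print(result)            # None, because the body only contains `pass`
print(result == output)  # False
```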

19:57

So now we have a variable called cards, which is the list of cards, a list of numbers. Then we have

20:02

query, which has the value 7, and then output, which has the value 3. So the expected

20:08

output from the function is 3. Once we have a test case, you can test your function at any point

20:14

you want. You simply pass the inputs, cards and query, into the

20:19

locate_card function and get back a result. You can see that right now, because there's nothing

20:25

inside the function, the result you get back is None. Later you'll start getting back a proper

20:30

result from your function. You can then compare the result with the output

20:34

of the test case. In this case, when we compare them, the output is 3 and the result is None, so

20:39

we get back False. Now, one thing we will do in this course to make testing easier, because we will be

20:47

testing our algorithms again and again as we keep improving them, is to represent our

20:52

test cases as dictionaries. For example, this test case, and in fact

20:58

every test case, will be represented as a dictionary containing two keys, input and output. The

21:04

input will contain one key for each argument to the function. So if your function arguments are called

21:11

cards and query in the function signature (that's why we wrote down the function signature first,

21:16

so that we don't get confused here), then we take one key called cards and put the value of cards there,

21:22

and one key called query

21:29

with the value of query. Then in output we simply put the output

21:34

that we expect from the function. Now you can test the function like this. The first way you might

21:40

test it is by passing the values explicitly: test['input']['cards'] and

21:46

test['input']['query']. But there's a trick whenever you have a dictionary. Here we have

21:52

a dictionary with two keys, and we want to pass them as two arguments to a function.

21:57

We want to pass cards as the cards argument to locate_card and query as the

22:02

query argument. What you can do is simply pass the dictionary itself

22:08

prefixed with **. When you write **, Python takes each key

22:14

from the dictionary as a parameter name and passes the corresponding value as that argument.
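A sketch of the dictionary format and the `**` unpacking trick. The stub is redefined so the snippet runs on its own, and the card values are again assumed for illustration.

```python
def locate_card(cards, query):
    pass  # placeholder body

# One test case as a dictionary: 'input' holds one key per function argument,
# 'output' holds the value we expect the function to return.
test = {
    'input': {
        'cards': [13, 11, 10, 7, 4, 3, 1, 0],
        'query': 7
    },
    'output': 3
}

# Passing each argument explicitly...
explicit_result = locate_card(test['input']['cards'], test['input']['query'])

# ...is equivalent to unpacking the dictionary with **: each key becomes a
# keyword argument name and each value becomes that argument's value.
unpacked_result = locate_card(**test['input'])

print(unpacked_result == test['output'])  # False until locate_card is implemented
```

Unpacking works only because the dictionary keys match the parameter names exactly; a key such as `'card'` would make Python raise a `TypeError` for an unexpected keyword argument.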

22:22

So now we are calling locate_card on test['input'], and we can compare the result with test['output'].

22:28

And you can see that we get back False. So that's one test case. But is that enough?

22:34

Is it enough for you to start writing code? Probably not, because out in the wild your

22:40

function should be able to handle any set of valid inputs passed into it.

22:45

Here are some possible variations that we might encounter, and it really helps to list them.

22:50

In fact, while I was writing these variations, I realized that there were many cases I had not thought of.

22:56

So even after almost 12 to 15 years of coding, I still find it really useful to list out all the

23:03

scenarios our input can fall into. The simplest scenario is that the query occurs

23:10

somewhere in the middle of the list of cards. This is what you imagine when you read the question.

23:14

This is what is called the general case. But then there are some special scenarios as well.

23:19

What if the query is the first element in cards? And what if the query is the last element in cards?

23:26

What if the list cards contains just one element, which is the query itself? Or, and this is

23:33

something that I had not thought of: what if the list cards does not even contain the number query?

23:38

What if Alice is bluffing? What should Bob's strategy be then, to figure out that the number

23:45

does not exist? What if the list of cards is empty? And what if the list contains repeating numbers?

23:52

This is another interesting case that may not come to mind, because we said a list of numbers

23:57

and we did not specify that the numbers are unique. So the list can contain repeating numbers.

24:01

And finally, what if the number query itself occurs at more than one position in cards?

24:08

So those are the eight cases that I could think of. See if you can think of any more variations.

24:15

And it's likely that when you first heard the problem, you did not think of all these cases.

24:21

You often tend to focus on just one generic case; it's hard to hold too many cases in

24:26

mind. That's why it helps to actually write them down. In a coding interview or

24:31

a coding assessment, you may want to put this in comments. If you have a

24:36

coding page, you can just create a comment and list out all the test cases.
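For instance, the scenarios listed above might be jotted down as comments on the coding page, above the stub, before any real code is written. This is a sketch; the wording of the comments is ours.

```python
# Scenarios to test for locate_card(cards, query):
# 1. query occurs somewhere in the middle of cards (the general case)
# 2. query is the first element in cards
# 3. query is the last element in cards
# 4. cards contains just one element, which is query
# 5. cards does not contain query
# 6. cards is empty
# 7. cards contains repeating numbers
# 8. query occurs at more than one position in cards

def locate_card(cards, query):
    pass  # to be implemented once the test cases are written
```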

24:41

Some of these, especially things like the empty list or query not occurring in

24:46

cards, are called edge cases because they represent rare or extreme examples. While edge

24:51

cases may not occur very frequently, your program should be able to handle them. Otherwise,

24:56

it may fail in unexpected ways, or somebody with malicious intentions can use the edge cases

25:05

to hack your software. So let's create some more test cases for the variations that we've

25:11

listed. We'll store all our test cases in a list for easier testing. Here we are creating a

25:17

list called tests, and this time we will create all our test cases in the format that we

25:22

discussed, the dictionary format, and we will keep appending them to our list. If you

25:28

are not familiar with lists, dictionaries and appending, you can go back and review some of the

25:33

basic material on Python, which is linked at the top of this notebook. First we take the one

25:39

test case that we already have, and then we add one more example of the query occurring

25:43

somewhere in the middle. Here you can see the list of cards, and that the query 1 occurs

25:48

somewhere in the middle, although it's closer to one end. Then here's one case where the query is

25:53

the first element, 4, and the expected output, obviously, is 0. Here's one case where

25:59

the query is the last element, -127. This is another thing: the numbers could be negative

26:05

as well, something you may want to keep in mind. Here's another one where

26:12

cards contains just one element, the query itself. Now, the problem does not state what to do

26:19

if the list cards does not contain the number query. You will often face questions where

26:26

it may not be clear what to do in a certain situation or if a certain situation can occur.

26:30

When you have questions like this, this is the process you should follow. Step one: read

26:35

the problem statement carefully, or ask the interviewer to repeat the question. If you read the problem

26:40

statement carefully, you will often find hints, and sometimes these hints are just a single number

26:45

or a single word somewhere. Often you will also find some examples provided with the problem.

26:51

If you scroll down to the bottom, you will also find some conditions:

26:54

limits on what the numbers can be, whether they are integers or

26:58

decimals, whether they can be negative or positive. So it's important to read the problem carefully

27:02

before you start coding and to look through the examples. Then, ask the interviewer or

27:09

post a question on the platform for clarification. It often happens that interviewers, because

27:14

they conduct so many interviews, forget to specify a certain detail, or they might

27:18

expect you to ask the question because you should not be coding with an insufficient requirement.

27:23

So clarifying the specification of the problem is very important. If you have any doubt,

27:28

ask the interviewer; even if you are somewhat sure about it and just want to verify, it's a good

27:34

idea to ask. Finally, if you have done all of this and still do not have an answer,

27:42

make a reasonable assumption, state it and move forward. So we will assume that

27:48

our function will return -1 in case cards does not contain query. So if cards does not contain

27:54

query, we expect the function to return -1. Now, here is one case where

28:03

the cards list is empty, so obviously it does not contain the query either. And finally, there's

28:08

the case where numbers can repeat in cards,

28:15

and the case where the query itself can repeat in cards. Here the query does not repeat:

28:19

3 does not repeat, but other numbers in cards do. The last case is

28:26

when the query itself repeats. You can see here that the query occurs many times in cards. Once again,

28:33

it is not specified what to do here. Sometimes the problem statement

28:38

may just say to return any one position, but more likely than not, you will want to

28:44

make it deterministic, which also makes it easier for you to test the function.

28:50

So we can impose an additional restriction: we will

28:55

expect our function to return the first occurrence of query. That makes it easier for us to test,

29:01

because when we're testing, we know that if we get a failure, it's not

29:06

because of multiple possible answers but because of some issue in our code.

29:14

You want to get good feedback from failures, and that's why you want your tests to be deterministic.
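One way to pin down both assumptions (return -1 when query is missing, return the first occurrence when it repeats) is a brute-force reference helper that produces the expected output for any input. `expected_position` is a hypothetical name, and this is a tool for sanity-checking test cases rather than the efficient solution we are working towards. It relies on the built-in `list.index`, which returns the first matching position and raises `ValueError` when the value is absent.

```python
def expected_position(cards, query):
    """Hypothetical reference: first position of query in cards, or -1 if absent."""
    try:
        return cards.index(query)  # list.index returns the FIRST match
    except ValueError:
        return -1  # our stated assumption for a missing query

print(expected_position([8, 8, 6, 6, 6, 3, 2], 6))  # 2 (first of several 6s)
print(expected_position([9, 7, 5, 2, -9], 4))       # -1 (Alice is bluffing)
print(expected_position([], 7))                     # -1 (empty list)
```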

29:19

So here is the final test and now we can see the full list of test cases.
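A sketch of what the full `tests` list might look like, built by appending one dictionary per scenario. Only some values come from the transcript (4 as the first element, -127 as the last element, 3 as a non-repeating query among repeated numbers, -1 for a missing query, and the first occurrence for a repeated query); the other numbers are assumed for illustration, with every list kept in decreasing order. The loop at the end runs each case against the placeholder `locate_card`, so every test reports `passed=False` for now.

```python
def locate_card(cards, query):
    pass  # placeholder until we implement a solution

tests = []

# General case: query occurs in the middle.
tests.append({'input': {'cards': [13, 11, 10, 7, 4, 3, 1, 0], 'query': 7}, 'output': 3})
# Query in the middle but closer to one end.
tests.append({'input': {'cards': [13, 11, 10, 7, 4, 3, 1, 0], 'query': 1}, 'output': 6})
# Query is the first element.
tests.append({'input': {'cards': [4, 2, 1, -1], 'query': 4}, 'output': 0})
# Query is the last element (numbers can be negative).
tests.append({'input': {'cards': [3, -1, -9, -127], 'query': -127}, 'output': 3})
# cards contains just one element, the query itself.
tests.append({'input': {'cards': [6], 'query': 6}, 'output': 0})
# cards does not contain query: we assume the function returns -1.
tests.append({'input': {'cards': [9, 7, 5, 2, -9], 'query': 4}, 'output': -1})
# cards is empty.
tests.append({'input': {'cards': [], 'query': 7}, 'output': -1})
# Numbers repeat in cards, but the query does not.
tests.append({'input': {'cards': [8, 8, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0], 'query': 3}, 'output': 7})
# Query repeats: we expect the FIRST occurrence.
tests.append({'input': {'cards': [8, 8, 6, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0], 'query': 6}, 'output': 2})

# Run every test case against the current implementation.
for number, test in enumerate(tests):
    result = locate_card(**test['input'])
    print(f"Test {number}: expected {test['output']}, got {result}, passed={result == test['output']}")
```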

29:25

So you have about eight or ten test cases here.

29:34

You may not need to create this many test cases in an interview or a coding assessment

29:38

depending on how much time you have but you should create at least a few at least cover

29:43

the three or four main edge cases. A good number to aim for would be five, and this will

29:50

not only help you solve the problem in the coding interview, it will also be

29:54

appreciated by the interviewer because it shows that you're thinking about the problem.

30:00

So definitely take a minute or two. Now, we've spent 10 to 15 minutes talking about this, but once you

30:06

start applying this technique over and over, you will see that you start creating test cases

30:10

in seconds. As soon as you read the problem, you state the problem,

30:16

find the input format, find the output format, write a function signature, and write the tests. And then,

30:21

as you start working on tests, the ideas will automatically start coming to you, and within maybe

30:26

two or three minutes you will be done with both of these steps.

30:32

So great, we now have a fairly exhaustive set of test cases. Creating test cases before

30:39

hand allows you to identify different variations and edge cases. And sometimes it may happen that

30:43

you have no clue how to work on the problem; you may feel completely confused. But if you

30:48

simply start writing multiple test cases and start looking at them, literally just staring

30:53

at the test cases, the questions and the expected answers, the solution will reveal itself to you.

30:58

So don't underestimate the power of writing things down, and don't stress out if

31:05

you can't come up with an exhaustive list of test cases, because this takes time. It's a skill that

31:10

you cultivate with time. What you can do is list out the test cases that come to

31:14

your mind right now, put them in a single place, and keep coming back. Whenever a new test case

31:19

comes to mind while coding, discussing, or analyzing, you can just come back to the

31:24

same place and write down the test case. The important thing is that you have a single place where

31:29

you're listing all test cases. So we've written our test cases, and now we can come up with

31:35

a correct solution. And how do you come up with a correct solution? Not by writing code, but by

31:39

first stating it in plain English. Your first goal is correctness, and by correct we do not mean the best or

31:48

the most efficient solution. First we want to solve the problem. We want to figure out where

31:54

the particular number lies in the list, and not yet worry about minimizing the number of cards turned over, because that's solving two problems

32:00

at once, and sometimes that can get tricky. So first aim for correctness, then aim for efficiency.

32:06

The simplest or most obvious solution, which almost always exists and is almost always

32:12

very easy to see, involves checking all the possible answers. This is also called the

32:18

brute force solution. So in this problem coming up with a brute force solution is quite easy.

32:23

Bob can simply turn over the cards in order, one by one, until he finds the card with the given

32:28

number on it. So this is how it might work if we want to implement it in code, and this

32:34

is where writing it in your own words becomes important. So we create a variable called position

32:41

inside a function with the value 0. Then we check whether the number at index position in the

32:46

cards list equals query or not. Now, since we are starting from the beginning,

32:52

if it does, then position is the answer and we can return it from our function. But if it doesn't,

32:57

then we simply increment the value of position by one and repeat the steps. So we go back to

33:02

step 2 and check whether the number at index position in cards equals query, and once

33:09

again, if it does, we return position. If not, we increment position once again and repeat, and we

33:15

repeat that until we reach the last position. And if the number was not found, we return minus one.
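The plain-English steps above map almost one-to-one onto Python. What follows is a minimal sketch. The name locate_card matches the function used in this lesson, but the body is an illustration, not the notebook's exact code. One deliberate choice: the loop runs only while position < len(cards), which also covers the empty-list case.

```python
def locate_card(cards, query):
    # Step 1: create a variable `position` with the value 0.
    position = 0
    # Keep going only while `position` is a valid index into `cards`
    # (for an empty list, the loop body never runs).
    while position < len(cards):
        # Step 2: does the number at index `position` equal the query?
        if cards[position] == query:
            # Step 3: yes, so `position` is the answer. Scanning from the
            # start means this is the first occurrence.
            return position
        # Step 4: no, so move to the next card and repeat.
        position += 1
    # Step 5: we went past the last position without finding the query.
    return -1


print(locate_card([13, 11, 10, 7, 4, 3, 1, 0], 7))  # 3
```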

33:20

So it's a simple four- or five-step description. It doesn't take very long. You can say it out loud

33:26

to the interviewer, and they will appreciate it. You may know

33:31

the brute force solution and not say it because it seems too simple or obvious.

33:36

But the interviewer doesn't know that. So it's important to state the

33:41

brute force solution. You may say that the brute force solution is fairly straightforward

33:45

and it goes like this: steps 1, 2, 3, 4. It takes just 30 seconds, but at the very least it informs

33:50

the interviewer that you're able to think of some solution. And this happens very often: I've seen

33:57

interviews where 30 or 40 minutes have passed out of 45 and not a single solution

34:03

has been proposed so far even though many lines of code have been written. So it's important

34:06

to state your solution. If you state your solution, the interviewer will also help you and

34:10

correct you as you go forward. So it is a collaborative experience. It is a discussion. So use that.

34:16

And if you are in a coding assessment you may just want to write out a few comments.

34:21

And what we've come up with here is, congratulations, our first algorithm. An

34:27

algorithm is simply a list of statements, a list of steps that can be converted into code and

34:34

executed by a computer on different sets of inputs. So this particular algorithm is called

34:44

linear search because it involves searching through a list in a linear fashion element by element.

34:55

So now we're ready to implement the solution. Just a quick tip, as I've already said: always

35:03

try to express the algorithm in your own words. It can be as brief or as detailed as you like,

35:10

and don't underestimate the power of writing. Writing can be a great tool for thinking.

35:13

It's likely that you will find that some part of the solution is difficult for you to express.

35:17

And that simply suggests that you are probably unable to think about that part clearly.

35:22

So the more clearly you're able to express your thoughts, the easier it will be for you

35:26

to turn it into code and you will not have to come up with a strategy while you're writing the code.

35:32

So you can focus on coding and focus on avoiding errors. And that brings us to the next step.

35:37

Implement the solution and test it using the example inputs. Now you can see how everything

35:44

comes together. We already know what the function signature looks like and what the inputs look like.

35:50

We already have some test cases, and through the test cases we've also identified the

35:54

different edge cases we need to handle. And we've already written out a description of

35:59

what the algorithm looks like. In fact, what you can do is simply write out

36:03

the English description as comments within your function, and then you simply need to fill out

36:08

code for those comments. For instance, here are the five steps that we just wrote down:

36:13

Create a variable position with the value 0. Set up a loop. Check if the element matches the

36:17

query. If yes, the answer is found. If not, increment the position, and then check

36:24

if we've reached the end of the array. If we have, then we return minus one. So the code now is

36:31

pretty straightforward. We create the position variable with the value 0. We write while True. So while True

36:37

kicks off an infinite loop; we first set up the loop, and then we can break out of it when we

36:42

need to. Then we check if the element at index position matches the query. If it does,

36:49

we return the position. If it doesn't, then we come to this part.

36:54

If it does, then the function exits and none of the remaining code gets executed. But if it doesn't,

36:59

then we increment the position. And then we check if we have reached the end. Now if we have

37:03

reached the end, obviously, we don't want to continue. So we can simply return minus one and exit

37:09

the loop and the function itself. But if we have not reached the end, then we go back to

37:15

the top of the loop, and now position has the value 1. So we check positions 0, 1, 2, 3, and so on

37:20

up to the end of the array. Simple enough. Great. So now we have our first function. And let's test

37:27

our function with the first test case. So here's our test case once again. And we can simply

37:33

call locate_card with the test input, the cards and the query. And this is the result we get.

37:41

And you can already see that the result matches the output. And that's why when we compare them,

37:44

we get the value True. So the result matches the output. And because this is something that

37:50

you should be doing very often in this course, we have put together a small function for you

37:55

within the jovian Python library. The Jovian platform also offers a Python helper library.

38:00

It contains some utility functions. So we put together a small function for you

38:06

called evaluate_test_case. You can write it on your own as well, but you can use this library.

38:11

So let's install the library. We will install the jovian library using pip install jovian --upgrade.

38:18

And then we import from jovian.pythondsa. jovian is the name of the library, and inside

38:24

the jovian library, since we have many courses, the utilities for the Python DSA course

38:29

are present inside the pythondsa module. From that module, we import the function evaluate_test_case.

38:36

And finally, we can call evaluate_test_case and give it the function that we want

38:41

to test, which is the locate_card function, and the test case. The test case needs to be

38:45

defined in this format. All it is going to do is pick out the input,

38:51

pass it into the function, get the output, compare that output with the expected output, and also

38:57

print some information for you to see. So here's what it does. It prints out the input.

39:04

It prints out the expected output. It prints out the actual output. It prints the execution time.

39:11

This is something that will become important later. And it tells you whether the test has

39:15

passed or not. It's nice to have this, because we don't have to look through the output

39:21

and the expected output and compare them, especially when you're in a situation where you need to think fast.

39:27

It's helpful to create a small function that can just print pass or fail for a test case.
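As an illustration, here is a minimal pass/fail helper of that kind. check_test_case is a hypothetical name written for this sketch. It is not the jovian library's evaluate_test_case, whose internals are not shown in the lecture. It assumes each test case is a dictionary with `input` and `output` keys, and it times the call with time.perf_counter.

```python
import time


def check_test_case(function, test):
    """Hypothetical helper: run one test case and report whether it passed.

    A minimal stand-in written for illustration; it is not the jovian
    library's evaluate_test_case.
    """
    inputs, expected = test['input'], test['output']
    start = time.perf_counter()
    actual = function(**inputs)  # e.g. function(cards=[...], query=7)
    elapsed_ms = (time.perf_counter() - start) * 1000
    passed = actual == expected
    print('Input:', inputs)
    print('Expected output:', expected)
    print('Actual output:', actual)
    print(f'Execution time: {elapsed_ms:.3f} ms')
    print('Test result:', 'PASSED' if passed else 'FAILED')
    return passed


def locate_card(cards, query):
    # Linear search, included so this block runs on its own.
    position = 0
    while position < len(cards):
        if cards[position] == query:
            return position
        position += 1
    return -1


result = check_test_case(
    locate_card,
    {'input': {'cards': [13, 11, 10, 7, 4, 3, 1, 0], 'query': 7}, 'output': 3},
)
```

Returning the boolean, as well as printing it, makes it easy to collect results across many test cases later.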

39:33

So now, while it may seem like we have a working solution because our test case passed,

39:38

we can't be sure about it until we test the function with all the test cases. So for doing that,

39:44

we can use the evaluate_test_cases function. Just as you have evaluate_test_case,

39:48

you have evaluate_test_cases, also part of the jovian library. And you can call evaluate_test_cases

39:55

with the same function, locate_card, and this time pass it a list of test cases. Each of the

39:59

test cases is a dictionary. Again, you don't have to use this function. You can simply put

40:05

things into a loop. You can always just write for test in tests, and then simply call evaluate_test_case

40:11

with locate_card and test. Or you can even just directly call locate_card with

40:20

the test inputs and compare the result with the test output. You can do this as well, and simply print the result.
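Here is a sketch of that plain-loop alternative, with no helper library involved. The three test cases and the linear-search body are illustrative and are included only so the block runs on its own. Each test's input dictionary is unpacked into keyword arguments.

```python
def locate_card(cards, query):
    # Linear search, repeated here so the block is self-contained.
    position = 0
    while position < len(cards):
        if cards[position] == query:
            return position
        position += 1
    return -1


# Illustrative test cases in an input/output dictionary format.
tests = [
    {'input': {'cards': [13, 11, 10, 7, 4, 3, 1, 0], 'query': 7}, 'output': 3},
    {'input': {'cards': [], 'query': 7}, 'output': -1},
    {'input': {'cards': [8, 8, 6, 6, 6, 3, 2], 'query': 6}, 'output': 2},
]

results = []
for test in tests:
    # Call the function directly and compare against the expected output.
    result = locate_card(**test['input'])
    results.append(result == test['output'])
    print(result, test['output'], result == test['output'])
```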

40:36

So here's a simple way to do this.

40:42

But we'll use the evaluate_test_cases function, because it prints out a lot of useful information for us.

40:48

So now you can see that it prints out the results case by case. Test case 0: we have the input, the expected

40:54

output, the actual output, and the test case has passed. That's what we saw earlier. In fact, it's the same test

40:59

we just did. Test case 1 passes as well. Test case 2 passes. Test cases 3, 4, 5. Okay,

41:06

all of them are fine. Okay. Test case 6 seems to have caused an error. So here is the error.

41:11

It says list index out of range. So that's okay. It's perfectly all right for your functions to

41:17

encounter an error. The first and most important thing is not to panic. In fact,

41:23

it's a good thing that we know exactly where the function is failing. If you look back here,

41:27

you can see what the issue is, and then we'll see how to fix the error. One good strategy to

41:33

approach this is to keep in mind that there will always be bugs in your code, and to approach

41:40

writing code not with the assumption that your code will be correct, but with the default

41:45

assumption that your code will be wrong, that there will be issues. What that does is,

41:50

one, you do not feel demotivated or you do not panic when you see an error. And second,

41:56

you then tend to be a little more careful while actually writing the code. So the way you should

42:01

be writing code is every time you write a line of code, you should be asking yourself,

42:05

how can this line of code go wrong? Or in this particular case, how can cards[position]

42:11

== query in an if statement go wrong and throw an error? Let's look at it.

42:17

One easy way to check this is to add what is called logging, or printing the

42:23

information inside a function. So we'll just rewrite the function. In the locate_card function,

42:29

we take cards and query. It's the exact same function that we have: we set the

42:34

position. But before the loop, we simply print the cards and the query. So just for our

42:39

information, just so that we can see what the function is working through, we can get some

42:42

visibility into the function. We print out cards and query. And then while True, so this is the same

42:48

loop. At the beginning of the loop, we will print out the position that we are tracking.
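Here is a sketch of the same kind of logging. In this version the prints are added to the bounded loop (while position < len(cards)) so that the block runs cleanly on every input. The lecture adds them to the while True version in order to reproduce the failure. The labels passed to print are an arbitrary choice.

```python
def locate_card(cards, query):
    position = 0
    # Log the inputs once, with labels so the output is easy to scan.
    print('cards:', cards)
    print('query:', query)
    while position < len(cards):
        # Log the index being checked at the start of each iteration.
        print('position:', position)
        if cards[position] == query:
            return position
        position += 1
    return -1


result = locate_card([13, 11, 10, 7, 4, 3, 1, 0], 7)
```

Running this prints the labelled cards and query, then position: 0 through position: 3, and result is 3. Once the issue is understood, the prints can be removed.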

42:53

So let's do that. We've simply added some print statements, and these print statements will

42:58

give us insight into the inner workings of the function. Now if you do not put in a print statement,

43:03

then you will have to work it out yourself by reading the code and executing it in your head.

43:07

It's always easier to just print all the information, and print it nicely: just label it cards

43:12

and query. We could also have done this without the labels, but then that would

43:17

make it a little harder to read. That would add more cognitive load on top of already

43:22

dealing with the stress of fixing an error. So just add nice, pretty print statements to make it

43:28

very obvious what we're printing. Now let's get the test case out. So let's take

43:35

test 6, get its input, the cards and the query, and pass them into locate_card.

43:41

And now we see that initially the cards array is empty, the query is 7, and the position is 0.

43:48

And then we encounter the error "list index out of range" on the line cards[position]

43:56

== query. And at this point, it should be fairly obvious what the issue is.

44:01

The issue is that we have an empty list, and an empty list has no elements,

44:06

but we're trying to access position 0, which is, in everyday terms, the first element

44:11

of a list. But there is no first element to access and that is why we get the error list index

44:17

out of range. So this is very important: whenever you get an error, do not start by looking at

44:22

the code; try to understand the error first. And if you're unable to understand the error,

44:28

just add some print statements. There are tools like debuggers that people use, but I personally

44:33

in 15 years haven't really used a debugger. Maybe a couple of times, but I don't know how to use one well.

44:39

Print statements are really simple: you just chuck them into the function wherever

44:43

you need them, as many as you need, with nice clear messages that make it very obvious.

44:49

And that will very often be enough to pin down the issue. So the cards array is empty; we cannot

44:55

access position 0. So what's the solution here? The solution obviously is that before we access

45:00

an index in a list, we need to make sure that the index is valid. And this is the way to do

45:08

it. So now we've rewritten the function slightly. We once again start out with position 0, but this time

45:14

instead of writing while True, instead of assuming that we can access the zeroth element

45:18

of the list, we say that the position should be less than the length of cards. Now if you have

45:25

a cards list of n elements, the indices go from 0 to n minus 1. And in the case of zero

45:32

elements, there are no indices to access. So the position has to be less than the length of

45:38

cards for you to be able to access it. And in this case, the length of the cards will be 0. So

45:42

0 is not less than 0. So the while loop will not run at all and we will directly return minus 1.
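The small illustrative snippet below, which is not from the notebook, shows both halves of this fix in isolation. Indexing an empty list raises IndexError. The bound check position < len(cards) is simply False for an empty list, so a loop guarded by it never reaches the indexing line.

```python
cards = []

# Indexing an empty list raises IndexError: there is no element at index 0.
try:
    _ = cards[0]
    error_message = None
except IndexError as error:
    error_message = str(error)

print(error_message)  # list index out of range

# The guard is checked before any indexing: 0 < len([]) is False,
# so a `while position < len(cards)` loop never runs its body here.
position = 0
guard = position < len(cards)
print(guard)  # False
```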

45:48

But if cards does have elements, then we can compare the element at index position

45:54

with the query and return the position if it matches. If it does not match the query, we increment the

46:01

position. So that was a fairly straightforward fix. Let's test the failing case again.

46:10

Great. So it looks like the failing case is now passing because we have output minus 1 and

46:16

the expected output matches the actual output of the function. Minus 1 because the query does not

46:21

exist in the array, which is of course empty. Now this is not enough. Every time you make a change

46:28

to the code, you want to go back and rerun all the test cases, because it can happen that while

46:33

fixing one error, you may introduce another error. And that is where having a good set of test

46:38

cases is very important. So let's run evaluate test cases once again. You can see here this time that

46:44

all the test cases are passing. It's nice; it just makes you feel good.

46:51

It also makes you feel motivated to see a bunch of test cases passing.

46:58

Now in a real coding assessment or a real interview, you can probably skip the step of implementing

47:04

and testing the brute force solution in the interest of time because it may take about 5 to 10

47:09

minutes to implement the solution and then if you have errors in the solution, it may take

47:13

some more time to fix those errors. It's generally quite easy to figure out the complexity (which

47:20

we'll talk about in a second) of the brute force solution from the plain English description. And that

47:24

is why you should first state it in plain English, which only takes around 20 seconds or so.

47:30

And the computer doesn't throw errors at you for speaking. So you can just state the plain English

47:36

description and move on to talk about the complexity and start optimizing it. But while you're

47:41

practicing, always implement the brute force solution too. There's an important

47:47

reason why you should know how to implement the brute force solution because in case you're not

47:51

able to figure out the optimal solution to the problem, you can still go back and implement the

47:56

brute force solution, and in a lot of cases that's okay. Sometimes interviewers ask hard questions

48:00

just to push your boundaries a little bit. But if you're unable to figure out the optimal solution,

48:06

then they will often allow you to implement the brute force solution. So that is why you should state it

48:11

and that is why you should know how to implement it. Okay. So we're done now with

48:22

the implementation of our brute force or simplest solution and now we need to analyze it.

48:28

And this is where we'll now learn about what is called the complexity of an algorithm.

48:34

What does it mean? Now recall the statement from the original question,

48:37

Alice challenges Bob to pick out the card containing the given number by turning over as few

48:42

cards as possible. But right now we are simply turning over cards one by one.

48:47

And before we talk about what does it mean to minimize the number of times we turn over cards

48:52

or the number of times we access elements, we need a way to measure it. And let's think about it.

48:58

It's as simple as just thinking about it. We access a list element once in every

49:03

iteration. Here's our code; it's pretty straightforward, and this is where we are accessing

49:08

an element from the list. Since we access an element once

49:18

in every iteration, for a list of size n we access elements from the list up to n times,

49:25

because we may have to access this element and then this element and this element and so on.

49:29

So Bob may need to overturn up to n cards in the worst case to find the required card.
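To make "up to n accesses" concrete, here is a hypothetical variant called count_turns. It performs the same linear search but also counts how many cards get turned over, that is, how many times cards[position] is read. The 30-card list is an illustrative assumption.

```python
def count_turns(cards, query):
    """Hypothetical linear search that also counts cards turned over (list accesses)."""
    turns = 0
    position = 0
    while position < len(cards):
        turns += 1  # one access of cards[position] per iteration
        if cards[position] == query:
            return position, turns
        position += 1
    return -1, turns


cards = list(range(29, -1, -1))          # 30 cards in decreasing order: 29, 28, ..., 0
best = count_turns(cards, 29)            # query is the first card
worst_last = count_turns(cards, 0)       # query is the last card
worst_missing = count_turns(cards, 100)  # query is not present at all
print(best, worst_last, worst_missing)   # (0, 1) (29, 30) (-1, 30)
```

In both worst cases, the query being the last card or not being present, all 30 cards get turned over. That is n accesses for n = 30.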

49:37

Now let's introduce an additional condition that suppose Bob is only allowed to overturn one card

49:41

per minute. So that means it may take him 30 minutes to find the required card in the worst case,

49:46

if 30 cards are laid out on the table. Now is this really the best he can do?

49:51

Or is there a way for Bob to arrive at the answer by turning over just five cards and save

49:56

25 minutes instead of turning over all 30? By the way, Bob in this case

50:05

represents what a computer does, and a computer takes some finite amount of time to perform

50:11

each instruction. So each array access actually takes some time, although it's so fast that we do not

50:16

notice it, especially for small inputs. But this is something that will become increasingly important

50:21

as we go week over week, and we will start to see the limits of how long it takes

50:28

computers to solve certain problems. So the field of study concerned with finding the amount of time

50:34

or the amount of space or the amount of other resources required to complete the execution of a program

50:38

is called the analysis of algorithms and the process of figuring out the best algorithm to solve

50:43

a problem is called algorithm design and that is what we are doing here. We are actually doing

50:48

the analysis of algorithms right now and algorithm design next. So there are a couple of terms

50:55

we need to understand, and then we will go back to writing code. The first is complexity and the

51:00

second is the big O notation, and both of these are terms that you will hear very frequently

51:05

when you're talking about data structures and algorithms, or about

51:09

coding interviews and assessments. So these are terms that you need to understand, and they're fairly

51:15

simple terms, even though one of them is literally called complexity. All it means is that the complexity of an

51:22

algorithm is some measure of the amount of time or space required by an algorithm

51:30

to process an input of a given size. For example, if you have a list of size n,

51:36

then the complexity is the amount of time required, or the amount of space required in RAM,

51:41

to process an input of that size. Now, unless otherwise stated, the term complexity always

51:47

refers to worst-case complexity. It's possible that Bob turns over the first card and that is

51:53

the answer, but we always talk about the longest possible time or the largest possible space

51:59

that may be taken by the program to process an input. So we need to design our programs

52:03

keeping the worst case in mind. Now in the case of linear search, which is what we've just implemented,

52:10

the time complexity of the algorithm is some constant C times n, where n is the size of the

52:16

list, that is, the number of cards. This constant C depends on the number of

52:22

operations that we perform in each iteration (in each loop, for example, we have four to five

52:27

statements) and on the time taken to execute a statement on your specific hardware. Now if you have

52:33

a two-gigahertz computer, it may be roughly twice as fast as a one-gigahertz computer. If you're running

52:38

it on a phone, it may be different. So C captures all of these things: information about

52:43

the number of specific operations that we perform in each iteration and information about

52:48

the actual hardware that you're running on. So Cn is the time complexity, where n is the size of the

52:55

input. What we understand from this is that the time complexity is proportional

53:00

to the size of the input, and that's the important part here. The constant doesn't

53:04

change as you change the input. Now, similarly, for the space complexity:

53:10

since we're already given an array, the additional space that our linear search requires

53:15

is simply a constant, which we call C prime or C dash, and it is independent of n.

53:21

No matter how large a list is given to you, since the list is already present in

53:26

memory, we just need to allocate one new variable called position, and that variable is used to iterate

53:32

through the array. It occupies a constant amount of space in the computer's memory because we keep

53:36

updating the same variable. So the space complexity is C prime, a constant; it is independent of n.
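In symbols, this restates the two results above rather than adding new analysis. Here n is the number of cards, C is the cost of one loop iteration on a given machine, and C' is the fixed extra space taken by position:

```latex
T(n) = C \cdot n \qquad \text{(time: at most } n \text{ iterations, each costing about } C\text{)}
S(n) = C' \qquad \text{(extra space: only the variable \texttt{position}, independent of } n\text{)}
```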

53:43

Now, to represent the worst-case complexity, we often use the big

53:50

O notation, and in big O notation we drop any fixed constants

53:55

and we drop any lower powers

54:02

of variables. The idea is to capture just the trend of the relationship

54:08

between the size of the input and the complexity of the algorithm. For example, if the time

54:13

complexity of an algorithm is some constant times n cubed, plus some constant times n squared, plus some

54:19

constant times n, plus some constant, where n is the size of the input, then in big O notation we simply

54:25

say that it is order n cubed, O(n^3). That is, in the long run, if you just study the

54:31

trend, it will be something that looks a little like the n cubed function, and

54:38

it may be scaled or offset by a constant. Putting it this way, the time complexity of linear search

54:45

is O(n), because we just drop the constant C, and the space complexity is O(1), where we again

54:51

drop the constant C prime. We'll see why it's okay to drop the constant. Sometimes you

54:57

may find that we are not doing exactly n iterations, but n minus 1 iterations, so

55:01

we drop the minus 1. Sometimes you will find that we are just doing n/2 iterations, and that's

55:06

simply half of n, so we drop the half. And you might wonder: that might take twice

55:10

or three times the amount of time, so why are we dropping that constant? That's probably an

55:15

important thing to keep in mind, but we'll see soon, as we implement our efficient

55:20

solution to the problem. Before we move forward and optimize the algorithm, we are just

55:28

going to save our work, because this notebook, as I mentioned, is running on an online platform.

55:35

We've set up everything for you; you've not had to install anything. But because thousands of people are

55:39

using this platform, it will shut down; it will not keep running forever. What

55:46

you need to do is save your work from time to time, and here is how you can save your

55:51

work and then pick it up. Everything happens on the Jovian platform; there's no need to download

55:55

anything, although you could download it if you want.

56:00

All you need to do is use the jovian library once again; we've got another helpful function for you.

56:05

You say import jovian and then run jovian.commit, and give it a project

56:12

name, the name by which you want to identify this specific notebook. Then there are some

56:20

other arguments that are not too important, so you can even skip them, and that should be perfectly fine.

56:25

Now, when you run jovian.commit, it will capture a snapshot of your notebook from this online platform

56:33

or wherever it is running; even if you're running it on your own computer, it will capture a snapshot

56:37

of your notebook, upload it, and give you a link

56:43

where you can access it. Let's open up this link here. Now you will be able to see this page

56:50

called python binary search, and it will be on your profile. You can scroll down and see

56:56

that it contains all the explanations and all the code. This is a read-only version of

57:01

the Jupyter notebook, and this read-only version obviously does not require us

57:05

to keep servers running. Your work is saved to whatever extent you have executed things,

57:10

and when you need to run it again, you simply

57:16

click Run and then click Run on Binder, and that is how you resume your work.

57:27

What this will do is set up a new machine for you, and on the new machine it will host

57:31

the Jupyter notebook. It will start up the machine for you, open up the Jupyter notebook, and you will

57:37

be able to start running the code. And it's not just you: you can make notebooks public or you can

57:42

keep them private (there are multiple visibility options), and if a notebook is public, not just you but

57:47

anybody else can use it. If there's an interesting problem that you

57:53

worked on and you want to tweet it out, you can just share this link online, and anybody will be able

57:57

to read through your solution and run it as well. In fact, the notebook that I've shared

58:03

with you is hosted on my profile. So Jovian is not just a platform for you to learn; it's also a platform

58:09

for you to build a repository of projects. Now, if you go back to your profile (click on the

58:13

Jovian logo to get there), you will find a Notebooks tab, and in the

58:19

Notebooks tab you will find all the notebooks that you have worked on in the past. Anything

58:24

that you have committed using jovian.commit, you will be able to resume working on. So that's

58:31

how you save your work. Keep saving your work from time to time; all you need to do is run

58:36

jovian.commit. You do not even need to put in the project argument; it's just there if you

58:40

want to give your project a name, otherwise the name will be picked automatically. So just keep

58:44

running jovian.commit from time to time, especially if you're leaving your computer for

58:49

half an hour or so: if your computer goes to sleep, the server will shut down and

58:53

you may lose your work. Coming back to the problem, we've just implemented linear search and we

58:59

understood that it has a complexity of O(n), and that's why it's called linear.

59:04

it runs in a linear time is another expression that is used it is also called linear because

59:09

we are going through the array step by step now the next step is to apply the right technique to

59:17

overcome this efficiency now of course we've not learned any technique saved but we can

59:21

probably figure it out if we think about it and maybe this is something that occurred to you

59:25

right at the beginning and the idea that occurred to you is something that we will now implement

59:31

so at the moment we are simply going over the cards one by one and not even utilizing the fact

59:36

that they're sorted and that's why our approach is pretty poor we're basically checking everything

59:42

so it's not a great solution but it would be great if somehow this would be the best case

59:48

if somehow Bob realized somehow Bob could guess the card at the first attempt that would be perfect

59:56

then that would be an order one that would be a constant time solution but with all the cards

01:00:02

turned over it's simply impossible to guess the right card now the next best idea is to maybe pick

01:00:08

a random card so maybe let's say Bob picks this card and this card turns out to be a 9

01:00:16

now Bob can use the fact that the cards are inserted order so if this card turns out to be 9 that means

01:00:21

all of these cards have numbers greater than 9 and the target card is 7 so the target card cannot

01:00:28

lie in this region so the target card has to lie in this region and just by picking a random card

01:00:35

rather than picking the first card Bob has eliminated four out of seven cards to be checked right so

01:00:42

with one check Bob has eliminated a total of five cards one to three four five and of course if

01:00:48

this number turns out to be seven perfect grade guess but even if it doesn't we've still

01:00:52

eliminated quite a few if this number turns out to be less than seven we've still eliminated three

01:00:57

cards so that's the basic idea here that we pick something not from the edges but somewhere in the

01:01:04

middle now what is the best place to pick something in the middle now obviously when we have

01:01:07

picking a card we do not know whether it is going to be less than or greater than the number that we

01:01:13

want especially when everything is closed so we it's best to just pick the middle card so that

01:01:19

whichever case turns out it turns out to be we're still left with as at most three cards to process

01:01:27

right so if you pick this card and it doesn't turn out to be seven you either need to look at these three

01:01:32

or you need to look at these three so that is the strategy we'll follow and this technique is

01:01:38

called binary search and why do it just once just keep repeating it so each time you pick the

01:01:44

middle card and you can eliminate half of the array and this is what the strategy looks like so here

01:01:50

we have the array and in the array we want to figure out the number six of this slightly different

01:01:55

problem but still decreasing order we want to figure out the number six so we access the middle

01:02:01

okay we compare it with six now it is not six okay it was a bad guess no problem but we know that four

01:02:07

is less than six so that means that six lies to the left of four so we've certainly eliminated half

01:02:14

of the array we've done one access and eliminated half of it and now we left with three numbers we

01:02:19

pick the middle number we get seven seven is greater than six that means the number lies on the right

01:02:27

now we're left with just one card we over turn that last card or we check that last number

01:02:31

okay it is equal to six great if it is not well nothing more left to check all the numbers here

01:02:38

are greater than six are less than six and all the numbers before this are greater than six so if this

01:02:42

number is in six then there's no six and just like that for an array of seven elements we have done

01:02:50

just three checks and arrived at the answer and that was the worst case right it means it will never

01:02:55

take you can verify that it will never take more than three checks if six comes at this position

01:02:59

we guess it immediately if six comes at this position or this position we guess it in two checks

01:03:05

and then if six comes at any of the other positions we will guess it in three checks so that's pretty good
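To make the three-check claim concrete, here is a small sketch (not the course notebook's code) that counts how many cards a binary search turns over for each possible target in a seven-card list sorted in decreasing order. The function name `count_checks` and the card values are made up for illustration.

```python
def count_checks(cards, query):
    """Binary search on a list sorted in decreasing order.
    Returns (position, checks); position is -1 if query is absent."""
    low, high = 0, len(cards) - 1
    checks = 0
    while low <= high:
        mid = (low + high) // 2
        checks += 1                 # one card turned over
        if cards[mid] == query:
            return mid, checks
        elif cards[mid] < query:
            high = mid - 1          # query is bigger, so it can only be to the left
        else:
            low = mid + 1           # query is smaller, so it can only be to the right
    return -1, checks


cards = [13, 11, 10, 7, 4, 3, 1]    # illustrative values, decreasing order
for target in cards:
    position, checks = count_checks(cards, target)
    print(f"target {target} at position {position}: {checks} check(s)")
```

Running it should show one check for the middle card, two checks for the cards at positions 1 and 5, and three for the remaining four, so the worst case for seven cards is three checks.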

01:03:13

Now, if you read this part, it says: apply the right technique to overcome the

01:03:19

inefficiency, and then repeat steps three to six. So now we're going to go back to step three,

01:03:23

which was: come up with a correct solution for the problem and state it in plain English. We have

01:03:27

come up with a solution already; we just need to state it. So here is how this technique, called binary

01:03:32

search, is applied to the problem. It's called binary because we make a left-or-right decision.

01:03:38

First we find the middle element of the list. If it matches the query number, then we return

01:03:44

the middle position as the answer. If it is less than the query number, then we search the first

01:03:51

half of the list, and if it's greater than the query number, then we search the second half of the list.

01:03:55

It's exactly what we saw here, applied to our problem. And finally, if no more elements remain,

01:04:02

we simply return -1. Let's save our work now; from this point on we'll

01:04:07

keep saving our work from time to time using jovian.commit. So now we've come up with the algorithm, and

01:04:12

once again, it's important to write it in your own words, whether you want to write a short

01:04:18

description, a paragraph, or a step-by-step guide. Write it in your own words; you'll do this

01:04:23

in the assignment. Next, implement the solution and test it using the example inputs. Here's

01:04:31

the implementation. Let's once again go back to this visual

01:04:37

representation. We will keep track of our search space. Initially our search

01:04:42

space is the entire array. We have an array of seven numbers, so our search

01:04:47

space goes from position zero to position six, and we'll keep reducing our search

01:04:52

space over time. To keep track of the search space, we will create two variables, low and high.

01:04:58

Low will have the value zero, pointing to the first position in the array, and

01:05:03

high will point to the last valid position in the array, which is

01:05:11

len(cards) - 1. Then the while loop becomes very simple,

01:05:19

because as long as we have at least one element in our search space, we can go ahead. To have

01:05:25

at least one element in our search space, the low value, which is the starting index, should

01:05:32

be less than or equal to the high value, which is the end index. So: while low <= high. Because if

01:05:39

the starting index is higher than the end index, we've exhausted the search space:

01:05:44

we've covered the entire list and there's nothing more that we can search, so we should

01:05:49

exit at this point. Now, once this condition is satisfied (and initially it is:

01:05:55

say you have seven cards, low is zero and high is len(cards) - 1, which is six), you find the

01:06:01

middle position, which you can get by computing low plus high divided by two. And now

01:06:07

let's start applying the strategy where we say that every time we write a line of code, we

01:06:12

should think about how it can go wrong. If you write it like this, (low + high) / 2,

01:06:18

and think about how it can go wrong: low plus high may not be divisible by two,

01:06:22

and if it is not divisible by two, you end up with a fractional number. And if you do end

01:06:27

up with one (in fact, the / operator in Python always returns a

01:06:33

floating-point number), then you cannot use it as an array index, and we want to use this

01:06:38

as a position within the array. That's why we need the double slash, //, which is

01:06:43

integer (floor) division and simply returns the quotient. So we get the middle position, and then we

01:06:49

get the number at the middle position, cards[mid]; we access that element from

01:06:54

the array. This makes it easy for us to count the number of times we access the list,

01:06:58

because here is one access happening inside the list, and there are no other accesses.

01:07:03

Then we have mid_number. Remember, last time we faced an error and had to add print statements;

01:07:09

you might as well just add print statements right away. So here's what we can do: we can just

01:07:15

print the value of low, the value of high, the value of mid, and the value of mid_number. This

01:07:20

will help you check whether the function

01:07:25

is working as expected or not. And now here comes the actual check, the meat of the problem:

01:07:32

if the middle number matches the query, then we return the middle position. Great, we found it, well done.

01:07:38

Now, if the middle number is less than the query (remember, the elements are in a sorted array

01:07:44

and we are looking for the number query), then, since the middle number is less than the query,

01:07:50

the query must lie to the left of it, because the elements

01:07:56

are in decreasing order. And if the query lies to the left, then we need to

01:08:04

shrink the search space to run from the beginning to the position just before the middle number.

01:08:10

So we can simply set high to mid - 1. On the other hand, if mid_number

01:08:17

is greater than the query, then because of the decreasing order of the array, the query

01:08:22

lies to the right, so we need to move the start of the search space to just beyond the middle number.

01:08:28

So we simply set low to mid + 1, and that's it. You can see that we've

01:08:33

used an if-elif-elif chain here; elif stands for "else if" in Python. Here the last condition

01:08:41

might as well just have been an else, because there are only three possibilities: either

01:08:45

they're equal, or mid_number is less, or it's greater. But sometimes it's nice to list out

01:08:50

all the possibilities just to make it super clear, and it also makes debugging and fixing

01:08:54

issues easier. So that's our binary-search-based algorithm. And finally, when we exit the

01:09:02

loop, if we have not returned the middle position, i.e. if we have not exited the function yet, then we

01:09:07

return -1, meaning the number was not found. Let's test it out using our test cases. We

01:09:13

have our handy evaluate_test_cases function here, but you can also test it manually if you want

01:09:18

by passing individual test cases; I'll just use this function from now on. Great. So now we have test

01:09:24

case 0: this is the input and this is the query, and it passed. Here we have test case 1: this is the

01:09:31

input and this is the query, and it passed. And because we have these print statements, we can

01:09:35

look into our test cases and actually tell whether each one was tested correctly or not, because

01:09:42

you can see here that we started out with low 0, high 7, and a mid value of 3. So at 0, 1, 2, 3

01:09:50

we found the number 7. The query is 1, so we need to check this half of the array, and that's exactly

01:09:55

what we did: we moved low to 4, and high remains 7. Then the mid number became 3, so once again

01:10:04

we need to check this half of the array, and then we checked this number and found the output. So

01:10:08

now you can see exactly how the algorithm works, and this is in general what you want as a programmer:

01:10:14

you want to have a full understanding of the code that you've written. You don't want your code

01:10:18

to work accidentally; you don't want to be in a position where you are just

01:10:23

fixing things, trying out different things, until somehow the code works. You want

01:10:29

to be in complete control; you want to know exactly what the code is doing,

01:10:34

and if it is failing, why it is failing. So we go to test cases 2, 3, 4, 5, 6. Looks good; looks like we

01:10:41

may have solved everything. Or probably not: test case 8 seems to have failed. Test case 8

01:10:48

is this list, and this list contains repeating numbers, and not just repeating numbers:

01:10:54

the query itself occurs multiple times. Let's go down and

01:11:01

evaluate just this test case separately; here we are using the singular version of the

01:11:05

evaluate function. If you look here, you can see you have 8, 8, 6, 6, a bunch of 6s, then 3, 2, 2, 0,

01:11:13

and the query is 6. So we start out with a low of 0 and a high of 14, a total of 15 elements. That gives us a

01:11:19

middle position of 7. Let's count: 0, 1, 2, 3, 4, 5, 6, 7.

01:11:29

The mid number at that position is 6. Great, 6 is also the query, so that's why our function

01:11:35

returns 7. But remember that we had decided that our function should return the first position

01:11:42

of the number within the array, so our function is failing that requirement. Why is that happening?

01:11:50

With linear search, we start from the left, so we always bump into the

01:11:55

first position: given the decreasing order of elements, we'll encounter this 6

01:12:01

before we encounter this 6. Binary search does not access elements in order; it accesses elements

01:12:08

in a seemingly random order. It is still a strategy, but it jumps left and right, and it also depends on the

01:12:13

values of specific elements: whether this element is accessed before this element can depend on the

01:12:18

value of, let's say, this element. So it's a kind of pseudo-random order,

01:12:25

and so we need an additional condition to handle it. So how do we fix it?
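Here is a sketch of the first version described above, assuming the decreasing-order `cards`/`query` setup used in the lecture. The name `locate_card_naive` and the exact card list are hypothetical stand-ins for the notebook's function and its test case 8. It reproduces the failure: the very first probe lands on a 6, so the function returns immediately.

```python
def locate_card_naive(cards, query):
    """First-attempt binary search on a list sorted in decreasing order.
    Returns *a* position of query, not necessarily the first one."""
    low, high = 0, len(cards) - 1
    while low <= high:
        mid = (low + high) // 2      # integer division keeps mid a valid index
        mid_number = cards[mid]
        if mid_number == query:
            return mid
        elif mid_number < query:
            high = mid - 1           # query is bigger, so search further left
        elif mid_number > query:
            low = mid + 1            # query is smaller, so search further right
    return -1


# Hypothetical list modeled on the repeated-6 test case (15 elements).
cards = [8, 8, 6, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0]
print(locate_card_naive(cards, 6))   # 7, but the first 6 is at index 2
print(cards.index(6))                # 2
```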

01:12:33

The way to fix it is actually quite simple. When we find that the middle position in a particular

01:12:40

range is equal to the query, we simply need to check whether it is the first occurrence of the query

01:12:46

in the list or not, that is, whether the number that comes before it is equal to the query or not.

01:12:55

If the number that comes before the middle element is also equal to the query, then obviously the

01:13:00

middle element is not the first occurrence. So we can go back, and because

01:13:05

the query can occur multiple times before it, we now search the left half. On the other hand,

01:13:11

if the number before the middle element is not equal to the query (and obviously,

01:13:16

because it is a sorted list, it will be greater than the query), then all the numbers before it are going to

01:13:22

be greater than the query, and so this must be the first or the only position. So make sure

01:13:30

you understand why this must be the first or the only position where the query occurs. Once again,

01:13:38

to make it easier, because there is some logic involved here, what we will

01:13:43

do is define a helper function called test_location. This is a very helpful thing that you can

01:13:48

do every time you find that you have to cover special cases and your function

01:13:54

starts to get slightly longer and slightly more complicated: create a

01:13:59

helper function. A good rule of thumb is not to have functions that are more than 10 lines of

01:14:05

code or so. I try to keep my functions below 7 lines of code, because 7 or 8 lines is approximately

01:14:13

the amount of information that you can hold in your head at once. If your function is about 7 or 8 lines,

01:14:18

you can probably take a quick glance, tell what it is doing, and identify issues, but anything beyond

01:14:23

that is very hard. And if you are writing functions that run into hundreds of lines,

01:14:28

please stop doing that; please start breaking your code into small functions.

01:14:32

There is a quote; I forget exactly who it is by, but I think it is Erik

01:14:40

Meijer, the creator of the Rx library for reactive programming, and he said that great programmers

01:14:46

write baby code: really small bits of code that anybody can understand at a single glance.

01:14:53

So you should be writing as many small functions, small pieces of code, small pieces of logic

01:14:58

as possible. So let's look at our test_location function. Its purpose is to take the query and then take

01:15:07

just a specific position. Forget about binary search for now: just take a specific position

01:15:12

and tell whether that position is the answer. How do we do that? We first get the mid number from the

01:15:20

cards, so we get mid_number from cards. Then we print out mid and mid_number,

01:15:27

and then we compare mid_number with the query. This is the special case that we need to

01:15:32

handle; this is where we had the error. We need to check whether the element before mid_number

01:15:38

is also equal to the query. If it is, then we need to go

01:15:44

left. Just to make it super clear, instead of setting high, low, etc., we simply say

01:15:49

that we need to go left, so we return the actual string 'left'. But one thing to keep in mind here,

01:15:55

once again: whenever you're accessing an array, you need to make sure that the index is valid.

01:16:00

So we check that mid - 1 is greater than or equal to zero, i.e. that mid

01:16:05

is not this first position, which can happen as your search space decreases. For example, if this is your

01:16:09

search space, your mid will actually be this position. So if the number before the

01:16:18

mid number is equal to the query, we return 'left'; otherwise we return 'found', once again making it

01:16:23

very obvious that we have found the number. The other case is if the mid

01:16:31

number is less than the query, which means that the query lies on the left because of the decreasing order

01:16:37

of the list, so once again we need to search on the left. Otherwise, it returns 'right'. So test_location

01:16:41

simply tells us whether we found the solution, or we need to look on the left, or we need to look

01:16:46

on the right. Sometimes you will see programs, especially in C++ or Java, return something like

01:16:53

-1, 0, and 1 and use that to represent whether you should go left or right,

01:16:58

but Python is a high-level language and strings are a first-class feature of

01:17:03

the language, so just use strings: they're really descriptive and they make your code readable.

01:17:07

Somebody else reading your code will be able to understand it, whereas if they're looking at -1, +1,

01:17:11

etc., that is going to be difficult to understand. So now we can simplify locate_card.

01:17:18

Once again we have low and high set to 0 and len(cards) - 1. The

01:17:23

while loop is the same, and we print low and high as well. So we're printing low and high inside

01:17:28

the locate_card function, and we are printing mid and mid_number inside the test_location function:

01:17:33

wherever the right place to print something is, print it there. Then we get the mid position,

01:17:38

and now we simply call test_location. So we're testing whether mid is the answer, and if it is not

01:17:45

the answer, whether we should go left or right. That makes it really simple, because now

01:17:50

we get this result and check it. If it says 'found', then we return mid; that's the answer.

01:17:56

If it says 'left', then we simply move high to mid - 1,

01:18:01

and if it returns 'right', then we simply set low to mid + 1. So here we are changing the

01:18:06

start position of the search space to after the mid element, and here we are changing the end

01:18:11

position of the search space to before the mid element. This makes it extremely obvious,

01:18:16

and it's really hard to go wrong when you write code like this. That matters

01:18:22

because binary search problems are especially tricky: they always have these special

01:18:27

cases that you need to handle, and if you start handling them within this if statement, you have

01:18:31

a while loop, inside which you have an if statement, inside which you have another if statement,

01:18:36

and it can get pretty tricky and difficult to debug. So let's evaluate that test case, and

01:18:43

it looks like the test case has passed this time. You can go through the logs here to verify.

01:18:51

Let's evaluate all the test cases as well. We should do this every time we change

01:18:55

the function, and that is why it's helpful to have a function that lets you, every time you make a

01:18:59

change, just run the tests. On a coding platform like LeetCode or HackerRank, your solution will

01:19:06

be checked against test cases, although those test cases may not be visible to you. So you can

01:19:10

submit your solution, but you may not get to know what the failing test

01:19:15

case was or where your answer was wrong, and that's where you may want to create your own

01:19:19

test cases if you're getting a lot of errors. In fact, once you've written out the algorithm,

01:19:26

you may realize that you need to add more test cases: what if the number lies in the first

01:19:31

half of the array? What if the number lies in the second half of the array? This was not an

01:19:34

important factor when we were not thinking about binary search, but now that we're thinking in

01:19:38

terms of splitting the array in half, we may want to add some test cases where the number

01:19:42

lies exactly in the middle, on the left, or on the right. The simplest way to do that is to go

01:19:47

back to the tests list. You can create a new cell below by pressing the B key:

01:19:53

if you click outside the cell and press B, you create a new cell, and then you can simply

01:19:58

call tests.append and write your test case. Here is the final code for the algorithm

01:20:05

without the print statements: we have test_location, and then we have locate_card.

01:20:13

Try creating a few more test cases to test your algorithm more extensively. And once again,

01:20:18

at every step, we are going to save our work by running jovian.commit.
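One way to add the extra cases suggested earlier (query in the middle, in the left half, in the right half, absent, and a couple of edge cases), assuming the notebook stores tests as dictionaries with `input` and `output` keys. `first_position` and `make_test` are hypothetical helpers; a plain linear scan computes each expected output, so the expected answers do not depend on the binary search being tested.

```python
def first_position(cards, query):
    """Reference answer via linear scan: first index of query, or -1."""
    for position, card in enumerate(cards):
        if card == query:
            return position
    return -1


def make_test(cards, query):
    # Assumed to mirror the dictionary layout of the notebook's tests.
    return {'input': {'cards': cards, 'query': query},
            'output': first_position(cards, query)}


cards = [13, 11, 10, 7, 4, 3, 1]             # illustrative, decreasing order
tests = []
tests.append(make_test(cards, 7))            # query exactly in the middle
tests.append(make_test(cards, 11))           # query in the left half
tests.append(make_test(cards, 3))            # query in the right half
tests.append(make_test(cards, 5))            # query absent
tests.append(make_test([], 5))               # empty list
tests.append(make_test([6, 6, 6, 6], 6))     # every card equals the query
```

Each entry can then be checked with something like `locate_card(**test['input']) == test['output']`.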

01:20:23

Now we're down to analyzing the algorithm's complexity and identifying inefficiencies, if there

01:20:30

are any. Now, you may have just read it online; you can actually look it up: just search for the complexity

01:20:35

of binary search and you will find an answer, and you may even just quote it

01:20:42

in interviews. But it's always nice to come up with that answer from first principles. It's

01:20:48

especially nice in an interview if you can talk through

01:20:51

why it is what it is, and we'll see what that is.

01:20:56

So now let's once again try to count the number of iterations in the algorithm, because we need

01:21:01

to minimize the number of times we access elements from the array. We know that in

01:21:07

each iteration we are accessing an element just once and then comparing it. We're doing

01:21:13

a bunch of other operations, but in each iteration we access one element, so we need to count

01:21:18

just the number of iterations, the number of times the while loop is executed. Now, if we start out with

01:21:23

an array of n elements, then each time, the size of the array reduces to half for the next

01:21:30

iteration. That's roughly true: when you check the middle element and then decide

01:21:36

whether to go left or right, what remains is n/2 - 1 or n/2 elements if n is even (depending on the side), and if n is

01:21:44

odd, it is the floor of n/2. But again, with algorithms and complexities, we are generally

01:21:51

interested in studying the trend, so we can ignore that small detail in the calculation.
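A quick check of that size argument, written as a hypothetical helper `worst_case_sizes`: it follows the larger remaining side after each probe of positions 0 to n-1, using the same `(low + high) // 2` midpoint rule described in the lecture.

```python
def worst_case_sizes(n):
    """Search-space sizes a binary search can pass through when it
    always ends up in the larger remaining side (the worst case)."""
    sizes = [n]
    while n > 0:
        mid = (n - 1) // 2          # middle index of positions 0..n-1
        left = mid                  # elements before mid
        right = n - 1 - mid         # elements after mid
        n = max(left, right)        # worst case keeps the bigger side
        sizes.append(n)
    return sizes


print(worst_case_sizes(7))    # [7, 3, 1, 0]
print(worst_case_sizes(10))   # [10, 5, 2, 1, 0]
```

At every step the new size works out to exactly `n // 2`, so the number of iterations (`len(sizes) - 1`) is floor(log2 n) + 1, which Python exposes as `n.bit_length()`.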

01:21:56

The important part is that it's okay to overestimate a little bit, but

01:22:01

try not to underestimate. So after the first iteration we may be left with a search

01:22:06

space of size n/2. It may be slightly less than that, but it's okay to overestimate.

01:22:11

So we start with n, and after the first iteration we are left with a search space of

01:22:17

n/2. Then we split it in half again, so next time we may be left with a search space of n/4,

01:22:22

which is n divided by 2 squared, and then we may be left with n/8. It's

01:22:29

possible that at any of these iterations we may just exit because we have found the right number,

01:22:33

but what we always try to analyze is the worst-case complexity of an algorithm: what is the longest

01:22:40

possible amount of time, or the largest amount of space, it can take? Right now we are talking

01:22:45

about time, because we are counting iterations and each iteration takes some time.

01:22:50

So we have n/8 after iteration 3, and 8 is 2 to the power 3. I think you can start to see the

01:22:54

trend here: after the k-th iteration you will end up with n divided by 2 to the power k

01:23:00

elements. Now, when does the iteration stop? The final iteration is on an array of length 1, and

01:23:06

that is when we access that last element and check whether, after all this checking, the last

01:23:10

element is equal to the query or not. So we take n divided by 2 to the power k, and if we

01:23:18

set that to 1, we can rearrange the terms and get n equals 2 to the power k. So after the

01:23:26

k-th iteration, if you want to be left with one element, then n divided by 2 to the k should

01:23:31

be equal to 1, or n should be 2 to the k, or in other words k should be equal to log n. Remember

01:23:38

logarithms? Here log refers to log to the base 2, but I will argue that

01:23:45

you can change the base of the logarithm and that will simply multiply it by a constant. So

01:23:50

if you are taking the natural log, that will simply introduce a constant factor here, and remember, when we

01:23:55

talk about time complexity, we ignore constants. So we can generally say that our algorithm,

01:24:02

binary search, has a time complexity of O(log n). That means as the input grows,

01:24:10

the amount of time taken by binary search is proportional to the logarithm of the number

01:24:15

of elements in the list passed to it, or the amount of time taken is logarithmic in the size of the

01:24:22

initial search space. You can verify this;

01:24:27

you can check it out by simply writing it out, taking some examples. Let's say you take

01:24:32

a card list of size 10, walk through the worst case, and count how many iterations you

01:24:38

have, and compare whether that is close to log n or not. And then, as an exercise, you can verify that

01:24:45

the space complexity of binary search is O(1). You can try posting in the YouTube

01:24:50

comments or in the YouTube live chat why the space complexity of binary search is O(1).
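For reference, the iteration-count argument from this part of the lecture, written out as equations (n is the initial number of cards and k the number of iterations):

```latex
n \;\to\; \frac{n}{2} \;\to\; \frac{n}{2^2} \;\to\; \cdots \;\to\; \frac{n}{2^k},
\qquad
\frac{n}{2^k} = 1 \;\Longrightarrow\; n = 2^k \;\Longrightarrow\; k = \log_2 n

\log_2 n = \frac{\ln n}{\ln 2} \approx 1.443\,\ln n
\;\Longrightarrow\; O(\log_2 n) = O(\ln n) = O(\log n)
```

Changing the base only multiplies the logarithm by a constant, which is why the base is dropped inside the O(·) notation.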

01:24:56

I'll leave that with you. So let's now compare linear search with binary search.

01:25:05

How are the two different? What we'll do is create a large test case, because

01:25:09

you start to see the benefits of an O(log n) algorithm over an O(n)

01:25:14

algorithm only when you have larger test cases. With small test cases, everything runs

01:25:20

instantly, so the difference doesn't really matter.

01:25:24

So here we have locate_card_linear, the linear version of the algorithm, where we simply

01:25:29

go through each of the cards one by one. And then we have a really large test case here. We have the

01:25:36

input, and then we have the cards, which come from a range. Let's count the zeros: one, two, three, that's

01:25:44

a thousand; another three, that's a million; so we have 10 million elements here. So we have 10 million elements,

01:25:53

and we are creating them using a range; we are using Python's range function.

01:26:01

We're creating a list of numbers going down from 10 million all the way to 1, a decreasing

01:26:08

list from 10 million to 1. This is how you create it, and you can check it out. In this list,

01:26:13

we are looking for the number 2, which occurs at the very end. So this is,

01:26:18

as we will see if we really analyze it, going to be a worst-case scenario for both

01:26:23

linear search and binary search, approximately. So the query is 2, and the output

01:26:30

is the one that we expect, because 0 to 9,999,999 are the array indices

01:26:39

and the last element is 1, so the element just before it, at index 9,999,998, is 2. So this is the expected output.
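A sketch of how such a test case could be built, assuming the same dictionary layout as the other tests. `make_large_test` is a hypothetical helper, parameterized by n so it can be tried on a small list first.

```python
def make_large_test(n=10_000_000):
    """Build a worst-case-style test: cards n, n-1, ..., 1 and query 2."""
    cards = list(range(n, 0, -1))    # decreasing list of n numbers
    return {
        'input': {'cards': cards, 'query': 2},
        'output': n - 2,             # 2 is the second-to-last card
    }
```

Calling `make_large_test()` with the default reproduces the 10-million-card case from the lecture (expected output 9,999,998). Note that a Python list of ten million integers takes a few hundred megabytes of memory.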

01:26:44

Okay so now we have this large test let us call evaluate test case and let us pass check the linear

01:26:51

search pass in the last test and because this is a huge list we may want to turn of the display

01:26:57

of the output we may not want to actually see the input being displayed so we can simply turn

01:27:04

of the display by passing display equals false and we can just get back the result from the

01:27:09

evaluate test case function so the result will give the output the actual output of the function

01:27:14

whether the test passed and the running time of the algorithm so it takes a second so it looks like

01:27:22

the test did pass or algorithm is correct so that's great and it took 1 to 2 4.291 milliseconds

01:27:29

or about 1.2 seconds to answer it and you can probably tell why because it because this is the

01:27:39

result so it probably took 999998 iterations so it had to go through all the elements to get to the

01:27:46

variant on the other hand when we talk about binary search so now we are passing in the binary search

01:27:52

version once again turning display to false and we are displaying the output okay so this time

01:27:58

the result is the same the test did pass but the execution time is 0.019 milliseconds so that's

01:28:06

55000 times faster than the linear search version and in fact you can tell how many elements we

01:28:11

actually had to access so if we just check log of so log of this number is about 7 and maybe

01:28:23

now if you're checking log 2 we can maybe check something like this so not more than 20 elements

01:28:27

had to be accessed so where we linear search needed to access about 10 million elements binary search

01:28:36

was able to get to the answer with just about 20 checks so that's a lot of times saved and

01:28:44

you can increase the size of the array by a factor of 10 and increase this by factor of 10 as well

01:28:50

and then you will see a far bigger difference: for a 10 times larger array, linear search would

01:28:57

run 10 times longer, whereas binary search would only require about three or four additional operations (log2 10 ≈ 3.3).
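
To see this growth concretely, here is a sketch that counts iterations instead of timing them. `binary_search_with_count` is a hypothetical helper (not from the course) for distinct values sorted in decreasing order, and `range(n, 0, -1)` stands in for the list of cards. The linear-search figure is derived from the answer's position rather than by actually running a loop over every card.

```python
import math

def binary_search_with_count(cards, query):
    """Hypothetical helper: binary search on distinct values sorted in
    decreasing order. Returns (position, iterations); position is -1 if absent."""
    lo, hi = 0, len(cards) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if cards[mid] == query:
            return mid, iterations
        elif cards[mid] < query:
            hi = mid - 1   # larger values sit to the left
        else:
            lo = mid + 1   # smaller values sit to the right
    return -1, iterations


for n in (10_000_000, 100_000_000):
    cards = range(n, 0, -1)          # n, n-1, ..., 1 without storing a list
    position, iterations = binary_search_with_count(cards, 2)
    linear_checks = position + 1     # linear search inspects indices 0..position
    bound = math.floor(math.log2(n)) + 1
    print(f"n = {n:,}: linear search checks {linear_checks:,} cards, "
          f"binary search needs {iterations} iterations (at most {bound})")
```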

01:29:03

so linear search would go from 10 million operations to 100 million operations, while binary search

01:29:08

would go from about 24 operations to about 27, and that is the real difference between the complexities O(N)

01:29:15

and O(log N) as the size of the array grows bigger. Another way to

01:29:20

look at it is that if you just divide the complexities, binary search runs N / log N times faster

01:29:26

than linear search, up to some fixed constant, because there are always some constants involved.
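
A rough way to see the N / log N ratio growing is to tabulate it for a few input sizes. This sketch ignores constant factors entirely, so the numbers indicate orders of magnitude rather than predicted speedups on real hardware.

```python
import math

# Ignoring constant factors, N / log2(N) approximates how many times fewer
# steps binary search takes than linear search on an input of size N.
sizes = [10**3, 10**6, 10**7, 10**9]
ratios = [n / math.log2(n) for n in sizes]

for n, ratio in zip(sizes, ratios):
    print(f"N = {n:>13,}   log2 N = {math.log2(n):5.1f}   N / log2 N = {ratio:>14,.0f}")
```

The ratio keeps climbing as N grows, which is the "the difference only gets bigger" point in numeric form.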

01:29:31

and as the size of the input grows larger the difference only gets bigger the difference in

01:29:36

performance. And that is what the analysis and optimization of algorithms is

01:29:44

all about it's about overcoming the limitations of computers by devising clever techniques to solve

01:29:51

problems and it's something that you can actually apply in real life as well in a lot of cases

01:30:00

there are a lot of things that you may see a brute force solution to but if you just apply your mind

01:30:06

you may find a more optimal solution, an easier way, or a lazier way to do it with less work.

01:30:11

so think about that. And here is a graph comparing common functions,

01:30:21

showing how the running times of common functions vary. People look at all kinds of functions:

01:30:27

we look at constant time functions, O(1); for example, accessing an element from an array

01:30:32

is O(1), so even if you have a list of 10 million elements, you can access

01:30:37

the last element in constant time on the other hand we've looked at binary search which has

01:30:42

which is O(log N), and we've also looked at linear search, which is O(N). Now, in the future

01:30:47

we will look at other techniques which have complexities of N^2, N^3, N to the power N,

01:30:54

or far, far higher, and somewhere in between there is a very nice special type of complexity called N log N,

01:31:01

which is rather nice, so we'll talk about N log N as well. In fact, a lot of questions in

01:31:07

coding assessments and coding interviews involve taking algorithms which would

01:31:14

have O(N^2) complexity with a brute force approach and optimizing them either

01:31:21

to O(N) or to O(N log N). We'll discuss all of this, so don't worry if this doesn't

01:31:27

make sense just yet but I hope you see now why we ignore constants and lower order terms while

01:31:32

expressing complexity in big O notation. So we've covered binary search, but we've seen it

01:31:39

in the context of a problem, and now we can step back once more, abstract it out further, and

01:31:45

identify the general strategy behind binary search and this general strategy is actually applicable

01:31:50

to a wide variety of problems and this is what you want to keep doing as a programmer you need to

01:31:56

abstract away, peel away the layers of specific problems and specific details, and find the general

01:32:02

technique find the general strategy and then encode that using your functions and programs so

01:32:07

here's the general strategy come up with a condition to determine whether the answer lies before

01:32:16

after or at a given position so we are assuming here that we have some kind of a range and we have

01:32:21

to identify a position within a range or maybe an element within that range but we can access

01:32:26

elements using the position. So come up with a condition that first tells you whether,

01:32:30

given a position the answer lies at or before or after that position once you have that condition

01:32:39

first retrieve the midpoint and the middle element of the list. Now, if the middle element at the

01:32:45

midpoint is the answer, then return the middle position; that is the answer, and you're done. If the answer

01:32:50

lies before it repeat the search so repeat the process with the first half of the list

01:32:57

or the first half of the search space and if the answer lies after it repeat the search with the

01:33:02

second half of the search space so here is the generic algorithm for binary search

01:33:08

implemented in Python, and you can see fairly detailed documentation here.

01:33:19

so here, the binary search function is going to take a search space, low and high; low is going to be

01:33:25

zero, and high is going to be, well, we will maybe pass in the final

01:33:34

index of the array. But writing it this way, rather than passing in an array, also allows you to use

01:33:39

binary search for problems that are not based on arrays; sometimes the search space could just be numbers. For

01:33:44

example if I ask you to find a number between 1 million and 10 million that is a perfect square

01:33:53

then you can use binary search to do that
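
As a sketch of searching over numbers rather than an array, a generic `binary_search(lo, hi, condition)` like the one described in this section can look for the integer square root of `n` within the range `0..n`; `n` is a perfect square exactly when that root squared equals `n`. The square-root framing and the name `integer_sqrt` are an illustrative choice, not the course's example code.

```python
def binary_search(lo, hi, condition):
    """Generic binary search over the integers lo..hi (inclusive).

    condition(mid) must return 'found', 'left' or 'right'.
    Returns the value where condition reports 'found', or -1.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        result = condition(mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        else:
            lo = mid + 1
    return -1


def integer_sqrt(n):
    """Hypothetical example: the largest integer r with r * r <= n (n >= 0)."""
    def condition(r):
        if r * r <= n < (r + 1) * (r + 1):
            return 'found'
        elif r * r > n:
            return 'left'    # r is too big, search smaller numbers
        return 'right'       # (r + 1)^2 <= n, search bigger numbers
    return binary_search(0, n, condition)


print(integer_sqrt(4_000_000))   # 2000
print(integer_sqrt(9_999_999))   # 3162
```

For example, `integer_sqrt(2_250_000) ** 2 == 2_250_000` is `True` (1500 × 1500), so 2,250,000 is a perfect square between 1 million and 10 million.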

01:33:56

then it takes a condition. What it does is start a loop: while low <= high,

01:34:07

we get the midpoint, (low + high) // 2. Then, remember, earlier we

01:34:14

had this condition, test_location. Our condition is simply supposed to take the middle position

01:34:22

and identify if the middle position is the answer or we need to go left or right so the

01:34:27

condition should return either 'found', 'left', or 'right'. If the condition returns 'found', we return the

01:34:33

midpoint as the answer. If the condition returns 'left', we move to the left side;

01:34:42

that is, we take the end of the search space and set it to just before the midpoint, so we set

01:34:46

high equal to mid - 1. And if the condition returns 'right', which is the else case here, we set

01:34:52

low to mid + 1, so we take the start of the search space and move it after the midpoint.

01:34:56

If the loop ends without finding the answer, we return -1. So that's your generic binary search algorithm, and if you start using

01:35:01

this, what will happen is that you now have a tested piece of code, and in fact we can see it here.

01:35:07

now we can rewrite locate_card: we are passing in cards, and we're passing in

01:35:13

the query and we need to write a condition and here we're using a very interesting feature of Python

01:35:20

we are writing a function inside a function; this is called a closure, and it's a very

01:35:24

handy feature so now we can simply write condition inside locate card and what that does is

01:35:29

binary_search is going to pass in the middle position, but condition can also

01:35:35

access cards and query, because it lies inside locate_card. So what we do inside condition

01:35:41

is: we get the middle element, cards[mid]. If cards[mid] is equal to query,

01:35:46

then here we have that check we check whether it is the first occurrence of query or can

01:35:52

query occur before it? If query occurs before it, we return 'left'; else we return 'found'. And then these are

01:35:57

the original conditions that we already had so you can verify this by going back and checking

01:36:02

but the important part here is now the while loop has gone away now we can simply call binary search

01:36:07

with 0 and len(cards) - 1, so the start index, the end index, and the condition, and we can

01:36:16

evaluate the test cases, and you can see that all the test cases pass. And now you can

01:36:23

use this binary search function: because we have now tested it with one problem, you can use this

01:36:26

exact same function to solve other problems too. In some sense, it is a tested piece of logic.
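
Putting these pieces together, the rewritten `locate_card` might look like the sketch below, assuming, as in the lesson, that the cards are sorted in decreasing order and the first occurrence of `query` is wanted. `condition` is a closure that reads `cards` and `query` from the enclosing function; the details may differ from the course notebook.

```python
def binary_search(lo, hi, condition):
    """Generic binary search over positions lo..hi (inclusive).
    condition(mid) returns 'found', 'left', or 'right'."""
    while lo <= hi:
        mid = (lo + hi) // 2
        result = condition(mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        else:
            lo = mid + 1
    return -1


def locate_card(cards, query):
    """Position of the first occurrence of query in cards sorted in
    decreasing order, or -1 if query is absent."""
    def condition(mid):
        # A closure: cards and query come from locate_card's scope.
        if cards[mid] == query:
            if mid > 0 and cards[mid - 1] == query:
                return 'left'    # an earlier copy of query exists
            return 'found'
        elif cards[mid] < query:
            return 'left'        # larger values sit to the left
        return 'right'
    return binary_search(0, len(cards) - 1, condition)


print(locate_card([13, 11, 10, 7, 4, 3, 1, 0], 7))       # 3
print(locate_card([8, 8, 6, 6, 6, 6, 6, 3, 2, 0], 6))    # 2
print(locate_card([], 7))                                 # -1
```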

01:36:32

here's what we will do we'll take a quick question and we will implement it now we've spent

01:36:37

what, one and a half hours talking about a particular problem, but let's spend maybe two minutes talking

01:36:45

about a new problem and solving it. So here's a slightly related question: given an array of

01:36:50

integers sorted in increasing order find the starting and ending position of a given number so once again

01:36:56

you have a sorted array, this time in increasing order. The only difference is, apart from the fact

01:37:01

that they are sorted in increasing order the other difference is that we're looking for both the

01:37:08

start index and the end index

01:37:17

of a particular number, because the number can repeat, as we saw in one example.

01:37:22

there's a very simple way to solve this: do binary search once to find the first

01:37:29

position, and that's what this function does; I'll let you read through it. The only changes here are

01:37:36

this condition, whose order has changed because now the elements are in increasing order,

01:37:44

and then a second change; there's no other change here, and

01:37:48

then there is another function, last_position, where instead of checking the left, we are

01:37:54

checking the right so instead of checking mid minus one we are checking mid plus one and if

01:38:00

the element at mid + 1 equals the target, we go to the right. And of course we have the same change here in this

01:38:04

code, because instead of decreasing we have increasing order. So now we have

01:38:09

written two functions, first_position and last_position, and then first_and_last_position is simply

01:38:14

getting the first position once so that's one binary search and getting the last position once that's

01:38:18

two binary searches, and that's not bad: the complexity is still O(log N).

01:38:23

Two times log N, or two times some constant times log N, when you express it in the big

01:38:27

O notation, is still O(log N), so that's okay. And that was quick; we were able to reuse most of the

01:38:33

code that we have written and that's the benefit of making generic functions like binary search
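
A sketch of the two-pass idea, reusing the same generic `binary_search`: `first_position` leans left on a match, `last_position` leans right, and because the list is sorted in increasing order the 'left' and 'right' answers flip compared with the decreasing-order card problem. The `Solution.searchRange` wrapper follows the class and method names LeetCode requires for this problem; the rest is an illustrative implementation, not necessarily the exact code submitted in the video.

```python
from typing import List


def binary_search(lo, hi, condition):
    """Generic binary search over positions lo..hi (inclusive)."""
    while lo <= hi:
        mid = (lo + hi) // 2
        result = condition(mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        else:
            lo = mid + 1
    return -1


def first_position(nums, target):
    def condition(mid):
        if nums[mid] == target:
            if mid > 0 and nums[mid - 1] == target:
                return 'left'    # target also occurs earlier
            return 'found'
        elif nums[mid] < target:
            return 'right'       # increasing order: bigger values are to the right
        return 'left'
    return binary_search(0, len(nums) - 1, condition)


def last_position(nums, target):
    def condition(mid):
        if nums[mid] == target:
            if mid < len(nums) - 1 and nums[mid + 1] == target:
                return 'right'   # target also occurs later
            return 'found'
        elif nums[mid] < target:
            return 'right'
        return 'left'
    return binary_search(0, len(nums) - 1, condition)


def first_and_last_position(nums, target):
    # Two binary searches: O(2 log N), which is still O(log N)
    return [first_position(nums, target), last_position(nums, target)]


class Solution:
    # LeetCode expects this class and method signature for this problem
    def searchRange(self, nums: List[int], target: int) -> List[int]:
        return first_and_last_position(nums, target)


print(first_and_last_position([5, 7, 7, 8, 8, 10], 8))   # [3, 4]
print(first_and_last_position([5, 7, 7, 8, 8, 10], 6))   # [-1, -1]
```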

01:38:39

and in fact, we can test the solution by making a submission, so let's go to leetcode.com.

01:38:46

Here, what I've done is I have already copied over the binary search

01:38:52

function, the first position function, and the last position function. By the way, LeetCode is

01:38:57

a great platform for practicing: you can go to leetcode.com, sign up with any account, and you will

01:39:04

find a lot of problems, especially in the Problems tab, and here you can see that this is exactly

01:39:09

the problem that we have been solving just now. So we've just pasted the code here: binary search, first

01:39:15

position, last position, first and last position, and LeetCode requires you to write this class

01:39:19

called Solution; this is something that they give you beforehand, and inside Solution you need

01:39:23

to define a method called searchRange, where we are simply calling our first and last position function.

01:39:29

here, I'll let you see. We can test the code with our test case, so you can pass a test

01:39:35

case here and test it out great or we can simply submit it and here you can see that the

01:39:44

problem was submitted successfully and it tells you things like how much runtime it used what was

01:39:50

the memory it used, and that your solution was accepted. So check out leetcode.com, go to the Problems

01:39:57

section and you can see all the different problems that they have you can also explore and you

01:40:03

have different problems that come up every day it's a great place to practice so that's binary search

01:40:09

for you but I just want to revisit the method once again so this is the systematic strategy that we

01:40:13

applied for solving the problem we state the problem clearly and we identify the input and the output

01:40:19

formats. This shows that you've understood the problem and you know what the solution will look like.

01:40:26

then come up with some example inputs and outputs and try to cover all the edge cases so this

01:40:30

shows that you are envisioning what are the different inputs that can come in before you write code

01:40:38

then you come up with a correct solution not necessarily the most efficient one and state it

01:40:41

in plain English now when you try to state it you will have to clarify it and that will help you

01:40:47

clarify your own thoughts, and then you can analyze the algorithm's complexity, and you can implement

01:40:54

the solution and test it using example inputs so this is the basic solution now in interviews and

01:40:59

coding assessments, where there's a time limit, you may not want to implement

01:41:04

the brute force solution because then you may get stuck in fixing issues with brute force and

01:41:09

you can directly jump ahead to step five but while you're practicing always implement brute force

01:41:14

then step five: analyze the algorithm's complexity, and most of the time it is simply a matter

01:41:19

of counting the number of iterations how many times a while loop or maybe a loop within a loop

01:41:24

is getting executed and identify inefficiencies and if it is a brute force solution it's

01:41:30

generally quite easy to see the inefficiency for example in this case the inefficiency was that

01:41:35

we knew the array was sorted, so almost anything we did would be better than going element by element;

01:41:40

we could pick a random element and that would help us eliminate a good chunk of the array

01:41:47

so that is the inefficiency and then apply the right technique and we are learning the techniques

01:41:52

so we've learned binary search today and then we're going to learn a lot more techniques

01:41:55

that are asked in interviews so apply the right technique to overcome the inefficiency and repeat

01:42:00

steps 3 to 6 which is go back and come up with a correct solution with the optimized technique

01:42:05

implement the solution and test it using some example inputs, and then analyze that algorithm's

01:42:10

complexity and identify any inefficiencies so what we've done for you is we have created a

01:42:20

template: you can see this Python problem-solving template, and the way you can use this template

01:42:27

is to simply run it so you run the code you run this template and then when you run the template

01:42:36

inside it you will see question marks in a bunch of places, so you can give it a nice project

01:42:39

name and commit it to your profile. One way you can save a copy of this template to your

01:42:45

profile is by clicking the duplicate button if you click the duplicate button you can copy it in your

01:42:49

profile, and you don't have to look for it; you can just find it on your Jovian profile. But anyway,

01:42:54

once you have it copied you can click the run button and then click run on binder and run the

01:43:01

template then you go down once you run it and you can copy over a problem statement you can copy

01:43:07

over a link to the problem so that when you need to make a submission you can go back and

01:43:10

refer and then here the method is summarized for you and here we have created sections for you

01:43:17

so you can simply start filling out this method, step 1 step 2 step 3 step 4 step 5 so whenever

01:43:23

you are faced with a difficult problem, just use this template, and I guarantee that if you

01:43:28

work through this course you will be able to solve a majority of the problems that you come across

01:43:34

and specifically even if you are able to follow maybe about 30 to 40 percent of this course

01:43:40

you will easily be able to solve most questions that are asked in interviews because questions

01:43:44

asked in interviews are fairly simple in terms of the data structures or algorithms they test

01:43:49

but the intention there is more to test your approach look at the quality of your code and

01:43:54

see how clearly you are expressing yourself, and this is exactly what this method

01:43:59

teaches you to do. I encourage you to try it out: you can take

01:44:05

problems from places like LeetCode, CodeChef, and Codeforces; there are a few links listed here

01:44:11

under practice problems. So that was today's lesson,

01:44:16

and the next lesson is on common data structures in Python. So this is Data Structures and

01:44:23

Algorithms in Python, an online certification course brought to you by Jovian.

01:44:30

Thank you. Hello and welcome to Data Structures and Algorithms in Python. This is an online certification

01:44:36

course offered by Jovian. Today we are looking at Assignment 1: Binary Search Practice.

01:44:43

so let's get started first thing we will do is go to the course website pythondsa.com

01:44:50

on the course website you can enroll for the course and view all the previous lectures and assignments

01:44:56

for assignment one you may want to review the video and notebook for lesson one

01:45:01

let's open up assignment one it's called binary search practice

01:45:07

now in this assignment you will apply and practice the concepts that we covered in the first lesson

01:45:12

so you will understand and solve a problem systematically,

01:45:17

implement linear search and analyze it and optimize the solution using binary search and ask

01:45:22

questions and help others on the forum. Let's open up the starter notebook for the assignment,

01:45:29

which contains the problem statement and other information. Now, this is a notebook you're

01:45:36

looking at hosted on Jovian you can see some description here and if you scroll down below

01:45:42

you can also see some code, and you will need to execute this notebook, modify the code

01:45:47

within it and record a new version which you can then submit to see your score so let's start reading

01:45:54

through it. As you go through the notebook, you will find question marks (???) in certain places.

01:46:00

to complete the assignment you have to replace the question marks with appropriate values expressions

01:46:05

or statements to ensure that the notebook runs properly end to end. Now, keep in mind that you need

01:46:11

to run all the cells; otherwise you may get errors like NameError or undefined variables.

01:46:17

You should not be changing any variable names, deleting any cells, or disturbing any existing code.

01:46:23

You can add new code cells or new statements, but do not redefine or change any of the

01:46:29

existing variables you will be using a temporary online service for code execution and we'll see how to

01:46:36

use it in a moment, so keep saving your work by running jovian.commit() at regular intervals. And then the

01:46:42

questions marked optional will not be considered for evaluation; although we recommend doing them, they

01:46:47

are for your learning but you can make a submission before you have solved the optional questions

01:46:54

now you can make a submission back on the assignment notebook page and we'll see how to do that

01:46:59

and if you're stuck you can ask for help on the community forum it's listed here

01:47:06

and we'll see how to do that as well now one final thing I want to mention is you can get

01:47:13

help with errors or ask for hints you can even share your code and errors that you are getting in the

01:47:18

code but please don't ask or share the full working answer code on the forum this is so that

01:47:25

everybody has the opportunity to work through the problem statement on their own make mistakes learn

01:47:31

from their own mistakes and arrive at the right solution now how do you run this code the recommended

01:47:38

way to run this code is by clicking the run button at the top of the page and selecting run on binder

01:47:43

but you can also run it using some other options like Google Colab or Kaggle, or you can run it on

01:47:49

your computer locally so we're going to use the recommended method run on binder

01:47:57

now we have the notebook running in front of us the first thing I like to do is go to kernel

01:48:06

and click Restart & Clear Output so that we can see all the outputs of the notebook from scratch,

01:48:14

and I'm also going to toggle the header and the toolbar so that we can zoom in a bit

01:48:28

so now the same Jupyter notebook is now running online on a platform called binder

01:48:34

and before starting the assignment, let's save a snapshot of the assignment to our Jovian profile so

01:48:39

that we can access it later and continue our work. I'm going to run pip install jovian;

01:48:45

this is going to install the jovian library. Then run import jovian to import the library and set a project name;

01:48:54

here I'm just calling it binary search assignment, and run jovian.commit(). Now, you've taken a

01:48:59

starter notebook, which was hosted on my profile, and then you've run it on Binder, where as soon

01:49:04

as you run jovian.commit(), a copy of the starter notebook gets saved to your profile. So what you will

01:49:09

see here is a link to a notebook hosted on your Jovian profile; let's open it up here and see.

01:49:15

so now this is your personal copy of the assignment notebook any changes that you make here and

01:49:21

run jovian.commit() will get added to your profile, so if you want to come back and continue your

01:49:26

work, you do not have to go back to the original starter notebook, which contains all the blanks;

01:49:32

rather you can come back to your profile and you can come to your profiles simply by opening

01:49:36

jovian.ai, and on your profile you can go to the Notebooks tab, and on the Notebooks tab you will be

01:49:43

able to find as you can see here you will be able to find the binary search assignment here

01:49:52

there you go this is the binary search assignment that we just created and you can open it and

01:49:58

run it on binder to continue your work. So moving along this is the problem we are looking at here

01:50:06

you are given a list of numbers obtained by rotating a sorted list an unknown number of times.

01:50:12

Okay, so we have two new terms here, rotating and sorted list, and don't worry if you don't know

01:50:17

what they mean; normally, if you see any new terms in a problem, they will be explained somewhere within the

01:50:23

problem itself. For instance, here you can see that there's a definition: we define rotating a list

01:50:32

as removing the last element of the list and adding it before the first element. For

01:50:37

instance, rotating the list of numbers [3, 2, 4, 1] leads to removal of the last number and then placing it

01:50:46

at the very beginning, so you end up with the list [1, 3, 2, 4]. This is a new operation that we are defining;

01:50:52

this is not something standard but you will find that a lot of problems will define new terms

01:50:58

or new operations so that it becomes easier for you to understand the problem so that's rotating

01:51:04

a list. Now, rotating a list once produces [1, 3, 2, 4]; if you rotate that resulting

01:51:10

list one more time, then you will end up with [4, 1, 3, 2], and so on.
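
The rotation operation defined in the problem is easy to express in code, which helps when checking examples like these by hand. `rotate_once` and `rotate` are hypothetical helper names, and both return new lists rather than modifying their input.

```python
def rotate_once(nums):
    """Rotate a list once: remove the last element and place it first.
    Returns a new list; an empty list stays empty."""
    if not nums:
        return []
    return [nums[-1]] + nums[:-1]


def rotate(nums, times):
    """Apply rotate_once the given number of times."""
    for _ in range(times):
        nums = rotate_once(nums)
    return nums


print(rotate_once([3, 2, 4, 1]))           # [1, 3, 2, 4]
print(rotate([3, 2, 4, 1], 2))             # [4, 1, 3, 2]
print(rotate([0, 2, 3, 4, 5, 6, 9], 3))    # [5, 6, 9, 0, 2, 3, 4]
print(rotate([3, 5, 6, 7, 9], 2))          # [7, 9, 3, 5, 6]
```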

01:51:17

so sorted refers to a list where the elements are arranged in increasing order. In this case we have

01:51:23

numbers, and the numbers [1, 3, 5, 7] are arranged in increasing order, so this is a sorted list,

01:51:30

but if this was [3, 2, 4, 1], well, the numbers are not arranged in increasing order, so that's

01:51:36

not a sorted list. So you are given a list of numbers obtained by rotating a sorted list

01:51:42

an unknown number of times. For instance, this sorted list [0, 2, 3, 4, 5, 6, 9] is rotated a certain number of

01:51:48

times, and you can verify that if you rotate it three times, you end up with the list [5, 6, 9,

01:51:53

0, 2, 3, 4]. Right, you can see that first 9 comes to the beginning, then 6 comes to

01:51:58

the beginning and then 5 comes to the beginning so you need to write a function and you're given

01:52:07

just the list; you're not given the original sorted list, you're given the list obtained

01:52:11

by rotating some sorted list an unknown number of times. Now you need to write a function to

01:52:16

determine the minimum number of times the original sorted list was rotated to obtain the given list.

01:52:22

Your function should have a worst-case complexity of O(log N), where N is the length of the list,

01:52:27

and you can assume that all the numbers in the list are unique okay so 3 parts write a function

01:52:33

to determine the minimum number of times you need to rotate the original sorted list in this

01:52:37

case it is 3; the function should have a worst-case complexity of O(log N). So this determines correctness,

01:52:44

and this determines efficiency and then this is some additional information to help you that you can

01:52:50

assume all the numbers in the list are unique if this was not mentioned you would also have to

01:52:56

handle the case where your list does not contain unique numbers. Now we will apply the method that we

01:53:04

have been applying all throughout this course for solving the problems number one state the problem

01:53:12

clearly identify the input and output formats number two come up with some example inputs and outputs

01:53:18

and try to cover all the edge cases number three come up with a correct solution for the problem

01:53:23

and state it in plain English number four implement the solution and test it using some example

01:53:29

inputs and having test cases and then implementing a solution allows you to test them using the

01:53:36

example inputs and fix any bugs that's why it's very important to have some test cases number five

01:53:41

analyze the algorithm's complexity and identify any inefficiencies; and number six, apply the right

01:53:46

technique to overcome the inefficiency and then you go back and repeat steps three to six come

01:53:52

up with a correct solution, implement the solution and test it, and analyze the algorithm's complexity,

01:53:59

and you can review lesson one for a detailed explanation of this method let's apply it step by step

01:54:05

the first step is to state the problem clearly and identify the input and output formats

01:54:10

now, while it is stated clearly enough, it always helps to express it in your own words,

01:54:15

in the way that makes it clearest for you, and this is something that you can keep

01:54:19

returning to rather than the original problem statement because this is something that you will

01:54:24

understand better, and it's okay if your description overlaps with the original problem statement,

01:54:30

but do try to express it in your own words so in this case what I've just done is I have double clicked

01:54:35

here; once you double-click, you can edit this text cell, and now we can start writing the problem.

01:54:41

so let's say given a rotated list we need to find the number of times it was rotated

01:54:52

and okay I think what I've probably missed here is that it is a sorted list so given a

01:55:01

sorted list that was rotated some unknown number of times

01:55:10

we need to find the number of times it was rotated right maybe I'm just going to say given a

01:55:18

rotated sorted list because technically the input is not a sorted list it's a rotated sorted list

01:55:23

so given a rotated sorted list that was rotated an unknown number of times, we need to find

01:55:27

the number of times it was rotated but doing this exercise helps you determine

01:55:34

if you understood the problem correctly and you may often find that okay there's a certain

01:55:38

detail in the problem that you missed okay but at this point I'm happy with my description

01:55:44

and you will see that it is matching the description to a large extent but it's something that

01:55:51

I understand better so I'll just refer to this from this point now assuming here that I know

01:55:57

what rotation and sorted mean; otherwise I could also include those. Then here's a question:

01:56:03

the function you will write will take one input called nums. What does it represent?

01:56:08

Give an example. Okay, so once again we double-click on this, and the one input is

01:56:17

nums, so this is a sorted rotated list, and let's give an example here. Let's say we take the

01:56:30

sorted list [3, 5, 6, 7, 9], and then we rotate it a few times. Let's say we rotate it a

01:56:40

couple of times so we end up with this sorted rotated list that's our input so we answered the

01:56:49

question here. The first question was to express the problem in your own words; this is the answer.

01:56:54

The second question was: what does the input nums represent? Give an example. It represents a sorted

01:57:00

rotated list, [7, 9, 3, 5, 6]. The third question is: the function you will write will return

01:57:06

a single output called rotations what does that represent well you have to write a function that

01:57:10

identifies how many times the list was rotated so this is the number of times the sorted list was

01:57:19

rotated okay and in this case the example that we have is that this sorted list was rotated twice

01:57:30

[3, 5, 6, 7, 9] was rotated two times, so we mention 2 here. Now, you can see these

01:57:36

backticks that I'm using here; the key is next to the number 1 on your keyboard, below the Escape key.

01:57:44

What these backticks let you do is express text as code within Markdown. You can see that

01:57:51

they have a gray background and they have a different font this looks a lot more like code

01:57:55

The same is true here for nums. So you can use Markdown and its features

01:58:02

to your advantage to organize your descriptions and your text better okay so now based on the above

01:58:10

we can now create a signature of our function. We have a function called count_rotations; it takes

01:58:14

the list of numbers and it returns well right now we are just putting pass in here but we know that

01:58:20

it's going to return a single number, rotations. Now, after each step, remember to save your notebook,

01:58:27

so we are going to just run jovian.commit(), and now if you leave your computer, you do not have to be

01:58:35

worried that your work may be lost so you can go in here and you can open up this notebook from your

01:58:43

Jovian profile and press Run at any point to run this notebook. Now, step 2 is to come up with some

01:58:55

example inputs and outputs and try to cover all the edge cases and our function should be able

01:59:00

to handle any set of valid inputs so here are some variations that you can encounter a list of size

01:59:07

10 rotated 3 times a list of size 8 rotated 5 times so these are two generic examples and then

01:59:13

a list that wasn't rotated at all a list that was rotated just once a list that was rotated

01:59:18

n - 1 times, where n is the size of the list; a list that was rotated n times, and what do we mean

01:59:26

by rotating the list n times? Well, let's see; an empty list; and a list containing just one element.

01:59:33

and if you can think of more test cases you should definitely add more test cases here and what we

01:59:39

do is we will express our test cases as dictionaries so this will help us organize the test cases

01:59:48

and test them all at once more easily using helper functions so you can see here that we've organized

01:59:54

one test case here and we've expressed a test case as a dictionary so here we have the input to the

02:00:00

test case under the input key, and then we have the output of the test case. Now, because a function

02:00:05

can take many arguments, the input itself is going to be a dictionary with an entry for each argument;

02:00:12

in this case there's just one, so we just call it nums. We have the input here, and this is the

02:00:20

expected output. Okay, so let's create the test case. Then, if you want to fetch the

02:00:31

actual input and output out of it, we can fetch test['input']['nums']; that's going to give us the

02:00:36

nums, and the output should be test['output'] (there seems to be an error here, so let's fix that),

02:00:47

and the result is count_rotations(nums0). Okay, so this is the actual result obtained by passing

02:00:53

the test case into count_rotations, and you can see that the result we get back is None because

02:01:00

right now we do not have any code; we just have pass inside. And the result and the output are not

02:01:08

equal, because the output is the number 3 but the result is None. That's okay: our test case is

02:01:14

failing right now because we have not yet implemented the function, but as soon as we implement it,

02:01:19

we expect to see the test case pass. Now, to save you all of this work, we have given

02:01:24

you a function called evaluate_test_case. From jovian.pythondsa, you can just import

02:01:30

evaluate_test_case and then call evaluate_test_case with the function you want to test and the actual

02:01:35

test case. You can see here that it prints the input that was passed in, the expected output,

02:01:42

the actual output that was obtained, and the test result; in this case, the test result was FAILED.
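To make the dictionary format concrete, here is a minimal sketch. The list values are illustrative, chosen to match the example discussed later (29 followed by 3 at position 3), and check_test_case is a hypothetical helper that only imitates what evaluate_test_case prints; it is not the jovian library's implementation.

```python
import time

# Stub with the signature described above: takes a list `nums` and should
# eventually return the number of rotations. For now it does nothing.
def count_rotations(nums):
    pass

# One test case as a dictionary: 'input' maps argument names to values,
# 'output' holds the expected return value.
test = {
    'input': {'nums': [19, 25, 29, 3, 5, 6, 7, 9, 11, 14]},
    'output': 3,
}

def check_test_case(function, test):
    """Hypothetical stand-in for evaluate_test_case (not jovian's real code)."""
    inputs, expected = test['input'], test['output']
    start = time.perf_counter()
    actual = function(**inputs)  # unpack the input dict as keyword arguments
    elapsed_ms = (time.perf_counter() - start) * 1000
    passed = actual == expected
    print('Input:', inputs)
    print('Expected Output:', expected)
    print('Actual Output:', actual)
    print('Execution Time: %.3f ms' % elapsed_ms)
    print('Test Result:', 'PASSED' if passed else 'FAILED')
    return actual, passed

# With only `pass` inside, the stub returns None, so the test fails.
result, passed = check_test_case(count_rotations, test)
```

Because the input is a dictionary keyed by argument name, the same helper works for functions with several parameters.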

02:01:48

The execution time is also printed here, in case you want to evaluate whether a certain

02:01:53

implementation is faster than another. So now your job is to create test cases for each of

02:02:00

the scenarios listed above. Here is test0, which is the same as the original test case that we

02:02:06

created. Now here is test1, a list of size 8 rotated 5 times. I will let you create this,

02:02:14

but it will look something like this: you replace the three question marks with,

02:02:19

let's say, a list of size 8, so 1 2 3 4 5 6 7 8, and you can imagine that this was rotated 5 times:

02:02:28

the last 5 of these numbers move, one at a time, to the first position,

02:02:35

and you get this as the input nums. And the output? Well, it was rotated 5 times, so I think

02:02:44

you can guess that the output here should be 5. Now here is a list that wasn't rotated at all:

02:02:53

what should the output be here? I'm sure you can guess that it should be 0, so I'll let you

02:02:59

fill this out. Here is a list that was rotated just once, so let's fill out this one.

02:03:08

So this list, rotated once, would give us 7, 3, 5. There you go. Then a list that was rotated

02:03:25

n - 1 times, where n is the size of the list: I'll let you do that one. And a list that was rotated

02:03:32

n times, where n is the size of the list. Okay, what does that look like?

02:03:42

So you take this list, and then you first put 10 in the first position, and then you put 9 in the first

02:03:47

position, so 9, 10 comes to the beginning and 3, 5, 7, 8 comes after it. Then you move 8 to the first position,

02:03:54

then you move 7, then 5, then 3. If you move all of these to the first position one by one,

02:03:59

you end up with the same list, so you've rotated it n times. Now, what should the output be in this case?

02:04:07

There are 6 numbers here, so is the output 6? I'm not so sure, because remember

02:04:16

the question: the original question says, write a function to determine the minimum number of times

02:04:24

the original sorted list was rotated to obtain the given list. So it has to be the minimum number

02:04:28

of times, and we may want to go back and change this: we need to find not the number of times

02:04:35

it was rotated, but the minimum number of times it was rotated, or needs to be rotated. Right, so

02:04:41

coming back here, the output should not be 6 but 0, so keep that in mind. Then here's an empty list;

02:04:53

I'll let you figure out what nums and the output should be here. And here is a list containing

02:04:59

just one element; once again, it should be pretty straightforward. Can you rotate a list with one element?

02:05:07

I'll let you decide. And then we're taking all the tests and putting them into a single list. Now,

02:05:14

since I have not defined all the tests, I'm not going to use this definition, which contains all

02:05:18

the tests; I'm just going to pick the tests that I have defined. We have defined

02:05:25

test0, test1, test3 and test5, so I'm just going to put in test0, test1, test3 and test5.
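For reference, here is one way the full set of eight scenarios might be filled in. These lists are illustrative choices rather than the notebook's own values: test1 (1 to 8 rotated 5 times), test3 (7, 3, 5) and test5 (3, 5, 7, 8, 9, 10) follow the discussion above, while the remaining lists, and treating the empty list as 0 rotations, are assumptions.

```python
# Each expected output is the *minimum* number of rotations.
test0 = {'input': {'nums': [19, 25, 29, 3, 5, 6, 7, 9, 11, 14]}, 'output': 3}  # size 10, rotated 3 times
test1 = {'input': {'nums': [4, 5, 6, 7, 8, 1, 2, 3]}, 'output': 5}             # size 8, rotated 5 times
test2 = {'input': {'nums': [1, 2, 3, 4, 5]}, 'output': 0}                      # not rotated
test3 = {'input': {'nums': [7, 3, 5]}, 'output': 1}                            # rotated once
test4 = {'input': {'nums': [2, 3, 4, 5, 6, 7, 8, 9, 1]}, 'output': 8}          # rotated n - 1 times
test5 = {'input': {'nums': [3, 5, 7, 8, 9, 10]}, 'output': 0}                  # rotated n times
test6 = {'input': {'nums': []}, 'output': 0}                                   # empty list (assumed 0)
test7 = {'input': {'nums': [42]}, 'output': 0}                                 # single element

tests = [test0, test1, test2, test3, test4, test5, test6, test7]
```

Note that test5 expects 0, not 6, because rotating a list n times gives back the original list.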

02:05:41

And that's the full set of tests that we have. You definitely need to fill out all the test cases,

02:05:52

and if you can think of some other cases that you should be testing, you should include those

02:05:59

test cases here as well. Okay, now, to evaluate your function against all the test cases together,

02:06:12

you can use the evaluate_test_cases helper function from Jovian. So there are two functions:

02:06:18

evaluate_test_case works with a single test case, and evaluate_test_cases works with

02:06:23

a list of test cases. So we have a list of test cases here; I have four, but you should have

02:06:28

at least eight, and a few more if you have created them. So we can import, from jovian.pythondsa,

02:06:35

evaluate_test_cases and then invoke evaluate_test_cases with the count_rotations

02:06:42

function and the list of test cases we've created. We don't have any logic in the function yet,

02:06:50

so all the test cases should fail, and you can see: test case 0 fails, 1 fails, 2 fails,

02:06:55

so out of the four test cases, none of them has passed. No problem: we have completed step two, which

02:07:05

is to create some test cases, and once we've implemented the function, we'll know whether the function

02:07:11

definition is correct. Now the next step is to come up with a correct solution for the problem

02:07:16

and state it in plain English, and there's already a hint here for you: coming up with a correct

02:07:22

solution is quite easy, and it's based on this simple insight: if a list of sorted numbers is

02:07:28

rotated k times (you keep rotating it step by step, moving the last number to the first position),

02:07:36

then the smallest number in the list ends up at position k. Okay, and you can verify this; it's very

02:07:43

simple to do. Whenever you have a doubt, create a new cell. By the way, you can create a new

02:07:49

cell by clicking on the left side of a cell and clicking Insert Cell Below, or, if you're in a

02:07:55

code cell, just click here near the prompt and press the B key, and that adds a new cell below.

02:08:03

So let's take the list 1, 3, 5, 6, 7 and rotate it k times.

02:08:12

Let's try with k = 2. If you set k equal to 2, then you're going to take two of these

02:08:18

from the very end and move them to the beginning. That means 6 comes

02:08:25

at position 0, 7 comes at position 1, and the first element of the sorted list now comes

02:08:31

at position 2. That's interesting. Let's move a third element as well. Okay, so now we've moved

02:08:40

three elements, or rotated the list three times, and the smallest element ends up at position 3.
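The insight is easy to check in a new cell. The sketch below assumes the example list is 1, 3, 5, 6, 7 and uses a hypothetical rotate helper that moves the last element to the front k times; it then checks, for every k, where the smallest number lands.

```python
def rotate(nums, k):
    """Hypothetical helper: rotate k times, each time moving the last element to the front."""
    result = list(nums)
    for _ in range(k):
        if result:                              # nothing to move in an empty list
            result.insert(0, result.pop())      # last number -> first position
    return result

sorted_nums = [1, 3, 5, 6, 7]
print(rotate(sorted_nums, 2))   # [6, 7, 1, 3, 5]
print(rotate(sorted_nums, 3))   # [5, 6, 7, 1, 3]

# For every k < n, the smallest number sits at position k, and it is the
# only number smaller than its predecessor (there is no such number for k = 0).
n = len(sorted_nums)
for k in range(n):
    rotated = rotate(sorted_nums, k)
    assert rotated.index(min(rotated)) == k
    drops = [i for i in range(1, n) if rotated[i] < rotated[i - 1]]
    assert drops == ([k] if k > 0 else [])

# Rotating n times gives back the original list, which is why the
# minimum number of rotations is k % n.
assert rotate(sorted_nums, n) == sorted_nums
```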

02:08:45

So it seems to hold true, and you can now verify this with a larger list, a smaller list, an empty list,

02:08:51

or any of the test cases that you have. If a sorted list was rotated k times,

02:08:57

then the smallest number in the list ends up at position k, counting from zero. Further, it is

02:09:04

the only number in the list which is smaller than the number before it. You can see this once again:

02:09:09

the smallest number is at position 3, and all of these numbers are higher than the numbers

02:09:14

that come before them, except the number 1, which is smaller than 7. So we simply need to

02:09:25

check, for each number in the list, whether it is smaller than the number that comes before it

02:09:30

(if there is a number before it). Then our answer, which is the number of rotations, is

02:09:35

simply the position of this number. Right, so if you can find the position of the number

02:09:41

which is smaller than the number that comes before it, that position is also equal

02:09:46

to the number of times the sorted list was rotated. And if we cannot find such a number, then the

02:09:51

list wasn't rotated at all. And that's it. You can see this here, in this list, by applying this logic:

02:09:59

3 is the smallest number, and not only that, 3 is the only number which is

02:10:05

lower than the number that precedes it, its predecessor 29. And since 3 occurs at

02:10:11

position four... well, actually, 3 occurs at position 3 (counting 0, 1, 2, 3), so

02:10:20

the list was rotated exactly three times. Now, we can use the linear search algorithm as a first

02:10:35

attempt to solve this problem. Linear search simply involves working through the list

02:10:40

from left to right. So now the task for you is to describe the

02:10:46

linear search solution in your own words, and please do write it in your own words, but here's how I'm

02:10:51

going to write it. Let's say: create a variable position with the value 0. This is

02:11:03

for tracking the position. Then look at the number at the given position, and not

02:11:15

only look at it, but compare the number at the current position to the number before it.

02:11:27

Now, if you're starting position with the value 0, there's no number before it, so

02:11:33

we may not be able to compare it with anything; we might even just start with the value 1. That's all right.

02:11:40

If the number is smaller than its predecessor, then return position, because position is the answer:

02:11:58

we found the number that is smaller than its predecessor, and there's only one such number.

02:12:01

Otherwise, increment position and repeat till we exhaust all the numbers.
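The plain-English steps above translate almost line by line into Python. Here is a minimal sketch of count_rotations_linear, following the version built next in the lecture (start at position 0 and skip the comparison when there is no predecessor) and returning 0 when no number is smaller than its predecessor.

```python
def count_rotations_linear(nums):
    """Linear search for the only number that is smaller than its predecessor."""
    position = 0                                 # start at 0, the conventional first index
    while position < len(nums):
        # Skip the comparison at position 0, where there is no predecessor
        if position > 0 and nums[position] < nums[position - 1]:
            return position                      # found the smallest number
        position += 1
    return 0                                     # no drop found: rotated 0 (or n) times

print(count_rotations_linear([19, 25, 29, 3, 5, 6, 7, 9, 11, 14]))  # 3
print(count_rotations_linear([3, 5, 7, 8, 9, 10]))                   # 0
print(count_rotations_linear([]))                                    # 0
```

Returning 0 rather than -1 matters: a list rotated n times is identical to an unrotated one, and -1 rotations is not a valid answer.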

02:12:18

Okay, simple. Now, you can add more steps if your description of the algorithm requires more steps;

02:12:29

that's perfectly all right. But at this point, we have a very clear description of the solution.

02:12:34

Here we're starting with position 1, not 0, because we also want to look at the previous position.

02:12:41

Now we import jovian here and commit the project. Once again, keep saving your work after every step

02:12:47

so that you can continue your work later. So now we're talking about implementing the solution and testing it.

02:12:55

So let's implement the solution. We said that we want to start with position...

02:13:00

we want to start with position 1. And when should the loop be terminated? Well, we loop while position

02:13:09

is less than the length of nums. Okay, that's fair. And then what is the success criterion? So we have

02:13:16

if position > 0 and nums[position] < nums[position - 1]. Okay,

02:13:26

that's the success criterion here. Now, you can see that there's a condition here, position > 0,

02:13:31

so we don't really need to

02:13:36

start position at 1; we can start position at 0 as well, and all that will happen is this condition

02:13:42

gets skipped and position gets incremented. And this is good practice, because whenever you iterate

02:13:49

over a list, you normally want to start with 0, just to avoid any confusion later when you're

02:13:55

reading the code about whether you intended to write 0 or 1 here. So just put position = 0 here

02:14:02

and simply skip this comparison if position - 1 is not a valid index.

02:14:09

So whenever you're accessing an element from inside a list or a dictionary, you always want to

02:14:15

make sure that the index or key is valid. Okay, here we are making sure that the index position

02:14:23

minus 1 is valid by checking position > 0. In any case, we now have the logic, and finally,

02:14:31

we are saying that if the number at position is less than the number that comes before it, then we return

02:14:37

position. If it's not, then it's going to increment the position, and it's

02:14:42

going to check again and again till we run out of numbers. Now, if you've exhausted the

02:14:47

entire list, then it follows that there were either no rotations or exactly n rotations;

02:14:56

in either case, the number we return should be 0. Okay, so keep this in mind. Some of you may have the

02:15:04

doubt: should you be returning -1 here, or 0? Well, the question does

02:15:11

specify clearly that you are given a sorted, rotated list and you have to find the number of times

02:15:16

it was rotated. Now, obviously -1 rotations are not possible, so -1 would not be a valid

02:15:22

return value from your function, and this is the reason we write test cases too. Now let's evaluate the

02:15:29

test case. Let's call evaluate_test_case on count_rotations_linear for a single test case and see

02:15:36

what happens. This is the test case here, and this is the output. We call evaluate_test_case

02:15:44

with count_rotations_linear and test, and that gives us the linear search result. You can see here

02:15:51

this was the list of numbers, this was the expected output, and this was the actual output,

02:15:57

so, great, our function seems to have passed the test case. Now we can evaluate all the test cases by

02:16:03

calling evaluate_test_cases with count_rotations_linear on all the test cases together, and that gives us a whole list of

02:16:09

test results: test cases 0, 1, 2 and 3 all passed. Now, if you had put in -1 here,

02:16:22

you would see that one of the test cases fails, namely the case where the list was

02:16:31

not rotated at all, or was rotated n times. Okay, so that should tell you that the answer here should be

02:16:39

0. So that's our linear search algorithm. At this point, you may face issues; you may

02:16:50

feel stuck, or not be able to figure out how to write the code, and that's perfectly all right;

02:16:56

that's part of learning. You may face errors or exceptions, for instance if you did not have

02:17:02

this check here, position > 0, or maybe what you had here was some other condition, like

02:17:11

position <= position + 1. And that's okay; you can then go to the forum and post your

02:17:19

issue. So let's open up the forum here. This is the forum discussion for Assignment 1, and you can

02:17:30

go into the original topic here, which is a longer discussion; this is where everybody's posting

02:17:38

small issues. You can see that there are about 321 messages that have been posted. You can start

02:17:43

looking through this forum and reading through some of the posts. You can even search

02:17:47

by pressing Ctrl+F, and you can search for questions here. Now, if you want to post your

02:17:52

own question, scroll down to the very end, or just click this button here, and click Reply,

02:18:00

and write your question here: "I have an issue: should I return -1 or 0 in the case the

02:18:11

list has not been rotated?" Okay, maybe that's your question. And if you have code that's not working,

02:18:19

or there's an error, you can also include a screenshot of your code. Or, I'll show you another trick:

02:18:25

let's say you commit your notebook. Let me come up here; I've

02:18:32

committed my notebook, and if you have a particular cell of code that you want to share, you can

02:18:41

click Copy Cell Link and paste it here. That will give a link to the entire cell,

02:18:51

and if somebody clicks on the link, they can view that specific cell of the notebook directly. Let's see:

02:19:02

you can see here that it brings us directly to this specific cell. There's another option: you can even

02:19:12

click on Embed Cell. Okay, for secret notebooks we do not allow embedding, but copying

02:19:20

the cell link should work. Then click Reply, and your question will be posted, and somebody will

02:19:30

reply to your question. Just come back to the forum in a few hours, or maybe the next day, and you should

02:19:35

see an answer; you will also receive an email. So that's the discussion topic. You can also go back

02:19:45

to the category here and create a new question: if you want to start your

02:19:51

own thread, if you think your question deserves a deeper discussion where multiple people can reply,

02:19:58

you can create a new thread by clicking New Topic. Okay, so keep this in mind and do make use of

02:20:05

the forum. What we've seen is that people who are active on the forum are at least four to five times

02:20:10

more likely to complete the course, earn the certificate of accomplishment, and continue working

02:20:16

on these topics after the course as well. Okay, so the next step is to analyze the algorithm's

02:20:28

complexity, and the way to do this, if you've seen Lesson 1, is to simply count the number of iterations,

02:20:34

that is, the number of executions of the while loop. Now, if you have a list of numbers of size n,

02:20:43

then you can see that this is the key loop here: while position < len(nums).
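To see this bound in practice, here is a small instrumented sketch. count_rotations_linear_with_steps is a hypothetical helper that mirrors the linear search loop and also counts iterations; an unrotated list is the worst case, since no drop is ever found and the loop runs once per element.

```python
def count_rotations_linear_with_steps(nums):
    """Hypothetical variant of the linear search that also counts loop iterations."""
    position, steps = 0, 0
    while position < len(nums):
        steps += 1                               # one iteration of the key loop
        if position > 0 and nums[position] < nums[position - 1]:
            return position, steps
        position += 1
    return 0, steps

for n in [10, 100, 1000]:
    unrotated = list(range(n))                   # worst case: no drop anywhere
    rotations, steps = count_rotations_linear_with_steps(unrotated)
    print(n, rotations, steps)                   # prints 10 0 10, then 100 0 100, then 1000 0 1000
```

The number of iterations grows in step with n, which is what O(n) describes.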

02:20:50

So there will be up to n iterations, and inside each iteration we're performing

02:20:55

certain comparisons and returning things; all of these take, in effect, constant time. Based on

02:21:01

this, you can probably tell that the complexity of linear search is order of n, so you can just put in

02:21:09

a big O: in big O notation, this will be O(n). So that's the first part of the assignment:

02:21:16

linear search. Now the next step is to apply the right technique to overcome the inefficiency,

02:21:21

and you can now read through the rest of the assignment. The idea here is

02:21:27

this: binary search is the technique we'll apply, and the key question we need to answer in

02:21:34

binary search is: given the middle element, can you decide if it is the answer (which means, if it is

02:21:43

the smallest number in the list), or whether the answer lies to the left or the right

02:21:49

of it? Okay, so given the middle element: if the middle element is smaller than its predecessor,

02:21:55

then it is the answer. We already know that, because there's only one number in the list that is

02:21:59

smaller than its predecessor. So you can see here, for example: if the middle element was 1,

02:22:05

which it's not, but suppose that the middle element was 1, you can see that 1 is smaller than

02:22:10

8, so we know that 1 is the answer, and the position of the middle element is the answer.

02:22:15

However, if it isn't, then we need a way to determine whether the answer lies to the left of the

02:22:21

middle element or to the right of it. Consider these examples. Here you can see that the

02:22:26

middle element is 3 and the answer is position 2, so in this case the answer, or the

02:22:37

smallest element, lies to the left. On the other hand, in this case you can see that the middle

02:22:42

element is 4, and the smallest element, -1, lies to the right of it. So now you need to

02:22:48

apply your mind and think of a check that will help you determine,

02:22:53

given the middle element, whether the answer lies to the left or the right of it. And we're

02:22:58

looking for the smallest element, remember. So here is the logic; if you just spend a couple of minutes,

02:23:02

you will come up with it quite easily: if the middle element of the list is smaller than the

02:23:07

last element of the list, or rather the last element of the range that we're currently

02:23:13

looking at, that means that all the numbers from the middle to the end of the range are in increasing order, so the answer lies to

02:23:20

the left of it. On the other hand, if the middle element of the list is larger than the

02:23:30

last element of the range, then, because we know that the list is a rotated sorted

02:23:36

list, the numbers must increase up to a point, then there's a drop, and then they

02:23:41

continue increasing. That's the only way in which the final element can be smaller, so that means

02:23:49

the answer lies to the right of it. So that's the logic for binary search, and now what you

02:23:54

have to do is describe the binary search solution in your own words. Here, once again, you have

02:24:00

these four or five lines, and it's very important that you do this, because if you cannot express it,

02:24:06

then coding it is also going to be difficult for you. So always do this exercise of expressing

02:24:12

the solution in your own words when you're practicing, when you're solving a coding challenge,

02:24:16

and even in an interview. There it's especially important, because the first thing you need to

02:24:21

do is communicate your thought process to the interviewer: how you're thinking about the

02:24:26

problem. So the first thing you need to do is describe a simple solution in simple words,

02:24:30

and then they may or may not ask you to code that solution. The next thing is to

02:24:34

identify the complexity and the inefficiency. Then the next step for you is to

02:24:41

describe the optimal solution, here the binary search solution, in your own words. Okay, now

02:24:48

if you don't describe the solution in your own words and just start writing the code, they may

02:24:52

not be able to follow your code. So even if you've written mostly correct code, maybe with one or two

02:24:57

edge cases wrong, they may still have a feeling that you don't know what you're writing. But if you

02:25:03

explain the solution clearly to them, they will know that you understand the solution,

02:25:08

and they will be able to follow the code as you write it, pick up mistakes or

02:25:13

errors, and help you fix them. One secret is that interviewers are always open to helping you

02:25:20

unless you make them really confused. So keep that in mind, and describe the solution in your

02:25:27

own words. Once you do that, you can commit. Now the next step is to implement the solution, that is,

02:25:34

implement the binary search solution as described in the previous step. Let's run this again.

02:25:50

Then we define the function count_rotations_binary. Now, you may want to review

02:25:58

Lesson 1 here on how to set it up. You'll see that lo starts out at 0 and hi starts out at

02:26:05

len(nums) - 1. I will not solve the rest of this, but there is a certain condition here

02:26:14

between lo and hi for the while loop. So in binary search, we start with the entire list as the range, then we

02:26:19

look at the middle: we first get the mid position, and then we look at the number

02:26:23

at the mid position. Then we check if the middle position is the answer; if the middle

02:26:29

position is the answer, we return it. Then we check if the answer lies in the left

02:26:34

half; here's the condition where you decide if the answer lies in the left half, and

02:26:39

if the condition holds true, all we do is change hi, the end point

02:26:44

of the range, to mid - 1. And then we check if the answer lies in the right half;

02:26:50

in that case, we change the starting point of the range to mid + 1, and the while loop

02:26:54

repeats.
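The lecture deliberately leaves count_rotations_binary as an exercise, so what follows is only one possible sketch, assuming distinct numbers as the problem statement allows. It follows the structure just described (lo, hi, mid; check the middle; then move hi to mid - 1 or lo to mid + 1) and uses the rule of comparing the middle element with the last element of the current range. The verbose parameter is a hypothetical stand-in for the commented-out print statement in the notebook.

```python
def count_rotations_binary(nums, verbose=False):
    """Binary search for the position of the smallest number in a rotated sorted list.

    Sketch only: assumes the numbers are distinct.
    """
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        mid_number = nums[mid]
        if verbose:                              # debugging aid: show the current range
            print('lo:', lo, ', hi:', hi, ', mid:', mid, ', mid_number:', mid_number)

        if mid > 0 and mid_number < nums[mid - 1]:
            # The only number smaller than its predecessor: this is the answer
            return mid
        elif mid_number < nums[hi]:
            # Numbers from mid to hi are increasing, so the drop lies to the left
            hi = mid - 1
        else:
            # Middle element is not smaller than the range's last element: go right
            lo = mid + 1
    # No drop found: rotated 0 (or n) times, or the list is empty
    return 0

print(count_rotations_binary([19, 25, 29, 3, 5, 6, 7, 9, 11, 14]))  # 3
print(count_rotations_binary([4, 5, 6, 7, 8, 1, 2, 3], verbose=True))  # two lo/hi/mid lines, then 5
```

The final return 0 is the edge case discussed next: it covers unrotated lists, lists rotated n times, and the empty list.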

02:26:55

Okay, so that's the general logic of binary search.

02:26:58

And one thing you have to keep in mind is: if none of the elements satisfies the criterion

02:27:03

that you have, what is the answer?

02:27:05

And this is a very important condition.

02:27:07

This is where it is very easy to go wrong; this is also called the edge case or the

02:27:14

trivial case.

02:27:15

So you should think about this carefully and handle it.

02:27:18

And then, once you've done that, you can evaluate a single

02:27:22

test case, or you can evaluate multiple test cases.

02:27:25

Now, if your test cases are failing, you may want to enable this print statement inside

02:27:30

by uncommenting it, but make sure to comment it out at the end once again.

02:27:35

And the print statement will help you see what the lo, hi and mid values were.

02:27:39

Then you may want to take up pen and paper and look at an example that is failing.

02:27:44

See if the printed numbers match what you expect to see.

02:27:49

Debugging your function is a very important skill.

02:27:53

So keep that in mind and use a debugging technique like this by adding print statements

02:28:00

and working out the same problem side by side on paper to fix your issues.

02:28:05

Otherwise you may feel lost if you are not able to look into the internal workings

02:28:09

of the function.

02:28:14

Next, you have to analyze the algorithm's complexity and identify inefficiencies.

02:28:19

This should be straightforward enough.

02:28:20

We've already looked at the complexity of binary search; all you need to do is make

02:28:24

sure that what you're doing within the algorithm matches the analysis that we've done

02:28:28

earlier.

02:28:29

So the problem size reduces by half each time and then we're doing constant work in each

02:28:34

step before solving a problem of half the size.

02:28:37

So that should roughly give you an answer.
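Written out as a recurrence, the argument above looks roughly like this, assuming the work done per step before halving the range is bounded by a constant c:

```latex
\begin{aligned}
T(n) &= T\!\left(\tfrac{n}{2}\right) + c \\
     &= T\!\left(\tfrac{n}{4}\right) + 2c \\
     &= \cdots \\
     &= T\!\left(\tfrac{n}{2^k}\right) + kc \\
     &= T(1) + c\log_2 n \qquad \text{when } k = \log_2 n \\
     &= O(\log n)
\end{aligned}
```

Compared with O(n) for linear search, this is where the speed-up comes from.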

02:28:41

And keep committing your work.

02:28:45

Now finally, to make a submission, you have two options.

02:28:47

One option is to take this link: your notebook has been committed here, and you

02:28:54

can come to the assignment page. Let's open up the assignment page, Binary Search Practice.

02:29:00

Come down here and paste this link here and click submit.

02:29:07

Now once you click submit, the assignment will be submitted and it will go into automated

02:29:14

evaluation.

02:29:15

So in about a couple of minutes, maybe up to an hour, depending on the queue of submissions

02:29:22

from different participants, you will receive a grade over email.

02:29:28

Let's just refresh the page. It seems like there was an issue here; the issue was

02:29:35

that count_rotations_binary was not defined.

02:29:38

It's possible that count_rotations_binary did not get defined because

02:29:44

there are a bunch of question marks here.

02:29:47

So we need to fix the issue and then come back and make a submission once again.

02:29:54

Okay, so I've received a fail grade; I will go back, fix the issue, and then come

02:29:59

back.

02:30:00

That's why it's very important for you to have a good set of test cases

02:30:04

to test your function, so that when you submit it, or when you get an error,

02:30:14

you can look at your function's performance on the test cases, fix anything that

02:30:21

needs to be fixed and add new test cases if you need to.

02:30:24

Now, one other way you can submit is by simply running jovian.submit, with assignment

02:30:31

set to the Python DSA Assignment 1 code.

02:30:33

The code is mentioned here. You can see here that the submission was made, and you can verify

02:30:38

your submission on the page.

02:30:41

Okay, so that's Assignment 1. So what should you do next? Review the lecture video if you

02:30:51

need to, and execute the Jupyter notebook; you may want to keep

02:30:56

the Jupyter notebook running side by side while working on the assignment. Then complete

02:31:00

the assignment, and even attempt the optional questions. If you scroll down here on the assignment

02:31:05

notebook, you will find that there are some optional questions for you.

02:31:12

Here's one bonus question: use the generic binary search algorithm. Inside the Python

02:31:17

DSA module in jovian, there is a function called binary search; you can use the generic

02:31:22

binary search example. Then here's an optional bonus question on handling repeating numbers:

02:31:28

we did say that you can assume that there are no repeating numbers in the list, but here's

02:31:36

one list with repeating numbers. Can you modify your solution to handle this special case?

02:31:41

And then here's optional bonus question 3, about searching in a rotated list:

02:31:46

you're given a rotated list.

02:31:48

Now, instead of finding the number of times it was rotated, you're trying to find the

02:31:52

position of a certain number, say, the position of 6. Can you apply binary

02:31:56

search and modify your previous solution slightly to search within the rotated list

02:32:02

and find the position of a given number? Okay, now here's a hint: you can simply identify

02:32:10

two sorted subarrays within the given array and perform a binary search on each subarray;

02:32:18

to identify the two sorted subarrays, you can use the count_rotations_binary

02:32:22

function. So that's one potential solution; another way is to modify the count_rotations_binary

02:32:27

function to solve the problem directly. It's a very interesting problem to solve.
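Following the first hint, here is one possible sketch: find the rotation point with a binary search, then run an ordinary binary search on each of the two sorted subarrays. It assumes distinct numbers. The helper names binary_search_range and search_rotated are hypothetical, count_rotations_binary is repeated so the block runs on its own, and returning -1 for a missing number is an assumption of this sketch, not something the assignment specifies.

```python
def count_rotations_binary(nums):
    """Position of the smallest number in a rotated sorted list of distinct numbers."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if mid > 0 and nums[mid] < nums[mid - 1]:
            return mid
        elif nums[mid] < nums[hi]:
            hi = mid - 1
        else:
            lo = mid + 1
    return 0

def binary_search_range(nums, target, lo, hi):
    """Classic binary search for target within the sorted slice nums[lo..hi]; -1 if absent."""
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def search_rotated(nums, target):
    """Search both sorted subarrays, nums[0..k-1] and nums[k..n-1], around the rotation point k."""
    k = count_rotations_binary(nums)
    position = binary_search_range(nums, target, 0, k - 1)            # first subarray (empty if k == 0)
    if position == -1:
        position = binary_search_range(nums, target, k, len(nums) - 1)  # second subarray
    return position

print(search_rotated([19, 25, 29, 3, 5, 6, 7, 9, 11, 14], 6))   # 5
```

Each step is a binary search over at most n elements, so the whole search stays O(log n).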

02:32:34

And if you found the assignment easy, then you should definitely solve these bonus questions,

02:32:38

and if you can solve this question by yourself without taking additional help, then you

02:32:45

can solve pretty much any problem related to binary search that may be asked in an interview,

02:32:52

because most of the questions are variations of something like this, and this is pretty much

02:32:56

the hardest problem you may get asked. You can also test your solution by making a submission

02:33:02

on LeetCode (this is only for the final optional question), and there's a thread on the

02:33:08

forum where you can discuss the bonus questions separately as well, so do make use of the forum

02:33:13

thread too. Here it is: Optional Bonus Questions Discussion. So that was Assignment 1 of

02:33:23

Data Structures and Algorithms, called Binary Search Practice. Hello and welcome to Data

02:33:31

Structures and Algorithms in Python. This is an online certification course by Jovian;

02:33:37

my name is Akash, and I am the CEO and co-founder of Jovian. You can earn a certificate of

02:33:43

accomplishment for this course by completing four weekly assignments and doing a course project.

02:33:48

Today we are on Lesson 2 of 6. Now, if you open up pythondsa.com, you'll end up on this course website,

02:33:56

where you will be able to find all the information for the course, you can view the previous lessons,

02:34:00

which is lesson 1, and you can also work on the previous assignment, which is assignment 1,

02:34:07

and you can also check out the course community forum where you can get help and have discussions.

02:34:13

So let's open up Lesson 2. This is the lesson page; here you will be able to see the video for this

02:34:21

lesson. You can watch live or you can watch a recording here, and you can also see a version of

02:34:28

this video lecture in Hindi. And in this lesson we will explore the use cases of binary search

02:34:35

trees and develop a step-by-step implementation from scratch, solving many common interview

02:34:40

questions along the way. So here is the code that we are going to use in this lesson. All the

02:34:47

different notebooks containing the code are listed here and let's open up the first one.

02:34:58

So here you can see all the explanations and the code for this lesson: binary search

02:35:03

trees, traversals, and balancing in Python. This is the second notebook in the course;

02:35:09

you can check out the first notebook in lesson 1. And if you're just joining us,

02:35:15

this is a beginner-friendly course, and you do not need a lot of background in programming:

02:35:19

with a little bit of understanding of Python and a little bit of high school mathematics,

02:35:23

you should be able to follow along just fine. If you do not know these, then you can follow

02:35:29

these tutorials to learn the prerequisites in just about an hour or two. Now the best way to learn

02:35:36

the material that we're covering in this course is to actually run the code and experiment with it

02:35:42

yourself. If we scroll down, you can see that there is

02:35:47

some code here on this page as well. Now to run the code, you have two options: you can either

02:35:53

run it using an online programming platform, or you can run it on your computer locally.

02:36:01

So to run this code, we will scroll up, click on the Run button and then click Run on Binder.

02:36:08

And this is going to start executing the code that we were just looking at. So once again,

02:36:13

you can go to the course page, pythondsa.com, open up lesson 2, and watch the video there,

02:36:21

and on lesson 2, you can open up the link to the code where you can read the code and the

02:36:27

explanations here. And if you want to run the code, just click the run button, and that will execute

02:36:33

the code for you. So once you click Run on Binder, you should be able to see an interface

02:36:42

like this. This is the Jupyter notebook interface, with the same explanations that we were seeing on the

02:36:47

lesson page. But the difference is that

02:36:52

you can now edit these explanations and you can go down and you can actually run some of the code

02:36:58

in this tutorial. You can see here that you have a run button and when you click the run button,

02:37:01

that is going to run the code in this particular cell. This is a Jupyter notebook, made up of cells.

02:37:07

Now we'll do a couple of things here. The first thing we'll do is click on Kernel and then Restart

02:37:13

& Clear Output. What this will do is clear all the outputs of the code cells so that we can

02:37:18

execute them ourselves. And then I'm just going to zoom in here and hide the interface so that we can

02:37:29

look at the explanations and the code. So finally we have some running code and in this notebook,

02:37:38

we will focus on solving this specific problem. And this is a common question. A question of

02:37:44

this sort can be asked in interviews. So this is an interview question, but along the way,

02:37:48

we will also learn how to build binary trees and binary search trees and how to apply them to several

02:37:53

other questions. So here's the question. As a senior back-end engineer at Jovian,

02:37:59

you are tasked with developing a fast in-memory data structure to manage profile information

02:38:03

(username, name and email) for 100 million users. It should allow the following

02:38:08

operations to be performed efficiently. You should be able to insert the profile information for

02:38:13

a new user, find the profile information for a user given their username, and then update the

02:38:18

profile information of a user once again given their username, and list all the users of the

02:38:23

platform sorted by username. You can assume here that usernames are unique. So this is a very

02:38:30

realistic problem that you might face if you're working at a company where you have a lot of users.

02:38:36

So let's see how we solve this problem. Now here's a systematic strategy that we'll apply

02:38:43

for solving problems, not just here, but throughout this course. The first step is to state the

02:38:48

problem clearly and in abstract terms, and then identify the input and output formats.

02:38:54

Then come up with some example inputs and outputs to test any future implementations

02:38:59

and try to cover all the edge cases. Then come up with a simple correct solution for the problem.

02:39:06

It doesn't have to be efficient, it just has to be correct and stated in plain English.

02:39:10

And then implement the solution and test it using some example inputs. Fix bugs if you face any.

02:39:17

And finally, analyze the algorithm's complexity and identify inefficiencies, if any.

02:39:24

Once you identify inefficiencies, we apply the right technique, and that's where

02:39:28

data structures and algorithms come into the picture. So we apply the right technique to overcome

02:39:33

the inefficiencies and then we go back to step three. So come up with a new correct solution,

02:39:38

which is also efficient, state it in plain English, implement it and then analyze the complexity.

02:39:44

Now if you follow this process, you should be able to solve any programming problem or interview

02:39:48

question. So step one, we state the problem clearly and we identify the input and output formats.

02:39:56

Now we can reduce the problem to a very simple, single-line statement. We need to create a

02:40:01

data structure which can efficiently store 100 million records and we should be able to perform

02:40:08

insertion, search, update and list operations, all of them as efficient as possible.

02:40:16

Now the input, the key input to our data structure, the solution that we are building is going to

02:40:22

be user profiles, each containing the username, name and email of a user. Now before we come up with a

02:40:31

solution, we need a way to represent user profiles and a Python class would be a great way to

02:40:35

represent the information for a user. You may have heard of the term object-oriented programming,

02:40:40

and that is what we're going to look at now. If you're not familiar with classes, it's very

02:40:46

simple. A class is simply a blueprint for creating objects. And what's an object? Well,

02:40:51

everything in Python is an object whether you're looking at a number, a dictionary, a list,

02:40:55

anything and you can create your own custom objects with custom properties and custom methods

02:41:01

by creating your own custom classes. So here's the simplest possible class in Python with nothing

02:41:07

inside it. We're creating a class called User. So this is how you declare a class, and then we're putting

02:41:12

nothing inside it. Whenever you put nothing inside a function, a class or any other block,

02:41:17

you need to put the pass statement, because Python cannot accept empty blocks of code. So here

02:41:23

we're creating a class which does not have anything inside it, and we can create an object of it

02:41:30

(this is often called instantiation, which means creating an instance of a class). We instantiate an object

02:41:36

of the class by calling it like a function: we write user1 = User(). This creates an object,

02:41:44

and the variable user1 points to that object. Now we can verify that the object is of the

02:41:49

class User by simply printing it or by checking its type. Both user1 and type(user1) show

02:41:56

User. Now the object user1 does not contain any useful information. So let's add what's called

02:42:03

a constructor method. A constructor method is used to construct an object and store some

02:42:10

attributes and properties. So now we're defining the class User once again, but inside it we're

02:42:15

defining this function and you can see that this function is inside the class because there is some

02:42:20

indentation here. So we define this function, __init__, and it takes four arguments.
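Before the constructor, here is a minimal sketch of the empty class and its instantiation described above. The names `User` and `user1` follow the narration; the printed memory address differs on every run, so it is only indicated loosely in the comments.

```python
class User:
    # An empty class body still needs `pass`: Python does not accept empty blocks
    pass


# Instantiate an object by calling the class like a function
user1 = User()

print(user1)                    # something like <__main__.User object at 0x...>
print(type(user1))              # <class '__main__.User'> when run as a script
print(isinstance(user1, User))  # True
```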

02:42:26

Now the first argument is a special argument called self, which we'll talk about shortly, and then we have

02:42:32

three arguments: username, name and email. Inside __init__, we're

02:42:38

setting self.username, that is, a property called username on self. We're setting a property

02:42:43

called name on self, and a property called email on self. And finally, we're printing "User

02:42:48

created". Now let's create another user, user2, and you can see that user2 is also

02:42:56

an object of the class User. Now here's what happens conceptually when we do this. The first

02:43:02

thing that happens, when you invoke User as a function,

02:43:07

is that Python creates an empty object of the class User, which ends up stored in the variable

02:43:14

user2. Then Python invokes the __init__ function, and to __init__ it passes user2, the

02:43:22

object that was just created, as self, and the other arguments that were passed while creating the

02:43:28

object as the rest of the arguments. So you can imagine that

02:43:33

we're basically calling User.__init__ with user2 (an empty

02:43:42

object) and the arguments john, John Doe and john@doe.com. Then inside the __init__ function,

02:43:49

we simply set these properties on user2. So now user2.username is john,

02:43:55

user2.name is John Doe, and user2.email is john@doe.com. So that's basically

02:44:01

how classes work in Python, and that's why you always have this extra argument, self, in all

02:44:08

class methods: it refers to the object that finally gets created. So once user2 is created

02:44:19

with the values john, John Doe and john@doe.com, you can check that user2.name is John Doe,

02:44:27

user2.email is john@doe.com, and user2.username is john. Now you can also define

02:44:36

some custom methods inside a class. Obviously we had the __init__ method, but here we are also

02:44:42

defining another method called introduce_yourself. introduce_yourself again takes two arguments:

02:44:48

the first argument is self, which will refer to the actual object that gets created later,

02:44:54

and then we have guest_name, and we basically say "Hi [guest name], I am [name],

02:45:00

contact me at [email]". These blanks are filled in using guest_name,

02:45:04

self.name and self.email. So that's how you define a method in a class. Here we have

02:45:12

another user we're creating: jane, Jane Doe, jane@doe.com. And you can see here that when

02:45:18

we call introduce_yourself with David, user3, which is Jane, becomes self, and David

02:45:26

becomes guest_name, and that's why we get "Hi David, I am Jane Doe, contact me at jane@doe.com".
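The constructor and custom method walked through above can be sketched as follows. The sample values (john / John Doe / john@doe.com and jane / Jane Doe / jane@doe.com) are reconstructed from the narration, and the exact greeting wording is an assumption. The method also returns its message so it can be checked. The `User.__new__` plus `User.__init__` lines only illustrate the conceptual two-step process; you would not normally write them by hand.

```python
class User:
    def __init__(self, username, name, email):
        # `self` is the newly created (initially empty) object; store attributes on it
        self.username = username
        self.name = name
        self.email = email
        print('User created!')

    def introduce_yourself(self, guest_name):
        # `self` is the object the method was called on
        message = "Hi {}, I'm {}! Contact me at {}.".format(guest_name, self.name, self.email)
        print(message)
        return message  # also returned so callers can inspect it


user2 = User('john', 'John Doe', 'john@doe.com')

# Conceptually: create an empty object, then initialize it with __init__
user2_manual = User.__new__(User)
User.__init__(user2_manual, 'john', 'John Doe', 'john@doe.com')

user3 = User('jane', 'Jane Doe', 'jane@doe.com')
user3.introduce_yourself('David')        # Hi David, I'm Jane Doe! Contact me at jane@doe.com.
User.introduce_yourself(user3, 'David')  # the same call with `self` passed explicitly
```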

02:45:34

So that's a quick refresher on classes in Python. Now there's a lot more to classes,

02:45:38

but the main things you need to know are how to define a class, how to create a constructor

02:45:44

(which is __init__), how to set some properties, like we set the properties

02:45:49

name, email and username, and finally how to define methods, like we defined the method

02:45:56

introduce_yourself. That's all we will need today; we won't need much more than that.

02:46:06

And one final thing that we're doing with our class is we're defining two other special

02:46:12

functions: __repr__ and __str__.
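A minimal sketch of adding `__repr__` and `__str__` to the class. The exact format string is an assumption, not necessarily the one in the lesson's notebook, and the values for `user4` are illustrative. The comments note which built-ins use each method; here `__str__` simply reuses `__repr__`.

```python
class User:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email

    def __repr__(self):
        # Unambiguous form: used by repr(), the REPL, and when the object sits inside a list
        return "User(username='{}', name='{}', email='{}')".format(self.username, self.name, self.email)

    def __str__(self):
        # Readable form: used by print() and str(); here it reuses __repr__
        return self.__repr__()


user4 = User('jane', 'Jane Doe', 'jane@doe.com')
print(user4)    # User(username='jane', name='Jane Doe', email='jane@doe.com')
print([user4])  # [User(username='jane', name='Jane Doe', email='jane@doe.com')]
```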

02:46:22

These two functions are used to create a string representation

02:46:30

of the object. You can see here that once we create an object, user4, and try to

02:46:35

print user4, you can see that user4 is now printed like this. User3,

02:46:41

on the other hand, was printed just as a User object, but with user4,

02:46:46

we have all this information printed here as well. So now here's an exercise for you,

02:46:52

which also brings us to the first quiz of the day. Now we are going to do three quizzes

02:46:59

in this video, and you can answer these quizzes on LinkedIn. Go to our LinkedIn profile,

02:47:06

and if you check the posts, you will see a new post here, which will give you a question. The

02:47:12

question is: what is the purpose of defining the functions __str__ and __repr__ within a class?

02:47:18

And how are these two functions different? Leave a comment with your answer, and we will pick

02:47:25

one right answer, and one lucky winner will get a swag pack from us.

02:47:30

So that was the input. We now have a way to represent users by creating classes.

02:47:38

And the final output that we want to create for our problem is a data structure.

02:47:45

A data structure is, once again, something that we can define using a class, so

02:47:50

we can expect our final output to be a class called UserDatabase, which has four methods:

02:47:56

insert, find, update and list_all. insert takes a user and inserts it into the database,

02:48:03

find takes a username and returns the user, update takes a user

02:48:09

and updates the data for that user, and finally list_all returns a list of all the users. So this is

02:48:14

what the class will look like. We have not implemented it yet, but we now have an interface.
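The interface described above can be written down as a class skeleton before any logic exists. This is a sketch: the method names follow the narration, and the bodies are deliberately placeholders.

```python
class UserDatabase:
    """Interface only: each method body is a placeholder to be implemented later."""

    def insert(self, user):
        # Insert a new user profile into the database
        pass

    def find(self, username):
        # Return the user profile with the given username
        pass

    def update(self, user):
        # Replace the stored details for user.username with the new ones
        pass

    def list_all(self):
        # Return all user profiles, sorted by username
        pass
```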

02:48:22

So now the next step is to come up with some example inputs and outputs.

02:48:26

So let's create some sample user profiles that we can use to test our functions once we implement them.

02:48:32

So we're going to create these seven user profiles and you can see that we're creating these seven

02:48:37

user profiles with a username, name, and an email and storing them in these variables.

02:48:42

We're using the User class that we defined earlier, and we're also going to store the list of users

02:48:51

in this variable called users. And as you can see, we can access different

02:48:56

fields within a user profile using dot notation. You can check that biraj.username is biraj,

02:49:02

biraj.email is biraj@example.com and biraj.name is Biraj Das.

02:49:10

Now you can also view a string representation of the user as we have seen. So if we print the user,

02:49:14

you can see some information about the user and here is the full list of users that we have created.
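A sketch of the sample profiles described above. The usernames follow those mentioned in the lesson, and Biraj Das's details follow the narration. The other full names and the example.com addresses are placeholders chosen for illustration. No `__str__` is defined, so `print()` falls back to `__repr__`.

```python
class User:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email

    def __repr__(self):
        return "User(username='{}', name='{}', email='{}')".format(self.username, self.name, self.email)


# Illustrative sample profiles (most names are placeholders)
aakash = User('aakash', 'Aakash Rai', 'aakash@example.com')
biraj = User('biraj', 'Biraj Das', 'biraj@example.com')
hemanth = User('hemanth', 'Hemanth Jain', 'hemanth@example.com')
jadhesh = User('jadhesh', 'Jadhesh Verma', 'jadhesh@example.com')
siddhant = User('siddhant', 'Siddhant Sinha', 'siddhant@example.com')
sonaksh = User('sonaksh', 'Sonaksh Kumar', 'sonaksh@example.com')
vishal = User('vishal', 'Vishal Goel', 'vishal@example.com')

users = [aakash, biraj, hemanth, jadhesh, siddhant, sonaksh, vishal]

# Fields are accessed with dot notation
print(biraj.username, biraj.email, biraj.name)  # biraj biraj@example.com Biraj Das
print(users[0])  # User(username='aakash', name='Aakash Rai', email='aakash@example.com')
```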

02:49:19

So it's always a good idea to set up some test inputs that you can

02:49:24

use to test your implementation later on. Since we haven't implemented our data structure yet,

02:49:31

it's not possible to list any sample outputs, but you can try to come up with some

02:49:36

different scenarios to test any future implementations. So let's list some scenarios

02:49:41

for testing the methods of our UserDatabase class. The methods are insert, find, update

02:49:48

and list_all. For insert, you may want to test inserting a user into an

02:49:54

empty database of users. That's what's called an edge case. Then the general case is to insert

02:50:00

a user into the database, assuming that the user does not already exist. Another edge case is trying

02:50:07

to insert a user with a username that already exists. So these are all the different ways in which

02:50:12

we can use the insert function, and there can be some more. So here's an exercise: try

02:50:17

coming up with all the different scenarios in which you would like to test the different functions

02:50:21

insert, find, update and list_all. So that completes step two. Now we have some sample inputs and

02:50:30

some scenarios in which we're going to test our functions. So the next step

02:50:39

is to come up with a simple correct solution and then state it in plain English. Now here

02:50:44

is a simple and easy solution to the problem. We simply store the user objects in a list sorted

02:50:51

by username. That's simple enough, so suppose we do that. Inside our data structure we have

02:50:59

a list which simply contains a bunch of user objects. Then the various functions can be implemented

02:51:06

like this. The insert function simply requires looping through the

02:51:12

list and adding the new user at a position that keeps the list sorted. For instance, if

02:51:18

you already have the users aakash, hemanth and so on, and you're inserting the user

02:51:25

biraj, then you can tell that biraj should go between aakash and hemanth in alphabetical order.

02:51:32

So that's how you insert a new user and maintain the sorted property of the list.

02:51:37

Then to find the user we simply loop through the list and then find the user object

02:51:42

with the username matching the query. For instance, if you're looking for

02:51:46

hemanth, you start from the beginning, go through aakash and biraj, and finally

02:51:50

hit hemanth, and then you can retrieve the user object associated with hemanth.

02:51:56

And then you have update. Now updating is very simple as well. It's similar to find. So you find

02:52:01

the user object matching the query and then update the details of that user object.

02:52:06

And finally, because our internal representation is already a list of user objects sorted by

02:52:12

username, we can simply return that list when we want to list the users.

02:52:20

So that's our plain English description and it's always a good idea to describe your solution

02:52:25

in plain English so that you can clarify any doubts you have. Even during interviews it's a good

02:52:33

idea to have a conversation with the interviewer before you actually implement the solution.

02:52:42

Now, one fact that we can use is that usernames, which are strings, can be compared using

02:52:50

the less-than, greater-than and equal-to operators. So we can compare strings just like numbers in

02:52:56

Python. So that'll make it easy for us to implement these functions. And that brings us to the

02:53:02

implementation and the code for implementing these is also fairly straightforward. So now we have

02:53:07

the UserDatabase class. We are actually implementing this class, and here you see that we have

02:53:12

a constructor and the constructor does not take any additional arguments apart from self.

02:53:17

All we do is set a property called users on self, and self.users is set to an

02:53:25

empty list. Then we come to insertion. Assume that we already have some users in the database.

02:53:33

So we start out with a pointer set to zero and we go through all the valid positions in the

02:53:39

users list, which is from zero to n minus one if there are n users. Then we find the first

02:53:45

username greater than the new user's username. For instance, if you're inserting

02:53:50

hemanth, you go through aakash and biraj, and then finally you realize that the next value is

02:53:55

probably siddhant. So you want to insert hemanth before siddhant. Right, so you want the first

02:54:01

username that's greater than the new user's username. You check this property, and as soon

02:54:07

as you find that the next user is greater than the user that needs to be inserted,

02:54:15

you break out and insert the new user at that position. So that's insertion; you can

02:54:22

see it's just four or five lines of code. You can work through this code: try to read it

02:54:26

line by line and see how it works. Similarly, you have the find function, the update

02:54:33

function and the list function they're all pretty straightforward. There's really not much here.

02:54:37

This is the simple, brute-force implementation,

02:54:42

so here's an exercise for you: go through each of these functions and try them out.

02:54:46

Use the interactive nature of Jupyter to experiment, and add print statements inside the

02:54:55

functions and loops if you need more visibility into what's happening.

02:55:00

Okay. Now let's test this implementation.
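Here is a sketch of the brute-force `UserDatabase` following the plain-English description: a list kept sorted by username, with linear scans for insert, find and update. Two details are assumptions and may differ from the lesson's notebook. `find` returns `None` for a missing username, and `update` silently does nothing in that case. Duplicate usernames are not rejected, which is one of the edge cases listed earlier.

```python
class User:
    def __init__(self, username, name, email):
        self.username = username
        self.name = name
        self.email = email

    def __repr__(self):
        return "User(username='{}', name='{}', email='{}')".format(self.username, self.name, self.email)


class UserDatabase:
    """Brute-force store: a plain list of User objects kept sorted by username."""

    def __init__(self):
        self.users = []

    def insert(self, user):
        i = 0
        # Walk forward until we hit the first username greater than the new one
        while i < len(self.users):
            if self.users[i].username > user.username:
                break
            i += 1
        # Inserting at that index keeps the list sorted (duplicates are NOT rejected here)
        self.users.insert(i, user)

    def find(self, username):
        # Linear scan; returns None if the username is absent
        for user in self.users:
            if user.username == username:
                return user
        return None

    def update(self, user):
        # Locate the stored profile with the same username and overwrite its details
        target = self.find(user.username)
        if target is not None:
            target.name, target.email = user.name, user.email

    def list_all(self):
        # The internal list is already sorted, so it is returned as-is
        return self.users
```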

02:55:06

The first thing we do is instantiate a new database of users using the UserDatabase class.

02:55:13

So here we call UserDatabase(), and that gives us a database of users.

02:55:24

And now let's insert some entries into this database. So we can now insert for instance,

02:55:30

we can insert the values hemanth, aakash and siddhant. So here we have inserted three values into the

02:55:37

database. And now we can retrieve the data for a given user given their username using the

02:55:42

find method. We call database.find('siddhant'), which returns a user, and we can check the value

02:55:50

of user. You can see that we have now retrieved the data for siddhant: the username

02:55:54

siddhant, along with the name and the email, siddhant@example.com. Now let's try changing the information

02:56:03

for a user. To change the information, we can call database.update and simply

02:56:08

pass in a new user object. Let's say we want to change the name from Siddhant

02:56:13

to Siddhant U. This is how we do it: we call database.update.

02:56:22

And now if we call database.find once again,

02:56:28

we get back a user object and this time with the updated information. So we have created the

02:56:35

database, we have inserted some values into it and then we have retrieved values out of it and we

02:56:39

also updated them. And finally we can retrieve a list of the users in alphabetical order.

02:56:47

So now if we list them out, you can see here that we have the username aakash, we have the username

02:56:53

hemanth, and we have siddhant. These are the three values that were inserted, and they are all

02:56:57

in alphabetical order of username. Now if we insert a new user, say biraj,

02:57:04

we can check that biraj is inserted into the right position.
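The walkthrough above can be reproduced end to end with this sketch. It swaps in a `dataclass` for the hand-written `User` class, because a dataclass generates `__init__`, `__repr__` and `__eq__` automatically. The profile values are illustrative, reconstructed from the narration. The list-based database mirrors the brute-force design described in the lesson.

```python
from dataclasses import dataclass


@dataclass
class User:
    # A dataclass generates __init__, __repr__ and __eq__ from these fields
    username: str
    name: str
    email: str


class UserDatabase:
    def __init__(self):
        self.users = []  # kept sorted by username

    def insert(self, user):
        i = 0
        # Skip every stored username that is <= the new one
        while i < len(self.users) and self.users[i].username <= user.username:
            i += 1
        self.users.insert(i, user)

    def find(self, username):
        for user in self.users:
            if user.username == username:
                return user
        return None

    def update(self, user):
        target = self.find(user.username)
        if target is not None:
            target.name, target.email = user.name, user.email

    def list_all(self):
        return self.users


database = UserDatabase()
database.insert(User('hemanth', 'Hemanth', 'hemanth@example.com'))
database.insert(User('aakash', 'Aakash', 'aakash@example.com'))
database.insert(User('siddhant', 'Siddhant', 'siddhant@example.com'))

print(database.find('siddhant'))
database.update(User('siddhant', 'Siddhant U', 'siddhantu@example.com'))
print(database.find('siddhant'))  # now shows the updated name and email

database.insert(User('biraj', 'Biraj', 'biraj@example.com'))
print([u.username for u in database.list_all()])  # ['aakash', 'biraj', 'hemanth', 'siddhant']
```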

02:57:11

Okay, so that's how we use the data structure that we just created and you can use the empty

02:57:17

cells here to try out the various scenarios when you run the notebook. So just to recap,

02:57:25

we created a simple class inside which we are storing a list of users in sorted order of

02:57:30

username. Insertion is pretty easy: we simply loop through, find the right position and insert

02:57:36

any new values. Finding values is very easy as well: we simply loop through and keep comparing,

02:57:43

and updating values is simply a matter of finding them and then updating that specific value.

02:57:47

And listing is simple because we can simply return the internal list representation that we're

02:57:53

already storing in the sorted order of username. So that's the simplest solution or one of the

02:57:59

simplest solutions; there may be even simpler ones. So the next step now is to

02:58:06

analyze the algorithm's complexity and identify any inefficiencies. Typically, in an interview

02:58:13

setting, you may not want to implement the simplest solution, so you can actually skip step 4.

02:58:18

Once you've described the simplest solution in plain English, which was step 3,

02:58:24

you can directly jump to analyzing its complexity and then move on to optimization and

02:58:29

implementing the optimized version. But when you're practicing or when you're learning, it's always

02:58:34

a good idea to implement even the brute-force solutions. So let's analyze the complexity. The

02:58:41

operations insert, find and update involve iterating over the list of users, and in the worst case

02:58:46

they may take up to n iterations to return a result, where n is the total number of users.

02:58:53

Now list_all is slightly different, because it simply returns an existing list.

02:59:00

So list_all does not take linear time; it takes constant time.

02:59:05

Based on this information, it's easy to work out the time complexities of the various

02:59:10

operations. insert, find and update have O(n) worst-case time complexity, which means they can

02:59:18

take up to n iterations. However, list_all has O(1) complexity, which means
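To see the linear worst case concretely, here is a small sketch that counts comparisons in a linear scan. Searching for a missing username forces the loop to check every entry. The generated usernames are hypothetical.

```python
def linear_find(usernames, target):
    """Return (index or None, number of comparisons) for a linear scan."""
    comparisons = 0
    for i, name in enumerate(usernames):
        comparisons += 1
        if name == target:
            return i, comparisons
    return None, comparisons


for n in (10, 1_000, 100_000):
    names = [f'user{i:06d}' for i in range(n)]
    _, steps = linear_find(names, 'zzz')  # absent key: the worst case, n comparisons
    print(n, steps)
# 10 10
# 1000 1000
# 100000 100000
```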

02:59:25

irrespective of how many users you have in your database, it returns the list in the same amount of time.

02:59:33

Now if you want to display the list or if you want to iterate over the list, that may take you

02:59:37

additional effort. But getting the list itself is a constant-time operation.

02:59:42

So that was the time complexity. An exercise for you is to verify that the space complexity

02:59:49

of each operation is O(1). If you're wondering what we mean by complexity, you can

02:59:55

go back and watch lesson 1, where we talk about the analysis of algorithms, complexity

03:00:01

and Big O notation. What we're calling order of n, Big O notation: all of these are

03:00:07

explained in a lot more detail there, so you can go back to lesson 1 and check it out.

03:00:12

Now we've created a simple solution and our first question might be to wonder if this is good enough.

03:00:18

And to get a sense of how long each function might take if there are 100 million

03:00:23

users on the platform, let's create a for loop.

03:00:30

Let's count the zeros: 1, 2, 3, 4, 5, 6, 7, 8. So let's run it for

03:00:38

100 million numbers. Here we are creating a range of 100 million numbers.

03:00:43

And we're running a for loop which iterates over the entire range. And we're simply

03:00:48

performing a simple operation whose result we're not really using: we're just multiplying the number by itself

03:00:53

to simulate what might happen if we have a database of 100 million users and we're trying to

03:01:00

find a user in the worst-case scenario. Let's run this, and you can already

03:01:07

see that it is taking a while. For 100 million users, the loop takes about 10 seconds to complete.
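A scaled-down version of this experiment is sketched below. It uses `time.perf_counter` and times 1 million iterations instead of 100 million so it finishes quickly. The 100x extrapolation assumes the cost grows linearly with n. Actual timings depend on your hardware and Python version.

```python
import time


def time_loop(n):
    """Time a loop doing one trivial multiplication per iteration."""
    start = time.perf_counter()
    for i in range(n):
        _ = i * i  # stand-in for comparing one stored username
    return time.perf_counter() - start


n = 1_000_000  # scaled down from 100 million
elapsed = time_loop(n)
print(f'{n:,} iterations: {elapsed:.3f} s')
# Assuming linear growth, 100 million iterations take about 100x longer
print(f'Estimated for 100 million: {elapsed * 100:.1f} s')
```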

03:01:15

Here it took about 9.45 seconds. A 10-second delay for fetching user profiles will definitely lead to

03:01:22

a suboptimal user experience, and that may cause users to stop using the platform altogether.

03:01:27

Now imagine you came to jovian.ai and it took 10 or 15 seconds to load your profile and then

03:01:34

maybe even longer to load the other information and display it. You would not be happy with

03:01:38

the experience. A 10-second processing time for each profile

03:01:44

request will also significantly limit the number of users that can access the platform at a time.

03:01:50

If you're running the back-end server on one computer which has 8 cores, then each core

03:01:57

will be busy for 10 seconds each time a user tries to access the platform. So you can only

03:02:01

serve about 8 users every 10 seconds. That's pretty bad, and it could significantly limit

03:02:10

the number of users. You will have a significant outage if a lot of users come to the platform.
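The throughput arithmetic above works out as follows. This back-of-the-envelope sketch assumes each core handles one request at a time and that nothing else adds overhead.

```python
cores = 8
seconds_per_request = 10  # worst-case linear scan over 100 million users, per the measurement above

# Each core serves one request at a time, so throughput = cores / seconds per request
requests_per_second = cores / seconds_per_request
requests_per_minute = cores * 60 / seconds_per_request

print(requests_per_second)  # 0.8
print(requests_per_minute)  # 48.0
```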

03:02:15

Alternatively, you may have to expand the cloud infrastructure with more servers,

03:02:19

bigger hardware, more cores, more RAM. And that could increase the cloud infrastructure cost for

03:02:26

your company by millions of dollars. So as a senior back-end engineer, you must come up with a

03:02:31

more efficient data structure. And this is why choosing the right data structure for the

03:02:35

requirements at hand is a very important skill. Now we can clearly see that using a sorted list of

03:02:43

users may not be the best data structure to organize profile information. So let's see what

03:02:48

better we can do here. Before we do that, let's save our work. Remember that

03:02:56

we are running this notebook on an online platform called Binder, and Binder can shut

03:03:02

down at any moment because it is a free service. So what you want to do is run pip install

03:03:09

jovian, then import the jovian library, and then run jovian.commit().

03:03:19

When you run jovian.commit(), it captures a snapshot of your Jupyter

03:03:25

notebook, whether you're running it on binder or you're running it on your own local computer and

03:03:31

it saves a snapshot of this Jupyter notebook on your Jovian profile. So here you can see now on my

03:03:37

Jovian profile I have this notebook, and I can go back to my profile and view the other

03:03:46

notebooks that I've created in the past. So your Jovian profile becomes a collection of all the Jupyter

03:03:51

notebooks that you're working on. It takes just a couple of lines: import jovian

03:03:56

and run jovian.commit(). So always run jovian.commit() inside your notebooks. If you want to resume

03:04:02

any work that you were doing, all you need to do is click the Run button and then click

03:04:06

Run on Binder once again, and you can continue executing the code within the Jupyter notebook.

03:04:13

Remember that Binder is a free service, so it will shut down after about 10 minutes of

03:04:19

inactivity (for instance, if your computer goes to sleep or you switch tabs), so keep running jovian.commit()

03:04:24

from time to time. So now we have a simple implementation, and we've analyzed it and determined

03:04:30

that it is inefficient. Now we need to apply the right technique to overcome

03:04:35

the inefficiency. And we can limit the number of iterations required for common operations like

03:04:41

find, insert and update by ditching the linear structure that we had earlier and organizing our

03:04:48

data in a more tree-like structure. This is the structure that we'll use for our data, and we will

03:04:53

call this a binary tree. Now this is called a tree because it vaguely resembles an inverted tree

03:05:00

trunk with branches. You can think of this as the root, and then you can

03:05:06

see each of these are like branches and then there are nodes where branches then split into multiple

03:05:12

branches. So these are called nodes and finally at the end you will have individual nodes which

03:05:18

do not have any more branches, and those are called leaves. So these are the terms I used.

03:05:24

The tree represents the entire structure. The top node is called the root, and each element in the

03:05:30

tree is called a node. The bottom-most nodes, which do not

03:05:36

have any subtrees (what are called children), are called

03:05:41

leaves. Here the root node has two children, and each node can have 0, 1 or 2 children.
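A minimal sketch of a plain binary tree node, with no ordering requirement yet, plus a recursive count of leaves (nodes with no children). The class name `TreeNode` and the numeric keys are illustrative. In the example tree, one node has a single child, showing that a node may have 0, 1 or 2 children.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None   # left child, or None
        self.right = None  # right child, or None


def count_leaves(node):
    """A leaf is a node with no children; an empty tree has no leaves."""
    if node is None:
        return 0
    if node.left is None and node.right is None:
        return 1
    return count_leaves(node.left) + count_leaves(node.right)


#        2
#       / \
#      3   5
#     /   / \
#    1   3   7
root = TreeNode(2)
root.left = TreeNode(3)        # this node has only one child
root.left.left = TreeNode(1)
root.right = TreeNode(5)
root.right.left = TreeNode(3)
root.right.right = TreeNode(7)

print(count_leaves(root))  # 3
```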

03:05:48

It's not necessary to have exactly two children; having at most two children is what defines

03:05:54

a binary tree. But the binary tree that we need will have some

03:06:04

additional properties, which is what will make it efficient for our purposes.

03:06:10

One thing you can observe here is that the root node also seems to be the central value

03:06:16

if you sort the keys in increasing order. So what you will notice is on the left we have

03:06:24

keys which have which lie before jadish and on the right we have keys which lie after jadish.

03:06:29

So that's one thing and that is actually the second property listed here that the left

03:06:35

subtree of any node consists only of nodes which have keys that are lexical graphically smaller

03:06:44

than the nodes key. So the key for this node is barrage and that is lexical graphically smaller

03:06:48

than jadish and similarly himantanakash are all smaller than jadish and then this property

03:06:53

holds at every node. So at every node if you check sonach you can see that siddhanth is

03:06:58

less than sonach and vishal which comes to the right is a more than sonach and then siddhanth

03:07:03

vishal all three are greater than jadish. So when a binary tree satisfies this property it is

03:07:11

called a binary search tree. So that's what we are looking at here this is a binary search tree.

03:07:17

So that's one property. The second property we need is that our nodes will have both

03:07:22

keys and values. Sometimes you can create binary trees with just keys, where each

03:07:28

node holds a single number or a string, and you can call it the key or value or

03:07:33

element or whatever you wish. But what we want is for the keys to be usernames, so that we can

03:07:39

compare them easily. Along with each key, we also want to associate a value, which is the actual

03:07:46

user object. So if we are looking for Hemanth, let's say we start at the root node. We see that Jadhesh

03:07:52

is the root node, and since it is a binary search tree, we know that Hemanth lies to the left,

03:07:57

so we reach Biraj. We know that Hemanth will lie to the right of Biraj, so we go right,

03:08:02

we reach Hemanth, and then we access the value stored at Hemanth, which is the user details for Hemanth.
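The lookup walk just described can be sketched in Python. This is a minimal sketch, assuming a hypothetical `BSTNode` class that stores a username as `key` and a dictionary of user details as `value`; the tree is built by hand to roughly mirror the diagram (the exact layout is an assumption), and the user details are placeholders.

```python
class BSTNode:
    """Hypothetical node: a username key plus an associated value."""

    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


def find(node, key):
    """Follow a single path down from the root, comparing keys at each step."""
    while node is not None:
        if key == node.key:
            return node.value
        # Smaller keys live in the left subtree, larger keys in the right.
        node = node.left if key < node.key else node.right
    return None  # fell off the tree: the key is not present


# Hand-built tree, assumed to mirror the diagram: jadhesh at the root.
root = BSTNode("jadhesh", {"name": "Jadhesh"})
root.left = BSTNode("biraj", {"name": "Biraj"})
root.left.left = BSTNode("aakash", {"name": "Aakash"})
root.left.right = BSTNode("hemanth", {"name": "Hemanth"})
root.right = BSTNode("sonaksh", {"name": "Sonaksh"})
root.right.left = BSTNode("siddhant", {"name": "Siddhant"})
root.right.right = BSTNode("vishal", {"name": "Vishal"})

print(find(root, "hemanth"))  # {'name': 'Hemanth'}
```

Looking up `hemanth` touches only `jadhesh`, `biraj` and `hemanth`, three of the seven nodes, because each comparison discards one whole subtree.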

03:08:08

So we need both keys and values in the binary tree, and this is what is called a tree map, or simply a map,

03:08:16

in many languages. And finally, the tree, the data structure that we will

03:08:24

create, will be balanced. Here each node has two children, left and

03:08:30

right, but it is also possible to have an unbalanced tree where nodes have only one child,

03:08:35

maybe on just one of the sides. We will require the tree to be balanced, which means that it does not

03:08:41

skew too heavily in one direction. We will talk about what balancing means, and we will talk

03:08:47

about how to check if a tree is balanced and how to keep a tree balanced. We will go over all

03:08:51

of these things step by step, but these are some of the properties that we want our final data

03:08:56

structure to have. One important property of a binary tree is its height.

03:09:08

If you start counting levels, you can say this is level 0, where you have one node, and this

03:09:12

is level 1, where you have two nodes,

03:09:18

the left and right child of the root node. Then this is level 2, where you have 4 nodes:

03:09:25

the left and right child of the first node on level 1, and the left and right child of the

03:09:30

second node on level 1. So you can see that the number of nodes on each level of a perfectly balanced binary

03:09:37

tree is double the number of nodes on the previous level. So if you have a tree

03:09:46

of height k, which means a tree that has exactly k levels, then here is the

03:09:51

number of nodes at each level. Level 0 will have one node, the root node. Level 1

03:09:56

will have two nodes, its children. Level 2 will have four nodes, their children, and

03:10:01

four is two times two, or 2 to the power 2. Level 3 will have eight nodes,

03:10:07

two nodes for each of these four nodes, so that's 2 to the power 3. Similarly,

03:10:11

if you keep going down to level k minus 1, the final level will have 2 to the power

03:10:15

of k minus 1 nodes. So if the total number of nodes in the tree is n, then it follows

03:10:21

that n = 1 + 2 + 2^2 + 2^3 + ... + 2^(k-1). So what we're

03:10:31

trying to determine here is the relationship between the height of the tree and

03:10:34

the total number of nodes in the tree. This is the relationship, and we can simplify

03:10:38

it a bit. If we add 1 to each side, on the left side we get n + 1, and

03:10:43

on the right side we get 1 + 1, which simplifies to 2, or 2 to the power 1. Then we

03:10:48

can add 2 to the power 1 and 2 to the power 1, and that simplifies to 2 to the

03:10:51

power 2. Then we can add 2 to the power 2 and 2 to the power 2, and that

03:10:55

simplifies to 2 to the power 3. We can keep performing this reduction, adding

03:11:00

these terms together, until we finally end with 2 to the power (k minus 1) plus 2 to the power

03:11:04

(k minus 1), which is simply 2 to the power k. So n + 1 = 2^k, which tells us that k, the

03:11:10

height of the tree, is log2(n + 1), and for any n of at least 1 that is no more

03:11:17

than log2(n) + 1. So, up to that small approximation, the height

03:11:23

of the tree is at most log n + 1. So to store n records, we require a balanced binary

03:11:30

search tree of height no larger than log n + 1. This is a very useful property

03:11:36

in combination with the fact that the nodes are arranged in a way that makes it easy to find

03:11:40

a specific key simply by following a path down from the root: the binary search tree property.
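The doubling argument above, written out as equations. This assumes a perfectly balanced tree in which every level is full, which is what makes the node count exactly 2^k - 1:

```latex
\begin{aligned}
n &= 1 + 2 + 2^2 + \cdots + 2^{k-1} \\
n + 1 &= (1 + 1) + 2 + 2^2 + \cdots + 2^{k-1} = 2^1 + 2^1 + 2^2 + \cdots + 2^{k-1} \\
      &= 2^2 + 2^2 + \cdots + 2^{k-1} = \cdots = 2^{k-1} + 2^{k-1} = 2^k \\
k &= \log_2(n + 1) \le \log_2(2n) = \log_2 n + 1 \qquad (n \ge 1)
\end{aligned}
```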

03:11:46

And we'll see by the end of this lesson that the insert, find and update operations

03:11:54

in a balanced binary search tree have complexity O(log n). In our original

03:11:58

brute-force implementation, they were O(n), and this time we reduce the complexity

03:12:03

to O(log n), which is far better, and we'll see how that happens. So that's a quick introduction

03:12:12

to binary search trees. We've had enough theory; now let's get into some implementation.
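A quick numerical check of the same relationship. The helper names are hypothetical, and `min_height` assumes the nodes are packed as tightly as possible, so it gives the ideal height that a balanced tree aims for.

```python
import math


def nodes_in_perfect_tree(k):
    """Total nodes in a perfectly balanced tree with k levels (0 .. k-1)."""
    # Level i holds 2**i nodes; add them up level by level.
    return sum(2 ** level for level in range(k))


def min_height(n):
    """Fewest levels that can hold n nodes, i.e. ceil(log2(n + 1))."""
    return math.ceil(math.log2(n + 1))


for k in range(1, 6):
    n = nodes_in_perfect_tree(k)
    assert n == 2 ** k - 1      # closed form: n + 1 = 2**k
    assert min_height(n) == k   # so k = log2(n + 1)
    print(f"{k} levels -> {n} nodes")

# A million records fit in a balanced tree only about 20 levels deep.
print(min_height(1_000_000))  # 20
```

This is where the O(log n) claim comes from: a find follows one root-to-leaf path, so in a balanced tree it inspects at most about 20 nodes out of a million.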

03:12:18

But before that, we have the second question. Binary trees are very commonly used as data structures

03:12:34

in a variety of different languages. For instance,

03:12:39

Java and C++ have this concept of a map, which is represented using a binary tree.

03:12:44

Trees are also used in file systems to store

03:12:52

indexes of files. So when you browse your file system or search for a specific file, it

03:12:57

is a tree structure that is used to look up the file and find its location.

03:13:03

And that brings us to our second question of today.

03:13:07

You can find the second question on our LinkedIn page. So once again, go to

03:13:11

LinkedIn.com slash school slash java in AI and you will find the second question here.

03:13:21

The second question is: which tree-based data structure is used to store the index

03:13:27

in the Windows file system, and who invented this data structure? So like this question, follow us,

03:13:36

and comment with your answer, and you stand a chance to win a swag pack.

03:13:44

To repeat the question: which tree-based data structure is used to store the index

03:13:48

in the Windows file system, also known as NTFS, and who invented this data structure?

03:13:59

Okay, so let's get to the implementation of binary trees. Here's a very common interview

03:14:04

question that you might get: implement a binary tree using Python, and then show its usage

03:14:08

with some examples. As we implement binary trees and binary search trees, we will

03:14:14

also cover many common interview questions. In fact, we will cover exactly 15, so that's quite

03:14:19

a few. The first one is to implement a binary tree. To begin, we will create a very simple

03:14:25

binary tree, without any of the special properties like key-value pairs, the binary

03:14:31

search tree property or balancing, and we will also use numbers as keys within our nodes

03:14:37

because they are simpler to work with. So here is an example binary tree: we have a root node,

03:14:42

and then we have a left child and a right child. And here's a simple class

03:14:47

representing a single node within the tree. We are calling this class TreeNode, and it has a

03:14:53

constructor function, __init__. It simply takes a key and sets self.key to key. It also has a

03:15:00

couple of other properties, self.left and self.right, which are initially set to None. So

03:15:05

each node, when it's created, exists independently of other nodes. Now let's create nodes

03:15:11

representing each of these nodes. So we have node0, which we create by calling TreeNode

03:15:19

with the value 3, and then we have node1 and node2. So there you go, now we've created the

03:15:24

nodes, and we can verify that each is of the type TreeNode, as you can see here. And if we check the key

03:15:30

of node0, you can see that it has the value 3. Now we can connect the nodes by setting

03:15:35

the .left and .right properties of the root node. If we go to node0 and set .left

03:15:40

to node1, we've connected node0 to node1. Similarly, if we set node0.right

03:15:47

to node2, we've connected node0 and node2. And that's it, we're done. So now we have

03:15:53

three nodes, we've connected them, and we may also want to track

03:15:58

which is the root node. So we can create a new variable called tree and simply point it to

03:16:03

node0. So tree points to the root node of the tree, the root node is connected to its

03:16:09

children, the children will be connected to their children, and so on. You can check here

03:16:14

that tree.key gives us 3, and then we check tree.left.key.

03:16:21

So tree is the root node; it has the value 3. tree.left is this node, so it should have the

03:16:25

value 4, and tree.right.key should have the value 5. Okay, so that's pretty straightforward,

03:16:32

and that's pretty much the answer to the question: implement a binary tree in Python.
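Putting the steps just narrated together, the class and the manual wiring look roughly like this. It is a reconstruction from the narration rather than a copy of the notebook, so small details may differ.

```python
class TreeNode:
    """A single node of a binary tree."""

    def __init__(self, key):
        self.key = key
        self.left = None   # left child (another TreeNode) or None
        self.right = None  # right child (another TreeNode) or None


# Create three independent nodes...
node0 = TreeNode(3)
node1 = TreeNode(4)
node2 = TreeNode(5)

# ...then connect them by setting the root's left and right properties.
node0.left = node1
node0.right = node2

tree = node0  # keep track of the root node
print(tree.key, tree.left.key, tree.right.key)  # 3 4 5
```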

03:16:38

Going forward, we will use the term "tree" to refer to the root

03:16:44

node, and the term "node" to refer to any node in a tree, not necessarily just the root.

03:16:49

Okay, so here's an exercise for you: try to create this binary tree. Now you have a root

03:16:55

node here, and then you have a left child and a right child, and this left child has another

03:16:59

left child but does not have a right child. Similarly, here you have another right child, and then

03:17:05

it has a left child which does not have a left child but has a right child. Okay, so this is a slightly

03:17:10

more complicated tree structure. Try to use these empty cells that are given here

03:17:15

to replicate this tree structure, and then try to view the different levels of that tree manually.

03:17:22

Please do that, because it's a great exercise in understanding how the structure

03:17:27

works and how to connect the nodes. But it's a bit inconvenient to create a tree by manually

03:17:32

connecting all the nodes. In fact, here you may have to make a total of nine

03:17:37

connections. So what we can do is write a helper function which can

03:17:42

convert a tuple with this kind of structure into a tree. A tuple

03:17:49

is kind of like a list, except that it is written with round brackets, or parentheses.

03:17:56

Such a tuple will have three elements, and the middle

03:18:00

element will represent the value, or key, of the root node. The first element will

03:18:07

be a tuple if the left child is an entire subtree, just a number if it is a single node, or None if there is no left child,

03:18:15

and similarly the third element will represent the right subtree. Okay. So here's

03:18:22

an example tree tuple. If you look at this tree tuple, it has three elements: this is the

03:18:28

first element, this is the second element, and this is the third element.

03:18:34

The second element, 2, represents the root node.

03:18:43

The first element, or the element at position 0, represents this subtree. You can see here that

03:18:50

if you look at just that subtree of the tree, 3 is the root in that subtree,

03:18:56

1 is the left child, and there is no right child, so that's what this represents.

03:19:01

And then this subtree, where 5 is the root node and you have 2 other subtrees,

03:19:06

is represented here.

03:19:07

So 5 is the root node, and then you have a subtree here and a subtree here.

03:19:11

So this is a convenient way for us to represent a binary

03:19:16

tree.

03:19:17

What we can do is define a function, parse_tuple, and this parse_tuple function can

03:19:22

take a tuple like this and convert it into a tree-like structure of linked

03:19:28

nodes using the TreeNode class that we have defined above.

03:19:34

So we call the parse_tuple function with some data, for instance this tuple, and parse_tuple

03:19:40

first checks whether data is of the type tuple and has a length of 3. If these two

03:19:49

things hold true,

03:19:51

then we create a node with data[1] as its key, so in this case we create a node

03:19:55

with 2 as the key, and then we set the left and the right subtrees of the node.

03:20:03

And here we are doing something very interesting: we are calling the parse_tuple function

03:20:07

once again.

03:20:08

When a function calls itself

03:20:14

inside its own body, that's called recursion.

03:20:17

So we call parse_tuple with the first element, which itself is a tuple.

03:20:22

That triggers another invocation of parse_tuple, and for a moment let's assume

03:20:27

that it returns the proper subtree, the proper node. We set the node that got

03:20:32

created as node.left, and similarly we create the right subtree from data[2] and

03:20:39

set that node as node.right.

03:20:43

Now you might wonder: if the function is calling itself, when will it stop? Can't

03:20:49

it go on forever? That's where you have to trace the actual function calls.

03:20:53

When we call parse_tuple with the entire tuple, it first calls parse_tuple with this subtuple, and

03:21:00

when parse_tuple is called with this subtuple, you can see that 3 is used to create a node,

03:21:06

and then parse_tuple is called with 1.

03:21:09

When parse_tuple is called with 1, the tuple condition no longer holds true, so we

03:21:16

check the next condition, whether the data is None. 1 is not None, so this

03:21:21

condition does not hold true either.

03:21:23

So we fall into the else branch, and we simply create a node.

03:21:26

We just create a node, and this time we do not call parse_tuple again.

03:21:31

This is called the terminating condition of the recursive function. Similarly, once

03:21:37

we get back the result for 1, we call parse_tuple with the value None.

03:21:41

Once again, the first condition is not entered, but the second condition matches, so we set node equal

03:21:46

to None and then return the node.

03:21:49

So when we reach either a leaf node, which is a single number, or the

03:21:53

value None, we stop invoking the function recursively, the function

03:21:59

returns, and that's how the entire tree gets converted.
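The function walked through above can be sketched like this. It is a reconstruction from the narration (the notebook may check the type slightly differently), and the example tuple is rebuilt from the level-by-level checks later in the lesson.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def parse_tuple(data):
    """Build a tree from (left, key, right) tuples; leaves may be bare keys."""
    if isinstance(data, tuple) and len(data) == 3:
        # Recursive case: the middle element is this node's key.
        node = TreeNode(data[1])
        node.left = parse_tuple(data[0])
        node.right = parse_tuple(data[2])
    elif data is None:
        # Terminating condition: no child here.
        node = None
    else:
        # Terminating condition: a single value becomes a leaf node.
        node = TreeNode(data)
    return node


tree_tuple = ((1, 3, None), 2, ((None, 3, 4), 5, (6, 7, 8)))
tree2 = parse_tuple(tree_tuple)
print(tree2.key, tree2.left.key, tree2.right.key)  # 2 3 5
```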

03:22:03

So this is a very powerful idea in programming: the idea of recursion, of functions

03:22:08

calling themselves, and it can seem unintuitive and confusing at first.

03:22:14

One thing you can do is add a print statement inside this function

03:22:18

to see how the different calls proceed: when you call parse_tuple

03:22:22

with the entire tuple, what internal calls are made, and study how the result

03:22:29

comes out. Maybe try it with pen and paper.
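One way to follow that tip: a variant of parse_tuple that prints every call, indented by how deep the recursion is. The name `parse_tuple_traced` and the extra `depth` parameter are hypothetical additions for this illustration; a small tuple keeps the output short.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def parse_tuple_traced(data, depth=0):
    """parse_tuple with a print at the top of every call, indented by depth."""
    print("  " * depth + f"parse_tuple({data!r})")
    if isinstance(data, tuple) and len(data) == 3:
        node = TreeNode(data[1])
        node.left = parse_tuple_traced(data[0], depth + 1)
        node.right = parse_tuple_traced(data[2], depth + 1)
    elif data is None:
        node = None
    else:
        node = TreeNode(data)
    return node


root = parse_tuple_traced(((1, 3, None), 2, 5))
# parse_tuple(((1, 3, None), 2, 5))
#   parse_tuple((1, 3, None))
#     parse_tuple(1)
#     parse_tuple(None)
#   parse_tuple(5)
```

The lines with no deeper calls beneath them (`1`, `None`, `5`) are exactly where the terminating conditions kick in.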

03:22:32

It's a very important technique for you to learn; you will be asked about, or you will find

03:22:37

applications of, recursion in many places throughout your programming or data science career.

03:22:43

So do learn it.

03:22:44

So let's now call parse_tuple with this tuple as input. Okay, that returns

03:22:51

a tree, and that tree is of the type TreeNode. That's great. Now let's examine

03:22:57

the tree to verify that it was constructed as expected.
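Alongside checking attributes one at a time, a small helper can list the keys on every level at once. This sketch uses a breadth-first (level-order) traversal with `collections.deque`, a technique the lesson hasn't introduced at this point; `keys_by_level` is a hypothetical name.

```python
from collections import deque


class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def parse_tuple(data):
    if isinstance(data, tuple) and len(data) == 3:
        node = TreeNode(data[1])
        node.left = parse_tuple(data[0])
        node.right = parse_tuple(data[2])
    elif data is None:
        node = None
    else:
        node = TreeNode(data)
    return node


def keys_by_level(root):
    """Collect keys level by level with a breadth-first (queue-based) walk."""
    levels = []
    queue = deque([(root, 0)] if root is not None else [])
    while queue:
        node, level = queue.popleft()
        if level == len(levels):  # first node seen on a new level
            levels.append([])
        levels[level].append(node.key)
        for child in (node.left, node.right):  # left before right
            if child is not None:
                queue.append((child, level + 1))
    return levels


tree2 = parse_tuple(((1, 3, None), 2, ((None, 3, 4), 5, (6, 7, 8))))
for level, keys in enumerate(keys_by_level(tree2)):
    print(level, keys)
# 0 [2]
# 1 [3, 5]
# 2 [1, 3, 7]
# 3 [4, 6, 8]
```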

03:23:01

Now we check tree2.key, which should give us the key of the root node,

03:23:06

2. That was level 0, so let's check level 1.

03:23:12

So let's check tree2.left.key and tree2.right.key.

03:23:18

You can see we get the values 3 and 5. Let's check the next level.

03:23:22

On this level we have tree2.left.left, and then we have tree2.left.right,

03:23:27

but there's no node there, so we can't really check for a key. Then we have tree2.right.left

03:23:32

and tree2.right.right. You can see that tree2.left.left.key

03:23:38

is 1, but tree2.left.right is None, because there is no right child here.

03:23:45

Then tree2.right.left.key and tree2.right.right.key give us 3 and 7. Similarly,

03:23:51

you can check level 3 as well. So here are all the levels of the tree, and it

03:24:00

looks like the tree was constructed properly. You can see the power of recursion at play here:

03:24:05

the recursive function can construct trees with any number of levels. You can create

03:24:10

tuples within tuples, and as long as they have this

03:24:16

three-element structure, where the left element represents the left subtree, the right element represents

03:24:22

the right subtree and the middle element represents the current node, you can construct a tree of

03:24:27

any size. So here's an exercise for you: we've defined a function to convert a tuple into a tree;

03:24:39

now define a function to convert a tree back to a tuple. Given a binary tree,

03:24:44

it should return a tuple representing the same tree. For instance, for the tree created above,

03:24:48

tree2, calling tree_to_tuple should return the original tuple that was used to create the tree.

03:24:54

Here's a hint on how to do this: use recursion. So do fill this out

03:25:01

and see if you can figure it out. So now we have defined a class for a binary tree,

03:25:08

and we also have a way of creating a binary tree from a tuple. Now let's create another

03:25:14

helper function to display all the keys of the tree in a tree-like structure for easier visualization.

03:25:20

We'll call this function display_keys, and we won't get into the code

03:25:25

for it, because it's pretty straightforward, although there are a few conditions we need

03:25:29

to handle. Here's what it gives us when we call display_keys on a tree:

03:25:36

we get this kind of representation of a tree. You can see that this is not exactly

03:25:40

the same representation as this one; you have to take this representation and mentally rotate it

03:25:45

by 90 degrees in the clockwise direction to get a representation like this. But you can see roughly

03:25:51

that the root node is 2, and it has a left child 3 and a right child 5.

03:25:56

Then 3 has a left child 1 and no right child. Now 5 has a left child 3,

03:26:02

and that 3 has no left child but has a right child 4, and so on. So the exact same

03:26:07

structure has been replicated here for us to view. This is very useful. We're

03:26:12

spending all this time talking about how to create trees and how to

03:26:18

visualize trees because the easier you make it for yourself to create trees,

03:26:23

the easier it is for you to test different scenarios. So always spend a little bit

03:26:29

of time coming up with a good string representation for any data structure you create, something

03:26:33

that helps you visualize it, and an easy way to create these data structures. Okay.
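The lesson skips the code for display_keys, so here is one possible sketch, not necessarily the notebook's version. It prints the right subtree above each node and the left subtree below it, indented by depth, which is why the output has to be rotated 90 degrees clockwise to match the diagram. Splitting out a hypothetical `render_keys` that returns the lines makes the logic easy to test, and `∅` marks a missing child.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def parse_tuple(data):
    if isinstance(data, tuple) and len(data) == 3:
        node = TreeNode(data[1])
        node.left = parse_tuple(data[0])
        node.right = parse_tuple(data[2])
    elif data is None:
        node = None
    else:
        node = TreeNode(data)
    return node


def render_keys(node, space="\t", level=0):
    """Lines of a sideways tree: right subtree above a node, left subtree below."""
    if node is None:
        return [space * level + "∅"]  # placeholder for a missing child
    if node.left is None and node.right is None:
        return [space * level + str(node.key)]  # leaf: nothing to expand
    return (render_keys(node.right, space, level + 1)
            + [space * level + str(node.key)]
            + render_keys(node.left, space, level + 1))


def display_keys(node, space="\t"):
    """Print the sideways view; rotate it 90 degrees clockwise to read it as a tree."""
    print("\n".join(render_keys(node, space)))


display_keys(parse_tuple(((1, 3, None), 2, ((None, 3, 4), 5, (6, 7, 8)))), space="   ")
```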

03:26:40

So now we have a way to visualize the tree as well. That's great. Now here's an exercise for you:

03:26:46

try to create some more trees and visualize them using display_keys. You can use the tool

03:26:51

excalidraw.com, a digital whiteboard, which is how these diagrams were created,

03:26:57

to draw some trees like this, and then

03:27:01

come up with tuples for those trees, try to create those trees using the

03:27:06

parse_tuple function, and finally try to display them. Okay, so experiment with it and

03:27:11

explore all the different tree structures that you can create.
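For the earlier tree_to_tuple exercise, here is one possible solution to compare against once you've tried it yourself. It is a sketch that mirrors parse_tuple: a leaf becomes a bare key and a missing child becomes None, so converting a tree built from a tuple gives back the same tuple (a leaf written as `(None, x, None)` would come back as plain `x`).

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def parse_tuple(data):
    if isinstance(data, tuple) and len(data) == 3:
        node = TreeNode(data[1])
        node.left = parse_tuple(data[0])
        node.right = parse_tuple(data[2])
    elif data is None:
        node = None
    else:
        node = TreeNode(data)
    return node


def tree_to_tuple(node):
    """Inverse of parse_tuple: leaves become bare keys, gaps become None."""
    if node is None:
        return None
    if node.left is None and node.right is None:
        return node.key
    return (tree_to_tuple(node.left), node.key, tree_to_tuple(node.right))


tree_tuple = ((1, 3, None), 2, ((None, 3, 4), 5, (6, 7, 8)))
print(tree_to_tuple(parse_tuple(tree_tuple)))
```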

03:27:16

Now, another frequently asked interview question is to traverse a binary tree.

03:27:23

Traversals are very common, so you may face one of these three questions: write a function to

03:27:28

perform the in-order traversal of a binary tree, write a function to perform the pre-order

03:27:33

traversal of a binary tree, or write a function to perform the post-order traversal of a binary tree.

03:27:38

So what do we mean by a traversal? A traversal refers to the process of visiting each node of a tree

03:27:44

exactly once. And what do we mean by visiting? Visiting could mean any operation, but

03:27:50

generally it refers to either printing the key or value at the node, or adding the node's key

03:27:57

to a list. There are three ways to traverse a binary tree and return a list of visited keys.

03:28:05

The first one is called in-order traversal. Traversal is defined

03:28:10

recursively, because binary trees have this recursive structure, and you will see that almost all the

03:28:16

functions that we write will have some sort of recursive structure. In-order traversal

03:28:23

involves first traversing the left subtree recursively in order, then visiting the current node,

03:28:30

and then traversing the right subtree recursively in order. So what does that mean? Well, we start

03:28:36

out with this tree and do an in-order traversal. We look at

03:28:43

the root node and realize that it has a left child, so it has a left subtree.

03:28:48

So we do not visit it yet, which means we do not print it or add it to our list yet.

03:28:52

Rather, we follow the path on the left side, come across 3, and realize that

03:28:59

3 also has a left child, so we don't visit it yet either. Then we go down to 1,

03:29:05

and it does not have a left child or a right child, so we can visit 1.

03:29:11

Then we go back to 3. We've now visited the left subtree of 3, so we can visit 3,

03:29:19

and the next step is to visit the right subtree of 3. But of course, 3 does not

03:29:23

have a right child, so there is no right subtree to visit. So we can move back up to 2.

03:29:28

We've now visited the left subtree of 2, so we can visit 2. So far we have 1, 3, 2,

03:29:35

and once we've visited 2, we can visit the right subtree of 2.

03:29:40

To visit the right subtree, we go to 5. Once again, we realize that 5 has a left subtree,

03:29:45

so we go to 3. Now 3 doesn't have a left subtree, so we can visit 3. Then we visit 4.

03:29:51

Now, since we've visited the left subtree of 5, we can visit 5,

03:29:55

and similarly we then visit 6, 7 and 8. Okay, so that's the in-order traversal of the

03:30:01

tree. Then there is another traversal called pre-order traversal, which is slightly different:

03:30:06

you visit the current node first. So here we start out at 2 and say that

03:30:10

we're going to visit 2 first, so we visit 2, that is, print it or add it to a list.

03:30:14

Then we traverse the left subtree, and then we traverse the right subtree. So we visit 3

03:30:19

and 1, and then we come to the right side and visit 5 and 3. You can compare these two

03:30:24

diagrams and see how in-order and pre-order traversals are different. These are very important

03:30:29

for you to understand: first, because they're great examples of different

03:30:36

functions which have very similar implementations, where there are just one or two things you need to

03:30:41

change, and they are recursive as well. So do understand the subtle differences between them. And

03:30:47

second, they are very commonly asked in interviews; you will most likely face some coding

03:30:52

assignment or interview where you will be asked to perform a traversal of a binary tree.

03:31:00

And finally, there's another traversal called post-order traversal,

03:31:06

and I'll let you guess how it works; you can also look it up. And here's an implementation

03:31:11

of in-order traversal. It may seem a little complicated, but it's actually pretty straightforward.
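A sketch of the in-order traversal as it is described here, together with pre-order and post-order versions for comparison (the latter two are one possible answer to the exercise the lesson sets, so try writing your own first). The only difference between the three is where `[node.key]` sits relative to the two recursive calls.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def parse_tuple(data):
    if isinstance(data, tuple) and len(data) == 3:
        node = TreeNode(data[1])
        node.left = parse_tuple(data[0])
        node.right = parse_tuple(data[2])
    elif data is None:
        node = None
    else:
        node = TreeNode(data)
    return node


def traverse_in_order(node):
    """Left subtree, then the node itself, then the right subtree."""
    if node is None:
        return []  # terminating condition: a missing child adds nothing
    return traverse_in_order(node.left) + [node.key] + traverse_in_order(node.right)


def traverse_pre_order(node):
    """The node itself first, then the left and right subtrees."""
    if node is None:
        return []
    return [node.key] + traverse_pre_order(node.left) + traverse_pre_order(node.right)


def traverse_post_order(node):
    """Both subtrees first, then the node itself."""
    if node is None:
        return []
    return traverse_post_order(node.left) + traverse_post_order(node.right) + [node.key]


tree = parse_tuple(((1, 3, None), 2, ((None, 3, 4), 5, (6, 7, 8))))
print(traverse_in_order(tree))    # [1, 3, 2, 3, 4, 5, 6, 7, 8]
print(traverse_pre_order(tree))   # [2, 3, 1, 5, 3, 4, 7, 6, 8]
print(traverse_post_order(tree))  # [1, 3, 4, 3, 6, 8, 7, 5, 2]
```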

03:31:15

So let's look at it. Given a node, we first traverse the node's left subtree,

03:31:23

which should return a list of all the keys in it, and then we create a list

03:31:30

with just the node's key. So we take the list of keys from the left subtree's in-order

03:31:35

traversal, add to it the current node's key, and then call traverse_in_order

03:31:42

on the right subtree and add its keys as well; the recursion keeps collecting keys this way. The

03:31:49

terminating condition for the recursion is when we hit None, that is, when we reach a node which

03:31:54

does not exist, meaning we came there from a parent which does not have a left or right child.

03:31:59

Then we return an empty list. Okay, so let's try it out with this tree. This is the

03:32:04

tree we have, and we just saw its traversal. If we traverse the tree in order, we get the values

03:32:12

1, 3, 2, 3, 4, 5, 6, 7, 8, and we can verify that here we have 1, 3, 2, 3, 4,

03:32:20

5, 6, 7, 8. So that was the in-order traversal of a tree. Now the exercise for you is to

03:32:27

write the pre-order and post-order traversals of the binary tree, and you can test your implementations

03:32:32

by making submissions to these problems on leetcode.com. Okay, so that was our discussion about

03:32:40

traversals. Another thing that you may commonly be asked is to write a function to calculate the

03:32:48

height or depth of a binary tree, or a function to count the number of nodes in a

03:32:52

binary tree. Once again, these can be expressed recursively as well. The height of a tree

03:32:59

rooted at a given node is simply one plus the maximum of the heights of its left and right subtrees.

03:33:08

The height of a tree is defined as the length of the longest path from the root node to a leaf, so you can see that the

03:33:14

longest path from the root node to a leaf has length four, counting nodes: 2, 5, 3 and 4.
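A sketch of the two recursive functions this passage describes, the height and the node count; the names `tree_height` and `tree_size` are assumptions. Height here counts nodes on the longest root-to-leaf path, matching the count of four for 2, 5, 3, 4, and an empty tree has height 0.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def parse_tuple(data):
    if isinstance(data, tuple) and len(data) == 3:
        node = TreeNode(data[1])
        node.left = parse_tuple(data[0])
        node.right = parse_tuple(data[2])
    elif data is None:
        node = None
    else:
        node = TreeNode(data)
    return node


def tree_height(node):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(tree_height(node.left), tree_height(node.right))


def tree_size(node):
    """Total number of nodes: this node plus everything in both subtrees."""
    if node is None:
        return 0
    return 1 + tree_size(node.left) + tree_size(node.right)


tree = parse_tuple(((1, 3, None), 2, ((None, 3, 4), 5, (6, 7, 8))))
print(tree_height(tree), tree_size(tree))  # 4 9
```

The two functions share the same shape; height combines the subtree results with `max`, size combines them with `+`.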

03:33:21

And the way to get the length of the longest path is by taking the max of the left

03:33:26

height and the right height and then adding one to it. Of course, the terminating condition here

03:33:31

is that if you hit a node that does not exist, you return 0. So that's how you get the height of a tree,

03:33:39

and you can check that the height of this tree is four. Then here's another function to count the

03:33:44

number of nodes in a tree. Once again, it's really simple: this time, instead of taking the

03:33:49

maximum, we simply get the size of the left subtree and the size of the right subtree, add them, and add

03:33:54

one to it. So here you can see that there are nine elements in the tree,

03:34:01

so the size of the tree comes out as nine. Now here are a few more questions relating to path lengths

03:34:07

in a binary tree: you can check out the concepts of maximum depth and minimum depth, and

03:34:12

there's also the concept of a diameter, so you can try these out. Now, as a final step,

03:34:21

what we can do is gather all the functions we've written as methods within

03:34:26

the TreeNode class itself. This technique is called encapsulation: we are encapsulating

03:34:31

the data as well as the functionality related to that data within the same

03:34:35

class, and this is really good programming practice. So as you write more code, try to think about how

03:34:43

you can create classes with not just the information inside them but also the relevant

03:34:48

methods. Okay, so we've now added the methods height, size, traverse_in_order,

03:34:54

display_keys and to_tuple, and we've also added the methods __str__ and __repr__. Remember quiz one:

03:35:00

you can go on LinkedIn and post an answer about what these functions do. And finally, we've added parse_tuple

03:35:05

as well. So all of these functions are now added within the class, and you can try them out here.
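A condensed sketch of the encapsulated class, assuming the method names used in this lesson (`height`, `size`, `traverse_in_order`, `to_tuple`, `parse_tuple`). `display_keys` is left out to keep it short, and the `BinaryTree <...>` string format in `__str__` is an illustrative choice, not necessarily the notebook's.

```python
class TreeNode:
    """Binary tree node with this lesson's helpers gathered as methods."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

    def height(self):
        # A missing child contributes a height of 0.
        left = self.left.height() if self.left else 0
        right = self.right.height() if self.right else 0
        return 1 + max(left, right)

    def size(self):
        left = self.left.size() if self.left else 0
        right = self.right.size() if self.right else 0
        return 1 + left + right

    def traverse_in_order(self):
        left = self.left.traverse_in_order() if self.left else []
        right = self.right.traverse_in_order() if self.right else []
        return left + [self.key] + right

    def to_tuple(self):
        # Leaves become bare keys, missing children become None.
        if self.left is None and self.right is None:
            return self.key
        left = self.left.to_tuple() if self.left else None
        right = self.right.to_tuple() if self.right else None
        return (left, self.key, right)

    def __str__(self):
        # Illustrative format: show the tree in its tuple form.
        return f"BinaryTree <{self.to_tuple()}>"

    def __repr__(self):
        return str(self)

    @staticmethod
    def parse_tuple(data):
        if isinstance(data, tuple) and len(data) == 3:
            node = TreeNode(data[1])
            node.left = TreeNode.parse_tuple(data[0])
            node.right = TreeNode.parse_tuple(data[2])
        elif data is None:
            node = None
        else:
            node = TreeNode(data)
        return node


tree = TreeNode.parse_tuple(((1, 3, None), 2, ((None, 3, 4), 5, (6, 7, 8))))
print(tree)                        # BinaryTree <((1, 3, None), 2, ((None, 3, 4), 5, (6, 7, 8)))>
print(tree.height(), tree.size())  # 4 9
print(tree.traverse_in_order())    # [1, 3, 2, 3, 4, 5, 6, 7, 8]
```

Making `parse_tuple` a static method lets it be called on the class itself (`TreeNode.parse_tuple(...)`), since there is no node to call it on until the tree has been built.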

03:35:12

For instance, here we have a tree tuple, and we can call TreeNode.parse_tuple

03:35:17

to convert this tree tuple into a tree. You can see that now we are also representing the

03:35:23

binary tree itself using this tuple-like representation, but we can also display it in this

03:35:28

hierarchical structure using display_keys. Then we can check the height using tree.height(),

03:35:33

check the size using tree.size(), traverse the tree in order using

03:35:38

traverse_in_order, and convert the tree to a tuple using tree.to_tuple(). So do create

03:35:47

some more trees and try out the operations that we've just defined, or you can try adding

03:35:51

more operations to the TreeNode class. Before continuing, we can save our work, so I'm just

03:35:58

going to import jovian and run jovian.commit(). So that concludes our discussion on binary trees.

03:36:11

Next, let's talk about binary search trees. A binary search tree, or BST, is a binary tree

03:36:19

that satisfies these two conditions. The left subtree of any node should only contain nodes

03:36:27

with keys less than the current node's key, and the right subtree of any node should only

03:36:33

contain nodes with keys greater than the current node's key. Let's just copy

03:36:40

this over. So we can see that this tree here is actually a binary search tree, and you can

03:36:52

verify that these two properties hold for each of these nodes. It also follows from these

03:36:58

two conditions that every subtree of a binary search tree must also be a binary search tree. So I

03:37:04

will let you verify that if you pick any subtree, that is, you pick any node and look at

03:37:09

the tree under that node, you will see that it is a binary search tree. So here are some questions

03:37:16

that are often asked relating to binary trees and binary search trees and we've lumped them together

03:37:21

because we'll answer them with a single function. So here's a function that you might be expected

03:37:26

to write. First, write a function to check if a binary tree is a binary search tree, which means

03:37:32

ensuring that these two conditions hold. Second, write a function to find the maximum key in a

03:37:38

binary tree; this is a generic question, finding the maximum key. And here's another question

03:37:46

that you might face: write a function to find the minimum key in a binary tree. So what we will

03:37:51

do is answer all of these questions together with a single function called is_bst. is_bst

03:37:58

takes a node and returns three things. If you look at the return value, it first returns

03:38:04

whether the tree under that node is a BST. So this first value is

03:38:14

either True or False, telling us whether the tree under that node,

03:38:20

with that node as root, is a BST. It also returns the minimum key from that entire tree,

03:38:26

and it also returns the maximum key from that entire tree. Why are these two useful? We'll see

03:38:31

in just a moment. The way we calculate is_bst(node) is by looking at the left

03:38:39

subtree and the right subtree recursively. So we call is_bst on the left subtree of the node,

03:38:45

and we call is_bst on the right subtree of the node. From these calls we get back:

03:38:51

whether the left subtree is a binary search tree and whether the right subtree is one; the minimum

03:38:57

key in the left subtree and the minimum key in the right subtree; and the maximum key in the left

03:39:01

subtree and the maximum key in the right subtree. Now we can compute is_bst(node).

03:39:07

So, is the entire tree a binary search tree?

03:39:11

Well, it is if the left subtree is a binary search tree and the right subtree is a binary search tree,

03:39:15

and we verify two more properties. First, the maximum key in the left subtree is either None,

03:39:24

which means that there is no left subtree, or the current node's key is greater than that maximum key.

03:39:29

Second, the minimum key in the right subtree, the smallest key in the right subtree, is either None,

03:39:37

which means that there is no right subtree, or the minimum key in the right subtree is greater than the current node's key.

03:39:43

So, that's condition 1 and condition 2, and together they tell us whether this entire tree is a binary search tree.

03:39:53

And then finally, we can also calculate the minimum key and the maximum key simply by computing the minimum of the left minimum,

03:40:01

node.key and the right minimum, and the maximum of the left maximum,

03:40:07

node.key and the right maximum, skipping any None values that come from empty subtrees.

03:40:09

So, what we return from the is_bst function is whether the tree rooted at a node is a binary search tree,

03:40:20

along with the minimum and maximum keys in that tree.
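The recursion just described can be sketched as follows. The TreeNode here is a simplified stand-in whose constructor accepts left and right children (an assumption for brevity), and remove_none is a small helper that drops the None values reported by empty subtrees.

```python
class TreeNode:
    # Simplified node; the left/right constructor arguments are a convenience for this sketch.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def remove_none(values):
    # Empty subtrees report None for their min/max; drop those before comparing.
    return [v for v in values if v is not None]


def is_bst(node):
    """Return (is_bst, min_key, max_key) for the tree rooted at node."""
    if node is None:
        # An empty tree is a BST with no keys.
        return True, None, None

    is_bst_l, min_l, max_l = is_bst(node.left)
    is_bst_r, min_r, max_r = is_bst(node.right)

    is_bst_node = (is_bst_l and is_bst_r
                   # Condition 1: every key on the left is smaller.
                   and (max_l is None or node.key > max_l)
                   # Condition 2: every key on the right is larger.
                   and (min_r is None or node.key < min_r))

    min_key = min(remove_none([min_l, node.key, min_r]))
    max_key = max(remove_none([max_l, node.key, max_r]))
    return is_bst_node, min_key, max_key


# 3 sits in the left subtree of 2, so this tree is not a BST.
tree1 = TreeNode(2, TreeNode(3, TreeNode(1)),
                 TreeNode(5, TreeNode(3, None, TreeNode(4)),
                          TreeNode(7, TreeNode(6), TreeNode(8))))
print(is_bst(tree1))  # (False, 1, 8)
```

One call answers all three questions: the first element checks the BST property, and the other two give the minimum and maximum keys of any binary tree, BST or not.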

03:40:22

So, if we look at this tree right here, let us verify whether it is a BST. Even before we check,

03:40:30

we can tell that it is not, because 3 appears as the left child of 2,

03:40:35

but the key 3 is greater than 2, and that is a problem.

03:40:39

So, this is a violation of the property; everywhere else, the property is satisfied.

03:40:43

You can check any other node here and you will find that the keys in its left subtree are smaller than its key and the keys in its right subtree are larger.

03:40:52

So, let us check is_bst(tree1): it is not a BST, so we get False.

03:40:58

On the other hand, this tree, the one we have been looking at all this while, is a BST.

03:41:05

So, once again, we can create it using TreeNode.parse_tuple, and note that,

03:41:12

the way we implemented TreeNode, keys can be not only numbers but also strings.

03:41:18

So, we do not need to change anything here. That creates tree2, and we can even display tree2.

03:41:24

So, if you call tree2.display_keys(), you can see that it has this structure, with Jadhesh at the root.

03:41:34

Then on the left you have Biraj and on the right you have Sonaksh.

03:41:38

Below them you have Aakash, Hemanth, Siddhant and Vishal.

03:41:43

And this is a BST, so you get back True here; the smallest key is aakash and the largest key is vishal,

03:41:51

as you can verify in alphabetical order.

03:41:54

So, that is pretty handy: now we have a way to check if a binary tree is a binary search tree.

03:42:01

And this is again a very common interview question that you might face.

03:42:05

Next, remember that we need to store not just keys, but also a user object with each key in our BST.

03:42:14

So, we will define a new class called BSTNode to represent the nodes of our binary search tree,

03:42:20

and BSTNode will not only have the key; its constructor also accepts a value, which is optional.

03:42:28

So, we set the key and the value, and we also set left and right.

03:42:32

Apart from this, we also set another property called parent, which points to the parent node.

03:42:37

So, for instance, Biraj is the left child of the root, so the parent of Biraj will point to Jadhesh.

03:42:43

And this will be useful for upward traversal.

03:42:46

For example, if you are given a pointer to a node and you have to go back up and find the root of the tree, the parent pointer helps there.

03:42:53

So, this is our BSTNode class.
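Here is a sketch of BSTNode as described, together with the manual level-by-level construction that follows in the lesson. Small dicts stand in for the user objects, and find_root is a hypothetical helper added only to show the upward traversal that the parent pointer enables.

```python
class BSTNode:
    """A BST node that stores a key, an optional value and a parent pointer."""

    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.parent = None  # stays None for the root


def find_root(node):
    """Hypothetical helper: follow parent pointers up to the root."""
    while node.parent is not None:
        node = node.parent
    return node


# Level 0, then level 1, linking each child to its parent in both directions.
tree = BSTNode("jadhesh", {"name": "Jadhesh"})
tree.left = BSTNode("biraj", {"name": "Biraj"})
tree.left.parent = tree
tree.right = BSTNode("sonaksh", {"name": "Sonaksh"})
tree.right.parent = tree

print(find_root(tree.right).key)  # jadhesh
```

Forgetting the `.parent = ...` line after attaching a child is an easy mistake, which is one reason to let an insert function handle the linking instead.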

03:42:56

And let us try to recreate this BST right here with usernames as keys and user objects as values.

03:43:03

So, first we create level 0 as a BSTNode. The key is Jadhesh's username, which is just the string jadhesh.

03:43:12

And the value will be the jadhesh user object. We have created that, and we can check its key and value.

03:43:18

You can see that jadhesh is the key and the user object is the value.

03:43:24

Let us create level 1.

03:43:28

For level 1, we set tree.left to BSTNode(biraj.username, biraj).

03:43:35

One other thing that we should do here: once we set it, we should also set tree.left.parent

03:43:41

to tree.

03:43:43

And similarly, we set tree.right.

03:43:45

tree.right is Sonaksh, so we set it to a BSTNode with sonaksh.username as the key and sonaksh as the value.

03:43:52

And then we set tree.right.parent to tree.

03:43:57

Now, you can view these values. You can see that we have inserted the username biraj and the user biraj,

03:44:04

and the username sonaksh and the user sonaksh, as keys and values respectively.

03:44:10

Now, as an exercise, try to add the next level of keys and values and then verify that they were inserted properly.

03:44:17

So, you can see that we now have a way to represent the data,

03:44:24

both the usernames and the user objects, in a binary search tree. So, we are getting pretty close to the data structure that we want to create.

03:44:31

Once again, we can display the keys of the tree by calling the display_keys function.

03:44:37

Now, this is also rather nice, and it is a good thing about Python: because Python functions are dynamic, you do not need to specify the types of the arguments when defining a function.

03:44:48

So the same display_keys function can be used with both the TreeNode and BSTNode classes.

03:44:54

All it requires is that objects of your class have the properties .key, .left and .right, so that it can display the keys in this visual layout.

03:45:04

And the same is true of most of the other functions that we have defined. In fact, any function we have defined for TreeNode will also work for BSTNode.
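To illustrate the duck typing point, here is a sketch of a display function that only touches .key, .left and .right. display_lines is a hypothetical variant of display_keys that returns the lines instead of printing them; like the lesson's version, it draws the tree sideways with the right subtree on top.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class BSTNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.parent = None


def display_lines(node, level=0, space="    "):
    """Draw a tree sideways: right subtree above, left subtree below.

    Only .key, .left and .right are used, so any node class with those
    attributes works (duck typing).
    """
    if node is None:
        return [space * level + "∅"]  # marks a missing child
    if node.left is None and node.right is None:
        return [space * level + str(node.key)]
    return (display_lines(node.right, level + 1, space)
            + [space * level + str(node.key)]
            + display_lines(node.left, level + 1, space))


plain = TreeNode(2)
plain.left, plain.right = TreeNode(1), TreeNode(3)
with_values = BSTNode(2, "two")
with_values.left, with_values.right = BSTNode(1, "one"), BSTNode(3, "three")

print("\n".join(display_lines(plain)))
print(display_lines(plain) == display_lines(with_values))  # True
```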

03:45:14

Okay, so moving right along.

03:45:17

Now, we have a way to construct a BST, but it's a bit inconvenient to insert values manually, because so far we're checking by hand whether a value should go on the left or the right.

03:45:30

Rather, there should be a way to do it automatically: we should be able to call a function, insert.

03:45:34

And this is a common question as well: write a function to insert a new node into a binary search tree.

03:45:40

So, we'll use the BST property to perform insertion efficiently.

03:45:46

Once again, let's grab a copy of this tree here so that we can think about it easily.

03:45:56

Okay, so now we have the tree, and let's say we want to insert a new user with the username Tanya into this tree.

03:46:04

So, first we start at the root, and we compare the key to be inserted with the current node's key.

03:46:10

The current node is the root, so we compare Tanya with Jadhesh, and we see that Tanya is greater than Jadhesh because T comes after J.

03:46:18

So, obviously, Tanya should not be inserted into the left subtree. Rather, Tanya should be inserted into the right subtree.

03:46:24

So, if the key is smaller, we recursively insert it into the left subtree, and if the key is larger, we recursively insert it into the right subtree.

03:46:32

Next we encounter Sonaksh. Tanya is also greater than Sonaksh, since T comes after S.

03:46:38

So, once again, we recursively call insert, this time on the subtree rooted at Vishal.

03:46:44

This time, we notice that Tanya is smaller than Vishal, since T comes before V.

03:46:48

So, then we need to recursively insert in the left subtree, but there is no left subtree here.

03:46:54

And this is the point at which we can create a new node and attach it as the left child of Vishal.

03:47:00

So, you can see that the node Tanya will get added here at this position in the tree.

03:47:06

So, here is the recursive implementation of insert, which does exactly what we just discussed.

03:47:12

First, we check if the key is less than the current node's key, and if that is the case, we insert it into the left subtree.

03:47:20

Then we check if the key is greater than the current node's key, and if that is the case, we insert it into the right subtree.

03:47:26

And the terminating condition is when the node is None, which means we have hit a position where we do not have a left subtree and we need to go left,

03:47:36

or we do not have a right subtree and we need to go right. Then we create a new node.

03:47:40

So, we create the new node, node = BSTNode(key, value), and then we return the node.

03:47:44

Returning the node is an interesting thing that we are doing here.

03:47:49

insert always returns the root node of the tree or subtree it was called on.

03:47:52

So, when we call insert with node.left, we get back the pointer to the left subtree.

03:47:58

So, we can set it back to node.left, and we can also set the parent of the left subtree to node.

03:48:04

So, this is just updating the parent.

03:48:06

So, just study this function carefully.

03:48:10

See how it works.

03:48:11

It does exactly what we just talked about.

03:48:14

And finally, it returns a pointer to the root of the tree.
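The insert function described above can be sketched like this. The values are plain strings standing in for user objects, and equal keys are simply ignored because the walkthrough has no branch for them (an assumption of this sketch).

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.parent = None


def insert(node, key, value):
    """Insert (key, value) into the BST rooted at node and return the root."""
    if node is None:
        # Reached an empty spot: the new node goes here.
        node = BSTNode(key, value)
    elif key < node.key:
        node.left = insert(node.left, key, value)
        node.left.parent = node  # keep the upward link in sync
    elif key > node.key:
        node.right = insert(node.right, key, value)
        node.right.parent = node
    # An equal key falls through unchanged in this sketch.
    return node


tree = insert(None, "jadhesh", "Jadhesh")  # the first insert creates the root
for name in ["biraj", "sonaksh", "aakash", "hemanth", "siddhant", "vishal"]:
    tree = insert(tree, name, name.title())

tree = insert(tree, "tanya", "Tanya")
print(tree.right.right.left.key)  # tanya: right of jadhesh, right of sonaksh, left of vishal
```

Because insert always hands back the root, the same call works whether the tree is empty (None) or already populated.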

03:48:20

So, let us use this to recreate the tree that we had here.

03:48:24

Now, to create the first node, we can call the insert function with None.

03:48:28

So, initially we do not have a tree to begin with.

03:48:30

We just call insert with None.

03:48:32

And remember that insert, after performing the insertion, returns the pointer to the tree.

03:48:40

So, we call insert with None,

03:48:42

and we want to insert jadhesh.username,

03:48:45

that is, the key jadhesh.username with the value jadhesh.

03:48:49

So, that gives us a tree.

03:48:52

And now the tree has one element.

03:48:54

You can check tree.key and tree.value.

03:48:58

And now the remaining nodes can just be inserted into tree.

03:49:03

So, now we call insert with tree.

03:49:05

And call it with biraj.username and biraj.

03:49:08

Then we call it with sonaksh.username and sonaksh,

03:49:11

aakash.username and aakash,

03:49:13

and so on.

03:49:14

So, we are adding Biraj,

03:49:15

then we are adding Sonaksh,

03:49:16

then we are adding Aakash,

03:49:17

Hemanth,

03:49:18

Siddhant and Vishal.

03:49:19

And note that we are not specifying exactly where these nodes need to be inserted.

03:49:24

But you can see that once these nodes are inserted,

03:49:27

they end up in the right places.

03:49:29

So, Jadhesh is at the root.

03:49:30

You can see that the binary search tree property is preserved here.

03:49:34

And we have also exactly replicated the tree structure that we had here.

03:49:38

So, the left child of Jadhesh is Biraj and the right child is Sonaksh.

03:49:43

For Biraj, the left child is Aakash and the right child is Hemanth.

03:49:46

And so on.

03:49:48

Now, note however that the order of insertion of nodes can change the structure of the resulting tree.

03:49:54

So, for instance, suppose we insert all the nodes in increasing order of username.

03:50:01

Here, for example, we are inserting Aakash first,

03:50:04

then Biraj, Hemanth and so on.

03:50:05

So, this is increasing lexicographic order.

03:50:08

And when we display that tree,

03:50:10

this is what we end up with.

03:50:12

We end up with an unbalanced, or very skewed, tree.

03:50:15

And you can see why it ends up skewed or unbalanced.

03:50:18

Well, let's look at it.

03:50:19

So, we start out with Akash.

03:50:21

So, we have a single node.

03:50:22

And then when we try to insert Biraj, we realize that we need to go right.

03:50:25

So, we insert Biraj here.

03:50:27

Then we try to insert Hemanth.

03:50:28

We realize that we need to go right from Aakash and right from Biraj, so Hemanth goes there.

03:50:34

And then we keep going this way.

03:50:36

So, how you set up the root node and each subtree,

03:50:39

that is, the order in which you insert the nodes, is very important.

03:50:42

A bad order can create a huge skew within the tree.

03:50:45

Now, skewed or unbalanced trees are problematic because the height of such trees

03:50:51

is no longer logarithmic compared to the number of nodes in the tree.

03:50:55

Earlier, we deduced that in a balanced tree

03:50:59

containing N nodes, the height is about log N (log N or log N plus 1).

03:51:04

And that makes operations like insert, update and find very efficient.

03:51:09

But in a very skewed tree, the height can actually match the number of nodes.

03:51:13

For instance, this tree has 7 nodes and it has height 7.

03:51:18

And in such skewed trees, insertion, find and update can once again take O(N) time,

03:51:27

because you may have to traverse the entire height of the tree, which equals the number of nodes in the tree.

03:51:34

And that may once again defeat the purpose of using a binary search tree in the first place.
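The effect of insertion order on height can be checked directly. This sketch reuses the same recursive insert (parent pointers left out for brevity) and adds a height function plus a hypothetical build helper that inserts keys one by one into an empty tree.

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


def insert(node, key, value=None):
    if node is None:
        return BSTNode(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    return node


def height(node):
    # Levels on the longest root-to-leaf path; an empty tree has height 0.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def build(keys):
    # Hypothetical helper: insert keys one by one, starting from an empty tree.
    tree = None
    for key in keys:
        tree = insert(tree, key)
    return tree


good_order = ["jadhesh", "biraj", "sonaksh", "aakash", "hemanth", "siddhant", "vishal"]
skewed = build(sorted(good_order))  # aakash, biraj, hemanth, ... each goes right
balanced = build(good_order)

print(height(skewed), height(balanced))  # 7 3
```

Same seven keys, same insert function: only the order differs, yet the longest search path grows from 3 steps to 7.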

03:51:39

So, maintaining the balance of a binary search tree is very important and we will see how to do that.

03:51:46

So, we have seen how to insert a node.

03:51:48

Now, the next thing is to find the value associated with a given key in a binary search tree.

03:51:53

So, once again, we can follow a recursive strategy here,

03:51:56

similar to insertion.

03:51:58

We start from the top.

03:52:01

Let's say we want to find the key hemanth.

03:52:03

We start from the top and compare it with the root node.

03:52:06

If it matches the root node's key, we can simply return this node.

03:52:10

If it does not, then we check whether we need to go left or right.

03:52:13

Since hemanth comes before jadhesh, we need to go left.

03:52:16

Then we encounter biraj, and here we realize that we need to go right.

03:52:19

And finally, we encounter hemanth, and we return that node.

03:52:22

Another possibility is that we search for a key, let's say tanya, which does not exist here.

03:52:26

If we search for it, we follow a path like this one and end up at an empty spot.

03:52:31

So, in that case, we simply return None.

03:52:34

So, you either find a node and return it, or you return None.
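Here is a sketch of the recursive search just described. The tree is built with the same insert as before (parent pointers omitted), and upper-cased usernames stand in for the user objects.

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


def insert(node, key, value=None):
    # Same recursive insertion as before, without parent pointers.
    if node is None:
        return BSTNode(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    return node


def find(node, key):
    """Return the node with the given key, or None if the key is not in the tree."""
    if node is None:
        return None  # reached an empty spot: the key is absent
    if key == node.key:
        return node
    if key < node.key:
        return find(node.left, key)  # smaller keys live on the left
    return find(node.right, key)  # larger keys live on the right


tree = None
for name in ["jadhesh", "biraj", "sonaksh", "aakash", "hemanth", "siddhant", "vishal"]:
    tree = insert(tree, name, name.upper())

print(find(tree, "hemanth").value)  # HEMANTH
print(find(tree, "tanya"))          # None
```

Each call moves one level down, so the number of comparisons is at most the height of the tree.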

03:52:38

So, you can see here that if we call find(tree, 'hemanth'), we get back the node with the details for hemanth.

03:52:46

And very interestingly, because this is a balanced tree, we only had to take two steps and not go through the entire tree.

03:52:54

And in the worst case, you can check that every path from the root to a leaf in this balanced tree is only two steps long.

03:53:02

And that's what makes it so convenient.

03:53:05

Now, on the other hand, if we try to find the key tanya, you can see that it's not found.

03:53:18

Try creating larger BSTs and finding some more nodes; it's important to experiment with these operations once they are defined,

03:53:26

because now that we've written the code, it's simply a matter of calling the function.

03:53:30

So, experiment with it: try creating larger trees with multiple levels and dozens or maybe hundreds of nodes, try generating some fake data,

03:53:39

putting it into trees, and seeing how the trees build up.

03:53:43

And that'll give you a feel for how binary search trees work.

03:53:47

Next, let's talk about updating a value in a BST.

03:53:54

Updating a value is fairly simple, since we already have a way of finding a node.

03:53:59

So, let's say we want to update the node with the key hemanth.

03:54:04

We want to update it to this value, which is the new user object for hemanth.

03:54:10

We're changing the name and the email here.

03:54:14

So, we first find the node, and if the node is not None, we simply change the value at that node.

03:54:20

It's as simple as that.

03:54:23

Notice also that we are reusing the find function here, and this is a good practice to incorporate into your programs and functions.

03:54:31

Whenever you find yourself copy-pasting some code and maybe changing one or two things here and there,

03:54:37

think about whether you can extract that piece of code into a function and then reuse that function.

03:54:42

So, always try to make your code more and more generic.

03:54:45

The less code you write, the fewer the chances for errors, the easier it is to understand, and the smaller your functions become.

03:54:53

So, write small, reusable, generic functions whenever you can.

03:54:57

This is called the DRY principle, which stands for "don't repeat yourself", and it applies whenever you're writing programs.

03:55:07

So, in update, we avoid repeating ourselves by using the find function to locate the right node and then simply updating it

03:55:14

by setting its value.
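A sketch of update built on top of find, in the DRY spirit just described. The small hand-built tree and the dict values are placeholders for the lesson's user objects.

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


def find(node, key):
    # Same recursive search as before.
    if node is None:
        return None
    if key == node.key:
        return node
    if key < node.key:
        return find(node.left, key)
    return find(node.right, key)


def update(node, key, value):
    """Replace the value stored under key; do nothing if the key is absent."""
    target = find(node, key)  # reuse find instead of repeating the search (DRY)
    if target is not None:
        target.value = value


root = BSTNode("jadhesh", {"name": "Jadhesh"})
root.left = BSTNode("biraj", {"name": "Biraj"})
root.left.right = BSTNode("hemanth", {"name": "Hemanth"})

update(root, "hemanth", {"name": "Hemanth (updated)"})
print(root.left.right.value)  # {'name': 'Hemanth (updated)'}
```

Since update does one find plus a constant amount of work, its running time is the same as find's.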

03:55:16

So, let's update hemanth here to the new value.

03:55:20

And you can see that we now have the updated data here, with the new name and the new example.com email address.

03:55:27

The value of the node was successfully updated, and you can easily check that the time complexity of update is the same as that of find.

03:55:40

Finally, we have the last required operation: write a function to retrieve all the key-value pairs stored in a binary search tree in the sorted order of keys.

03:55:56

This is a question that you might face once again.

03:55:59

And this is simply the in-order traversal, stated in a different way.

03:56:05

What you will have to figure out, or reason about, is why the in-order traversal of a binary search tree produces a sorted list of keys.

03:56:18

Think about it.

03:56:21

So, here's a list_all function.

03:56:23

All we do here is call list_all on node.left and list_all on node.right, which gives us two lists.

03:56:32

We assume that list_all(node.left) gives us the list of key-value pairs from the left subtree in sorted order.

03:56:40

Similarly, here we get the list of key-value pairs from the right subtree in sorted order, and between them

03:56:45

we simply insert the key-value pair from the current node; the recursion automatically fills out the entire list.

03:56:53

And this is the end condition: when we encounter an empty node, we simply return an empty list.
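A direct sketch of the list_all function described above. The hand-built tree uses plain strings as stand-ins for user objects.

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


def list_all(node):
    """Return the tree's (key, value) pairs sorted by key (in-order traversal)."""
    if node is None:
        return []  # an empty subtree contributes no pairs
    # Every key on the left is smaller and every key on the right is larger,
    # so left pairs + this pair + right pairs is already in sorted order.
    return list_all(node.left) + [(node.key, node.value)] + list_all(node.right)


root = BSTNode("jadhesh", "Jadhesh")
root.left = BSTNode("biraj", "Biraj")
root.right = BSTNode("sonaksh", "Sonaksh")
root.left.right = BSTNode("hemanth", "Hemanth")

print(list_all(root))
# [('biraj', 'Biraj'), ('hemanth', 'Hemanth'), ('jadhesh', 'Jadhesh'), ('sonaksh', 'Sonaksh')]
```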

03:56:59

You can see now that when we pass in this tree, we get back the list of user key-value pairs, sorted by key.

03:57:11

Now, here's an exercise for you.

03:57:13

Determine the time complexity and the space complexity of the list_all function.

03:57:19

You can do this for a balanced tree or an unbalanced tree, and here's a hint: it will not make a difference.

03:57:26

But think about it.

03:57:29

So, once again, let's save our work. We've now talked about binary search trees and the operations on them.

03:57:40

The next thing is to look at balanced binary trees, and here is another very common question: write a function to determine if a binary tree is balanced.

03:57:53

And here's a recursive strategy to do this.

03:57:58

In fact, this is really the definition of balanced binary trees.

03:58:01

The left subtree should be balanced.

03:58:03

The right subtree should be balanced.

03:58:06

And the difference between the heights of the left and right subtrees should not be more than one.

03:58:11

This is an important point.

03:58:13

When we're looking for balance, we're not looking for perfect balance, because it is not always possible to create a tree with perfect balance.

03:58:19

To have a perfectly balanced tree, where for every node the left subtree and the right subtree have exactly the same height, you have to fill all the positions at all the levels.

03:58:31

And that can only happen for certain numbers of nodes.

03:58:35

For example, you can have one node which satisfies this property or you can have a tree with three nodes which satisfies this property.

03:58:41

Or you can have a tree with seven nodes which satisfies this property.

03:58:45

You may not be able to get a tree with six nodes to satisfy that property.

03:58:48

For instance, if you remove Vishal here, you will see that the left and right subtrees of the node Sonaksh are no longer of equal height.
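The node counts above follow from a short counting argument, sketched here with height counted as the number of levels, as in this lesson. Level i (the root is level 0) holds at most 2^i nodes, so a tree of height h with N nodes satisfies:

```latex
N \;\le\; \sum_{i=0}^{h-1} 2^{i} \;=\; 2^{h} - 1,
\qquad \text{with equality exactly when every level is full.}
```

So a perfectly balanced tree has N = 2^h - 1 nodes, that is 1, 3, 7, 15 and so on, which is why six nodes cannot be arranged with perfect balance. Rearranging the inequality also gives h >= log2(N + 1) for any binary tree, the lower bound behind the roughly log N height of balanced trees.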

03:58:56

That's why, for balancing, we relax the criterion slightly.

03:58:59

We simply need to ensure that the difference between the heights of the left and right subtrees is not more than one.

03:59:07

So here's the code for is_balanced.

03:59:11

Once again, it's pretty straightforward, but we return two things here.

03:59:15

is_balanced will not only return whether the tree rooted at a node is balanced;

03:59:19

it will also return the height of that tree.

03:59:24

The way we implement it is by first calling is_balanced on node.left and then calling is_balanced on node.right.

03:59:31

And by the way, this is generally how we implement recursive functions.

03:59:36

First we write the recursive function's signature.

03:59:41

Then we immediately write the return value.

03:59:43

And then we assume that the recursive calls are going to return these values.

03:59:48

So the recursive call is_balanced(node.left) is going to return whether the left subtree is balanced and the height of the left subtree.

03:59:56

And we assume that is_balanced(node.right) is going to return whether the right subtree is balanced

04:00:03

and the height of the right subtree, because that's what we return here.

04:00:06

Then the entire tree is balanced if the left subtree is balanced, the right subtree is balanced,

04:00:12

and the absolute difference of the heights is at most one,

04:00:16

which means height_l minus height_r is either -1, 0 or 1.

04:00:20

And finally, we calculate the height of the tree itself, which is simply one plus the maximum of the heights of the left and right subtrees, and we return it.

04:00:28

So that's how you implement a recursive function, or think recursively.

04:00:32

And there's one last thing: the end condition. Although it's often the last thing you think about,

04:00:38

it's the first thing that you put in the function.

04:00:40

The end condition is to check whether a node is None, because when we call is_balanced on node.left, there may not be a left subtree.

04:00:46

So you may call is_balanced with None.

04:00:48

And if the node is None, we simply return True, because an empty tree is balanced by default: there's no imbalance there.

04:00:57

And its height is zero.

04:01:00

So that's our is_balanced function.

04:01:02

It's just four or five lines of code.
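The function just described might look like this. The TreeNode constructor taking left and right children is a convenience assumed for this sketch, and the two sample trees are a full 7-node tree and a 7-node chain of right children.

```python
class TreeNode:
    # Simplified node; left/right constructor arguments are a convenience for this sketch.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def is_balanced(node):
    """Return (balanced, height) for the tree rooted at node."""
    if node is None:
        return True, 0  # an empty tree is balanced and has height 0
    balanced_l, height_l = is_balanced(node.left)
    balanced_r, height_r = is_balanced(node.right)
    # Both subtrees balanced, and their heights differ by at most one.
    balanced = balanced_l and balanced_r and abs(height_l - height_r) <= 1
    height = 1 + max(height_l, height_r)
    return balanced, height


full = TreeNode(4, TreeNode(2, TreeNode(1), TreeNode(3)),
                TreeNode(6, TreeNode(5), TreeNode(7)))

chain = None
for key in range(7, 0, -1):  # builds 1 -> 2 -> ... -> 7 using right children only
    chain = TreeNode(key, None, chain)

print(is_balanced(full))   # (True, 3)
print(is_balanced(chain))  # (False, 7)
```

Returning the height alongside the flag lets each node be visited once; computing heights separately at every node would redo work.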

04:01:05

But if you are not able to reason about recursion easily, you may get stuck with this, and you may spend an entire 45 minutes trying to write this function and debug it.

04:01:15

So always try to think in recursive terms, and that's why it always helps to write down what you want to do in plain English,

04:01:22

so that you can determine what the inputs and outputs of your function should be.

04:01:27

Maybe also have some test cases ready and then start implementing your function and it becomes really easy.

04:01:33

So this tree for instance is balanced.

04:01:36

Here you can check is_balanced.

04:01:39

You get back True.

04:01:41

But this tree here, the one you're looking at, is not balanced.

04:01:44

So this was tree2, and if you check is_balanced here, you get back False.

04:01:49

So here you also get the height of the tree, which is three, and here you get the height of the tree, which is seven.

04:01:56

Now here's another tree.

04:02:01

Is the tree shown here balanced? Why or why not?

04:02:05

Create the tree and check if it's balanced using the is_balanced function.

04:02:15

There's another concept called complete binary trees, which is similar to balanced binary trees but uses a slightly stricter criterion.

04:02:23

So you can check out this problem here; you simply need to modify

04:02:28

the code for is_balanced slightly to get the code for complete binary trees.

04:02:32

So do check out this problem on leetcode.com.

04:02:36

Alright, so we've looked at binary search trees and we've looked at balanced binary trees.

04:02:41

Now let's bring them both together into balanced binary search trees.

04:02:46

And here's one question that you will face at some point.

04:02:50

Write a function to create a balanced binary search tree from a sorted list of key value pairs.

04:02:58

So you have a sorted list of key value pairs.

04:03:00

The keys, for example, could be usernames,

04:03:02

and the values could be the user objects; the list is sorted by key.

04:03:07

And you have to create a balanced binary search tree from it.

04:03:11

And here's the basic logic, which is somewhat similar to binary search, something that we covered in lesson 1.

04:03:18

Do check it out.

04:03:21

What we can do is we look at the middle element.

04:03:25

For instance, if you have a list of 15 elements, then the element at position 7, counting from 0,

04:03:33

is the middle element.

04:03:36

Now we can take the middle element and create a new binary search tree with the middle element as the root node.

04:03:43

So you make the middle element the root node.

04:03:46

Then you take the left half of the list, use that to create a balanced BST, and make it the left child of the middle element,

04:03:57

that is, of the root node. Then you take the right half; here, both halves have 7 elements each.

04:04:03

So you take the right half, create a balanced BST out of it, and then make it the right child of the middle element.

04:04:12

So that's the idea here.

04:04:14

And how do you make a balanced BST for the left or right child? Using recursion.

04:04:20

So once again here's a recursive solution.

04:04:24

make_balanced_bst takes data, which is a list of key-value pairs.

04:04:29

It also takes lo and hi, as well as a parent, and we'll look at those.

04:04:33

By default, lo is set to zero and hi is set to the last index in data.

04:04:39

So we use them to get the middle index.

04:04:42

For instance, if lo is 0 and hi is 14, the middle index is 7, and then we get the key and the value at the middle index.

04:04:50

We look up data[mid], and that gives us the key and the value, for instance

04:04:54

the username and the user object. Then we create the root node.

04:04:58

So we create the root node using BSTNode, and then we call make_balanced_bst on data,

04:05:05

but this time from lo to mid - 1,

04:05:07

that is, indices 0 to 6, and make that the left child of the root. Then we call make_balanced_bst on the right half,

04:05:17

from mid + 1, which is index 8, to index 14, and we make this the right subtree.

04:05:23

And then we return the root, and that's it.

04:05:26

That's pretty much it.

04:05:27

The only thing that we still need here is the terminating condition.

04:05:30

When lo becomes greater than hi, which means that we have no more elements to create a tree out of,

04:05:34

we simply return None.

04:05:36

So the left or right subtree of the corresponding parent node gets set to None.
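Here is a sketch of the recursive construction just described, with lo and hi as the index range and parent threaded through the calls. Title-cased usernames stand in for the user objects.

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.parent = None


def make_balanced_bst(data, lo=0, hi=None, parent=None):
    """Build a balanced BST from (key, value) pairs that are sorted by key."""
    if hi is None:
        hi = len(data) - 1  # default: use the whole list
    if lo > hi:
        return None  # no elements left in this range
    mid = (lo + hi) // 2
    key, value = data[mid]
    root = BSTNode(key, value)
    root.parent = parent  # link back to the node one level up
    root.left = make_balanced_bst(data, lo, mid - 1, root)
    root.right = make_balanced_bst(data, mid + 1, hi, root)
    return root


names = ["aakash", "biraj", "hemanth", "jadhesh", "siddhant", "sonaksh", "vishal"]
data = [(name, name.title()) for name in names]
tree = make_balanced_bst(data)
print(tree.key, tree.left.key, tree.right.key)  # jadhesh biraj sonaksh
```

Picking the middle element at every level splits each range into two halves whose sizes differ by at most one, which is what keeps the result balanced.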

04:05:42

So that's our make_balanced_bst function.

04:05:45

We also pass this other argument called parent around, and I will let you figure out what parent does here.

04:05:51

But this is the basic idea.

04:05:55

So here is a list of key value pairs.

04:05:58

These key-value pairs are sorted in increasing lexicographic order of keys.

04:06:05

And we're calling make_balanced_bst with data, and that gives us a tree.

04:06:10

And let's view the tree here.

04:06:12

So there you go.

04:06:13

Now we've created the tree perfectly as we wanted it.

04:06:16

Jadhesh is at the root, with Biraj and Sonaksh on either side, and then the appropriate nodes on each side as the children of those nodes.

04:06:25

Now recall that the same list of users, when inserted one by one, resulted in a skewed tree.

04:06:30

Here we take the usernames and users from data and insert them one by one.

04:06:35

And you can see that

04:06:39

calling display_keys on tree3

04:06:44

shows a skewed tree.

04:06:46

So whenever you have a sorted list and you want to create a balanced BST, the way to do it is to start from the middle and work outward.

04:07:00

Now finally one other question you may be asked is to balance an unbalanced binary search tree.

04:07:08

And this is pretty simple at this point. It's kind of a trick question, because

04:07:13

if you were given this question directly, you may not be able to think of what to do.

04:07:17

How do you balance an unbalanced binary search tree?

04:07:20

But now we have a way to create a balanced binary search tree from a sorted list of key-value pairs,

04:07:31

and we have a way to get a sorted list of key-value pairs from a BST.

04:07:36

So now it simply becomes a matter of getting the sorted list,

04:07:39

by calling list_all on the node, which is the in-order traversal.

04:07:42

So we do an in-order traversal of the binary search tree, which gives us a sorted list of key-value pairs,

04:07:48

and then pass that into the make_balanced_bst function.

04:07:51

Okay.

04:07:52

So that's the trick here.

04:07:53

It's a two-part question, and once again we see the benefit of reusing functions here.

04:07:58

Balancing an unbalanced BST now becomes a single line of code.
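
A sketch of that two-part trick, assuming simple recursive helpers. The function names follow the names spoken in the lecture (`list_all`, `make_balanced_bst`, `balance_bst`), but these bodies are an illustrative reconstruction, not the notebook's code.

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key, self.value = key, value
        self.left = self.right = None


def insert(node, key, value=None):
    """Plain BST insertion (no balancing); returns the subtree root."""
    if node is None:
        return BSTNode(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
    return node


def list_all(node):
    """In-order traversal: (key, value) pairs in sorted key order."""
    if node is None:
        return []
    return list_all(node.left) + [(node.key, node.value)] + list_all(node.right)


def make_balanced_bst(data, lo=0, hi=None):
    """Middle-out construction from (key, value) pairs sorted by key."""
    if hi is None:
        hi = len(data) - 1
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    node = BSTNode(*data[mid])
    node.left = make_balanced_bst(data, lo, mid - 1)
    node.right = make_balanced_bst(data, mid + 1, hi)
    return node


def balance_bst(node):
    # Step 1: in-order traversal gives sorted pairs. Step 2: rebuild from the middle out.
    return make_balanced_bst(list_all(node))


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


names = ["aakash", "biraj", "hemanth", "jadhesh", "siddhant", "sonaksh", "vishal"]
tree = None
for name in names:
    tree = insert(tree, name, name.title())  # sorted input: every node becomes a right child
print(height(tree))  # 7

tree = balance_bst(tree)
print(height(tree), tree.key)  # 3 jadhesh
```

Both steps are O(n), so a full rebalance costs O(n) regardless of how skewed the tree was.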

04:08:03

That's very nice.

04:08:05

So we create tree here with the value None, and now we insert the values into it one by one.

04:08:12

And you can see that this creates a skewed tree, because we are inserting the values in increasing

04:08:20

lexicographic order.

04:08:21

So we keep adding right children and we never add a left child.

04:08:25

But then we call the balance_bst function, which internally gets the in-order traversal.

04:08:31

The in-order traversal lists all the key-value pairs in sorted order.

04:08:38

And then we call the make_balanced_bst function, which starts from the middle and creates a balanced binary search tree out of it.

04:08:46

So there you see this is how you balance a binary search tree.

04:08:51

To maintain the balance as our data structure grows, one simple thing we can do is

04:08:59

to balance the tree after every insertion.

04:09:05

And that brings us to the complexities of the various operations in a balanced BST.

04:09:10

If we are doing an insertion, that takes O(log n), because if a tree is balanced, its height is O(log n).

04:09:18

For insertion you may have to traverse a path from the root down to a leaf, and that path can have a length

04:09:25

at most equal to the height, which is O(log n).

04:09:28

But if we are also balancing after every insertion, then we also have an O(n) term added here.

04:09:33

And since log n becomes much smaller than n as n grows,

04:09:39

O(n) + O(log n) is the same as O(n).

04:09:43

So that makes insertion O(n).

04:09:45

Finding a node becomes O(log n), updating a node becomes O(log n), and you can verify that getting a list of all the nodes is O(n).

04:09:54

So what's the real improvement between O(n) and O(log n)?

04:09:58

So let's think about it.

04:10:00

If you're looking at 100 million records, then log to the base 2 of 100 million is about 26 or 27.

04:10:08

So it only takes about 26 operations to find or update a node within a balanced BST,

04:10:14

as opposed to 100 million operations.

04:10:17

So you can see here that a loop of length 26, with some operation inside it,

04:10:24

only takes about 19.1 microseconds,

04:10:27

where 1 microsecond is 10 to the power minus 6 seconds.

04:10:32

On the other hand, order n involves looping through the entire list.

04:10:37

So looping through 100 million numbers rather than 26.

04:10:40

And that obviously takes far, far longer.

04:10:43

And we saw that it took about 10 seconds, right, about 9.98 seconds.

04:10:48

So finding and updating a node in a balanced binary search tree is over 500,000 times faster than our original solution (9.98 seconds versus 19.1 microseconds).
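
The arithmetic behind these numbers can be checked directly. This short sketch only uses the two timings quoted in the lecture (9.98 seconds for the 100-million-step loop and 19.1 microseconds for the 26-step loop); it does not re-run the measurements.

```python
import math

n = 100_000_000
levels = math.log2(n)  # a balanced BST with n nodes has height about log2(n)
print(round(levels, 2))  # 26.58

# Timings quoted in the lecture for the two loops.
linear_seconds = 9.98           # O(n): looping over 100 million items
logarithmic_seconds = 19.1e-6   # O(log n): looping about 26 times
speedup = linear_seconds / logarithmic_seconds
print(round(speedup))  # 522513
```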

04:10:59

And all we've changed here is the data structure.

04:11:02

And that's the importance of data structures because now each user will be able to view their profile in just 19.1 microseconds.

04:11:11

At least that part of the request will take only this long.

04:11:14

So the user experience will be better and your CPU will be busy for a shorter time.

04:11:19

So you will be able to serve not 8, but hundreds of thousands of users every second.

04:11:29

And finally, your hardware cost will also be far lower, because your CPU is now busy for less time.

04:11:35

So you do not need to use a very large machine or you do not need to use too many machines to support hundreds of millions of users.

04:11:44

And that is the benefit of choosing the right data structure.

04:11:48

Now here's one tip on how to speed up insertions.

04:11:53

We may choose to perform the balancing periodically instead of after every insertion.

04:11:58

For example, we can balance after every 100th insertion or every 1000th insertion.

04:12:05

Whatever we choose, we have to trade off how often we insert things against how often we need to restore the balance.

04:12:12

Another idea is to do the balancing periodically, for example at the end of every hour.

04:12:18

So for a second or two, there may be a slight dip in the performance because you may be performing the balancing.

04:12:26

But even that can be avoided: you can take a copy of the tree, balance the copy, and then simply replace the pointer to the original tree.

04:12:34

So there are many other tricks that you can apply.

04:12:36

And in fact, there's also an algorithmic trick which brings insertion and balancing together into an O(log n) operation,

04:12:45

which we will look at right at the very end, so stay till the end.

04:12:49

But before we do that, let's come back and answer our original problem statement.

04:12:54

So remember, as a senior backend engineer, you are tasked with developing a fast in-memory data structure to manage profile information

04:13:01

(username, name and email) for 100 million users. It should allow inserting, finding, updating and listing the users, all as efficiently as possible.

04:13:11

To answer this question, instead of creating a UserDatabase class, we can create a generic class called TreeMap, because we have been making things more and more generic as we have gone along.

04:13:22

So let's define a class called TreeMap, which internally stores a balanced binary search tree.

04:13:35

When we initialize a TreeMap, we set self.root to None, which means we have not created a tree so far.

04:13:41

And then instead of defining the functions insert, find, update and list_all, we are going to use some special methods of Python classes.

04:13:49

So we're going to use the method __setitem__.

04:13:53

__setitem__ is just like insert, except it is a combination of both insert and update.

04:14:02

To __setitem__, we will pass a key and a value.

04:14:05

And of course, self will refer to the tree map object itself.

04:14:10

So the first thing we do is we get the root, which is basically the binary search tree that we are storing internally here.

04:14:17

So we get the binary search tree, and then we look for the key inside it.

04:14:23

If the key is found, that is, if we find the node in our tree, then we come into this else branch and simply update its value.

04:14:32

If we do not find the node, which is what happens initially because initially our self.root is None

04:14:38

(when you call find with None and pass a key, you get back None),

04:14:42

then we set self.root by inserting the key into the tree.

04:14:51

So if a key exists within our binary search tree, then we update it, and if the key does not exist within our binary search tree, then we insert it.

04:15:05

So we've combined insert and update into the single operation called __setitem__.

04:15:10

Similarly, we define another operation called __getitem__. This is the find operation.

04:15:15

All we do here is find the node inside self.root using the find function we had defined earlier.

04:15:21

If the node is found, then we return the value of the node; otherwise we return None.

04:15:28

So given a key, we retrieve the value.

04:15:31

Then we define one last function called __iter__, and this is the replacement for our list_all function.

04:15:38

We simply call list_all on self.root, which gives us a list of key-value pairs.

04:15:44

And then we have the special syntax we say x for x in this list.

04:15:51

And we put these round brackets around it. What these round brackets do is create a generator out of it.

04:15:59

So now this is no longer a list, but a generator. And a generator is something that you can use within a for loop.

04:16:06

So the iter function will allow our class to be used directly within a for loop and we'll see the example in just a second.
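
A small illustration of the round-bracket syntax, using sample pairs and a toy `Pairs` class (both hypothetical). It shows that `(x for x in ...)` is a generator, that a single generator is exhausted after one pass, and why returning a fresh generator from `__iter__` lets an object be looped over repeatedly.

```python
pairs = [("aakash", 1), ("biraj", 2), ("jadhesh", 3)]

gen = (x for x in pairs)      # round brackets: a generator expression, not a list
print(type(gen).__name__)     # generator

for key, value in gen:        # a generator can drive a for loop
    print(key, value)

print(list(gen))              # [] because the generator is exhausted after one pass


class Pairs:
    """Toy iterable whose __iter__ returns a fresh generator on every call."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return (x for x in self.items)


p = Pairs(pairs)
print(list(p) == list(p))     # True: each loop gets its own generator
```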

04:16:14

And finally we have another function called __len__.

04:16:18

Remember there are double underscores on both sides of each name: __setitem__, __getitem__, and similarly __len__.

04:16:28

Here we simply return the size of self.root, that is, the size of the binary tree.

04:16:33

And then we have this function called display, which simply displays the keys.
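
Putting the pieces together, here is a minimal sketch of a TreeMap along the lines described above. The helper functions (`insert`, `find`, `list_all`, `tree_size`, `make_balanced_bst`, `balance_bst`) are our own reconstructions of the lesson's helpers, the display method is omitted, and the email values are placeholders. Like the lecture's version, `__getitem__` returns None for a missing key instead of raising KeyError, and every new key triggers a full rebalance.

```python
class BSTNode:
    def __init__(self, key, value=None):
        self.key, self.value = key, value
        self.left = self.right = self.parent = None


def insert(node, key, value):
    if node is None:
        return BSTNode(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
        node.left.parent = node
    elif key > node.key:
        node.right = insert(node.right, key, value)
        node.right.parent = node
    return node


def find(node, key):
    # Walk down, going left or right depending on how the key compares.
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def list_all(node):
    if node is None:
        return []
    return list_all(node.left) + [(node.key, node.value)] + list_all(node.right)


def tree_size(node):
    return 0 if node is None else 1 + tree_size(node.left) + tree_size(node.right)


def make_balanced_bst(data, lo=0, hi=None, parent=None):
    if hi is None:
        hi = len(data) - 1
    if lo > hi:
        return None
    mid = (lo + hi) // 2
    node = BSTNode(*data[mid])
    node.parent = parent
    node.left = make_balanced_bst(data, lo, mid - 1, node)
    node.right = make_balanced_bst(data, mid + 1, hi, node)
    return node


def balance_bst(node):
    return make_balanced_bst(list_all(node))


class TreeMap:
    def __init__(self):
        self.root = None  # no tree created yet

    def __setitem__(self, key, value):
        node = find(self.root, key)
        if node is None:
            # New key: insert it, then rebalance the whole tree.
            self.root = balance_bst(insert(self.root, key, value))
        else:
            # Existing key: update the value in place.
            node.value = value

    def __getitem__(self, key):
        node = find(self.root, key)
        return node.value if node is not None else None

    def __iter__(self):
        return (x for x in list_all(self.root))

    def __len__(self):
        return tree_size(self.root)


treemap = TreeMap()
treemap["jadhesh"] = "jadhesh@example.com"
treemap["sonaksh"] = "sonaksh@example.com"
treemap["aakash"] = "aakash@example.com"
print(treemap.root.key, treemap.root.left.key, treemap.root.right.key)  # jadhesh aakash sonaksh
print(treemap["vishal"], len(treemap))  # None 3
for key, value in treemap:
    print(key, value)
```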

04:16:38

Okay, so now we've defined this TreeMap structure, and it has all of these funny-looking methods. We know __init__, but what about all of these?

04:16:45

We'll see what these do in just a moment. We know what the functionality is, but you may be wondering why we've defined them like this.

04:16:52

So the reason is these are special methods that are treated specially in python. So here's how you can use them.
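
To see which piece of Python syntax triggers which special method, here is a hypothetical toy class, `Recorder`, unrelated to TreeMap's internals, that simply logs each call.

```python
class Recorder:
    """Toy class that records which special method Python invokes."""

    def __init__(self):
        self.calls = []
        self.data = {}

    def __setitem__(self, key, value):
        self.calls.append("__setitem__")
        self.data[key] = value

    def __getitem__(self, key):
        self.calls.append("__getitem__")
        return self.data.get(key)

    def __len__(self):
        self.calls.append("__len__")
        return len(self.data)

    def __iter__(self):
        self.calls.append("__iter__")
        return iter(self.data.items())


r = Recorder()
r["a"] = 1        # r.__setitem__("a", 1)
r["a"]            # r.__getitem__("a")
len(r)            # r.__len__()
for pair in r:    # r.__iter__()
    pass
print(r.calls)    # ['__setitem__', '__getitem__', '__len__', '__iter__']
```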

04:17:02

Let's first get a list of users that we'll later insert into a tree.

04:17:07

Let's create a TreeMap. We instantiate the TreeMap class, and that gives us a new TreeMap. Inside it there is no binary tree yet, as you can check.

04:17:18

If you check treemap.root, you will see that it is None. There's no value here.

04:17:27

And if we try to display it, you can see that this tree map is empty.

04:17:32

Then to insert, instead of calling treemap.insert or treemap.__setitem__, we can use the indexing notation.

04:17:41

We open the square brackets and put in the key that we want to insert. For example, against the key aakash, which is a string,

04:17:50

if we want to insert the user aakash, we simply write treemap['aakash'] = aakash,

04:17:57

and similarly for the other keys, using the indexing notation.

04:18:02

This is going to first look for the key, as we have defined inside __setitem__. If it finds the key, then it is going to update the value for the key.

04:18:11

If it does not find the key, then it is going to insert that key value pair as a new node into our tree.

04:18:17

So let's check it out now.

04:18:21

And now if we check treemap.root, you will see that it is a BSTNode.

04:18:28

And if we try to display it, you can see that it now has the structure jadhesh, sonaksh and aakash.

04:18:35

Also note that this is a balanced tree. If you go back to __setitem__, you will notice that whenever we insert, right after that we also balance the tree.

04:18:44

And now you can change the logic here so that we do the balancing not after every insertion, but maybe after every 100 insertions.

04:18:51

You may need to track somewhere what is the current insertion counter.

04:18:55

And when it gets to 100, only then do the balancing and then set the counter back to 0.

04:19:00

So that's an exercise for you: perform the balancing only at certain intervals.

04:19:10

And here's a way to retrieve an element. So retrieving an element is also now really simple.

04:19:15

We index treemap with 'jadhesh', and that gives you the value if it is found; if it is not found, it simply returns None.

04:19:24

Now, because we've defined the function __len__,

04:19:29

we can use the treemap with the len function, which is used for lists and dictionaries, and you can see here that it has the value 3.

04:19:38

Let's add a few more users, setting and updating their values.

04:19:43

You can see that all of this works exactly as expected. We are able to set values.

04:19:50

We are able to update values. We are able to display the tree, and it remains balanced.

04:19:55

And remember I mentioned that you can use this in a for loop. So you can now put the treemap directly into a for loop.

04:20:00

Because we have defined the __iter__ function, and the __iter__ function returns a generator,

04:20:07

you can now use the treemap in a for loop, and you get back the key-value pairs from the list_all function that is used inside __iter__.

04:20:16

You can print the keys and the values. And in fact, if you want to convert it to a list, all you need to do is pass it to the list function.

04:20:22

This works because the treemap is now an iterable class, and you have defined that the way to iterate over it is to get elements out of the key-value pair list.

04:20:35

So when you call list, you get back this list of key-value pairs.

04:20:39

Okay. So now we've made it a very Python-friendly class. Instantiating it is very easy: we simply create a new TreeMap. Adding values is very easy:

04:20:48

we simply use the indexing notation. Finding elements is very easy:

04:20:54

we again use the indexing notation. Updating elements is the same as inserting.

04:20:59

We can also check the size of the tree quite easily using the len function, and we can iterate over all the users in a for loop quite easily.

04:21:11

And we can also update values as you see here values have been updated.

04:21:16

Now the purpose of doing this is to make it easier for other people to use this data structure.

04:21:26

As a senior backend engineer, you may have designed this data structure, and you may have implemented binary search trees inside it.

04:21:33

But it's not important for other people on the team or other people using your data structure to know what the internal implementation is.

04:21:41

What's important for them is to be able to use it easily. That's why you should always think carefully about the interface, or the API, of your functions, your modules and your classes.

04:21:51

Try to make them as Python-friendly as possible. This is something that will be appreciated in interviews and by coworkers.

04:22:00

So make them Python friendly so that when people want to use something you've created it is extremely intuitive.

04:22:07

And they do not need to really understand the underlying details.

04:22:11

For instance, I could be using this class and I could have no idea that it is a binary search tree. All I know is how to insert and how to get a value out of it.

04:22:18

And I know that it is super efficient because you have designed it, so I don't need to worry about the internal details.

04:22:25

So encapsulation and good APIs are very important skills to cultivate. Do that as you work on programming problems.

04:22:34

Now once again, let's save our work by committing it. Now, I did tell you that there is a way to create self-balancing binary trees.

04:22:46

A self-balancing binary tree remains balanced after every insertion or deletion.

04:22:51

In fact, several decades of research have gone into creating self-balancing binary trees, and not just binary trees but other trees as well, which are not binary in nature.

04:22:59

And many approaches have been devised, for instance red-black trees, AVL trees and B-trees. So here's an example.

04:23:07

This is an AVL tree. Here, whenever a node goes out of balance, we rotate the tree, and you can see visually what we're doing.

04:23:15

Whenever you see that there is an imbalance in the tree, we rotate it. And how do we do this? We track the balance factor, which is the difference between the height of the left subtree and the right subtree for each node,

04:23:30

and then rotate unbalanced subtrees along the path of insertion or deletion to balance them.

04:23:35

So you can see the balance factor is 0 right now. The balance factor becomes 1 and the balance factor becomes 2.

04:23:41

Then we rotate it to set the balance factor back. And then the balance factor here becomes minus 2.

04:23:46

So here we do a right rotation. Here, the balance factor

04:23:51

becomes minus 2 here and minus 1 here, so we do two rotations. There are four cases in total:

04:23:57

the left-right case, the right-left case, the left-left case and the right-right case. All four cases are demonstrated here.

04:24:04

And then you may need to do this rotation, not just once, but you may need to do this multiple times along the path of insertion.

04:24:11

So when you insert a node and that node creates an imbalance, you need to work backwards, going from parent to parent.

04:24:19

And keep rotating nodes whenever you need to rebalance them based on the updated balance factor of each node.

04:24:26

So it seems a little complicated, but it's actually not. It's just that there are multiple cases to handle.

04:24:33

So you will need to write a couple of helper functions. You'll need to write a function, left_rotate, which rotates a node to the left while still preserving the binary search tree property.

04:24:43

You will need to write a function, right_rotate, which rotates a node to the right while preserving the BST property.

04:24:51

And then in the insertion, you will also need to perform the rotation at the right places.

04:24:57

And you will need to track the balance factor inside each node. So there are a few things to work out here.
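
A sketch of AVL insertion with the two rotation helpers and the four cases. It follows a common recursive formulation that stores a height in each node and defines the balance factor as left height minus right height, as in the definition above; it is not necessarily identical to the implementations in the linked resources, and it covers insertion only (no deletion).

```python
class AVLNode:
    def __init__(self, key, value=None):
        self.key, self.value = key, value
        self.left = self.right = None
        self.height = 1  # height of the subtree rooted at this node


def height(node):
    return node.height if node is not None else 0


def balance_factor(node):
    # Height of the left subtree minus height of the right subtree.
    return height(node.left) - height(node.right)


def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))


def right_rotate(y):
    """y's left child x becomes the subtree root; x's old right subtree moves under y."""
    x = y.left
    y.left = x.right  # these keys lie between x.key and y.key, so BST order holds
    x.right = y
    update_height(y)  # y is now below x, so update it first
    update_height(x)
    return x


def left_rotate(x):
    """Mirror image of right_rotate."""
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)
    update_height(y)
    return y


def avl_insert(node, key, value=None):
    """Insert key and return the new root of this (rebalanced) subtree."""
    if node is None:
        return AVLNode(key, value)
    if key < node.key:
        node.left = avl_insert(node.left, key, value)
    elif key > node.key:
        node.right = avl_insert(node.right, key, value)
    else:
        node.value = value  # existing key: update only, shape unchanged
        return node

    # Walking back up the insertion path: refresh the height, then fix any imbalance.
    update_height(node)
    bf = balance_factor(node)
    if bf > 1 and key < node.left.key:    # left-left case
        return right_rotate(node)
    if bf < -1 and key > node.right.key:  # right-right case
        return left_rotate(node)
    if bf > 1 and key > node.left.key:    # left-right case
        node.left = left_rotate(node.left)
        return right_rotate(node)
    if bf < -1 and key < node.right.key:  # right-left case
        node.right = right_rotate(node.right)
        return left_rotate(node)
    return node


def in_order(node):
    return [] if node is None else in_order(node.left) + [node.key] + in_order(node.right)


def is_avl(node):
    """True if every balance factor is -1, 0 or 1 and the stored heights are correct."""
    if node is None:
        return True
    return (abs(balance_factor(node)) <= 1
            and node.height == 1 + max(height(node.left), height(node.right))
            and is_avl(node.left) and is_avl(node.right))


root = None
for k in range(1, 1001):  # sorted input would skew a plain BST
    root = avl_insert(root, k)
print(is_avl(root), height(root))

rl = None
for k in [10, 30, 20]:  # triggers the right-left case at 10
    rl = avl_insert(rl, k)
print(rl.key, rl.left.key, rl.right.key)  # 20 10 30
```

Each rotation only rewires a constant number of pointers, which is why the rebalancing work per insertion stays proportional to the height of the tree.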

04:25:02

And don't worry, you will normally not be asked to implement an AVL tree within an interview or a coding assessment.

04:25:08

So you do not really need to learn the implementation, but it's nevertheless a very interesting data structure to study.

04:25:14

Here are a couple of resources you can check out. You can check out this YouTube video, which explains it wonderfully,

04:25:21

and you can check out this implementation on geeksforgeeks.org. The important thing for us to take away here, which is something you may be asked about even if not the implementation, is the complexity.

04:25:32

Each rotation takes constant time, and at most O(log n) rotations may be required, because if you are starting with a balanced tree and you're inserting a new node,

04:25:44

then you traverse a path of length at most O(log n). So you may need to perform at most O(log n) rotations, maybe twice that.

04:25:53

What that means is that in O(log n) time, you will be able to insert and maintain the balanced property of a binary tree, so you do not need to recreate the entire tree again.

04:26:03

And that makes your tree very efficient, because now when you're working with 100 million records, inserting, finding and updating will each take only a few dozen steps,

04:26:16

and all of these will work in microseconds.

04:26:19

So that makes your data structure very efficient.

04:26:23

And with that we conclude our discussion of binary search trees.

04:26:28

So here's a quick summary. We looked at this problem of creating a data structure which allows efficient storage, retrieval and updating,

04:26:39

and also efficient iteration in sorted order. We started out with a list of values sorted by the keys,

04:26:49

and we realized that that was probably not the right idea because we were working with a really large number of records. Then we created this binary tree structure.

04:26:57

So we looked at binary trees. We looked at how to create them, easy ways to visualize them, and easy ways to create them from tuples.

04:27:05

We looked at how to calculate their heights and their sizes, and how to traverse them: in-order, pre-order and post-order.

04:27:11

We then looked at binary search trees, which have the property that the keys in the left subtree are smaller than the root node's key,

04:27:20

and the keys in the right subtree are larger than the root node's key. And that property holds at every subtree.

04:27:26

That makes it really easy to locate a specific element or to find the position to insert an element.

04:27:33

So we created binary search trees. We implemented the operations insert, update, find and list_all in a binary search tree.

04:27:41

We also determined ways to check if a binary tree is a binary search tree or not.
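
One way to check the BST property, sketched with a hypothetical `TreeNode` class. This version passes down lower and upper bounds inherited from ancestors, which may differ from the approach used in the lesson; the key point it illustrates is that comparing each node only with its parent is not enough.

```python
class TreeNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def is_bst(node, lo=None, hi=None):
    """Each key must lie strictly between the bounds inherited from its ancestors."""
    if node is None:
        return True
    if (lo is not None and node.key <= lo) or (hi is not None and node.key >= hi):
        return False
    # Going left tightens the upper bound; going right tightens the lower bound.
    return is_bst(node.left, lo, node.key) and is_bst(node.right, node.key, hi)


good = TreeNode(5, TreeNode(3, TreeNode(1), TreeNode(4)), TreeNode(8))
# 6 is a valid right child of 3, but it sits in 5's left subtree, so this is not a BST.
bad = TreeNode(5, TreeNode(3, TreeNode(1), TreeNode(6)), TreeNode(8))
print(is_bst(good), is_bst(bad))  # True False
```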

04:27:48

Then we talked about balancing and we saw how to create balanced binary search trees.

04:27:54

Binary search trees form the basis of many modern programming language features. For instance,

04:28:00

ordered maps in C++ and Java (std::map and TreeMap) are balanced binary search trees, and data storage systems like file system indexes or relational databases

04:28:08

use something called B-trees, which are a generalization of binary search trees.

04:28:14

So it's very important to know about binary search trees even if you may not ever need to implement them.

04:28:24

You may be asked about them and in many cases you may need to pick a binary search tree as a data structure for a problem like we did in this case.

04:28:32

Now you may wonder if dictionaries in Python are also binary search trees. Well, they're not.

04:28:37

Dictionaries in Python are not binary search trees. We will soon release an assignment that you can find on the lesson page, and you will work on hash tables in the assignment.

04:28:48

And here are some more problems that you can try out. You can try to implement rotations and self-balancing insertion.

04:28:55

You can try to implement the deletion of a node in a binary search tree. That's slightly more complicated: what do you do if you have to delete a node that has both left and right subtrees?

04:29:06

You can try deletion with balancing if you really are up for a challenge.

04:29:11

Here are a couple more. One is finding the lowest common ancestor of two nodes in a tree,

04:29:15

that is, the lowest node that is an ancestor of both nodes.

04:29:20

Here you can use the parent property. Another is finding the next node in lexicographic order:

04:29:25

given a node, how do you find the next node? What's its complexity?

04:29:29

Or, given a number k, how do you find the kth node in a binary search tree?

04:29:33

So to do this you will have to employ some clever tricks and then there are a couple more resources here.

04:29:38

You can open up these and find more questions.

04:29:41

The important thing to take away is that almost all of these will involve some form of recursion.

04:29:47

So you will either work with the left subtree or the right subtree or both.

04:29:52

And some of them may also require you to store additional information within the node.

04:29:56

For instance, for the problem "given a number k, find the kth node",

04:30:02

you may need to store the size of each subtree in its root node.
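
A sketch of that idea, assuming a hypothetical `SizedNode` class that stores the size of its subtree. With the sizes in place, finding the kth smallest key takes a single walk down the tree, so it costs O(height) rather than a full traversal.

```python
class SizedNode:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None
        self.size = 1  # number of nodes in the subtree rooted here


def size(node):
    return node.size if node is not None else 0


def insert(node, key):
    if node is None:
        return SizedNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)  # keep sizes up to date
    return node


def kth_smallest(node, k):
    """Return the k-th smallest key (1-based), or None if k is out of range."""
    while node is not None:
        left_size = size(node.left)
        if k <= left_size:
            node = node.left            # the answer is in the left subtree
        elif k == left_size + 1:
            return node.key             # this node is exactly the k-th
        else:
            k -= left_size + 1          # skip the left subtree and this node
            node = node.right
    return None


root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)
print([kth_smallest(root, k) for k in range(1, 9)])
# [20, 30, 40, 50, 60, 70, 80, None]
```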

04:30:10

So what to do next? You should review the lecture video, execute the Jupyter notebook, and experiment with the code yourself.

04:30:18

Then, hopefully, complete the assignment.

04:30:22

The next lesson is called divide and conquer and sorting algorithms.

04:30:26

This is data structures and algorithms in Python, and I will see you next time.

04:30:32

Thank you and goodbye.

04:30:34

Let's look at assignment two of data structures and algorithms in Python.

04:30:38

The topic of the assignment is hash tables and Python dictionaries.

04:30:42

Let's get started.

04:30:44

The first thing we will do is go to the course website, pythondsa.com.

04:30:49

And on the course website you can find all the lessons and previous assignments.

04:30:56

We are looking at assignment two.

04:30:58

So you may want to open that up. Assignment two is based on, or inspired by, some of the topics discussed in

04:31:04

lesson two of the course. So you may also want to watch lesson two and complete its notebook before you work on assignment two.

04:31:09

Let's open it up.

04:31:11

Now in this assignment you will apply some of the concepts learned in the first two lessons to implement a hash table from scratch in Python.

04:31:19

That's very interesting.

04:31:21

Hash tables are a very important data structure.

04:31:24

They are present in pretty much every programming language and are a common topic in coding interviews.

04:31:31

So we'll see how to implement them from scratch.

04:31:34

And one of the central problems in hash tables is called collisions.

04:31:37

So we'll see how to handle hashing collisions using linear probing.
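
A minimal sketch of linear probing, assuming the table is a Python list of slots that each hold a `(key, value)` tuple or None, and using Python's built-in `hash()`. The function name `probe_index` is hypothetical. Small integers hash to themselves in CPython, which makes the collision easy to follow.

```python
def probe_index(slots, key):
    """Return the slot holding `key`, or the first empty slot at or after its hash position."""
    n = len(slots)
    start = hash(key) % n
    for step in range(n):
        idx = (start + step) % n  # try the next slot, wrapping around at the end
        if slots[idx] is None or slots[idx][0] == key:
            return idx
    raise RuntimeError("hash table is full")


slots = [None] * 8
for key, value in [(1, "a"), (9, "b"), (2, "c")]:
    # 9 % 8 == 1 collides with key 1, so 9 moves to slot 2, which pushes key 2 to slot 3.
    slots[probe_index(slots, key)] = (key, value)
print(slots[:5])  # [None, (1, 'a'), (9, 'b'), (2, 'c'), None]
```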

04:31:42

And we will also replicate the functionality of Python dictionaries.

04:31:46

So Python dictionaries are actually implemented using hash tables.

04:31:50

So we'll see how to replicate the way Python dictionaries are created and used and modified and the way we

04:31:59

access keys and iterate over keys and set values and change values and so on.

04:32:06

So we'll pretty much reimplement the Python dictionary.

04:32:09

Now, the assignment has a starter notebook here.

04:32:12

So we can click on view notebook to open up the notebook.

04:32:16

Once again, this is a Jupyter notebook.

04:32:19

And as you walk through the notebook, you will find question marks in certain places to complete the assignment.

04:32:25

You have to replace all the question marks with appropriate values, expressions or statements to ensure that your notebook runs properly end to end.

04:32:33

Okay, so make sure that you run all the code cells, and do not change any variable names.

04:32:38

And in some cases, you may need to add code cells or new statements.

04:32:42

And since you'll be using a temporary online service for code execution, keep saving your work by running jovian.commit at regular intervals.

04:32:50

There are some optional questions. They are not considered for evaluation, but they are for your learning.

04:32:55

Okay, so let's run the code.

04:32:57

The recommended way to run the code is using free online resources, specifically Binder, but you can also run it on your computer locally.

04:33:06

So we're going to click Run and then Run on Binder.

04:33:10

Once again, this may take a few minutes, sometimes depending on the current traffic on the platform.

04:33:17

There we have it. Now we have the Jupyter notebook running.

04:33:20

The first thing I like to do is click Kernel, then Restart & Clear Output, so that we can execute all the code cells and see their outputs from scratch.

04:33:28

I'm also going to hide the header and the toolbar and zoom in here a little bit.

04:33:33

So we can see things a little better.

04:33:36

The first thing we will do is set a project name, import the jovian library and run jovian.commit.

04:33:43

This will allow you to save a snapshot of your work to your Jovian profile.

04:33:47

So now you have a copy of the assignment starter notebook.

04:33:51

Any modifications that you make will get saved to your personal copy every time you run jovian.commit.

04:33:56

And it is this personal copy that you will submit at the very end.

04:34:00

So let's talk about the problem statement.

04:34:04

In this assignment, you will recreate Python dictionaries from scratch using a data structure called hash tables.

04:34:10

And dictionaries in Python are used to store key value pairs.

04:34:14

So keys are used to store and retrieve values.

04:34:18

Here's an example. Here's a dictionary for storing and retrieving phone numbers using people's names.

04:34:23

So we have a dictionary called phone_numbers, and the way you create a dictionary is using this special character, the brace, or the curly bracket as it's called.

04:34:31

And then in a dictionary you have these key value pairs.

04:34:34

So this is one key value pair where you have a key.

04:34:38

The key in this case is a string, Aakash.

04:34:41

And here you have a colon.

04:34:44

And then here you have a value. The value in this case is a phone number.

04:34:48

So that's how you create a key value pair.

04:34:50

Comma-separated key-value pairs inside the braces are what you need to create a dictionary.

04:34:54

You can see that once the dictionary is created, it is displayed in the exact same way.

04:34:59

And then you can access a person's phone number using their name.

04:35:03

So we take the variable phone_numbers and use the indexing notation, that is, the square brackets,

04:35:09

and we pass in a key here.

04:35:11

We get back their phone number.

04:35:13

And you may wonder what happens if the key is not present.

04:35:16

The great thing about Jupyter is that you can insert a new cell.

04:35:19

Like you can just click insert cell below.

04:35:22

Or you can use the keyboard shortcut B, as I just did,

04:35:25

and check.

04:35:27

Let's check the key Vishal.

04:35:29

Okay. And you get back a KeyError.

04:35:32

You may also wonder whether it is case sensitive.

04:35:36

Does that matter? You can check it very easily.
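
Both questions can be answered in a new cell, as suggested. The phone numbers below are placeholder data.

```python
phone_numbers = {"Aakash": "9489484949", "Hemanth": "9595949494"}  # placeholder data

try:
    phone_numbers["Vishal"]  # a key that is not present
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'Vishal'

print("Aakash" in phone_numbers, "aakash" in phone_numbers)  # True False: keys are case sensitive
print(phone_numbers.get("aakash"))  # None: .get() returns a default instead of raising
```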

04:35:39

A lot of the questions that you might have,

04:35:42

a lot of the questions that you may even want to ask on the forum or look up online,

04:35:46

can be resolved simply by creating a new cell and typing out some code.

04:35:50

"What happens if..." questions can all be answered by writing some code.

04:35:56

So now let's add some new phone numbers.

04:35:58

So this is how you create an initial set of phone numbers.

04:36:01

This is how you access a phone number.

04:36:03

And this is how you add new values.

04:36:05

So adding new values is like accessing them.

04:36:07

But instead of just accessing it, you put an equals sign and then actually set the value here.

04:36:12

So we can add a new value here.

04:36:14

The phone number for Vishal.

04:36:16

And we can also update an existing value in a dictionary simply by accessing that value, putting an equals sign,

04:36:23

and putting a new value there.

04:36:25

You can see now that the dictionary is updated to contain the new phone number 7878 and not the original phone number 948948.

04:36:34

You can also view all the names and phone numbers stored in the phone_numbers dictionary using a loop.

04:36:39

So you can write for name in phone_numbers.

04:36:42

So when you put a dictionary into a for loop, you get back a key within each loop.

04:36:47

You can see here that the name and the phone number here is displayed for you using the print statement.

04:36:55

So those are some things that you can do within a dictionary.
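
The dictionary operations walked through above, collected in one place. The names and numbers are placeholder data.

```python
phone_numbers = {"Aakash": "9489484949", "Hemanth": "9595949494"}  # placeholder data

print(phone_numbers["Aakash"])          # access a value by its key
phone_numbers["Vishal"] = "8787878787"  # new key: adds a key-value pair
phone_numbers["Aakash"] = "7878787878"  # existing key: replaces the value

for name in phone_numbers:              # iterating over a dict yields its keys
    print("Name:", name, ", Phone Number:", phone_numbers[name])
```

Since Python 3.7, dictionaries keep keys in insertion order, and updating a value does not move its key.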

04:36:58

And dictionaries in Python are implemented using a data structure called a hash table.

04:37:02

A hash table uses a list or an array to store key-value pairs,

04:37:08

and uses a hashing function to determine the index for storing or retrieving the data associated with a given key.

04:37:16

So here's what it looks like.

04:37:18

You have the key, John Smith.

04:37:21

And you have a function called hash and the function hash takes any key.

04:37:26

And it returns an index within the list.

04:37:29

So why do we use a hashing function?

04:37:31

Well, one approach as we've discussed in lesson two is we can store or key.

04:37:36

or rather our key-value pairs, in a list.

04:37:39

And we can simply search through the list each time we want to look up the value for a key.

04:37:43

But that is inefficient because that requires looking through potentially all the keys before we get to the key that we want.

04:37:50

Or maybe half of the keys.

04:37:52

So that makes it an O(n) operation,

04:37:55

where n is the size of the list.

04:37:58

That's pretty inefficient.
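
This sketch of the list-of-pairs approach makes the O(n) cost concrete. `linear_find` is a hypothetical helper name and is not defined in the course notebook.

```python
def linear_find(pairs, key):
    """Look up `key` in a list of (key, value) tuples.

    In the worst case this checks every pair, so it is O(n) in len(pairs).
    """
    for k, v in pairs:
        if k == key:
            return v
    return None  # key not present


pairs = [('Aakash', '7878787878'), ('Hemanth', '9595949494'), ('Vishal', '8787878787')]
print(linear_find(pairs, 'Vishal'))  # has to scan past the first two pairs
```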

04:38:00

We want something faster and a hash function actually operates in constant time.

04:38:05

It simply takes the key,

04:38:07

And it converts a key into a number.

04:38:09

So in that sense, it gives you the index of the specific key-value pair in constant time rather than in O(n) time.

04:38:17

And that is what makes hash tables so efficient.

04:38:20

So a hash function

04:38:23

does not require looping through the list.

04:38:25

It simply takes a key and gives you the index.

04:38:27

And you can simply then get the key value pair or the value from that index.

04:38:32

Now your objective in this assignment is to implement a hash table class which supports these operations.

04:38:38

An insert operation,

04:38:40

to insert a new key-value pair.

04:38:42

A find operation.

04:38:43

To find the value associated with a given key.

04:38:46

An update operation,

04:38:47

To update the value associated with a given key.

04:38:50

And a list operation,

04:38:51

To list all the keys stored in the hash table.

04:38:54

And here's where we are going to use Python classes.

04:38:59

And there's a brief introduction to Python classes in lesson 2 of this course.

04:39:05

So do check out lesson 2 if you want to refresh on Python classes.

04:39:09

We have the class hash table.

04:39:12

And inside the class hash table, you have a bunch of methods.

04:39:17

Now the insert method, apart from taking the self argument,

04:39:20

And remember that the self argument refers to the object of the class that will be created.

04:39:27

So this is equivalent to the this keyword in Java or C++.

04:39:34

But these are the actual arguments of the method.

04:39:37

The actual arguments are key and value.

04:39:40

So the insert method will take a key and a value.

04:39:45

Then the find method will take a key.

04:39:48

The update method will take a key and value once again.

04:39:51

So the find method takes a key and your job is to return the value.

04:39:54

The insert method takes a key and value and you insert the key value pair into the hash table.

04:39:59

Then you have the list all method which is used to list all the keys from the table.

04:40:05

So before we begin our implementation, let's just save and commit our work.

04:40:09

So we're running jovian.commit here.

04:40:11

Let's just run that once again.

04:40:13

There we go.

04:40:14

The notebook has now been committed.

04:40:17

So what you can do is you can come back to this particular page.

04:40:21

And you can find this from your profile.

04:40:23

And then you can click run to continue your work based on the modifications that you've already made.

04:40:29

Okay.

04:40:30

So we'll build the hash table class step by step.

04:40:33

And the first step is to create a Python list which will hold all the key value pairs.

04:40:37

Now remember that a hash table internally uses a list to store the key-value pairs.

04:40:41

And we will create a list of a fixed size.

04:40:44

So we'll set the variable MAX_HASH_TABLE_SIZE to 4096 initially.

04:40:50

And we're going to create a Python list of this size.

04:40:53

And how do you create a Python list of this size

04:40:55

with all the values set to None?

04:40:58

So this is the way to do it.

04:41:00

You could, of course, start typing None over and over, and that would take a long time.

04:41:05

Or you can use a simple technique.

04:41:08

Just write [None] * 4096.

04:41:11

And this is one of the great things about Python.

04:41:13

It is such an expressive language that creating a list of 4096 elements simply requires this single

04:41:19

Expression here.
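
Here is a sketch of this step. It assumes the constant is named `MAX_HASH_TABLE_SIZE`, as in the narration.

```python
MAX_HASH_TABLE_SIZE = 4096

# One expression builds a list of 4096 None placeholders
data_list = [None] * MAX_HASH_TABLE_SIZE

print(len(data_list))  # 4096
```

Multiplying `[None]` is safe here because `None` is immutable. With a mutable element such as `[[]] * n`, every slot would refer to the same inner list.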

04:41:22

You can check that here.

04:41:23

You can even check the length of the data list.

04:41:27

To check whether the list was created successfully, here are some test cases.

04:41:31

Here is one check that the length of the list is 4096.

04:41:34

Here's another check.

04:41:36

Here we simply pick a random element from the list, at index 99,

04:41:39

and check whether it is equal to None.

04:41:42

But if you really want a thorough test here, what you should be doing is checking, for every

04:41:49

item in data_list,

04:41:51

that the item equals None.

04:41:54

And here's a trick you can do.

04:41:56

You can use the keyword assert.

04:41:59

And what assert does is if this comparison is true, then it does nothing.

04:42:06

It lets your code proceed as usual.

04:42:09

But if at any point this comparison becomes false, then it throws an error.

04:42:14

Let's see here.

04:42:15

You can see here there was no error.

04:42:16

So that means it worked fine.

04:42:18

But if this comparison were false, let's say

04:42:22

we wanted every item to be equal to

04:42:26

seven instead. Since data_list does not contain the value seven at those positions,

04:42:32

you will get an AssertionError here.
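
The next sketch shows the assert-based checks described here. It uses `item is None`, which is the idiomatic way to compare with None. The deliberately failing check is wrapped in try/except only so the snippet runs to completion; in a notebook you would simply see the AssertionError.

```python
data_list = [None] * 4096

# Passes silently: every item is None
for item in data_list:
    assert item is None

# A deliberately wrong expectation: the very first comparison fails
try:
    for item in data_list:
        assert item == 7
    failed = False
except AssertionError:
    failed = True

print(failed)  # True
```

Python strips assert statements when it runs with the `-O` flag, so asserts are better suited to tests like these than to checks your program relies on at runtime.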

04:42:36

Okay.

04:42:37

This is how you can create your own test cases by putting in assert.

04:42:41

But the idea here is that, whatever you try to do, make sure that you're

04:42:45

adding some more test cases of your own and not just depending on the test cases that are given here.

04:42:49

These are simply there to guide you in the right direction.

04:42:53

Okay.

04:42:54

So now we have a list.

04:42:56

Now we need a way to store or insert key value pairs into a list.

04:43:00

That's where the hashing function comes into the picture.

04:43:03

The hashing function is used to convert strings and other non numeric data types into numbers,

04:43:08

which can then be used as list indices.

04:43:11

For example, if a hashing function converts the string 'Aakash' into the number 4,

04:43:16

then the key-value pair 'Aakash' and the phone number '7878787878'

04:43:22

will be stored at position 4 within the data list.

04:43:26

And here's a simple algorithm for hashing, which can convert strings into numeric list indices.

04:43:31

And a hashing algorithm does not have a single definition.

04:43:35

You can come up with your own hashing algorithm.

04:43:37

And in fact, coming up with a good hashing algorithm is an area of research in itself.

04:43:43

Now of course, Python dictionaries use the hashing that is built into Python.

04:43:48

And that's a fairly optimized hashing algorithm that's probably the result of several years of research.

04:43:55

But here's one very simple technique.

04:43:57

We iterate over the string character by character.

04:44:00

And then we convert each character into a number using Python's built-in ord function.

04:44:05

And you can see here that if you call ord on the character x,

04:44:10

you get back a number.

04:44:12

So ord gives you a way of converting characters into numbers,

04:44:15

but not entire strings.

04:44:17

That's why we need to iterate over the string character by character.

04:44:20

Then we simply add the numbers for each character to obtain the hash for the entire string.

04:44:25

So very simple technique.

04:44:26

If you have the string 'hello',

04:44:29

we take the ord of 'h', the ord of 'e', the ord of 'l', the ord of 'l',

04:44:33

and the ord of 'o', and add them together.
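
This quick sketch applies the idea to the string 'hello'. The totals in the comments come from the standard code points (h=104, e=101, l=108, o=111).

```python
# ord() maps a single character to its Unicode code point
print(ord('a'))  # 97

# Add up the code points of every character in the string
word = 'hello'
total = 0
for character in word:
    total += ord(character)

print(total)  # 532 = 104 + 101 + 108 + 108 + 111
```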

04:44:36

And since we want that number,

04:44:38

the final result to be an index or a position within the list.

04:44:42

we take the remainder of the result with the size of the data list.

04:44:46

So it's possible that once you add the numbers together,

04:44:49

you may end up with a pretty big number.

04:44:51

But if you take the remainder with 4096, or the MAX_HASH_TABLE_SIZE variable,

04:45:01

you get back a number that is smaller than 4096.

04:45:05

So you can just use that remainder as the index.

04:45:08

So let's first define a function called get_index.

04:45:11

All it does is take the data list and a string.

04:45:15

And it returns.

04:45:17

It applies this hashing algorithm to return an index for that string, that is, for that key.

04:45:23

So for each character in the string,

04:45:26

we need to convert the character to a number.

04:45:29

So we convert the character from the string into a number by calling ord on the character.

04:45:37

Great.

04:45:38

Then we update the result by adding the number.

04:45:40

So we write result += a_number. Pretty straightforward.

04:45:45

And that repeats for all the characters in the string.

04:45:49

And then we get back the final result.

04:45:51

Now that result may be larger than the actual size of the list.

04:45:57

And this is where we take the remainder with the size of the list.

04:46:00

Okay.

04:46:01

Now remember one thing:

04:46:03

I could probably also have written MAX_HASH_TABLE_SIZE here.

04:46:07

But that would be wrong,

04:46:09

isn't it?

04:46:10

Because we are passing in a data list here.

04:46:12

We are passing in a data list.

04:46:14

And although we have so far created a data list of size 4096,

04:46:20

your function should ideally be looking at the size of the data list that is passed in,

04:46:26

And not any global variables.

04:46:29

So keep that in mind.

04:46:31

And the right thing to use here is len(data_list).

04:46:34

This will allow your function to work with

04:46:38

Data lists of different sizes.

04:46:40

And not just the standard size 4096 that we have.

04:46:43

Defined above.

04:46:44

Okay.

04:46:45

Very important thing.

04:46:46

Always make sure that your functions

04:46:49

use the arguments that are passed into them,

04:46:53

so that they are generic: they can work with any input and not just a particular input

04:46:58

that has been defined earlier.
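
Here is a sketch of get_index that follows the steps narrated above. The variable names mirror the narration and may differ from the notebook's exact code. The remainder uses len(data_list), the list passed in, rather than the global constant.

```python
MAX_HASH_TABLE_SIZE = 4096


def get_index(data_list, a_string):
    """Map a string key to a valid index into data_list.

    Adds up ord() of every character, then takes the remainder
    with len(data_list), the list actually passed in, not a global.
    """
    result = 0
    for a_character in a_string:
        # Convert the character to a number
        a_number = ord(a_character)
        # Update the running total
        result += a_number
    # Keep the result within the bounds of the list
    return result % len(data_list)


data_list = [None] * MAX_HASH_TABLE_SIZE
print(get_index(data_list, ''))        # 0
print(get_index(data_list, 'Aakash'))  # 585
```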

04:47:02

Okay.

04:47:03

So there you go.

04:47:05

Now you have.

04:47:06

This is our function get_index that has been defined.

04:47:09

And here are some tests.

04:47:11

Now if you pass in the data list and the empty string,

04:47:14

because there are no characters.

04:47:15

the result should be zero.

04:47:18

Great.

04:47:19

Here's another one.

04:47:20

The result here is 585.

04:47:22

Here's another one.

04:47:23

The result here is 941.

04:47:25

Great.

04:47:26

Now this is where you should be testing your function with some custom test cases.

04:47:31

So I'm going to create a new

04:47:33

data list, data_list2.

04:47:34

And this is going to be [None] * 48.

04:47:37

So this is only going to have the size 48.

04:47:40

And I should be testing get_index with this data list as well.

04:47:45

So let's say we're looking at the key 'Aakash'.

04:47:50

Now we know that let's see.

04:47:53

We can actually test this out here.

04:47:55

What happens if you add ord('A') + ord('a') + ord('k') + ord('a') + ord('s') + ord('h')?

04:48:06

That number is 585.

04:48:08

But since the size of the list is 48,

04:48:11

what we should be getting back as the result is the remainder of 585 divided by 48.

04:48:16

So we should be getting back the number 9.

04:48:25

This should be equal to 9.

04:48:27

Okay.

04:48:29

So let's check whether this is equal to 9.

04:48:32

And indeed this is equal to 9.
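
This sketch contrasts the two versions discussed here. `get_index_using_global` is a hypothetical name for the buggy variant, which takes the remainder with the global constant instead of the list's own length.

```python
MAX_HASH_TABLE_SIZE = 4096


def get_index(data_list, a_string):
    """Correct version: uses the size of the list that was passed in."""
    result = sum(ord(c) for c in a_string)
    return result % len(data_list)


def get_index_using_global(data_list, a_string):
    """Buggy variant (hypothetical): ignores data_list and uses the global constant."""
    result = sum(ord(c) for c in a_string)
    return result % MAX_HASH_TABLE_SIZE


data_list2 = [None] * 48
print(get_index(data_list2, 'Aakash'))               # 9, since 585 % 48 == 9
print(get_index_using_global(data_list2, 'Aakash'))  # 585: out of range for a 48-slot list
```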

04:48:35

On the other hand, if we had used MAX_HASH_TABLE_SIZE,

04:48:44

you will see here that since we are not taking into consideration the actual size of the list

04:48:49

that was passed into the function,

04:48:52

we are getting back the value 585 because we are taking the remainder with 4096.

04:48:58

Okay.

04:48:59

So remember to take the remainder of the result with the size of the data list that was passed in.

04:49:05

So this is one of the several gotchas in this assignment and they're there for a reason

04:49:09

because this is something that you need to keep in mind.

04:49:14

A function which only uses its arguments and does not depend on any external global variables

04:49:22

or constants and things like that is called a pure function.

04:49:27

Of course a pure function also does not modify any external global variables.

04:49:32

So it simply takes some arguments and returns the result, irrespective of anything else outside.

04:49:38

So now, to insert a key-value pair into the hash table,

04:49:44

We can simply get a hash of the key.

04:49:46

So here we have a key-value pair, and we simply get a hash of the key by calling get_index with

04:49:51

data list and key.

04:49:52

We get back the index 585.

04:49:54

And then inside the data list at the given index,

04:49:56

we can simply set the key value pair as the element stored at that index.

04:50:02

And the same operation can be expressed in a single line of code.

04:50:07

Here we're calling get_index with data_list and 'Hemanth'.

04:50:12

And that's going to give us an index.

04:50:14

And then we set, at that particular index within data_list,

04:50:20

the element

04:50:23

('Hemanth', Hemanth's phone number).
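
Here is a sketch of inserting in both the two-step form and the one-line form. It redefines get_index compactly with sum() so the snippet is self-contained. The phone numbers are hypothetical.

```python
def get_index(data_list, a_string):
    # Same scheme as before: sum of ord() values, modulo the list size
    return sum(ord(c) for c in a_string) % len(data_list)


data_list = [None] * 4096

# Two steps: compute the index, then store the (key, value) tuple there
idx = get_index(data_list, 'Aakash')
data_list[idx] = ('Aakash', '7878787878')

# The same operation expressed in a single line
data_list[get_index(data_list, 'Hemanth')] = ('Hemanth', '9595949494')
```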

04:50:26

Now to retrieve or find the value associated with a key,

04:50:30

we can simply get a hash of the key

04:50:36

and look up that index within the data list.

04:50:40

So here we have the key 'Aakash' and we have the data list.

04:50:43

And we call get_index.

04:50:45

So we get the index of the key 'Aakash'.

04:50:49

And that gives us the index here.

04:50:52

And we can then index into data_list with the position idx.

04:50:57

And that should give us a key value pair.

04:50:59

Remember that we stored a key value pair at the given index.

04:51:02

So we should get back that value here.

04:51:05

So now we know how to store a value.

04:51:07

You get the hash of the key and you store the key-value pair.

04:51:10

Now to retrieve a value.

04:51:12

So you get a hash for the key and then you retrieve the key value pair.

04:51:16

And from there, you can get the value.
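
The next sketch covers the lookup side under the same assumptions (a compact get_index and hypothetical data). It unpacks the stored tuple into key and value.

```python
def get_index(data_list, a_string):
    return sum(ord(c) for c in a_string) % len(data_list)


data_list = [None] * 4096
data_list[get_index(data_list, 'Aakash')] = ('Aakash', '7878787878')

# To find: hash the key, read the tuple at that index, then unpack it
idx = get_index(data_list, 'Aakash')
key, value = data_list[idx]
print(value)  # 7878787878
```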

04:51:19

You can also list the keys. To list the keys,

04:51:22

we are using some special code here.

04:51:25

So let's see.

04:51:26

This is called list comprehension.

04:51:27

And let's take a quick look at list comprehension.

04:51:30

So list comprehension works like this.

04:51:32

Suppose you create a list y from a list x.

04:51:35

Actually, let's call them list_one

04:51:38

and list_two.

04:51:40

Good variable names always help.

04:51:42

So if you have list_one

04:51:45

and you write [x for x in list_one],

04:51:49

What does that do?

04:51:52

The "for x in list_one" part

04:51:55

fetches elements one by one from the list.

04:51:58

And then here you can specify what to do with the numbers that we've fetched.

04:52:01

So right now I'm not doing anything.

04:52:02

I'm simply returning that number.

04:52:04

And then I'm putting the entire thing into a list.

04:52:06

What this does is this creates a new list.

04:52:09

So you can see this is a copy of the original list.

04:52:11

What I could do is write [x * 2 for x in list_one].

04:52:16

And now I would end up with a list in which each element is double the corresponding element of the original.

04:52:23

I could also do x * x,

04:52:25

If I wanted.
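
Here are a few list comprehensions on a small, hypothetical list_one. They match the copy, doubling and squaring examples just described.

```python
list_one = [1, 2, 3, 4]

copy_of_list = [x for x in list_one]  # [1, 2, 3, 4], a new list object
doubled = [x * 2 for x in list_one]   # [2, 4, 6, 8]
squared = [x * x for x in list_one]   # [1, 4, 9, 16]
```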

04:52:27

I could also call a function on it.

04:52:31

Let's see what function we can call here.

04:52:33

Let's maybe put in some numbers here.

04:52:35

1.3.

04:52:36

2.4.

04:52:38

3.2.

04:52:40

So we could apply a function like round(x),

04:52:46

or math.ceil(x).

04:52:48

math.ceil is going to give us the ceiling of each number:

04:52:50

1.3 becomes 2, 2.4 becomes 3.

04:52:53

So you can do any operation with each element of the list.

04:52:57

And once you put that in square brackets with this for clause,

04:53:01

That's going to apply that same operation to the entire list.

04:53:05

And this is called list comprehension in Python.

04:53:07

It's a very powerful way to express complex operations on lists and dictionaries.

04:53:13

And there's one final thing in list comprehensions, which is the if condition.

04:53:17

So "for x in list_one" can be followed by an if condition.

04:53:21

And the if condition can once again apply to x.

04:53:24

So let's say we put in the condition: if x > 3.

04:53:30

Then what happens is we choose only those numbers from list one,

04:53:35

which satisfy this condition X greater than 3.

04:53:38

So that means we would skip 1.3, and we would skip 2.4.

04:53:41

We would get 3.2, 6 and 7,

04:53:44

and we would apply math.ceil to them.

04:53:47

And that's how we get back 4, 6 and 7.
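
This sketch covers the ceiling and filtering examples. The exact contents of list_one are an assumption based on the values mentioned (1.3, 2.4, 3.2, 6 and 7).

```python
import math

list_one = [1.3, 2.4, 3.2, 6, 7]  # assumed contents

# Apply math.ceil to every element
ceilings = [math.ceil(x) for x in list_one]               # [2, 3, 4, 6, 7]

# Keep only elements greater than 3, then apply math.ceil
big_ceilings = [math.ceil(x) for x in list_one if x > 3]  # [4, 6, 7]
```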

04:53:49

That's list comprehension in a nutshell.

04:53:52

So to get a list of keys, all we need to do is loop over the key-value pairs in data_list.

04:53:58

We check that the key-value pair is not None; remember that it's a huge list with a lot of None values.

04:54:04

If the key-value pair is not None, then we simply take kv[0].

04:54:08

So remember if you have a key value pair.

04:54:10

say a pair like 'Aakash' and a phone number,

04:54:16

then, because these are tuples,

04:54:20

you can also put round brackets around them if you want.

04:54:23

But even without it it's the same thing.

04:54:25

That's a key value pair.

04:54:27

So kv[0] is going to give you the key and kv[1] is going to give you the value.

04:54:33

So we simply get the key for those key-value pairs in data_list

04:54:37

where the element at that position is not None.

04:54:41

And that variable should not be called pairs; it should probably be called keys.

04:54:45

You can see that the keys are 'Aakash' and 'Hemanth'.
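
Here is a sketch of listing keys with a conditional list comprehension. It assumes two hypothetical entries have been stored. Keys come out in index order, so 'Aakash' (index 585) comes before 'Hemanth' (index 709).

```python
def get_index(data_list, a_string):
    return sum(ord(c) for c in a_string) % len(data_list)


data_list = [None] * 4096
data_list[get_index(data_list, 'Aakash')] = ('Aakash', '7878787878')
data_list[get_index(data_list, 'Hemanth')] = ('Hemanth', '9595949494')

# Keep only occupied slots, and take element 0 (the key) of each tuple
keys = [kv[0] for kv in data_list if kv is not None]
print(keys)  # ['Aakash', 'Hemanth']
```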

04:54:49

So that's how we can now use the get_index function.

04:54:52

And the next step for you is to complete the hash table implementation here.

04:54:58

By following the instructions given in the comments.

04:55:01

So now you have this basic hash table class and in this class you have a constructor.

04:55:07

Now the constructor takes the object self (the equivalent of this in other languages).

04:55:12

And the self is going to point to the actual object of the actual hash table that gets created using the class.

04:55:18

And then it takes a maximum size.

04:55:20

Now what are we doing here?

04:55:21

We want to make our hash table configurable.

04:55:24

We don't always want to have 4096 elements in our internal list.

04:55:30

We may need a hash table that can store more values, or we may need a hash table that only stores a few values.

04:55:36

So we are going to set a default value for it, which is MAX_HASH_TABLE_SIZE.

04:55:41

So if you do not provide this argument by default it will create a list of size 4096.

04:55:47

But we also want the option to specify a maximum size.

04:55:51

Now you need to create a list of size max_size with all the values set to None.

04:55:55

Now you may be tempted to do this.

04:55:58

But that would be wrong.

04:55:59

Remember to always use the arguments passed to a function.

04:56:03

Try not to depend on an external value or external constant.

04:56:07

So this would be wrong.

04:56:09

You may also be tempted to do this.

04:56:12

self.data_list = data_list,

04:56:13

that is, pointing it at the data list that we've already created.

04:56:15

This would also be wrong.

04:56:17

Not just because you're not using max_size, but also because

04:56:21

Now you're tying this class implementation to a global variable.

04:56:26

And that global variable is a list which can be modified.

04:56:30

So all the objects of this class, any number of hash tables that you create using this class,

04:56:35

will all use the same data list.

04:56:37

And that's not what you want. Think about each hash table that you create:

04:56:40

Maybe you have a hash table for phone numbers.

04:56:42

You have a hash table for addresses.

04:56:44

You have a hash table for something else.

04:56:46

Each of them should have their own internal data list.

04:56:50

And this is not going to create a copy of that original list.

04:56:52

It's simply going to point to the original list.

04:56:54

So what you want to do is take

04:56:58

[None]

04:57:00

and multiply it by max_size.

04:57:03

There you go.

04:57:04

This is the correct way to do this.
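
This sketch shows the constructor as described, plus the pitfall it avoids. `BasicHashTable` follows the lesson's naming. `SharedTable` is a hypothetical class that shows why pointing every instance at one global list is a problem.

```python
MAX_HASH_TABLE_SIZE = 4096


class BasicHashTable:
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        # A fresh list for each object, sized from the argument, not a global
        self.data_list = [None] * max_size


phone_table = BasicHashTable()               # default size: 4096 slots
address_table = BasicHashTable(max_size=16)  # a smaller table

print(len(phone_table.data_list), len(address_table.data_list))  # 4096 16
print(phone_table.data_list is address_table.data_list)          # False


# The pitfall (hypothetical class): every instance aliases one global list
shared_data_list = [None] * 8


class SharedTable:
    def __init__(self):
        self.data_list = shared_data_list  # no copy is made


a = SharedTable()
b = SharedTable()
a.data_list[0] = ('Aakash', '7878787878')
print(b.data_list[0])  # ('Aakash', '7878787878'): b sees a's write
```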

04:57:07

Now we're looking at insert here.

04:57:09

Now to insert.

04:57:11

We did see that to get the index, all you need to do is pass the key.

04:57:15

And remember, here you need to pass not data_list but self.data_list.

04:57:19

Right? Because now we want to use the.

04:57:21

Data list that is stored inside this specific object of the class.

04:57:25

We do not want to use the global data list.

04:57:27

And this is

04:57:29

a mistake that we often make initially.

04:57:32

I still make this mistake, where I have certain global variables defined

04:57:36

And I'm using those global variables inside my class.

04:57:39

Avoid doing that.

04:57:40

Anything that you want to put inside a class object.

04:57:43

You need to put inside self like we've done here.

04:57:46

And then to access it, you need to use self. to access that specific property,

04:57:53

Element or even method.

04:57:56

So now we have self.data_list.

04:57:58

And we pass the data list and the key into get_index.

04:58:03

And that gives us the index.

04:58:06

Now the get_index function was defined earlier.

04:58:08

We've seen it already.

04:58:10

Now we want to store the key-value pair at that index inside the list.

04:58:13

So we call.

04:58:15

self.data_list[idx], and we want to store the key-value pair there.

04:58:19

So we can simply put in key comma value here.

04:58:22

If you wish, you can also put in the brackets, but they're not necessary.

04:58:26

And that's going to insert the key value pair.

04:58:28

Now how do we.

04:58:30

Find the value associated with the given key.

04:58:33

First, we get the index for the key.

04:58:36

So we call get_index with self.data_list and key.

04:58:41

Then we retrieve the data stored at the index.

04:58:44

So this would be simply.

04:58:47

self.data_list[idx].

04:58:51

And then we check whether the key-value pair is None.

04:58:54

If the key-value pair is None, well,

04:58:56

there's nothing at that index,

04:58:57

so we can return None.

04:58:59

Another option would be to also maybe raise an index error.

04:59:04

And with a message.

04:59:07

It's a threat.

04:59:09

But returning None is good enough for now.

04:59:12

Otherwise, from the key-value pair, we get back the key and the value.

04:59:16

And then we return the value.

04:59:18

Keep that in mind.

04:59:19

If you skipped the None check and simply unpacked this, you would get an error.

04:59:22

You would get an exception that may go unexplained.

04:59:25

So whenever you are destructuring a tuple, that is, trying to get two values out of it,

04:59:28

make sure that the tuple is not None, especially in this case,

04:59:31

because we're starting with a list of Nones in a place where we're supposed to be storing key-value pairs.
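
Here is a sketch of insert and find with the None check discussed here. It assumes the same compact get_index. Without the check, unpacking None raises a TypeError.

```python
MAX_HASH_TABLE_SIZE = 4096


def get_index(data_list, a_string):
    return sum(ord(c) for c in a_string) % len(data_list)


class BasicHashTable:
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        self.data_list = [None] * max_size

    def insert(self, key, value):
        idx = get_index(self.data_list, key)  # this object's list, not a global
        self.data_list[idx] = key, value

    def find(self, key):
        idx = get_index(self.data_list, key)
        kv = self.data_list[idx]
        if kv is None:
            # Nothing stored here; unpacking None would raise a TypeError
            return None
        key, value = kv
        return value


table = BasicHashTable()
table.insert('Aakash', '9999999999')
print(table.find('Aakash'))  # 9999999999
print(table.find('Nobody'))  # None
```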

04:59:39

So that's fine.

04:59:41

Now update is going to be pretty much identical to insert.

04:59:46

I don't see any difference here.

04:59:48

So we can simply call get_index with self.data_list and key.

04:59:57

And then now we simply store the key value pair inside it.

05:00:00

So we can simply store key, value inside self.data_list[idx].

05:00:06

Then for list_all, again, it's straightforward:

05:00:10

we go over self.data_list and keep each kv if kv is not None.

05:00:12

So get all the key value pairs that are not empty.

05:00:16

And then kv[0] is going to give us the key from kv.

05:00:21

So there it is.

05:00:23

Here you can see already that we are creating a basic hash table.

05:00:26

Of max size 1024.

05:00:29

So the first thing that we can verify is that the length of the basic table's

05:00:33

data list is 1024.

05:00:35

There you go.

05:00:37

Then you insert some values here.

05:00:39

So we insert 'Aakash'.

05:00:43

We insert the key value pairs.

05:00:45

So we insert the value 9999 for 'Aakash'.

05:00:48

So this is one key value pair.

05:00:50

We are also inserting 'Hemanth' and 8888.

05:00:53

And what this will do is, when you call basic_table.insert,

05:00:57

it will call this insert function.

05:01:00

And self will now point to the basic table that we have just created.

05:01:04

Because we're calling insert on that specific basic table.

05:01:07

So self will point to the basic table.

05:01:10

So self.data_list will become basic_table.data_list.

05:01:13

And then the remaining arguments, 'Aakash' and 9999,

05:01:17

will get passed in as the key and the value.

05:01:20

So this code will execute: we will get the index within self.data_list

05:01:25

for the key 'Aakash'.

05:01:27

And then within self.data_list, or basic_table.data_list in this case,

05:01:31

at the given index that we just computed,

05:01:34

we will store the key-value pair, which is 'Aakash' and the phone number.

05:01:38

And that's how it will work.

05:01:40

So we're inserting some values and then we're finding a value.

05:01:43

So once we insert the two values and then we find a value,

05:01:46

that should give us the value 8888.

05:01:49

You may want to then maybe modify the test case to also include the test for

05:01:53

the other values that we inserted.

05:01:55

Feel free to modify the test cases or add new test cases.

05:01:59

So that we're checking not just one value but both the values.

05:02:03

Next, let's see how we can update a value.

05:02:05

So we call basic_table.update and set 7777.

05:02:09

Now suppose you had not implemented update here.

05:02:12

Let's put a bare return in update for a moment.

05:02:15

Suppose you had not implemented update.

05:02:18

Then if you called update, you would get false here because the value

05:02:23

did not get updated.

05:02:25

And you can check that by simply calling basic_table.find('Aakash').

05:02:31

You can see that it still has a value 9999.

05:02:34

That's how test cases are helpful.

05:02:36

Let's remove the return.

05:02:40

Okay.

05:02:41

So now the value seems to have been updated just fine.

05:02:44

Then let's get a list of all the keys; comparing it with the expected list of keys should give True.

05:02:49

Once again, if we did not have the "kv is not None" check, then we would not get back just

05:02:54

these keys: we would also hit all the None entries, and kv[0] on None raises a TypeError, so we don't want that.

05:02:59

So these were some test cases, but you need to create more test cases and

05:03:04

test them out to make sure that your implementation is correct.
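
Here is one possible complete BasicHashTable, together with the tests from this section. It is a sketch rather than the notebook's reference solution. The phone numbers follow the style of the test cases.

```python
MAX_HASH_TABLE_SIZE = 4096


def get_index(data_list, a_string):
    """Sum of ord() values, modulo the size of the list passed in."""
    return sum(ord(c) for c in a_string) % len(data_list)


class BasicHashTable:
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        self.data_list = [None] * max_size

    def insert(self, key, value):
        idx = get_index(self.data_list, key)
        self.data_list[idx] = key, value

    def find(self, key):
        idx = get_index(self.data_list, key)
        kv = self.data_list[idx]
        if kv is None:
            return None
        key, value = kv
        return value

    def update(self, key, value):
        # Identical to insert in this basic version
        idx = get_index(self.data_list, key)
        self.data_list[idx] = key, value

    def list_all(self):
        # Keys of all occupied slots, in index order
        return [kv[0] for kv in self.data_list if kv is not None]


basic_table = BasicHashTable(max_size=1024)
basic_table.insert('Aakash', '9999999999')
basic_table.insert('Hemanth', '8888888888')
basic_table.update('Aakash', '7777777777')
print(basic_table.find('Hemanth'))  # 8888888888
print(basic_table.find('Aakash'))   # 7777777777
print(basic_table.list_all())       # ['Aakash', 'Hemanth']
```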

05:03:07

Now once you've done that, you will want to run jovian.commit.

05:03:12

Now the next step and this is something that you may have thought about while working

05:03:16

through the assignment is that how do you ensure that different keys do not point

05:03:22

to the same index because we're doing all these things where we're converting

05:03:27

each character into a number and then adding up the characters.

05:03:30

Now obviously if you have words which have the same characters but in different

05:03:35

orders, they are obviously different keys, but they have the same

05:03:48

hash. 'listen' and 'silent' contain exactly the same characters,

05:03:55

so they have exactly the same hash. For instance, you can check get_index for 'listen' and get_index for

05:03:58

'silent'.

05:03:58

Okay.

05:03:59

We also need a data list let's put in a data list here.

05:04:02

Both of them have the hash 655. That means if you insert a value with the key 'listen'

05:04:09

and then you insert a value with the key silent.

05:04:12

The data at this position will get overwritten.

05:04:15

So when you call basic_table.find('listen'), you will get the value associated with 'silent',

05:04:20

and that's bad. This is called a collision.
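
This sketch reproduces the collision. 'listen' and 'silent' are anagrams, so their ord() sums are both 655. Under the basic scheme, the second write overwrites the first. The stored values are hypothetical.

```python
def get_index(data_list, a_string):
    return sum(ord(c) for c in a_string) % len(data_list)


data_list = [None] * 4096

# Anagrams contain the same characters, so their ord() sums are equal
print(get_index(data_list, 'listen'), get_index(data_list, 'silent'))  # 655 655

# Storing both under the basic scheme: the second write overwrites the first
data_list[get_index(data_list, 'listen')] = ('listen', 99)
data_list[get_index(data_list, 'silent')] = ('silent', 200)
print(data_list[get_index(data_list, 'listen')])  # ('silent', 200): 'listen' is lost
```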

05:04:25

It's called a collision because here the two keys are colliding, in some sense,

05:04:31

because they're leading to the same hash.

05:04:33

And any hash table that you implement is ultimately going to have collisions

05:04:37

because the number of possible strings, or keys, is potentially infinite,

05:04:41

but you have a limited number of positions or indices in your table.

05:04:47

So our hash table implementation is incomplete because there can be data loss

05:04:50

and it does not handle collisions and there are multiple techniques to handle collisions

05:04:54

and the technique we will use in this assignment is called linear probing,

05:05:00

and here's how it works while inserting a new key value pair.

05:05:04

If the target index for a key is occupied by another key then we simply try the next index

05:05:10

and if the next index is also occupied by another key we try the next and then we try the next

05:05:15

and then we try the next till we find the closest empty location.

05:05:18

And then while finding a key value pair we apply the same strategy

05:05:21

but instead of searching for an empty location, this time we search for a location

05:05:25

which contains the key value pair with the matching key.

05:05:29

We get the hash of the key that we want to find

05:05:32

and then we check whether that position is occupied by another key rather than the same key.

05:05:37

Then we try the next index and then we try the next index

05:05:40

and then we try the next index till we find a position which is occupied by a key value pair for the same key.

05:05:46

And if we find an empty position that means the key does not exist

05:05:51

because if it did exist then it should have been somewhere in that string of searches

05:05:56

that we just did.

05:05:58

Now, when updating a key-value pair, we again apply the same strategy,

05:06:02

but instead of searching for an empty location we look for a location which contains a key value pair

05:06:07

with the matching key and update its value.

05:06:10

So that's how you handle collisions in a hash table.

05:06:14

And to handle collisions, we will define a function called get_valid_index,

05:06:19

which first gets the hash using get_index

05:06:22

and then starts searching the data list and returns the first index which is either empty

05:06:27

or contains a key value pair matching the given key.

05:06:30

So we are now addressing two requirements in one shot with the get_valid_index function.

05:06:38

For insertion we are looking for an empty position for a find and update

05:06:44

we are looking for a position which is occupied by the given key-value pair,

05:06:50

or, more specifically, by the given key.

05:06:52

So here is the get_valid_index function, and I will let you work through this.

05:06:58

So you start with the index returned by get_index, then loop with while True,

05:07:04

because we don't know how long we may need to iterate get the key value pair stored at the index.

05:07:09

This is where you may have to.

05:07:11

It's simply a question of putting the index into the data list getting the key value pair.

05:07:16

Now if the key value pair is none which means that there is nothing at that index it is empty.

05:07:21

That's great we are done we can simply return the index.

05:07:24

On the other hand, if it does hold a pair, we get the key and value out of it.

05:07:29

If the key matches the key that we want to store,

05:07:34

great, then we can return the index once again.

05:07:37

If neither of these holds true, we move the index to the next position.

05:07:41

But as we move to the next position, we may run out of indices:

05:07:45

the index may become equal to the length of the data list.

05:07:48

In that case, we wrap around and go back to position 0.

05:07:51

This wrap-around is an important part.

05:07:53

Our list is now, in some sense, circular: we can keep looping around it, so that if we have something

05:07:59

that needs to be stored at the last position,

05:08:02

but the last position is occupied, we move back to position 0, and so on.
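A minimal sketch of get_valid_index as described above. It assumes the data list is a plain Python list holding either None or a (key, value) tuple, and it assumes a get_index that sums the character codes of the key modulo the list length; that hash function and the size 4096 are assumptions for illustration. It also assumes the table is never completely full, because otherwise the while True loop would never end.

```python
MAX_HASH_TABLE_SIZE = 4096  # assumed size, for illustration only


def get_index(data_list, a_string):
    # Assumed hash: sum of the character codes, modulo the list length
    result = 0
    for a_character in a_string:
        result += ord(a_character)
    return result % len(data_list)


def get_valid_index(data_list, key):
    # Start at the position given by the hash
    idx = get_index(data_list, key)

    while True:
        # Get the key-value pair stored at idx
        kv = data_list[idx]

        # Empty slot: the key is not present, so this is where it would go
        if kv is None:
            return idx

        # Slot holds the same key: use it for find / update
        k, v = kv
        if k == key:
            return idx

        # Occupied by a different key: try the next index
        idx += 1

        # Wrap around to position 0 at the end of the list
        if idx == len(data_list):
            idx = 0


data_list = [None] * MAX_HASH_TABLE_SIZE
idx = get_valid_index(data_list, 'listen')   # 655 with this hash
data_list[idx] = ('listen', 99)
print(get_valid_index(data_list, 'silent'))  # 656, because slot 655 is taken
```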

05:08:07

Then you can check whether get_valid_index was defined correctly.

05:08:12

If it was, these cells should output True.

05:08:15

Once again these are just some sample test cases.

05:08:18

So you should include some more of your own test cases here.

05:08:21

Finally, once you're done, save your work.

05:08:25

Now the next step is to incorporate linear probing into your hash table.

05:08:29

Here's a new class called ProbingHashTable.

05:08:32

Here you need to use get_valid_index instead of get_index.

05:08:36

The code is pretty similar, so I'll let you work it out.

05:08:40

Be careful not to simply copy and paste code; you will run into issues if you do.

05:08:48

Always make sure that you are writing the code yourself, carefully writing each variable, each method, and each argument.

05:08:59

Then there are some test cases here for you to test the probing hash table.

05:09:03

Once again, you can try it out with some examples and see if it works.

05:09:09

Specifically, here we are taking the same example: "listen" and "silent".

05:09:13

In the basic hash table, both of these would map to the same position,

05:09:19

but in the probing hash table, they end up in different positions.
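Here is one way ProbingHashTable could look, sketched under the same assumptions as before: a character-code-sum hash and a default size of 4096, both chosen for illustration. Because "listen" and "silent" are anagrams, that hash sends both to the same starting index, and linear probing places the second one in the next free slot.

```python
MAX_HASH_TABLE_SIZE = 4096  # assumed default size


class ProbingHashTable:
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        # A list of size max_size, with every slot initially empty
        self.data_list = [None] * max_size

    def get_index(self, key):
        # Assumed hash: sum of character codes modulo the list length
        return sum(ord(ch) for ch in key) % len(self.data_list)

    def get_valid_index(self, key):
        # Linear probing: first slot that is empty or already holds `key`
        idx = self.get_index(key)
        while True:
            kv = self.data_list[idx]
            if kv is None or kv[0] == key:
                return idx
            idx = (idx + 1) % len(self.data_list)  # wrap around at the end

    def insert(self, key, value):
        idx = self.get_valid_index(key)
        self.data_list[idx] = (key, value)

    def find(self, key):
        kv = self.data_list[self.get_valid_index(key)]
        return None if kv is None else kv[1]

    def update(self, key, value):
        idx = self.get_valid_index(key)
        self.data_list[idx] = (key, value)

    def list_all(self):
        # Keys in the order of their positions in the data list
        return [kv[0] for kv in self.data_list if kv is not None]


table = ProbingHashTable()
table.insert('listen', 99)
table.insert('silent', 200)
print(table.find('listen'), table.find('silent'))  # 99 200
print(table.list_all())                            # ['listen', 'silent']
```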

05:09:25

And that's it.

05:09:26

At this point, you're done with the assignment, so you can make a submission.

05:09:31

If you have run jovian.commit, you can take this link and make a submission on the assignment page.

05:09:38

The other option is to simply run jovian.submit

05:09:42

and pass in assignment 2.

05:09:44

Once you make a submission, it will be evaluated automatically, so let's click through here.

05:09:49

It will be evaluated automatically, and if you scroll down here, you will see that you get not just a grade but also comments for each question.

05:10:00

If you look here, you'll see the question numbers.

05:10:04

You can see that there's question 5, question 4 and so on.

05:10:07

Since we implemented the data list correctly, question 1 was a pass.

05:10:14

Let's quickly see what question 1 was.

05:10:18

Question 1: create a Python list of size MAX_HASH_TABLE_SIZE.

05:10:22

Question 2 was a pass; question 2 was the get_index function.

05:10:27

Question 3 was a pass.

05:10:29

Question 3 was to complete the hash table implementation.

05:10:33

Question 4 failed because we haven't defined get_valid_index yet, and question 5 led to an exception.

05:10:38

That's expected: some of our code will not execute because it still has blanks that need to be filled in.

05:10:44

So keep that in mind,

05:10:46

and use this as feedback.

05:10:48

You will know exactly what to fix.

05:10:50

And if you are stuck at any point, you know what to do.

05:10:55

You can go to the forum.

05:10:57

So let's see the forum here.

05:10:59

This is the forum subcategory for assignment 2.

05:11:03

You can create a new topic here if you want to have a longer discussion, or you can simply go to the main topic,

05:11:08

Assignment 2: Hash Tables and Python Dictionaries.

05:11:11

And you can ask a question here.

05:11:13

There are already a lot of discussions going on here.

05:11:16

So it's possible that your question may already have been answered.

05:11:20

After this, there are also some optional questions.

05:11:23

The first optional question is to implement a Python-friendly interface for the hash table.

05:11:27

Instead of defining the functions insert, update, and find, you will define the special methods __getitem__ and __setitem__.

05:11:34

And instead of list_all, you will define __iter__.

05:11:39

Also, instead of using the custom hash function that we have defined,

05:11:46

you will use the function built into Python called hash.

05:11:51

It takes any string, or any other hashable object, and returns an integer for it.

05:11:56

Since hash does not know the size of your data list, you will have to take the remainder manually.

05:12:03

In this example, taking the remainder gives back the number 3569.

05:12:07

So define a hash table here.

05:12:09

And once you have done that, you will be able to use it just like a Python dictionary.

05:12:14

You will be able to use it exactly like this. You create a hash table.

05:12:18

Then, to insert a value, you use the indexing notation with an assignment.

05:12:22

To retrieve a value, you use the indexing notation to get the value back.

05:12:26

And here you can compare it with the number.

05:12:28

To update a value, you simply use the indexing notation again.

05:12:32

And to get a list of values, you simply call the list function, or you can use the table in a for loop.

05:12:37

We've also defined the methods __repr__ and __str__.

05:12:42

These let Python print a representation like this

05:12:46

when you simply run a cell that just contains the name of this variable.
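A hedged sketch of the Python-friendly interface described above. It uses the built-in hash() with a manual remainder, linear probing, and the special methods __getitem__, __setitem__, __iter__, __repr__ and __str__. Returning None for a missing key mirrors find; a real dict raises KeyError instead. Note that hash() of a string is randomized per Python process by default, so positions can differ between runs.

```python
class HashTable:
    """Dictionary-like hash table (sketch); assumes it never fills up."""

    def __init__(self, max_size=4096):
        self.data_list = [None] * max_size

    def get_valid_index(self, key):
        # hash() may be negative; % with a positive size gives 0..size-1
        idx = hash(key) % len(self.data_list)
        while True:
            kv = self.data_list[idx]
            if kv is None or kv[0] == key:
                return idx
            idx = (idx + 1) % len(self.data_list)  # linear probing

    def __getitem__(self, key):        # table[key]
        kv = self.data_list[self.get_valid_index(key)]
        return None if kv is None else kv[1]

    def __setitem__(self, key, value):  # table[key] = value
        self.data_list[self.get_valid_index(key)] = (key, value)

    def __iter__(self):                 # list(table), for kv in table
        return (kv for kv in self.data_list if kv is not None)

    def __repr__(self):
        return "{" + ", ".join(f"{k!r}: {v!r}" for k, v in self) + "}"

    def __str__(self):
        return repr(self)


table = HashTable()
table['a'] = 1
table['a'] = 2        # update uses the same notation
print(table['a'])     # 2
print(list(table))    # [('a', 2)]
print(table)          # {'a': 2}
```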

05:12:51

That's one. Then there are a bunch of improvements that you can try making to hash tables.

05:12:56

This is a great exercise if you want to improve your Python programming skills and also understand how hash tables work.

05:13:02

If you can complete these four exercises, there's pretty much no question related to hash tables that you cannot answer.

05:13:09

You will know everything about them.

05:13:11

Each of these exercises may take another 30 to 45 minutes,

05:13:16

but it's completely worth the time. Maybe set aside a few hours on the weekend to work on these optional exercises.

05:13:24

Here's one: track the size of the hash table. Instead of having to loop through the entire table to get the number of key-value pairs,

05:13:31

can you store the length somewhere so that you can get the size in O(1) time?
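One possible way to get the size in O(1), sketched with linear probing over the built-in hash(): keep a counter and increment it only when a key lands in an empty slot, so updates don't change the count. SizedHashTable is a hypothetical name.

```python
class SizedHashTable:
    """Hypothetical hash table that tracks its own size."""

    def __init__(self, max_size=4096):
        self.data_list = [None] * max_size
        self.size = 0  # number of key-value pairs currently stored

    def get_valid_index(self, key):
        idx = hash(key) % len(self.data_list)
        # Probe forward until we hit an empty slot or the same key
        while self.data_list[idx] is not None and self.data_list[idx][0] != key:
            idx = (idx + 1) % len(self.data_list)
        return idx

    def __setitem__(self, key, value):
        idx = self.get_valid_index(key)
        if self.data_list[idx] is None:
            self.size += 1  # only a brand-new key changes the count
        self.data_list[idx] = (key, value)

    def __len__(self):
        return self.size  # O(1): no loop over data_list


table = SizedHashTable()
table['a'] = 1
table['b'] = 2
table['a'] = 3      # an update, not a new key
print(len(table))   # 2
```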

05:13:38

Here's another: implement deletion.

05:13:40

To implement deletion, a technique called tombstones is used.

05:13:45

You can use this tombstone technique and implement deletion with just a little more code.
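A sketch of deletion with tombstones under linear probing; TOMBSTONE and TombstoneHashTable are hypothetical names. A deleted slot is marked rather than reset to None, because an empty slot would end a later probe sequence early and hide keys stored further along. Lookups skip over tombstones, and insertions may reuse the first one they pass. The usage uses small integer keys because hash(n) == n for small non-negative ints, which makes collisions predictable.

```python
TOMBSTONE = object()  # hypothetical sentinel marking a deleted slot


class TombstoneHashTable:
    def __init__(self, max_size=4096):
        self.data_list = [None] * max_size

    def _probe(self, key):
        """Return (index holding key or None, slot usable for insertion)."""
        n = len(self.data_list)
        idx = hash(key) % n
        first_free = None
        for _ in range(n):  # visit each slot at most once
            kv = self.data_list[idx]
            if kv is None:
                # Truly empty: the key is absent, so stop searching
                return None, (first_free if first_free is not None else idx)
            if kv is TOMBSTONE:
                if first_free is None:
                    first_free = idx  # a deleted slot can be reused
            elif kv[0] == key:
                return idx, idx
            idx = (idx + 1) % n
        return None, first_free

    def __setitem__(self, key, value):
        found, free = self._probe(key)
        target = found if found is not None else free
        if target is None:
            raise RuntimeError("hash table is full")
        self.data_list[target] = (key, value)

    def __getitem__(self, key):
        found, _ = self._probe(key)
        return None if found is None else self.data_list[found][1]

    def __delitem__(self, key):
        found, _ = self._probe(key)
        if found is not None:
            # Mark instead of emptying, so probe chains stay intact
            self.data_list[found] = TOMBSTONE


table = TombstoneHashTable(max_size=8)
table[1] = 'a'
table[9] = 'b'    # 9 % 8 == 1, so it is probed into slot 2
del table[1]
print(table[9])   # b: the tombstone in slot 1 does not stop the search
```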

05:13:50

Then, implement dynamic resizing.

05:13:52

Instead of starting out with a hash table of a fixed size, or requiring the user to specify a size,

05:13:57

can you start with a hash table of, say, 128 slots and then double it as soon as you reach 128 elements, or maybe even before that, to avoid collisions?

05:14:07

You may want to double it as soon as you reach 64 elements, that is, 50% of the capacity.

05:14:14

So dynamic resizing is the technique that allows you to automatically grow and shrink the data list internally.
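A minimal sketch of the growing half of dynamic resizing, assuming linear probing over the built-in hash() and doubling when the table reaches 50% load; shrinking is left out. ResizingHashTable is a hypothetical name. Because every position depends on the list length, all existing pairs have to be re-inserted into the new list.

```python
class ResizingHashTable:
    """Hypothetical table that doubles its data list at 50% load."""

    def __init__(self, initial_size=128):
        self.data_list = [None] * initial_size
        self.size = 0

    @staticmethod
    def _valid_index(data_list, key):
        idx = hash(key) % len(data_list)
        while data_list[idx] is not None and data_list[idx][0] != key:
            idx = (idx + 1) % len(data_list)  # linear probing with wrap-around
        return idx

    def _resize(self, new_size):
        old_pairs = [kv for kv in self.data_list if kv is not None]
        self.data_list = [None] * new_size
        # Positions depend on the list length, so every pair is re-inserted
        for key, value in old_pairs:
            self.data_list[self._valid_index(self.data_list, key)] = (key, value)

    def __setitem__(self, key, value):
        idx = self._valid_index(self.data_list, key)
        if self.data_list[idx] is None:
            self.size += 1
        self.data_list[idx] = (key, value)
        if self.size >= len(self.data_list) // 2:
            self._resize(2 * len(self.data_list))  # grow at 50% capacity

    def __getitem__(self, key):
        kv = self.data_list[self._valid_index(self.data_list, key)]
        return None if kv is None else kv[1]


table = ResizingHashTable(initial_size=8)
for i in range(10):
    table[i] = i * i
print(len(table.data_list))  # 32: doubled at 4 items, then at 8
print(table[7])              # 49
```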

05:14:20

And then here's another technique for collision resolution.

05:14:24

This is called separate chaining.

05:14:25

Instead of going to the next index, you maintain a linked list at each position.

05:14:31

All keys that hash to that position still use it: you look through the linked list when searching for a key, or you add a new element to the linked list for that position

05:14:42

if you're adding a new key there.

05:14:44

Here's separate chaining explained in a YouTube video.

05:14:48

You can watch that and try to implement it on your own.
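A sketch of separate chaining as described: each position of the list holds the head of a singly linked list, and keys that collide share that list. Node and ChainingHashTable are hypothetical names, and the built-in hash() is assumed. New keys are added at the front of the list because that is O(1) once the search for an existing key has failed.

```python
class Node:
    """One entry in a position's singly linked list."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainingHashTable:
    """Hypothetical hash table using separate chaining."""

    def __init__(self, max_size=4096):
        self.buckets = [None] * max_size  # each slot: head node or None

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def __setitem__(self, key, value):
        idx = self._index(key)
        node = self.buckets[idx]
        while node is not None:
            if node.key == key:
                node.value = value  # key already present: update it
                return
            node = node.next
        # New key: add a node at the front of this position's list
        self.buckets[idx] = Node(key, value, self.buckets[idx])

    def __getitem__(self, key):
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None  # key not found


table = ChainingHashTable(max_size=8)
table[1] = 'a'
table[9] = 'b'             # 9 % 8 == 1, so it shares position 1 with key 1
print(table[1], table[9])  # a b
```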

05:14:51

One final thing here is the complexity analysis.

05:14:55

This is where we talk about average-case time complexity: if you have a good hash function and you've implemented improvements like dynamic resizing,

05:15:05

then the average time complexity of insert, update, find, and delete is O(1).

05:15:12

Listing all the keys is, of course, still O(n). On the other hand, because there can be collisions, the worst-case time complexity of these operations is still O(n).

05:15:20

So here's something for you to ponder: what is average-case complexity, and how does it differ from worst-case complexity?

05:15:27

It's also discussed in lesson three of the course, where we talk about quicksort.

05:15:32

Can you see why insert, find, and update have an average-case complexity of O(1) and a worst-case complexity of O(n)?

05:15:39

If not, it is something that you can look up online.

05:15:42

Try searching for a tutorial and learn why this happens.

05:15:47

Then, how is the complexity of hash tables different from that of binary search trees?

05:15:52

We've discussed binary search trees in a lot of detail in lesson two.

05:15:57

Now the question becomes: when should you prefer hash tables, and when should you prefer binary search trees?

05:16:04

These are all very interesting questions, and you may get asked some of them in interviews as well.

05:16:10

It will help you to ponder these questions, even if you do not end up solving all of the optional exercises.

05:16:18

Do look at the complexity analysis and think about it.

05:16:22

And there's a forum thread where you can discuss your thoughts.

05:16:26

So what do you do next? Review the lecture video, review the assignment walkthrough video, and execute the Jupyter notebook.

05:16:36

Complete the assignment and attempt the optional questions as well.

05:16:40

And do participate in forum discussions.

05:16:43

So this was a walkthrough of assignment two of data structures and algorithms in Python.

05:16:48

Hello and welcome to data structures and algorithms in Python.

05:16:51

This is an online certification course conducted by Jovian, and today we're on lesson three.

05:16:58

My name is Akash and I'm the CEO of Jovian and I'm your instructor for the course.

05:17:04

If you follow along with this course and complete four weekly assignments and a course project,

05:17:10

you can earn a certificate of accomplishment for this course.

05:17:16

So let's get started.

05:17:18

The first thing we do is visit the course website.

05:17:23

PythonDSA.com.

05:17:26

When you visit pythondsa.com, you are brought to the course website.

05:17:31

Here you can find all the information and material for the course.

05:17:35

You can check out lessons one and two and assignments one and two.

05:17:40

Both of which are still open for submission.

05:17:43

And let's open up lesson three.

05:17:49

So the topic today is sorting algorithms and divide and conquer.

05:17:54

And you can watch a video recording of the lesson here.

05:17:57

You can also catch a version in Hindi.

05:18:00

Now the code used for the lesson is provided here.

05:18:04

So let's open up this link sorting and divide and conquer.

05:18:08

This is where all the code is present.

05:18:13

So here we have it.

05:18:15

Now we are looking at the tutorial and the code for this lesson.

05:18:19

If you scroll down you can see that there is some code here.

05:18:27

Now to execute this code you have two options.

05:18:30

You can either execute this code online using free online resources which is what we recommend.

05:18:36

Or you can download it and run it on your computer locally.

05:18:41

And the instructions for both of these are given here.

05:18:44

We are going to use the first one, which is to click the Run button at the top of this page and select Run on Binder.

05:18:52

So let's scroll up, click the Run button, and then click Run on Binder.

05:19:00

Once you do this, it will open up an interface like this.

05:19:07

What you're looking at here is a Jupyter notebook.

05:19:10

A Jupyter notebook is an interactive programming environment where you can write code,

05:19:15

look at the results, and also write explanations.

05:19:20

We've provided you with a cloud-based Jupyter notebook setup.

05:19:23

So you don't have to install anything.

05:19:25

All the code that you execute here will be running on our cloud.

05:19:29

But you can also download it and run it on your own computer by following the instructions.

05:19:34

The first thing we'll do is click on the Kernel menu and click Restart & Clear Output,

05:19:39

to remove any of the outputs from previous executions of the code,

05:19:44

so that we can execute the code and see the outputs fresh for ourselves.

05:19:52

Now I'm also going to zoom in a little bit here.

05:19:57

So we can look at the code and let's get started.

05:20:04

This is a coding-focused and practical course.

05:20:08

And we're talking about different data structures and algorithms.

05:20:11

The topic today is sorting algorithms and divide-and-conquer algorithms in Python.

05:20:16

So in every lecture we focus on a specific problem.

05:20:20

In this notebook, we will focus on the problem you're looking at here.

05:20:25

So let's read the question.

05:20:27

You're working on a new feature on Jovian called "Top Notebooks of the Week".

05:20:31

Write a function to sort a list of notebooks in decreasing order of likes.

05:20:36

Keep in mind that up to millions of notebooks can be created every week.

05:20:39

So your function needs to be as efficient as possible.

05:20:43

That is the key point here.

05:20:45

Now this is a classical problem in computing.

05:20:50

The problem of sorting a list of objects comes up over and over in computer science and software development.

05:20:56

And it's important to understand common approaches for sorting.

05:21:00

How they work, what the trade-offs are between them and how to use them.

05:21:05

Before we solve this problem, we'll solve a simplified version of it.

05:21:09

It's quite simple to state.

05:21:11

Write a program to sort a list of numbers.

05:21:14

And sorting usually refers to sorting in ascending order unless specified otherwise.

05:21:20

So that's the question for today.

05:21:22

Write a program to sort a list of numbers and we'll expand upon it to answer this original question as well.

05:21:28

Now this is the method that we've been following throughout the course and we will continue to follow a systematic strategy for solving programming problems.

05:21:35

Step one, state the problem clearly.

05:21:38

Identify the input and output formats.

05:21:41

Step two, come up with some example inputs and outputs.

05:21:44

Try to cover all the edge cases.

05:21:46

And step three, come up with a correct solution for the problem.

05:21:49

State it in plain English.

05:21:51

Step four, implement the solution and test it using example inputs.

05:21:55

It's very important that you implement the simple solution first.

05:21:58

You just need a correct solution at first, not an efficient one; then you implement it and test it.

05:22:03

Then you analyze its complexity, identify inefficiencies and then you apply the right techniques to overcome the inefficiencies.

05:22:10

That is where the knowledge of the right data structures and algorithms comes into the picture.

05:22:14

And once you apply the new technique, then you once again state the solution, implement it and analyze its complexity and repeat if necessary.

05:22:23

So this is the strategy we'll follow here today as well.

05:22:26

So step one, state the problem clearly and identify input and output formats.

05:22:31

Now the problem is stated clearly enough for us.

05:22:33

We need to write a function to sort a list of numbers in ascending or increasing order.

05:22:38

Now here's the input.

05:22:40

The input is a single argument called nums, which is a list of numbers.

05:22:45

So for instance, here's a list of numbers.

05:22:47

You can see that they're not in any specific order.

05:22:50

And then the output is the sorted version of the input.

05:22:53

So here is the same list of numbers in sorted order.

05:22:57

And based on these two, we can now write a signature of our function.

05:23:01

So our function will be called sort or something else, but it will accept just one input.

05:23:06

Right now we've not written any code here, so we just put in pass.

05:23:11

Now I'm running this code here using the shift plus enter shortcut,

05:23:15

but you can also use the run button on the toolbar.

05:23:18

So either run or shift plus enter.

05:23:21

The great thing about Jupyter notebooks is that you can add more code cells anywhere and test anything you want.

05:23:29

For instance, if you want to insert a code cell below, just click the Insert Cell Below menu option,

05:23:34

or click outside a cell on the left and press the B key.

05:23:39

And now you can write some code here and run it.

05:23:45

So please feel free to experiment with this notebook as you go along.

05:23:50

Step two: come up with some example inputs and outputs.

05:23:53

Now this is very important.

05:23:55

You need to think about all the different scenarios in which you may want to test out your function before you put it into production.

05:24:00

So that you catch errors early on.

05:24:03

Thinking about scenarios will help you identify the special cases you need to handle in code.

05:24:08

It's easier to do this now than while writing your code, because that may lead to bugs.

05:24:14

So here are some scenarios that I was able to come up with and there may be more.

05:24:18

So you can continue and increase this list.

05:24:21

So the first one is some list of numbers in random order.

05:24:24

Some numbers in any random order; you can try smaller lists, larger lists, and so on.

05:24:32

Second is a list that's already sorted.

05:24:34

We need to ensure that an already sorted list does not become unsorted.

05:24:39

Third is a list that's sorted in descending order.

05:24:42

We may want to check that.

05:24:44

See if we need to handle that case separately.

05:24:47

Somehow.

05:24:48

Then a list containing repeating elements.

05:24:50

This is something you may not have thought of.

05:24:52

But the question never said that the numbers should be unique.

05:24:55

So there could be repeating elements here.

05:24:57

An empty list:

05:24:59

Interesting input.

05:25:00

The output is also an empty list.

05:25:02

Or a list containing just one element.

05:25:04

Or a list containing one element repeated many, many, many times.

05:25:08

Or even a really long list.

05:25:11

This is something that we may want to test because we want our algorithm to be efficient at the very end.

05:25:16

So a long list may help us just evaluate the efficiency empirically.

05:25:20

So these are the scenarios.

05:25:21

And what we now need to do is create some test cases for these scenarios.

05:25:25

So test cases involve creating an input and an output.

05:25:29

For instance, here's an input.

05:25:32

nums0.

05:25:34

And this could be the list [4, 3, 1].

05:25:38

And here's the expected output.

05:25:40

Let me call it output0.

05:25:42

And this would be [1, 3, 4].

05:25:45

This is one way to create a test case,

05:25:48

and you can use it later for testing.

05:25:50

But we will put our tests into this particular structure.

05:25:53

We'll create a dictionary.

05:25:55

Creating a dictionary like this will help us automate the testing of all our test cases with a single helper function.

05:26:01

For each test case, we create a dictionary.

05:26:06

And then it will have two keys.

05:26:11

The first key is called input, and the second key is called output.

05:26:16

In the input, for each of the arguments that go into the function

05:26:20

(and remember, there's just one argument here),

05:26:22

we will have one key.

05:26:23

So we will have the key nums.

05:26:25

And the key nums will have the input value for the test case.

05:26:28

And the output will simply contain the output expected from the function.

05:26:32

So that's how we'll set up our test cases.
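Here is the [4, 3, 1] example written in that dictionary format. The sort function below is only a placeholder built on the built-in sorted, and passes is a hypothetical helper (the course's own testing helper is not shown in this section); it relies on ** to turn {'nums': [...]} into the keyword argument nums=[...].

```python
# One test case: 'input' has one key per function argument
test0 = {
    'input': {
        'nums': [4, 3, 1]
    },
    'output': [1, 3, 4]  # the expected result
}


def sort(nums):
    # Placeholder implementation, only so this example runs
    return sorted(nums)


def passes(function, test):
    # Hypothetical helper: ** unpacks {'nums': [...]} into nums=[...]
    return function(**test['input']) == test['output']


print(passes(sort, test0))  # True
```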

05:26:35

So there's test0, a list of numbers in random order.

05:26:42

Then we have test1.

05:26:44

This is also another list of numbers in random order.

05:26:46

You can see here no specific order.

05:26:48

Now we have a list that's already sorted.

05:26:51

And the output obviously is the same.

05:26:54

Now for the random order list the output is the same numbers in sorted order.

05:27:00

Now we have a list that's sorted in descending order.

05:27:04

And the output is the same list in increasing order.

05:27:07

Then we have a list containing repeating elements.

05:27:11

You can see that the numbers 1, 2, 6 and 7 and even minus 12 repeat here.

05:27:18

Here we have the empty list.

05:27:20

Here we have a list containing just one element.

05:27:22

And here we have a list containing one element repeated many, many times.

05:27:26

And then the final test case which was to create a really long list.

05:27:31

For that, we can start with a sorted list,

05:27:34

created using the range function, and then shuffle it to create the input.

05:27:39

Otherwise you may spend a lot of time just creating a list and then writing the sorted version of it.

05:27:44

That's too much work.

05:27:45

So always use the computer and helper functions whenever you can,

05:27:49

even to create test cases.

05:27:51

So we'll use the range function.

05:27:53

Now, the range function takes either a single number or two numbers (plus an optional third argument, the step).

05:27:57

So you can have something like range(2, 10), or just range(10).

05:28:01

If you just look at it this way, it only shows range(0, 10).

05:28:05

If you actually want to see what's in it,

05:28:07

there are a couple of ways: you can call list(range(10)),

05:28:10

and that gets converted into a list.

05:28:13

Or you can use it in a for loop.

05:28:15

You can write for x in range(10): print(x).

05:28:21

You can see that it contains the numbers 0 to 9.

05:28:24

It's important to note that the range does not include its end value.

05:28:29

So just keep that in mind.

05:28:31

Now, what's the difference between a range and a list?

05:28:36

A list contains all the 10 numbers together at once.

05:28:40

But a range internally simply maintains a counter.

05:28:43

When you use a range in a for loop, it simply starts a counter

05:28:47

from the starting value, which is 0 by default.

05:28:51

So if it's range(2, 10), it starts the counter from 2

05:28:53

and increases it up to the end value minus 1.

05:28:56

So it does not use as much space as a list;

05:28:59

it only stores a few values internally (start, stop, and step), not every element.

05:29:02

And that's why it's more efficient.
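A short demonstration of the range behaviour described above, using only built-ins and sys.getsizeof. The exact byte counts vary by Python version, so the example only compares them.

```python
import sys

print(range(2, 10))      # range(2, 10): the object itself, not its elements
print(list(range(10)))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

for x in range(3):
    print(x)             # 0, then 1, then 2 (the end value 3 is excluded)

# A range stores only start, stop and step; a list stores every element
print(sys.getsizeof(range(10000)) < sys.getsizeof(list(range(10000))))  # True
```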

05:29:04

In any case, right now we need lists.

05:29:07

So what we will do is we will create a list of 10,000 numbers.

05:29:12

So 0 to 9,999.

05:29:16

That is our in_list.

05:29:17

And our out_list is also going to be 0 to 9,999.

05:29:21

That's our out_list.

05:29:22

Both of them are sorted.

05:29:23

Now we shuffle in_list.

05:29:26

So we import the random module from Python.

05:29:28

And then we call random.shuffle,

05:29:32

passing in in_list.

05:29:35

That shuffles the first list, in_list, in place.

05:29:39

So now we have that as the input.

05:29:42

And out_list, the sorted list, is the output.

05:29:46

Once again, we can even check that in_list is actually shuffled,

05:29:50

maybe by looking at the first 10 elements.

05:29:52

You can see here that these are all shuffled numbers.

05:29:55

On the other hand, if you check out_list,

05:29:58

you can see that these are all in order.
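The long-list test case built as described: create the sorted list with range, copy it, and shuffle the copy with random.shuffle, which works in place and returns None. The name test8 follows the "test 0 to test 8" numbering mentioned in the lecture.

```python
import random

# Build the sorted version with range, then derive the input from it
in_list = list(range(10000))   # 0 to 9,999
out_list = list(range(10000))  # 0 to 9,999
random.shuffle(in_list)        # shuffles in place (returns None)

test8 = {
    'input': {
        'nums': in_list
    },
    'output': out_list
}

print(in_list[:10])   # ten shuffled numbers, different on each run
print(out_list[:10])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```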

05:30:03

So those are our test cases.

05:30:05

And it's very important to create good test cases.

05:30:07

Even in interviews before you start coding or before you even suggest a solution.

05:30:12

you should try to list out your test cases, either verbally to the interviewer,

05:30:17

or, in a coding assessment, you may create a block of comments at the top and start listing some test cases there.

05:30:22

Or you can create proper test case dictionaries like this.

05:30:26

It takes a few minutes, but it's totally worth it because you can then test your algorithms very easily.

05:30:32

Finally, we'll take all our test cases, test0 to test8, and put them into a single list called tests.

05:30:39

Great, so we made some good progress so far.

05:30:43

Next, let's come up with a simple, correct solution and state it in plain English.

05:30:49

And coming up with a correct solution is pretty straightforward.

05:30:53

We have a list of numbers, so we iterate over the list.

05:30:58

Let's grab a list of numbers, so that we have something to look at.

05:31:01

Here you go.

05:31:06

So we have a list of numbers, so we iterate over the list of numbers starting from the left.

05:31:14

So we start from the very left.

05:31:16

And then we compare each number with the number that follows it.

05:31:19

So we compare 99 with 10.

05:31:22

And if 99 is greater than 10, then we can say for sure that 99 should appear after 10.

05:31:29

in the final sorted array; by default, sorted means the numbers are in increasing order.

05:31:37

So that's what we're solving first.

05:31:39

So what we can do is we can simply swap 99 and 10 because we know that 10 should appear before 99 and 99 should appear after 10.

05:31:48

After the swap, we move to the next position and compare 99 with the next element, 9.

05:31:55

99 turns out to be higher as well,

05:32:00

so we swap them and keep going.

05:32:03

So we iterate over the list and for each element compare the number with the number that follows it.

05:32:08

And if the number is greater than the one that follows it, swap the two elements.

05:32:12

Doing this once is not enough to sort the entire list: this way, 99 ends up at the end

05:32:22

if you follow the process, but the rest of the list is still not sorted.

05:32:29

So we repeat these steps one to three.

05:32:32

So once again we start from the left and then we start comparing 10 with 9 and then 10 with 8 and so on.

05:32:38

and keep swapping elements as we go forward.

05:32:43

Now, I claim that you will need to repeat steps one to three at most n minus 1 times

05:32:51

to ensure that the array is sorted.

05:32:53

Can you guess why? And here's the hint.

05:32:56

After one iteration of the process, the largest number in the list will reach the very end.

05:33:02

So each repetition puts the largest remaining number at the very end.

05:33:08

So you need around n repetitions.

05:33:10

So here's an animation showing the same thing.

05:33:12

We compare 6 and 5, and we swap them.

05:33:16

Then we compare 6 and 3, and we swap them.

05:33:23

Then we compare 6 and 1, and we swap them.

05:33:29

Now we compare 6 and 8, and we don't swap them because they're in order.

05:33:35

Next we compare 8 and 7 and we switch them.

05:33:40

Next we compare 8 and 2 and we switch them.

05:33:45

And finally we compare 8 and 4 and we switch them.

05:33:50

And in this way, the largest number 8 has reached the very end.

05:33:54

So now we can fix its position and start again from the beginning.

05:33:59

And you can see that this time the next number 7 will end up here.

05:34:02

And then the next time the number 6 will end up here.

05:34:04

And then next time the number 5 will end up here and so on.

05:34:07

So in n repetitions of this left-to-right comparison process,

05:34:14

we will have sorted the array.
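To check the walkthrough of the animation, here is a sketch that runs one left-to-right pass at a time and prints the list after each pass. The starting list [6, 5, 3, 1, 8, 7, 2, 4] is reconstructed from the comparisons narrated above (6 and 5, 6 and 3, 6 and 1, 6 and 8, then 8 against 7, 2 and 4), and bubble_pass is a hypothetical helper name.

```python
def bubble_pass(nums):
    """One left-to-right pass: swap each adjacent pair that is out of order."""
    for i in range(len(nums) - 1):
        if nums[i] > nums[i + 1]:
            nums[i], nums[i + 1] = nums[i + 1], nums[i]


nums = [6, 5, 3, 1, 8, 7, 2, 4]  # reconstructed from the narration
for p in range(len(nums) - 1):   # n - 1 passes
    bubble_pass(nums)
    print('after pass', p + 1, nums)
# after pass 1: [5, 3, 1, 6, 7, 2, 4, 8]; the largest value, 8, is at the end
```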

05:34:16

And this approach is called bubble sort because it causes the smaller elements to bubble to the top or to the beginning.

05:34:23

You can see that the numbers 1 and 3 slowly bubble up to the top, while the larger numbers like 8 and 7 sink to the bottom.

05:34:31

And you can watch this entire animation to get a full sense of how bubble sort works.

05:34:35

What will also really help is if you can take an example on paper and work it out on your own step by step.

05:34:43

Especially with sorting algorithms, this really helps.

05:34:47

Okay, so now we've come up with a correct solution.

05:34:50

Let's implement it and let's test it using an example.

05:34:54

Now the implementation itself is also pretty straightforward.

05:34:58

So we have the bubble sort function here.

05:35:01

def bubble_sort: it takes a list of numbers.

05:35:04

Now we may not want to modify the list of numbers in place because then our test cases will not be reusable.

05:35:13

So just to avoid modifying our test inputs.

05:35:16

We're going to create a copy of the list to avoid changing it.

05:35:20

To create a copy, simply call the list function with the list as input.

05:35:25

So now we are replacing nums with a copy of nums.

05:35:29

Now depending on your particular use case, this may not be necessary.

05:35:34

So this is something that you can check while you're in a coding assessment or talking to an interviewer.

05:35:41

Just check with them: do they want the array to be sorted in place, or do they want a new array to be created?

05:35:47

If they're okay with sorting it in place, then you probably don't need this.

05:35:51

But you may still just want to keep it in because otherwise you may end up modifying some of your test cases unintentionally.

05:35:58

And that may lead to problems.

05:36:01

So it's a good habit to create a copy of the input rather than modifying it in place.

05:36:07

So then let's come to steps 1, 2, and 3.

05:36:10

and then we'll see step 4, which is really the outermost step.

05:36:13

We iterate over the array.

05:36:15

We take i in the range

05:36:20

len(nums) - 1.

05:36:22

The number of elements in the array is n, and n can be obtained using len(nums).

05:36:28

We want to go over the indices 0 to n - 2.

05:36:33

The valid indices are 0 to n - 1,

05:36:36

but at index n - 1, the last element, there is no further element to compare it with.

05:36:42

So keep in mind that you only want to run this iteration until your pointer reaches the second-to-last element, not the last one.

05:36:49

That is why we put i in range(0, len(nums) - 1).

05:36:56

So the highest value it can take is len(nums) - 2.

05:37:01

Next, we compare nums[i] with nums[i + 1].

05:37:05

So we compare the number with the element that comes after it.

05:37:08

If it is greater, these two are out of order,

05:37:13

so we simply swap them.

05:37:15

So we set nums[i], nums[i + 1] = nums[i + 1], nums[i].

05:37:22

Now this is a very interesting way of swapping. In C, C++ or Java, you would have to write three or four steps, using a temporary variable, to swap two numbers.

05:37:28

But in Python it is really simple.

05:37:30

Let's say we set x, y = 2, 3.

05:37:35

So you can see they have the values 2 and 3.

05:37:37

And then we simply write x, y = y, x.

05:37:39

So what happens is the value of y gets placed into x and the value of x gets placed into y.

05:37:45

So it's a single step to swap two numbers. There you go.
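The swap idiom, sketched in isolation. The right-hand side is evaluated into a tuple before anything is assigned, which is why no temporary variable is needed. The list example shows the same idiom on two adjacent positions, as bubble sort uses it.

```python
x, y = 2, 3
# The right-hand side builds the tuple (3, 2) first, then unpacks it into x and y
x, y = y, x
print(x, y)  # 3 2

nums = [5, 1, 4]
# The same idiom swaps two list positions, as in bubble sort's inner loop
nums[0], nums[1] = nums[1], nums[0]
print(nums)  # [1, 5, 4]
```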

05:37:49

So we swap the two elements, exactly as shown here.

05:37:57

Next we repeat this.

05:38:00

So we go from the leftmost element to the penultimate element.

05:38:05

And in this way we've pushed the largest element to the end.

05:38:09

Now we need to repeat this process n minus 1 times.

05:38:12

so that each time we push the largest remaining element towards the end.

05:38:15

And in n minus 1 repetitions of these three steps, we will end up with a sorted list.

05:38:23

And finally we return the sorted list and that's it.
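Putting the four steps together, a minimal sketch of bubble sort in Python. The function name bubble_sort and the parameter name nums follow the lecture's wording; the notebook's own version may differ in small details.

```python
def bubble_sort(nums):
    # Work on a copy so the caller's list (e.g. a test case) is not modified
    nums = list(nums)
    # Step 4: repeat the pass n - 1 times; the loop variable itself is unused
    for _ in range(len(nums) - 1):
        # Step 1: visit every index except the last (i goes up to len(nums) - 2)
        for i in range(len(nums) - 1):
            # Step 2: compare each element with the one after it
            if nums[i] > nums[i + 1]:
                # Step 3: swap adjacent elements that are out of order
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
    return nums

print(bubble_sort([4, 2, 7, 1, 3]))  # [1, 2, 3, 4, 7]
```

For an empty or single-element list, range(len(nums) - 1) is empty, so the list is returned unchanged without any special case.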

05:38:27

So let's test it out with an example.

05:38:31

And by the way, if some of this doesn't make sense, a simple way to debug it is to add print statements here.

05:38:41

So you can add a print statement and maybe just print this value.

05:38:45

So we've used underscore here because we don't actually use this value.

05:38:49

But if we wanted to use it, we could name it j and print which iteration j this is.

05:38:55

And then inside the inner loop, you can print the value of i.

05:39:00

And you can also print the value of nums[i].

05:39:03

And you can also print the value of nums[i + 1].

05:39:07

And at the very top, you can also print nums.

05:39:10

Now if you add all of these print statements and then execute your algorithm,

05:39:15

now you will be able to see exactly what is happening inside each iteration.

05:39:19

So that's a great way to debug your code if you're facing any issues and also understand what the code does.

05:39:25

But in any case, we won't need these.

05:39:29

So I'm just going to comment these out.

05:39:35

So let's test it out.

05:39:37

From test0, we get nums as the input.

05:39:43

And then we get the output.

05:39:45

And we can print the input and the expected output.

05:39:48

And then finally we calculate the result by passing nums0 into bubble_sort,

05:39:52

print the actual output, and finally check whether the two match.

05:39:58

So you can see here now that the input was this unsorted list.

05:40:04

And then the expected output was this sorted version.

05:40:07

And that's what we got.

05:40:08

So in fact, there was a perfect match.

05:40:12

And that's it.

05:40:13

So we've implemented our first sorting algorithm.

05:40:16

It was pretty straightforward.

05:40:17

A few lines of code.

05:40:19

As an exercise, you can try to implement it once again from your memory.

05:40:23

Just write it in plain English first and then try to implement it.

05:40:27

It's a good coding practice.

05:40:32

And we can also evaluate all the test cases that we have.

05:40:35

Remember, we had created about nine test cases.

05:40:38

And to help you evaluate the test cases, we've given you a helper function called evaluate_test_cases,

05:40:44

which is part of the Jovian library.

05:40:46

So we install the Jovian library here:

05:40:48

pip install jovian.

05:40:49

And then from jovian.pythondsa,

05:40:51

and pythondsa is the name of the course,

05:40:53

So that's also the module where we have helper functions for this course.

05:40:57

we import evaluate_test_cases.

05:40:59

And evaluate_test_cases simply goes over the list of test cases that you have.

05:41:03

And it pulls out the inputs and passes them as arguments to the function provided here,

05:41:08

which is bubble sort.

05:41:09

Then it gets the outputs and compares them with the expected outputs.

05:41:12

It also prints information like what the input was,

05:41:15

what the expected output and the actual output were, and whether they match.
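To make the idea behind such a helper concrete, here is a hypothetical stand-in called run_test_cases. It is not the Jovian library's evaluate_test_cases; it only mimics the behaviour described above, and it assumes each test case is a dict with an 'input' dict of keyword arguments and an 'output' value. The sample test cases are made up for illustration.

```python
def bubble_sort(nums):
    nums = list(nums)
    for _ in range(len(nums) - 1):
        for i in range(len(nums) - 1):
            if nums[i] > nums[i + 1]:
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
    return nums

def run_test_cases(function, test_cases):
    """Hypothetical stand-in for a helper like evaluate_test_cases."""
    results = []
    for idx, test in enumerate(test_cases):
        # Pass the stored inputs to the function as keyword arguments
        actual = function(**test['input'])
        passed = actual == test['output']
        print(f"Test case {idx}: input={test['input']} expected={test['output']} "
              f"actual={actual} -> {'PASSED' if passed else 'FAILED'}")
        results.append(passed)
    return results

tests = [
    {'input': {'nums': [4, 2, 6, 3, 4, 6, 2, 1]}, 'output': [1, 2, 2, 3, 4, 4, 6, 6]},
    {'input': {'nums': []}, 'output': []},
    {'input': {'nums': [23]}, 'output': [23]},
    {'input': {'nums': [8, 8, 8]}, 'output': [8, 8, 8]},
]
run_test_cases(bubble_sort, tests)
```

Storing inputs as keyword-argument dicts lets the same helper test any function whose parameter names match the keys.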

05:41:19

So let's check it out.

05:41:22

So you can see here this was test case zero.

05:41:25

And that worked; we just tested it out.

05:41:28

Here's a larger list including some negative numbers.

05:41:31

This worked as well.

05:41:32

You can see the test result is passed.

05:41:34

Then you have another list here.

05:41:37

This seems to work fine too.

05:41:39

This is already sorted.

05:41:41

Here you have one which is sorted in decreasing order.

05:41:44

That works.

05:41:45

Here you have one with repeating numbers.

05:41:47

That works too.

05:41:48

The empty list works.

05:41:50

The single element works.

05:41:53

And this works too.

05:41:55

This is the same element repeated over and over.

05:41:58

And finally here is the final test case.

05:42:00

Remember, this had 10,000 elements.

05:42:02

So you can see that this was the expected output.

05:42:04

And this was the actual output.

05:42:05

So we have successfully sorted 10,000 elements.

05:42:08

And that's really the power of programming.

05:42:11

We didn't have to look at any of the numbers.

05:42:14

We've just written four or five lines of code.

05:42:16

And we've sorted 10,000 elements.

05:42:19

So all our test cases passed.

05:42:21

Although, do notice here that it took about 15 seconds

05:42:26

for the sorting of 10,000 elements.

05:42:30

Now maybe that's not that bad.

05:42:33

But we're looking at probably millions of notebooks every week

05:42:37

at Jovian.

05:42:38

So we want there to be a faster sorting algorithm.

05:42:43

Okay.

05:42:48

So before we improve the algorithm,

05:42:51

we need to understand the algorithm's complexity

05:42:54

And identify any inefficiencies.

05:42:57

Now the core operation in bubble sort,

05:42:59

if you look at the code here once again,

05:43:01

is this operation of comparison.

05:43:03

So we're comparing a number with the next number.

05:43:06

And swapping.

05:43:07

Now, a comparison happens in every iteration of the inner loop.

05:43:10

And swapping doesn't happen nearly as often.

05:43:13

So if you want to find the time complexity,

05:43:16

and we want an upper bound or the worst case time complexity,

05:43:19

we can assume that roughly every comparison also leads to a swap

05:43:25

in the worst case.

05:43:26

So if we just count the number of comparisons

05:43:29

as a function of the input size,

05:43:32

the size of the list that was given as an input,

05:43:34

that should give us an idea of the time complexity.

05:43:37

Okay.

05:43:43

So here we can see that there are two loops.

05:43:46

And the length of each loop is n minus 1.

05:43:48

And inside the inner loop, there is a comparison.

05:43:51

So the total number of comparisons is n minus 1 times n minus 1,

05:43:55

which is (n - 1) squared, or n squared minus 2n plus 1.

05:43:59

Now, expressing this in big O notation,

05:44:01

which is to get a rough idea of how the number of comparisons

05:44:06

or the number of operations in the algorithm grows with the size of the input,

05:44:10

We can ignore the lower order terms like 2n plus 1.

05:44:14

So we can now conclude that the time complexity of bubble sort

05:44:22

is O(n^2).

05:44:24

And this is also known as quadratic complexity.
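One way to check the (n - 1) squared count empirically: a sketch that runs the same two loops as bubble sort but also counts comparisons. The function name count_bubble_sort_comparisons is our own, for illustration. Because this version always runs both loops in full, the count does not depend on the order of the input.

```python
def count_bubble_sort_comparisons(nums):
    # Same loop structure as bubble sort, but we also count every comparison
    nums = list(nums)
    comparisons = 0
    for _ in range(len(nums) - 1):
        for i in range(len(nums) - 1):
            comparisons += 1
            if nums[i] > nums[i + 1]:
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
    return comparisons

for n in [10, 100, 500]:
    count = count_bubble_sort_comparisons(list(range(n, 0, -1)))
    print(n, count, (n - 1) ** 2)   # the last two columns match
```

Multiplying n by 10 multiplies the count by roughly 100, which is what quadratic growth means in practice.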

05:44:30

You can also verify that bubble sort requires O(1) additional space.

05:44:35

This is an exercise for you,

05:44:38

but here's a quick hint.

05:44:39

You can see that we are not allocating any new lists.

05:44:42

We did create a copy of the list, but we didn't have to.

05:44:45

So let's not count that.

05:44:47

But apart from that, there is no additional space that was required.

05:44:50

We are not allocating anything that grows with the input.

05:44:52

We are creating this range,

05:44:53

but remember I mentioned that a range simply contains a single variable inside it,

05:44:58

which it keeps incrementing for a for loop.

05:45:02

So we have these two ranges,

05:45:04

so maybe we have two variables assigned.

05:45:06

So it's constant irrespective of the size of the input.

05:45:10

And that's how bubble sort requires O(1) additional space.

05:45:13

Now you may be asked about space complexity,

05:45:16

and this is where it's a slightly tricky thing,

05:45:18

because, strictly speaking, space complexity sometimes

05:45:22

also includes the size of the input,

05:45:25

because to store n numbers or n elements,

05:45:30

you need n spaces in memory.

05:45:33

So the space complexity of bubble sort in that sense is O(n).

05:45:37

And this is something you can check with the interviewer,

05:45:39

if they're asking you what is the space complexity,

05:45:42

and you can ask them if they just want to know

05:45:44

what is the additional space required.

05:45:46

So the overall space complexity is O(n),

05:45:48

because we need to store the actual input list somewhere.

05:45:52

But on the other hand, the amount of additional space required is O(1),

05:45:56

which is a constant factor independent of the size of the list.

05:46:01

So that's how bubble sort works.

05:46:05

Now, analyzing this O(n^2) complexity,

05:46:09

and keeping in mind that a list of 10,000 numbers takes about 12 seconds.

05:46:15

So if n is 10,000, and n squared multiplied by some constant

05:46:19

is about 12 seconds,

05:46:21

then if you had a list that was of 100,000 elements,

05:46:27

that would be (10n) squared, or 100 times the amount of time

05:46:32

that it would take to sort it.

05:46:34

So that means it would take about 20 minutes to sort 100,000 numbers,

05:46:38

which I would say is a bit inefficient now,

05:46:41

and a list of a million numbers would take close to two days to be sorted in Python.

05:46:46

Now if you do it in C++, it might be four or five times faster.

05:46:50

But again, the moment you go from a million to 10 million,

05:46:54

that is another 100 times longer, which means months of computation.

05:46:57

And that's bad, and that is why n square or quadratic complexity

05:47:01

is something that we would like to do away with,

05:47:07

because it grows very fast,

05:47:09

as soon as you hit maybe 10,000 or 100,000 elements,

05:47:13

then it starts taking longer than a few seconds or a few minutes or a few days,

05:47:17

and at that point you can no longer use that particular algorithm.
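The extrapolation above is simple arithmetic. A back-of-the-envelope sketch, assuming the 12-second figure for 10,000 elements quoted in the lecture; real timings depend on the machine, and constant factors and lower-order terms are ignored.

```python
# Assumption: sorting 10,000 numbers took about 12 seconds (the figure quoted above)
BASE_N = 10_000
BASE_SECONDS = 12

def estimated_seconds(n):
    # Quadratic growth: time scales with (n / BASE_N) ** 2
    return BASE_SECONDS * (n / BASE_N) ** 2

for n in [10_000, 100_000, 1_000_000, 10_000_000]:
    secs = estimated_seconds(n)
    print(f"{n:>10,} elements: {secs:>12,.0f} s  (~{secs / 86_400:.1f} days)")
```

Under this assumption, 100,000 elements take about 1,200 seconds (20 minutes), a million take about 120,000 seconds (roughly 1.4 days), and 10 million take roughly 139 days.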

05:47:21

So we need to optimize bubble sort,

05:47:25

and the inefficiency in bubble sort comes from the fact

05:47:29

that we are shifting elements by at most one position at a time.

05:47:33

So each time we go through the list,

05:47:35

we capture some information about the list.

05:47:38

But we are simply moving one element from left to right, so to speak.

05:47:43

And each time we are just moving it one at a time by doing swaps.

05:47:47

Rather it would be nice to just place elements directly,

05:47:51

maybe a few positions ahead,

05:47:53

and that's where we will look at some optimized algorithms.

05:47:58

Now another common algorithm that is used is called insertion sort,

05:48:03

and here is the code for insertion sort,

05:48:06

so you can look through the code for insertion sort here.

05:48:10

And here is an example, you can see how it works,

05:48:14

and we will not look into insertion sort in a lot of detail,

05:48:17

but roughly this is how you arrange cards in your hand,

05:48:20

which is by starting to move cards around,

05:48:24

so that on the left side you have the sorted cards,

05:48:29

on the right edge you have the unsorted cards,

05:48:31

and you keep moving the new cards into sorted positions.

05:48:35

That's how it works.
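For comparison after you try the exercise below, here is a sketch of the card-sorting idea using the common "shift and insert" formulation of insertion sort. It is not necessarily identical to the notebook's code, so it is worth checking your plain-English description against both.

```python
def insertion_sort(nums):
    # Work on a copy so the input list is left untouched
    nums = list(nums)
    # nums[:i] is the sorted "hand"; nums[i:] are the cards still to place
    for i in range(1, len(nums)):
        current = nums[i]
        j = i - 1
        # Shift larger sorted elements one slot to the right to open a gap
        while j >= 0 and nums[j] > current:
            nums[j + 1] = nums[j]
            j -= 1
        # Drop the current card into the gap
        nums[j + 1] = current
    return nums

print(insertion_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```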

05:48:37

So here's an exercise for you, go through this function,

05:48:41

read the source code, and then describe the algorithm in plain English.

05:48:45

Now reading source code is an essential skill for software development,

05:48:48

this is something that you'll have to do in your work,

05:48:51

whether you're doing software development or data science,

05:48:53

maybe because there are no comments in the code,

05:48:55

there is no documentation, or the person who has written the code

05:48:58

is not available or has left the company,

05:49:00

or this is some open source library.

05:49:03

So in all these cases you will have to read and understand code,

05:49:06

so read it and then describe the insertion sort algorithm in plain English,

05:49:10

then look it up online and see if it matches what you've written.

05:49:14

And then second is to also determine the time and space complexity of insertion sort,

05:49:19

and see if it is any better than bubble sort,

05:49:22

and explain why or why not.

05:49:24

So these are a couple of exercises for you.

05:49:28

So that's bubble sort and insertion sort.

05:49:31

Now before we continue,

05:49:33

I just want to remind you that this is a Jupyter notebook,

05:49:37

running on an online platform,

05:49:39

hop.binder.jovind.ml and since this is free,

05:49:42

it will shut down after some time,

05:49:44

so you want to capture a snapshot of your work at regular intervals,

05:49:48

and that's where you can use the Jovian library.

05:49:51

So you install the Jovian library using pip install jovian,

05:49:54

import jovian and then run jovian.commit().

05:49:57

When you run jovian.commit(), it captures a snapshot of this Jupyter notebook

05:50:01

and puts it on your Jovian profile.

05:50:04

So this is what you get on your profile when you run jovian.commit(),

05:50:07

and you will be able to resume your work

05:50:09

by clicking the run button on this page anytime.

05:50:12

And this notebook will go to your profile,

05:50:14

so you can just click on your Jovian profile or just click home here.

05:50:18

And if you check either the Overview or the Notebooks tab,

05:50:22

you should be able to find your notebook here.

05:50:27

Here you go.

05:50:28

Okay, coming back now,

05:50:30

we're at step six, where we want to apply the right technique

05:50:33

to overcome the inefficiency in the algorithm.

05:50:37

Now to perform sorting more efficiently,

05:50:39

we will apply a strategy called divide and conquer.

05:50:42

And divide and conquer is a very common strategy

05:50:45

used across the board for many different kinds of algorithms.

05:50:49

And it has these general steps, which are applied

05:50:53

in different ways across different problems.

05:50:56

So step one is to divide the inputs into two roughly equal parts.

05:51:00

Okay, they don't have to be exactly equal,

05:51:02

but two roughly equal parts.

05:51:04

And the idea here is that those two parts

05:51:08

can themselves be used as inputs to sub-problems.

05:51:11

So then we use recursion,

05:51:13

so we recursively solve the problem individually

05:51:17

for each of the two parts.

05:51:19

So here you have a problem,

05:51:21

you have created two sub-problems out of it

05:51:22

and then you call recursion.

05:51:24

So the recursive solution itself will use divide and conquer

05:51:27

and then we'll keep going and so on.

05:51:29

But once it gives you the solution,

05:51:31

combine the results to solve the problem

05:51:34

for the original inputs.

05:51:36

Okay, so you have now results of the sub-problems

05:51:39

and you combine them and you get back the final result.

05:51:42

And the last thing to keep in mind

05:51:45

is that you're going to keep

05:51:48

doing this division recursively.

05:51:50

So if you have an input of size 100,

05:51:52

you will call the same function on inputs of size 50 and 50,

05:51:58

then each of those calls on 50 elements

05:52:00

will call the same function on inputs of size 25 and 25.

05:52:03

So you keep halving, and as you keep going,

05:52:05

you will eventually end up with small or indivisible inputs.

05:52:08

And that is where you can solve the problem directly

05:52:11

and include terminating conditions.

05:52:13

So that's where the recursion stops.

05:52:15

Okay, so you include terminating conditions

05:52:17

for small or indivisible inputs.

05:52:20

So that's divide and conquer.

05:52:22

You take the problem, divide it into two sub-problems,

05:52:24

recursively solve the sub-problems,

05:52:26

get the solutions of the sub-problems and then combine them.

05:52:29

So you can also call it divide-conquer-combine, in some sense.

05:52:35

And merge sort is the algorithm

05:52:38

that is the classic application of divide and conquer

05:52:41

to the sorting problem.

05:52:42

So let's take a look at merge sort

05:52:44

by looking at an example visually.

05:52:47

So here we have a list that needs to be sorted

05:52:50

in increasing order.

05:52:53

So remember, step one, divide the problem into two sub-problems.

05:52:58

So here we have half the list, a little more than half.

05:53:01

Here we have another half.

05:53:02

So we have split it into four elements and three elements.

05:53:05

Then we call recursively, we call the same sorting problem,

05:53:10

the same algorithm on these two.

05:53:12

So we split it into 38 and 27 in one half, and 43 and 3 in the other.

05:53:17

Here 9 and 82 become one half and 10 becomes the other.

05:53:21

Again, we can split 38 and 27.

05:53:23

We can split 43 and 3, and 9 and 82, while 10 is already a single element.

05:53:26

So now we've ended up with single elements.

05:53:29

So with recursion, we've ended up at this

05:53:32

terminating condition.

05:53:33

We can no longer split the list.

05:53:35

So now we start combining the problems.

05:53:37

Now if you're looking to sort a list with just one element,

05:53:41

38, well that list is already sorted.

05:53:44

So you can return that.

05:53:45

And 27 is already sorted, the single element.

05:53:48

So you return that.

05:53:49

Now we have these two sub lists and we need to combine them.

05:53:52

Each has one element, so we can simply compare these two elements.

05:53:55

And we can tell that 27 comes first and 38 comes second.

05:53:59

So that's how you combine these two results to get 27, 38.

05:54:02

Then similarly, you combine 43 and 3 to get 3, 43, and 9 and 82 to get 9, 82, while 10 stays as it is.

05:54:09

Next you can combine these two results.

05:54:12

So this is where now the combination is important.

05:54:15

Okay, we need to look through and we can probably tell that three should come first.

05:54:18

And then 27 and then 38 and then 43.

05:54:22

So we've combined them here.

05:54:24

And similarly here we've combined 9, 10 and 82.

05:54:27

And then we take the final results.

05:54:30

These two final lists and then we combine them back to get the fully sorted list.

05:54:34

Okay, and we'll talk about this combination or what is called the merge operation

05:54:38

in a lot more detail.

05:54:40

Soon, but this is roughly the idea here.

05:54:43

You keep splitting it into half and then you combine the halves.

05:54:47

So let's now state it in plain English.

05:54:50

First, the terminating condition: if the input list is empty or contains just one element,

05:54:56

then it is already sorted, so return it.

05:54:59

If it is not, divide the list of numbers into two roughly equal parts.

05:55:04

Then sort each part recursively using the merge sort algorithm.

05:55:10

And by the power of recursion you will get back two sorted lists.

05:55:16

Then merge the two sorted lists to get a single sorted list.

05:55:20

This is the key operation here, and this is why it's called merge sort,

05:55:23

because we are always merging sorted lists and making bigger and bigger sorted lists out of them.

05:55:29

And the merge operation is something that you may be asked to write in an interview or a coding challenge,

05:55:34

apart from the whole merge sort operation itself.

05:55:37

So this is something that you can try to explain yourself.

05:55:41

So try to think about how the merge operation might work and explain it in your own words.

05:55:45

Here is some space for you.

05:55:48

But let's jump into the implementation of merge sort then.

05:55:55

Now we will implement merge sort, assuming that we already have a helper function called merge.

05:56:02

And this is a very useful trick where your program may need some complicated piece of logic or some logic which you have not figured out yet.

05:56:12

So all you do is assume that you already have the function, use it first, and then implement it later.

05:56:20

So here's a merge sort algorithm.

05:56:23

So now we have the merge sort algorithm and we have numbers here given as an input to merge sort.

05:56:29

Now here's the terminating condition.

05:56:31

If the length of numbers is less than or equal to one, which means the list is empty or has just one element, return the numbers.

05:56:39

If not, get the midpoint.

05:56:42

So mid is the length of numbers divided by two.

05:56:46

And remember, we're using the double slash here, because a single slash would return a decimal (a float).

05:56:51

And we cannot use a decimal as an index or a position in the list.

05:56:55

So that's why we're using the double slash here.

05:56:57

So we take the length of numbers divided by two.

05:57:00

So if the size of the list is 10, we get back five here.

05:57:04

Then we split the list into two halves, and here's some interesting syntax.

05:57:08

And let's look into what the syntax actually means.

05:57:11

So let's say you have a list.

05:57:18

So this is the list we have, and let's see what value mid has.

05:57:23

Well we can check it here one, two, three, four, five, six.

05:57:26

So six elements divided by two: mid has the value three.

05:57:31

Now let's check x[:mid].

05:57:34

What does that give us?

05:57:36

Well that gives us one, three, five.

05:57:38

Well, x[:mid] actually means x[0:mid].

05:57:43

And x[0:mid] means all the elements from position zero

05:57:48

until just before position mid.

05:57:50

So that's very important.

05:57:51

Once again it's like a range.

05:57:53

So you get the indices at position zero, one and two, not at position three.

05:57:58

Okay, so that gives us these three elements.

05:58:01

Then let's check the other thing.

05:58:03

x[mid:].

05:58:06

Now what this gives you is this gives you the elements starting from the position mid.

05:58:12

All the way to the end.

05:58:14

You could also write the end position explicitly here, as

05:58:17

len(x) (not -1, which would leave out the last element), or we can just skip it.

05:58:22

And Python will automatically interpret that you want all the elements starting from mid to the end.

05:58:27

That is 12, five and one. So position three, four and five.

05:58:31

And hence, to split the list, all we need are these two slices:

05:58:38

1, 3, 5 and 12, 5, 1.
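The slicing experiment from the notebook, reproduced as a small sketch with the same list. The last line shows why -1 is not a substitute for len(x) as the end of a slice.

```python
x = [1, 3, 5, 12, 5, 1]
mid = len(x) // 2        # 6 // 2 == 3, while 6 / 2 would give the float 3.0
print(x[:mid])           # [1, 3, 5]   indices 0, 1, 2 (stops before mid)
print(x[mid:])           # [12, 5, 1]  indices 3, 4, 5 (runs to the end)
print(x[mid:len(x)])     # [12, 5, 1]  same as x[mid:]
print(x[mid:-1])         # [12, 5]     -1 as the end excludes the last element
```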

05:58:41

We get back two parts of the list.

05:58:44

So this is a nice thing about Jupyter: whenever you don't understand a line of code,

05:58:47

Just create a cell above or below and try out a simple example.

05:58:51

So now we have the left half, numbers[:mid], and then the right half,

05:58:57

numbers[mid:].

05:58:58

Now here's where the magic happens. We call the function recursively.

05:59:02

So we call the merge sort function itself.

05:59:05

So we call merge_sort on left, and that gives us back a sorted list for the left half called left_sorted.

05:59:12

And then we call merge_sort on right, and that gives us back a sorted list called right_sorted.

05:59:18

And then we combine the results of the two halves by calling the merge operation.

05:59:22

So now we are saying that we want to merge left_sorted and right_sorted

05:59:26

To get back the final sorted numbers and then we return the sorted numbers.

05:59:31

So that's merge sort.
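A sketch of merge sort as just described. Because merge has not been written yet at this point in the lecture, the block includes a compact two-pointer merge (the approach explained next) so that it runs on its own. Names follow the lecture; this merge uses <= so that equal elements keep their original order (a stable sort), which is our own design choice.

```python
def merge(nums1, nums2):
    # Walk both sorted lists with one pointer each, always taking the smaller head
    merged = []
    i, j = 0, 0
    while i < len(nums1) and j < len(nums2):
        if nums1[i] <= nums2[j]:
            merged.append(nums1[i])
            i += 1
        else:
            merged.append(nums2[j])
            j += 1
    # One list is exhausted; the other's remainder is already sorted
    return merged + nums1[i:] + nums2[j:]

def merge_sort(nums):
    # Terminating condition: an empty or single-element list is already sorted
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2
    left, right = nums[:mid], nums[mid:]
    # Recursively sort each half, then merge the two sorted halves
    left_sorted, right_sorted = merge_sort(left), merge_sort(right)
    return merge(left_sorted, right_sorted)

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```

Note that with len(nums) // 2 an odd-length list is split with the smaller half on the left (3 and 4 elements here), whereas the diagram splits it 4 and 3; either split works.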

05:59:33

So yeah, it almost seems like magic, but it's pretty small and pretty straightforward.

05:59:40

Only about four, five lines of code if you combine some of these lines.

05:59:45

So then let's come to the merge operation because that seems to be the meat here, right?

05:59:50

This is the only missing piece.

05:59:52

So to merge two sorted arrays, what we can do is repeatedly compare the smallest remaining elements of the two arrays

05:59:59

and copy over the smaller one into a new array.

06:00:02

So here's what that process might look like.

06:00:04

Let's say you have these two parts, one, four, seven and zero, two, three.

06:00:07

And we want to get this sorted list and notice that these are both already sorted because these are the results of the recursive calls to merge sort.

06:00:15

So we keep a pointer at the left end of each one.

06:00:18

Here we have the pointer at one, and here we have the pointer at zero.

06:00:21

We compare the two.

06:00:22

We take the smaller one and put it in the list.

06:00:25

How do we know we can put it first?

06:00:27

Because zero is smaller than one, and all the numbers after zero in its own list are greater than zero.

06:00:32

And since one is greater than zero, all the numbers after one are greater than zero too,

06:00:36

because they are greater than one.

06:00:37

So that follows that all the other numbers to the right of one and to the right of zero are greater than zero.

06:00:43

Hence zero should come in the first position.

06:00:46

So we put it there and advance the pointer.

06:00:48

Now we compare one and two, and this time one is smaller.

06:00:52

And you know that all the numbers here are greater than two.

06:00:55

So they're also greater than one.

06:00:57

And then all the numbers here are also greater than one.

06:00:59

Hence we know that one is now the next smallest number.

06:01:02

So we can now put in one and advance the pointer.

06:01:05

And keep going.

06:01:06

This time now you compare two and four.

06:01:08

So now you can put in two and advance the pointer.

06:01:10

Now you put in three and then advance the pointer.

06:01:12

And at some point you will exhaust one of the lists.

06:01:15

And when you exhaust one of the lists then you can stop comparing and you can simply copy over the remaining elements.

06:01:20

So we can now copy over four and seven and we've exhausted this list.

06:01:23

And we get back the sorted list: zero, one, two, three, four, seven.

06:01:27

So it's really simple.

06:01:28

Each step involves one comparison and incrementing one pointer.
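To follow the walkthrough step by step, here is a sketch of the same two-pointer merge that also records what happens at each step. The name merge_with_trace and the trace messages are our own, for illustration.

```python
def merge_with_trace(left, right):
    # Same two-pointer merge, but each step is recorded for inspection
    merged, steps = [], []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            steps.append(f"compare {left[i]} vs {right[j]} -> take {left[i]} from left")
            merged.append(left[i])
            i += 1
        else:
            steps.append(f"compare {left[i]} vs {right[j]} -> take {right[j]} from right")
            merged.append(right[j])
            j += 1
    rest = left[i:] + right[j:]
    steps.append(f"one list exhausted -> copy remaining {rest}")
    return merged + rest, steps

result, steps = merge_with_trace([1, 4, 7], [0, 2, 3])
for step in steps:
    print(step)
print(result)  # [0, 1, 2, 3, 4, 7]
```

For the lecture's example this makes four comparisons (taking 0, 1, 2 and 3) and then copies the remaining 4 and 7 in one go.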

06:01:33

So you're either incrementing this pointer or you're incrementing this pointer.

06:01:38

Okay.

06:01:39

So let's now define the merge operation.

06:01:41

And you can see the benefit now of assuming that the function already existed.

06:01:45

Now we do not have to worry about the actual sorting and recursion etc.

06:01:50

We simply have to worry about merging two sorted arrays.

06:01:58

So first we'll create a list to store the results.

06:02:01

And we have nums1 and nums2, the left and right lists that we are going to combine.

06:02:05

Then we're going to set up two indices or two numbers for iteration.

06:02:09

So we have two pointers on the two lists.

06:02:11

And we set up each of them at position zero.

06:02:15

So each of them are currently at position zero here.

06:02:18

And we loop over the two lists.

06:02:20

So we say while i < len(nums1) and j < len(nums2).

06:02:25

So if you have four elements in the left list, then i can go from zero to three, all four positions.

06:02:32

And if you have five elements in the right list, j can go from zero to four, all five positions.

06:02:38

Remember, we want to make sure that both of these indices are valid.

06:02:45

If either of them has reached the end, then we exit the loop and simply copy over the remaining list.

06:02:51

Right. As you saw here, as soon as we reach that point, there are no more comparisons to be made.

06:02:55

So we can exit the loop.

06:02:57

So now we check which one is smaller.

06:03:00

So if nums1[i], the current element of the left list, is smaller than nums2[j],

06:03:06

then we append nums1[i] to the merged list and increment i, which is exactly what we did here.

06:03:15

So we put in.

06:03:17

For example, here we put in one and we increment the left pointer.

06:03:23

On the other hand if that's not true.

06:03:25

we append the element from the right, nums2[j], and we increment the right pointer.

06:03:30

So in each case in each while loop we are incrementing one of the pointers.

06:03:35

And then when the while loop ends one of the lists would have been exhausted.

06:03:40

That's when the while loop ends. So we can get the remaining parts of both the lists.

06:03:44

nums1[i:] will get the remaining elements of the first list,

06:03:48

the left list, and nums2[j:] will get the remaining elements of the right list.

06:03:52

But remember, since one of them is exhausted, one of these two is going to be empty.

06:03:57

Right. Now we could check which one is empty and simply add the remaining one.

06:04:02

But here's a simpler solution: we just add both of them to the merged list.

06:04:07

So we append both the lists at the end and this automatically takes care of the empty case.

06:04:12

If the left side becomes empty, then this adds nothing to the merged list, and this adds the remaining numbers from the right side.

06:04:18

If the right side becomes empty then this adds the remaining numbers from the left side and this adds nothing.

06:04:23

So that's a small trick.

06:04:26

So that's the merge operation. Again, not very difficult.
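The merge operation as just described, sketched with comments that follow the narration: a results list, one pointer per input, a loop that runs only while both pointers are valid, and the trick of appending both remainders. This sketch uses <= rather than a strict < so that ties are taken from the left list, which keeps merge sort stable; with < the output is still sorted.

```python
def merge(nums1, nums2):
    merged = []          # list to store the results
    i, j = 0, 0          # one pointer per input list, both starting at position 0
    # Loop only while both pointers are valid
    while i < len(nums1) and j < len(nums2):
        if nums1[i] <= nums2[j]:
            merged.append(nums1[i])   # take from the left list
            i += 1
        else:
            merged.append(nums2[j])   # take from the right list
            j += 1
    # After the loop, at least one of these slices is empty
    nums1_tail = nums1[i:]
    nums2_tail = nums2[j:]
    # Appending both avoids checking which list ran out
    return merged + nums1_tail + nums2_tail

print(merge([1, 4, 7, 9, 10], [0, 2, 3, 5]))  # [0, 1, 2, 3, 4, 5, 7, 9, 10]
```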

06:04:30

If you have any questions, take these lines out into separate cells and try them out with examples, and you should see them working.

06:04:39

So let's try out the merge operation now. So here we have two sorted lists you can see here.

06:04:49

And there you go. You can see that this is now arranged all these numbers are now arranged in a sorted order.

06:04:57

So now we have the merge operation and we have the merge sort operations. So we can now test out the merge sort function.

06:05:03

So we get the first set of inputs and outputs from test zero.

06:05:11

And you can see here that this is the input and this is the expected output and this was the actual output as well.

06:05:20

Now let's test all the cases using the evaluate_test_cases function from Jovian.

06:05:25

So here we're simply going to call evaluate_test_cases on the entire list of test cases.

06:05:31

And you can see all the test cases seem to be passing.

06:05:37

Now if one of these test cases had failed what you should do is you should go back and add some print statements inside your merge function or add some print statements inside your merge sort function.

06:05:47

The right places to add print statements are at the start of the function body, as its first statement, and then inside each loop.

06:05:56

So inside each loop whatever are the changing parameters you should print them inside the loop.

06:06:00

And then finally you can also print the return value of the function.

06:06:03

And this way you can build a full picture of what your function is doing and that makes it much easier to solve issues.

06:06:12

So test cases and print statements make it easy to fix errors in code, and don't worry if there are errors: there are always errors in code.

06:06:20

What's important is you should be able to find a way to fix them easily and without test cases or without printing you may get stuck and you may just keep staring at the code and trying to figure out what exactly went wrong.

06:06:32

So please do that.

06:06:35

Now, one last thing I want you to notice: here the execution took only about 50 milliseconds.

06:06:42

On the other hand, remember that bubble sort took about 15 seconds to sort 10,000 numbers.

06:06:51

So merge sort is much, much faster. Remember, a millisecond is 0.001, or 10 to the power minus 3, seconds.

06:06:58

So in one second you could sort about 20 lists of size 10,000.

06:07:04

And that's what makes merge sort so much more powerful: it is far more efficient, and as we analyze the complexity you will see that merge sort is also more efficient in terms of big O notation.

06:07:18

So let's analyze the algorithm's complexity and identify any inefficiencies.

06:07:25

Now, analyzing recursive algorithms can get tricky, and that's where it helps to track and follow the chain of recursive calls.

06:07:32

So what we will do is we will add some print statements to our merge sort function and our merge function.

06:07:37

That way we can see exactly how the merge sort function is invoked.

06:07:41

Okay, so we'll add a print statement inside both merge and merge sort, and we'll also track something called the depth, which tells us how deep each recursive call is in the chain.

06:07:53

And you'll see what I mean in just a second.

06:07:56

Okay, so this is what it looks like.

06:07:58

We called merge sort on this big unsorted list, and that call internally led to two more calls to merge sort.

06:08:06

You can see this one here and this one here. So you have two calls to merge sort.

06:08:10

One with the left half of the list and one with the right half of the list, and they're not exactly equal in size.

06:08:16

These two merge sort calls returned sorted lists, and finally we called a merge operation on the two of them.

06:08:24

You can see that this is the merge operation, the final merge operation called here on the two sorted lists.

06:08:32

And this merge operation is working with these two sorted lists, okay.

06:08:36

So we can see that each call to merge sort invokes merge sort twice, but each time with a list of half the size.

06:08:43

You can see merge sort was invoked with lists of half the size.

06:08:48

And it also invokes the merge function once to merge the two resulting sorted lists.

06:08:55

Now, if you observe closely, each of those two calls to merge sort itself makes two more calls to merge sort.

06:09:00

And one more call to merge.

06:09:03

And then those internal calls make two more calls to merge sort and one more call to merge, and so on.

06:09:09

Until we end up with single elements, at which point merge sort simply returns that single element.
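The traced run described above can be reproduced with a sketch like the following, where a depth argument (our own addition) controls the indentation so that each recursive call appears nested under its caller. The input list is our reconstruction of the lecture's example from the merge steps it walks through, so treat it as illustrative.

```python
def merge(nums1, nums2, depth=0):
    """Merge two sorted lists, printing the call indented by depth."""
    merged, i, j = [], 0, 0
    while i < len(nums1) and j < len(nums2):
        if nums1[i] <= nums2[j]:
            merged.append(nums1[i])
            i += 1
        else:
            merged.append(nums2[j])
            j += 1
    result = merged + nums1[i:] + nums2[j:]
    print('    ' * depth + f'merge: {nums1} + {nums2} -> {result}')
    return result


def merge_sort(nums, depth=0):
    """Merge sort that prints each call, indented by recursion depth."""
    print('    ' * depth + f'merge_sort: {nums}')
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2
    left = merge_sort(nums[:mid], depth + 1)
    right = merge_sort(nums[mid:], depth + 1)
    # The merge is printed at the same depth as the call that split the list
    return merge(left, right, depth)


result = merge_sort([5, -12, 2, 6, 1, 23, 7, 7, -12])
```

Reading only the merge lines from top to bottom gives the order of merge operations: 5 and -12, then 2 and 6, then [-12, 5] and [2, 6], and so on up to the final merge of the two halves.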

06:09:17

So the merge sort algorithm ultimately boils down to a series of merge operations.

06:09:24

You can see here that all each merge sort call does is call merge sort internally and then call a merge operation.

06:09:29

So ultimately what we are doing is we are first merging 5 and -12.

06:09:34

And then we are merging 2 and 6.

06:09:38

And then we are merging [-12, 5] and [2, 6], and then we're merging 1 and 23, and we're merging

06:09:44

7 and -12, and then we're merging 7 with [-12, 7], and then we're merging [1, 23] with [-12, 7, 7], and finally we're merging the two halves of the big

06:09:55

list, right? So it's ultimately just a whole bunch of merge operations, and if you look

06:09:59

inside the merge operation, this is where a comparison is happening and this is where the

06:10:05

append step is happening. So we are comparing and appending. Those are the two key operations

06:10:11

here, and with every comparison there is an append. So if you simply count the comparisons,

06:10:17

that should be enough to get the time complexity. And what is the number of

06:10:22

comparisons that's happening? Well, that's straightforward too. If you have two lists,

06:10:27

nums1 and nums2, and the total length of the two lists is n, then the

06:10:33

number of iterations in the worst case is equal to the lengths

06:10:39

of the two lists combined. You may have to first increment i by one, then increment

06:10:44

j by one, then once again increment i by one and j by one. So the total number of iterations here

06:10:50

is len(nums1) plus len(nums2), right? But remember, if merge sort was

06:10:57

called with a list of size n, then merge was called with two lists of size roughly n by 2 each.

06:11:03

So the total length of nums1 plus nums2 is actually the overall length n. So that's

06:11:10

the real trick here: the merge operation is an order n operation, where n is the

06:11:16

total number of elements. So this merge operation takes roughly 4 plus 5, that is 9, comparisons

06:11:25

and this merge operation takes 5 comparisons and this merge operation takes 3 comparisons and so on.

06:11:34

Now we can visualize the problem as a tree, where we're calling merge sort with

06:11:39

n elements and that ends up calling merge sort with n by 2 elements and that ends up calling

06:11:43

merge sort with n by 4 elements all the way down and then we start merging. So here when we get

06:11:49

to individual elements we are calling merge with literally single elements and as we come up here

06:11:55

at this level we are calling merge on sub problems of size n by 8,

06:12:02

but we are calling merge 8 times. So now each of these sub problems makes a call to merge and each

06:12:10

of these sub problems has the list size n by 8. So you have 8 calls to merge of size n by 8.

06:12:16

So the total number of comparisons done is n at every level. You can check this: at the top

06:12:23

level you are calling merge with n total elements, so the total number of comparisons is n.

06:12:29

At the second level you're calling merge here once with n by 2 elements and you're calling merge

06:12:33

here once with n by 2 elements. So the total number of comparisons is 2 times n by 2, that's n,

06:12:39

and here you're calling merge with n by 4 elements 4 times. So that's n again.

06:12:48

So if the height of the tree is h then the total number of comparisons is n times h. So

06:12:56

on each level you'd require n comparisons in total for the merges, and you call merge at every level

06:13:01

for each of these sub problems. So the total number of comparisons

06:13:06

is n times h. Now, how do we get the height of the tree? If the height of the tree is h, you can

06:13:14

see here that as we go down, this is level 0 and it has 1 sub problem, this is level 1 and it has 2

06:13:22

sub problems, this is level 2 and it has 4 sub problems, and this is level 3 and it has 8 sub problems.

06:13:28

So level k has 2 to the power k sub problems. So if you keep going down this is level h minus 1.

06:13:35

So level h minus 1 should have 2 to the power h minus 1 sub problems. But remember, at the last level

06:13:41

we simply have sub problems, or merge calls, with single elements. So that means we have a total

06:13:47

of n elements here or n leaf nodes here. So it follows that 2 to the power of h minus 1 is n.

06:13:58

I'll let you think about that and reason through it. This is something you may have to

06:14:02

work out with pen and paper to get it right: if the height of the tree is h, then 2 to the

06:14:09

power h minus 1 is equal to n, because at the bottom-most layer you have n leaves in the tree.

06:14:16

So it follows that h is log n plus 1, with the log taken to base 2. And since we said that there are n times h comparisons

06:14:22

and h is log n plus 1, it follows that the complexity of merge sort is order n log n.
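The counting argument above can be written out compactly. Here h is the height of the subproblem tree, and n is assumed, for simplicity, to be a power of 2 so that every split is exactly in half.

```latex
\begin{aligned}
\text{comparisons per level} &\approx n \\
\text{sub problems at level } h-1:\quad 2^{h-1} &= n
  \;\;\Longrightarrow\;\; h = \log_2 n + 1 \\
\text{total comparisons} &\approx n \cdot h = n(\log_2 n + 1) = O(n \log n)
\end{aligned}
```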

06:14:29

And that's a big improvement over n squared. It may not seem like much, but it is. So n squared for

06:14:37

10,000 is 10,000 times 10,000, but n log n for 10,000 is 10,000 times about 13, because log to the base 2 of 10,000 is about 13.3.
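A quick back-of-the-envelope check of these numbers, treating n^2 and n log2 n as rough operation counts with constant factors ignored:

```python
import math

n = 10_000
quadratic = n ** 2                   # rough step count for bubble sort
linearithmic = n * math.log2(n)      # rough step count for merge sort
ratio = quadratic / linearithmic

print(f'log2(n)  = {math.log2(n):.2f}')
print(f'n^2      = {quadratic:,}')
print(f'n log2 n = {linearithmic:,.0f}')
print(f'ratio    = {ratio:.0f}')
```

log2(10,000) is about 13.3, so n log2 n is roughly 133,000 against 100,000,000 for n^2, a ratio of about 750.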

06:14:50

So that's several hundred times fewer operations. Now, even for a list of a million elements, it will

06:14:56

only take a few seconds to be sorted, and you can verify this by actually creating a list of

06:15:01

a million elements. So the complexity of merge sort is n log n, and you get it by drawing

06:15:06

this subproblem tree and realizing that it has a height of log n

06:15:12

or log n plus 1, and at each level you perform one or more merge operations

06:15:20

totaling n comparisons. So n times log n is the complexity of merge sort.
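To check the n log n estimate empirically, the sketch below counts the comparisons made inside merge (one per loop iteration) and compares the total with n * ceil(log2 n), that is, n comparisons per level times the number of levels. The helper name merge_sort_count is ours, and the random inputs are seeded so the run is repeatable.

```python
import math
import random


def merge_sort_count(nums):
    """Return (sorted copy of nums, number of comparisons made in merges)."""
    if len(nums) <= 1:
        return list(nums), 0
    mid = len(nums) // 2
    left, left_count = merge_sort_count(nums[:mid])
    right, right_count = merge_sort_count(nums[mid:])
    merged, i, j, count = [], 0, 0, 0
    while i < len(left) and j < len(right):
        count += 1                     # one comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged += left[i:] + right[j:]
    return merged, left_count + right_count + count


random.seed(0)
counts = {}
for n in [1_000, 10_000, 100_000]:
    nums = [random.randint(-10**6, 10**6) for _ in range(n)]
    result, comparisons = merge_sort_count(nums)
    assert result == sorted(nums)
    counts[n] = comparisons
    bound = n * math.ceil(math.log2(n))
    print(f'n={n:>7,}  comparisons={comparisons:>9,}  n*ceil(log2 n)={bound:>9,}')
```

The count always stays at or below n * ceil(log2 n), because each level of the tree merges at most n elements with fewer than n comparisons.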

06:15:26

Now, there's also a discussion about space complexity here, and this is something that I will leave

06:15:31

as an exercise for you. So do read through it and see if you can reason out why the space complexity

06:15:37

of merge sort is order n. So time complexity is order n log n and the space complexity is order n.

06:15:46

But here's a hint as to why it's order n. You can see that inside the merge operation we are

06:15:51

creating a new list and then we are copying over elements from each of the two lists into the

06:15:55

new list. So we are allocating a new list inside merge, and that's no longer

06:16:04

constant: that list has the same size as the problem itself, and roughly

06:16:09

that's why the space complexity is order n. Okay. So with that we conclude our discussion of

06:16:17

merge sort. It's a divide and conquer algorithm: you split the list in half, recursively

06:16:21

sort both halves, then merge the two sorted lists, and the base case is a list with 1 or 0 elements.

06:16:28

Now, there are several extensions and variations of merge sort. There's k-way merge sort, where we split

06:16:35

not into two parts but into k parts. Then there's the counting inversions problem, where

06:16:42

we modify merge sort a little bit to also find some other information about the list and finally

06:16:48

we have hybrid algorithms, which combine merge sort and insertion sort. What they do is, for smaller

06:16:54

lists they use insertion sort because that's more efficient there, and for bigger lists they use merge

06:16:59

sort. So they keep splitting the list, and when they get to a small enough problem, let's say 10 or fewer

06:17:04

elements, they switch to insertion sort. And that brings us to our next topic. Often we make one level

06:17:16

of optimization and then we stop, but here we will go one step further: we will apply

06:17:23

another technique to overcome the inefficiency in merge sort. Now, the time complexity is pretty good:

06:17:29

you can actually sort millions or even tens of millions of elements with merge sort quite reliably

06:17:35

but it's the space complexity that causes a problem. Because merge sort requires allocating

06:17:41

additional space, and that additional space is as large as the input itself, it is somewhat

06:17:47

slow in practice, because memory allocation is more expensive than computation. Doing a

06:17:53

comparison is cheap: you just tell the CPU to compare two things in memory. Swapping them

06:17:58

is also easy because you're still working with memory that you already have, but when you have to

06:18:03

allocate new memory you often have to then request the operating system to allocate the new

06:18:08

memory, get its address, and do a whole bunch of bookkeeping, so it's, let's say, an

06:18:14

order of magnitude more expensive than simply doing some computations. So you should try to avoid

06:18:20

memory allocations as far as possible. One or two variables is fine, but if you're dealing with

06:18:24

a million elements, you're probably going to need a few MB of additional space, and that is

06:18:30

what may slow down your algorithm a little. It would still be n log n, but the constant

06:18:38

factor, the cost of each operation, will be higher because it involves an allocation.

06:18:45

Now, to overcome the space inefficiency of merge sort, we will study

06:18:49

another divide and conquer sorting algorithm, called quicksort.

06:18:56

Quicksort sorts the array in place, which means it does not create a copy of the array internally

06:19:03

for sorting inside each combination operation. So let's see how it works;

06:19:07

it's a pretty interesting, pretty smart trick. Here's how it works: if the list is empty or has

06:19:13

just one element, return it; it's already sorted. That's straightforward. If not,

06:19:20

pick a random element from the list. This element is called the pivot. Now, there are many

06:19:27

strategies for picking a pivot: one is to pick a random element, another is to pick the first

06:19:31

element or the last element. What we will do is pick the last element, but you can easily

06:19:38

modify our implementation to pick a random element. Then we reorder the list, and this is the

06:19:46

key operation here: reorder the list so that all the elements with values less than or equal to the

06:19:51

pivot come before the pivot element, while all the elements with values greater than the pivot

06:19:58

come after it. This operation is called partitioning: you're partitioning the array

06:20:03

around the pivot. Here's an example: let's say we take 3, the final

06:20:10

element, as the pivot. Now we want to reorder the elements, and the way we reorder them is by doing

06:20:15

comparisons and swaps in whatever way we can. That's what we will really focus on: the partitioning

06:20:20

algorithm. You reorder the list in such a way that all the numbers to the left of the pivot are

06:20:28

smaller than it and all the numbers to the right of the pivot are larger than it. Now here's the key

06:20:35

observation: once you do that, you can tell that all these numbers can now be sorted

06:20:43

independently, and none of the numbers from here will move to the right of the pivot. Similarly, all these

06:20:48

numbers can also be sorted independently, and none of the numbers here will move to the left of the pivot.

06:20:52

So the pivot is now in its correct

06:20:57

final position in the sorted array, and you can simply call quicksort on this portion of the array

06:21:04

and on this portion of the array. There's no combination step required anymore, because we're

06:21:10

doing it all in place. We simply call quicksort on each side of the array, and once this side gets sorted

06:21:17

and this side gets sorted recursively, you end up with the entire sorted list. And

06:21:23

that's how we continue the process recursively: on the left part you once again pick

06:21:27

a pivot and arrange the elements around it, and on the right part you once again pick a pivot

06:21:32

and arrange the elements around it, and so on. Okay, so as I said, the key observation

06:21:38

here is that after partitioning, the pivot element is at its correct place in the sorted array, and

06:21:42

the two parts of the array can be sorted independently, in place. Once again, take pen and

06:21:48

paper and try to work it out yourself; all of this makes a lot more sense when you

06:21:55

actually write it down and work through a real example. So here's an implementation of quicksort,

06:22:02

and once again we will assume that we already have a helper function called partition

06:22:07

which can pick a pivot, partition the array, and return the position of the pivot element

06:22:12

for the next quicksort step. Okay, so in this entire process, going from here to here, we

06:22:18

assume that we have a partition function, write the quicksort algorithm, and then implement the partition

06:22:23

function. So here's what quicksort might look like. Quicksort takes a list of numbers,

06:22:30

and apart from the numbers it also takes a start index and an end index. Now, why are we doing this?

06:22:35

Remember, we want to avoid creating copies of the list; that's the whole

06:22:40

line of thinking here. So we will call quicksort not with a sublist, which is a copy of

06:22:49

a portion of the list, but by passing the same original list

06:22:55

and changing the start and end indices. Okay, now there is some code here: if end is None,

06:23:02

then we set end to the length of the list minus 1. And here's one more thing that we're doing.

06:23:08

The top-level call to quicksort that we'll make will be something like this: we may call quicksort

06:23:17

on, let's say, a few numbers here. So we may call quicksort

06:23:26

on a list like this, and in this case start will automatically have the value

06:23:30

0 and end will have the value None. Now remember, quicksort is going to sort the array

06:23:34

in place, but we also said that we don't want to modify our test cases. So here's one assumption

06:23:40

we are making: if end is None, which means quicksort was called with just the list, then we'll

06:23:46

create a copy of the list. So we create just one copy at the very beginning, when the

06:23:51

list is passed in for the first time, and then we won't create any more copies. You can even

06:23:56

skip this line entirely; the only trouble is that we would then start changing our test case inputs,

06:24:02

so let's keep it and make a copy, but this is only done at the very top level.

06:24:08

Only when we start do we create a copy, so that we are not modifying the input list, but never again.

06:24:15

So that's what we're doing here: creating a copy if quicksort was called with just a list, and setting end

06:24:20

to len(nums) minus 1, which is the final valid index in the list. Anyway, putting this aside, this is the

06:24:27

real condition here: start is less than end. Let's say here you have start and

06:24:34

here you have end. If start is less than end, that means you have two or more elements;

06:24:39

if start and end are equal, that means you have just one element; and if start is greater than end,

06:24:45

that means you have zero elements. So if start is less than end, which means you have

06:24:50

at least two elements, then we call the partition function on nums

06:24:56

and say that we want to partition the region from start to end. So let's say this is the region

06:25:00

from start to end. We want to pick a pivot and then partition the region in such a

06:25:07

way that elements to the left of the pivot are smaller than it and elements to the right

06:25:12

of the pivot are larger than it. For example, if you want 4 to be the

06:25:17

pivot element, then we will partition the region as 3, 4, 5, 23, so that

06:25:25

3 is smaller than 4, and 5 and 23 are bigger than 4, and we will return the position

06:25:29

of the pivot element, okay.

06:25:31

So you partition the region and return the position of the pivot element, so this is

06:25:37

the position we get back, and then we can call quicksort on this region and on this region.

06:25:42

So we can now call quicksort on start to pivot minus 1, and we can call quicksort on pivot

06:25:49

plus 1 to end, okay.

06:25:51

So now we are explicitly passing in values for start and end, so the copying step

06:25:57

will not kick in the next time, and no more copies of the list will be created. All

06:26:00

the recursive calls will keep modifying the list in place; even the partition call will

06:26:06

modify it in place, and we will see how partition works in just a moment. So partition gets

06:26:10

a region of the original list, and it returns the position of the pivot element. Then

06:26:16

we call quicksort on the left region, which is before the pivot: the elements smaller

06:26:21

than the pivot. And then we call quicksort on the right region, which has the elements that

06:26:27

come after the pivot, okay.
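Putting the pieces described so far together, here is one way quicksort and its partition helper could look. This is a sketch under the lecture's assumptions (last element as pivot, start and end as inclusive indices, a single defensive copy at the top level); the partition step follows the two-pointer scheme explained next, and the exact code in the course notebook may differ.

```python
def partition(nums, start=0, end=None):
    """Partition nums[start..end] in place around the pivot nums[end].

    Elements <= pivot end up to its left, elements > pivot to its right.
    Returns the final index of the pivot.
    """
    if end is None:
        end = len(nums) - 1
    pivot = nums[end]
    left, right = start, end - 1
    while right > left:
        if nums[left] <= pivot:
            left += 1                   # already on the correct (left) side
        elif nums[right] > pivot:
            right -= 1                  # already on the correct (right) side
        else:
            # Both elements are on the wrong side: swap them
            nums[left], nums[right] = nums[right], nums[left]
    # Move the pivot between the two parts, unless it is already there
    if nums[left] > pivot:
        nums[left], nums[end] = nums[end], nums[left]
        return left
    return end


def quicksort(nums, start=0, end=None):
    """Sort nums with quicksort and return the sorted list.

    Only the top-level call (end is None) makes a copy, so the caller's
    list is not modified; recursive calls work in place on that copy.
    """
    if end is None:
        nums = list(nums)
        end = len(nums) - 1
    if start < end:                     # two or more elements in nums[start..end]
        pivot_index = partition(nums, start, end)
        quicksort(nums, start, pivot_index - 1)   # elements <= pivot
        quicksort(nums, pivot_index + 1, end)     # elements > pivot
    return nums


numbers = [5, 2, 6, 1, 23, 7, -12, 12, -243, 0]
print(quicksort(numbers))
print(numbers)   # the original list is left unchanged
```

Because only the top-level call copies, the recursion itself allocates no new lists; removing the copy would make the function sort the caller's list in place.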

06:26:30

Now here is how the partition operation works. It's pretty straightforward, not

06:26:37

that difficult. We pick the final element as the pivot,

06:26:43

but if you don't want to pick the final element and want to pick a random element instead,

06:26:46

just pick a random position and move that element to the final position, and that

06:26:51

is as good as picking the final element. So a random pivot simply involves picking a random element and

06:26:57

moving it to the final position. Assuming the pivot is in the final position, we

06:27:02

then keep two pointers, left and right. Remember, we want to push

06:27:09

all the numbers smaller than the pivot to the left and we want to push all the numbers

06:27:14

larger than the pivot to the right. Ultimately, we will arrange

06:27:19

them in such a way that some of these are smaller than the pivot and some of these are

06:27:23

larger than the pivot, and then we will move the pivot between them. Let's see how

06:27:28

to do that. You have the left pointer and the right pointer, and here is what we

06:27:32

do inside partition: while the two pointers haven't met, first we check

06:27:40

if the element at the left pointer is smaller than the pivot. If the element at the

06:27:45

left pointer is smaller than the pivot, which it is here, you simply advance the left pointer

06:27:49

forward, so it moves to 5, and then we go on to the next iteration. This time, once again,

06:27:57

we check if the element the left pointer points to is smaller than the pivot. 5 is

06:28:03

not smaller than 3, it's greater than 3, so in that case we check if the element at the right

06:28:09

pointer is greater than the pivot. If the element at the right pointer is greater than the pivot, that

06:28:15

means this number is already on the correct side, so we move the

06:28:20

right pointer back one space. That is the operation we just did. Now once again

06:28:26

we check: is the left element smaller than the pivot? No, it is not. Is the right element

06:28:32

greater than the pivot? No, it is not. So these two numbers are both out of place:

06:28:37

ideally we would want this one to be smaller than the pivot and this one to be larger than

06:28:41

the pivot, so we swap these two elements. Now 0 comes here and 5 comes here. Once

06:28:46

again we check: is 0, the left element, smaller than the pivot? Yes, so we move the left

06:28:52

pointer forward. Then, is the left element smaller than the pivot? No, 6 is greater

06:28:56

than 3, so we check: is the right element larger than the pivot? Yes, so we move the right

06:29:02

pointer back, because 5 is already in a correct position: it is on the right

06:29:06

side, where everything should be greater than 3. So once again we end up in a position where

06:29:11

the left element is larger than the pivot, so we check the right

06:29:15

element. The right element is smaller than the pivot, but we want it to be larger, so we

06:29:22

swap these two, because once again they are out of order. And now you can see that

06:29:26

1, 0, 2 are all smaller than the pivot and 6, 5, 11 are all larger than the pivot. We

06:29:32

do one final check: is 2 smaller than the pivot? Yes, so we advance the left pointer, and now

06:29:39

both pointers are at the same position. At this point we can tell that

06:29:46

from this position onwards, all of these numbers are larger than the pivot,

06:29:53

so we simply swap this element with the pivot. And there you go: you end up with 1, 0, 2,

06:29:59

3, 5, 11 and 6. So that is the partition operation. Again, to understand it yourself,

06:30:06

do it with pen and paper: write out this array, mark the pivot, the left pointer and the

06:30:14

right pointer, and write out a new copy of the array for each step of the loop. That's

06:30:19

how you understand these things. It's not that difficult; it just involves two pointers,

06:30:24

so it's a little tricky. Now, this is the code for partition, and I will let you follow it.

06:30:31

We'll go over it briefly, but by this point, since we're halfway into the course,

06:30:38

you should be able to read the code, along with the comments here, and understand

06:30:42

what we have just discussed in plain English in terms of code. So one

06:30:48

exercise for you is to explain this visual approach in plain English, step by step, and the

06:30:54

second exercise is to read the code and understand it, or even try to write it from

06:31:01

memory. Just take the English description and try to write the partition function from

06:31:05

memory: don't memorize the code itself, but convert the English description into code. So once again,

06:31:13

here we have the numbers that need to be partitioned, the start and the end,

06:31:24

and if end is None, we simply set end to the last index, which is len(nums) minus 1.

06:31:30

Then we initialize the left and right pointers.

06:31:34

Remember, we want to use the end element

06:31:40

as the pivot, so the left pointer is start and the right pointer is end minus 1.

06:31:46

That is what we have set here. Then, while the right pointer is greater than the left pointer,

06:31:51

we increment the left pointer if the number at the left pointer is less than or equal to the pivot;

06:31:58

otherwise we decrement the right pointer if the number at the right pointer is greater

06:32:03

than the pivot; otherwise the two of them are out of place, so we swap

06:32:10

them here. Finally, we place the pivot between the two parts, and that's it.

06:32:14

That's exactly what's happening here. So let's try out this partition function:

06:32:20

we take this list and call partition on it, and 3 is the number that was

06:32:26

used as the pivot. Now 3 ends up here in between, so you have 1, 0, 2 and 5, 11, 6,

06:32:32

and the partition function returns the position of the pivot. So now you can see how it is used

06:32:36

in quicksort: the partition function returns the position of the pivot, and then we call quicksort

06:32:47

on the left partition, before the pivot, and on the right partition, after the pivot.

06:32:52

So now we can test out quicksort. And here's another exercise for you:

06:32:52

add print statements inside the partition function. There are already some print statements;

06:32:56

you can simply uncomment them to display the list, the left pointer

06:33:01

and the right pointer at the beginning and end of every loop, to study how partitioning works.

06:33:06

Similarly, you can also add print statements inside the quicksort function to study how the

06:33:10

recursive calls proceed. Look at what we did in merge sort and add the same

06:33:16

kind of print statements to quicksort to follow its recursive calls. What you want is

06:33:23

to have a completely clear picture of what your code is doing; you don't want to be

06:33:28

lost. That's why adding print statements, looking at small examples, and making sure

06:33:33

they work perfectly really helps. So let's look at quicksort in action. Here's an input,

06:33:41

here's the expected output, and here's the actual output, and they match. Great. We can now

06:33:46

evaluate all the test cases using the evaluate_test_cases function from Jovian.

06:33:52

So we import evaluate_test_cases from jovian.pythondsa and call it here,

06:34:01

and you can see that it passes all the test cases. Not only that, you may also notice that it

06:34:09

is marginally faster than merge sort. Sometimes you may not see that,

06:34:17

but in most cases quicksort is marginally faster than

06:34:21

merge sort for larger lists, and that's because it is not allocating new space. Now, coming

06:34:28

to the time complexity of quicksort: assume that we are able to get a good partition each time,

06:34:35

so each time we divide the list into roughly equal halves. You start

06:34:42

with a list of size n and you partition it into n by 2 and n by 2. So this is what the

06:34:48

sub problem tree looks like: you call quicksort on 2 lists of size n by 2, then you call

06:34:52

quicksort on 4 lists of size n by 4, and so on. Now, what is the work that we're doing

06:34:58

inside quicksort? In each call the core operation is partition, and that's what puts

06:35:03

one element, the pivot, into its correct place, with the elements smaller than it to the

06:35:08

left and the elements larger than it to the right. So partition is where

06:35:15

the actual work, the comparing and swapping, happens. And how many comparisons do we perform in the

06:35:19

partition? I would say that the number of comparisons is roughly equal to the size of the list,

06:35:26

and you can see that here: we go along comparing numbers like this. We're

06:35:31

comparing each number to the pivot, so each number gets compared to the pivot roughly once,

06:35:37

and that means there are on the order of n comparisons in total if n is the size of the list, okay.
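To see the pointer movements from the walkthrough and the roughly linear number of comparisons, here is a sketch of the same two-pointer partition with a print after every loop iteration and a comparison counter (both our own additions). The list [1, 5, 6, 2, 0, 11, 3] is our reconstruction of the lecture's example, with 3 as the last-element pivot.

```python
def partition_traced(nums, start=0, end=None):
    """Two-pointer partition around nums[end], printing each step.

    Returns (final pivot index, number of comparisons with the pivot).
    """
    if end is None:
        end = len(nums) - 1
    pivot = nums[end]
    left, right = start, end - 1
    comparisons = 0
    print(f'start: {nums}  pivot={pivot}')
    while right > left:
        comparisons += 1
        if nums[left] <= pivot:
            left += 1                  # left element already on the correct side
        else:
            comparisons += 1
            if nums[right] > pivot:
                right -= 1             # right element already on the correct side
            else:
                # Both elements are misplaced: swap them
                nums[left], nums[right] = nums[right], nums[left]
        print(f'  left={left} right={right}  {nums}')
    # Place the pivot between the two parts
    comparisons += 1
    if nums[left] > pivot:
        nums[left], nums[end] = nums[end], nums[left]
        pivot_index = left
    else:
        pivot_index = end
    print(f'end:   {nums}  pivot index={pivot_index}  comparisons={comparisons}')
    return pivot_index, comparisons


nums = [1, 5, 6, 2, 0, 11, 3]
index, comparisons = partition_traced(nums)
```

For this 7-element list the sketch makes 12 comparisons and ends with [1, 0, 2, 3, 5, 11, 6], pivot at index 3. Each element is compared with the pivot once or twice, so the work grows linearly with the length of the list.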

06:35:48

So we have about n comparisons in partition: partition performs order n operations, or partition is an order

06:35:53

n function. And what is the height of the tree? Once again, the height of the tree is log n, because

06:35:59

going from n to 1 takes log n steps: you keep going n by 2, n by 4, n by 8, and so on, until

06:36:06

n divided by 2 to the power log n becomes n by n, which is 1. So the time complexity of quicksort is n log n

06:36:18

if you're able to partition the array into roughly equal parts, and that is what happens on average

06:36:23

if you pick a random pivot each time: you do end up with roughly equal parts,

06:36:28

maybe not exact halves, say a 25/75 split, but that's still more or less in the same range. So the quicksort

06:36:37

complexity is about n log n, and this is called the average case complexity. On the other hand,

06:36:43

if you have a really bad partition, for example if you pick the smallest

06:36:48

element as the pivot, then all the other elements will

06:36:53

go to the right of the pivot, and you will end up calling quicksort on a problem of size n minus 1.

06:37:00

And then if once again you pick the smallest element as the pivot, all the elements will go to

06:37:05

the right of the pivot once again, and you will end up calling quicksort with a problem size of n minus 2.
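The skewed case is easy to reproduce: with the last element as the pivot, an already sorted list makes every partition peel off just one element. The sketch below counts pivot comparisons for a sorted list of 200 numbers, once with a last-element pivot and once with a random pivot (moved to the end first, as described earlier). The counting helper and the randomized flag are our own illustration, not the notebook's code.

```python
import random


def partition(nums, start, end, counter):
    """Two-pointer partition around nums[end]; counter[0] tallies comparisons."""
    pivot = nums[end]
    left, right = start, end - 1
    while right > left:
        counter[0] += 1
        if nums[left] <= pivot:
            left += 1
            continue
        counter[0] += 1
        if nums[right] > pivot:
            right -= 1
        else:
            nums[left], nums[right] = nums[right], nums[left]
    counter[0] += 1
    if nums[left] > pivot:
        nums[left], nums[end] = nums[end], nums[left]
        return left
    return end


def quicksort_count(nums, randomized=False):
    """Return (sorted copy of nums, number of pivot comparisons)."""
    nums = list(nums)
    counter = [0]

    def sort(start, end):
        if start < end:
            if randomized:
                # Random pivot: swap a randomly chosen element into the last slot
                k = random.randint(start, end)
                nums[k], nums[end] = nums[end], nums[k]
            p = partition(nums, start, end, counter)
            sort(start, p - 1)
            sort(p + 1, end)

    sort(0, len(nums) - 1)
    return nums, counter[0]


random.seed(0)
n = 200
already_sorted = list(range(n))
result_fixed, fixed_count = quicksort_count(already_sorted)
result_random, random_count = quicksort_count(already_sorted, randomized=True)
print(f'last-element pivot: {fixed_count} comparisons, n(n-1)/2 = {n * (n - 1) // 2}')
print(f'random pivot:       {random_count} comparisons')
```

With the last-element pivot, a partition of m elements costs m - 1 comparisons here, so the total is exactly n(n-1)/2 = 19,900 for n = 200, the quadratic worst case; the random pivot needs only a fraction of that. Keep n modest in this experiment: the skewed recursion is n levels deep, and Python's default recursion limit is 1,000.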

06:37:10

Now this is an unbalanced tree, or a skewed tree, and what happens in a skewed tree is that the height

06:37:15

this time is the same as n: you can see n, n minus 1, n minus 2, n minus 3, and so on, going down to 1,

06:37:21

the height of the tree is n, but the amount of work involved in partitioning is the same because

06:37:26

you have to run through the entire list to partition the list, right? So, in this case the time

06:37:33

complexity is roughly n times (n minus 1) by 2, so the time complexity is about order n squared,

06:37:44

and that's bad because that's as bad as bubble sort, but despite the quadratic worst case time complexity,

06:37:54

quick sort is still preferred in many situations. It really depends on what kind of algorithm

06:38:00

you need to use and what kind of memory constraints you have, because quick sort's complexity is

06:38:06

closer to n log n in practice, especially with a good strategy for picking the pivot, and a good

06:38:11

strategy is picking a random pivot, but there's another one called the median of

06:38:15

medians, which you can check out as well. So n log n is the average time complexity of

06:38:21

quick sort, and n squared is the worst case time complexity of quick sort. Now here's an exercise

06:38:27

for you: verify that quick sort requires order 1 additional space (not counting the recursion call stack), which means that it does not

06:38:32

really need to copy the array. We did create a copy, because we did not want to modify our

06:38:38

test cases, but we could have removed that line and quick sort would work just fine, so because

06:38:43

you do not need to create a copy of the list or the array, it requires order 1 additional space,

06:38:49

but because space complexity often also includes the space required to store the

06:38:55

input, you can also say that quick sort has a space complexity of order n. So if you get a

06:39:00

question about space complexity, you may want to ask, are you talking about the additional space

06:39:06

or do you also want to include the input in the space complexity? So that's quick sort,

06:39:15

and those are the sorting algorithms we've looked at: we've looked at bubble sort,

06:39:21

and we've looked at insertion sort, and then we optimized using the divide and conquer technique and

06:39:27

got to merge sort, which is order n log n, but it also has an additional

06:39:34

space requirement of order n, which can be avoided using quick sort, which uses order 1 additional

06:39:41

space (again, not counting the recursion stack), but can have order n squared time complexity in the worst case, though with the

06:39:50

right choice of a pivot it is closer to n log n. So that's sorting, and you can see that Python

06:39:56

is such an expressive language that all these sorting algorithms which are often quite confusing

06:40:02

to implement in C++ or Java are actually pretty straightforward to implement in Python. All you

06:40:07

need to do is follow the method, which is to state it first in plain English, have some test cases

06:40:14

ready to test your function, and then write your code carefully, checking each line for errors,

06:40:20

and create small functions wherever you need to, so try not to have too much logic in one function,

06:40:25

a good rule of thumb is about 7 to 8 lines of code per function, no bigger than that,

06:40:30

and that's not just for toy problems; even as a software developer, that's

06:40:34

something that you can try to follow, just have 7-8 lines of code in any function,

06:40:40

if you have more than that, try to split it into two functions, and this way it's very difficult

06:40:47

for you to go wrong. So now let's return to our original problem statement, and let's

06:40:52

read it once again: you're working on a new feature on Jovian called "Top Notebooks of the Week."

06:40:57

Write a function to sort a list of notebooks in decreasing

06:41:02

order of likes. Now keep in mind that up to millions of notebooks can be created every week,

06:41:06

and you want to build this for scale, so your function needs to be as efficient as possible.

06:41:11

So first we need to sort objects this time and not just numbers, and second we also want to sort

06:41:17

them in the decreasing order of likes for each notebook, okay? So all we need to do is to use our

06:41:25

merge sort or quick sort techniques that we've already discussed, and define a custom

06:41:32

comparison function to compare two notebooks, okay? But before we do that, let's create a

06:41:37

class that can capture some basic information about notebooks, so here we have the class.

06:41:41

So we're still following the method, so to speak, right? Step one was to come up with

06:41:45

the input and the output format, so here is the input format, our input format would be using this

06:41:51

class: we're creating a class Notebook, which has a title, username and likes.
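A minimal sketch of such a class, assuming only the three attributes mentioned here (title, username, likes) and a string representation in a format of our own choosing. Only the first notebook below (pytorch-basics by aakashns, 373 likes) comes from the lecture; the other two titles and usernames are hypothetical stand-ins.

```python
class Notebook:
    """A notebook with a title, its owner's username and a like count."""

    def __init__(self, title, username, likes):
        self.title = title
        self.username = username
        self.likes = likes

    def __repr__(self):
        # Readable output when printing a notebook or a list of notebooks
        return 'Notebook <"{}/{}", {} likes>'.format(self.username, self.title, self.likes)


nb0 = Notebook('pytorch-basics', 'aakashns', 373)
nb1 = Notebook('linear-regression', 'hypothetical-user', 532)  # hypothetical title/user
nb2 = Notebook('data-cleaning', 'another-user', 45)            # hypothetical
notebooks = [nb0, nb1, nb2]
print(nb0)  # Notebook <"aakashns/pytorch-basics", 373 likes>
```

Because `print` falls back to `__repr__` when no `__str__` is defined, both `print(nb0)` and `print(notebooks)` use this format.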

06:41:57

So we create the class, and the values get stored as properties: title, username and likes, and then we

06:42:02

also have a string representation here. Then we create some test cases, so now we are creating

06:42:07

some test cases here, NB0 to NB9, and let's put them all

06:42:15

into a list, and you can see here that we now have a list of notebooks, NB0 to NB9, and you can see

06:42:23

that because we have a string representation, we can see that the first notebook is this

06:42:27

aakashns/pytorch-basics, and it has 373 likes, and the second one is this, and it has

06:42:32

532 likes, and these are clearly out of order in terms of likes. Next we will define a custom

06:42:38

comparison function for comparing the two notebooks. What it will do is it will return the strings

06:42:45

'lesser', 'equal' or 'greater' to establish the order between the two objects, so it should return

06:42:52

lesser when NB1 should come at a position or index lesser than the position of NB2 in a sorted list.
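The rule described here can be written as a small function. This is a sketch: the name `compare_likes` is our own, and it assumes only that each notebook object has a `likes` attribute. `SimpleNamespace` stands in for a notebook so the example runs on its own.

```python
from types import SimpleNamespace


def compare_likes(nb1, nb2):
    """Compare two notebooks for a most-liked-first ordering.

    Returns 'lesser' if nb1 should appear before nb2 in the sorted list,
    'equal' if they tie, and 'greater' if nb1 should appear after nb2.
    """
    if nb1.likes > nb2.likes:
        return 'lesser'    # more likes -> lower index
    elif nb1.likes == nb2.likes:
        return 'equal'
    else:
        return 'greater'   # fewer likes -> higher index


popular = SimpleNamespace(likes=532)
less_popular = SimpleNamespace(likes=373)
print(compare_likes(popular, less_popular))  # lesser
print(compare_likes(less_popular, popular))  # greater
```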

06:43:05

So in the case of our problem, what that means is we want to sort things in decreasing order,

06:43:11

so the first notebook should have the highest number of likes, and then maybe the second notebook

06:43:15

should have the second highest number of likes, and the third notebook will have a lower number of

06:43:18

likes and so on. So if you have two notebooks, NB1 and NB2, and NB1.likes is greater than

06:43:25

NB2.likes, then NB1 should come at a lesser index, okay, so we will return 'lesser' because

06:43:31

it should come at a lower position in the sorted list, since we want them in

06:43:39

decreasing order. If NB1.likes is equal to NB2.likes, then we return 'equal'. And if

06:43:45

NB1.likes is less than NB2.likes, that means NB2 is the more liked notebook and

06:43:52

NB1 is the less liked notebook, so NB1 should actually come at a greater position, and we

06:43:59

return 'greater'. So this comparison function tells us whether the first input to it

06:44:08

should show up at a lesser position in the sorted list compared to the second

06:44:13

input. Now, in languages like C++ and Java, the convention is normally to return a negative number,

06:44:20

zero or a positive number, but Python makes it easy to return strings, since strings are

06:44:26

first-class citizens in Python, and it is a lot clearer, when you are debugging things and you face

06:44:31

issues to look at actual strings, and it is also easier to write code, so I prefer using strings,

06:44:39

but you can also use numbers, negative, zero or positive; that is totally

06:44:45

up to you. So now here is an implementation of merge sort which accepts a custom comparison function,

06:44:51

so let's look at the merge sort function: it takes a list of objects

06:44:58

this time, not a list of numbers, and it also takes a compare function, for which

06:45:03

we also provide a default comparison, so that we can still use it with numbers. Now with numbers

06:45:08

the default assumption is that you want sorting in increasing order, so this is

06:45:13

what the default sorting looks like for numbers, so that is pretty straightforward, but you can

06:45:19

also pass a custom comparison function. So here we have the terminating condition: if the length

06:45:24

is less than 2, then we simply return the list, then we get the mid index, and then we call

06:45:29

merge sort on the left half with the custom comparison function, we call merge sort on the right

06:45:34

half with the custom comparison function, and we call merge with the custom comparison function.
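Here is a minimal sketch of a merge sort that takes a compare function, following the structure described here and the merge step described next. The names `default_compare`, `merge_sort` and `merge`, and the string results 'lesser'/'equal'/'greater', follow this lesson's convention, but the code is our own reconstruction rather than the notebook's exact code.

```python
def default_compare(x, y):
    """Increasing order for numbers (or anything supporting < and ==)."""
    if x < y:
        return 'lesser'
    elif x == y:
        return 'equal'
    return 'greater'


def merge_sort(objs, compare=default_compare):
    if len(objs) < 2:                          # terminating condition
        return objs
    mid = len(objs) // 2
    left = merge_sort(objs[:mid], compare)     # sort each half with the same compare
    right = merge_sort(objs[mid:], compare)
    return merge(left, right, compare)


def merge(left, right, compare):
    i, j, merged = 0, 0, []
    while i < len(left) and j < len(right):
        # 'lesser' or 'equal': the left element goes first (this keeps the sort stable)
        if compare(left[i], right[j]) in ('lesser', 'equal'):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # Attach whatever remains in either half
    return merged + left[i:] + right[j:]


print(merge_sort([5, 2, 9, 1, 5]))                                      # [1, 2, 5, 5, 9]
print(merge_sort([5, 2, 9, 1, 5], lambda a, b: default_compare(b, a)))  # [9, 5, 5, 2, 1]
```

Swapping the arguments of `default_compare` flips the order, which is the same idea a likes-based comparison uses to sort notebooks most-liked first.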

06:45:40

Now, inside merge, once again we have these two halves, left

06:45:46

and right, and then we have the custom comparison function, so we create pointers for the two of

06:45:51

them, and then we also create the final result list, which is merged, and then we iterate over

06:45:56

the left list and the right list, so while we are going through these, we compare the left

06:46:02

element and the right element, but now we are calling compare; we are not doing the

06:46:06

greater-than or less-than comparison directly. If the element on the

06:46:10

left is lesser than or equal to the element on the right, then we append it to the result list and we

06:46:18

increment the left pointer. 'Lesser' or 'equal' means that the first element on the left

06:46:25

should show up at a lower position in the final sorted list,

06:46:31

so that is why we append it first; otherwise we append the right element

06:46:36

and we increment the right pointer, and finally we attach any remaining elements here, so this is

06:46:40

something that you can review, since we covered it in a lot of detail. So now let's

06:46:45

call merge sort on our notebooks, and let's check if the notebooks are sorted by

06:46:50

likes and indeed they are, you can see that at position 0 you have the notebook with the highest

06:46:54

number of likes, and then you have the next one and the next one and so on, now since we have written

06:47:00

a generic merge sort function that works with any compare function, we can now very quickly use it

06:47:05

to sort the notebooks by title as well, or if we had maybe the number of views per notebook or the

06:47:10

number of versions in each notebook or the number of comments on each notebook, we

06:47:15

could do that sorting as well; we could even use a combination of those. So here the example

06:47:20

we're taking is comparing by titles: here we have NB1 and NB2, and strings

06:47:26

can also be compared using the comparison operators, so if NB1.title is less than NB2.title,

06:47:31

then we return 'lesser', otherwise we return 'equal' or 'greater', and with this we should be

06:47:38

able to sort them in the ascending order of titles. You can see A-N, C-I, C-I, F-E, L-I, L-O, P-Y, P-Y,

06:47:47

P-Y-T-H, P-Y-T-H, P-Y-T-O, PyTorch, okay, so this is now sorted in the order of titles.
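Following the same pattern, a title comparison can be sketched as below (`compare_titles` is our own name, and `SimpleNamespace` stands in for a notebook). Python compares strings character by character by Unicode code point, so uppercase letters sort before all lowercase ones; lowercasing both titles first would be needed for a case-insensitive order.

```python
from types import SimpleNamespace


def compare_titles(nb1, nb2):
    """Ascending (alphabetical) order of titles."""
    if nb1.title < nb2.title:      # strings compare lexicographically
        return 'lesser'
    elif nb1.title == nb2.title:
        return 'equal'
    return 'greater'


a = SimpleNamespace(title='linear-regression')
b = SimpleNamespace(title='pytorch-basics')
print(compare_titles(a, b))  # lesser
print('Zebra' < 'apple')     # True: 'Z' has a smaller code point than 'a'
```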

06:47:56

An exercise for you is to sort in the order of username/title, which means you first

06:48:00

compare the usernames, and if the usernames are equal, then compare the titles. So you can

06:48:07

probably write another comparison function that compares usernames and then titles, and use that

06:48:12

two-level comparison for sorting, okay. Now another exercise for you

06:48:19

going forward is to implement and test the generic versions of bubble sort, insertion sort

06:48:26

and quick sort using these empty cells that are given here, right. So now at this point in the

06:48:32

course you should start writing code: maybe solve one problem every day

06:48:39

to really practice the concepts and internalize them, and while you're doing that,

06:48:45

for any problem that you work on, any notebook that you create, you can save it using jovian.commit,

06:48:52

and I'll also show you how to create new notebooks. So one way to create new notebooks is to go to

06:48:57

jovian.ai, click the New button and click Blank Notebook, and you can give it a title, let's say

06:49:03

you are doing "quick sort generic", and you can set the privacy and create the notebook, and that creates

06:49:12

a notebook for you, and then you can click the Run button and run it. So that's one way to do it,

06:49:17

and another way you can do it is with the problem solving template we've given you: if you come back

06:49:22

to the lesson page you will find a problem solving template here. Now you can click on the problem

06:49:28

solving template and click duplicate to create a copy of this notebook in your profile. So let's do that

06:49:37

and now this is on your profile, so you can click Run and then run it on Binder, or you can

06:49:42

even run it locally on your computer, make some changes to it and come back and run jovian.commit,

06:49:49

and you will end up with a link that you can share. So now you can go on Twitter and you can

06:49:56

just share this link: write out a tweet, tag us and also use the hashtag 60 Days of Python,

06:50:09

okay, and maybe say this is your quick sort algorithm for generic objects,

06:50:19

and tweet it out, and we will retweet your tweet, because we want to support everybody

06:50:25

who's taking part in this course. On the course page you will find a link to the course community forum,

06:50:29

which is where you can go and ask questions if you have questions about any of these topics,

06:50:33

and you can even discuss some of the ideas covered here and some of the exercises that are

06:50:37

shared. So you can go into lesson three, for instance, and create a new topic; maybe you want to talk

06:50:44

about the generic implementation of quick sort, so you can create a new topic, and

06:50:49

if you're not able to make it work, post your notebook there, ask a question and have a discussion.

06:50:56

And if you're helping other people out, if you're answering other people's questions

06:51:01

and you've written some really great posts... There are links to some more problems that have

06:51:05

been shared here, so you can check out these links; on each of these links you can try out

06:51:11

problems, make submissions and solve these problems. Some of these are interview

06:51:16

questions as well. You can check if your results are correct, and you can use this problem

06:51:22

solving template as a starting point, as we've just shown. There is a starter notebook with

06:51:30

each assignment, and in the assignment all you need to do is run the notebook. You can run it

06:51:36

on Binder, for instance, and then in a bunch of places you will find

06:51:42

question marks here in the text, and you'll find question marks here in the code, so you simply

06:51:48

need to put your code or your answers in place of the question marks: replace them with your code. You can

06:51:54

see there are some question marks here, so you replace those, and step by step there are

06:51:59

instructions and comments to guide you, so step by step you can solve it,

06:52:05

and then finally you can also make a submission: right at the very end, when you run the code,

06:52:09

you will also be able to submit directly, and when you make a submission, the assignment will get

06:52:15

evaluated in an automated fashion instantly, and you will get a pass or a

06:52:20

fail grade. Now, if you get a pass grade, that's great, but if you get a fail grade, then you will also

06:52:24

get some comments about what went wrong in your solution so you can use those comments to fix

06:52:31

the issues. It's a great way to get quick feedback and keep fixing your issues, and especially

06:52:38

watch out for edge cases. So that's assignment one, and then assignment two is on hash tables and

06:52:43

Python dictionaries, a very interesting assignment where you are going to implement hash tables,

06:52:49

which power Python dictionaries, from scratch in Python, and you will also replicate the

06:52:53

interface of Python dictionaries. So do check it out; it's a very interesting assignment, again with a very similar format:

06:53:02

you will find question marks in certain places, and you need to replace them with appropriate

06:53:06

values, expressions or statements, and in this way, by working through each of these step by step,

06:53:15

you will implement hash functions and hash

06:53:19

tables, which again are very commonly asked about in interviews, so this is an important assignment

06:53:25

from an interview preparation or coding assessment preparation perspective as well, and it also teaches

06:53:31

you a lot of really good practices in Python programming in particular, so do check out assignment

06:53:38

two as well. We will send you an email as soon as assignment three is ready, but you can check back

06:53:42

in a couple of days and you should see it on the same page, pythondsa.com. So what do you do next?

06:53:49

Review the lecture video and execute the Jupyter notebook; use the interactive nature of Jupyter to

06:53:56

experiment with the code. Complete the assignment and attempt the optional questions as well. Each

06:54:02

assignment has some required questions and you can make a submission as soon as you're done with

06:54:05

the required questions, but there are some optional questions which are slightly harder, and I highly

06:54:10

recommend doing them because they will improve your understanding, give you more practice and help you

06:54:15

internalize the concepts better. Then participate in forum discussions, and join or start a study group:

06:54:24

this is a great way to learn. Get together with some friends, maybe watch the lecture together

06:54:28

over a Zoom call, pause the video and have discussions wherever you have doubts. Discussion is a great way

06:54:34

to solve the specific doubts that you may have and it will also help you to articulate your

06:54:41

understanding better because when you explain to others you also answer a lot of your own questions

06:54:47

so please do that. This is Data Structures and Algorithms in Python. Thank you, and good day

06:54:56

or good night. Hello and welcome to Data Structures and Algorithms in Python. This is a live online

06:55:03

certification course being organized by Jovian. Today we are on lesson 4: recursion, memoization and

06:55:13

dynamic programming. My name is Akash and I'm your instructor; you can find me on Twitter at @aakashns.

06:55:21

If you follow along with this course and complete the weekly assignments, you can also earn

06:55:29

a certificate of accomplishment, which you can add to your LinkedIn profile and which will also be hosted

06:55:35

on your Jovian profile as well. So let's get started. Now, for the Data Structures and

06:55:43

Algorithms in Python course, pythondsa.com is the course website, and on the course website you

06:55:49

will be able to find all the information about the course: you can view the previous lessons,

06:55:56

lessons one, two and three, and you can also view the previous assignments, assignments one and two.

06:56:00

Today we are on lesson 4, so let's open up lesson 4. The topic is recursion and dynamic programming;

06:56:07

you can find a recording of the lesson here and you can also watch a version in Hindi if you

06:56:14

would prefer that. In this lecture we will cover recursion, memoization and dynamic programming

06:56:19

by looking at two common problems in dynamic programming: the longest common subsequence problem

06:56:24

and the knapsack problem, and we'll do this by coding these problems live using the problem

06:56:32

solving template that we have been using in one way or another since lesson one. So let's open

06:56:39

up the problem solving template. This is a template that you can use to solve any coding problem,

06:56:47

and we will illustrate this by solving two problems using this template today. So the first thing

06:56:54

we need to do is to run this template. You can see that there is some explanation, and then there

06:57:00

is some code here as well. To run this code you have two options: you can run it using

06:57:04

free online resources, or you can run it on your computer. The simplest way to run it is to click the

06:57:10

Run button here and select Run on Binder, and with just one click this will set up a machine on the

06:57:21

cloud for you, start a Jupyter notebook server, and you will then be able to

06:57:31

execute the code and modify the notebook and save a version of it to your own profile so that you can

06:57:37

continue working on it. So there we have it: now we have a running JupyterHub server.

06:57:46

I'm just going to zoom in here a bit so that you can see things clearly

06:57:50

Okay, so this is the problem solving template, and as I said, we're working on two problems,

06:57:55

so I have some problem statements listed out here. You can see the first problem, longest

06:57:59

common subsequence, is listed here, and this is a part of the lesson page

06:58:08

as well, so you will find a link to this problem statement on the lesson page too. So let's first

06:58:14

modify the title of this notebook, "Problem Solving Template"; let's change this title to

06:58:21

"Dynamic Programming: Longest Common Subsequence". Let's get rid of this; I don't think we need this.

06:58:34

Then I'm going to keep this section on how to run your code, so that if I share this notebook

06:58:42

with somebody else, they have a way to run it. And then, before we start the assignment or the

06:58:50

problem, let's just save this to our own profile, so I'm just going to give it a name,

06:58:58

longest common subsequence; this is an appropriate name for it, so I'm going to give it this

06:59:06

as a project name, install the jovian Python library and just run jovian.commit. Now, what this will do

06:59:14

is save our work: we started out with a template and now we're editing the template, and by running jovian.commit

06:59:19

we've saved a copy of the template to our own profile. You can see this is the link where you will

06:59:24

be able to access this notebook, and you can run it and continue your work if this Jupyter notebook

06:59:30

shuts down, or if you want to continue tomorrow, for instance. Okay, so now let's look at the problem statement.

06:59:38

Now I'll just copy over the problem statement here as well, so that we can see it directly within the

06:59:43

notebook. There we have it. Now, you can paste the problem statement, and if you are getting this

06:59:55

problem statement from some other source, then it's always a good idea to include the link to

06:59:58

the original source as well. Okay, now we have a problem statement in front of us, so the question is:

07:00:10

write a function to find the length of the longest common subsequence (that's a new term,

07:00:16

and we'll unpack it) between two sequences. Let's first learn what we mean by a sequence. Now, a

07:00:22

sequence is a group of items with a deterministic ordering, for instance a list, a tuple,

07:00:29

a range or even a string; these are some common sequence types in Python. So here I have this

07:00:33

string "serendipitous"; this is a group of items, and it also has an order: you can see that

07:00:39

e comes after s, and r comes after e, and so on, so this is a sequence. A list would also be a

07:00:45

sequence, so a list of numbers is a sequence too. Then we're looking at

07:00:50

subsequences. What is a subsequence? A subsequence is a sequence that is obtained

07:00:54

by deleting or removing zero or more elements from another sequence. For instance, if you look at

07:01:01

"serendipitous", and if we remove the characters s, r, e, n, i, i, o, u and s, then you will be left with e, d, p, t,

07:01:14

so "edpt" is a subsequence of "serendipitous". Now, two things to note here: "edpt" does not

07:01:19

have to occur contiguously: these elements can occur anywhere within the sequence, but the order should

07:01:26

be the same. E, d, p, t occur in this particular order here, and they should occur in the same order

07:01:32

here, so d should occur after e, p should occur after d, and t should occur after p. Those are the

07:01:39

two requirements for "edpt" to be a subsequence of "serendipitous". Visually speaking, what we can

07:01:46

see is that if you take a sequence and draw boxes around some of its characters or

07:01:52

elements, and then just take the elements in the boxes

07:01:56

in the same order, you end up with a subsequence. So now we understand what a sequence is

07:02:02

and what a subsequence is. Once again, if this question is asked in an interview

07:02:07

and you're not sure what they mean by longest common subsequence, or even what a sequence is, then

07:02:12

you should ask the interviewer, "What do you mean by a subsequence?" or "What do you mean by a sequence?",

07:02:17

and they'll be more than happy to tell you. It's very important to communicate

07:02:21

whatever you're thinking and whatever questions you have. Contrary to what you might think, asking

07:02:26

questions is actually a good thing: the more questions you ask, the more it is appreciated.

07:02:31

Okay, so now we've talked about a sequence and a subsequence. Now, what's a common subsequence?

07:02:38

Let's look at these two strings, "serendipitous" and "precipitation". Now, if you pick just these

07:02:44

elements that are in the boxes, REIPITO, you can see that REIPITO is a subsequence

07:02:54

of "serendipitous", and REIPITO is also a subsequence of "precipitation".
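The two requirements (same order, not necessarily contiguous) translate into a short check. This is a minimal sketch; the function name `is_subsequence` is hypothetical. Greedily matching each element of the candidate at its earliest possible position in the longer sequence is enough to decide the question, and because it only iterates and compares elements, it works for strings and lists alike.

```python
def is_subsequence(sub, seq):
    """Return True if sub can be obtained by deleting zero or more
    elements of seq without reordering the remaining ones."""
    i = 0  # index of the next element of sub we still need to find
    for item in seq:
        if i < len(sub) and item == sub[i]:
            i += 1
    return i == len(sub)


print(is_subsequence('edpt', 'serendipitous'))     # True
print(is_subsequence('reipito', 'serendipitous'))  # True
print(is_subsequence('reipito', 'precipitation'))  # True
print(is_subsequence('tpde', 'serendipitous'))     # False: right letters, wrong order
print(is_subsequence([1, 3], [1, 2, 3]))           # True: works for lists too
```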

07:03:01

So a subsequence which is common, that is, a subsequence of both sequences, is called a common

07:03:09

subsequence, so REIPITO is a common subsequence of "serendipitous" and "precipitation".

07:03:15

Now, you can have many common subsequences. For instance, we could just look at RE here and RE here,

07:03:21

and RE would be a common subsequence too, or you could just look at

07:03:28

IT and IT, and that would be a common subsequence as well. We've not picked N here, but

07:03:34

you could also pick REN and REN, and that would also be a common subsequence of the two.

07:03:42

Now, the longest common subsequence, as the name suggests, is the

07:03:48

common subsequence between the two sequences which has the maximum possible length, and you

07:03:54

can verify this: you can try different subsequences and see that REIPITO is the longest

07:03:59

common subsequence between these two strings, these two sequences, and its length is 7:

07:04:08

1, 2, 3, 4, 5, 6, 7. So you have to write a function to find the length of the longest common

07:04:15

subsequence between two sequences. So that's the question, and this is a visual example that tells you

07:04:22

the answer. Okay, now that we've understood the question,

07:04:28

we can start applying the method that we have been learning throughout. This is the systematic

07:04:33

strategy that we will apply and nothing about this method has changed since the first lesson

07:04:38

even though we've covered a whole variety of topics like binary search and binary search trees

07:04:42

and then sorting algorithms and divide and conquer; this method has remained the same. The first

07:04:48

step is to state the problem clearly and identify the input and output formats. Then the second

07:04:54

step is to come up with some example inputs and outputs and these will be used to

07:05:00

test our solution, so we should try and cover all the edge cases, and that will help us write code

07:05:06

that is correct, anticipating all the errors that we might face. Then we come up with a correct

07:05:13

solution to the problem and state it in plain English. It's very important for you to state the

07:05:19

solution in plain English before you start coding, so that you communicate your ideas, and it also

07:05:24

makes your thinking clear once you express yourself. Then you implement the solution and test it using

07:05:29

example inputs, and you fix bugs if you find any, and you will be able to find bugs

07:05:34

if you have written good test cases. Then you analyze the algorithm's complexity and identify

07:05:39

inefficiencies, if there are any. The first solution that you come up with doesn't

07:05:45

have to be optimal, it just has to be correct, so there will be some inefficiency, but it's important

07:05:50

to go through that process of first finding a brute-force solution and then finding the inefficiency,

07:05:56

and then applying the right technique to overcome the inefficiency, and repeating steps 3 to 6. So

07:06:01

you identify the right technique, and in this case we will learn a couple of techniques

07:06:05

called memoization and dynamic programming and then we go back and state the correct solution again

07:06:10

Then we implement the solution and test it, and then we analyze it again, and if there's further

07:06:14

scope for improvement we do that; otherwise we say that we've arrived at the optimal, or good enough,

07:06:20

solution. Okay, I hope by this point you've started to memorize this process,

07:06:27

and that's why we keep repeating it over and over: it should become second nature every time

07:06:32

you see a problem. So the first thing is to state the problem clearly and identify the input and

07:06:37

output formats. Now, the problem is already stated clearly enough, but let's just state it slightly more

07:06:44

clearly. Just write it in your own words, whatever is clear to you; that's what matters most.

07:06:52

So we are given two sequences and we need to find

07:06:59

the length of the longest common subsequence between them.

07:07:07

Simple enough. Then we have two inputs; now we decide the input and output formats,

07:07:13

so we have sequence 1, a sequence, for example

07:07:23

"serendipitous"; sequence 2, another sequence, for example

07:07:34

"precipitation".

07:07:42

Great, these are the only two inputs that we require, and the output would be the

07:07:53

length of the longest common subsequence; let's just abbreviate that as LCS,

07:07:58

which in this case is 7, and we know what that subsequence looks like, since we've just seen it above.
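For reference while following the method, here is one compact way to compute the LCS length: a recursive definition on suffixes whose repeated subproblems are cached with `functools.lru_cache`, which is a form of memoization. This is a sketch of our own, not the solution the lesson derives step by step, and the name `lcs_length` is hypothetical. It caches on index pairs rather than on the sequences themselves, so it works for strings and lists alike.

```python
from functools import lru_cache


def lcs_length(seq1, seq2):
    """Length of the longest common subsequence of two sequences."""

    @lru_cache(maxsize=None)
    def lcs_from(i, j):
        # LCS length of the suffixes seq1[i:] and seq2[j:]
        if i == len(seq1) or j == len(seq2):
            return 0                                     # an empty suffix has nothing in common
        if seq1[i] == seq2[j]:
            return 1 + lcs_from(i + 1, j + 1)            # matching first elements extend the LCS
        return max(lcs_from(i + 1, j), lcs_from(i, j + 1))  # otherwise skip one element from either side

    return lcs_from(0, 0)


print(lcs_length('serendipitous', 'precipitation'))  # 7
```

With caching there are at most len(seq1) × len(seq2) distinct subproblems, each doing constant work apart from its recursive calls.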

07:08:17

So now, based on this, we can now create... and you can see the problem is now created.

07:08:22

Before I talk about the next thing: if you double-click on a text cell, you can start editing it,

07:08:27

and here we are using a language called Markdown. You can see this creates a block quote, this

07:08:32

creates a bold font, and this creates a code-like font. And the way to

07:08:40

go back into display mode is to press Shift + Enter. Now you can see here that we have the

07:08:46

problem, we have the block quote, and then we have all this styling. So Markdown is a really useful and

07:08:50

easy-to-learn language for formatting your text, especially in Jupyter notebooks, so do learn it.

07:08:56

But now, based on this, we can create a signature for a function: our function lcs will accept

07:09:05

two sequences, sequence one and sequence two, and it will return something. So that's the basic signature

07:09:12

of a function and even though it's not doing much just establishing what the arguments are is the

07:09:17

first step towards solving a problem and let's just save our work from time to time it's very

07:09:23

important to keep saving your work on Joven because this is running on a free online service so

07:09:29

this will shut down after some minutes of inactivity so just run Joven.com it and that will save

07:09:35

the notebook to your profile and you can read on it okay so now the next step is to come up with

07:09:40

some example inputs and outputs and here we need to try and cover all the edge cases so

07:09:47

I have written out a few test cases here already. The most common case is the general case of a

07:09:54

string, like we had with serendipitous and precipitation. That's a common case, where

07:10:01

both of them have some common elements and there's a common subsequence of length 7.

07:10:07

But we may also want to test out another type of data, and this is one of the nice things about

07:10:12

Python: you can write functions that operate not just on a particular class and its subclasses

07:10:17

but on any kind of data, as long as it satisfies certain criteria. For instance, strings and

07:10:23

lists both allow indexing into them and picking out the ith element or the nth element from

07:10:29

the sequence. They are both sequences, so our function should be able to work with both strings and lists.

07:10:35

Then here is another case where we have two sequences that have no common elements.

07:10:42

Our function should not throw an error here; it should gracefully return the number zero, because

07:10:48

the empty sequence is a subsequence of every other sequence. Does that make sense? Think about it.

07:10:54

So in that case, if there's no other common subsequence, then the empty sequence is the common

07:10:59

subsequence, and the answer is zero. And here's another extreme case, where one is a subsequence

07:11:04

of the other. Here's another case where one sequence is empty. Here's another case where both

07:11:10

sequences are empty. All of these are important; otherwise you might miss certain special

07:11:15

cases and face an error when you code your solution. Finally, you can also have this case where

07:11:21

you have multiple subsequences with the same length. For instance, if you have abcdef and badcfe,

07:11:26

then ace is one common subsequence of length three, and that's the longest, as you can verify,

07:11:35

and bdf is another subsequence which is common to the two and also has the same length.

07:11:40

Those are some test cases. Now let's copy over these test cases here. In an interview or a

07:11:46

coding assessment, what you might want to do is just write these as comments if you have just a

07:11:50

single coding screen, and try to list at least four or five, but go as far as you can, because this

07:11:57

will also help you streamline your own solution, and it's always something that is appreciated by

07:12:03

interviewers. Let's do that: let's copy over these test cases here, and you can think

07:12:14

of more. So if you have some more ideas of things you should test, come up with them. There's no

07:12:20

right number of tests; whatever it takes for you to feel confident is what you need to do.

07:12:29

Okay, so now what we've done is take these test cases and convert them into dictionaries.

07:12:36

So you can see here we have the first sequence, sequence 1, and remember, that's why we've

07:12:40

written out the names of the inputs and the signature of the

07:12:45

function. Now we can create test cases as dictionaries so that we can test them all easily

07:12:52

at once. So we have sequence 1 and sequence 2 in the input sub-dictionary inside the main

07:12:58

test case dictionary, and then we have the output, which is the expected output of the function, which should be

07:13:02

seven, and this you can verify. So this is a general case. Then we have another case; in this case

07:13:08

we have two sequences that are both lists of numbers, and in this case the output that we expect is

07:13:16

five. And we have another general case, longest and stone; in this case you can verify that O, N and

07:13:21

E form the common subsequence, so the output is three. Then here we have two sequences which do

07:13:28

not have any common elements: all these come from the left half of the keyboard, and all these come from

07:13:33

the right half of the keyboard, so that was a quick way to generate these two sequences.

07:13:40

Then here we have dense and condensed, and you can see that dense actually appears as a piece inside condensed,

07:13:47

so this is a special case where dense is a contiguous substring of the other string. But

07:13:54

even if we had DES, that would still be a subsequence, because D, E and S occur in this order. So that's one

07:14:01

example, and in this case sequence 1 is itself the longest common subsequence, and it has length 5.

07:14:08

Then we have this case where one of the sequences is empty, and you can see that in that case the output should

07:14:15

be zero; then the case where both sequences are empty; and here is the case where you can have multiple longest common

07:14:20

subsequences, and even in this case your function should be able to figure out the answer correctly.
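Test cases like these can be stored as dictionaries, each with an input sub-dictionary and an expected output. The sketch below assumes the input keys are named seq1 and seq2 (chosen to match a function's parameter names); the numeric lists and the keyboard-half strings are illustrative stand-ins rather than the exact values used in the lecture.

```python
# Each test case pairs keyword arguments ("input") with the expected LCS length ("output").
# The seq1/seq2 key names are an assumption; the list and keyboard strings are stand-in values.
t0 = {"input": {"seq1": "serendipitous", "seq2": "precipitation"}, "output": 7}  # general case
t1 = {"input": {"seq1": [1, 2, 3, 4, 5], "seq2": [2, 9, 3, 4, 8, 5]}, "output": 4}  # lists
t2 = {"input": {"seq1": "longest", "seq2": "stone"}, "output": 3}  # e.g. "one"
t3 = {"input": {"seq1": "asdfwevad", "seq2": "opkpoiklklj"}, "output": 0}  # no common elements
t4 = {"input": {"seq1": "dense", "seq2": "condensed"}, "output": 5}  # one contained in the other
t5 = {"input": {"seq1": "", "seq2": "opkpoiklklj"}, "output": 0}  # one sequence empty
t6 = {"input": {"seq1": "", "seq2": ""}, "output": 0}  # both sequences empty
t7 = {"input": {"seq1": "abcdef", "seq2": "badcfe"}, "output": 3}  # several LCS: "ace", "bdf"

lcs_tests = [t0, t1, t2, t3, t4, t5, t6, t7]

for i, test in enumerate(lcs_tests):
    print(i, test["input"], "->", test["output"])
```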

07:14:26

So let's take these and copy over these test cases here. So we have T0 to T7; that's eight test

07:14:36

cases, and you can add more test cases here; please feel free. Coming up with good test cases is a

07:14:48

skill that you should develop. And what we'll also do is put all these test cases into this

07:14:56

list called lcs_tests, for longest common subsequence tests, so that we have all of them easily available

07:15:03

for testing at once. Okay, now the next step is to come up with a correct solution for the problem.

07:15:13

We've seen the problem and we have identified some scenarios; now we need to come up with a simple,

07:15:18

correct solution and state it in plain English. It doesn't have to be efficient; it just has to be correct.

07:15:25

So here's one idea. You can see we have a couple of sequences. Let's create two counters,

07:15:36

idx1 and idx2, both starting at zero. So idx1 will be a pointer which tracks

07:15:47

elements in the first sequence, and idx2 will be a pointer which tracks elements in the

07:15:52

second sequence. And what we do is write a recursive function,

07:16:00

which will compute the LCS of sequence 1 from idx1 to the end and sequence 2

07:16:09

from idx2 to the end. So what does that mean? Let's say idx1 has the value 3 and idx2 has the value

07:16:19

1. You can see: zero, one, two, three. So sequence 1 from idx1 onwards is L O G Y, and sequence 2

07:16:28

from idx2 onwards is L C H E M Y. So we're looking at this portion of the problem and this portion

07:16:38

of the problem, and the recursive function, when invoked with idx1 and idx2, should return the length

07:16:47

of the longest common subsequence between these two portions, so LOGY and LCHEMY. Now,

07:16:53

why are we doing this? We need the longest common subsequence for the entire string, don't we?

07:17:00

Now here's the logic for why we're writing this recursive function, which can

07:17:05

theoretically compute this subsequence from any position onwards. So here's how we do this:

07:17:12

if idx1 was pointing to L here, and idx2 was pointing to L here as well,

07:17:23

that is, if sequence 1 at idx1 and sequence 2 at idx2 are equal, then this character L belongs to

07:17:30

a longest common subsequence of this portion and this portion. Why? Think about it. It makes sense, because

07:17:40

these elements are equal. So if you take the longest common subsequence of the rest of this portion

07:17:46

and the rest of that portion, then you can always add L to the front of

07:17:51

that subsequence, and that will make the subsequence longer by one, right? And that way it follows

07:17:56

that there is always a longest common subsequence of LOGY and LCHEMY that starts with L. Okay?

07:18:06

So we now know that we can take L as part of the longest common subsequence. Further,

07:18:11

the length of this longest common subsequence will be the length

07:18:17

of the longest common subsequence between OGY and CHEMY, plus one. Okay, and now you can see

07:18:26

why recursion is required, because what we can now do is say that if sequence 1 at idx1

07:18:32

and sequence 2 at idx2 are equal, then we simply call the recursive function on sequence 1 from

07:18:39

idx1 + 1, so OGY, and sequence 2 from idx2 + 1, so CHEMY, assume that the recursion

07:18:47

will give us the solution there, and simply add one to it, because these elements are equal. Okay, so that's one case:

07:18:53

if sequence 1 at idx1 and sequence 2 at idx2 are equal. Great. But what if they are not equal?

07:19:00

For instance, in this case you can see that if idx1 and idx2 are both zero,

07:19:07

then idx1 points to A and idx2 points to B. If they are not equal, then one of two things must

07:19:15

hold: either A does not occur in the longest common subsequence between the two strings, or B does

07:19:24

not occur in the longest common subsequence between the two strings. Now, we don't know which one,

07:19:28

but that's the power of recursion: we can just try both. So we can simply ignore A and

07:19:34

get the longest common subsequence between BSENT and BEST and check its length, or

07:19:41

we can simply ignore B and get the longest common subsequence between ABSENT and EST

07:19:47

and check its length. Whichever is longer becomes the solution for the two strings. Okay?
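The two cases above, together with the empty-sequence base case discussed a little later, can be summarized as a recurrence. Here L(i, j) is notation introduced for illustration: the LCS length of seq1 from index i onwards and seq2 from index j onwards, with m and n the lengths of the two sequences.

```latex
L(i, j) =
\begin{cases}
0 & \text{if } i = m \text{ or } j = n, \\
1 + L(i + 1,\, j + 1) & \text{if } \mathit{seq1}[i] = \mathit{seq2}[j], \\
\max\bigl(L(i + 1,\, j),\; L(i,\, j + 1)\bigr) & \text{otherwise.}
\end{cases}
```

The answer for the full sequences is L(0, 0).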

07:19:56

So this is what it looks like. We start out with analogy and alchemy. We compare A and A;

07:20:02

these two are equal, so we know that the length of the longest common subsequence is one plus

07:20:09

the LCS of NALOGY and LCHEMY. Okay, now we compare N and L, and we see that they're not equal,

07:20:18

so either N does not come in the longest common subsequence or L does not come in the longest common

07:20:23

subsequence. So we try both: we remove N here, and you see ALOGY, and we remove L here, and we see CHEMY.

07:20:33

Now once again, A and L are unequal, so either A does not occur in the LCS of these two strings

07:20:41

or L does not occur in the LCS of these two strings. So if A does not occur in the LCS, we can

07:20:46

remove A and try again; if L does not occur in the LCS, we can remove L and try again. And here, once again,

07:20:54

we get a match, so in this case we know that L occurs in the longest common subsequence of these two

07:21:00

sequences, so now we can get the LCS of OGY and CHEMY. Okay, and then, as these

07:21:07

recursive calls complete, you can see how this entire tree plays out. You can see that each time

07:21:12

you either get one child or you get two children. If you go all the way down and then go

07:21:18

back up, you simply count the number of matches along each path and keep

07:21:23

taking the maximum. So here, let's say you get back an answer of size two,

07:21:27

and here you get back an answer of size one, so the answer for this is simply the maximum of

07:21:32

two and one, which is two. And then the answer for this is simply the maximum of two and,

07:21:37

let's say this is three, so three, and the answer for this is simply one plus three, which is four. Okay?

07:21:43

So this is the way that we build up the solution. We've now looked at the recursive solution

07:21:49

expressed in text, and we've looked at the recursive solution expressed as a tree. Now, it's possible

07:21:56

that it still may not make sense to you how exactly this works, and that is where you should

07:22:01

start trying to create this tree yourself. So pick up a pen and paper and start drawing:

07:22:07

take an example, try to read each step here, and try to work it out like a computer would.

07:22:14

Just thinking about it that way will help you understand this algorithm. Now, one last thing:

07:22:21

if either sequence 1 from idx1 onwards or sequence 2 from idx2 onwards is

07:22:27

empty, which means the index has reached the end after some recursion, then their

07:22:33

LCS is empty, so the length is zero. Okay, so that is the recursive solution. Here I will just copy

07:22:41

over this recursive solution, along with the entire tree.
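The plain-English recursive solution translates almost line for line into Python. This is a sketch; the parameter names seq1, seq2, idx1 and idx2 are assumed here to mirror the counters described in the text.

```python
def lcs_recursive(seq1, seq2, idx1=0, idx2=0):
    """Return the length of the LCS of seq1[idx1:] and seq2[idx2:] using plain recursion."""
    # Base case: one sequence is used up, so no further elements can match.
    if idx1 == len(seq1) or idx2 == len(seq2):
        return 0
    # Current elements match: count this element and move past it in both sequences.
    if seq1[idx1] == seq2[idx2]:
        return 1 + lcs_recursive(seq1, seq2, idx1 + 1, idx2 + 1)
    # No match: skip the current element of one sequence or the other, keep the better result.
    option1 = lcs_recursive(seq1, seq2, idx1 + 1, idx2)
    option2 = lcs_recursive(seq1, seq2, idx1, idx2 + 1)
    return max(option1, option2)


print(lcs_recursive("analogy", "alchemy"))                 # 3, e.g. "aly"
print(lcs_recursive("absent", "best"))                     # 3, e.g. "bet"
print(lcs_recursive([1, 2, 3, 4, 5], [2, 9, 3, 4, 8, 5]))  # 4: [2, 3, 4, 5]
```

Note that the tree numbers used in the walkthrough were hypothetical ("let's say"); running the function on analogy and alchemy gives 3.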

07:22:52

Now, obviously, in an interview you do not need to write all of this in a lot of detail. It

07:22:58

sometimes helps to show diagrams, but you don't really need to do all of this.

07:23:04

All you need to do is express yourself clearly: that we will create two counters, that the condition to check

07:23:11

is whether the two elements at those counter positions are equal, what we do if they are equal,

07:23:18

and why we are using recursion here. We are using recursion because we can

07:23:23

reuse the solutions of subproblems to compute the final answer. Okay, and understanding recursion is

07:23:30

really important for solving data structures and algorithms problems, because it's like a superpower:

07:23:34

a great many of the problems that you see boil down to recursion, one way or another.

07:23:43

Okay, so now let's save our work once again, and now we're ready to implement the solution.

07:23:50

So we have the recursive solution in front of us, and if you remember the four steps, let's go

07:23:54

ahead and implement it. Let's call it lcs_recursive; it will accept the first sequence

07:24:04

and the second sequence, and let's also initialize idx1 and idx2, because we will be calling

07:24:13

this function recursively. So we'll simply use these two counters, idx1 and idx2, and set them to zero by default.

07:24:22

Now, the first thing we need: if idx1 is equal to the length of sequence 1, or idx2 is equal

07:24:30

to the length of sequence 2, then we return zero. Again, this is a common thing that happens:

07:24:38

the base case, or the end scenario, is something that, when you're describing the algorithm, you will

07:24:42

describe at the very end; as you're drawing the tree, you will notice what the end scenario is.

07:24:47

But when you're coding the algorithm, the end scenario, or base case, comes at the very top,

07:24:52

because otherwise we'll try to access position idx1 of sequence 1, and that will throw an error.

07:24:58

So that's why you need to handle the base case at the very beginning. Okay, next, moving ahead:

07:25:03

if sequence 1 at idx1 equals sequence 2 at idx2,

07:25:19

great, we found a match, so we simply return one plus... now we can call lcs_recursive

07:25:26

on sequence 1 and sequence 2, and we increment idx1 by one and we also increment idx2 by one.

07:25:40

Both of these need to be incremented because we are going to use this common element

07:25:49

as an element in the subsequence. Okay, so there's just one recursive call here; that was nice.

07:25:55

Otherwise, we have to ignore either the current element from sequence 1

07:26:02

or the current element from sequence 2, so we have two options. So we have option 1,

07:26:07

which is to ignore the current element of sequence 1, so this becomes lcs_recursive of sequence 1,

07:26:15

sequence 2, idx1 + 1 and idx2. And then we have option 2; this is lcs_recursive

07:26:31

once again with sequence 1 and sequence 2, and this time we increment idx2.

07:26:38

Okay, so make sure you understand this piece, because this is really the key here. And then the length

07:26:47

of the longest common subsequence is simply the maximum of option 1 and option 2. Okay, and that's

07:26:54

it. What may have seemed like a fairly tricky problem becomes simple once you start thinking about it recursively:

07:27:01

what happens if we simply compare the first two elements and they're equal, or they're unequal? Okay, now

07:27:06

we need to solve the problem for the rest: either we add one, or we ignore one

07:27:12

of the elements, right? Once you get that recursive thought, the solution and the

07:27:18

code simply present themselves to you. It's just about seven lines of code. Okay, that's our lcs_recursive

07:27:26

solution. Now let's test it out. Let's look at test case T0. Okay, so here we have serendipitous

07:27:34

and precipitation as the inputs. Let's keep that around so that we can view it later.

07:27:45

Let's call lcs_recursive on T0, but of course we need to fetch the input from T0 and get sequence 1

07:27:58

out of the input, and similarly we need to get the input and get sequence 2 out of the input.

07:28:13

You can see it returns the value seven, which is equal to the expected output, by the way.

07:28:19

So we simply compare the result with T0's output here, and I'm also going to put in this special command

07:28:30

called %%time. This is going to tell us how long the cell takes to execute.

07:28:36

Yeah, so now you can see here that we get back True, and the cell takes 495 milliseconds, or half a

07:28:42

second, to execute. And that's it; now we have tested this test case. One small thing I can tell you about how

07:28:53

to improve this slightly: T0's input is a dictionary, and because the names of the

07:29:01

keys of the dictionary are sequence 1 and sequence 2, which also match the argument names

07:29:06

of lcs_recursive (you can see here we have sequence 1 and sequence 2), what you can do is simply

07:29:11

write **T0['input'], and Python will automatically grab each key, so sequence 1 will be passed

07:29:18

as the argument sequence 1, and sequence 2 will be passed as the argument sequence 2.
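The ** trick is ordinary keyword-argument unpacking, and it only works because the dictionary keys match the parameter names. A small sketch: describe is a hypothetical stand-in function that only shows which parameter receives which value, and the seq1/seq2 key names are assumed.

```python
def describe(seq1, seq2):
    """Hypothetical stand-in: report which parameter received which value."""
    return f"seq1={seq1!r}, seq2={seq2!r}"


t0 = {"input": {"seq1": "serendipitous", "seq2": "precipitation"}, "output": 7}

# Spelling out each argument by hand...
explicit = describe(seq1=t0["input"]["seq1"], seq2=t0["input"]["seq2"])
# ...is equivalent to unpacking the input dictionary with **.
unpacked = describe(**t0["input"])

print(unpacked)              # seq1='serendipitous', seq2='precipitation'
print(explicit == unpacked)  # True

# A key that doesn't match any parameter name is rejected with a TypeError.
try:
    describe(**{"first": "abc", "second": "xyz"})
except TypeError as err:
    print("TypeError:", err)
```

Because the arguments are matched by name, the order of the keys in the dictionary does not matter.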

07:29:23

This is a small trick that helps us reduce the amount of code we need to

07:29:28

write. Okay, now we've tested one test case, but that's not enough; we should be testing all the

07:29:34

test cases. To test all the cases we can write a for loop, for test in tests, etc., but we can do something else

07:29:47

too: we can use the evaluate_test_cases function from Jovian. So from the jovian.pythondsa

07:29:55

module we will import evaluate_test_cases. It's a helper function that we've created for you, but

07:30:01

it's really simple to write; you could just use a for loop as well. And we call evaluate_test_cases

07:30:09

on the function that we want to test, which is lcs_recursive, and the tests that we have, which is lcs_tests.

07:30:16

And when we do this, it is going to try out each test case. You can see it tried test case 0; that

07:30:22

was a pass. It tried test case 1, and it's also printing out the input, the expected output and the actual

07:30:27

output. Test case 1 was lists, and lists work too, because all we've used here is indexing

07:30:37

and length, and these are both available for strings and lists. This is

07:30:42

something that's very nice about Python: the dynamic nature of its functions. Once again, this worked

07:30:48

perfectly fine. Then here we have another one, longest and stone: the expected output was 3, and the

07:30:52

actual output was 3 as well. Here we have the two keyboard-half strings, which have nothing in

07:30:59

common, so the expected and actual outputs are both 0. Here's one where one sequence is already a

07:31:06

subsequence of the other, so the smaller one becomes the longest common subsequence. And then we have

07:31:11

an empty string, and then we have two empty strings, and finally we have multiple longest common

07:31:16

subsequences, and we still get back the right output. Now, if any of these failed, you would know exactly

07:31:22

what went wrong. For instance, if you had an issue in this case where both of these were empty,

07:31:27

that would tell you that you've probably not handled the empty case properly, and that is why

07:31:31

having great test cases is very important. Okay, and we can see the timing as well. Each of

07:31:37

these took at most about 480 milliseconds. Now, that's still a bit high, I would say:

07:31:44

480 milliseconds, when we are just looking at serendipitous and precipitation, which are

07:31:50

of very short length. If you're looking at really long sequences (for instance, this technique is

07:31:55

used for comparing DNA sequences, where we might look at two DNA strands, or two DNA strings, and try

07:32:03

to get the common subsequence out of them), these can run into thousands or sometimes millions

07:32:08

of elements, and that would make it rather slow. Okay, so we do want to improve this algorithm further.
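The lecture uses evaluate_test_cases from the jovian library; as mentioned, a plain for loop does the same job. Here is a minimal stand-in, assuming each test is a dictionary with "input" and "output" keys: run_tests and its printed format are hypothetical, not the library's, and each call is timed with time.perf_counter. A compact copy of the plain recursive LCS is included so the block runs on its own.

```python
import time


def run_tests(function, tests):
    """Call function(**test["input"]) for each test, report result and timing, return the pass count."""
    passed = 0
    for i, test in enumerate(tests):
        start = time.perf_counter()
        actual = function(**test["input"])
        elapsed_ms = (time.perf_counter() - start) * 1000
        ok = actual == test["output"]
        passed += ok  # True counts as 1, False as 0
        status = "PASSED" if ok else "FAILED"
        print(f"Test {i}: {status} (expected {test['output']}, got {actual}, {elapsed_ms:.3f} ms)")
    print(f"{passed}/{len(tests)} tests passed")
    return passed


def lcs_recursive(seq1, seq2, idx1=0, idx2=0):
    """Plain recursive LCS length, repeated here so the example is self-contained."""
    if idx1 == len(seq1) or idx2 == len(seq2):
        return 0
    if seq1[idx1] == seq2[idx2]:
        return 1 + lcs_recursive(seq1, seq2, idx1 + 1, idx2 + 1)
    return max(lcs_recursive(seq1, seq2, idx1 + 1, idx2),
               lcs_recursive(seq1, seq2, idx1, idx2 + 1))


tests = [
    {"input": {"seq1": "serendipitous", "seq2": "precipitation"}, "output": 7},
    {"input": {"seq1": "longest", "seq2": "stone"}, "output": 3},
    {"input": {"seq1": "", "seq2": ""}, "output": 0},
]
run_tests(lcs_recursive, tests)
```

The measured times will vary from machine to machine; what matters is spotting which cases are slowest.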

07:32:16

But before we do that, we can commit our work once again. And the first thing

07:32:24

before we improve the algorithm is to analyze its complexity: how long does it really take? Okay, and

07:32:29

to identify any inefficiencies. Now, to analyze the complexity, let's look at an example and consider

07:32:37

the worst case. When does the worst case occur? We've seen that if two elements match, then we simply

07:32:44

have one subproblem, or one recursive call, but if the two current elements of the

07:32:50

sequences don't match, then we have two recursive calls. So if we have two completely distinct sequences,

07:32:57

where none of the elements match, then each time we will end up with two

07:33:02

subproblems, so that becomes the worst case. So the worst case occurs when we always have two subproblems,

07:33:07

that is, when the sequences have no common elements. Here's an example: this is a sequence of length 6,

07:33:12

here's a sequence of length 8, and this is what the tree will look like. Now we no longer

07:33:18

show the actual sequences; we've simply put the lengths of the strings that we start out with.

07:33:23

So here we start out with strings of length 6 and 8, and then we say that we either ignore the first

07:33:28

character of the first string (or the first sequence), or we ignore the first element of the second

07:33:32

sequence, and that gives us two subproblems. This time the sequences have lengths 5 and 8 in this

07:33:39

case and 6 and 7 in this case. Okay, so we either reduce one from the left or we reduce one from

07:33:44

the right, and once again here we either reduce one from the left or we reduce one from the right.

07:33:49

So this way we create the tree, and you can also see that a lot of identical subtrees get created, and that

07:33:54

really is the inefficiency; we'll talk about that. But what will happen here is that 5, 7

07:34:00

will then call 4, 7 and 5, 6, and 5, 7 here will once again call 4, 7 and 5, 6. So 4, 7

07:34:06

gets repeated three times, and 5, 6 gets repeated three times as well. So there are a lot of

07:34:12

repeated calls that are going to occur, and you can even see this here at the top: you can see that

07:34:16

the ALOGY subproblem was called repeatedly, so that's really a source of inefficiency. But

07:34:24

now the question becomes: imagine expanding every branch all the way down to 0, 0 (the real recursion stops as soon as either length hits 0, so this can only over-count); that's when

07:34:30

the entire tree ends. So can you count the number of leaf nodes? If you

07:34:36

keep expanding this tree completely, expanding each of these without skipping any of them,

07:34:41

can you count the number of leaf nodes? Now, if you count the number of leaf nodes, we know that

07:34:44

in a complete binary tree, if the number of leaf nodes is N, then the height of the

07:34:52

tree is log N, and based on that we can actually

07:34:59

determine the size of the tree as well. So we know that counting the number of unique

07:35:07

paths from root to leaf will give us the number of leaves, right? So each time we have two choices:

07:35:13

we either reduce from the left or we reduce from the right. So to get to 0, 0, we would have to

07:35:18

reduce all the elements from the left and all the elements from the right.

07:35:22

That means if you have sequences of lengths M and N,

07:35:29

then you would have to make M plus N choices in total along each path, right? Each time

07:35:37

you have to choose whether you want to reduce from the left

07:35:40

or from the right, so we have two options, and you have to make that choice M plus N times. That's

07:35:46

the right way to put it, really. So each time you compare, you have two choices, so you have

07:35:51

two multiplied by two multiplied by two multiplied by two, and you keep multiplying that, and you will

07:35:56

end up with at most 2 to the power of M plus N leaf nodes. Okay? So here is an exercise for you:

07:36:03

draw this tree on a piece of paper, mark out the leaf nodes, and check how the length

07:36:10

of each path is M plus N; figure that out. And based on that, can you conclude that there are at most

07:36:16

2 to the power of M plus N leaves in this tree? And if 2 to the power of M plus N is the

07:36:25

number of leaves, then the total number of nodes in the tree is roughly double that.

07:36:29

Once again, this is something that is very easy to verify. You can check it here.

07:36:34

For instance, if you just consider these two levels: if you have 2 leaves, then the total number

07:36:40

of nodes in the tree is 2 plus 1, which is 3; it's actually double minus 1, so 2 times 2 is 4, minus 1 is 3.

07:36:46

If you have 3 levels, you can see here that you have 4 leaves, and the total number of nodes in

07:36:51

the tree is 4 times 2, which is 8, minus 1, which is 7. And you can see that here. So it follows, essentially, that

07:36:57

we have an exponential number of subproblems: we are calling the recursive function an exponential

07:37:04

number of times. And inside the recursive function,

07:37:09

we are doing constant-time work. You can see here that there's no special work that we're doing;

07:37:15

all we're doing is a comparison and an addition, and both of them are constant time.

07:37:20

So we make on the order of 2 to the power of m plus n recursive calls, and inside each we do constant work.

07:37:25

So the time complexity is O(2^(m+n)). That's a rough explanation.
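To see the exponential growth concretely, here's an instrumented sketch (the call counter is an addition for illustration) that counts how often the plain recursion runs when the two sequences share no elements, the worst case described above. Solving the counting recurrence gives exactly 2 * C(m + n, m) - 1 calls for lengths m and n, which always stays below 2^(m + n + 1), so the O(2^(m+n)) bound holds with some room to spare. It uses math.comb, available from Python 3.8.

```python
from math import comb


def count_calls(seq1, seq2):
    """Return (lcs_length, number_of_calls) for the plain recursive LCS."""
    calls = 0

    def recurse(idx1, idx2):
        nonlocal calls
        calls += 1  # count every invocation, including base cases
        if idx1 == len(seq1) or idx2 == len(seq2):
            return 0
        if seq1[idx1] == seq2[idx2]:
            return 1 + recurse(idx1 + 1, idx2 + 1)
        return max(recurse(idx1 + 1, idx2), recurse(idx1, idx2 + 1))

    return recurse(0, 0), calls


# Sequences with no common elements: every call that is not a base case branches in two.
print("m  n  lcs  calls  2*C(m+n,m)-1  2^(m+n)")
for m, n in [(2, 2), (4, 4), (6, 8), (8, 8)]:
    length, calls = count_calls("a" * m, "b" * n)
    print(m, n, length, calls, 2 * comb(m + n, m) - 1, 2 ** (m + n))
```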

07:37:31

We've not gone into a lot of depth, because we've covered this over and over in

07:37:34

the previous lessons. But the exercise for you is to verify exactly how it comes to 2 to the power of

07:37:40

m plus n. So that's our recursive solution, and we now know that the time

07:37:48

complexity is O(2^(m+n)). Let's just copy that over here.

07:38:04

And the inefficiency in this algorithm, as we said, is that we are solving the same problem more than once.

07:38:09

We're calling the exact same subproblem: the lcs_recursive function is called for the subproblem with

07:38:15

lengths 5 and 7 here, and again with lengths 5 and 7 here, the same call

07:38:21

twice. So each of these subproblems will be called twice, and then each of the subproblems within

07:38:26

them will be called twice, and of course some of these subproblems will once again get shared.

07:38:30

So there's a lot of repetition. Now, there's a simple solution here, which is simply to

07:38:36

remember some of these results. This technique is called memoization, and you may also just think of it as

07:38:43

memorization, because you're just remembering some of these things, but memoization is the technical term for

07:38:47

it. And we remember these solutions in a dictionary called memo. So what we're going to do

07:38:54

is follow the same recursive strategy, but this time we're going to maintain a dictionary

07:39:00

called memo, and we're going to track intermediate results in the dictionary. And if we find that an

07:39:06

intermediate result already exists in the dictionary, then we will not compute it again. Okay? So let's

07:39:12

see. So now we write lcs_memoized, or let's just say lcs_memo for short. It takes sequence 1

07:39:18

and sequence 2. And this time we create this dictionary called memo, and then we

07:39:25

write a function inside it: a recursive helper function inside

07:39:30

the lcs_memo function, so that it has access to sequence 1 and sequence 2, and we will simply

07:39:35

start it out with idx1 as 0 and idx2 as 0 as well. idx1 will track the position in sequence 1, and

07:39:44

idx2 will track the position in sequence 2. Now, the first thing we do is, using the two

07:39:51

indices, create a key. So we are going to create the key (idx1, idx2). And if the key is present in the

07:39:59

memo (this is the way to check if a key exists in a dictionary), then we simply return memo[key].

07:40:05

Simple: the problem is solved. We don't have to solve this problem again, because it's

07:40:10

already something that we've solved. If it isn't, then we need to solve the problem and save it in

07:40:15

the memo. Now here we can write the same three cases as before. First, the

07:40:21

base case: if idx1 is equal to the length of sequence 1 or idx2 is equal to the length of

07:40:30

sequence 2, then we simply set memo[key] to 0, because by this point we have reached the end of one of

07:40:44

the sequences; there's nothing left for us to compare. Elif sequence 1 at idx1 equals

07:40:56

sequence 2 at idx2: this is the case where the current characters are equal.

07:41:06

If we go up here and look at the tree once again, this is a case like this one, where the

07:41:11

current characters that we are pointing at are equal. So in that case we simply

07:41:17

get the result as one plus the result for the remainder, with the current characters removed.

07:41:26

So in this case, we simply set memo[key] to one plus the result of calling the recursive function again,

07:41:37

recurse(idx1 + 1, idx2 + 1). Great. Else: this is the case where the two elements are

07:41:45

not equal, and this is where we have two options. I'm not going to write the two options separately;

07:41:49

let's just take the max directly here: the max of recurse(idx1 + 1, idx2)

07:42:02

and recurse(idx1, idx2 + 1). Okay, and finally, from the recurse function we return

07:42:12

memo[key]. So whichever case it is, we have computed the result and saved it in the memo.

07:42:19

So this time these computations will not get repeated again and again.

07:42:28

And let us now return recurse(0, 0), because 0, 0 represents the entire sequences.
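Putting the memoized version together, a sketch might look like this; the names lcs_memo, recurse, idx1 and idx2 follow the description above, while seq1 and seq2 are assumed parameter names.

```python
def lcs_memo(seq1, seq2):
    """Length of the longest common subsequence, using recursion plus a memo dictionary."""
    memo = {}  # maps (idx1, idx2) -> LCS length of seq1[idx1:] and seq2[idx2:]

    def recurse(idx1=0, idx2=0):
        key = (idx1, idx2)
        if key in memo:  # already solved this subproblem: reuse the stored answer
            return memo[key]
        if idx1 == len(seq1) or idx2 == len(seq2):
            memo[key] = 0  # one sequence is exhausted
        elif seq1[idx1] == seq2[idx2]:
            memo[key] = 1 + recurse(idx1 + 1, idx2 + 1)  # use the matching element
        else:
            memo[key] = max(recurse(idx1 + 1, idx2), recurse(idx1, idx2 + 1))  # skip one element
        return memo[key]

    return recurse(0, 0)


print(lcs_memo("serendipitous", "precipitation"))  # 7
```

The dictionary does the same bookkeeping that functools.cache (Python 3.9+) or functools.lru_cache(maxsize=None) would do automatically. One caveat in CPython: the recursion can go roughly m + n calls deep, so inputs with thousands of elements would exceed the default recursion limit of 1000.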

07:42:36

and that's it. And this is the common strategy that you should apply whenever you come up with

07:42:45

a recursive solution and you see the inefficiency coming because of the same problem being called

07:42:50

again and again. This is where you need to apply this technique called memoization. Right and in

07:42:56

this technique you will then be able to simply store intermediate result. So it's really simple.

07:43:04

You just created dictionary and then you add one or two lines of code here and you make sure to

07:43:09

save the result in that dictionary whenever you compute the result the next time you don't have to

07:43:14

compute it. Okay and we can test it out we can test out with all the test cases

07:43:20

We evaluate the test cases with lcs_memo and lcs_tests, and you can see that all the test cases pass.
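Here is a minimal sketch of the memoized function walked through above. The names lcs_memo and recurse are assumptions; the notebook's exact code may differ. It keeps a dictionary keyed by (idx1, idx2), checks it before doing any work, and returns memo[key] at the end, as described.

```python
def lcs_memo(seq1, seq2):
    """Length of the longest common subsequence, using a memo dictionary."""
    memo = {}  # (idx1, idx2) -> LCS length of seq1[idx1:] and seq2[idx2:]

    def recurse(idx1=0, idx2=0):
        key = (idx1, idx2)
        if key in memo:
            # Sub-problem already solved: reuse the stored answer
            return memo[key]
        if idx1 == len(seq1) or idx2 == len(seq2):
            # One sequence is exhausted, nothing left to compare
            memo[key] = 0
        elif seq1[idx1] == seq2[idx2]:
            # Current characters match: count it and advance both indices
            memo[key] = 1 + recurse(idx1 + 1, idx2 + 1)
        else:
            # Characters differ: drop one character from either sequence
            memo[key] = max(recurse(idx1 + 1, idx2), recurse(idx1, idx2 + 1))
        return memo[key]

    return recurse(0, 0)


print(lcs_memo("serendipitous", "precipitation"))  # 7
```

Python's `functools.lru_cache` decorator can do the same bookkeeping automatically; the explicit dictionary just makes the memo visible.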

07:43:32

Not only do all the test cases pass, you can see that the time taken is now lower.

07:43:45

Okay, so that's nice, the time taken is now lower. Previously,

07:43:51

if we just go up here, you can see that it took 480 milliseconds to find the longest

07:44:00

common subsequence between precipitation and serendipitous, but in this case it only took about

07:44:08

0.234, which is about 0.2 milliseconds. So it is roughly 2,000 times faster, even for strings of only 13 characters,

07:44:16

and that's a huge boost. Let's analyze the complexity here. A

07:44:25

quick and easy way to find the complexity of the solution is to see where the computation happens and how many

07:44:31

times the computation can occur. Now this is where the bulk of the computation is occurring in

07:44:35

a recursive call, and this computation is avoided if we already have the result in the memo.

07:44:44

Okay, so that means the number of computations that we need to do is equal to the maximum

07:44:52

number of elements that can end up in the memo. Now, what do the keys in the memo look like? The

07:44:57

keys in the memo are pairs of idx1 and idx2. Great, and what values can these take? idx1 can take

07:45:06

values from 0 to M, if M is the length of sequence 1, let's say. And idx2 can take values from 0 to N, if N is the

07:45:15

length of sequence 2. So in total, the number of possible keys is roughly M times N (precisely, M plus one times N plus one).

07:45:24

The number of things that you need to store in

07:45:28

the memo is M times N, and for each of them you do constant work. The next time you try to

07:45:33

access one, you do not need to do the work or make a recursive call; you can simply

07:45:37

read it from the memo. So what that tells us is that the complexity in this case, and in

07:45:45

any memoization case in general, is proportional to the number of keys, which here is M times N. So

07:45:52

the time complexity here is order of M times N. So we've gone from 2 to the power of M plus N which

07:46:00

if M plus N were equal to 30 would be about 1 billion, to order of M times N. So let's say

07:46:09

both strings had length 15; that would just be 225 operations. So we've gone from 1 billion

07:46:15

operations to 225 operations simply by storing intermediate results. So it's a very powerful technique,

07:46:23

one we apply all the time. Now you can see here that the first time (5, 7) is computed,

07:46:28

the next time (5, 7) does not need to be computed again, and that's why this tree here is marked:

07:46:34

this is the tree for memoization. The first time (4, 7) is computed, it never needs to be

07:46:39

computed again. So this entire tree of computation gets eliminated and similarly this entire

07:46:43

tree of computation gets eliminated. Out of 1 billion computations, we are eliminating almost

07:46:51

all except 225 computations. So we're left with practically nothing, and that speeds up your

07:46:56

algorithm by a huge factor. So that's memoization, and as I said, it's really easy to compute

07:47:04

the time complexity of memoization: simply count the number of keys, and then track

07:47:12

how much work you need to compute each key, assuming that you already have the recursive solutions

07:47:18

for the remaining subproblems. So, how much work do you need to compute each key using the other existing

07:47:24

solutions. Now in this case that was constant because all we needed to do was compare and add.
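The key-counting argument can be checked empirically. This sketch uses a hypothetical helper, count_calls, that counts how many calls actually do work, with and without the memo. For two 8-character strings with no characters in common (a worst case for the naive recursion), the naive version makes 2 * C(16, 8) - 1 = 25,739 calls, while the memoized one makes 80, just under the (8 + 1) * (8 + 1) = 81 possible keys.

```python
def count_calls(seq1, seq2, use_memo):
    """Return (LCS length, number of calls that did real work)."""
    memo = {}
    calls = 0

    def recurse(idx1, idx2):
        nonlocal calls
        key = (idx1, idx2)
        if use_memo and key in memo:
            return memo[key]  # cache hit: no work repeated
        calls += 1
        if idx1 == len(seq1) or idx2 == len(seq2):
            result = 0
        elif seq1[idx1] == seq2[idx2]:
            result = 1 + recurse(idx1 + 1, idx2 + 1)
        else:
            result = max(recurse(idx1 + 1, idx2), recurse(idx1, idx2 + 1))
        if use_memo:
            memo[key] = result
        return result

    return recurse(0, 0), calls


seq1, seq2 = "abcdefgh", "ijklmnop"  # hypothetical inputs with no common characters
naive_length, naive_calls = count_calls(seq1, seq2, use_memo=False)
memo_length, memo_calls = count_calls(seq1, seq2, use_memo=True)
print(naive_calls, memo_calls)                    # 25739 80
print((len(seq1) + 1) * (len(seq2) + 1))          # 81, the number of possible keys
```

Since each memoized call does constant work (one comparison plus an addition or a max), the total work is bounded by the number of keys, which is O(M * N).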

07:47:34

Okay, and I'll let you write here a plain English explanation of the optimized solution using memoization.

07:47:42

It's a good exercise to try out. But what we will also look at is another technique

07:47:47

called dynamic programming. Now the downside with memoization is that it requires recursive calls.

07:47:54

And while it's not a problem for small cases, when you have really large problems

07:48:00

recursion has an overhead. One way to see the overhead is that for this

07:48:06

function execution to complete, you need this function execution to complete and this to complete.

07:48:10

And for this to complete, you need this to complete and this to complete, right? So the idea here is that

07:48:15

each new recursive call takes more space in memory, and it also takes longer because now we

07:48:23

have to allocate some memory and set up a stack frame for the execution

07:48:29

of that function. So if you have a large tree, then you're creating hundreds, thousands or

07:48:35

possibly millions of open function calls, all of which have their own memory, and that can eat up a lot

07:48:40

of memory and sometimes also take a longer time. So the solution, to replace recursion,

07:48:48

is iteration. And how do we do that? We do that using a technique called dynamic programming.

07:48:54

We'll do almost the same thing, with a few changes. Instead of using a dictionary

07:49:01

to track intermediate results, we will create a matrix because we know that

07:49:06

for sequence one, idx1 can go from 0 to n1, where n1 is the length of

07:49:13

sequence one, and idx2 can go from 0 to n2, where n2 is the length of sequence two.

07:49:20

And what we can do is we can use a for loop or a couple of for loops to fill out all these

07:49:24

subproblems without requiring recursion. And this is how we'll do it.
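One concrete cost of deep recursion in CPython is the interpreter's recursion limit (1000 frames by default, adjustable with `sys.setrecursionlimit`). This sketch assumes a compact version of the memoized LCS described earlier plus a hypothetical helper, try_lcs, that lowers the limit temporarily. Two 400-character strings with no common characters need a chain of about 800 nested calls, so the memoized version fails even though its total work stays at O(M * N); the iterative table never nests calls at all.

```python
import sys

def lcs_memo(seq1, seq2):
    """Memoized LCS length; call depth can reach len(seq1) + len(seq2)."""
    memo = {}

    def recurse(idx1, idx2):
        key = (idx1, idx2)
        if key in memo:
            return memo[key]
        if idx1 == len(seq1) or idx2 == len(seq2):
            result = 0
        elif seq1[idx1] == seq2[idx2]:
            result = 1 + recurse(idx1 + 1, idx2 + 1)
        else:
            result = max(recurse(idx1 + 1, idx2), recurse(idx1, idx2 + 1))
        memo[key] = result
        return result

    return recurse(0, 0)


def try_lcs(seq1, seq2, limit):
    """Run lcs_memo under a temporary recursion limit; return None on overflow."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return lcs_memo(seq1, seq2)
    except RecursionError:
        return None
    finally:
        sys.setrecursionlimit(old_limit)  # always restore the original limit


print(try_lcs("ab", "b", 500))             # 1: shallow recursion is fine
print(try_lcs("a" * 400, "b" * 400, 500))  # None: the call chain reaches about 800 frames
```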

07:49:32

So let's say these are the two strings that we're working with. This is string one, A, C,

07:49:35

G, T, and this is string two. This is what DNA sequences look like. So what we'll do is create

07:49:43

a table of size n1 plus one by n2 plus one. So you can see that there are

07:49:51

n1 plus one rows: if this is of length n1, there are n1 rows, and then there's an additional

07:49:56

row. Similarly, there are n2 plus one columns here. So if this is of length n2, there are

07:50:02

n2 plus one columns: you can see these are n2 columns, and there is an additional column here.

07:50:08

Now, what does table of i and j represent? Here i is a pointer for

07:50:17

the first sequence and j is a pointer for the second sequence. So i selects a row and j selects a

07:50:23

column. So table of i and j represents the length of the longest common subsequence of sequence one up to i

07:50:30

and sequence two up to j. So let's say i was one and j was one.

07:50:43

Then this represents the longest common subsequence of sequence one up to i, so all the positions before

07:50:50

one, which means only the zeroth position, just T, and sequence two up to j, which means all the

07:50:56

positions before one, which again means only the zeroth position.

07:51:02

That is just A. So table of one and one represents the longest common subsequence of

07:51:08

just T and A, which is zero. On the other hand, if we skip ahead a little bit,

07:51:14

let's say to this position, you can count here: i goes zero, one, two, three, four,

07:51:20

five, six. So this is six here, and here we have zero, one, two, three. So this is

07:51:31

table of six comma three, which takes the first six elements

07:51:40

of sequence one, T A G T C A, and the first three elements of sequence two, A G A, and it stores the result

07:51:49

of the longest common subsequence between these two. I'll let you look at the table

07:51:55

and maybe even draw the table on a piece of paper and verify that the length three is right.
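For checking cells by machine rather than on paper, here is a sketch that builds the table for the prefixes quoted in the walkthrough, T A G T C A and A G A (the full strings from the lecture are not reproduced here, so only these prefixes are used; lcs_table is a hypothetical name). It prints each row labelled with its sequence-one element and then the three cells discussed in the lecture.

```python
def lcs_table(seq1, seq2):
    """table[i][j] = LCS length of seq1[:i] and seq2[:j]; row 0 and column 0 are empty prefixes."""
    table = [[0] * (len(seq2) + 1) for _ in range(len(seq1) + 1)]
    for i in range(len(seq1)):
        for j in range(len(seq2)):
            if seq1[i] == seq2[j]:
                table[i + 1][j + 1] = 1 + table[i][j]       # diagonal, top-left
            else:
                table[i + 1][j + 1] = max(table[i][j + 1],   # cell above
                                          table[i + 1][j])   # cell to the left
    return table


seq1, seq2 = "TAGTCA", "AGA"  # prefixes quoted in the walkthrough
table = lcs_table(seq1, seq2)
for label, row in zip("-" + seq1, table):  # "-" labels the empty-prefix row
    print(label, row)

print(table[1][1])  # 0: "T" vs "A"
print(table[3][2])  # 2: "TAG" vs "AG"
print(table[6][3])  # 3: "TAGTCA" vs "AGA"
```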

07:52:00

You can see that A G A occurs here, so A G A is a subsequence of T A G T C A. So the longest

07:52:06

common subsequence between them has length three. Now, what we will do is compare the next

07:52:15

elements: we compare sequence one of i and sequence two of j. So let's say we are looking at

07:52:26

an example. Let's say i has the value

07:52:30

zero, one, two, so i has the value two, and

07:52:36

let's say j has the value one: zero, one. So if we compare sequence one of i,

07:52:43

which is G, and sequence two of j, which is also G, they're equal.

07:52:50

If they're equal, then look at table of i plus one and j plus one, which is this value, right? Remember i is

07:52:59

two and j is one. So i plus one is three (zero, one, two, three)

07:53:07

and j plus one is two. So table of i plus one and j plus one,

07:53:14

with i being two and j being one,

07:53:22

is table of three and two.

07:53:29

And table of three and two has the value two. This value is obtained by adding one

07:53:37

to table of i comma j. Because these two elements are equal,

07:53:42

we can say that if we take the longest common subsequence of T A and A, and add one to it,

07:53:50

that will give us the longest common subsequence of

07:53:53

T A G and A G. It's exactly the same logic as the recursion; we have simply

07:53:59

reversed it. Now we're looking at the last element, so we can keep filling out the

07:54:04

last value using previous values. So this is one case. Similarly, here's another case

07:54:09

where A and A are equal. So the longest common subsequence between T, A, G, T, C, A and

07:54:15

A, G, A is one plus the longest common subsequence between T, A, G, T, C and

07:54:22

A, G. Okay, one plus this value. That's one case. The other case is if they're not equal. So let's look at

07:54:31

this value, for example, over here. We have T, A, G, T on this side, and then we have

07:54:38

okay, let's look at this one. We have T, A, G, T on this side and A, G, A, C on this side.

07:54:43

Now T is the element here and C is the element here. They are not equal. That means the longest

07:54:52

common subsequence between these two either does not include this last T or does not include this last C. It cannot

07:54:58

include both, since they differ and each is the last element of its own string. So if it does not include the T,

07:55:06

then it is this result. And if it does not include the C, then it is this result.

07:55:12

So we simply take the maximum of these two to get the result for this cell,

07:55:16

if these two elements are not equal. And that is how you fill out the table. You start from the top,

07:55:21

the first row is zeros because we have empty strings and the first column is also zeros because

07:55:26

we have empty strings. To fill out a cell, you check whether the two elements are equal.

07:55:30

If they're equal, we simply add one to the diagonal, top-left element.

07:55:37

If they're unequal, then we take the maximum of the element above it and the element to the left

07:55:42

of it. And that way, we fill out the entire table. Okay. So that's the dynamic programming solution.
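The filling rules just described can be written as a recurrence. In this notation (an assumption for compactness), T stands for the table, and s_1, s_2 for sequence one and sequence two, indexed from 0, so T[i][j] covers the first i elements of s_1 and the first j elements of s_2:

```latex
T[0][j] = T[i][0] = 0

T[i+1][j+1] =
\begin{cases}
  1 + T[i][j] & \text{if } s_1[i] = s_2[j] \\
  \max\bigl(T[i][j+1],\; T[i+1][j]\bigr) & \text{otherwise}
\end{cases}
\qquad 0 \le i < n_1,\; 0 \le j < n_2

\text{LCS length} = T[n_1][n_2]
```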

07:55:48

And I know this can seem a little bit complicated. Honestly, I still get

07:55:53

confused with dynamic programming a lot of times. And that's why I like to just draw tables and write things out carefully.

07:56:03

Okay. And you have to be especially careful with indices, because here we are saying that if sequence one of i and

07:56:10

sequence two of j are equal, then table of i plus one and j plus one is one plus table of i and j. So just

07:56:15

watch the indices carefully here. Now let's implement the solution.

07:56:22

Let's implement the dynamic programming solution. Let's call it LCS dynamic programming, so we'll just say

07:56:29

dp here, and it takes sequence one and sequence two. The first thing we need is

07:56:40

a table of results. For this table, let's first grab n1 and n2, the length of sequence one

07:56:49

and the length of sequence two. Now we need to create a table of all zeros. How do you

07:56:58

create a table of all zeros? One way to create a list of zeros is this: zero for underscore

07:57:05

in range n1. Let's give n1 and n2 some values. Now if you want to create a list of zeros of

07:57:14

length n1, you simply say zero for underscore, or zero for x, simply ignoring whatever value

07:57:21

you're getting from range n1. That's going to give you a list of zeros. But we don't

07:57:26

want a list of n1 zeros. We want these to be rows. So we want each of these to

07:57:33

itself be a list of zeros of length n2: zero for x in range n2. And now you can see that

07:57:44

we have five rows, one, two, three, four, five. Then we have seven columns, one, two, three, four, five,

07:57:50

six, seven. So this is the table that we want to create initially. Now this is a table that we've

07:57:56

created. This is going to be this exact same table. And we're simply going to start each string

07:58:02

from position one this time, not from position zero because we want to have this additional

07:58:06

row where we don't consider either of these. That just makes computations a little easier.
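A small sketch of the nested list comprehension described above, with hypothetical values n1 = 4 and n2 = 6, so the table has 5 rows and 7 columns once the extra row and column are added. It also shows a common pitfall: `[[0] * cols] * rows` repeats one row object, so writing to one row changes every row.

```python
n1, n2 = 4, 6  # hypothetical sequence lengths

# One extra row and column for the empty-prefix case: 5 rows x 7 columns
table = [[0 for _ in range(n2 + 1)] for _ in range(n1 + 1)]
print(len(table), len(table[0]))  # 5 7

# Pitfall: multiplying the outer list repeats the *same* row object
aliased = [[0] * (n2 + 1)] * (n1 + 1)
aliased[0][0] = 1
print(aliased[1][0])  # 1, because every row is the same list

table[0][0] = 1
print(table[1][0])    # 0, because each row is a separate list
```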

07:58:13

Now we say for idx1 in range n1. That's going to iterate over the

07:58:21

rows. And then for idx2 in range n2, which is going to iterate over the columns.

07:58:30

First we check if sequence one of idx1 is equal to sequence two of idx2.

07:58:46

If they're equal, then we can fill out table of i plus one and j plus one

07:58:55

as one plus table of i and j. Okay. And we can see this here. Suppose the first elements were equal.

07:59:08

Suppose idx1 was zero and idx2 was also zero, and suppose they were equal.

07:59:14

Then this value should be one: one plus the diagonal, top-left element.

07:59:18

And that holds anywhere within the table. So wherever two elements are equal, like G and G are

07:59:23

equal here. So this value is one plus this value.

07:59:29

Else, table of i plus one and j plus one is the max of

07:59:46

table of i comma j plus one, which is the previous row, and

07:59:58

table of i plus one comma j, which is the previous column.

08:00:05

Okay. So this is the case where G and A are not equal. If G and A are not equal,

08:00:11

then we take the maximum of these two values. And that's it. That is going to fill up the table

08:00:17

for us, and then we return from the table. We simply want the bottom-right element, so we can

08:00:24

say return table of minus one, minus one. This is going to get the last row, last column.

08:00:28

And that's our dynamic programming solution. Let's evaluate the test cases here.

08:00:41

Okay. Turns out there's no i. Let's just call these i and j.

08:00:55

Turns out idx1 is not defined. Let's just make these i and j. Since we're doing this coding

08:01:01

live, you can see that even after a decade of coding, I still make all of these mistakes.

08:01:09

It says the list index is out of range. It seems like i plus one and j plus one are the problem.

08:01:16

Ah, that's because, remember, we need an additional row and an additional column to track the

08:01:21

case where either of the strings is empty. So we need range n2 plus 1 here, and we need

08:01:29

range n1 plus 1 here. Okay. That's why it helps to have test cases, so that you can fix all of

08:01:34

these issues. Now test case 0 passes, and test cases 1, 2 and 3 all pass.
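Putting the pieces together, here is a sketch of the bottom-up solution with the off-by-one fix applied (the table is sized n1 + 1 by n2 + 1). The name lcs_dp is an assumption based on the walkthrough; the notebook's version may differ in details such as loop variable names.

```python
def lcs_dp(seq1, seq2):
    """Bottom-up LCS length: fills the table with two loops, no recursion."""
    n1, n2 = len(seq1), len(seq2)
    # The extra row and column (range(n1 + 1), range(n2 + 1)) hold the empty-prefix
    # zeros; leaving them out causes the "list index out of range" error
    table = [[0 for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    for i in range(n1):
        for j in range(n2):
            if seq1[i] == seq2[j]:
                table[i + 1][j + 1] = 1 + table[i][j]
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[-1][-1]  # bottom-right cell covers both full sequences


print(lcs_dp("serendipitous", "precipitation"))  # 7
```

Each of the n1 * n2 iterations does one comparison plus an addition or a max, which is where the O(n1 * n2) bound comes from.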

08:01:40

You can see that all test cases pass, and you can also verify that the amount of time it

08:01:44

took is now lower than the time it took for memoization. Okay. And so that's the

08:01:53

dynamic programming approach. You simply create a table and you fill out the table. Sometimes just

08:01:59

working with indices within the table can get confusing. So it helps to work with it on

08:02:04

paper, make it clear to yourself, and write it in English. That's why we've written it in

08:02:09

plain English here. And our exercise for you is to verify that the complexity of this dynamic

08:02:15

programming approach is order of n1 times n2, which is the same as memoization, and it's actually

08:02:22

more straightforward to see here because you have two for loops, and in each iteration of these loops

08:02:28

you are simply doing a comparison and an addition, and there's not even any recursion

08:02:34

for you to worry about. So you just do a comparison and you

08:02:41

do an addition, or you take a maximum; pretty straightforward. So it's order of n1 times n2, and it does

08:02:48

not even invoke another function. So it does not take up too much memory, it does not take up too much

08:02:55

time. It's very, very efficient. And this is how you solve pretty much every dynamic programming

08:03:02

problem. You come up with a brute force solution, and keep in

08:03:06

mind that recursion is almost always the way to go about creating a brute force solution. So you

08:03:14

come up with a recursive solution, and then you draw the recursion tree. And if you see that

08:03:21

the same subproblem is being solved again and again, that is the point where you can introduce memoization.

08:03:26

So you introduce memoization, and sometimes you can just write the memoized solution and that's

08:03:31

enough because it's easy to reason about. You just put in a memo and you're done with it.

08:03:36

Often the interviewer or the coding assessment will accept that solution. But in some cases you

08:03:42

will be asked to remove the recursion and write it in an iterative fashion, and that is when

08:03:47

you have to start drawing a table and think about what the rows and columns in that table

08:03:53

need to represent. Here, the (i, j)th element of the table represented, for the first i elements

08:04:00

of sequence one and the first j elements of sequence two, the length of the longest common subsequence between

08:04:05

them, and we used that to build the next row and the next column. We then filled out the entire

08:04:12

table and simply used the last value. Now again, it's not very straightforward to come

08:04:18

up with this, and the way you learn it is by solving problems. If you solve five to 10 dynamic

08:04:24

programming problems you will get some intuition about how to build the tables and it's always

08:04:29

very helpful to solve it on pen and paper first especially with dynamic programming so that it's

08:04:34

clear to you what each element of the table represents. Otherwise you may make a lot of off-by-one

08:04:41

errors, like missing the plus one here or missing the plus one there, and get confused, just like I did pretty

08:04:48

much. And the time complexity is pretty straightforward. In most cases it is simply

08:04:54

the size of the table, but sometimes you may have to do more than constant work per cell. So keep

08:04:59

that in mind and see what it is that you're doing inside your loop. If inside your loop you

08:05:04

have to go back and check the entire length of the string, that will introduce another factor

08:05:09

into the equation. So keep that in mind, but in most cases counting the iterations should be good enough

08:05:14

to give you an idea of the time complexity. Okay so that's the first problem and let us just

08:05:24

commit this, and it's saved to my profile. So if I just open this up here, you can see that now I have

08:05:35

a notebook called longest common subsequences, and I can share it online. Whenever you work on a

08:05:40

notebook, it's always a good idea to make it public and put it up on Jovian. All you need to do is run

08:05:45

jovian.commit and share it online. Just press the share button, and then you can share it on Twitter,

08:05:49

LinkedIn, Facebook or wherever you like. So that's the first problem that we looked at.

08:05:56

Now let's come back to lesson four. And by the way, for the problems that we're talking about, all the

08:06:02

problem statements, graphs and images can be found at the second link here. But we will once again

08:06:08

open up the problem solving template and now we'll use it for the second problem. Let me run this once again.

08:06:18

We're going to look at the second problem, which is the knapsack problem.

08:06:23

So let's read the knapsack problem. It's also called the 0-1 knapsack problem. There are

08:06:28

many variations of this problem, but here's one way to state it that you might come across,

08:06:34

or something similar: you are in charge of selecting a football, or soccer, team from a large pool of

08:06:40

players, and each player has a cost and a rating. So there's this selection going on; you have to

08:06:47

come up with a team for this year, and you have a large pool of players. Each player has a cost and

08:06:53

each player has a rating. Now you have a limited budget, so you need to build a team within the

08:06:59

budget. So what is the highest total rating of a team that you can create which fits within

08:07:04

your budget? So this is the question: you have to maximize the total rating but fit

08:07:10

the total cost within your budget. We have two variables here: rating and cost.

08:07:14

Rating needs to be maximized; cost simply needs to be kept low enough that it fits in the

08:07:20

budget. A simplifying assumption here is that there is no minimum

08:07:25

or maximum team size. This is a simplification, and later you can introduce a criterion there as

08:07:32

well, say that you want to build a team of exactly 10 people, and see if you can also solve that problem in

08:07:37

a similar way. So that's the knapsack problem. Let's copy it over; here's a Jupyter notebook with

08:07:47

a fresh problem solving template. Let's simply change the title here.

08:07:58

It's also called the 0-1 knapsack problem because each item can either be chosen

08:08:05

or not chosen. Let's give it a project name here too.

08:08:17

Let's commit it and let's paste the problem statement here.

08:08:30

Okay. So that's the problem statement, and this is a special form

08:08:38

of a more general problem, and we'll look at the general problem statement in a second

08:08:43

when we try to state the problem clearly. But here's once again the systematic strategy we

08:08:48

will apply: we will state the problem clearly, identify the input and output formats, come up with some

08:08:53

example inputs and outputs and try to cover all the edge cases. Then we will come up with a correct

08:08:58

solution for the problem and state the solution in plain English; it just has to be a simple,

08:09:02

correct solution, not too complex.

08:09:09

and then we implement the solution, analyze the algorithm and identify any inefficiencies,

08:09:15

and finally we apply the right technique to overcome the inefficiency and then repeat the process

08:09:19

of stating the solution, implementing it and analyzing it. So to state the problem clearly, what we can

08:09:27

do is we can abstract out the problem in more general terms and that is what is stated here

08:09:33

and let's just grab that and we'll take a look.

08:09:45

So here we are given n elements, each of which has a weight and a profit,

08:09:51

so you have n elements, and here's the profit of each element, and here's the weight

08:09:55

of each element. You need to determine the maximum profit that can be obtained by selecting

08:10:00

a subset of the elements weighing no more than a given weight W. So you have a capacity, a maximum

08:10:07

capacity; let's say the maximum capacity is 15, and you have to select certain elements so that

08:10:13

the total weight is no more than the capacity and the total profit is maximized.

08:10:21

This is why it's called the knapsack problem: assume here you have a bag, or a

08:10:24

knapsack, with a capacity of 15 kilograms, and these are the weights of the items and these are the

08:10:30

profits. Now in this example, you can see that the optimal selection is these four elements,

08:10:37

with weights 5, 3, 2 and 5, so that you fill out the total capacity of 15, and

08:10:44

the maximum profit that you can obtain is 7 plus 4 is 11, plus 5 is 16, plus 3 is 19.

08:10:52

Now you can try other combinations and verify that this is actually the best solution

08:10:57

Do give it a shot. So what are the inputs here? It's pretty clear: we have an input,

08:11:07

weights, which is a list of numbers containing the weights,

08:11:19

and then you have profits, a list of numbers containing the profits,

08:11:27

which should have the same length as weights, and then finally you have capacity,

08:11:37

the maximum weight allowed. There you go. Now for the output: the output would simply

08:11:48

be the max profit: the maximum profit that can be obtained by selecting elements

08:12:06

of total weight no more than W or no more than capacity.

08:12:17

Okay, great. That gives us a pretty good starting point. Now we can write a function

08:12:25

signature here, so let's say def max_profit,

08:12:29

and we can give it weights and we can give it

08:12:37

profits, and we can give it a capacity, and for now the body is just pass.

08:12:45

So now we have stated the problem and identified the input and output formats;

08:12:50

now we need to come up with some example inputs and test cases. Once again we have listed out

08:12:55

a few test cases here. We have a few generic test cases where you just have a random

08:13:01

set of weights and profits, and you identify the optimal solution. Then here's one

08:13:08

case where all of the elements can be included: you can take everything. Here's another case where

08:13:14

none of the elements can be included; you have to think about all these scenarios. Here's one where

08:13:20

only one of the elements can be included. Then you may also think about cases where

08:13:24

you do not use the complete capacity,

08:13:34

because the optimal solution actually uses less capacity. So there may be a way to

08:13:37

fill up the capacity, but that may have a lower profit than another option which takes less

08:13:44

than the complete capacity but has a higher profit. So think about some cases here;

08:13:48

think about some good test cases, and I will just copy these over for now.

08:13:57

and then what we'll do is we will express these test cases once again as

08:14:01

dictionaries. We have test 0, test 1, test 2, all of these expressed as dictionaries, and these are

08:14:06

covering all the cases that I mentioned. Here you can see some weights and some

08:14:10

profits; the capacity is 165 and the optimal solution is 309. Now, we are simply asking

08:14:16

here for the optimal solution the maximum profit that can be obtained but an extension of this

08:14:22

problem is to identify which elements should be chosen, and it's a simple extension;

08:14:28

it's a good exercise for you to try out, and you can discuss it in the forums. We have test 0,

08:14:34

test 1, test 2, test 3, 4 and 5, so we have a total of 6 test cases. Let's copy over these test

08:14:42

cases here and put them into a single list, and that gives us the test cases. Okay.
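To make the dictionary format and the edge cases concrete, here is a sketch with small hand-written cases and a brute-force checker, max_profit_brute_force, that tries every subset. Both the function name and all the data are hypothetical, not the notebook's test cases. A brute-force checker examines 2^n subsets, so it is only practical for a handful of items, but it is a cheap way to confirm the expected outputs of small cases like these, one per edge case listed above.

```python
from itertools import combinations

def max_profit_brute_force(weights, profits, capacity):
    """Check every subset of items; O(2^n), so only for small test cases."""
    n = len(weights)
    best = 0
    for size in range(n + 1):
        for chosen in combinations(range(n), size):
            if sum(weights[i] for i in chosen) <= capacity:
                best = max(best, sum(profits[i] for i in chosen))
    return best

# Hypothetical test cases in the {'input': ..., 'output': ...} dictionary format
tests = [
    # generic case: the two lighter items beat the single heavy one
    {'input': {'weights': [4, 2, 3], 'profits': [10, 4, 7], 'capacity': 5}, 'output': 11},
    # all of the elements can be included
    {'input': {'weights': [1, 2, 3], 'profits': [5, 6, 7], 'capacity': 10}, 'output': 18},
    # none of the elements can be included
    {'input': {'weights': [5, 6], 'profits': [9, 9], 'capacity': 4}, 'output': 0},
    # only one element fits at a time
    {'input': {'weights': [3, 7, 8], 'profits': [2, 9, 10], 'capacity': 7}, 'output': 9},
    # filling the capacity exactly (3 + 3) earns less than leaving 2 units unused (4)
    {'input': {'weights': [4, 3, 3], 'profits': [10, 3, 3], 'capacity': 6}, 'output': 10},
]

for test in tests:
    result = max_profit_brute_force(**test['input'])
    print(result, result == test['output'])
```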

08:14:52

Now, coming up with the solution: once again, the first step is to try and come up with a recursive solution,

08:15:00

and a recursive solution is again quite straightforward. We'll write a recursive function max_profit

08:15:07

that, given an index, and this time we have just one sequence, so given an index within the

08:15:12

sequence, let's say idx, computes the maximum profit that can be obtained

08:15:20

using the elements from idx onwards, so 3, 1, 5, 4, 7, using all of these elements from idx onwards,

08:15:30

the maximum profit that can be obtained, using a given capacity.

08:15:38

So it will take an index and a capacity. If, let's say, idx is 1, it will then look at just

08:15:47

these elements, and with a capacity of 10, it will try to fill the capacity of 10. That's

08:15:54

the recursive function. And why are we creating a recursive function like this? There's a simple reason.

08:15:59

Suppose idx has the value 1 and the capacity is, let's say, 3. Then the weight of

08:16:06

this element is greater than the capacity, which means it cannot be selected,

08:16:13

because it cannot fit inside the bag, the knapsack that we have. So the solution for this

08:16:20

subproblem with idx equal to 1 and capacity equal to 3 is the same as the solution for this

08:16:28

subproblem with this element removed, because you cannot include this element in the knapsack, right?

08:16:33

So if you remove this element and simply consider the remaining elements, which

08:16:39

essentially means idx plus 1. So max profit of weights from idx plus 1 onwards, profits from

08:16:48

idx plus 1 onwards, and the same capacity is the answer for max profit of weights from idx, profits from idx, and capacity,

08:16:56

because the current weight, 5, is greater than the capacity, 3, that the recursive function has been called with.

08:17:02

So that's one case. But the more general case is that you have enough capacity. Let's say you

08:17:06

have a capacity of 10, the recursion was called with a capacity of 10, and you are at idx 1. Then

08:17:13

you have two choices: either you include this element in your knapsack, or you do not include

08:17:20

it in your knapsack, because you don't know whether the optimal solution will have this

08:17:25

element or not. So you try both. There are two possibilities: we either pick weights[idx], this

08:17:31

element, or we don't, and we can simply compute the result in both cases and pick

08:17:37

the maximum. If we don't pick weights[idx], this element, then the

08:17:44

capacity remains the same. Let's say the capacity was 10, so we simply try to fill the

08:17:48

capacity of 10 using the remaining elements: we simply call max profit with weights from idx plus 1

08:17:54

onwards, profits from idx plus 1 onwards, and the remaining capacity, which is 10. But if we do pick the

08:18:00

element, and we had a capacity of 10, then the

08:18:07

best solution in this case will have a profit 3 more than the solution for this case and since we

08:18:14

also use up some capacity, we need to add 3 to the profit and subtract 5 from the

08:18:20

capacity, right? So if we pick weights[idx], then the maximum profit for this case is profits[idx] plus

08:18:30

max profit of weights from idx plus 1 onwards, profits from idx plus 1 onwards, but because we've used up

08:18:35

some capacity, we reduce the capacity in the recursive call. Okay, and that is why the recursive call

08:18:40

takes both an index and a capacity. I hope that makes sense. Here's the recursion tree that tells

08:18:49

you the same thing. We start at the first index with the full capacity, and if we don't pick

08:18:55

the first element, then the answer is simply the best solution from the second index

08:19:04

onwards with the same capacity. If we do pick the first element, then the answer is the solution from the second index

08:19:09

onwards with the reduced capacity, with the profit added. And then we simply take the maximum

08:19:17

of these two cases. So we make these two recursive calls and then we simply take the maximum of

08:19:21

these two cases to get back the final result, the final best answer. And the final end case

08:19:30

is that if we reach the end, if weights IDX onwards is empty, if the index that we're tracking

08:19:35

has reached the very end, then irrespective of what the capacity is, the maximum profit in that

08:19:40

case is zero. So let's try and implement this now. Let's copy this over as the explanation.

08:19:54

Let's try and implement the solution. Let's call it max profit recursive,

08:20:15

and this is going to take a set of weights, it is going to take a set of profits, and it is going

08:20:21

to take a capacity, and it's also going to take an index, which will start out at zero.

08:20:28

So now we start with the base case: if IDX equals the length of weights,

08:20:37

in this case there's nothing left to do, so we simply return zero because we don't have any more elements.

08:20:42

Then we check if weights IDX, the current element, is greater in weight than the

08:20:52

capacity. If so, it's pretty straightforward: we simply return max profit recursive of

08:21:05

weights, profits, capacity plus one, sorry, capacity, and IDX plus one. So we simply ignore

08:21:15

this element because we cannot fit it in the capacity that we have. Else, we have two options.

08:21:22

Option one is that even though it can fit within the bag, we don't take it,

08:21:28

because the optimal solution may still not have it. Just because it fits does not mean

08:21:32

we should take it. So we look at option one, which is once again the same as this, where we

08:21:37

ignore this element. And then we look at option two. In option two we actually put this

08:21:42

element into the bag. Since we are putting this element into the bag, we get profit

08:21:48

from it, so we get profits of IDX, and then we call max profit recursive, and this time we call it with

08:21:56

weights and profits, and now the capacity has reduced a little bit because we have taken this

08:22:05

element. So now we need to fill the bag with the

08:22:11

remaining elements from IDX plus one onwards with a limited capacity of capacity minus weights

08:22:17

of IDX. And then finally we just put in IDX plus one so that we can start calculating the

08:22:24

solution from the next element onwards. So that's max profit recursive. Again, not very

08:22:32

difficult; it is just about six or seven lines of code. Let's try it out here with test zero.

08:22:41

Let's try max profit recursive with the test zero input,

08:22:50

and we need to get weights, capacity and profits out of it. The simple way to do that

08:22:57

is simply to put in star star (**), and all of these will get passed in: capacity will

08:23:04

get passed as the capacity parameter, weights will get passed as the weights parameter, and

08:23:08

profits as the profits parameter. Okay, so we've encountered an error, and that's completely fine,

08:23:16

completely fine to encounter an error. I see, so what we've done here is we have not really

08:23:24

taken the maximum of these two; we've just defined the two options. So we do need to take the max of

08:23:28

option one and option two. Once again, this is why having test cases helps.
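The corrected recursive function can be sketched as follows. It is a minimal version, assuming the name `max_profit_recursive` and a test case stored as a dictionary with an `'input'` dict (keys `capacity`, `weights`, `profits`) plus an expected `'output'`. The ten-item instance below (capacity 165, best profit 309) is an assumed example, not necessarily the course's test zero.

```python
def max_profit_recursive(weights, profits, capacity, idx=0):
    """Best total profit from elements idx onwards within the given capacity."""
    # Base case: no elements left, so no profit regardless of capacity
    if idx == len(weights):
        return 0
    # The current element doesn't fit: the only option is to skip it
    if weights[idx] > capacity:
        return max_profit_recursive(weights, profits, capacity, idx + 1)
    # Option 1: skip the current element, capacity unchanged
    option1 = max_profit_recursive(weights, profits, capacity, idx + 1)
    # Option 2: take it, gain its profit, and fill the reduced capacity
    option2 = profits[idx] + max_profit_recursive(
        weights, profits, capacity - weights[idx], idx + 1)
    # The fix from the lecture: return the better of the two options
    return max(option1, option2)


# Hypothetical test case in the dictionary shape described above
test = {
    'input': {
        'capacity': 165,
        'weights': [23, 31, 29, 44, 53, 38, 63, 85, 89, 82],
        'profits': [92, 57, 49, 68, 60, 43, 67, 84, 87, 72],
    },
    'output': 309,
}

# ** unpacks the dict into keyword arguments: capacity=, weights=, profits=
result = max_profit_recursive(**test['input'])
print(result, result == test['output'])  # 309 True
```

Passing the inner dict with `**` works because its keys match the parameter names exactly; `idx` keeps its default of 0.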

08:23:36

And you can see there, now we call max profit, and we can also add a timer here. The max profit

08:23:41

call takes 210 microseconds, and it returns the result 309. Great, we get back the result

08:23:48

309 here, which is what we expected, so our function is working correctly. We can even evaluate it

08:23:56

on all the test cases. So from jovian.pythondsa we import evaluate_test_cases,

08:24:12

and then we simply call evaluate_test_cases on all the inputs. So we pass in max profit

08:24:23

recursive, and then we pass in all the test cases as tests. Now you can see that we have these test

08:24:31

cases, and each test case seems to be passing. All six test cases are passed, and these are

08:24:38

the times they took. So that's your recursive solution: pretty straightforward once you reason it out,

08:24:45

once you maybe just look at an example, draw the tree of recursion yourself, work it out on paper. The

08:24:53

code is in fact, in most cases, fairly simple. And this is what the recursion tree looks like: each time

08:25:00

we make a choice to either include the element or not include the element. And now you can reason about

08:25:05

the complexity very easily, because we have n elements and for each one we make this choice,

08:25:09

so that means we end up with two to the power n leaves, and from there it follows that the

08:25:16

complexity of the recursive algorithm is order of two to the power n. Right, so it could be

08:25:20

two times or c times two to the power n, but in big O notation it's order of two to the

08:25:26

power of n, so it is exponential in complexity. And why is it exponential in complexity? Once again,

08:25:32

it's possible here that we may be computing a lot of things repeatedly, because we

08:25:38

are creating so many of these subproblems, so it's possible that we may be

08:25:44

re-computing a lot of data here. So now the task for you, an exercise for you, is to write

08:25:52

the memoized version of this. So what is it that you need to memoize? The trick here is to

08:25:58

look at what is changing within the recursive calls. In max profit recursive you can see that

08:26:05

weights and profits remain the same, but it's the capacity and the IDX that change. So you can

08:26:11

take capacity comma IDX as the key in your memoization dictionary, and each time

08:26:19

you compute, let's say you compute this or you compute this or you compute this,

08:26:24

store the result in the dictionary before returning it, and then at the beginning of the recursive

08:26:29

function, check within the dictionary if this value is already present. So remember what we did

08:26:36

for longest common subsequence: we defined a recursive function internally, we defined a memo

08:26:42

dictionary internally, and the recursive function kept either checking the dictionary or filling the

08:26:49

dictionary if it could not find a value. And that could eliminate a lot of the repeated work

08:26:55

in your problem. Okay, so that's the challenge for you: try to implement the memoized solution.
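One possible memoized version, offered as a sketch of the exercise rather than the course's reference solution: the hypothetical `max_profit_memo` defines a recursive helper internally, keys a `memo` dictionary on `(capacity, idx)` as suggested above, and checks the dictionary before doing any work.

```python
def max_profit_memo(weights, profits, capacity):
    """Memoized knapsack: caches results keyed on (remaining capacity, idx)."""
    memo = {}

    def recurse(remaining, idx):
        key = (remaining, idx)
        # Reuse a previously computed answer for this subproblem
        if key in memo:
            return memo[key]
        if idx == len(weights):
            # No elements left
            result = 0
        elif weights[idx] > remaining:
            # Element doesn't fit: skip it
            result = recurse(remaining, idx + 1)
        else:
            # Best of skipping the element or taking it
            option1 = recurse(remaining, idx + 1)
            option2 = profits[idx] + recurse(remaining - weights[idx], idx + 1)
            result = max(option1, option2)
        # Store before returning so later calls can reuse it
        memo[key] = result
        return result

    return recurse(capacity, 0)


print(max_profit_memo([2, 3, 3, 4], [1, 2, 5, 9], 10))  # 16
```

Each distinct `(capacity, idx)` pair is computed only once, and there are at most (capacity + 1)(n + 1) such pairs, so the work drops from O(2^n) to O(n · capacity).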

08:27:03

And what we will do is go ahead and

08:27:08

implement the dynamic programming solution. So let's just commit our work once again. We've

08:27:12

analyzed the algorithm's complexity: in recursion it's order of 2 to the power of n. In memoization,

08:27:22

well, that's an exercise for you: what do you think the complexity will be? But let's apply dynamic

08:27:26

programming. So let's look at a dynamic programming solution, and once again, for dynamic programming you

08:27:32

almost always have to create a table. And in this

08:27:38

case we can see that there are n elements, so there are n rows within the table, because we have

08:27:45

n elements to choose from, and we have a number of columns going from

08:27:51

0 to capacity, and that's why there are a total of capacity plus 1 columns. And in fact, what we can

08:27:57

do is we can also include another row at the top here, which is

08:28:03

not shown here. Now, n is the number of elements, and what

08:28:11

a particular element in the table represents, table of i comma c, what it represents,

08:28:18

is the maximum profit that can be obtained using the first i elements if the maximum capacity is c.

08:28:27

So if your maximum capacity is c, let's say your maximum capacity is 3, what is the maximum

08:28:33

profit that you can obtain using the first two elements? So here, let's say we are at this

08:28:39

position, so using the first two elements of the list within this capacity. Okay, so the first

08:28:52

two elements have weights 1 and 2 and the capacity is 3, so you can actually pick,

08:28:59

sorry, the first two elements have weights 2 and 3 and the capacity is 3, so you either pick

08:29:03

this element or this element. Now, if you pick this element, the profit is 1, and if you pick

08:29:08

this element, the profit is 2. So the solution is to pick this element: you fill the

08:29:14

capacity 3 and you get a profit of 2. You cannot pick both, because your capacity is 3. Okay, so that's

08:29:19

the logic here, a very simple visual representation. Now remember that there will also be a 0th

08:29:25

row here, which we have not shown, but this is something that should be here: another 0th row.

08:29:31

So the 0th row represents that you have not picked any of the elements, and if you don't pick any

08:29:35

of the elements, it is simply going to contain all 0s, and that's why it's not shown here. The first

08:29:39

row assumes that you can pick only the first element. So you can't pick

08:29:46

the first element until capacity 2, and then from a capacity of 2 onwards you pick the first

08:29:50

element, and that gives the maximum profit of 1. The second row, or the

08:29:59

row with index 2, represents the fact that you can pick both of these elements,

08:30:06

and if you can pick both of these elements, once again at capacity 0 none of them can be picked,

08:30:11

at capacity 1 none of them can be picked, at capacity 2 this element can be picked, which has

08:30:17

a weight 2 and gives you a maximum profit of 1, and at capacity 3 this element can also be picked.

08:30:24

So now you have a choice between the two of these, so you might as well pick this one

08:30:30

because this is going to give you a higher profit. And then finally, when the capacity becomes 5,

08:30:34

you can pick both of these elements, and that is going to

08:30:38

give you a profit of 2 plus 1, which is 3, and so on. So you keep filling out this table for each

08:30:44

step here, for each set of first i elements: you fill out the capacity table, and then you

08:30:52

use the information to fill out the next row and the next column, and so on. And finally, what we need

08:30:58

is this: using all the elements and using the maximum capacity that we have, what is the maximum

08:31:05

profit that we can obtain? So the last element of the table will give you the result. Okay, so what

08:31:11

does the logic look like? We will fill the table row by row and column by column. Now, table of i comma c,

08:31:20

let's say this is a certain position here, table of i comma c,

08:31:25

can be filled using some values in the row above it. Okay, now if you look at table of i comma

08:31:31

c, look at this element, for example. Yeah, let's look at this element here.

08:31:39

So in here c has the value 3, and for i, the value 0 is the row that is not shown,

08:31:45

then 1, 2, 3, 4, so i has the value 4 and c has the value 3. So,

08:31:58

yeah, if weights of i is greater than c, so 0, 1, 2, 3, 4, if this

08:32:06

weight, the weight of this element, is greater than the capacity, so the weight of this element is

08:32:12

4, and it is greater than the capacity, then this element cannot show up in this maximum profit.

08:32:21

Why? Because its weight is greater than the capacity, so obviously it cannot show up in the maximum

08:32:26

profit. Now, if it cannot show up in the maximum profit, then this cell can be filled using the

08:32:32

value above it, because in any case you cannot put in this element, so you might as well get the

08:32:36

result by using the first three elements. And in that case, the value of this cell is obtained

08:32:42

from the value of the cell above it. That's one case. Now, on the other hand, let's come

08:32:47

here, to this cell. To fill this cell, because you have a capacity

08:32:53

of four, you have the option of either choosing this element or of not choosing this element.

08:32:59

Now, if you do choose this element, let's say you choose this element with a capacity of four.

08:33:04

With the capacity of four, you get back a profit of nine. And now, you have no more capacity

08:33:10

left to fill more elements. On the other hand, if you do not choose this element,

08:33:17

then that's the same as this value, because if you do not choose this element, then you have to fill

08:33:22

the capacity of four using the first three elements. And that simply gives you

08:33:28

the same highest profit as the previous cell. So, you just consider these two cases,

08:33:33

whether we choose the element or we do not choose the element. Now, if you do not choose

08:33:38

the element, the value comes from above. If you choose the element, then the value comes from where,

08:33:43

let's see. If you choose the element, the profit of nine comes, and you fill all the capacity

08:33:49

four. So, you have no remaining capacity. But on the other hand, if the capacity was six,

08:33:54

and you choose the element, then you have chosen the element, and you've used up the capacity

08:34:00

four. So, you can still use the previous three elements to fill the remaining capacity, which is

08:34:06

six minus four, which is a capacity of two. So, you can go back to the previous

08:34:10

row and check capacity two, and see what the maximum profit was that you could obtain with

08:34:16

capacity two. And it turns out that with capacity two, using the first three elements, you can

08:34:22

obtain a maximum profit of one. So, the maximum profit here, when you choose the element,

08:34:26

is nine plus one, ten. Similarly, here, the maximum profit that can be obtained, if you choose the

08:34:33

element, is nine plus the value from the previous row at capacity

08:34:41

seven minus four, which is three, so nine plus five, fourteen. So, that's the logic here.

08:34:47

Sometimes you choose the element, sometimes you don't choose the element, and in fact,

08:34:52

the result of the cell is simply the maximum of either not choosing the element, which gives the value

08:34:57

of this cell, or choosing the element and subtracting the weight, which is six minus four, two.

08:35:04

So, the maximum of this and that, okay. So, let's implement this same dynamic programming solution.

08:35:11

Once again, do work this out on paper. It really helps to work it out on paper.
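The rule worked out above can be written as a recurrence. Here T[i][c] is the maximum profit using the first i elements with capacity c, and w_i, p_i are the weight and profit of the i-th element (1-indexed, an assumed convention). The numeric lines use weights 2, 3, 3, 4 and profits 1, 2, 5, 9, which are inferred from the values spoken in the walk-through rather than stated outright, so treat them as an assumption.

```latex
\begin{aligned}
T[0][c] &= 0 \quad \text{for every } c,\\
T[i][c] &= T[i-1][c] \quad \text{if } w_i > c,\\
T[i][c] &= \max\bigl(T[i-1][c],\; p_i + T[i-1][c - w_i]\bigr) \quad \text{otherwise.}\\[4pt]
T[4][6] &= \max\bigl(T[3][6],\; 9 + T[3][2]\bigr) = \max(7,\; 9 + 1) = 10,\\
T[4][7] &= \max\bigl(T[3][7],\; 9 + T[3][3]\bigr) = \max(7,\; 9 + 5) = 14.
\end{aligned}
```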

08:35:17

Well, let's say we have max profit dp, the dynamic programming solution. We have weights,

08:35:27

we have profits, and we have a capacity. And then let's say n is len of weights. So, we need to

08:35:37

create a table. So, this is our table. Our table contains n rows. So, we have len n. And then,

08:35:48

for each of the rows, we have capacity plus one columns. Oh, we need n plus one rows,

08:35:57

remember. So, we also want to consider the case where we don't take any of the

08:36:01

elements. And it is filled with zeros and the number of columns is capacity plus one.

08:36:13

That covers the values from zero to capacity. So, that's the table right now.

08:36:18

You can check what this table looks like. Let's see n and capacity.

08:36:24

Here are the values: n in this case is 5 and capacity is 10.

08:36:34

Oh, we don't need a len here.

08:36:39

We don't need a len here as well.

08:36:42

It's all perfectly natural to make these mistakes. This should be range, not len.

08:36:49

So, this should be a range and this should be a range too.

08:36:58

Yeah. Now, you can see that we have created n plus one rows: one for each of these elements, and then

08:37:04

one more row above, which will contain all zeros. This is for the case where we

08:37:08

don't pick any of the elements. And then we've created 11 columns, starting with capacity zero.

08:37:13

So, again, the first column will also contain all zeros. And this is something that you will

08:37:18

often see in dynamic programming. You will have an additional row at the beginning or at the end

08:37:23

containing all zeros. And that is simply to make your calculations easier. But

08:37:29

what that can lead to is off-by-one errors. So, you need to be very careful while doing this.

08:37:36

And now, we'll fill out this value using either this value or by subtracting the weight of the

08:37:41

element that's here and getting a value from the previous row. So, now we start iterating. So,

08:37:47

now we say for i in range n, and for j in range c, let's just say for c in range capacity,

08:38:03

it should be capacity. Table of i comma c, and it's actually going to be i plus 1 and c plus 1

08:38:13

because we have these additional rows and columns. For table of i plus 1 comma c plus 1, there are two

08:38:22

cases here. If weights of i is greater than c, the current capacity,

08:38:32

then we can simply look at the previous row. So, in this case, let's say the weight 3 is greater

08:38:40

than the current capacity 2. So, then we simply copy over the value from the previous row,

08:38:48

the same column. So, we just say table of i comma c plus 1, I see. So, the capacity should

08:38:58

go from the value of 1 because we don't want to affect the first column. So, the capacity

08:39:03

goes from the value of 1 to a value of 10. So, capacity c goes over the range 1 to capacity.

08:39:10

And if weights i is greater than the capacity, then we cannot fit it. On the other hand,

08:39:16

if it fits within the capacity, then we have two options. So, table of i plus 1 comma c

08:39:23

has two options. One is we don't use the current element,

08:39:30

and that gives us table i c once again. The other option is we use the current element. So,

08:39:34

we get profit from the current element, so profits i. But that reduces

08:39:40

the capacity. So, we then have to pick from table of i, but now we have to pick c minus

08:39:51

weights of i. Okay. And that should fill out pretty much the entire table.

08:39:58

That is the nice thing about dynamic programming. You simply have to write this one

08:40:02

solution or this one recurrence and be careful about it. And everything else is taken care of

08:40:08

by this loop here. And now, we simply return table of minus 1, minus 1.

08:40:17

And let's see if that works. It's likely that there are some issues here, but let's see.

08:40:22

We have test cases: max profit dp with the tests that we have.

08:40:34

Great. So, we are seeing an issue already. I see here that

08:40:39

this should be range. And this should be range.

08:40:50

Okay. One thing that we haven't done here is

08:40:59

well, it seems like our solution is always zero.

08:41:03

Ah, this should be capacity plus 1, so that this takes all the values up to

08:41:11

capacity. Right. So, c, the iterator, should take the values 1, 2, 3, all the way up to

08:41:15

the maximum capacity. And the range does not include the end value.

08:41:21

So, you need to put capacity plus 1 here. Okay. Now, with that out of the way, you see once again,

08:41:29

these off-by-one errors are always going to bug you with dynamic programming.

08:41:33

I've probably solved 50 or 100 problems in dynamic programming, and I still make these errors.

08:41:39

But with that out of the way, you can see now that each of the test cases seems to pass.

08:41:45

Now, there may be other cases which we have not accounted for. But overall, we've covered

08:41:50

all the test cases here. And we now have a dynamic programming solution.
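Putting the debugged pieces together, the table-filling function might look like this. It is a sketch that assumes the name `max_profit_dp` and includes the fixes found above: `range` instead of `len` when building the table, n plus 1 rows, and a capacity loop over `range(1, capacity + 1)`.

```python
def max_profit_dp(weights, profits, capacity):
    """Bottom-up knapsack: table[i][c] is the best profit using the
    first i elements with capacity c."""
    n = len(weights)
    # n + 1 rows (row 0 = no elements) and capacity + 1 columns (0..capacity);
    # each row is built separately so the rows don't share one list
    table = [[0 for _ in range(capacity + 1)] for _ in range(n + 1)]

    for i in range(n):
        # Start at 1 so column 0 (zero capacity) stays all zeros;
        # capacity + 1 because range excludes its end value
        for c in range(1, capacity + 1):
            if weights[i] > c:
                # Element i can't fit: copy the value from the row above
                table[i + 1][c] = table[i][c]
            else:
                # Best of skipping element i or taking it
                table[i + 1][c] = max(table[i][c],
                                      profits[i] + table[i][c - weights[i]])
    # The bottom-right cell uses all elements and the full capacity
    return table[-1][-1]


print(max_profit_dp([2, 3, 3, 4], [1, 2, 5, 9], 7))  # 14
```

The example weights and profits are the ones inferred from the table walk-through (an assumption), and 14 matches the "nine plus five" worked out for that cell. The two nested loops visit n times capacity cells with constant work each, which is where the O(n · W) bound comes from.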

08:41:56

And I'll let you figure out the complexity here. But once again, it's pretty straightforward

08:42:01

because we are filling up this table, and filling up each cell simply requires this constant amount

08:42:05

of work, which is a comparison and then potentially another comparison and an addition and a

08:42:11

subtraction. So, like four or five operations. So, you have this

08:42:20

n times capacity, where n is the length of weights and capacity, or W, is the total capacity.

08:42:27

So, n times W is the number of iterations, and that really is also the complexity, the time

08:42:34

complexity of the algorithm. So, that's the knapsack problem. And now, what you can do is try and figure

08:42:42

out not just what is the maximum value, but also figure out what are the actual elements that

08:42:49

you're using. Now, you can do this for the knapsack problem, and you can do this for the longest common

08:42:53

subsequence problem: figure out the actual longest common subsequence. It should be possible to do that

08:42:57

with just a small modification. Now, use the forum. If you have any questions about the contents

08:43:05

of this lecture, go back to the lesson page and open up the course community forum here.

08:43:12

You can see here that this is the topic for the recursion and dynamic programming lesson. You can

08:43:17

post your question here and you can also discuss ideas on how to figure out what the longest

08:43:23

common subsequence is and what the best selection for the knapsack problem is. So, what do you do next?

08:43:28

Well, you can review the lecture video and execute the Jupyter notebook. The next step is also to

08:43:34

complete the assignment. Now, we have released assignments 1 and 2 so far. If you go back on the

08:43:40

lesson page, you will find assignments 1 and 2, and you can work on them.

08:43:46

There is sufficient time. Also work on the optional questions, and do participate in forum discussions

08:43:53

and if possible join or start a study group too. That's a great way to stay motivated.

08:43:59

This was lesson 4 of data structures and algorithms in Python. Thanks and talk to you soon.

08:44:06

Hello and welcome to data structures and algorithms in Python. This is an online certification course

08:44:11

being conducted by Jovian. Today we are on lesson 5: graph algorithms like BFS, DFS and shortest

08:44:18

paths. My name is Akash and I am your instructor. You can find me on Twitter at @aakashns.

08:44:26

If you follow along with the course, complete all the assignments and build a course

08:44:31

project, you can earn a verified certificate of accomplishment for this course.

08:44:36

So with that, let's get started. The first thing we will do is go to the course website,

08:44:43

pythondsa.com. You can point your browser to pythondsa.com to open up the course page

08:44:51

and on the course page you can enroll for the course and you can view all the previous lessons

08:44:56

and assignments. So do check it out and do check out the course project as well. But for now,

08:45:03

we will open up lesson 5, graph algorithms. Now on this page, you can watch a video for the lesson

08:45:11

later, the same video that you're watching right now, and you can also catch a Hindi version if

08:45:16

you wish. And here is the code that we are going to use today, the first notebook under the

08:45:24

heading Notebooks. So let's open it up. This is a Jupyter notebook hosted on Jovian. You should

08:45:32

be familiar with it by now but here you can see that there are some explanations and then there are

08:45:37

some code cells where we can write some code. You can see that there's some code here.

08:45:42

Now to actually execute and edit this code, we will need to run this notebook.

08:45:48

You can find the instructions to run the notebook right here. But the simplest way to do it is to click

08:45:53

Run and select Run on Binder. Now this will take a second or two, but this will take your Jupyter

08:46:01

notebook, create a new machine in the cloud, and send your Jupyter notebook to that machine

08:46:06

for execution. This is a free service that you can access via Jovian.

08:46:17

You can also run the notebook on your own computer directly if you wish. For that, you can check

08:46:22

the Run Locally option here. So our Jupyter notebook server is now ready, so we can now

08:46:31

start editing and writing some code. Let's just go full screen here.

08:46:40

Okay, so the topic today is graph algorithms: BFS, DFS and shortest paths using Python.

08:46:46

Now before we talk about graph algorithms, let's just try to understand intuitively what graphs are.

08:46:52

Now here's an example of a graph in the real world. This is the railway map of India. You can

08:46:58

see here all the train stations that you have in India. They're represented using these black dots

08:47:06

points. They're also labeled, so each train station corresponds to a city or a village.

08:47:12

So all these are also labeled. And then you can see connections between these stations. So these are

08:47:18

as you might guess railway lines and you'll see that there are three or four colors involved.

08:47:23

These colors could represent different types of railway lines, like different gauges:

08:47:28

meter gauge, broad gauge, et cetera. Or these could represent different zones.

08:47:34

So there's some information contained in the connections as well. Now another important thing

08:47:39

is that each railway line between two cities will also have a certain length.

08:47:46

So that's what a graph is roughly. And the kind of questions that you may want to ask here is

08:47:52

for example, is there a path from New Delhi to Hyderabad? So given this information, first

08:48:02

of all the question is, how do you even represent all this information? Because you have so many

08:48:05

railway line connections between different cities, so many hundreds of cities. How do you even

08:48:09

represent this, so that you can start writing algorithms to answer these questions? So if you're

08:48:14

building a train search website, then you would have to answer: given New Delhi and

08:48:21

Hyderabad, is there a way to get from New Delhi to Hyderabad? Okay, that's the first question that

08:48:26

you might ask. Now if there is a way, then the next question might be that what is the path

08:48:33

with the shortest number of stops? So do you go this way for the shortest number of stops?

08:48:39

Do you go this way or do you go this way? Another question could be what is the path with the shortest

08:48:44

distance, right? So sometimes if you measure the distance and if you measure the number of stations,

08:48:50

the number of stops, they may be different along different paths and one may be greater than the

08:48:54

other in certain cases. So those are the kind of questions that we want to ask and answer today.

08:49:05

Another question could be what are all the stations reachable from New Delhi within one stop

08:49:10

or two stops or three stops or ten stops? So those are the kind of questions we'll try and answer.

08:49:15

And for that we need a way to represent graphs in a more abstract fashion because the same

08:49:22

question can be asked in a different context. For instance here we are looking at flight routes,

08:49:27

international flight routes. Now once again you can ask the exact same thing here.

08:49:31

Is there a way to get from New Delhi to Vancouver? Now if there is, then how many stops will that

08:49:39

require? What is the minimum number of stops we can take to get from New Delhi to Vancouver?

08:49:45

Or what is the minimum time it might take? Maybe if you're okay with taking multiple stops,

08:49:50

but you want to minimize the time taken or the distance traveled because you're concerned

08:49:56

with the miles for some reason. Another thing you could ask is what is the minimum cost if there

08:50:03

is a cost along each route? Okay. Now here's one more example from a very different domain.

08:50:10

This is hyperlinks, or the internet essentially. So you can see here you have a whole bunch of websites

08:50:18

and you have links on websites. Now links on websites point to other websites and in this case

08:50:23

it is a one way connection. You can see that from this particular course website we have a link to IBM

08:50:31

but from IBM you may not have a link to this course website. Now that's an interesting thing

08:50:35

that's a slight variation here, and this is called a directed graph because each

08:50:41

connection here has a particular direction. Now this is again interesting to ask:

08:50:48

is there a way to navigate from cs.umast.edu to ithaka weather? If there is what is the shortest way?

08:50:58

What does that path look like? So those are the kind of questions that we want to answer

08:51:03

and to do that we will need a more abstract representation of graphs and we will start with

08:51:08

the simplest possible representation where you have certain points or what we will call

08:51:14

nodes or vertices. So these are two terms that are used for these points: nodes or vertices.

08:51:21

A graph has certain nodes or vertices, and these could be cities,

08:51:27

or these could be web pages, or these could be something else. But just to make things easy, what we

08:51:31

do is we will number the nodes. So in our graph if we have 10 nodes then we will number the

08:51:39

nodes from 0 to 9. And they can be numbered completely arbitrarily. There's no

08:51:44

reason to number this one 0 or to number this one 1. What's more important is that we should use up

08:51:49

all the numbers from 0 to n minus 1 if we are dealing with n nodes. Now why do we do that?

08:51:57

We will see in a moment when we try to represent graphs using certain data structures like

08:52:02

adjacency lists, etc. But we want to number our nodes from 0 to n minus 1, and this number

08:52:08

is arbitrary: this 1 doesn't represent anything, in the sense of 1 being greater than 0 and so on.
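To make the numbering concrete, here is a small sketch assuming a few hypothetical station names: each label gets an index from 0 to n minus 1 in whatever order the list happens to have, and the same list maps an index back to its label.

```python
# Hypothetical node labels; any order works, the numbers carry no meaning
stations = ["New Delhi", "Hyderabad", "Mumbai", "Chennai", "Kolkata"]

# Assign each label a unique index from 0 to n - 1
index_of = {name: i for i, name in enumerate(stations)}
num_nodes = len(stations)

# An edge between two stations becomes a pair of plain integers
edge = (index_of["New Delhi"], index_of["Hyderabad"])
print(num_nodes, edge)      # 5 (0, 1)
print(stations[edge[1]])    # Hyderabad
```

Using every number from 0 to n minus 1 means a plain list of length n can later hold one entry per node, indexed directly by node number.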

08:52:14

So these nodes have labels and then you have edges between nodes. So an edge is simply a pair

08:52:21

an edge is simply something like 1 comma 2. So a pair 1 comma 2 tells you that there is an edge

08:52:28

between the node 1 and node 2. Now as we move forward we will also store some information within an edge

08:52:36

and we will call that the weight of an edge, and we will also later look at directed edges, and those

08:52:41

will get us directed graphs. But let's start with this and see how we can now represent

08:52:47

a graph with this basic structure. So we can represent a graph using two

08:52:53

variables. One is called the number of nodes, and the number of nodes in this case is

08:52:59

five and then we can represent the edges using a list of pairs.

08:53:05

So in this case the pairs are: 0 comma 1, that's an edge,

08:53:23

then (0, 4), that's an edge too, then we have (1, 2), so 1 is connected to 2, and the edge

08:53:34

in this case is bidirectional. So when we say (0, 1), it automatically says

08:53:38

that 1 and 0 are also connected, right? So (1, 2), and then we have (2, 3), and which order we

08:53:45

write these in doesn't matter; we could just as well have written (3, 2) here.

08:53:51

We also have (1, 3), and then we have (1, 4), and then finally

08:54:02

we have (3, 4). So this is how we represent this data structure: what we've drawn here

08:54:12

is now represented in code using these two variables, and we can check it: if we simply print

08:54:18

the number of nodes and the length of edges, we can verify that this is roughly correct. So you see

08:54:26

we have five nodes and we have 1, 2, 3, 4, 5, 6, 7 edges. Okay, that seems right to me. There

08:54:32

may be a mistake somewhere, but roughly we have set things up correctly. Okay.
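Written out in Python, the two variables described above might look like the sketch below. The names num_nodes and edges follow the lecture; listing the pairs as tuples in sorted order is an assumption (the order does not matter), chosen because it reproduces the adjacency lists printed later in the lecture.

```python
# Example graph: 5 nodes numbered 0..4, undirected edges as (node, node) pairs
num_nodes = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

# Quick sanity check: number of nodes and number of edges
print(num_nodes, len(edges))  # 5 7
```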

08:54:39

Now the question becomes: is this representation good enough? This representation is good enough

08:54:45

if you want to convey the structure of a graph to someone. I could give you these two variables

08:54:49

without showing you this image, and you could use this information to draw the graph

08:54:55

on a piece of paper. So this representation is complete; it provides all the information about

08:54:59

the graph, but it may not be efficient. For example, if you want to find out which nodes

08:55:06

node 1 is connected to, we would have to iterate over the entire list of edges. We would have to

08:55:11

go through each pair, check if either of its entries is 1, then check the next pair,

08:55:15

and so on. That makes it very tricky to access any information efficiently.
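To make the cost concrete, here is a minimal sketch of looking up neighbors when all we have is the edge list. The function name neighbors_from_edges is hypothetical; the point is that every query has to scan all the edges.

```python
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

def neighbors_from_edges(node, edges):
    """Collect the neighbors of `node` by scanning the whole edge list (O(E) per query)."""
    result = []
    for n1, n2 in edges:
        if n1 == node:
            result.append(n2)
        elif n2 == node:
            result.append(n1)
    return result

print(neighbors_from_edges(1, edges))  # [0, 2, 3, 4]
```

Finding a shortest path would repeat this full scan for every node we expand, which is the inefficiency the adjacency list removes.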

08:55:22

It would be much nicer to just look up a list of the nodes that 1 is connected to in some

08:55:27

way and go from there. Now, if you want to find a shortest path, we would first have to find all the

08:55:31

nodes that 1 is connected to, and then for each of those we would have to find their neighbors,

08:55:36

and then for each of those we would have to find their neighbors, and so on. So it would get pretty

08:55:41

tedious to go through the list so many times. And by the way, by a neighbor we

08:55:47

mean two nodes that are connected by an edge. So 0 and 1 are neighbors, but 0 and 2 are

08:55:55

not neighbors. That's a very simple nomenclature that we can use. And we can say that if we

08:56:03

trace a sequence like 0, 1, 2, and there is an edge between each consecutive pair, then 0, 1, 2 is a path.
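The path definition translates into a short check. This is a sketch with a hypothetical helper, is_path; storing each undirected edge as a frozenset is an implementation choice of this sketch so that (a, b) and (b, a) compare equal.

```python
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

# Undirected edges: a frozenset makes {a, b} equal to {b, a}
edge_set = {frozenset(edge) for edge in edges}

def is_path(nodes, edge_set):
    """True if every consecutive pair in `nodes` is joined by an edge."""
    return all(frozenset((a, b)) in edge_set for a, b in zip(nodes, nodes[1:]))

print(is_path([0, 1, 2], edge_set))  # True
print(is_path([3, 0, 1], edge_set))  # False: no edge between 3 and 0
```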

08:56:09

So 0, 1, 2 in this case is a path, but 3, 0, 1 is not a path, because

08:56:17

there's no edge between 3 and 0. Okay. We'll see what we mean by shortest paths and neighbors

08:56:22

and so on in a while. But to work with graphs more efficiently, we will represent them using

08:56:32

what's called an adjacency list. The name explains what it contains. The adjacency list

08:56:40

contains a list for each node, and that list contains all the nodes that are adjacent to

08:56:47

that node. Again, "adjacent to" is the same as "neighbor of". So for each

08:56:53

node, for example node 0, we maintain a list, and that list contains the numbers 1 and 4,

08:56:59

indicating that 0 is adjacent to, or is a neighbor of, or is connected via a direct edge to, 1 and 4.

08:57:07

That's why you have 1 and 4 here. Then 1 is connected to 0, 2, 3 and 4. You can see that 1 is

08:57:14

connected to 0, 2, 3 and 4. Similarly, 2 is connected to 1 and 3, 3 is connected to 1, 2 and 4,

08:57:23

and 4 is connected to 0, 1 and 3. Now, this is more convenient, for one, because

08:57:29

this is a list. If you want to find, let's say, which nodes 2 is connected to, we can directly

08:57:37

access index 2 within the list, and this is why we number the vertices, or the nodes,

08:57:42

from 0 to n - 1: so that we can access them directly in an adjacency list. Right, so we directly

08:57:49

access index 2, and so we have 1 and 3 here. So that's what makes it convenient,

08:57:55

and one important thing to notice here is that each edge appears twice. So the edge (0, 1)

08:58:01

shows up in the list for 0: you can see here in the list for 0 we have 1, and similarly, in the

08:58:08

list for 1 we have 0. So each edge shows up twice: once in the adjacency list of each of the two nodes it connects.
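For the example graph, the adjacency list described above can be written out by hand. This sketch also checks the "each edge appears twice" property against the edge list (using the sorted edge order assumed in these examples).

```python
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

# Index i holds the list of nodes adjacent to node i
adjacency_list = [
    [1, 4],        # node 0
    [0, 2, 3, 4],  # node 1
    [1, 3],        # node 2
    [1, 2, 4],     # node 3
    [0, 1, 3],     # node 4
]

print(adjacency_list[2])  # [1, 3]: direct access by node number

# Each undirected edge shows up twice, once in each endpoint's list
assert sum(len(nbrs) for nbrs in adjacency_list) == 2 * len(edges)
for n1, n2 in edges:
    assert n2 in adjacency_list[n1] and n1 in adjacency_list[n2]
```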

08:58:20

So now the obvious next question might be: create a class to represent a graph as an adjacency list

08:58:26

in Python. This is again a question that you might get asked, or this might be a step in

08:58:31

another question where you're asked to perform a breadth-first search, or a depth

08:58:36

first search, or find the shortest path, and the first step you'll have to do is define a class for a graph

08:58:42

to maintain the information about the graph as an adjacency list. Okay, so here we're creating a class

08:58:47

Graph, and the first thing we'll need inside the class is a constructor function.

08:58:53

We need to put something inside the constructor, and we know that the first argument to

08:58:59

any class method in Python is self, which represents the object that will get created

08:59:05

when we create an object of the class. But apart from this, what information do we need to

08:59:12

create a graph? It's pretty straightforward: we can simply work with this information, because

08:59:19

these two variables together specify the graph completely. So let's simply accept

08:59:25

num_nodes and a list of edges as the inputs. The first thing we can do is simply store

08:59:31

num_nodes in self.num_nodes, so that once we create a graph, we can access the number of nodes

08:59:36

very easily. Then we need to create the adjacency list, and

08:59:42

we'll call it self.data. Initially we will create a list containing empty lists,

08:59:51

and then we will fill out those empty lists step by step. So what we need is something like this: in this

08:59:57

case, because there are five nodes, this is what we need to create:

09:00:05

the five empty lists. Now, in general, one way to create repeated elements is this; you can say,

09:00:17

if you want a repeated element like 0, you type [0] * 10, and that

09:00:23

gives you this list, [0, 0, ..., 0], containing all zeros. On the other hand, if you create an empty list

09:00:29

times 10, [[]] * 10, let's call this L1, and let's see what L1 is. It looks like you've got

09:00:36

a list containing 10 empty lists. But let's just go into the first element.

09:00:41

The first element is this first empty list, and inside the first element, let us add the value 1.

09:00:49

Okay, and then let's look at the list L1 once again, and you see what happens.

09:00:55

This 1 gets inserted into all of these lists. Now, what's the problem here?

09:01:00

The problem here is that when we create a list like this, containing an empty list or

09:01:06

any other object, the same object gets replicated 10 times, but Python does not create

09:01:12

copies. When you're working with numbers it's fine, because when you're working with, let's say,

09:01:17

the number 0, there's no internal structure inside 0, right? So there's nothing

09:01:25

you can change inside the 0; it's a fixed, immutable value. So you

09:01:31

can't really take L1[0] and change its value internally. What you can do is set L1[0]

09:01:37

to another value, let's say L1[0] = 1, so instead of all zeros you get a 1 in the first position,

09:01:44

but you cannot take the 0 and change something inside it. On the other hand, when you have an

09:01:50

empty list here, it is the same list that is showing up 10 different

09:01:56

times: each of the elements in the outer list is simply a pointer to this same empty list.
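A short sketch reproduces the bug described here and shows why it does not bite with numbers. The `is` check confirms that the outer list holds ten references to one single inner list rather than ten copies.

```python
L1 = [[]] * 10           # ten references to ONE empty list
L1[0].append(1)          # mutates that single shared list
print(L1)                # [[1], [1], [1], [1], [1], [1], [1], [1], [1], [1]]
print(L1[0] is L1[9])    # True: same object, not a copy

zeros = [0] * 10         # fine: ints are immutable
zeros[0] = 1             # rebinds slot 0 only; the other slots are untouched
print(zeros)             # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```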

09:02:02

And since we can go inside this empty list and append something to it, and since

09:02:07

this is the same object that we are seeing over and over, the 1 gets appended to the first list,

09:02:12

and because the rest of them are the same object, we get back 1s everywhere. Okay, so the

09:02:18

reason we're spending time here is that this is a common bug that you may unintentionally

09:02:25

introduce. Whenever you want to create a list of empty lists, do not use this method. So what's

09:02:32

the method you should use then? Here's one method you can use. Let's say you want to create a list

09:02:38

of empty lists of size 10. You may be familiar with this object called range,

09:02:47

or this function called range: if you view it as a list, you can see that it contains

09:02:53

all the elements from 0 to 9. Now, if you view the range itself, it simply shows you range(0, 10),

09:02:58

but when you convert it into a list, you can see that it contains the values 0 to 9. Okay,

09:03:04

so you can take this range and do something like this: put this range, or anything which

09:03:10

is iterable, inside these list brackets, say for x in range(10), and simply put x in front.

09:03:19

So what did that do? Practically nothing: we simply took each x from range(0, 10)

09:03:26

and returned x itself, so we created a new list like this. But suppose we multiply it by two here,

09:03:32

x * 2. For each element in the range, we are multiplying it by two, and so we get back a new

09:03:37

list, 0, 2, 4, 6, 8 and so on, where each element is double the corresponding element that we

09:03:43

have in the range. Now, what we need is just empty lists, right? So we can simply put an

09:03:50

empty list here and ignore the value x that we get, so now we get back a whole bunch of

09:03:55

empty lists. Let's call this L2. What we're now doing is, for each element in the range,

09:04:00

we are creating a new empty list, and this is important. So now when you do L2[0].append(1) and then

09:04:10

check L2, you can see that 1 was only inserted inside the first list. Okay.
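The steps just described, range, a list comprehension that doubles each value, and a comprehension that builds a fresh empty list per iteration, can be replayed in a few lines. The `_` name anticipates the convention discussed next.

```python
print(range(10))                 # range(0, 10)
print(list(range(10)))           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

doubled = [x * 2 for x in range(10)]
print(doubled)                   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# `_` marks the loop variable as deliberately unused;
# the expression [] is evaluated afresh on every iteration
L2 = [[] for _ in range(10)]
L2[0].append(1)
print(L2)                        # [[1], [], [], [], [], [], [], [], [], []]
```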

09:04:19

So watch out for this. It is something that you will probably get wrong at some point; I've gotten it wrong many

09:04:26

times. One last change we can make here: whenever you're not using a variable in Python, it's always

09:04:32

a good idea to just call it underscore. You can still call it x, but sometimes somebody reading

09:04:38

your code may not understand why you have declared a variable and not used it and assume that

09:04:43

maybe you've made a mistake. So just to make things very clear, it's always a good idea to name

09:04:47

it underscore; underscore is also a valid variable name. Mark something as underscore if

09:04:52

it is not being used. Okay, so with that whole discussion about lists, we now know how to create a

09:05:00

list of empty lists. So here you have [[] for _ in range(num_nodes)], and

09:05:09

now we have created a list of empty lists. Then, for each edge in edges, we need to do something:

09:05:18

we need to insert it into the right lists. Okay, now what does for edge in edges look like? Let's see:

09:05:27

for edge in edges: print(edge). Okay, each edge is a pair, we already know that, and when you have pairs

09:05:38

or tuples here, you can get the values out directly. Let's call them n1 and n2,

09:05:45

node 1 and node 2. You can get the values n1 and n2 out like this, and now we can print n1 and print n2.
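Tuple unpacking, as used here, works both in a plain assignment and directly in a for-loop header. A minimal sketch with the example edge list:

```python
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

# Unpack a single pair into two names
n1, n2 = edges[0]
print(n1, n2)        # 0 1

# Or unpack each pair directly in the for-loop header
for n1, n2 in edges:
    print(n1, n2)
```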

09:05:59

You can see that we are able to get the values n1 and n2 out directly within the for loop. So let's

09:06:04

call these n1 and n2, and this is a much more Pythonic way of writing code. One of the things that

09:06:10

we are also learning is how to write code which is more Pythonic, or idiomatic in Python,

09:06:15

and this is again something that will impress people when you use it in an interview or a coding

09:06:20

challenge. So for n1, n2 in edges, what we need to do is first get self.data[n1]. This

09:06:30

gives us the adjacency list for n1, the first node, and here we append the value n2. Similarly we do

09:06:39

the same for n2: we append n1 to its list, and that's it. Now that we've set up the graph class, let's create a graph,

09:06:49

let's call it graph1, and we simply invoke Graph and give it the number of

09:06:56

nodes and the edges. Remember, self will be passed in by Python automatically as the object that

09:07:03

is getting created, essentially the graph1 object. So now the number of nodes is 5, and we have a

09:07:09

list of edges. Let's see what graph1.data looks like. There you go: you can see that 0 is

09:07:18

connected to 1 and 4, and 1 is connected to 0, 2, 3, 4, and so on. Now, while this is okay, it would be nicer

09:07:26

to print it like this, one node per line, so let's see if we can do that, and the way to do that

09:07:35

is to define a repr function. So we define a method called __repr__, and

09:07:43

it simply takes self as the input, and what we are going to do is, we are going to go over,

09:07:55

or rather, we are going to call enumerate on self.data. What does that give us? Let's just check what enumerate

09:08:03

on self.data gives us. Well, maybe before we do that, let's see what enumerate on a list gives us.

09:08:14

Enumerate on a list gives us this object, but let's just get the values out of it in a for loop,

09:08:20

because you can use enumerate in a for loop, and just print x. What enumerate gives us is

09:08:28

the values from the list, but apart from those values, it also gives us the indices. So you can get

09:08:33

an index i and a value v out of enumerate, and you can see that you can print both i and v here

09:08:43

and you will get back the same output. Okay, so what we can do is call enumerate on self.data,

09:08:49

and because self.data contains these elements, what we will get back is pairs.

09:08:58

Let's see here.

09:09:07

We get back pairs: (0, [1, 4]), (1, [0, 2, 3, 4]), (2, [1, 3]) and so on. Now this is starting to look a lot like

09:09:13

what we want. Okay, so we'll take enumerate(self.data) and take these pairs; the pairs

09:09:20

will be a node n and its neighbors. So the node n will first be 0,

09:09:31

and its neighbors will be 1 and 4; then node n will be 1, and its neighbors will be 0, 2, 3 and 4, and so on.

09:09:37

Then, for n, neighbors in enumerate(self.data), we simply create a simple string,

09:09:47

and here we're using string formatting to create this string,

09:09:56

where we place a placeholder where we put n, and then here we put a placeholder

09:10:03

where we put neighbors. Again, let's just see what that looks like, and this is the best thing about

09:10:08

Jupyter: while you're writing code, you can test your code right then and there, simply by

09:10:12

putting it into a new cell. So let's use graph1.data. You can see here that now

09:10:19

we've converted that enumerated list into a list of strings. So we have a string here:

09:10:25

this is the string 0 pointing to 1, 4, this is 1 pointing to 0,

09:10:32

2, 3, 4, and so on. But this is still a list of strings, and what we need to return from the

09:10:36

repr function is a single string. The way to join them together, whenever you have a list of

09:10:42

strings and you want to join them, is to take the string you want to join them

09:10:47

with, in our case a newline, and call the join method on that string

09:10:55

and return that. So that is our repr function, and we'll see its uses in just a moment.
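The enumerate, format and join steps can be tried on a plain list first, the way the lecture tests them in a separate cell. This is a sketch: the "{}: {}" format string is an assumption about how each line is laid out, and `data` is the adjacency list printed earlier.

```python
data = [[1, 4], [0, 2, 3, 4], [1, 3], [1, 2, 4], [0, 1, 3]]

# enumerate yields (index, value) pairs
for i, v in enumerate(data):
    print(i, v)

# One formatted string per node, then joined with newlines into a single string
lines = ["{}: {}".format(n, neighbors) for n, neighbors in enumerate(data)]
print(lines[0])          # 0: [1, 4]
text = "\n".join(lines)
print(text)
```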

09:11:00

Similarly, we have another method called __str__. Now, __repr__ is used when we simply type graph1,

09:11:07

so when we type graph1, this is the output of the default __repr__ function,

09:11:12

and this will get replaced by the __repr__ that we are defining. But when we do str of

09:11:17

graph1, or print(graph1), or when we insert graph1 into a formatted string, that is when the

09:11:21

__str__ method is used. Here we will simply reuse the repr representation, so let's just return

09:11:28

self.__repr__(), and that's it. Okay, so let's see: now let's

09:11:36

type graph1 here, and you can see that now we have this representation printed using this

09:11:42

__repr__ function that we've defined: we have 0 connected to 1, 4; 1 connected to 0, 2, 3, 4; 2 connected to 1, 3;

09:11:49

3 connected to 1, 2, 4; and 4 connected to 0, 1 and 3. Okay, so now we have a graph data structure

09:12:02

that we've implemented as an adjacency list using a class, and we have a nice way to print it out,

09:12:07

and this is just good programming practice. Now, you don't have to do this in a coding competition,

09:12:12

and you don't have to do it in an interview, though it's good if you're able

09:12:16

to type it out quickly. But when you are working on your own problems

09:12:21

or your own code, or on a project, always make sure that any classes you define have a good

09:12:27

string representation so that when you type the name of a variable you understand what it

09:12:32

represents, and you don't have to spend time thinking about it. Make it clear to yourself. Okay, so that's

09:12:37

the adjacency list, and we'll see how it is useful in just a few moments.
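Collected in one place, the class built over the last few minutes might look like the sketch below. The method bodies follow the description above; the exact "{}: {}" line format in __repr__ is an assumption, and the edge list uses the sorted order assumed in these examples.

```python
class Graph:
    def __init__(self, num_nodes, edges):
        self.num_nodes = num_nodes
        # One fresh empty list per node (not [[]] * num_nodes, which would alias)
        self.data = [[] for _ in range(num_nodes)]
        for n1, n2 in edges:
            # Undirected edge: record it in both nodes' adjacency lists
            self.data[n1].append(n2)
            self.data[n2].append(n1)

    def __repr__(self):
        # One "node: [neighbors]" line per node
        return "\n".join("{}: {}".format(n, neighbors) for n, neighbors in enumerate(self.data))

    def __str__(self):
        # print(), str() and string formatting reuse the same representation
        return self.__repr__()


num_nodes = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
graph1 = Graph(num_nodes, edges)
print(graph1)
```

Printing graph1 gives five lines, from "0: [1, 4]" through "4: [0, 1, 3]", matching the output read out in the lecture.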

09:12:42

But here are a couple of questions for you. Try writing a function to add an edge to a graph

09:12:50

that is represented as an adjacency list. Here we've specified all the edges right at the

09:12:55

beginning, but can you write a function add_edge which takes a pair of nodes and inserts an edge

09:13:03

between those two nodes? Here's a hint: this code might be useful. So do try that out.
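Here is one possible answer to the first exercise, as a hedged sketch. add_edge is a hypothetical standalone function working on a plain adjacency list (it could equally be a method on the graph class), and skipping an edge that already exists is a design choice of this sketch, not something the lecture specifies.

```python
def add_edge(data, n1, n2):
    """Hypothetical helper: insert the undirected edge n1-n2 into adjacency list `data`."""
    if n2 in data[n1]:          # assumption: silently ignore an edge that already exists
        return
    data[n1].append(n2)         # same two appends as in the constructor loop
    data[n2].append(n1)

data = [[] for _ in range(5)]   # 5 nodes, no edges yet
add_edge(data, 0, 1)
add_edge(data, 0, 4)
add_edge(data, 1, 0)            # same undirected edge again: ignored
print(data)                     # [[1, 4], [0], [], [], [0]]
```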

09:13:09

Now here's another one: can you write a function to remove an edge from a graph

09:13:13

which is represented as an adjacency list? Here you may have to use the list remove method

09:13:18

to remove a particular element from a list. These are two good exercises to

09:13:22

complete here. Okay, now before we continue, let's just save our work. We know that this

09:13:30

notebook is running on Binder, which is a free service, so we'll just save our work by running

09:13:35

jovian.commit, and what that will do is capture a snapshot of this notebook, with all the

09:13:40

changes that you've made, and put it on your Jovian profile. Now, this will go on your Jovian profile,

09:13:46

from where you can continue executing it from where you left off. Okay.

09:13:54

Now, another common representation for graphs is called the adjacency matrix, which is slightly

09:14:00

different from adjacency lists. In this case, for example, the same graph here is represented using

09:14:05

this matrix. What we do is create a matrix of size n by n, if n is the

09:14:12

number of nodes in the graph, and then, for instance, since we have

09:14:20

an edge between 1 and 2, if you take row number 1 and column number 2,

09:14:28

you put a 1 there. Otherwise, if there's no edge, for example there's no edge between 0 and 2,

09:14:35

you take row 0 and column 2, and you put a 0 there. Okay, so you put a 1

09:14:41

wherever there is an edge between the two nodes, and you put a 0 wherever there isn't. You can see

09:14:45

that the matrix is symmetric here: entry (1, 2) is 1 and entry (2, 1) is also 1, because

09:14:52

these are undirected edges. Of course, if this were a directed graph, this would be different.

09:14:59

So an exercise for you, once again, is to represent a graph as an adjacency matrix in Python.

09:15:05

It shouldn't be too hard. For the adjacency list we initialized a

09:15:10

list of empty lists; here you may want to initialize a list containing lists

09:15:18

of zeros, and then simply fill in the 1s in the right places.
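A minimal sketch of the adjacency-matrix exercise, assuming the same 5-node example graph. Note that the outer list is built with a comprehension (one fresh row per node), while [0] * num_nodes is safe for the row itself because ints are immutable.

```python
num_nodes = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

# One fresh row per node; [0] * num_nodes inside is fine because ints are immutable
matrix = [[0] * num_nodes for _ in range(num_nodes)]
for n1, n2 in edges:
    matrix[n1][n2] = 1
    matrix[n2][n1] = 1   # undirected: fill both (n1, n2) and (n2, n1)

for row in matrix:
    print(row)
print(matrix[1][2], matrix[0][2])   # 1 0  (constant-time edge check)
```

The last line illustrates the trade-off mentioned next: checking for an edge is a single index lookup, at the cost of n * n memory.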

09:15:24

Now, adjacency matrices have their own benefits; sometimes they are more useful. For example,

09:15:31

when you want to immediately check if there is an edge between two vertices, or two nodes, you can

09:15:37

quickly look it up in the adjacency matrix, but in the adjacency list you will have to

09:15:41

get the list for one of them and then search through that list, which is fine for most cases, but

09:15:47

in some cases you may just want an adjacency matrix as well. So that's one other way you can

09:15:53

represent a graph, and that's an exercise for you. Okay, now that we've represented graphs,

09:16:01

we can start looking at some graph algorithms, and probably the most common graph

09:16:07

algorithm, something that you will ultimately get asked in one interview or another if you're

09:16:12

interviewing with a bunch of companies, is breadth-first search. Here is what breadth-first search does:

09:16:20

suppose you have this: this is a real-world graph that we're looking at. These are cities in

09:16:26

Germany, and you can see that there are roads between these cities, and we have the length of each road.

09:16:32

We can ignore the lengths for now. What's important is that these cities are connected to each other,

09:16:38

but not every city is connected to all of the others. So, starting from Frankfurt,

09:16:45

you may want to find out which cities you can reach from Frankfurt

09:16:52

without stopping, that is, which cities are one edge away from Frankfurt. If you look at

09:16:58

it this way, it turns out that Mannheim, Kassel and Würzburg are the three cities that are one edge

09:17:05

away from Frankfurt. So if you start drawing a tree of sorts, you will find that Mannheim,

09:17:12

Würzburg and Kassel are one edge away. Then you might ask which cities are two

09:17:18

edges away from Frankfurt. Mannheim is connected to Karlsruhe, Würzburg

09:17:28

is connected to these two cities, and Kassel is connected to this city. Okay, so here you have

09:17:33

these other cities. Then you might ask which cities are three steps away

09:17:38

from Frankfurt, and that would be the remaining two cities, Augsburg and Stuttgart. Okay.

09:17:46

I'll let you think about this, but what you will find, as you go step by step by step

09:17:52

like this, first finding all the cities that are one step away, so all the nodes that are one step

09:17:57

away from a source node, then finding all the nodes that are two steps away from the source,

09:18:02

is that ultimately, for each node, you will find out

09:18:10

how far away it is from the source, and that will be the length of the shortest path between the two.

09:18:15

You can verify that; I'll let you think about it. For instance, you can go to

09:18:22

Kassel by going this way, from Würzburg to Nürnberg to München to Kassel,

09:18:27

but that would not be the shortest path. This technique is called breadth-first search, and

09:18:33

breadth-first search will always discover the shortest path, because we are first finding all the nodes at

09:18:37

distance one and then we're finding all the nodes at distance two and then we're finding all the

09:18:41

nodes at distance three, and if a node we reach at distance three had a shorter path, then it would have been

09:18:47

found already when we were finding nodes at distance one or two. Okay.
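The level-by-level idea can be replayed on the German cities example. This is a sketch under stated assumptions: the road list is reconstructed from the description above, and some edges (Würzburg–Erfurt, Karlsruhe–Augsburg, Augsburg–München, Nürnberg–Stuttgart) are assumptions about the picture; road lengths are ignored, and cities are used as names rather than numbered 0 to n - 1. The helper levels_from is hypothetical.

```python
# Assumed undirected roads between cities (lengths ignored)
roads = [
    ("Frankfurt", "Mannheim"), ("Frankfurt", "Würzburg"), ("Frankfurt", "Kassel"),
    ("Mannheim", "Karlsruhe"), ("Karlsruhe", "Augsburg"), ("Augsburg", "München"),
    ("Würzburg", "Erfurt"), ("Würzburg", "Nürnberg"), ("Nürnberg", "Stuttgart"),
    ("Nürnberg", "München"), ("Kassel", "München"),
]

# Adjacency lists keyed by city name
neighbors = {}
for a, b in roads:
    neighbors.setdefault(a, []).append(b)
    neighbors.setdefault(b, []).append(a)

def levels_from(source, neighbors):
    """Group cities by how many edges they are from `source`, one level at a time."""
    levels = [[source]]
    seen = {source}
    while True:
        # Every unseen neighbor of the current level is exactly one step further away
        next_level = sorted({n for city in levels[-1] for n in neighbors[city] if n not in seen})
        if not next_level:
            return levels
        seen.update(next_level)
        levels.append(next_level)

for distance, cities in enumerate(levels_from("Frankfurt", neighbors)):
    print(distance, cities)
```

Level 1 comes out as Kassel, Mannheim and Würzburg, and level 3 as Augsburg and Stuttgart, matching the walkthrough.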

09:18:53

So that's breadth-first search. Here's one problem that you might

09:18:59

face in an interview: implement breadth-first search, given a source node in a graph, using Python.

09:19:07

Here is some pseudocode. It's always a good idea to write or explain your approach

09:19:13

in plain English before you implement it, so that you do not make mistakes while coding.

09:19:19

So here is the pseudocode for a function BFS, which takes a graph

09:19:25

and a root, or source node. First we say: create a queue. This is taken

09:19:31

from Wikipedia. So first we create a queue, and what's a queue? Well, a queue is a very simple data

09:19:39

structure: a queue is simply a list that follows a first-in, first-out policy. So when you have

09:19:48

a list and you want to add something into a queue, which is called the enqueue operation, when you

09:19:53

want to add something into a queue, you add it at the end. So you have a list, and then you simply

09:19:57

keep adding things at the end; you just append things to the end of a list. But when you want to

09:20:02

access something from a queue, you do not access any value directly. No, you always access the first

09:20:10

available value, what is called the

09:20:15

value in front, and when you access a value, it gets removed. So in this way, you can see that

09:20:22

it implements the first-in, first-out policy. Say first we enqueue 1, and then we enqueue 3, and then we

09:20:29

enqueue 4, and then we want to dequeue. When we dequeue, we simply get the first value that was inserted,

09:20:35

which is 1. Then maybe we enqueue a few more numbers, 5 to 7, then we dequeue, and we get back the first value

09:20:43

we had inserted that has not yet been dequeued, so we get back 3, the second value

09:20:49

inserted initially, right? So that's a queue. Now let's see how a queue is useful. We create a queue,

09:20:56

and then we label the root node as discovered. So we need

09:21:06

to somehow track which nodes have been discovered, or visited, and first we will mark

09:21:14

the root node. Let's say we are starting from node 3: we will mark 3 as discovered, so 3 is now

09:21:19

discovered, and as soon as we mark something as discovered, we will enqueue it. Then, while the queue is not

09:21:27

empty, which means while we have not accessed all the elements in the queue, or while we have not

09:21:32

dequeued all the elements from the queue, we dequeue an element: the first element which has not yet

09:21:38

been removed from the queue. If we were looking for a particular goal node, then we could simply end

09:21:46

there, once we found that node, but we are not looking for a goal node, so let's remove this code.

09:21:55

Yeah, so we get the first element, or the first node, from the queue which has not yet been removed.

09:22:01

For example, initially we just have 3 in the queue, so we get 3

09:22:07

back from the queue. Then we check all the edges for 3: we see that 3 is connected to 1,

09:22:12

3 is connected to 2, and 3 is connected to 4. So we see all the edges for 3,

09:22:18

and for each edge we check the node at the other end. Let's say the other end of this

09:22:26

edge is 2: if 2 is not yet discovered, or not yet visited, then we enqueue 2 into the queue.

09:22:34

Similarly, we check 1, and if 1 is not already discovered, we enqueue 1 into the queue.

09:22:39

Similarly, for 4 we enqueue 4 into the queue. So we have dequeued 3, so 3 is no longer in the queue,

09:22:47

or rather, we've moved past it and are no longer going to get 3 out of the queue, but now we've enqueued 2, 1 and 4,

09:22:55

and 2, 1 and 4, we now understand, are at distance 1. So when we pick the next element of the queue,

09:23:00

when we dequeue the next element, first in, first out, we get back 2, and then

09:23:06

we mark 2 as visited. Great, now we've visited 2.

09:23:14

Oh, no: we mark nodes as visited as soon as we add them to the queue,

09:23:18

because we've identified that 2, 1 and 4 are all at distance 1 from 3, and we've added them to the

09:23:25

queue, so we've already marked them as visited. Now, when we get 2 out of the queue in the next iteration,

09:23:31

we check if there are any nodes 2 is connected to that are not yet visited. 2 is connected

09:23:38

to 1, but 1 is already visited, so there's no need to enqueue it again, and 2 is connected to 3,

09:23:44

but 3 is already visited, so there's no need to enqueue it again, and so we just move forward.

09:23:50

Then we go to 1, and when we go to 1, we realize that 0 is not yet visited, so we enqueue 0.

09:23:59

4 is visited, so we don't enqueue 4. And that's how we proceed. Now, what you should do is

09:24:05

draw this on a piece of paper and work it out: write down what would be the

09:24:11

first element that gets inserted, and what elements we will insert into the queue after it,

09:24:16

et cetera. But this is the algorithm here, exactly what we just discovered: we dequeue

09:24:25

a vertex, or node, and for all the edges that start from that vertex, if the other end of the edge

09:24:34

is not labeled as discovered, then we mark it as discovered and enqueue it. Okay, let's implement

09:24:41

this. Let's see if we can implement this live. We are implementing bfs, where we will get a graph

09:24:47

and a source node. The first thing we need to set up is a queue, which is initially empty. Then we set up

09:24:53

discovered, which will be False initially and will have the same length as the adjacency list, since we want to mark

09:25:04

False for all the nodes. And remember, we can use the [False] * n notation here because False is an

09:25:09

immutable value, so it doesn't matter. We don't really need to use the range or the list comprehension

09:25:14

notation here. Then, coming back here, we label the root as discovered,

09:25:26

so discovered[source]; actually, let's just call it root, so that we don't get confused with the terminology.

09:25:35

So we set discovered[root] to True. Great. Then we insert, or enqueue, the root, so we type queue.append.

09:25:47

Enqueue simply means adding something to the end, and you know how to do that with a list: you simply

09:25:51

call append, so queue.append(root). Great. Now, Python lists don't offer an efficient dequeue operation by default: list.pop(0) has to shift every remaining element.
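The lecture works around this with an index into a plain list. For comparison, the standard library's collections.deque provides popleft() in constant time; this sketch replays the earlier enqueue/dequeue example with it (the values 5, 6 and 7 stand in for "a few more numbers").

```python
from collections import deque

queue = deque()
queue.append(1)            # enqueue at the back
queue.append(3)
queue.append(4)
first = queue.popleft()    # dequeue from the front
print(first)               # 1

queue.extend([5, 6, 7])    # enqueue a few more values
second = queue.popleft()
print(second)              # 3: the earliest value still waiting
print(list(queue))         # [4, 5, 6, 7]
```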

09:26:03

So what we will do is set up an index which will track the first available element in the

09:26:11

queue. Whenever we dequeue an element, we will increase the index so that we move forward.

09:26:18

So here we have the index idx = 0. Now, while there are elements left in the queue, which means

09:26:24

while the next available index is less than the length of the queue,

09:26:28

first we will dequeue to get current. Dequeuing simply means getting the first-in

09:26:38

element, the element that was inserted earliest and has not yet been dequeued. So we get current

09:26:45

= queue[idx], and then we also increase idx: as soon as we dequeue something, we update the index.

09:26:53

You can imagine that the index starts out here, and when we dequeue, or delete, this, we get

09:26:58

that value out and update the index to the next position. Okay, so now we have current,

09:27:04

so there's the dequeue operation. What do we have next? Now we want to check all the edges

09:27:14

of current right so we are going to say for so remember we have the adjacency list representation

09:27:21

so we will get for node in self dot data current so self dot data current contains a list of

09:27:30

all the nodes that are connected with the current node so for node in self dot data current

09:27:35

if not discovered node so if you have not yet discovered the node then we first marketed this as

09:27:47

discovered and then we added to the queue so we do queue dot append node okay so what you end up with

09:28:01

this way is first you have the source that got added to queue and then we inserted all the nodes

09:28:10

which were at a distance one from source and then we insert then if you follow the trajectory will

09:28:15

see that we will insert all the nodes that are at a distance two from queue and so on right so

09:28:20

ultimately when we end up with the entire process we will have the queue and the queue will contain

09:28:27

the list of nodes as they would be visited in a binary in a breadth first search okay so we can

09:28:34

simply return the queue here so let's try it out so we have graph one and let's call bfs

09:28:45

and graph one is this graph so let's grab this image as well

09:28:50

let's simply copy the code for the image and come down here

09:28:56

let's come down here and put the image here okay let's call bfs on graph one starting at the node three

09:29:08

okay of course this should be called graph dot data

09:29:11

so because graph is the graph that we're working with so we need to check graph dot data here
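Putting these steps together, a minimal BFS sketch might look like the following. It assumes a Graph class that stores adjacency lists in graph.data, and an edge list for graph1 reconstructed from what the walkthrough describes (three's neighbors are one, two and four; zero's are one and four); the exact edges in the lecture notebook may differ.

```python
class Graph:
    """Undirected graph stored as adjacency lists (assumed to mirror the lecture's class)."""
    def __init__(self, num_nodes, edges):
        self.num_nodes = num_nodes
        self.data = [[] for _ in range(num_nodes)]
        for n1, n2 in edges:
            self.data[n1].append(n2)
            self.data[n2].append(n1)


def bfs(graph, root):
    queue = []
    discovered = [False] * len(graph.data)  # safe: False is immutable
    discovered[root] = True
    queue.append(root)
    idx = 0  # front of the queue
    while idx < len(queue):
        current = queue[idx]  # dequeue
        idx += 1
        for node in graph.data[current]:
            if not discovered[node]:
                discovered[node] = True
                queue.append(node)  # enqueue
    return queue  # nodes in the order BFS visits them


# Assumed edge list for graph1, reconstructed from the walkthrough
num_nodes = 5
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
graph1 = Graph(num_nodes, edges)
print(bfs(graph1, 3))  # [3, 1, 2, 4, 0]
```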

09:29:20

Okay, so we start out with node three, and you can see that three first causes one, two and four

09:29:26

to get inserted, and then one causes zero to get inserted. Okay, so that's BFS; it's pretty much

09:29:33

done at this point, but what would also be helpful is to keep track of the distance

09:29:39

of each node, right? So we can also keep track of a distance. Let's say we

09:29:45

have a distance list, which we initially set to None for every node.

09:29:56

We will track a distance for each node, so we have the distance here,

09:30:00

and initially we set the distance for the root to zero, of course, because the

09:30:10

root is at zero distance from itself, and distance here means the number of edges, right?

09:30:16

Then, when we discover a node that was not already

09:30:22

discovered, that means the distance for that node is one more than the distance for

09:30:31

the current node which caused it to be discovered, right? So, for example, if you're

09:30:38

starting with three, the distance for one is one more than the distance for three, which caused one to be discovered,

09:30:43

and the distance for zero is going to be one more than the distance for one, which caused zero to be discovered.

09:30:49

So that's the distance. Great, we now also track the distance. One other thing that would be nice to

09:30:55

have is what is called the parent. If you go back here, you can see that it would be nice to

09:31:03

know what led to Karlsruhe being discovered: was it Mannheim, Würzburg or Kassel? That way we can

09:31:09

work our way backwards and find a path from Frankfurt to Karlsruhe. Okay, so for that, what we can

09:31:17

do is keep track of a list called parent. Once again, there are no

09:31:24

parents by default, so parent is None, and whenever we find a node that was not already discovered,

09:31:35

we can set the parent of that node to the current node, which caused it to be discovered. Okay,

09:31:41

and now we can return the queue, the distance and the parent. Let's see if that works. Okay, so it

09:31:50

seems like now we have this, which is the order in which the nodes are being visited.

09:31:56

You can see that three is the first node to be visited, and if you want to check the

09:32:08

distance of three, you can see that the distance of three is zero. The distance is given

09:32:15

in the order of the original numbering of the nodes, so you can see that

09:32:19

three is at a distance zero from itself, obviously. Then you have one, two and four. Now, if you want to

09:32:24

check the distance of one, just check index number one here, so one is at a distance one. If you

09:32:30

want to check the distance of two, that is at a distance of one as well, you can check here. And then,

09:32:36

if you want to check the distance of four, four is also at a distance of one, right? So all of these,

09:32:40

one, two and four, are at a distance of one from the root node three. And you can also see here

09:32:45

the parent of one. Remember, these are zero, one, two, three, four, the indices of the nodes,

09:32:50

so the parent of one is three, the parent of two is three as well, and the parent of four is three.

09:32:56

Three itself does not have a parent; that's why this is None. And finally, the last node we visit is zero,

09:33:01

and it is at a distance two; you can see the distance here is indeed the highest. And the parent

09:33:09

of zero is one, because one was the first node that caused zero to be visited. It could have

09:33:16

been four too, but in this case, just because of how we implemented it, one was the first node which caused it to

09:33:20

be visited, so one is the parent of zero. So if you now want to find the path from three to zero,

09:33:26

you can look at the parent of zero, which is one, and then you can look at the parent of one,

09:33:32

which is three, and we're done. So we can work backwards from the target: we keep checking the

09:33:37

parent, then the parent of the parent, and that gives us the entire path. So now we have the path,

09:33:43

we have the distance, and we have the order in which these nodes will be visited. So you may get asked about

09:33:49

breadth-first search in all these different variations, but roughly this is what the code looks like.
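Here is one way the extended version could look, returning the visit order together with the distance and parent lists, plus a hypothetical helper path_to that walks the parent links backwards as just described. The graph1 edge list is an assumption reconstructed from the walkthrough.

```python
class Graph:
    """Minimal undirected adjacency-list graph (assumed to mirror the lecture's class)."""
    def __init__(self, num_nodes, edges):
        self.data = [[] for _ in range(num_nodes)]
        for n1, n2 in edges:
            self.data[n1].append(n2)
            self.data[n2].append(n1)


def bfs(graph, root):
    n = len(graph.data)
    queue = []
    discovered = [False] * n
    distance = [None] * n  # number of edges from root
    parent = [None] * n    # node that caused each node to be discovered

    discovered[root] = True
    distance[root] = 0
    queue.append(root)
    idx = 0
    while idx < len(queue):
        current = queue[idx]
        idx += 1
        for node in graph.data[current]:
            if not discovered[node]:
                discovered[node] = True
                distance[node] = distance[current] + 1
                parent[node] = current
                queue.append(node)
    return queue, distance, parent


def path_to(parent, target):
    """Hypothetical helper: follow parent links back from target (assumes target was reached)."""
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]  # reverse so the path starts at the root


edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]  # assumed graph1
graph1 = Graph(5, edges)
queue, distance, parent = bfs(graph1, 3)
print(queue)               # [3, 1, 2, 4, 0]
print(distance)            # [2, 1, 1, 0, 1]
print(parent)              # [1, 3, 3, None, 3]
print(path_to(parent, 0))  # [3, 1, 0]
```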

09:33:54

And you can see here that the code is not too long. Now, we have created all these additional

09:33:59

lists, but you don't really need them all, so the code is about

09:34:05

12 to 15 lines, not more than that. So that's BFS. Again, if you're working on a BFS problem,

09:34:12

it always helps to first state it in simple words and work it out with an example, and then start

09:34:18

coding, so that you do not make mistakes while coding. Now, one question that you can work on is to

09:34:31

check if all the nodes in a graph are connected. This may not always be the case. For example, here

09:34:36

you can see that all the nodes in the graph are connected, but sometimes you may have a situation where

09:34:40

some nodes are not connected. For instance, if the edges one-two and three-two were not present,

09:34:46

then two would not be connected to zero, and maybe two is connected to five and six, etc. So here is a

09:34:51

graph where not all the nodes are connected to each other. You can see that there are nine nodes, but

09:34:57

there are only eight edges, and if you look carefully you will see that zero, one, two and three

09:35:04

are connected, but there is no connection from these nodes to four. Four, five and six are

09:35:10

connected separately, and then seven and eight are connected to each other but not to the rest, right?

09:35:15

So can you use breadth-first search to determine if all the nodes in a graph are connected?

09:35:21

I would reckon yes. Look at this queue: this queue gives you all the nodes

09:35:29

that are connected to the source node by zero, one, two, three or more steps.

09:35:36

If something is not connected, it will not show up in the queue, so you can simply check the length

09:35:40

of the queue, see if it is less than the total number of nodes, and use that to

09:35:47

determine whether all the nodes are connected or not. Now, another related question that you may get

09:35:52

asked is to find the number of connected components in the graph. So what's a connected component?

09:35:58

If you take a set of nodes that's connected, that's one component. If you remove that, then you

09:36:03

look at the next set of nodes that's connected; that's the second component. If you remove that, then you

09:36:07

take the next set of nodes that are connected, and that gives you the

09:36:14

third connected component, and so on. So in this case, for example, this is one connected

09:36:19

component (you can check by drawing the graph), and then this would be another connected component,

09:36:24

and then these would form another connected component. So zero, one, two, three would be one connected

09:36:29

component, four, five, six would be another, and seven, eight would be another. Can you find the number

09:36:34

of connected components, or even list all the connected components of a graph, using

09:36:38

BFS? Yes, you can. Again, a very simple way to do it is to just pick the first node and perform BFS from

09:36:46

the first node. That gives you the connected component that contains the first node. Then find

09:36:51

the first node which is not yet visited and start BFS from that node.

09:36:56

That will give you the second connected component, and then keep

09:37:01

repeating this till all the nodes have been visited. Okay, that's another question that

09:37:07

you might get: find the number of connected components, or list all the connected components in a graph.
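Both ideas can be sketched with a single BFS helper that shares one discovered list across calls. The nine-node edge list below is an assumption chosen to match the description (zero to three, four to six, and seven and eight form separate groups); the helper names build_adjacency, bfs_order, is_connected and connected_components are hypothetical.

```python
def build_adjacency(num_nodes, edges):
    """Undirected adjacency lists from an edge list."""
    data = [[] for _ in range(num_nodes)]
    for n1, n2 in edges:
        data[n1].append(n2)
        data[n2].append(n1)
    return data


def bfs_order(data, root, discovered):
    """BFS from root; marks nodes in the shared discovered list and returns the visit order."""
    queue = [root]
    discovered[root] = True
    idx = 0
    while idx < len(queue):
        current = queue[idx]
        idx += 1
        for node in data[current]:
            if not discovered[node]:
                discovered[node] = True
                queue.append(node)
    return queue


def is_connected(data):
    if not data:
        return True
    discovered = [False] * len(data)
    # Connected iff a single BFS reaches every node
    return len(bfs_order(data, 0, discovered)) == len(data)


def connected_components(data):
    discovered = [False] * len(data)
    components = []
    for start in range(len(data)):
        if not discovered[start]:  # first node not yet visited starts a new component
            components.append(bfs_order(data, start, discovered))
    return components


# Assumed example: 9 nodes, 8 edges
edges3 = [(0, 1), (0, 3), (1, 2), (2, 3), (4, 5), (4, 6), (5, 6), (7, 8)]
data3 = build_adjacency(9, edges3)
print(is_connected(data3))          # False
print(connected_components(data3))  # [[0, 1, 3, 2], [4, 5, 6], [7, 8]]
```

The number of connected components is then just len(connected_components(data3)).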

09:37:12

So BFS is a very versatile algorithm that can be applied to solve pretty much

09:37:19

most graph problems that you may get asked in an interview, so do work on a few BFS problems

09:37:26

and get some practice with it. Okay, now another way to look through a

09:37:33

graph is what is called DFS, and this is the way in which you would normally explore a maze:

09:37:38

you start out in one direction and then keep going. So, for example, we started out here and

09:37:43

then we kept going till we hit an end, right? So you can see here that we kept going until we hit

09:37:48

an end, and then we turned back and tried the next path, and then we turned back and tried the next

09:37:52

path, and so on. So we go like this, then we turn back and try 5, go like this, turn back, try 8, then we

09:37:57

turn back and try 9, 10. Okay, that's another way to go about it. In some cases BFS

09:38:06

makes more sense, in some cases DFS makes more sense, but for most problems both of them work just

09:38:12

fine, so you can implement either one when you are faced with a graph problem.

09:38:19

So let's implement DFS, or depth-first search. Okay, now here is depth-first search. It's

09:38:25

pretty straightforward: you pick a node, then you pursue that node, and then the

09:38:29

next node, then the next node, and so on. Among the edges you pick one, and once you've

09:38:37

exhausted the path along one edge, you come back and try the next edge, and then you come back and

09:38:40

try the next edge. There are two ways to write it: there is a way to write it recursively,

09:38:45

and there is a way to write it without recursion. I leave it as an exercise for you to

09:38:50

write it recursively; what we will do is write it without recursion.
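For the recursive exercise, here is one possible sketch (not the lecture's code). It assumes graph1's adjacency lists as reconstructed from the walkthrough, passed as a plain list of lists; the name dfs_recursive is hypothetical. It visits neighbors in list order, so its result differs from a stack-based version, which takes the most recently pushed neighbor first.

```python
def dfs_recursive(data, node, discovered=None, result=None):
    """Recursive DFS over adjacency lists; returns nodes in visit order."""
    if discovered is None:
        discovered = [False] * len(data)
    if result is None:
        result = []
    discovered[node] = True
    result.append(node)
    for neighbor in data[node]:
        if not discovered[neighbor]:
            # Go as deep as possible along this edge before trying the next one
            dfs_recursive(data, neighbor, discovered, result)
    return result


# Adjacency lists for the assumed graph1 edges (0,1),(0,4),(1,2),(1,3),(1,4),(2,3),(3,4)
data1 = [[1, 4], [0, 2, 3, 4], [1, 3], [1, 2, 4], [0, 1, 3]]
print(dfs_recursive(data1, 3))  # [3, 1, 0, 4, 2]
```

For very deep graphs, Python's recursion limit (see sys.getrecursionlimit()) can be hit, which is one reason to prefer an explicit stack.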

09:38:56

To do that, we will use something called a stack, and a stack is another data structure,

09:39:03

a very simple list-like data structure. It's like a queue, but it's different: instead of

09:39:10

being first in, first out, which is what we do in a queue, in a stack we perform last in, first out.

09:39:18

So here's how it works: you start with an empty stack. You can think of it like this container

09:39:23

or a cookie jar, and you start putting things into that jar: you put in one, you put in two,

09:39:28

and you put in three. Now, when you want to remove an element from the stack or access

09:39:33

an element from the stack, the only element that you can access is the element that was inserted

09:39:38

most recently. So last in, first out: that's a stack. How is that going to be useful? It's pretty

09:39:47

straightforward if you think about it, because when you start from this node as the source,

09:39:53

you will add all three of its neighbors into the stack. Let's say you add these three

09:39:58

in this order: you start with this node, then you add this, this and

09:40:02

this, so you add these three into the stack. Then the last value in was two. So what you do

09:40:09

is take two out and insert everything that two is connected to into the stack, so you insert

09:40:15

three into the stack. Then the last value in was three, so you take out three

09:40:20

and insert four into the stack. Then the last value in is four, so you take out four,

09:40:25

and you have nothing left to insert, so this entire path has been exhausted. Then you end up with

09:40:31

five. When you end up with five, you can insert its neighbors eight and six into the stack,

09:40:37

and once six gets inserted into the stack, you take out six and put seven into the stack,

09:40:43

and so on, right? So you can see how depth-first search works using a stack, and roughly this is

09:40:49

what the procedure looks like: you start with an empty stack and push the current

09:40:58

source, let's say the root node which you are starting with; you push the root into the stack.

09:41:03

Now, while the stack is not empty, pop the stack, so get the last-in value from the stack,

09:41:09

and that gets removed as soon as we call pop. Then, if that node is not already discovered,

09:41:18

we mark it as discovered, and then for all the edges from v to w, so for all of its neighbors,

09:41:26

we simply push them into the stack. Right, so that's all we're doing: all of its

09:41:31

neighbors which are not already visited, we simply push into the stack. Okay, so let's do that,

09:41:38

let's implement DFS, and once again we will keep this picture in mind, so let me just grab this picture

09:41:46

here as well. This is one of the nice things about Jupyter: you can take these images

09:41:57

and simply include them within your Jupyter notebook while coding, so that you don't make any

09:42:02

mistakes. So we're writing def dfs, and once again let's assume that we are going to start from three,

09:42:09

and this picture is graph1. So let's say we are starting from three: def dfs(graph,

09:42:19

root), where root is the node that we want to start with. The first thing we want to do is

09:42:24

create a stack. You can use a list as a stack: you can simply add things to the end

09:42:32

and then pop them from the end. So we create the stack, and then we define discovered and mark

09:42:38

discovered as False for every node. Then we say stack.append, so we simply add

09:42:53

the number three, or the root, to the end: stack.append(root). And

09:42:59

we don't mark it as discovered yet. Now, this is the interesting thing in DFS, because,

09:43:04

remember, when you start out with three, you don't want to mark four, one and two all

09:43:09

as discovered. You want to put them into the stack, but only when they come out do you want to

09:43:14

mark them as discovered, because you want to discover four and then discover zero before you

09:43:18

discover one. That's why we put these into the stack but don't mark them as discovered

09:43:24

just yet, and that's why we're not marking the root as discovered. Then, while len(stack) > 0,

09:43:35

we get the current value, so current = stack.pop(). Interestingly,

09:43:43

Python lists do support a pop operation. If you have a list and you do L1.pop(),

09:43:52

you can see that the value that you get from L1.pop() is the value two,

09:44:01

and L1 now has the value [5, 6]. Okay, so you can use a

09:44:06

Python list like a stack. In fact, we can even try append here to see the entire process. Let's say we

09:44:11

append three and then pop: we get back three, and [5, 6, 2] remains.
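A quick check of list-as-stack behaviour, using the same values as the notebook cell described here: append pushes onto the end, and pop removes and returns the last element.

```python
# pop removes and returns the most recently added (last) element
L1 = [5, 6, 2]
top = L1.pop()
print(top, L1)  # 2 [5, 6]

# push then pop: last in, first out
L2 = [5, 6, 2]
L2.append(3)
last = L2.pop()
print(last, L2)  # 3 [5, 6, 2]
```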

09:44:16

So we pop the current node and then we mark it as discovered here:

09:44:26

discovered[current] = True. We may also want to store the result

09:44:33

that we have, so we may also want to create a result list where, every time we pop something,

09:44:39

we also add it to the result list, so result.append(current),

09:44:48

and then we finally return the result. Okay, but here's the main logic: for

09:44:56

all the nodes in graph.data[current], we simply push those nodes into the stack,

09:45:06

so we say stack.append(node). Okay, so what we do is: we start with three,

09:45:15

we pop three and add it to the result, and then we put one, two and four into the stack.

09:45:23

We don't mark them as discovered yet. Then we pop one and put all of zero, two, three,

09:45:32

four into the stack. We don't mark them as discovered yet; we mark one as discovered now.

09:45:38

Then we pop, sorry, we pop four, not one, because we inserted one, two, four,

09:45:43

so four is the last inserted value. So we pop four, mark it as discovered, and then we

09:45:49

insert zero, one and three. Now you can see that there is some repetition here: we're also inserting

09:45:53

three once again. So to avoid that, what we can do is say if not discovered[node],

09:46:03

only then add it to the stack, right? There's no point in adding something to a stack

09:46:08

if it is already discovered. So now, with that in mind, let's see: we start with three and then we insert

09:46:14

one, two and four. Great. Three is discovered now, and four is the last

09:46:20

value inserted, so we pop four and then we insert zero and one, but we don't insert three because it's already discovered.

09:46:28

So now one is the last value inserted. Then we pop one and try to insert some of these other

09:46:33

values. It seems like everything is already inserted, so nothing will get inserted. Then the only thing

09:46:39

that remains is zero, so we pop zero, and once we have popped zero, we are going to pop four.

09:46:45

So the order in which we expect to see things is three, four, one, zero, two, I believe. Let's see:

09:47:01

dfs(graph1, 3), starting at node three. Okay, so it looks like

09:47:12

we made a mistake, because we got some repeated values here, and that's because we also need to check

09:47:19

if not discovered[current]. We add this check and put everything inside it,

09:47:28

so that any older values that were inserted into the stack, but were already visited later

09:47:35

through another value in the stack, get ignored. So we end up with three, four,

09:47:41

one, two, zero, right? So it goes like this: first we go from three to four to one

09:47:49

to two, and then we go from three to four to zero. So that's how it goes.
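Collected into one function, the stack-based DFS with both discovered checks might look like this. It is a sketch assuming graph1's edge list as reconstructed from the walkthrough (the notebook's exact edges may differ); with it, the result matches the order three, four, one, two, zero.

```python
class Graph:
    """Minimal undirected adjacency-list graph (assumed to mirror the lecture's class)."""
    def __init__(self, num_nodes, edges):
        self.data = [[] for _ in range(num_nodes)]
        for n1, n2 in edges:
            self.data[n1].append(n2)
            self.data[n2].append(n1)


def dfs(graph, root):
    stack = []
    discovered = [False] * len(graph.data)
    result = []
    stack.append(root)  # not marked yet: nodes are marked only when popped
    while len(stack) > 0:
        current = stack.pop()  # last in, first out
        if not discovered[current]:  # skip stale copies pushed earlier
            discovered[current] = True
            result.append(current)
            for node in graph.data[current]:
                if not discovered[node]:
                    stack.append(node)
    return result


edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]  # assumed graph1
graph1 = Graph(5, edges)
print(dfs(graph1, 3))  # [3, 4, 1, 2, 0]
```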

09:47:58

Now, a challenge for you is to also implement distance. In this case, the distance will not really make sense, because this is

09:48:03

not the shortest distance anymore. When you want the shortest distance from one node to another,

09:48:08

you want to use BFS, not DFS, because if you track distance here, you may end up going, via DFS,

09:48:14

from three to four to one to two, and that is going to give you a distance of three

09:48:22

for getting to two, although the shortest distance is one. So distance doesn't make sense here, but

09:48:26

what you may want to put in is the parent: you may want to track the parent for each node.

09:48:31

That should be simple enough to do: whenever you pop something, you can track its parent.
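One possible answer to the parent exercise, offered as a sketch: push (node, parent) pairs onto the stack and record the parent only when a node is popped for the first time. The function name dfs_with_parent and the graph1 adjacency lists (reconstructed from the walkthrough) are assumptions.

```python
def dfs_with_parent(data, root):
    """Stack-based DFS over adjacency lists; returns visit order and DFS-tree parents."""
    stack = [(root, None)]  # (node, node it was pushed from)
    discovered = [False] * len(data)
    parent = [None] * len(data)
    order = []
    while stack:
        current, par = stack.pop()
        if not discovered[current]:
            discovered[current] = True
            parent[current] = par  # recorded at pop time, not push time
            order.append(current)
            for node in data[current]:
                if not discovered[node]:
                    stack.append((node, current))
    return order, parent


# Adjacency lists for the assumed graph1
data1 = [[1, 4], [0, 2, 3, 4], [1, 3], [1, 2, 4], [0, 1, 3]]
order, parent = dfs_with_parent(data1, 3)
print(order)   # [3, 4, 1, 2, 0]
print(parent)  # [1, 4, 1, None, 3]
```

Following these parents from two gives two, one, four, three: the three-edge DFS route, not the one-edge shortest path that BFS would find.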

09:48:37

Okay, that's an exercise for you. Another exercise that you can try is to write a function

09:48:47

to detect a cycle in a graph. When you're performing DFS, let's say you are

09:48:52

performing DFS starting at one, and you go like this, and then you end up here,

09:48:59

back at one, right? Because you go from one to two to zero, and when you notice that zero

09:49:05

points to one, which is already visited, that gives you an indication that there is a cycle in the

09:49:10

graph. A cycle is simply a path which leads from a node back to itself, so one, two, zero, one is a path, and

09:49:15

a path is a sequence of edges: one-two is an edge, two-zero is an edge, and

09:49:21

zero-one is an edge, so this is a path. But while one-two is an edge, two-four is not, so one, two, four

09:49:26

is not a valid path, right? So a cycle is simply a path that leads from a node back to itself.

09:49:33

So the challenge for you is to write a function to detect a cycle in a graph. Another challenge for you is to

09:49:41

detect the number of cycles in a graph. Okay, so that's another thing that you can try out.
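A sketch for the cycle-detection exercise, assuming a simple undirected graph (no parallel edges) stored as adjacency lists. Unlike the lecture's DFS, which marks nodes when they are popped, this stack-based traversal marks nodes when they are pushed and remembers each node's parent; an already-discovered neighbor that is not the node we came from means there is a cycle. The function name has_cycle and the test graphs are hypothetical; counting all cycles is a harder problem and is not attempted here.

```python
def has_cycle(data):
    """Detect a cycle in a simple undirected graph given as adjacency lists."""
    n = len(data)
    discovered = [False] * n
    for start in range(n):  # cover every connected component
        if discovered[start]:
            continue
        discovered[start] = True
        stack = [(start, None)]  # (node, parent in the traversal tree)
        while stack:
            current, parent = stack.pop()
            for node in data[current]:
                if not discovered[node]:
                    discovered[node] = True
                    stack.append((node, current))
                elif node != parent:
                    # Reached an already-discovered node by a second route: cycle
                    return True
    return False


# Assumed graph1 adjacency lists (contains e.g. the cycle 0-1-4-0)
data1 = [[1, 4], [0, 2, 3, 4], [1, 3], [1, 2, 4], [0, 1, 3]]
tree = [[1], [0, 2, 3], [1], [1]]  # edges 0-1, 1-2, 1-3: no cycle
print(has_cycle(data1))  # True
print(has_cycle(tree))   # False
```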

09:49:48

But we'll move on to another problem now. We'll talk about weighted graphs and get closer to the

09:49:55

example of the railway map that we looked at initially. So here you have nodes

09:50:02

numbered from zero to eight, so you have a total of nine nodes, and you have edges too. Now, these edges

09:50:11

also have weights, and these could be distances, for example along the railway line, or they could represent

09:50:19

any other information which is of value to you, right? You decide what the edge weights mean;

09:50:24

in the abstract representation we simply call them weights. So this is a weighted graph, and here is an

09:50:29

example of how we can convey the information about a weighted graph: I can give you the number of nodes,

09:50:34

and then I can give you a list of edges. The first two elements of each edge tell you which

09:50:38

nodes are connected, like nodes zero and one are connected here, and then the last element,

09:50:45

the third element of each edge, tells you the weight associated

09:50:51

with the edge. Okay, so you have (0, 1, 3) and then you have (0, 3, 2), so zero is connected to

09:50:58

three and it has a weight of two, and so on, and you can verify that there are 10 edges here, and these are

09:51:04

10 edges with 10 weights. So that's one variation that we see in graphs. Here is another variation:

09:51:11

this is called a directed graph. In this case, edges have a certain direction, so this corresponds

09:51:18

to the example of hyperlinks, where we have web pages on the internet and one page can

09:51:22

link to another, but the other page may not necessarily link back. It may, in which case you

09:51:27

would have a bidirectional edge, but in most cases you would have a single unidirectional edge.

09:51:32

So you have zero-one, one-two and two-three. Now, directed graphs can be represented in just the same way

09:51:39

as undirected graphs; all we need to do is provide some information that this is a directed

09:51:44

graph, right? So you can simply say directed = True, and once you provide

09:51:51

all this information, that specifies to the person who is going through this data

09:51:56

that this is a directed graph, right? So it's exactly the same as a normal undirected

09:52:02

graph, but when we create the adjacency list, if we have an edge from zero to one,

09:52:10

we should not put zero into the adjacency list for one, because there's no

09:52:15

direct edge from one to zero; there's only a direct edge from zero to one.
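To make the directed case concrete, here is a small sketch that builds both an adjacency list and an adjacency matrix for the directed edges zero-one, one-two and two-three mentioned above, storing only the forward direction. The four-node size and the variable names are assumptions for illustration.

```python
num_nodes = 4
edges = [(0, 1), (1, 2), (2, 3)]  # directed: src -> dst

adj_list = [[] for _ in range(num_nodes)]
adj_matrix = [[0] * num_nodes for _ in range(num_nodes)]
for src, dst in edges:
    adj_list[src].append(dst)  # only the forward direction
    adj_matrix[src][dst] = 1   # adj_matrix[dst][src] stays 0

print(adj_list)    # [[1], [2], [3], []]
print(adj_matrix)  # [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
```

For an undirected graph you would also append src to adj_list[dst] and set adj_matrix[dst][src] = 1.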

09:52:20

So keep that in mind. Similarly, in the adjacency matrix, you will not set both of the entries

09:52:24

(0, 1) and (1, 0) to one; you will only set the one corresponding to that direction,

09:52:30

unless of course there's a bidirectional edge. Okay, and what we can do is even combine

09:52:35

directed graphs and weighted graphs. So here's what we'll do: we will define a class which can

09:52:41

represent weighted and directed graphs in Python, so we'll use it to represent undirected graphs,

09:52:47

directed graphs and weighted graphs, all of these, and we will take some information in the

09:52:53

constructor to capture this detail. So let's create a class Graph. Once again, we will

09:53:00

create a constructor. Now, this has self, which is the object that gets created and is always the first

09:53:08

argument to any method in a class in Python. Then we take num_nodes, then we take the edges,

09:53:17

and then we take a couple more arguments: we take an argument directed, which has a default value

09:53:22

False, and we take the argument weighted, which has a default value False. Okay, and we're going to

09:53:28

store this information, so let's store self.num_nodes as num_nodes,

09:53:35

self.directed as directed, and self.weighted as weighted. Okay, so now we come to the edges.

09:53:50

So for edge in edges, what do we do? An edge can have either two values or three values:

09:53:59

if it is unweighted, then it'll have two values; if it is weighted, then it'll have three

09:54:04

values. So we need an if condition here: if self.weighted, then include weights, else

09:54:16

work without weights. Okay, now, because we need to create an adjacency list, we

09:54:25

will create self.data, just as we have been doing so far, and in self.data we will create a

09:54:34

list of empty lists, as we have done before, one for each _ in range(num_nodes). Now, what we'll do

09:54:46

along with self.data is also create something called self.weight, and self.weight

09:54:52

will store, for each corresponding value in the adjacency list, the weight of the edge

09:54:58

between the two nodes. You'll see how it works in just a moment.

09:55:04

Okay, so we have self.data and self.weight, and this will make it easier.

09:55:11

Another way you can do it is, instead of storing single values, you can store tuples

09:55:15

inside self.data, which will contain the node and also the weight, right?
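The tuple-based alternative could look like this for an undirected weighted graph: each adjacency entry is a (neighbor, weight) pair, so no separate weight list is needed. The first two edges use the weights mentioned in the lecture; the third edge and the node count are made up for illustration.

```python
num_nodes = 4
edges = [(0, 1, 3), (0, 3, 2), (1, 2, 5)]  # (node1, node2, weight); third edge is illustrative

data = [[] for _ in range(num_nodes)]
for node1, node2, weight in edges:
    data[node1].append((node2, weight))
    data[node2].append((node1, weight))  # undirected: store both directions

print(data[0])  # [(1, 3), (3, 2)]
print(data[1])  # [(0, 3), (2, 5)]
```

The trade-off is mostly taste: tuples keep each neighbor and its weight together, while parallel data and weight lists leave data identical in shape to the unweighted case.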

09:55:20

So both of these are ways to do it; I'm just doing it this way, but you can do it the

09:55:26

other way as well, where you store tuples directly inside self.data. So suppose it is weighted:

09:55:35

first we get the values out of the edge, so node1, node2 and weight from the edge.

09:55:43

Remember, the edge is a tuple. Then first we take self.data[node1]

09:55:51

and append node2 to it, and then we also take self.weight[node1], so at the exact same location,

09:56:02

the exact same index where we have node2, we store the weight of the edge

09:56:08

between node1 and node2, which is weight. Okay, so now we've stored one direction, which is

09:56:14

node1 to node2. We may also need to store the second direction: only if the

09:56:21

graph is not directed do we need to store the second direction. So we just say

09:56:25

self.data[node2].append(node1), and then self.data[node2].append(weight). Okay, and that's the

09:56:38

case when it is weighted. If it is not weighted, the code is actually simpler: we simply get

09:56:46

node1 and node2 from the edge and say self.data[node1].append(node2), and then,

09:57:00

since there's no weight here, we simply check if the graph is not directed, and if so,

09:57:05

self.data[node2].append(node1). Okay, so there's a bit of code here, but the code is

09:57:16

again fairly straightforward; it's just a couple of things that we have to take care of: whether it's

09:57:20

weighted or not and whether it's directed or not. Now that we've done this, we have a fairly generic

09:57:25

representation for a graph, right? So now we can take this class, and remember graph1:

09:57:31

graph1 had this information, so similarly we can use this

09:57:37

Graph class to represent graph1, but we can also use it to represent one of these, which is a

09:57:41

graph with weights, a graph with directed edges, or a graph with both

09:57:47

weights and directed edges, which we'll see in just a moment. Now, one thing that we'll also do here

09:57:52

is create a nice representation, so let's create a representation here. I'm not going to

09:58:00

get into the code of this, but roughly what we want is, while showing the graph, if there is a

09:58:05

weight we also want to show the weight alongside the other node. So let's see:

09:58:15

we create a result, which starts as the empty string, and then we'll return that result.

09:58:22

Then we are going to say for i, (nodes, weights) in enumerate(zip(

09:58:37

self.data, self.weight)). Now, this is an exercise for you: figure out what exactly

09:58:44

this is doing. You can apply the exact same technique: create a new cell,

09:58:51

put this data into a cell, put the zip into a cell, and then see what that represents.

09:58:56

If it doesn't show something, then try converting it into a list or using it

09:59:00

in a for loop, and then put an enumerate around it and see what that represents, so that you understand

09:59:05

what i, nodes and weights represent. But I'm simply going to write it here so that you see the final result.
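Following the suggestion to explore zip and enumerate in a separate cell, here is what they produce on small, made-up data and weight lists (hypothetical values): zip pairs each node's neighbor list with its weight list, and enumerate adds the node index i.

```python
data = [[1, 2], [0], [0]]    # hypothetical adjacency lists
weight = [[3, 2], [3], [2]]  # weights at matching positions

# zip is lazy, so wrap it in list() to see its contents
print(list(zip(data, weight)))
# [([1, 2], [3, 2]), ([0], [3]), ([0], [2])]

rows = []
for i, (nodes, weights) in enumerate(zip(data, weight)):
    # i: node index, nodes: its neighbors, weights: the matching edge weights
    rows.append((i, list(zip(nodes, weights))))
print(rows)
# [(0, [(1, 3), (2, 2)]), (1, [(0, 3)]), (2, [(0, 2)])]
```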

09:59:21

okay so let's take norm nodes one once again and edges one it was called norm nodes and edges

09:59:36

so this was the initial data data that we were working with let's create graph one

09:59:52

and of course we want to do this only if it is weighted so if self dot weighted

10:00:00

if it is not weighted then we have a different case where for i comma nodes in

10:00:08

a numerate self dot data result plus equals

10:00:26

okay let's see so graph one we are going to use the graph and we're going to pass

10:00:32

norm nodes edges and by default weighted and directed are both false so we don't need to specify

10:00:37

them and let's see graph one this should be norm nodes so you can see with life coding we always

10:00:50

make mistakes and it's almost always bound to happen that's where Jupiter notebooks are very

10:00:54

helpful and it's always helpful to just test your function while you're writing it okay so

10:01:00

now we've created graph one and graph one you can see is a undirected graph you can see that zero

10:01:04

points to one and one points to zero then let's look at graph two so we're going to grab this

10:01:12

data this contains let's call this norm nodes two and edges two this is a graph with weights

10:01:19

So now let's create graph2 using Graph, and here we pass in num_nodes2, edges2,

10:01:35

and weighted=True. And let's see graph2. Okay, there's a small change here.

10:01:44

Yeah, so now you can see graph2. This was the graph we were looking at here. Let's

10:01:53

grab this image as well. Yeah, this is the graph that we were looking at, and you can see that 0

10:02:06

is connected to 1, 3, and 8, and

10:02:12

there are also weights associated: the edge 0-1 has the weight 3, the edge 0-3 has the weight 2,

10:02:17

and the edge 0-8 has the weight 4, and so on. But there seems to be something off here, because

10:02:28

1 only seems to be connected to 0. I think we may have made a mistake somewhere in the code.

10:02:42

Okay, so we may just have to debug this code. It seems like we may have made a

10:02:48

small mistake somewhere, because 1 seems to be connected only to 0, but 1 should

10:02:54

also be connected to 7. I don't see why that did not show up here.

10:03:04

This is the curse of live coding, and that's why I have some working

10:03:12

code here. So I'm simply going to grab the working code right now, and we'll just replace that,

10:03:22

but see if you can detect the bug in the code. Okay, the version I have does not require

10:03:28

you to specify weighted, so we can simply skip weighted here; it detects automatically if the

10:03:33

graph is weighted. Still something wrong here. Let's just quickly verify what's going wrong.

10:03:39

So we are going through the list of edges here, and we are appending. Maybe let's just print

10:03:55

graph2.data. It could be that the issue is in the representation and not in the

10:04:04

code. Then graph2.edges. Ah, there seems to be some issue in the weight here, so we may not have inserted

10:04:16

the weights correctly. I see, so this should be called weight, and this should be called weight,

10:04:33

and so should this. Oh, and there was a syntax error here.

10:05:03

Okay, I think we fixed it finally. Let's

10:05:33

see. This should be called weight. So we have an edge here, and we get "too many values to unpack".

10:05:53

Ah, we simply pass weighted=True after all, and we need to make this a list.

10:06:07

It's finally done. Some good hardcore live debugging, but we have this finally. And again, you get to see

10:06:16

that when you're coding, you will fail and you will run into issues. But if you have a

10:06:20

clear idea of how you've written the code, it's easier to narrow down the issues by looking at the

10:06:24

errors. But let's see this graph here. So we have 0 connected to 1, 3, and 8, and you can see that

10:06:31

here: 0 is connected to 1, 3, and 8 with the weights 3, 2, and 4. Then we have 3 connected to 0,

10:06:39

2, and 4, so we have 3 connected to 0, 2, and 4. And we have 6 connected to 5; you can see 6

10:06:45

connected to 5 with the value 8. So great, we have now represented the graph properly, and this is why

10:06:51

a representation is really useful: now we can check if our implementation is correct before

10:06:56

we go on and implement any graph algorithms. Let's try

10:07:01

one more. Let us also try this directed graph. So we're going to grab this code and put it here. Let's

10:07:08

call this num_nodes3, edges3, and directed3. Let me grab this graph here as well.

10:07:25

We're working with this graph, and let's create graph3. So for graph3 we have Graph, and we

10:07:35

pass in num_nodes3, we pass in edges3 (you can verify that the edges are set up correctly), and we just

10:07:45

specify directed=True. So we don't really need this variable at this point; we can just pass directed=True

10:07:49

directly, and weighted is False by default. So we have graph3 here. You can see that 0

10:07:56

is connected to 1, and 1 is connected to 2 but not to 0, so now we haven't inserted the opposite edge.

10:08:02

Then 2 is connected to 3 and 4, 3 is connected to 0, and 4 is connected to 2. Great.

10:08:08

So we've now set up another graph, and here, similarly, you can check that if

10:08:14

you have a weighted directed graph, the code is still going to work fine. Okay, so that's an exercise

10:08:20

for you. And at this point, let us just save our notebook using jovian.commit.
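For reference, here is a minimal sketch of the kind of graph class the debugging above converges on: adjacency lists in `data`, matching edge weights in `weight`, `weighted` and `directed` flags, and an `enumerate(zip(...))`-based `__repr__`. The exact constructor signature and the `_add` helper are assumptions, not the lesson's code. The directed edge list is the one read out above for graph3; the small weighted example is hypothetical.

```python
class Graph:
    """Adjacency-list graph (a sketch, not the lesson's exact class)."""

    def __init__(self, num_nodes, edges, weighted=False, directed=False):
        self.num_nodes = num_nodes
        self.weighted = weighted
        self.directed = directed
        # data[i] lists the neighbors of node i; weight[i] the matching edge weights.
        self.data = [[] for _ in range(num_nodes)]
        self.weight = [[] for _ in range(num_nodes)]
        for edge in edges:
            if weighted:
                node1, node2, w = edge   # weighted edges are (u, v, weight)
            else:
                node1, node2 = edge      # unweighted edges are (u, v)
                w = None
            self._add(node1, node2, w)
            if not directed:
                # Undirected graph: store the opposite edge as well.
                self._add(node2, node1, w)

    def _add(self, u, v, w):
        """Hypothetical helper: append edge u -> v (and its weight, if any)."""
        self.data[u].append(v)
        if self.weighted:
            self.weight[u].append(w)

    def __repr__(self):
        result = ""
        if self.weighted:
            for i, (nodes, weights) in enumerate(zip(self.data, self.weight)):
                result += "{}: {}\n".format(i, list(zip(nodes, weights)))
        else:
            for i, nodes in enumerate(self.data):
                result += "{}: {}\n".format(i, nodes)
        return result


# Directed, unweighted: the edges described for graph3.
graph3 = Graph(5, [(0, 1), (1, 2), (2, 3), (2, 4), (3, 0), (4, 2)], directed=True)
print(graph3)
# 0: [1]
# 1: [2]
# 2: [3, 4]
# 3: [0]
# 4: [2]

# Undirected, weighted: a hypothetical three-node example.
graph_w = Graph(3, [(0, 1, 4), (0, 2, 2), (1, 2, 5)], weighted=True)
print(graph_w)
# 0: [(1, 4), (2, 2)]
# 1: [(0, 4), (2, 5)]
# 2: [(0, 2), (1, 5)]
```

Keeping neighbors and weights in two parallel lists mirrors the `graph.data` / `graph.weight` access used later in the lesson; a list of `(neighbor, weight)` tuples per node would work just as well.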

10:08:27

So the next question that we're going to look at is called the shortest path question,

10:08:33

and this is really what we started out with. Let's say you have a bunch of nodes, and

10:08:39

we have taken a directed graph here, but you need not have a directed graph; you can do this with

10:08:45

an undirected graph too, and that will be an exercise for you. But you do need weights here. Whenever

10:08:50

you're talking about shortest paths in terms of weights, that is when this algorithm makes sense.

10:08:56

Now, if you do not have weights in the graph, then the shortest path can be found simply by

10:09:00

performing breadth-first search. Okay, so whenever you're asked to find the shortest path, the first

10:09:05

question you should be asking is: is there a weight involved, or are there no weights? If there

10:09:10

are no weights involved, then we are simply concerned with the length of the path, the number of nodes in each

10:09:14

path, and in that case you can simply perform a breadth-first search. But if you

10:09:21

have weights, whether the graph is directed or undirected, then breadth-first search alone may not be enough.
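For the unweighted case, a minimal breadth-first search sketch that returns the fewest number of edges between two nodes is shown below. The adjacency list is an assumption: it is the six-node example graph used later in this lesson with the weights dropped (edges 0->1, 0->2, 1->3, 2->4, 4->3, 3->5), and the function name is hypothetical.

```python
from collections import deque


def bfs_shortest_path_length(adj, source, target):
    """Fewest edges from source to target in an unweighted graph (None if unreachable)."""
    distance = [None] * len(adj)   # None marks "not discovered yet"
    distance[source] = 0
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for neighbor in adj[current]:
            if distance[neighbor] is None:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return None


# Directed example graph, weights ignored.
adj = [[1, 2], [3], [4], [5], [3], []]
print(bfs_shortest_path_length(adj, 0, 3))  # 2, via 0 -> 1 -> 3
print(bfs_shortest_path_length(adj, 0, 5))  # 3, via 0 -> 1 -> 3 -> 5
```

BFS picks 0 -> 1 -> 3 because it has the fewest edges, even though (with the weights discussed next) that route costs 14 while 0 -> 2 -> 4 -> 3 costs only 9. That gap is exactly why a weighted graph needs a different algorithm.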

10:09:27

Here's why. Take going from 0 to 3, for instance.

10:09:34

If you go via 0, 2, 4, and 3, the length of the path is 2 plus 3, 5, plus 4, 9.

10:09:40

So the total length of the path is 9,

10:09:47

but the number of nodes is 4: 0, 2, 4, 3. On the other hand, if you go via 0, 1, 3, the number

10:09:56

of nodes is smaller; there's just one node in between, so 0, 1, 3 is just three nodes total,

10:10:00

but the length of the path is 14, which is far higher. So this could represent going to a far-off

10:10:05

place via a train and then taking a train back to something that was actually closer, even though

10:10:11

there were more stops on a different route. Okay, this is what we're going to implement now. We're

10:10:18

going to implement an algorithm to identify the shortest path from a given node to a given

10:10:26

target. So this time we're going to focus our search between a source node and a target: what is

10:10:33

the shortest path in terms of the total weight of the path, not in terms of the number of nodes

10:10:39

in the path? Keep in mind, we want the shortest path in terms of the total weight that we can find from

10:10:47

a starting node to an end node. Roughly, the strategy goes like this, and the strategy

10:10:53

is called Dijkstra's algorithm. You have the source node,

10:10:58

and the source node is at a distance 0 from itself; there's nothing there really. But

10:11:06

the first and the only thing that we know is that for one of the

10:11:13

neighbors of the source node, the direct edge will be the shortest path. So

10:11:22

for example, we have 1 and we have 2. You have direct edges

10:11:31

from 0 to 2 and from 0 to 1. 0 to 2 has the

10:11:40

weight 2, and 0 to 1 has the weight 4. Now in this case, suppose we had an edge from 2 to 1, and that edge

10:11:48

had the weight 1. Then you could go from 0 to 2 with the weight 2, and then from 2 to 1

10:11:55

with weight 1, and the total weight you would incur to get to 1 would simply be 3, which would be smaller

10:12:01

than the direct edge. So even if you're looking at direct

10:12:08

connections of the root, we can't say that the direct edge is the shortest path, except

10:12:14

for one of the nodes. If we look at the node where the edge weight is the smallest, so

10:12:21

you start at the root and you look at the edge with the smallest weight, then we can say for sure

10:12:29

that the shortest path from the root to that node, node 2, is the direct edge. Why?

10:12:37

Because this direct edge is smaller than or equal to any other direct edge, so any other

10:12:44

path that comes to 2 indirectly will contain another direct edge from the root and then some other

10:12:50

edges, so it will have a length greater than or equal to this direct edge (as long as the weights are non-negative). So that's the

10:12:55

key insight here: at every point, you maintain a group of visited nodes. In this case, initially

10:13:00

just 0 is visited, and then you find the first node which is at the closest distance from

10:13:07

the source, going only through nodes in the visited group. So for example, if we start out at 0 and then we look at

10:13:14

1 and we look at 2, we see the smallest edge goes to 2, so we add 2 into our visited group, because

10:13:22

we know that this is the shortest path from 0 to 2. At this point, we take all of the

10:13:29

neighbors of 2 and update their distances, because we know that 0 to 2

10:13:35

is a direct shortest path. So we can update the distance for 4: 4 could be at a potential distance of

10:13:42

2 plus 3, 5, or there could be a shorter path, so we do not add it yet; we just update 4.

10:13:48

Similarly, if there was an edge to 1, we could update the distance of 1, and we could say that the

10:13:53

distance of 1 is either 4, which was the direct edge, or 2 plus 1 if there was a direct edge

10:14:00

from 2 to 1. So we would get to know that 1 is at a distance of 3, which is smaller. In this

10:14:06

case there is no such edge, but if there were a direct edge from 2 to 1 of weight 1, we would get to know that

10:14:12

1 is at a distance 3. So each time you add a new node, as you mark the node as visited, you

10:14:18

update the distances of all its neighbors, and then you simply find the next

10:14:25

node with the smallest distance. So you will find that the next node with the smallest distance

10:14:29

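in this case is 1, at distance 4 (4 is at distance 5), and visiting 1 gives 3 a tentative distance of 14. Then comes 4; you update the neighbors of 4 (there is only one neighbor), and the next

10:14:34

node with the smallest distance is 3, now at distance 9. You update the distances of its neighbors, and so on. So that was the shortest path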

10:14:40

in a directed graph. But here, let's see a shortest path in an undirected graph, where we have

10:14:44

more such cases. Let's just watch this from the beginning; let's wait for the animation to start again.

10:14:53

So we started at 0, then we checked 2. Okay, we marked 2 as updated, then we checked 9, then we marked 3

10:15:01

as updated, then we updated the distance of 14. But now we can see here that we have another path to go

10:15:07

to 2: we go via 3. That's why we track that, and finally we get 2 and mark 2 as visited. Now we are

10:15:15

considering 3, and using 3 we are updating the distances of all the other nodes,

10:15:21

and then we are marking 3 as visited. Then we are using 3 to mark 6 as visited, and so on. So

10:15:27

at each point you have a group of visited nodes, and you have distances for all the nodes that are

10:15:33

connected to the visited nodes, and then you pick the first unvisited node with the smallest distance. Okay.

10:15:41

Now let's read the algorithm. You first mark all nodes as unvisited, and then you create a set of

10:15:48

all the unvisited nodes and you call it the unvisited set. So a set of all the unvisited nodes is

10:15:53

called the unvisited set. Assign to every node a tentative distance value: set it to 0 for the

10:16:01

initial node, because the initial node is at a distance 0, and set it to infinity for all the other nodes.

10:16:08

We set the distance to infinity because we have not yet visited those nodes; we don't know

10:16:12

their distance. Then you set the initial node as the current node. So there's always a current

10:16:18

node that we're looking at, and we'll start with the initial node. Now, for the current node,

10:16:23

consider all of its unvisited neighbors and calculate their tentative distances through the

10:16:29

current node. So you have the current node, and the current node is connected to a lot of unvisited

10:16:34

nodes. If we look at each unvisited node, we know the distance up to the current node,

10:16:44

because the current node is visited, and using that we can calculate distances for the unvisited

10:16:50

nodes. Now, if an unvisited node has its distance set to infinity, then we know that the distance

10:16:56

for going via the current node is going to be smaller

10:17:02

than the distance infinity that has been set. On the other hand, if a distance has

10:17:08

already been set for an unvisited node through some other node, then we can simply compare whether

10:17:13

it is better to go through the current node, or whether it is better to retain the

10:17:18

distance that was obtained via some other node. So in this way, we simply

10:17:23

update the distances of all the unvisited nodes that are neighbors of the current node. Okay, so for

10:17:31

example, if the current node is A and it is marked with a distance of 6, and there is an edge

10:17:36

connecting it with a neighbor B, and that edge has the weight, or length, 2, then the distance

10:17:42

to go to B through A from the source will be 6 plus 2, 8. So from the source to A is 6,

10:17:50

A to B is 2, so the distance if you want to go to B through A will be 6 plus 2, 8. Now,

10:17:56

if B was previously marked with a distance (so it was not visited, but it was just marked

10:18:01

with a distance) greater than 8, then we know that we found a shorter path via A, so we update

10:18:07

its distance to 8. On the other hand, if we have a value, let's say the value for visiting B

10:18:13

via another node was 7, then we keep the distance as 7. So we are simply updating the distance;

10:18:19

we are not yet marking B as visited. Now, when we are done updating

10:18:27

all the distances for the current node, then we mark the current node as visited, and of course

10:18:36

we remove it from the unvisited set. Once we mark the current node as visited, a visited

10:18:42

node will never be checked again, because once you have visited a node, you have found the shortest

10:18:46

path to it, and you have used it to update the distances of all its neighbors; you never need to

10:18:50

visit it again. Then, find the first unvisited node that is marked

10:19:03

with the smallest distance. So now we have a bunch of visited nodes, and then we have a bunch

10:19:07

of unvisited nodes. Many of those unvisited nodes have been marked with a distance, so you simply

10:19:13

get the first unvisited node with the smallest distance and make it the current node, and

10:19:18

then repeat the process. Okay, so you start out with 0. You see that you can mark

10:19:24

the distances for 1 and 2: 1 gets the distance 4 and 2 gets the distance 2. Then you mark

10:19:31

0 as visited. Now you see that the unvisited node with the least distance

10:19:38

is 2, so you get 2, and then you follow the edges from 2. So you mark the distance for 4 as

10:19:46

2 plus 3, 5. And suppose 2 had an edge to 1; then you would mark the distance for 1 as 2 plus 1, if

10:19:53

1 was the weight of the edge. That is, you mark the distance for 1 as the minimum of 4 and 2 plus 1,

10:19:58

which will be 3. So you can mark the distance for 1 as 3, and that's it. Then you remove

10:20:08

2 from the unvisited set. Next, you find the next unvisited node, the one which has the lowest distance. So

10:20:16

if this edge existed, that would be 1, at distance 3; and even if this edge does not exist, it is still 1, at distance 4, because 4 is at distance 5. So you

10:20:22

get 1 and mark distances for its neighbors, then 4, and so on. Okay, so what we'll do is we will

10:20:33

create this graph here, which contains... okay, there should be a graph here that we can

10:20:45

look at. Yeah, so we'll create this graph here, which contains 6 nodes, 0 to 5.

10:20:55

This is the graph we're creating; let's just put it here. Yeah, so this is the graph that

10:21:13

we'll work with, and let's start writing a shortest path algorithm: def shortest_path,

10:21:20

and we have a graph, then we have a start node, so let's call it source, and then we have

10:21:32

a target node, the node that you want to get to. So we want to go from 0 to 5, and as soon as we have

10:21:38

marked the target node as visited, our algorithm is done. So first we

10:21:45

mark everything as unvisited by setting visited = [False] * len(graph.data). So here we have

10:21:55

marked visited. Then we have distance: we take the distances as infinity. Here's a way to create

10:22:02

infinity in Python: you just say float('inf'). And once again, we set all distances to infinity.
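A quick sketch of this initialization step, assuming a graph with 6 nodes and source 0 (hypothetical values here; in the lesson the length comes from `len(graph.data)`). It also shows why `float('inf')` is a convenient starting distance: comparisons and additions with it behave the way the algorithm needs.

```python
n = 6          # number of nodes (assumed; len(graph.data) in the lesson)
source = 0

visited = [False] * n            # nothing has been visited yet
distance = [float('inf')] * n    # unknown distances start at infinity
distance[source] = 0             # the source is at distance 0 from itself

inf = float('inf')
print(distance)        # [0, inf, inf, inf, inf, inf]
print(inf + 5 == inf)  # True: adding an edge weight to infinity stays infinity
print(3 < inf)         # True: any real distance beats "not reached yet"
```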

10:22:15

Then we are going to maintain a queue, because we have this first-in, first-out kind of

10:22:20

structure. The first thing we'll do is mark

10:22:29

the distance for the source node as 0. Then we can insert the source node into the queue,

10:22:43

so queue.append(source), and then we'll set our index to keep track of

10:22:51

the next element that we need to dequeue. The first element is what we need to dequeue first. So

10:22:56

while idx is less than the length of the queue

10:23:09

and the target is not visited, what do we need to do? We need to get the current element from

10:23:14

the queue, so we simply get queue[idx], and then we increment idx by 1. So we increment

10:23:22

idx by 1 here. Then we need to take all the neighbors of current. Oh, we also

10:23:32

need to finally mark it as visited, so let's just put in visited[current] = True here.

10:23:36

But in between, we need to update the distances of all the neighbors,

10:23:49

and then we also need to find the first unvisited node

10:23:58

with the smallest distance. Okay, so to update the distances of all the neighbors, we have written

10:24:10

a function called update_distances. So we'll call this function update_distances,

10:24:19

where we will pass in the graph, the current node, and the

10:24:24

distance array. We pass it in this way. And what update_distances does,

10:24:32

let's look at it here. And again, it's always a good idea to extract specific pieces

10:24:37

of logic into separate functions. So here we're calling update_distances, where we have a current

10:24:42

node, the graph, and the distance. We get the neighbors of the current

10:24:47

node using graph.data: graph.data[current] will give us the neighbors of the current node.

10:24:53

Then we get the weights of the edges connecting the current node to its

10:24:57

neighbors, so we get the weights as well. Now we go through the list of neighbors: for i,

10:25:03

node in enumerate(neighbors). Then we get the weight, so now we have the node

10:25:13

and we have the weight. So for each edge, we have the node that it is connected to

10:25:22

and the weight of the edge, and then we check the distance for the node. If the distance for the node

10:25:30

hasn't already been set, then it is infinity, so in that case the distance to the current

10:25:35

node from the source plus the weight of the edge from the current node to the next node will be less

10:25:41

than that. So if distance[current] plus weight is less than distance[node], we simply

10:25:46

update the distance of the node. On the other hand, if the distance of the node has already been set

10:25:51

via some other node, and that is less than the distance via the current node, then we do not update

10:25:56

the distance. Okay, so that's all we are doing here, and we can ignore this for now; we'll come back to it,

10:26:02

but this performs exactly the distance update that we talked about.
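The distance update described above can be sketched as follows. To keep the snippet runnable without the graph class, this version takes the two parallel lists (`data` and `weight`, what the lesson reads from `graph.data` and `graph.weight`) directly; that signature is an assumption. The optional `parent` argument anticipates the parent tracking added later in the lesson and is checked with `is not None`, so it does not depend on list truthiness.

```python
def update_distances(data, weight, current, distance, parent=None):
    """Relax every edge leaving `current` (a sketch of update_distances)."""
    neighbors = data[current]
    weights = weight[current]
    for i, node in enumerate(neighbors):
        w = weights[i]
        # Going via `current` beats the best distance known so far for `node`.
        if distance[current] + w < distance[node]:
            distance[node] = distance[current] + w
            if parent is not None:
                parent[node] = current   # remember which node improved it


# The A/B example from the walkthrough: A = node 0 at distance 6, edge A -> B of length 2.
data, weight = [[1], []], [[2], []]

distance = [6, 10]        # B previously marked with 10 (greater than 8)
update_distances(data, weight, 0, distance)
print(distance)           # [6, 8]: a shorter path via A was found

distance = [6, 7]         # B previously reached via another node at 7
update_distances(data, weight, 0, distance)
print(distance)           # [6, 7]: kept, since 8 is not better
```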

10:26:07

Next, we want to find the next unvisited node. So here we have a function called pick_next_node,

10:26:12

which takes the list of distances and visited. We want to track the minimum distance, so

10:26:18

we first set a variable called min_distance to the value infinity, and then we set a variable

10:26:24

min_node, the node with the minimum distance, to the value None. Then we iterate over

10:26:29

all the nodes that we have in the graph, so from 0 to n minus 1,

10:26:37

and we check if the node is not visited and the distance of the node is less than the minimum

10:26:42

distance we've obtained so far. If so, we set that node as the minimum node, and we set the minimum

10:26:47

distance to that value. Okay, so we track the running minimum distance by going

10:26:53

over all the nodes in the graph, and we keep track of which unvisited

10:27:01

node has the minimum distance. So finally, what pick_next_node gives us is the

10:27:06

next unvisited node. Okay, so here we can say next_node equals pick_next_node, and we give it the

10:27:18

distance and visited. Okay, so now, it's possible that there

10:27:31

is no next node, because we've already visited everything that we can visit. So if there

10:27:37

is a next node, then we enqueue it: we say queue.append(next_node). And that's pretty much it.
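The selection step described above can be sketched like this; it scans every node once, keeping a running minimum over the unvisited ones. The distances in the usage example are hypothetical, chosen to match the state right after visiting 0 and 2 in the walkthrough.

```python
def pick_next_node(distance, visited):
    """Return the unvisited node with the smallest tentative distance, or None."""
    min_distance = float('inf')
    min_node = None
    for node in range(len(distance)):
        # Strict < means nodes still at infinity are never picked.
        if not visited[node] and distance[node] < min_distance:
            min_node = node
            min_distance = distance[node]
    return min_node


inf = float('inf')
distance = [0, 4, 2, inf, 5, inf]
visited = [True, False, True, False, False, False]
print(pick_next_node(distance, visited))        # 1 (distance 4 beats node 4's distance 5)
print(pick_next_node([0, inf], [True, False]))  # None: nothing reachable is left
```

Returning `None` when every remaining node is still at infinity is what lets the caller stop enqueuing once the reachable part of the graph is exhausted.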

10:27:52

So that is our shortest path algorithm. We create a visited list, we create a distance list, and we

10:28:00

create a queue where we will add things. This will hold all the nodes that we have

10:28:05

visited; we go through it one by one, and the queue, in order, will give us

10:28:12

a list of all the visited nodes in their order of distance from the source node. Now, what we need to return

10:28:19

here is simply the distance of the target, since that was what was asked here.

10:28:26

Let's also set visited[current] = True early enough here, so that we don't end up visiting current

10:28:38

again and again. All right, so let's run the shortest path algorithm. Here we have a graph;

10:28:47

this is the same graph that we see here. Now we can create a graph, graph7,

10:29:03

and this is weighted and directed. So we will pass in Graph; we will pass num_nodes7,

10:29:10

we will pass edges7, and then we will pass weighted=True and directed=True.

10:29:19

And this is graph7. Okay, this seems like it worked out right: 0 is connected to 1

10:29:30

and 2 with weights 4 and 2 respectively, 5 is connected to nothing,

10:29:37

3 is connected to 5, and 4 is connected to 3. Okay, this looks fine. So now we can say

10:29:45

shortest_path in the graph from, let's say, 0 to 5

10:29:55

in graph7, and it says that the length of the shortest path is 20. So you have 2, 3, 4,

10:30:02

11: 2 plus 3, 5; 5 plus 4, 9; 9 plus 11, 20. So that seems to be right.
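Putting the pieces together, here is a self-contained sketch of the array-based shortest path function, run on the six-node example. To stay runnable on its own it takes a plain adjacency list of `(neighbor, weight)` pairs instead of the lesson's `Graph` object, inlines the update and selection steps, and already returns a parent list. The edge list is reconstructed from the weights mentioned in the walkthrough (0->1: 4, 0->2: 2, 2->4: 3, 4->3: 4, 3->5: 11, and 1->3: 10, implied by the 0, 1, 3 path of length 14); treat it as an assumption. Weights are assumed non-negative.

```python
def shortest_path(adj, source, target):
    """Array-based Dijkstra sketch: returns (distance to target, parent list)."""
    n = len(adj)
    visited = [False] * n
    distance = [float('inf')] * n
    parent = [None] * n
    distance[source] = 0
    queue = [source]   # nodes in the order they are finalized
    idx = 0

    while idx < len(queue) and not visited[target]:
        current = queue[idx]
        idx += 1
        visited[current] = True

        # Update the distances of current's neighbors.
        for node, w in adj[current]:
            if distance[current] + w < distance[node]:
                distance[node] = distance[current] + w
                parent[node] = current

        # Pick the unvisited node with the smallest tentative distance.
        next_node, min_distance = None, float('inf')
        for node in range(n):
            if not visited[node] and distance[node] < min_distance:
                next_node, min_distance = node, distance[node]
        if next_node is not None:
            queue.append(next_node)

    return distance[target], parent


# adj[u] lists (v, weight) pairs for the directed example graph.
adj = [[(1, 4), (2, 2)], [(3, 10)], [(4, 3)], [(5, 11)], [(3, 4)], []]
print(shortest_path(adj, 0, 5))  # (20, [None, 0, 0, 4, 2, 3])
```

The nodes are finalized in the order 0, 2, 1, 4, 3, 5. Node 3 first gets a tentative distance of 14 via 1, which is later improved to 9 via 4.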

10:30:07

What would also be nice is to see what that path is, and for this we can

10:30:13

introduce something called a parent. So here we can simply have another list called parent,

10:30:19

which is set to None for each element. Like visited, let's call this parent, and let's

10:30:28

set it to None by default. And all we need to do is, whenever we are enqueuing a node, we need

10:30:34

to track why it got enqueued. So if a node is getting enqueued, then it is probably

10:30:40

getting enqueued because... actually no, not whenever we are enqueuing: whenever we are updating the distance

10:30:44

of a node, we need to track why its distance got updated. So inside update_distances, whenever we

10:30:51

update the distance of a node, we also set the parent of the node to the current node

10:30:59

from which the distance got updated. And that's all we need to do: when we update the distance

10:31:04

of a node, we need to track via which node we came to update

10:31:10

this distance. So this way we have now tracked the parent, and let's return not just the distance

10:31:15

of the target, but let's also return... let's just return the parent for now.

10:31:24

I think this should be fine. Okay, so now you have the parent for each node. So if you look at

10:31:29

the element at index 5 (0, 1, 2, 3, 4, 5), you can see that the parent of 5 is 3, so it seems like we arrived

10:31:36

at 5 from 3. And then if you look at the parent of 3 (0, 1, 2, 3), the parent of 3 was

10:31:41

4; it seems like we arrived at 3 from 4. And you look at the parent of 4; it seems like

10:31:46

we arrived at 4 from 2. Then you look at the parent of 2, and it looks like we arrived there

10:31:53

from 0, and 0 was our source. So the path, simply going in reverse, is 0, 2, 4, 3, 5. Okay, and that's

10:32:02

how you get the shortest path and not just the shortest path distance. Notice that 0 itself

10:32:07

does not have a parent, because that was the source. Now you can repeat this with another graph.

10:32:13

Let's say we take this other graph that we had; this was graph2. So let's grab this image here.

10:32:25

So let's get graph2. Let's say shortest_path on graph2, and let's get the shortest path,

10:32:44

maybe from 0 to 7. It seems like there are two paths: one goes via 1, and one goes via

10:32:51

3 and 2, then to 7. So let's get the shortest path from 0 to 7.

10:33:00

Okay, so we started out with 0 and we ended up at 7. So, counting 0, 1, 2, 3, 4, 5, 6,

10:33:06

7, it seems like the parent for 7 was 1, and then the parent for 1 was 0.

10:33:13

So it's clear that it picked the path 0, 1, 7, and the total length of the path was 7.

10:33:18

Sounds good. We can try another one: we can try 2 and 8. There are a couple of ways to go

10:33:27

from 2 to 8; there are actually a few ways. One is to go via 3:

10:33:34

you can go via 3, 0, and 8, or you can go via 3, 4, and 8.

10:33:40

Let's see which one it picks. Okay, so now, 0, 1, 2, 3, 4, 5, 6, 7, 8,

10:33:47

so the parent for 8 is 5... oh, sorry: 0, 1, 2, 3, 4, 5, 6, 7, 8,

10:33:55

so the parent for 8 is 4. So we came to 8 via 4, and then the parent for 4 (0,

10:33:59

1, 2, 3, 4): the parent for 4 is 3, so we came to 4 via 3. And then the

10:34:04

parent for 3 (0, 1, 2, 3): the parent for 3 is 2, so we came to 3 via 2.

10:34:10

So 2, 3, 4, 8 is the path, and the length should be 8 plus 1, 9, plus 6, 15.

10:34:16

Great.

10:34:16

It seems like we figured out the shortest path once again.

10:34:18

And this time, this was an undirected graph.
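Reading the path off the parent list, as done by hand above, can be sketched as a small helper. The name `reconstruct_path` is hypothetical, and the helper assumes the target was actually reached (for an unreachable target it just returns `[target]`). The parent list is the one read out for the six-node graph (5 from 3, 3 from 4, 4 from 2, 2 from 0), with node 1's parent filled in as 0 since its distance comes from the direct edge.

```python
def reconstruct_path(parent, target):
    """Follow parent links from target back to the source, then reverse."""
    path = []
    node = target
    while node is not None:   # the source has no parent, which ends the walk
        path.append(node)
        node = parent[node]
    path.reverse()
    return path


parent = [None, 0, 0, 4, 2, 3]
print(reconstruct_path(parent, 5))  # [0, 2, 4, 3, 5]
```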

10:34:20

Okay.

10:34:22

So as long as your weights are non-negative,

10:34:22

you can apply this algorithm.

10:34:24

And this algorithm is called Dijkstra's algorithm.

10:34:28

And that's it.

10:34:30

So that's all we're going to cover today.

10:34:32

Now, one thing that we have not looked at very closely

10:34:34

is the running time complexity.

10:34:36

So, let's take a quick look at that.

10:34:40

Let's take a quick look at, let's say, BFS,

10:34:42

and see if we can identify, or at least guess, the running time complexity,

10:34:46

and the full proof is left to you as an exercise.

10:34:50

But roughly, it looks like this.

10:34:52

This is the main loop here,

10:34:54

the while loop,

10:34:56

where we are going through the queue.

10:34:58

So the number of times this may happen is n,

10:35:00

where n

10:35:02

is the number of nodes.

10:35:06

That's the number of times this loop might run.

10:35:08

Now, inside BFS, for each node,

10:35:10

remember that we check the node's full list of neighbors.

10:35:14

So for the inner loop,

10:35:18

for each node,

10:35:18

we may perform an additional number of steps equal to the number of nodes

10:35:22

it is connected to.

10:35:24

So if we have n nodes,

10:35:26

we have n iterations of the while loop.

10:35:28

And then if we have a total of m edges,

10:35:30

and let's say those m edges are split across the nodes:

10:35:34

if I count the number of edges for each node,

10:35:36

the number of edges is E1, E2, E3, E4, and so on.

10:35:38

Then the size of this inner loop for the node n1 is E1,

10:35:44

the size of this loop for the node n2 is E2,

10:35:46

the size of this loop for node n3 is E3, and so on.

10:35:50

So if you add up all of these, E1 plus E2 plus E3 plus E4 and so on,

10:35:54

the total number of iterations inside this for loop

10:35:56

turns out to be

10:35:58

the total number of entries

10:36:02

across all the adjacency lists,

10:36:04

that is, the sum of the lengths of all the adjacency lists,

10:36:06

and in an undirected graph that sum is equal to twice the number of edges.

10:36:10

You can see here the number of edges is 1, 2, 3, 4, 5, 6, 7,

10:36:14

and you can verify that the number of elements in all the adjacency lists

10:36:18

put together is 14, because each edge is represented twice.
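The counting argument above can be checked directly with an instrumented BFS that counts how often each loop body runs. The 5-node, 7-edge undirected graph below is a hypothetical example, and `bfs_with_counts` is a hypothetical name; for a connected undirected graph the outer loop runs n times and the inner loop 2m times.

```python
from collections import deque


def bfs_with_counts(adj, source):
    """BFS that also counts outer (dequeue) and inner (neighbor) iterations."""
    discovered = [False] * len(adj)
    discovered[source] = True
    queue = deque([source])
    outer = inner = 0
    while queue:
        current = queue.popleft()
        outer += 1
        for neighbor in adj[current]:
            inner += 1
            if not discovered[neighbor]:
                discovered[neighbor] = True
                queue.append(neighbor)
    return outer, inner


# Hypothetical undirected graph with 5 nodes and 7 edges.
edges = [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
adj = [[] for _ in range(5)]
for u, v in edges:
    adj[u].append(v)   # each undirected edge is stored twice,
    adj[v].append(u)   # once in each endpoint's list

print(sum(len(lst) for lst in adj))  # 14 == 2 * 7
print(bfs_with_counts(adj, 0))       # (5, 14): n dequeues, 2m neighbor checks
```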

10:36:24

So if we have n vertices

10:36:34

and m edges, we end up with n plus 2m operations:

10:36:42

n operations, one for each iteration of the while loop,

10:36:44

and then 2m operations

10:36:48

to iterate over the adjacency lists.

10:36:52

And now, when we are talking about complexities,

10:36:56

if m is the number of edges,

10:36:58

we can ignore the factor 2 associated with it.

10:37:00

So what we end up with is order of n plus m.

10:37:04

So order of n plus m is the complexity of breadth-first search.

10:37:14

And by this point,

10:37:16

you should be able to just work it out by looking at the code,

10:37:18

so do try it out, and if it's not clear, do ask on the forum,

10:37:22

but order of n plus m is the complexity of breadth-first search,

10:37:26

and you will find a similar complexity for depth-first search as well,

10:37:32

order of n plus m.
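The same counting argument in symbols, with $E_i$ the number of edges at node $i$ (the E1, E2, ... above). For a directed graph the adjacency lists hold each edge once, so the sum is $m$ instead of $2m$; the bound is the same.

```latex
\[
\begin{aligned}
T_{\text{BFS}}(n, m) &= \underbrace{n}_{\text{while-loop iterations}}
  + \underbrace{\sum_{i=1}^{n} E_i}_{\text{for-loop iterations}} \\
&= n + 2m \qquad \text{(undirected: each edge appears in two adjacency lists)} \\
&= O(n + m)
\end{aligned}
\]
```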

10:37:36

For the shortest path algorithm, however,

10:37:38

the complexity will be different,

10:37:42

because of how the shortest path algorithm works. Let's see it here.

10:37:50

In the shortest path algorithm, what we do is,

10:37:52

we go over all the vertices:

10:37:56

we insert each vertex or each node into the queue once,

10:38:00

and then we take it out once, so this contributes a factor n.

10:38:02

Then, when we call update_distances,

10:38:04

it contributes a factor m overall.

10:38:08

But when we are picking the next node,

10:38:10

we visit all the vertices once again.

10:38:14

So here we are performing n operations

10:38:18

each time we pick the next node.

10:38:20

So that gives us order of n squared plus...

10:38:26

n squared plus m.

10:38:30

Yeah, something like that.

10:38:32

So order of n squared plus m, or, as a looser bound, n plus m times n.

10:38:34

So those are some complexities that you will see reported for shortest path.
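A rough breakdown of the array-based version in the same notation, assuming (as in the lesson's implementation) that picking the next node scans all $n$ nodes each time and that each node is finalized at most once.

```latex
\[
\begin{aligned}
T_{\text{Dijkstra, array}}(n, m)
  &= \underbrace{O(n)}_{\text{enqueue/dequeue each node once}}
   + \underbrace{O(m)}_{\text{update\_distances, all calls together}}
   + \underbrace{n \cdot O(n)}_{\text{pick\_next\_node, once per node}} \\
  &= O(n^2 + m) \;\subseteq\; O\big(n\,(n + m)\big)
\end{aligned}
\]
```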

10:38:40

And a way to improve this,

10:38:42

a way to improve the picking of the next node,

10:38:44

is to use what is called a min heap,

10:38:46

so that you don't have to look through the entire list of nodes each time,

10:38:50

to pick the next node,

10:38:52

but you can simply pick the next node in a very short time.

10:38:56

So there's a data structure called a min heap that you can look at.

10:38:58

A min heap is used to keep track of a bunch of numbers

10:39:04

and easily track the minimum.

10:39:06

So you can keep a bunch of numbers around in a binary tree like this,

10:39:08

and the root will always be the minimum,

10:39:10

and the numbers on the left and right will always be greater than or equal to the root.

10:39:14

And then the same will be true for each subtree as well.

10:39:18

And insertion into this heap is of order log n,

10:39:22

and deletion from this heap is of order log n as well.

10:39:26

And then, in this case,

10:39:28

fetching the minimum value is of order 1.

10:39:30

So instead of looping through the entire list of nodes each time,

10:39:34

what you can do is you can simply insert nodes into this min heap,

10:39:38

and delete nodes from the min heap when they have been visited,

10:39:42

and getting the next node is as simple as fetching the minimum value.
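Python's standard library module heapq implements a binary min heap on top of an ordinary list. Here is a short sketch of the three operations described above. The values are arbitrary.

```python
import heapq

heap = []
# Insertion: order log n per push.
for value in [7, 2, 9, 4, 1]:
    heapq.heappush(heap, value)

# Fetching the minimum: order 1, since it always sits at index 0.
smallest = heap[0]

# Deleting the minimum: order log n per pop.
popped = [heapq.heappop(heap) for _ in range(3)]

print(smallest)  # 1
print(popped)    # [1, 2, 4]
print(heap[0])   # 7
```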

10:39:46

Okay.

10:39:48

So check this out.

10:39:50

This is not something that you will generally get asked.

10:39:52

This is a more advanced concept.

10:39:54

In fact, even for Dijkstra's shortest path algorithm,

10:39:56

it's very unlikely that you will get asked about it.

10:39:58

But do review it, and as an exercise, if you want to go further,

10:40:02

try implementing and improving Dijkstra's algorithm using a binary heap.

10:40:08

So that will take the complexity from (m plus n) times n to (m plus n) times log n.

10:40:14

Okay.

10:40:16

And that may be better.

10:40:16

So do check that out.

10:40:18

That's obviously going to be better for larger graphs.

10:40:20

So do try to implement it.

10:40:22

In fact, inside Python,

10:40:24

there is a built-in heap module called heapq,

10:40:28

and that will optimize the pick next node operation.

10:40:36

In Dijkstra's algorithm.

10:40:40

Okay.

10:40:40

So that concludes our discussion of graphs here.

10:40:44

There's a lot more in graphs.

10:40:46

Graph theory is an entire course in itself.

10:40:48

But since this course is particularly concentrated on data structures and algorithms from the perspective of coding interviews and coding assessments,

10:40:56

this is as far as we need to go.

10:40:58

So what you should do is you should practice more graph problems related to breadth first search and depth first search.

10:41:04

You really need to become very familiar with breadth-first and depth-first search.

10:41:10

And sometimes, in some really hard interviews,

10:41:14

you may get asked about shortest paths as well.

10:41:16

So do familiarize yourself with that.

10:41:18

But apart from that, you don't really need a lot more.

10:41:20

But there are other algorithms.

10:41:22

You can look at minimum spanning trees.

10:41:24

You can look at topological sorting.

10:41:28

You can look at connected components.

10:41:30

That's another topic.

10:41:32

You can look at detection of cycles.

10:41:36

And there's something called disjoint sets.

10:41:38

So there's a huge number of topics that we can cover in graphs.

10:41:40

But we'll stop our discussion here.

10:41:42

So what do you do next?

10:41:44

Review the lecture video and execute the Jupyter notebook.

10:41:46

Complete the assignment and attempt the optional questions.

10:41:50

And finally participate in forum discussions.

10:41:52

Very important.

10:41:54

If you're stuck at any point, just go on the forum and ask a question.

10:41:56

You can also share your code if it's not working, to get help.

10:42:00

And you can also join or start a study group to learn together with friends.

10:42:06

And you can also find us on Twitter at @JovianML and @aakashns.

10:42:10

And the next lesson in data structures and algorithms in Python

10:42:14

is interview tips, tricks and practical advice.

10:42:20

Thank you.

10:42:22

Hello and welcome to data structures and algorithms in Python.

10:42:26

This is an online certification course being conducted by Jovian.

10:42:30

Today we're on lesson six, titled interview tips, tricks and practical advice.

10:42:36

This is the final lesson of the course.

10:42:38

So I hope you're excited.

10:42:40

My name is Akash and I'm your instructor.

10:42:44

You can find me at @aakashns.

10:42:50

If you've been following along with the course and working on the assignments,

10:42:56

and if you complete a course project as well, then you can earn a certificate of accomplishment for the course,

10:43:02

which you can find on your Jovian profile, add to LinkedIn, or download as a PDF.

10:43:10

So let's get started.

10:43:12

First thing we'll do is go to the course website, pythondsa.com.

10:43:18

So this is the course website, pythondsa.com.

10:43:20

This is where you find all the information about the course.

10:43:24

You can watch all the previous lessons, lessons 1 through 5.

10:43:28

And you can also check out the previous assignments, assignments 1, 2 and 3.

10:43:32

And you have the course project as well.

10:43:34

Let's open up lesson 6.

10:43:40

Now on lesson 6, you will be able to find a recording of the video you're watching right now.

10:43:48

And here is the code that we will look at today.

10:43:52

So today we will do something different.

10:43:54

We will simulate the experience of being in an interview.

10:43:58

So we have given you a problem-solving template, and we recommend that you follow this template

10:44:04

for any project, notebook, or coding problem that you work on.

10:44:08

And here on the problem solving template, we also have a method.

10:44:12

Something that we have been applying throughout this course to different kinds of problems, different kinds of data structures and algorithms.

10:44:20

But in an interview, obviously you will not have this template.

10:44:24

So we will see how to apply this method during an interview.

10:44:28

And before we do that, let's revise the method so that we can recall it from memory when we are working on the interview problem.

10:44:36

So here is the systematic strategy that we have been applying so far for solving problems.

10:44:40

And do check out the previous lessons, if you haven't seen them,

10:44:44

for examples of how to apply it in detail.

10:44:48

So step one is to state the problem clearly in your own words and identify the input and output format.

10:44:56

And then the second step is to come up with some example inputs and outputs and try to cover all the edge cases that you can think of.

10:45:04

So you want to think of all the possible scenarios and that will help you write your code properly.

10:45:10

Then step three is to come up with a correct solution for the problem and state that solution in plain English.

10:45:16

And then step four is to implement the solution and test it using some example inputs.

10:45:22

This is important while you're practicing.

10:45:24

But initially when you come up with a correct solution, it will be a simple solution.

10:45:28

What is often called a brute force solution.

10:45:30

And in an interview setting, you may not have the time to implement it from scratch.

10:45:34

So you may skip this step if the brute force solution is too straightforward.

10:45:40

Then step five is to analyze the algorithm's complexity and identify any inefficiencies in the algorithm.

10:45:46

So what you can do in an interview is come up with a correct solution and describe it to the interviewer and then analyze its complexity directly and start identifying inefficiencies and then move on to apply the right technique to overcome the inefficiencies.

10:46:00

So this is where you need to identify which one of the techniques that you've learned in this course you need to apply.

10:46:08

Is this a binary search problem? Is this a divide and conquer problem?

10:46:12

Is this related to binary search trees?

10:46:14

Is this something that you can solve in a similar way to sorting? Is it important to look at the worst-case or average-case complexity?

10:46:22

Is this a graph problem, a recursion problem, or a dynamic programming or memoization problem?

10:46:32

So all of these things are something that you have to think about.

10:46:34

And as you practice more and more problems, say about five to ten problems for each of the lessons,

10:46:40

then you will start to recognize these patterns and when you're on step six, when you're trying to come up with the right technique to overcome the inefficiency, the ideas will automatically come to you.

10:46:50

So practice is very important to succeed in step six.

10:46:56

And once we have determined how to overcome the inefficiency through the right data structure or algorithm, then we state that solution, implement it, and analyze the complexity.

10:47:04

So this is how your coding assessment or interview should proceed.

10:47:10

Now let's pick up a coding problem and go from there.

10:47:14

So here we have a coding problem, find subarray with the given sum, and we'll read the problem. But before that, you can see that this notebook is fairly empty.

10:47:26

And what we're trying to do is we're trying to simulate the situation where you are on a call with somebody and they're interviewing you.

10:47:32

And typically, they would be using some platform like a collaborative code editor, or maybe a platform where you can also run the code, or a platform where

10:47:42

the question is already printed on one side, let's say the left, from a preselected database, and on the right, you can type your code and experiment with it.

10:47:52

Now, we're not using any third-party platform here. What we'll do is simply simulate that in our Jupyter notebook.

10:48:02

Okay, so now we have this notebook running. We've clicked the run button on the Jovian notebook, and here we are.

10:48:08

Now the question is, and this is a question that was asked during a coding interview for Amazon, of course, a lot of other companies may ask similar questions too.

10:48:16

You are given an array of numbers and these numbers are all non-negative.

10:48:22

You need to find a continuous sub array of the list which adds up to a given sum.

10:48:28

This is how an interviewer might state the problem to you.

10:48:32

And then they may also give you an example. Sometimes they don't, and if they don't, it's always a good idea to ask for an example.

10:48:40

Now, you might sometimes feel that maybe if you ask too many questions, the interviewer might think that you don't know this or you're dumb in some way, but that's not true.

10:48:48

It's actually the opposite: the more questions you ask, the better the interviewer is able to convey what they want.

10:48:56

Now they're busy, they're doing 5 interviews a day and they have their entire day's work.

10:49:00

Sometimes they may just fail to state the question in its entirety.

10:49:04

And if you don't ask for clarifications, you may assume the wrong thing and go ahead and implement something that's completely wrong.

10:49:12

And that completely derails your interview, and trust me, it happens more often than you might think.

10:49:20

Okay, so here is one example.

10:49:24

So let's say the interviewer did not provide an example.

10:49:26

You can ask them, can you please give me an example for this problem?

10:49:30

And then they come back to you and say, suppose we have this array: 1, 7, 4, 2, 1, 3, 11 and 5.

10:49:36

These are all numbers and they're all non-negative.

10:49:40

Some of these could be zero as well, but suppose we have this array.

10:49:42

And I give you the number 10, and I want you to find

10:49:48

a continuous subarray of the list which adds up to the given sum, which is 10.

10:49:54

So then they might also tell you that, in this case, the solution is this subarray,

10:50:02

starting from the number 4 and going all the way up to 3.

10:50:04

And you can check that there are no other ways to create 10: if we took 1, 7, that would be 8, and 1, 7, 4 would be 12.

10:50:10

On the other hand, 7, 4, 2 would be 13, but 4, 2, 1, 3 turns out to be 10.

10:50:16

And once again, to the right, you will not be able to create a total of 10.

10:50:22

So this subarray is what you have to return.

10:50:24

Now what does it mean to return a sub-array?

10:50:26

To return a subarray means to return the indices: the index of the starting term and the index of the ending term.

10:50:34

And as we know, in Python, when we are working with ranges, the end index is typically just outside the actual data.

10:50:42

So you could return the index of 4 and the index of 11.

10:50:44

Counting 0, 1, 2, the index of 4 is 2, and counting on 3, 4, 5, 6, the index of 11 is 6.

10:50:52

So if you return 2 and 6, and I access the 2 colon 6 range of the list, I get the list 4, 2, 1, 3.

10:51:02

And in fact, that's something that we can very quickly verify here.

10:51:06

Let's say we have the list L1: 1, 7, 4, 2, 1, 3.

10:51:12

Now say the start index i and the end index j are 2 and 6 respectively.

10:51:20

And you can see L1 of 2 to 6 is 4, 2, 1, 3.

10:51:26

So the index j is outside, so that element doesn't get included when we use it as a range.

10:51:30

And we get 4, 2, 1, 3.

10:51:34

And you can also verify that the sum is 10.
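You can check this slicing behaviour directly in Python. This sketch uses the full example array from the problem statement. The name L1 follows the transcript.

```python
# The example array from the problem statement.
L1 = [1, 7, 4, 2, 1, 3, 11, 5]

# The start index i is included; the end index j is excluded.
i, j = 2, 6
print(L1[i:j])       # [4, 2, 1, 3]
print(sum(L1[i:j]))  # 10

# When i == j, the slice is the empty subarray.
print(L1[2:2])       # []
```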

10:51:38

All right, so that's the problem.

10:51:40

Now I've explained it to you in a lot more detail than an interviewer would.

10:51:44

But this is the process that you have to apply in your own mind.

10:51:48

And sometimes what you can also do is you can repeat the problem back to the interviewer.

10:51:52

That's a great idea.

10:51:54

They've stated the problem to you.

10:51:54

They may have given you an example.

10:51:56

Now you state the problem yourself in simple words.

10:51:58

Remember, that was step one.

10:52:00

So, in the same way that I just did, you can state the problem.

10:52:04

And then you have to figure out what are the inputs and the outputs.

10:52:08

So for the input, you have an array, and an array is just a list in Python.

10:52:12

So let's create arr0, and make this the first example input.

10:52:16

And that would be 1, 7, 4, 2, 1, 3.

10:52:20

And then the target, so your target sum is 10.

10:52:26

So that's the input here.

10:52:26

And then the output that we want is:

10:52:30

So this is output0, and that would be 2, 6, as we've just verified.

10:52:40

So this is the input and output format.

10:52:44

It always makes sense to just create some variables for that before you start coding.

10:52:48

The next step is to think of all the cases that the function should be able to

10:52:54

handle.

10:52:56

But actually before we do that, we should also write a function signature because we know what the input looks like.

10:53:00

We know what the output is going to look like.

10:53:02

So we know what the function should look like.

10:53:06

So we can just say def.

10:53:08

And let's call this subarray_sum.

10:53:12

And it's going to take an array.

10:53:14

It's going to take a target and there's going to be some logic inside it.
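Here is how the example input, the expected output, and the function signature might look once they are written down, before any logic. The names arr0, target0, output0 and subarray_sum follow the transcript. The stub returns None only as a placeholder.

```python
# First example input: the array (a Python list) and the target sum.
arr0 = [1, 7, 4, 2, 1, 3, 11, 5]
target0 = 10

# Expected output: (start index, end index) with the end excluded,
# so arr0[2:6] == [4, 2, 1, 3].
output0 = (2, 6)


def subarray_sum(arr, target):
    """Return (i, j) such that sum(arr[i:j]) == target."""
    # The logic goes here once an approach has been worked out.
    pass
```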

10:53:18

Okay.

10:53:20

All right.

10:53:20

So that was step one.

10:53:22

But it always helps to just write the function signature, because if you've still misunderstood the problem,

10:53:28

the interviewer can immediately correct you and tell you,

10:53:30

Hey, you haven't taken a certain input, or you've assumed an input which I've not provided.

10:53:34

Okay.

10:53:36

All right.

10:53:36

So now we have the function signature.

10:53:38

Now step two.

10:53:38

Remember, step two was to come up with an exhaustive list of

10:53:42

test cases to test the problem.

10:53:44

So you can do this in comments.

10:53:46

You can just create some comments.

10:53:46

And you can say, I'm thinking about the problem.

10:53:50

And I'm just trying to think what are all the cases we need to handle.

10:53:52

And this is a great quality.

10:53:54

This is not something people do often, but they should, because this indicates that you're doing what is called

10:54:00

test-driven development, which means you are thinking about all the ways in which your

10:54:04

code might be used and accounting for those before writing the code.

10:54:10

So kind of working backwards and it's a very useful way to avoid errors.

10:54:12

So now the first one could be a generic array.

10:54:18

where the subarray is somewhere in the center,

10:54:24

which is what we have already seen here.

10:54:28

Now the sub array could be in the center or the sub array could be at the start.

10:54:36

Or the subarray could be at the end, or it's possible that

10:54:48

there is no such subarray.

10:54:50

So there's no sub array which adds up to 10.

10:55:00

You may also have the situation where you have a few zeros.

10:55:04

So you have a few zeros in the list; that's an option.

10:55:12

Here's one thing that can happen.

10:55:14

Then it could be that there are multiple subarrays with the same sum.

10:55:24

Now this is where you might want to just clarify with the interviewer.

10:55:28

What happens if we get two subarrays which add up to the target?

10:55:32

And the interviewer might say find the shortest one, or find the first one, or find any one.

10:55:38

But it's always good to clarify that.

10:55:42

Next, another thing to clarify:

10:55:46

you could also ask them what happens if there is no subarray that adds up to 10.

10:55:50

And then they may tell you that you can return None, or you can return minus one, or whatever it is.

10:55:54

Or they may tell you to assume that there is always a subarray.

10:55:56

So that will help you write your code.

10:55:58

And then, obviously, you may have to work with the empty array.

10:56:02

You may also have to handle the case where the subarray is a single element.

10:56:12

And whenever we say array, we also mean list in Python; they're practically the same thing for our purposes.

10:56:22

Okay, we've listed quite a few test cases.
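You can record the cases listed above as data before writing any solution. In the sketch below, the specific arrays are made up for illustration. Two conventions are assumptions that would need to be confirmed with the interviewer. First, the answer is None when no subarray exists. Second, when several subarrays qualify, the answer is the one with the smallest start index and, for that start, the smallest end index.

```python
# Hypothetical test cases for the subarray-sum problem. Each "output" is
# (start, end) with the end excluded, or None when no subarray matches.
test_cases = [
    # Generic array, subarray somewhere in the center
    {"arr": [1, 7, 4, 2, 1, 3, 11, 5], "target": 10, "output": (2, 6)},
    # Subarray at the start
    {"arr": [4, 2, 1, 3, 11, 5], "target": 10, "output": (0, 4)},
    # Subarray at the end
    {"arr": [1, 7, 4, 2, 1, 3], "target": 10, "output": (2, 6)},
    # No such subarray
    {"arr": [1, 7, 4, 2, 1, 3], "target": 19, "output": None},
    # A few zeros in the list
    {"arr": [1, 0, 0, 9, 3], "target": 9, "output": (1, 4)},
    # Multiple subarrays with the same sum: [2, 3], [5] and [1, 4]
    {"arr": [2, 3, 5, 1, 4], "target": 5, "output": (0, 2)},
    # Empty array
    {"arr": [], "target": 7, "output": None},
    # Subarray is a single element
    {"arr": [1, 7, 4], "target": 7, "output": (1, 2)},
]

# Sanity check: every expected output really adds up to the target.
for case in test_cases:
    if case["output"] is not None:
        i, j = case["output"]
        assert sum(case["arr"][i:j]) == case["target"]
```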

10:56:24

And in that process, we've come across a few more questions which we've clarified.

10:56:30

So now we're ready to start solving the problem.

10:56:32

Now at this point, what you may want to do is.

10:56:38

Maybe just ask for a couple of minutes and keep a pen and paper close to you.

10:56:42

So I'm going to use this tool.

10:56:46

Instead.

10:56:48

Yes, I'm going to use this tool instead.

10:56:50

So keep a pen and paper close to you so that you can work on this problem.

10:56:54

Now let's come up with the simplest possible solution.

10:56:56

So we have about two or three minutes to come up with the solution.

10:57:00

And often the simplest solution is pretty obvious.

10:57:04

So in this case, one simple solution could be to simply try every subarray.

10:57:10

Then I will find one that adds up to 10, if there is one.

10:57:16

So each subarray is defined by a start index, which is where the first element of the subarray is, and an end index.

10:57:24

The end index is the next index after that: the first index which is not in the subarray.

10:57:28

Right, so that's how we define a sub array remember.

10:57:30

So all we need to do is try all such values.

10:57:34

So all such values i, j, where i goes from zero to n minus one.

10:57:38

And where j goes from... remember, you could start out with the empty subarray.

10:57:42

So which means j also has the value i.

10:57:46

Here we are saying i and j both have the value two.

10:57:48

So l1 of two to two becomes the empty array.

10:57:52

So j goes from i all the way to just beyond the last element.

10:57:58

Which means if the last element index is n minus one.

10:58:02

So j can go all the way up to n.

10:58:04

All right, so i goes from zero to n minus one and j goes from i to n.

10:58:08

And each time, we start at an i and check each j: j equals zero, j equals one, j equals two, j equals three, four, five and so on.

10:58:18

Then we move i forward and start over again, with j starting from the new value of i: j equals one, j equals two, three, four.

10:58:26

Okay, and we keep doing this until we find a subarray or have exhausted all options. This way,

10:58:30

we will test all the subarrays, so the problem is solved.

10:58:34

So that's the brute force solution and what you should do first of all is explain that brute force solution.

10:58:40

It may seem that this is an obvious solution, so what's the point of explaining it?

10:58:44

But do mention it, because at this point the interviewer knows nothing about you.

10:58:48

So they don't know if you can even come up with a solution to the problem, right?

10:58:52

They're trying to assess whether you can think about problems, and whether you can write code.

10:58:58

If you don't tell them the brute force solution then they don't even know if you figured out the brute force solution.

10:59:04

So do tell them the brute force solution.

10:59:06

And generally you do not have to code it.

10:59:10

You can do the analysis in your mind: sort of write the code in your mind, picture the code, and based on that, come up with the

10:59:20

complexity analysis, and directly say that the brute force algorithm will have such and such complexity.

10:59:26

Okay, now.

10:59:28

We will write the code right now, just to be very clear about it, in case you're not yet clear on how to write the code.

10:59:34

But in an interview this is the part which you can skip in the interest of time.

10:59:40

So def.

10:59:44

Subarray...

10:59:46

I think it was called.

10:59:48

subarray_sum.

10:59:50

subarray_sum.

10:59:52

Let's call this subarray_sum1.

10:59:54

It's the first approach that we're taking here.

10:59:58

We have the array, and then we have a target.

11:00:02

And remember where i starts from.

11:00:06

So i goes from zero to n minus one.

11:00:10

That was the first thing.

11:00:12

So for i in range zero to n minus one. And what's n?

11:00:18

Well, n is simply the length of the array.

11:00:22

Then j goes from.

11:00:28

i to n.

11:00:32

Oops, so I made a small error here.

11:00:34

So it's range zero to n, because in a range the last value is not included.

11:00:38

So j goes from i to n.

11:00:40

So for j in the range.

11:00:42

i to, and this should be n plus one then, because we want

11:00:46

j to go all the way up to n.

11:00:50

Okay.

11:00:52

And now we simply check if.

11:00:54

The sum of array.

11:00:58

i to j. And we've seen that this slice, array

11:01:00

i to j, is going to give us all the elements starting at i,

11:01:02

but ending just before j.

11:01:04

So if the sum of array.

11:01:06

i to j.

11:01:10

equals target.

11:01:12

Then we've found the answer: return i comma j.

11:01:16

That's it.

11:01:16

So check if.

11:01:18

sub array sum.

11:01:20

equals target.

11:01:26

And if not, let's just return None.

11:01:28

Maybe we agreed on something else, but let's return None.

11:01:30

And that's it.
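Put together, the brute force function dictated above looks roughly like this as runnable Python. It is a reconstruction of the spoken code, so minor details may differ from the notebook. The calls at the bottom are similar to the cases tried later in the lecture.

```python
def subarray_sum1(arr, target):
    """Brute force: try every subarray arr[i:j] and return the first (i, j)
    whose sum equals target, or None if there is no such subarray."""
    n = len(arr)
    # i is the start index (inclusive), from 0 to n - 1.
    for i in range(0, n):
        # j is the end index (exclusive), from i (the empty subarray) up to n.
        for j in range(i, n + 1):
            # Check if the subarray sum equals the target.
            if sum(arr[i:j]) == target:
                return i, j
    return None


print(subarray_sum1([1, 7, 4, 2, 1, 3, 11, 5], 10))  # (2, 6)
print(subarray_sum1([4, 2, 1, 3, 11, 5], 10))        # (0, 4)
print(subarray_sum1([1, 7, 4, 2, 1, 3], 17))         # (1, 6)
print(subarray_sum1([1, 7, 4, 2, 1, 3], 18))         # (0, 6)
print(subarray_sum1([1, 7, 4, 2, 1, 3], 19))         # None
```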

11:01:34

So that's your.

11:01:36

That's your code.

11:01:38

It's about one two three four five lines of code.

11:01:40

Maybe six.

11:01:42

But that's a brute force solution.

11:01:46

If it's really short, it doesn't hurt to write it, because then it's going to sit there and at least you have it as a reference.

11:01:52

But it's something you can discuss with the interviewer.

11:01:54

If you are clear about the brute force solution and you can tell its complexity,

11:02:00

then you don't have to write it.

11:02:02

One other tip is whenever you're coding.

11:02:06

It's always helpful to simply add a small comment above.

11:02:10

So that even if the interviewer is not able to follow your code.

11:02:14

They can just follow your comments and they can tell if your general strategy is correct.

11:02:20

Once again, reading code is hard, and especially when

11:02:24

you are not familiar with the coding best practices in the industry,

11:02:30

the code that you write is sometimes difficult to read.

11:02:32

So while you learn how to write good code, in the meantime it always helps to just add comments.

11:02:38

It makes their job easier, and it makes it easier for them to evaluate you.

11:02:42

Otherwise, you may spend five to ten minutes talking about

11:02:46

something in your code which they misunderstood, or where you made a typo, etc.

11:02:50

Okay.

11:02:52

So here we have subarray_sum1, where we've implemented the brute force solution.

11:02:58

Let's also check out some cases

11:03:02

and see if this brute force solution works correctly.

11:03:06

So in an interview, if you have the ability to run the code, you can just run a few

11:03:10

samples. Let's say I simply take arr0 and target0.

11:03:18

And you get the value 2, 6, and remember, output0 also has the value 2, 6.

11:03:24

So great.

11:03:26

It seems like our

11:03:30

technique works.

11:03:32

Let's test a few more cases just to be sure.

11:03:34

So subarray at the end, subarray at the start.

11:03:36

Let's see if we can test that.

11:03:38

So here is arr0.

11:03:42

Now if I take this, remember, 4, 2, 1, 3.

11:03:44

Oops. I think I didn't complete it.

11:03:46

Let me also put in eleven comma five here.

11:03:50

Yeah. So remember, 4, 2, 1, 3 is the solution.

11:03:54

Now if I simply take 4, 2, 1, 3, 11, 5

11:03:56

and call subarray_sum,

11:04:04

and put in

11:04:06

this list here, and once again put in target0, which was 10.

11:04:10

Oh, this should be subarray_sum1. Okay.

11:04:12

Yeah. So now you can see 4, 2, 1, 3 is at indices 0, 1, 2, 3, which is the range 0 to 4.

11:04:18

So it seems to have worked correctly.

11:04:20

Let's do the same thing, but this time, let's put it at the end.

11:04:24

So 1, 7, 4, 2, 1, 3.

11:04:28

This works fine: 2, 6.

11:04:30

Let's try another one. Let's try maybe.

11:04:34

17, and that probably cannot be found. Oh, it can.

11:04:40

Let's see: indices 0, 1, 2, 3, 4, 5. Probably the sum of all of these.

11:04:44

Let's do six plus four ten.

11:04:48

Okay. Now maybe there's a problem here because it seems like 17 is not the right sum.

11:04:54

So you have one plus seven eight.

11:04:58

And 8 plus 4 is 12, 12 plus 2 is 14, and 14 plus 4 is 18.

11:05:06

Okay. So this seems like a mistake then.

11:05:10

And we can even check this out.

11:05:18

So we have L one that's that.

11:05:22

Let's call that L2.

11:05:26

Oh, it says one to six. I think I misread it.

11:05:28

So we are ignoring the zeroth element.

11:05:32

So this does add up to seventeen. Okay. So seventeen does show up.

11:05:34

Let's try 18, which takes up the entire array. Works fine.

11:05:38

Let's try maybe four which should just take the single number.

11:05:42

So that works fine too.

11:05:44

Let's try 19; that should give None.

11:05:48

We've tested this extensively and overall our solution seems correct.

11:05:54

This is the process whenever you write any code.

11:05:56

You should also test it out and it also gives more confidence to the interviewer.

11:06:02

But if you do not have the option to

11:06:04

test it out, if you're not able to run the code right now,

11:06:08

then simply pick an example and walk them through it yourself.

11:06:14

Okay. So now we have the brute force solution.

11:06:16

The next step is to analyze the brute force solution.

11:06:20

Now let's analyze it. So you have here one for loop.

11:06:22

And we know that counting for loops helps us count the number of operations.

11:06:26

Then we have another for loop. So the first for loop can go from zero to n.

11:06:30

So this may run n times. Then we have another for loop which goes from i to n plus one.

11:06:34

Let's approximate here and say that it runs at most about n times.

11:06:42

So n, and inside each of these, at most n.

11:06:46

And then inside the second for loop you have the sum. So this is very important.

11:06:50

Now, always carefully observe the operation inside your for loop.

11:06:52

So you have a sum which can be on an array of i to j.

11:06:56

Now remember i can be zero and j can have the value n.

11:07:00

That means the largest subarray that you can work with will have approximately the size n as well.

11:07:06

Right. So you have n iterations, inside each of those you do n more iterations, and inside each of those you do some work.

11:07:14

You do n additions, right, at most n additions. So that roughly gives you n times n times n.

11:07:22

So this is going to be an order n cubed solution.

11:07:24

Okay. So if you are able to arrive at the order n cubed complexity without implementing the solution,

11:07:36

great, you have learned it. But if you're not able to arrive at the order n cubed complexity

11:07:42

for the brute force solution,

11:07:46

then you probably need a little more practice, because this should become second nature to you.

11:07:50

Just looking at a problem identifying the simplest solution and then finding the complexity of the simplest solution.
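You can also check the order n cubed estimate by counting. The hypothetical helper below counts how many array elements the sum(arr[i:j]) calls in the brute force would touch in the worst case, that is, when no subarray matches and every (i, j) pair is tried. The exact count works out to n(n + 1)(n + 2) / 6, which grows like n cubed over 6. So doubling n multiplies the work by roughly 8.

```python
def count_elements_summed(n):
    """Count the array elements the brute force sums in the worst case,
    i.e. when no subarray matches and every (i, j) pair is tried."""
    total = 0
    for i in range(n):              # start index: n choices
        for j in range(i, n + 1):   # end index: up to n + 1 choices
            total += j - i          # sum(arr[i:j]) touches j - i elements
    return total


for n in [10, 100, 200]:
    # Compare the counted total with the closed form n(n + 1)(n + 2) / 6.
    print(n, count_elements_summed(n), n * (n + 1) * (n + 2) // 6)
```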

11:07:58

Okay.

11:07:58

All right. So now we have implemented it, tested it, and identified the complexity. Remember the next step:

11:08:04

find the inefficiency and overcome that inefficiency by applying the right technique.

11:08:12

So let's find the inefficiency then.

11:08:18

Here we have.

11:08:22

Let's say we are at this position. So let's say you're looking at.

11:08:28

7, 4, 2, let's say.

11:08:30

i has the value 1.

11:08:32

So you start out with i equal to 1 and j equal to 1 in the inner loop.

11:08:38

Then what we do is we increment j by 1 and then we calculate the sum and this sum is 7.

11:08:42

Then what we do is we increment j by 1 more, and we calculate this sum, and this sum is 7 plus 4, which is 11.

11:08:50

Then we increment this window once again and then we calculate this sum and that is 7 plus 4 plus 2.

11:08:56

So 7 plus 4 11 plus 2.

11:09:02

13.

11:09:02

And then we move this and then we check it again.

11:09:05

So we're doing this over and over and over many many times, right?

11:09:08

Each time we are redoing 7 plus 4 plus 2 plus 1, and then 7 plus 4 plus 2 plus 1 plus 3,

11:09:13

that seems like a lot of additional work.

11:09:15

Maybe we can just avoid that.

11:09:17

What we can do is, when we start out with a j,

11:09:19

we can keep a running sum, and each time,

11:09:22

just before incrementing j, add the upcoming element,

11:09:26

which is the jth element, into that running sum, right?

11:09:30

That way we don't have to recompute the entire sum in every iteration of the inner loop.
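The running-sum idea can be seen on its own in a small sketch (the helper name `running_sums_from` is hypothetical): starting from a fixed i, one addition per step of j produces the sum of every window arr[i:j], instead of re-adding the whole window each time.

```python
def running_sums_from(arr, i):
    """Return the sums of arr[i:j] for j = i, i + 1, ..., len(arr) using one running sum."""
    s = 0
    sums = [s]                  # j == i: the empty subarray has sum 0
    for j in range(i, len(arr)):
        s += arr[j]             # one addition turns sum(arr[i:j]) into sum(arr[i:j + 1])
        sums.append(s)
    return sums


print(running_sums_from([1, 7, 4, 2, 1, 3], 1))   # [0, 7, 11, 13, 14, 17]
```

Starting at the 7, this reproduces the 7, 11, 13 sequence from the walkthrough without ever redoing an addition.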

11:09:35

So that's one optimization.

11:09:37

And this is how you should explain it.

11:09:38

That's one optimization that I have come up with.

11:09:40

The second optimization that we can come up with,

11:09:43

is that the moment the sum, the running sum that we are calculating,

11:09:46

the moment the sum becomes greater than the target value,

11:09:50

We can skip all of these, right?

11:09:53

So we know that 7 plus 4 is greater than 10.

11:09:57

And we know that the array only contains non-negative numbers.

11:10:01

So what that means is 7 plus 4 plus any of these numbers is always going to be greater than 10.

11:10:06

You can obviously see this:

11:10:08

the sum is not going to decrease if we keep adding non-negative numbers.

11:10:13

And so as soon as the running sum crosses this value,

11:10:17

we can break out of the inner loop.

11:10:19

We do not need to continue and look for higher values of j.

11:10:24

It helps to write the two optimizations down so that you don't forget them.

11:10:29

First, maintain a running sum.

11:10:33

And the second optimization is,

11:10:37

when the sum exceeds the target, break out of the inner loop.

11:10:44

So now we have applied an optimization simply by just looking at the data.

11:10:55

In a lot of cases, it's very straightforward.

11:10:58

You don't even have to apply any special technique.

11:11:01

And in this case, we found these couple of optimizations.

11:11:04

So let's apply them.

11:11:05

So what we'll do is we'll define,

11:11:08

def subarray_sum2.

11:11:14

And here, once again, we have the array and we have the target.

11:11:19

And this time, we get the length of the array.

11:11:23

And once again, i goes over the same values.

11:11:26

So i goes from 0 to n minus 1.

11:11:31

Nothing changes here.

11:11:35

So for i in range 0 to n minus 1.

11:11:41

Now here is where we want to start a running sum.

11:11:44

So s equals 0.

11:11:45

This is our running sum.

11:11:47

Then for j in range.

11:11:52

Remember, we start out with i.

11:11:54

And we'll go all the way.

11:11:56

I keep making these mistakes all the time.

11:11:59

And by the way, these are called off-by-one errors.

11:12:02

What I did was: I wanted to go up to the index n minus 1,

11:12:06

but because ranges do not include the final value.

11:12:09

so putting in n minus 1 was wrong.

11:12:12

I should be putting in n.

11:12:13

And I make these mistakes all the time,

11:12:15

even after many years of coding.

11:12:16

So always watch out for off-by-one errors.

11:12:19

Anyway, j can take the values from i to n.

11:12:24

So here, we should put in n plus 1.
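A quick check makes the off-by-one visible. This small example (values chosen for illustration) shows that `range` stops before its end value, so `range(i, n + 1)` is what lets j reach n, and the slices `arr[i:j]` then run from the empty window up to the full suffix.

```python
arr = [1, 7, 4, 2]
n = len(arr)
i = 1

print(list(range(i, n)))        # [1, 2, 3]: range stops before its end value
print(list(range(i, n + 1)))    # [1, 2, 3, 4]: now j can reach n

# With j in range(i, n + 1), arr[i:j] covers every subarray starting at i,
# from the empty one up to the full suffix arr[i:].
print([arr[i:j] for j in range(i, n + 1)])   # [[], [7], [7, 4], [7, 4, 2]]
```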

11:12:28

And now first we want to check if the running sum is equal to the target.

11:12:37

So assume that we've been calculating the running sum step by step.

11:12:40

And let's say that at this current point,

11:12:43

The sum has become equal to target.

11:12:45

Now if the sum has become equal to target,

11:12:47

then we simply return i comma j.

11:12:49

Because this sum includes the sum from index i all the way up to just before j.

11:12:56

So initially the j also has the value i.

11:12:59

So the sum is 0, which makes sense.

11:13:05

But if that is not the case,

11:13:07

we check if it is greater than the target.

11:13:09

So is it possible that our sum has already exceeded the target?

11:13:13

In that case, we don't need to continue this inner loop.

11:13:16

We can break out of this inner loop.

11:13:17

And the way to do that is by simply typing break.

11:13:22

And then if neither of these held true,

11:13:26

if neither of these was true,

11:13:28

so which is that the sum was not equal to the target.

11:13:30

It was not greater.

11:13:31

That means it is still less than the target.

11:13:34

So that means we need to add array of j into the sum.

11:13:39

So we can say s plus equals array of j,

11:13:44

which is the same as s equals s plus array of j.

11:13:49

In any case, s plus equals array of j.

11:13:52

So we have added the jth element.

11:13:55

Now remember,

11:13:57

if this is the pointer j,

11:13:59

we have added the jth element.

11:14:01

And then we will set j to j plus 1.

11:14:03

That will happen automatically when we come into the next iteration.

11:14:06

And the next iteration will once again check if the sum is equal to the target.

11:14:10

If it is equal, we return i comma j.

11:14:12

Otherwise, we check if it is greater than the target.

11:14:15

If it is still less, we add the next element and move j once again.

11:14:19

So we add one element and then we move j.

11:14:22

And then we check again.

11:14:24

So that's our running sum.

11:14:26

Looks good.

11:14:27

Now once again,

11:14:29

if it was found,

11:14:31

it would have been returned somewhere here.

11:14:33

If we get past the loops, it was not found.

11:14:36

So if we come to the very end,

11:14:40

so here we return None, None.

11:14:42

And once again, let's test it out.

11:14:44

So let's try subarray sum 2.

11:14:47

It gives you 2, 6,

11:14:48

then subarray sum 2 on the case that should return None, None.

11:14:51

Okay, seems like there is an issue here.

11:14:54

Yes.

11:14:55

So this is why you need test cases.

11:14:57

So it seems that

11:15:00

j took up an invalid value.

11:15:08

So why is that?

11:15:09

Well, that's because j can go all the way to n.

11:15:12

So the maximum value j can take is n.

11:15:15

So which means that you have already arrived at this position.

11:15:20

So now you can no longer increase the sum further.

11:15:23

So if you arrived at this position,

11:15:25

but you still have not reached the target,

11:15:27

then that means you may need to increase it further,

11:15:29

but you can't increase further.

11:15:30

So there's no number here to add.

11:15:32

So what we should do is add a check here: if j is less than n.

11:15:37

We need it since j can go all the way up to n.
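Putting the walkthrough together, including the `j < n` fix, `subarray_sum2` might look like the sketch below. The code typed in the lesson may differ in small details, and the example array is assumed. It relies on the array containing only non-negative numbers, which is what makes the early `break` safe.

```python
def subarray_sum2(arr, target):
    """Running sum per start index plus an early exit: O(n^2).

    Assumes arr contains only non-negative numbers.
    """
    n = len(arr)
    for i in range(n):
        s = 0                              # running sum of arr[i:j]
        for j in range(i, n + 1):          # j may equal n (exclusive end index)
            if s == target:
                return i, j
            elif s > target:
                break                      # adding non-negative numbers can't bring s back down
            if j < n:                      # the bug fix: there is no arr[n] to add
                s += arr[j]
    return None, None


arr = [1, 7, 4, 2, 1, 3, 11, 5]            # hypothetical example input
print(subarray_sum2(arr, 10))              # (2, 6)
print(subarray_sum2(arr, 100))             # (None, None)
```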

11:15:43

And that's it.

11:15:44

So we had a small bug and we fixed it.

11:15:49

Now again, this is something that you should work out for yourself with pen and paper.

11:15:54

So even while doing the optimization,

11:15:56

you can ask for a couple of minutes,

11:15:58

play around with it on pen and paper,

11:16:00

write a few examples,

11:16:02

relax.

11:16:03

You can even take up to four or five minutes.

11:16:05

And if you,

11:16:06

if you're not getting any ideas,

11:16:07

you can simply talk to the interviewer.

11:16:09

You can speak out loud,

11:16:10

explain your thought process.

11:16:12

And in a lot of cases,

11:16:13

they will give you a hint,

11:16:14

because they want to see you succeeding.

11:16:19

Okay.

11:16:21

So now this is the second implementation.

11:16:24

Let's see.

11:16:25

Okay, this time it worked.

11:16:26

None, none.

11:16:27

4, 2, 1, 3.

11:16:28

Let's put in 10 here.

11:16:29

That should give you the value 2, 6.

11:16:32

Let's put in this.

11:16:35

So that's 0,3.

11:16:36

Let's test this out.

11:16:38

So that's 0,4.

11:16:43

Yeah, 0,4.

11:16:44

So it seems like it's working just fine.

11:16:51

Yeah, so seems like this is working pretty well.

11:16:54

So now we have the second optimized solution.

11:16:57

So let's look at the optimized solution and analyze it.

11:17:01

So we have one loop.

11:17:03

And then we have a second loop.

11:17:04

These two are the same.

11:17:05

But inside the second loop,

11:17:06

we are simply doing a constant operation.

11:17:08

We are just doing a couple of comparisons and one addition,

11:17:11

not up to n additions.

11:17:13

So the complexity goes from order of n cube to order of n square

11:17:17

by maintaining a running sum.
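One way to see the drop from n cube to n square is to count additions directly. The two counting functions below are a hypothetical illustration, not part of the lesson, and they assume the worst case where the early `break` never fires. The brute force does j minus i additions for every pair, while the running sum does one addition per step of j.

```python
def additions_brute_force(n):
    """Additions if every sum(arr[i:j]) is computed from scratch."""
    total = 0
    for i in range(n):
        for j in range(i, n + 1):
            total += j - i          # summing j - i elements
    return total


def additions_running_sum(n):
    """Additions in the running-sum version when the early break never fires."""
    total = 0
    for i in range(n):
        for j in range(i, n + 1):
            if j < n:
                total += 1          # one addition per step of j
    return total


for n in (10, 100, 1000):
    # grows like n**3 / 6 versus n**2 / 2
    print(n, additions_brute_force(n), additions_running_sum(n))
```

For n = 100 this is 171,700 additions versus 5,050, and the gap widens as n grows.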

11:17:19

Great.

11:17:21

Now,

11:17:23

at this point,

11:17:25

when you've described the solution to the interviewer

11:17:27

and maybe also coded it,

11:17:29

you might ask them,

11:17:30

is this good enough?

11:17:31

And they can see that you've thought about it.

11:17:34

You've thought about it.

11:17:35

You've found the solution and you've tested it.

11:17:38

Well,

11:17:39

and at this point,

11:17:40

they may just say,

11:17:41

I'm happy with the solution.

11:17:42

This is good enough.

11:17:43

Or they may say,

11:17:44

Can you do better?

11:17:45

Now, when they say,

11:17:46

Can you do better?

11:17:47

Most of the time,

11:17:48

it suggests that there is a better solution.

11:17:51

So let's see.

11:17:52

Let's think about it a little more.

11:17:53

And let's see if there is a better solution.

11:17:55

Now,

11:17:56

To answer "can you do better?",

11:17:58

We apply the exact same technique.

11:18:00

We have analyzed the complexity.

11:18:02

And now we need to look for inefficiency.

11:18:04

Okay.

11:18:05

Now, we have removed the inefficiency on this side,

11:18:07

which is that as we move j,

11:18:09

that is when we reuse the previous sum

11:18:13

to compute the next sum.

11:18:15

So we remove the inefficiency on this side.

11:18:17

And we've also added this,

11:18:19

also added this condition,

11:18:21

so that J only goes up to a certain point.

11:18:23

Now,

11:18:24

of course,

11:18:25

in the worst case,

11:18:26

J may always go up all the way to the end,

11:18:27

but at least in a lot of cases,

11:18:29

J will not go beyond the point,

11:18:31

where the sum becomes larger than the target.

11:18:34

So these are good optimizations,

11:18:36

but what about I?

11:18:37

What about the left end of the window?

11:18:40

Now,

11:18:41

look at this here.

11:18:42

Now,

11:18:43

when you have seven,

11:18:44

four,

11:18:45

or let's start out all the way at one.

11:18:47

So we have one,

11:18:48

that's,

11:18:49

so first we start out with the empty,

11:18:51

empty sub array.

11:18:53

That has the sum zero.

11:18:54

Then we increment J.

11:18:55

So now the sum becomes one.

11:18:57

Then we increment J.

11:18:58

Now the sum becomes eight.

11:19:00

Then we increment J once again,

11:19:02

and now the sum becomes 12.

11:19:04

Okay,

11:19:05

the sum has become 12.

11:19:06

Now that's a problem.

11:19:07

So what do we do?

11:19:08

What we are saying is,

11:19:10

we will take I and set it to the next value,

11:19:13

and then we'll bring J back to zero,

11:19:15

or back to the value I.

11:19:17

So that we start with the empty sub array once again.

11:19:19

So now when we do seven,

11:19:21

and when we,

11:19:22

so that's,

11:19:23

that just has the value seven,

11:19:24

and when we do this,

11:19:25

we have to add up seven plus four.

11:19:27

Now here's something that we could have done

11:19:29

instead.

11:19:31

Now as soon as the value became larger than the target value,

11:19:35

we could have simply moved this here.

11:19:39

Does that make sense?

11:19:42

Let's think about it.

11:19:44

So,

11:19:46

till this point,

11:19:48

this total was less than 10.

11:19:50

As soon as we added this number on the right,

11:19:53

this total became more than 10.

11:19:55

Now we know that this total became more than 10,

11:19:57

that means that

11:20:00

if we slide this window,

11:20:02

if we slide the left end of the window forward one step,

11:20:05

then the total may become less than 10.

11:20:07

Right,

11:20:08

or it may still

11:20:09

stay larger.

11:20:10

In this case,

11:20:11

it stays larger,

11:20:12

or it may become less than 10.

11:20:14

So if the total now becomes less than 10,

11:20:16

then we can once again move this.

11:20:19

But if the total has not become less than 10,

11:20:22

then we move this instead.

11:20:24

So now the total again is less than 10.

11:20:26

So we can once again move this.

11:20:29

And now the total is still less than 10.

11:20:31

So we move this.

11:20:32

Now the total is still less than 10.

11:20:33

And we moved this.

11:20:34

And we encountered 10 here.

11:20:35

But suppose we had not encountered 10.

11:20:37

Suppose this number were larger instead.

11:20:40

Then what we would have to do is move this.

11:20:43

And now the number becomes less than 10.

11:20:46

So we always go.

11:20:48

We always try to maintain a window whose sum is less than 10.

11:20:52

The moment the window's sum becomes greater than 10,

11:20:55

we keep trying to reduce it further,

11:20:58

To less than 10.

11:21:00

Right?

11:21:01

Or exactly 10 as well.

11:21:02

It's possible that the sum may become exactly 10.

11:21:04

And then the problem is solved.

11:21:05

But we keep trying to reduce it

11:21:07

until it becomes less than 10.

11:21:09

So to revise the algorithm,

11:21:12

we start out with both i and j at zero.

11:21:16

Then we increment j while the running sum is less than the target.

11:21:19

Now we have just a single loop,

11:21:22

essentially.

11:21:23

Increment j while the sum is less than 10.

11:21:28

The moment it becomes greater than 10,

11:21:30

we start incrementing i.

11:21:32

The moment the sum becomes less than 10,

11:21:35

or less than target,

11:21:36

we start incrementing j.

11:21:38

And if we encounter the point where the sum equals 10,

11:21:41

we have found the answer.

11:21:43

So that's the algorithm.
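To see the algorithm move, here is a hypothetical tracing helper (not from the lesson) that records (i, j, s) at the start of each loop iteration, using an example array whose first values match the walkthrough.

```python
def sliding_window_trace(arr, target):
    """Record (i, j, s) at the start of each iteration of the sliding-window loop."""
    n = len(arr)
    i = j = s = 0
    states = []
    while i < n and j <= n:
        states.append((i, j, s))
        if s == target:
            break                    # found: arr[i:j] sums to target
        elif s < target:
            if j < n:
                s += arr[j]          # grow the window on the right
            j += 1
        else:
            s -= arr[i]              # shrink the window on the left
            i += 1
    return states


for state in sliding_window_trace([1, 7, 4, 2, 1, 3, 11, 5], 10):
    print(state)
```

The sum goes 0, 1, 8, 12. At 12, instead of restarting from an empty window, the left end drops 1 and then 7, and the window 4, 2, 1, 3 then grows to exactly 10 at (2, 6).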

11:21:44

So let's write it.

11:21:46

So, def subarray_sum3.

11:21:49

It takes the array and the target.

11:21:54

Now we have i, we have j.

11:21:57

And we have sum.

11:21:58

All of them.

11:21:59

Let's call it s, because sum is not a good variable name in Python:

11:22:02

it's an existing built-in function.

11:22:03

So all of these have the value zero.

11:22:05

Then we say while i is less than the length of the array.

11:22:11

Let's call that n.

11:22:14

So let's create n equals len of arr.

11:22:18

i is less than n.

11:22:22

And j is less than n plus 1.

11:22:26

Remember, because j can take the value n as well,

11:22:29

it is the exclusive end index.

11:22:32

Now at this point you want to check first.

11:22:37

So if the sum s, the current sum running sum,

11:22:40

is equal to the target,

11:22:42

then we simply return i,

11:22:45

j.

11:22:47

elif s is less than the target,

11:22:51

then we simply increment j.

11:22:56

Okay, so now we can move the window forward.

11:22:59

So we are incrementing j if the sum is less than the target.

11:23:03

So we increment j.

11:23:05

But before we increment j,

11:23:06

we should add the jth element to maintain the running sum.

11:23:09

So here we say s plus equals array of j.

11:23:20

And remember j can take the value n as well.

11:23:23

So that's where we do this.

11:23:26

Only if j is less than n.

11:23:29

If there is indeed an element for us to add.

11:23:32

This is the error we faced last time.

11:23:34

And you will discover this when you write the test anyway.

11:23:37

And then we say elif s is greater than target.

11:23:40

And we can also just say else here,

11:23:42

but just for clarity let's say elif.

11:23:44

In this case,

11:23:45

what we want to do is we want to move i forward.

11:23:48

So suppose we end up in a situation like this.

11:23:51

And we want to move this forward.

11:23:53

For that we first need to subtract array of i from s.

11:23:56

So we say s minus equals array of i,

11:23:58

which is the same as

11:24:00

s equals s minus

11:24:05

array of i.

11:24:07

And then we increment i.

11:24:09

So we move the left end of the window forward as well.

11:24:13

So we then repeat this.

11:24:15

So we first move j to a point.

11:24:17

Then, as soon as we cross the target,

11:24:19

we start increasing i.

11:24:20

And then we keep doing that to match the target.

11:24:22

And then finally we return None comma None,

11:24:25

if we have not found it.

11:24:27

So that's our subarray sum 3.
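Assembled from the walkthrough, a sketch of `subarray_sum3` could look like this. It assumes non-negative array values, as the problem states, and the example array is hypothetical. The window is always arr[i:j] and s is its running sum.

```python
def subarray_sum3(arr, target):
    """Sliding window in O(n). Assumes arr contains only non-negative numbers."""
    n = len(arr)
    i, j, s = 0, 0, 0          # window is arr[i:j]; s is its running sum
    while i < n and j < n + 1:
        if s == target:
            return i, j
        elif s < target:
            if j < n:          # only add if there is an element to add
                s += arr[j]
            j += 1             # grow the window on the right
        elif s > target:
            s -= arr[i]        # shrink the window on the left
            i += 1
    return None, None


arr = [1, 7, 4, 2, 1, 3, 11, 5]  # hypothetical example input
print(subarray_sum3(arr, 10))    # (2, 6)
```

When a matching subarray exists, this version can return a different valid (i, j) pair than the brute force (for example when the array contains zeros), but any pair it returns sums to the target.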

11:24:30

This seems like the most optimized solution.

11:24:33

And let's test it out.

11:24:35

So here we have sub array sum 3.

11:24:42

And let's test sub array sum 3 here as well.

11:24:45

Seems like it worked.

11:24:47

Let's see.

11:24:48

So if you put in 10 here,

11:24:50

you get 2 comma 6.

11:24:52

Let's say this is 4, 2, 1, 3:

11:24:55

0 comma 4.

11:24:58

Let's put in 12 here.

11:25:03

That doesn't show up.

11:25:05

Let's put in 17 here.

11:25:07

0 comma 5.

11:25:09

13: 1 comma 5.

11:25:15

Let's try 19.

11:25:16

That's 3 comma 6.

11:25:17

Let's see 1 plus 3 plus 7 plus 9.

11:25:20

Yeah, that has the value 19.

11:25:22

Let's throw in a zero there and see if it works with zeros.

11:25:26

That's 3 comma 7.

11:25:28

Works fine.

11:25:29

And let's try a case where no subarray works out.

11:25:32

Yeah.

11:25:33

Okay.

11:25:34

Great.

11:25:35

So this solution is correct too.

11:25:37

Again, if you don't have the option to run the code,

11:25:40

you can simply pick one example

11:25:42

and walk through the working of the example.

11:25:45

Now we have sub array sum 3.

11:25:48

And once again,

11:25:50

we are ready to analyze the complexity.

11:25:53

This is somewhat tricky.

11:25:55

It's a little bit unusual because there is a while loop with two variables.

11:25:59

But remember that in each iteration of the while loop,

11:26:01

either we exit, which is the best case,

11:26:03

so we can ignore that.

11:26:04

or we increment either j or i.

11:26:09

So we increment j or we increment i.

11:26:15

And if we increment.

11:26:20

So j can go from the values 0 to n.

11:26:23

And i can go from the values 0 to n minus 1.

11:26:26

So the total number of increments can be,

11:26:28

at most, the sum of the number of possible values

11:26:31

of i and the number of possible values of j.

11:26:33

Remember, this is not a product this time,

11:26:35

because you do not have a nested loop.

11:26:37

So for each value of i, you're not doing this.

11:26:39

Rather, in each iteration you increment

11:26:42

only one of them.

11:26:45

So the total number of values i can take is n.

11:26:48

The total number of values j can take is n plus 1.

11:26:51

So the total becomes,

11:26:54

the number of iterations is at most 2n plus 1.

11:26:57

Now, of course, there's the,

11:26:59

you can verify that a constant amount of work is being done here.

11:27:03

So we finally end up with the conclusion that this is an order n algorithm.
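The 2n plus 1 bound can also be checked empirically. The sketch below (a hypothetical instrumented copy of the sliding-window loop, run on random non-negative arrays) counts loop iterations. Every iteration except a final, returning one moves i or j exactly once, and i stays at most n while j stays at most n plus 1, so the count never exceeds 2n plus 1.

```python
import random


def sliding_window_iterations(arr, target):
    """Run the subarray_sum3 loop and also count its iterations."""
    n = len(arr)
    i = j = s = 0
    iterations = 0
    while i < n and j < n + 1:
        iterations += 1
        if s == target:
            return (i, j), iterations
        elif s < target:
            if j < n:
                s += arr[j]
            j += 1
        else:
            s -= arr[i]
            i += 1
    return (None, None), iterations


random.seed(0)
worst_ratio = 0.0
for _ in range(1000):
    n = random.randint(1, 50)
    arr = [random.randint(0, 20) for _ in range(n)]
    _, iterations = sliding_window_iterations(arr, random.randint(0, 300))
    assert iterations <= 2 * n + 1          # each non-final iteration moves i or j exactly once
    worst_ratio = max(worst_ratio, iterations / (2 * n + 1))
print(f"largest iterations / (2n + 1) seen: {worst_ratio:.2f}")
```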

11:27:09

So this is finally an order n algorithm.

11:27:12

So this is a good example of a problem where the step-by-step approach,

11:27:17

coming up with a simple solution,

11:27:19

And then thinking about the,

11:27:22

the inefficiency in the problem and then applying.

11:27:25

in this case just common sense, to remove the inefficiency step by step,

11:27:31

leads to a very good solution, in fact.

11:27:35

So you start out with an order n cube solution.

11:27:38

The order n cube solution is going to be pretty slow once you start hitting,

11:27:41

Let's say even a thousand,

11:27:44

even a thousand elements.

11:27:46

If you have 10,000 elements that will take forever,

11:27:48

it will take maybe an hour or so.

11:27:50

If you have a million elements,

11:27:52

it will take hundreds of years. On the other hand,

11:27:57

order n can work fine

11:27:59

all the way up to a billion elements.

11:28:01

So there's a huge difference between the,

11:28:04

subarray sum 1, 2, and 3.

11:28:07

Subarray sum 3 can work even for a billion elements.

11:28:10

Subarray sum 1 will take forever,

11:28:13

even for a hundred thousand elements.

11:28:18

and subarray sum 2 is in between,

11:28:20

and you can do the math.
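Here is one way to do the math, as a back-of-the-envelope sketch. The rate of 10^8 simple operations per second is an assumption, not a measurement; real timings depend on the machine, on Python overhead, and on how much work each step does, so only the orders of magnitude matter here.

```python
ASSUMED_OPS_PER_SECOND = 10**8          # assumption: a rough rate, not a measurement
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def estimated_seconds(num_ops):
    """Estimated running time for num_ops simple operations at the assumed rate."""
    return num_ops / ASSUMED_OPS_PER_SECOND


for n in (1_000, 10_000, 1_000_000):
    print(f"n = {n:>9,}: n^3 ~ {estimated_seconds(n**3):,.0f} s, "
          f"n^2 ~ {estimated_seconds(n**2):,.2f} s")

print(f"n^3 at n = 1,000,000 ~ {estimated_seconds(10**18) / SECONDS_PER_YEAR:,.0f} years")
print(f"n at n = 1,000,000,000 ~ {estimated_seconds(10**9):,.0f} s")
```

Under this assumed rate, the cubic version needs about 10,000 seconds (close to three hours) for 10,000 elements and roughly 300 years for a million elements, while a linear pass over a billion elements takes on the order of ten seconds.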

11:28:25

And this technique,

11:28:26

where you can almost certainly tell what the next step is.

11:28:31

So this was not really related to any of the algorithms or data structures that we have talked about.

11:28:38

This is what is called a greedy approach where you know some optimal strategy about the problem.

11:28:42

In this case,

11:28:43

you know that we can calculate the sums by maintaining a running sum.

11:28:47

So we just do that.

11:28:48

And then you also know that as soon as it becomes greater than a target,

11:28:51

we need to break out.

11:28:52

And then you also know that when it becomes greater than the target,

11:28:55

you can simply move i forward instead.

11:28:57

So this is what is called a greedy approach,

11:29:00

Where you somehow know that just doing this will fix it.

11:29:04

There is no special technique to be applied.

11:29:06

And these problems are somewhat tricky,

11:29:09

but you get the hang of these problems as well.

11:29:11

If you search for greedy problems online,

11:29:13

you get the hang of these by solving a few practice exercises.

11:29:17

Okay.

11:29:18

So that's our first interview problem.

11:29:23

And we've solved it in about 45 minutes.

11:29:26

And this is approximately how long you will have for an interview.

11:29:30

There are a couple of minutes of introduction,

11:29:35

maybe a few minutes of you talking about a project

11:29:38

and the interviewer asking you questions.

11:29:40

But then the next 30 to 40 minutes will be dedicated

11:29:44

towards solving a problem.

11:29:49

And this is roughly what the process will look like.

11:29:53

Let's do one more example.

11:29:55

Let's pick another interview question.

11:29:57

And let's see if we can solve this one.

11:29:59

So this is slightly different.

11:30:01

So this gives us one more variation to study.

11:30:16

By the way, to run these,

11:30:18

you simply click the run button and select run on binder.

11:30:20

Okay. So this is an interview question that was asked

11:30:30

during a coding interview at Google.

11:30:33

And the question is: given two strings A and B,

11:30:36

Find the minimum number of steps required to convert A into B.

11:30:41

So what you can do is perform operations,

11:30:45

and each operation is counted as one step.

11:30:48

And the operations you can perform on a word are these.

11:30:51

You can either insert a character into the word

11:30:54

or you can delete a character from the word.

11:30:57

So for instance here,

11:30:59

you can see that if you are trying to convert

11:31:02

intention into execution.

11:31:04

So either you can insert a character, for example,

11:31:06

you could insert C here,

11:31:07

or you can delete a character.

11:31:09

For example, you can delete I here

11:31:11

or you can replace a character.

11:31:13

That is you can take N and replace it with E.

11:31:16

You can take T and replace it with X.

11:31:18

And E does not need to be replaced.

11:31:20

And here we've inserted C.

11:31:22

And then here we substituted N with U.

11:31:24

So we've taken the word intention

11:31:26

and by performing a few changes,

11:31:28

character by character by either inserting,

11:31:30

deleting or replacing a character.

11:31:32

We have converted it into the string execution.

11:31:39

So the number of steps required here is one, two, three, four, five.
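The five steps described above can be replayed in code to confirm that they really turn "intention" into "execution". The helper `apply_operations` and the (operation, position, character) encoding are made up for this illustration.

```python
def apply_operations(word, operations):
    """Apply (op, position, char) edits one at a time and return the resulting string."""
    chars = list(word)
    for op, pos, ch in operations:
        if op == "delete":
            del chars[pos]
        elif op == "replace":
            chars[pos] = ch
        elif op == "insert":
            chars.insert(pos, ch)      # the new character lands at index pos
        else:
            raise ValueError(f"unknown operation: {op}")
    return "".join(chars)


steps = [
    ("delete", 0, None),   # intention -> ntention  (delete i)
    ("replace", 0, "e"),   # ntention  -> etention  (n -> e)
    ("replace", 1, "x"),   # etention  -> exention  (t -> x)
    ("insert", 3, "c"),    # exention  -> execntion (insert c)
    ("replace", 4, "u"),   # execntion -> execution (n -> u)
]
print(apply_operations("intention", steps), len(steps))   # execution 5
```

This only shows that five steps are enough. Showing that no sequence of four steps works is the part left as the challenge.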

11:31:43

Now here's a challenge for you.

11:31:45

Try and work this out on paper and prove that this is the best solution.

11:31:49

Remember, we need to find the minimum number of steps required to convert A to B.

11:31:55

So that's the problem.

11:31:58

And this is a moderately hard problem.

11:32:02

And variations of this show up as well.

11:32:05

So let's start applying the method.

11:32:08

Now when you hear the problem,

11:32:11

a solution may not strike you upfront.

11:32:13

That's perfectly alright, don't panic.

11:32:15

Sometimes when you're not able to immediately come up with a solution or identify how to solve this problem,

11:32:20

you enter a sort of panic and then you're unable to think.

11:32:23

Don't do that.

11:32:24

Remember have faith in the method.

11:32:26

And we will apply the method and come up with a solution.

11:32:29

Step by step.

11:32:31

So the first thing is to state the problem in your own words.

11:32:36

Given two strings,

11:32:38

we need to perform a series of operations on the first string.

11:32:42

The operations could be a deletion of a character.

11:32:45

Substitution of a character with another character or insertion of a character.

11:32:50

And through these operations, we need to convert it into the second string.

11:32:54

Okay, we have understood the problem.

11:32:56

If the interviewer had not given an example,

11:32:59

either you can state the example or you can just ask for an example.

11:33:02

Whatever works for you.

11:33:05

So we've stated the problem.

11:33:07

Now what are the inputs to the problem?

11:33:09

The inputs are two strings.

11:33:10

So the inputs are strings like intention and execution.

11:33:14

So let's call them str1:

11:33:19

this is intention. And str2:

11:33:24

This is execution.

11:33:27

Now one thing you have to be careful about here is you do not want to capitalize

11:33:32

because sometimes what might happen is this I may match up with an I here in the proper solution.

11:33:38

But Python obviously treats small and capital letters differently.

11:33:42

Python doesn't know that the i which is lowercase and the I which is uppercase are the same.

11:33:48

So you will not be able to compare them.

11:33:49

So just to keep things simple, either make everything uppercase or make everything lowercase.
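A two-line check shows the problem, and lowercasing both inputs once up front avoids it. The capitalized "Intention" below is just an illustration.

```python
str1 = "Intention"
str2 = "execution"

print(str1[0] == "i")          # False: 'I' and 'i' are different characters
print("I".lower() == "i")      # True

# Normalizing both inputs once, up front, avoids this class of mismatch.
str1, str2 = str1.lower(), str2.lower()
print(str1, str2)              # intention execution
```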

11:33:55

But yeah, this is what the input looks like.

11:33:58

And the output is going to be a single number.

11:34:01

So the output is simply going to be the edit distance.

11:34:04

So let's just call it output1, and it is going to be the number

11:34:07

5 and here is something that you can verify.

11:34:11

So that's the input that's the output and function signature.

11:34:16

Of course, "edit distance" is the term usually used to describe this problem.

11:34:21

But here, the concept of edit distance is not mentioned.

11:34:26

So you can give a function name that makes sense for this problem.

11:34:30

So find the minimum number of steps required to convert A to B.

11:34:34

Okay, so let's just call it min steps for now.

11:34:40

So the function definition would be min_steps, and it would take a str1 and a str2.

11:34:47

And it would return an output; for now we'll just put in pass here.

11:34:53

All right, so now we have already clarified the problem.

11:34:58

If you had any questions, this would have been a good time to ask the interviewer and make sure that you have a clear understanding.

11:35:03

Now you have stated the input output and function signature.

11:35:09

The problem has been communicated back and forth properly.

11:35:12

The first step is done.

11:35:14

The next step is to list out some test cases.

11:35:16

Once again, listing out test cases is a very good habit to show.

11:35:20

So you can say that now I'm just going to list out a few cases that I want my function to cover.

11:35:25

These will help me when writing the code.

11:35:30

Now one is the general case, which is listed above.

11:35:37

So this would be intention execution and we can take a few more examples like this.

11:35:42

Now one example could be where no change is required.

11:35:47

So you are given the same strings.

11:35:50

One case could be that all the characters need to be changed.

11:36:01

So these are the two extreme cases.

11:36:03

One is no change is required and second is all characters need to be changed.

11:36:06

Maybe added removed deleted lots of such things.

11:36:10

Then you can check both strings of equal length.

11:36:15

So in this case they are in fact of equal length.

11:36:18

You can also check strings of unequal length.

11:36:21

One of the strings is empty.

11:36:26

Your function should be able to handle that too.

11:36:31

Then you may check cases where something only requires deletion,

11:36:36

only requires addition, or only requires swapping.

11:36:43

All right such things.
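The cases listed above could be written down in the same `input`/`output` dictionary format used for earlier problems in this course. The specific strings below are examples picked for illustration (only "intention"/"execution" comes from the problem), and each expected output is the minimum number of insert, delete, or replace steps.

```python
test_cases = [
    {"input": {"str1": "intention", "str2": "execution"}, "output": 5},  # general case
    {"input": {"str1": "abc", "str2": "abc"}, "output": 0},              # no change required
    {"input": {"str1": "abc", "str2": "xyz"}, "output": 3},              # every character changes
    {"input": {"str1": "horse", "str2": "ros"}, "output": 3},            # unequal lengths
    {"input": {"str1": "", "str2": "abc"}, "output": 3},                 # first string empty
    {"input": {"str1": "abc", "str2": ""}, "output": 3},                 # second string empty
    {"input": {"str1": "abcdef", "str2": "ace"}, "output": 3},           # only deletions
    {"input": {"str1": "ace", "str2": "abcde"}, "output": 2},            # only insertions
    {"input": {"str1": "kitten", "str2": "sitten"}, "output": 1},        # only a substitution
]

for case in test_cases:
    print(case["input"]["str1"], "->", case["input"]["str2"], ":", case["output"])
```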

11:36:47

I guess this is pretty good at this point.

11:36:49

So now we can probably move forward.

11:36:51

So we have stated some test cases.

11:36:53

Now you don't need to create all the test cases right now in an interview.

11:36:56

It can take a bit of time.

11:36:58

So let's just move ahead and the next step is to come up with the simplest solution to the problem.

11:37:03

Which is also called the brute force solution.

11:37:06

So now we have a lot more information about the problem.

11:37:08

In the meantime, it has probably sunk in, and you may have been able to think of a brute force solution.

11:37:12

But if not, don't worry there is a simple trick.

11:37:16

I'll tell you which you can apply whenever you are stuck and you can't think of a brute force solution.

11:37:20

So you're looking at it:

11:37:22

Intention and execution.

11:37:23

What am I going to do?

11:37:24

Am I going to start from the left or the right?

11:37:26

How do I check which one is?

11:37:29

How do I know if this character is supposed to be inserted,

11:37:34

replaced, or deleted?

11:37:39

So the simple trick is whenever you are in doubt.

11:37:44

Think about recursion.

11:37:46

See if there is a way to solve this problem recursively.

11:37:50

And what do you mean by solving a problem recursively?

11:37:52

Can you reduce the overall problem to.

11:37:57

a combination of one or more subproblems?

11:38:01

So if you take a portion of the input, can you solve the same problem on that portion of the input

11:38:07

And then use that to solve the overall problem.

11:38:10

So let's see.

11:38:11

Let's see if there is a recursive solution possible here.

11:38:14

So here I have the same thing.

11:38:16

Intention and execution.

11:38:18

Now with recursive solutions, you normally start by looking at either the first character or the last character.

11:38:24

So let's look at the first character of each string.

11:38:27

So we're given these two strings and we need to find

11:38:30

The number of operations to change this string into this string.

11:38:33

Let's look at the first character.

11:38:38

Now suppose the first characters were in fact equal. Suppose this was not

11:38:43

intention, but entention, and this was execution.

11:38:48

So now we compare the first characters and we know that the first characters are equal.

11:38:52

Okay, so if the first characters are equal, then obviously this character does not need to be deleted,

11:39:00

removed, or switched.

11:39:04

It's already matching.

11:39:06

So what we can do is we can just ignore the first characters.

11:39:10

And we can simply look at the remaining string.

11:39:14

Okay, so we solve it for ntention and xecution, because the first characters are already equal.

11:39:18

Let's write that down so that we don't forget it.

11:39:22

And this is the recursive solution.

11:39:26

Now this is where you can take a moment to work this out on pen and paper and that's perfectly all right.

11:39:31

What helps is to keep talking about what you're doing.

11:39:35

For the recursion, the first thing we know is: if the first characters are equal,

11:39:43

then ignore them in both strings.

11:39:46

So just ignore the first character of both strings and recursively solve the problem for the two substrings that remain without their first characters.

11:39:57

So you exclude e from this one and e from this one, and solve the problem for these two. Perfect.

11:40:02

Suppose the first character isn't equal. So that's another case.

11:40:05

Right. So that is the case where you have intention and execution.

11:40:10

So if the first character is not equal, then either the first character has to be deleted or the first character has to be swapped.

11:40:19

So you may have to swap i with e, or maybe something needs to be added before the first character.

11:40:26

Okay. Now let's see one by one.

11:40:30

So if the first character is not equal.

11:40:39

Either it has to be deleted or swapped or a character inserted before it.

11:40:53

There are only three possibilities, right?

11:40:55

Of course, we may also do other things, like inserting characters after it, and so on.

11:41:01

But at that position, after applying an operation, either the first character will get deleted, or the first character will get swapped and changed to e,

11:41:11

or something will be inserted in front of it, so the original first character will become the second character.

11:41:18

Now let's look at each case. The first case is if it is deleted.

11:41:22

Now the power and the beauty of recursion is that we don't need to guess which option is right.

11:41:27

We can try all three recursively and then simply pick the best one.

11:41:31

So suppose we choose to delete the first character.

11:41:34

So suppose we say that we are deleting the first character.

11:41:37

Now what that means is we've performed one operation and we've deleted the first character.

11:41:43

So now what we're left with is this.

11:41:47

So now the second string remains the same. Only the first string has changed: we have lost its first character.

11:41:54

Now what we end up with is the subproblem where we need to find the minimum number of steps to change ntention into execution.

11:42:03

Okay. So in this case, if it has to be deleted, then

11:42:13

recursively solve after ignoring the first character of STR1.

11:42:21

Okay. That's one possibility.

11:42:27

And you get the recursive solution and you simply add one to it. That tells you the solution if you delete the first character.

11:42:35

The next option is that we change the first character i to e.

11:42:39

If we change the first character i to e, one operation has been performed, and now these two have become equal.

11:42:46

Now that these two have become equal we can move this forward and we can move this forward.

11:42:52

Now we can simply recursively solve the problem for ntention and xecution.

11:42:56

Find the minimum edit distance between the two and simply add one to it to get the number of steps required to change intention to execution.

11:43:07

That is, by swapping the first character from i to e.

11:43:12

So in this case you recursively solve after ignoring the first character of each.

11:43:24

So in both cases it is one plus a recursive solution: for deletion we ignore the first character of STR1, and for swapping we ignore the first character of each string.

11:43:33

The one accounts for the operation we have performed.

11:43:36

Okay. Now the final case.

11:43:39

The final case is you have intention and execution. Now we decide that we are going to shift this string forward and introduce an e here.

11:43:51

So we are going to introduce e here.

11:43:55

Now what happens is the e is matching the e.

11:43:59

Now i has shifted one position to the right.

11:44:02

So effectively we need to recursively solve the problem for the original string, intention,

11:44:10

and the second string with its first character removed, because we have inserted something before the first character of the first string,

11:44:16

and that inserted character matches the first character of the second string.

11:44:19

And hence we simply need to recursively solve the problem for these two.

11:44:23

In this case, the solution is one plus the recursive solution after ignoring the first character of STR2.
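To make the three options concrete, here is a minimal sketch that represents each subproblem by the strings that remain. The helper name candidate_subproblems is hypothetical, not from the course; it only lists the smaller pairs of strings, each of which would cost one operation plus its own edit distance.

```python
def candidate_subproblems(str1, str2):
    """Hypothetical helper: for strings whose first characters differ,
    return the smaller pair of strings left after each single operation."""
    return {
        "delete": (str1[1:], str2),      # drop str1's first character
        "swap": (str1[1:], str2[1:]),    # str1[0] becomes str2[0]; skip both
        "insert": (str1, str2[1:]),      # new char before str1[0] matches str2[0]
    }


for op, (rest1, rest2) in candidate_subproblems("intention", "execution").items():
    print(f"{op:>6}: 1 + edit distance({rest1!r}, {rest2!r})")
```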

11:44:38

Okay, sounds good. Looks like we've done that. Now, what is the end case going to look like? Remember, in recursion,

11:44:47

this is all well and good, but at some point we are going to hit some kind of an end.

11:44:51

So let's see. Let's see if we can define such an end scenario.

11:44:55

So let's say we have been performing recursion and we end up in a situation like this, where

11:45:03

there is nothing left in the second string, but you still have some characters left in the first string.

11:45:08

So you are at this position now.

11:45:10

And here this is gone. There's nothing left in the second string.

11:45:15

So in this case, to change TION into the empty string, all we need to do is delete all four characters.

11:45:23

So if the second string becomes empty, you simply find the number of remaining characters in the first string and delete them.

11:45:32

That is the number of operations. The other possibility is that the second string still has some characters,

11:45:40

but you've run out of characters in the first string.

11:45:43

So if you run out of characters in the first string, but the second string still has some characters,

11:45:48

then you have the empty string and you need to turn it into this.

11:45:56

You convert this empty string into TION; that is the recursive problem you're solving.

11:46:00

And that you can do by adding TION.

11:46:04

So you add TION and that is again going to be four steps which is the number of characters remaining in the second string.

11:46:13

So these are the two end cases. Now of course if both of them are empty then the answer is zero but if either of them is empty the answer is the number of remaining elements in the other one.
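The cases worked out so far can be collected into a single recurrence. The notation here is ours, not the course's: s_1 and s_2 are the two strings, n_1 and n_2 their lengths, and d(i_1, i_2) is the minimum number of operations to turn the part of s_1 starting at index i_1 into the part of s_2 starting at index i_2. The answer is d(0, 0); when both strings are exhausted, the first case gives 0.

```latex
d(i_1, i_2) =
\begin{cases}
  n_2 - i_2 & \text{if } i_1 = n_1 \quad \text{(insert the rest of } s_2\text{)} \\
  n_1 - i_1 & \text{if } i_2 = n_2 \quad \text{(delete the rest of } s_1\text{)} \\
  d(i_1 + 1,\, i_2 + 1) & \text{if } s_1[i_1] = s_2[i_2] \\
  1 + \min\bigl\{ d(i_1 + 1,\, i_2),\; d(i_1 + 1,\, i_2 + 1),\; d(i_1,\, i_2 + 1) \bigr\} & \text{otherwise}
\end{cases}
```

The three terms inside the minimum are, in order, deletion, swapping, and insertion.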

11:46:24

So let's write the solution.

11:46:26

Now we've figured out the solution. It took some time, but again, this is not a very straightforward problem.

11:46:30

There are a few cases to figure out.

11:46:33

And while you are identifying each case, you can either say it out loud to the interviewer or write it as a comment,

11:46:43

whichever you are more comfortable with, because the interviewer cannot see the work that you're doing on paper.

11:46:50

So it's very important for you to be able to convey it and that is why all this while in this course we have been saying that you need to express the solution in simple words.

11:46:58

Because you need to tell the other person that you know the solution, and they should be able to understand what you're saying without looking at your work or at the images that you've drawn.

11:47:11

And a great way to do it is either by writing or by speaking.

11:47:16

Let's define it then: def.

11:47:19

What's it called? min_steps.

11:47:23

And min_steps takes STR1 and STR2.

11:47:28

Great.

11:47:30

Now we are doing recursion, and in recursion what we're tracking is which character we are currently at.

11:47:37

So we could be at the 0th character or the first character or the second character in string 1.

11:47:43

And we could be at the 0th, first, or second character in string 2.

11:47:48

So the starting point of this window determines the sub string that we're solving the problem for.

11:47:54

So ideally, when we want to solve this problem for these two substrings, we can simply pass those substrings.

11:48:02

But creating substrings has a cost, because you have to copy those characters out, allocate some memory, and put them into a new place.

11:48:11

So an easier way is to simply keep a pointer.

11:48:14

So we will keep two pointers, i1 and i2.

11:48:18

And these will signify that, while computing min_steps,

11:48:24

we should skip the first i1 characters of STR1, that is, start from index i1,

11:48:29

and start from index i2 for STR2.
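A small sketch of the trade-off (both function names are hypothetical): the two functions describe the same subproblem, but the first needs the caller to build slices such as str1[2:], and a slice like that copies its characters into a new string, while the second works from the original strings plus the two pointers.

```python
def remaining_lengths_sliced(sub1, sub2):
    # The caller had to build sub1 = str1[i1:] and sub2 = str2[i2:];
    # each of those slices copies characters into a new string.
    return len(sub1), len(sub2)


def remaining_lengths_indexed(str1, str2, i1, i2):
    # Same information from the original strings plus two pointers; nothing is copied.
    return len(str1) - i1, len(str2) - i2


str1, str2 = "intention", "execution"
print(remaining_lengths_sliced(str1[2:], str2[3:]))     # (7, 6)
print(remaining_lengths_indexed(str1, str2, 2, 3))      # (7, 6)
```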

11:48:33

So if the starting index i1 is equal to the length of string 1,

11:48:42

this is the end case, and remember, while coding, the end case is always written first.

11:48:47

So if this is equal to the length of STR1, then we have seen here that we need to perform this many additions.

11:48:55

So in this case we simply return length of STR2 minus i2.

11:49:06

And you can verify that this is the number of additions required.

11:49:10

elif, on the other hand, i2 is equal to the length of STR2,

11:49:15

which means that you have exhausted the second string but the first string still has some characters left.

11:49:21

So in this case, you need to delete the remaining characters in the first string.

11:49:25

So you just return length of STR1 minus i1.

11:49:30

So now we have solved the trivial cases.

11:49:35

Now let's see: elif STR1 of i1 equals STR2 of i2,

11:49:42

which compares the first characters of each substring that we are working with.

11:49:47

Remember, we are just using indices as an optimization.

11:49:52

What we really want to work with are substrings.

11:49:54

So the first characters of each substring, STR1 of i1 and STR2 of i2, are equal.

11:49:59

Now if the first characters are equal, say e and e, then we simply ignore both and solve the problem

11:50:04

for the remaining strings.

11:50:06

So we simply say return min_steps.

11:50:10

And we pass in STR1, we pass in STR2.

11:50:15

And then we simply pass in i1 plus 1 here and we pass in i2 plus 1 here.

11:50:19

So what this is saying is that now we want to recursively solve the problem.

11:50:24

for a substring starting at i1 plus 1.

11:50:28

So we have ignored the first character of the current substring of the first string.

11:50:31

Similarly, we have ignored the first character of the current substring of the second string.

11:50:36

So we ignore the first characters and that's it.

11:50:38

And there are no steps required here.

11:50:40

No operations required here right now because the first characters are equal.

11:50:44

Now finally, the final case: else.

11:50:47

Here we want to return one.

11:50:50

So we have to perform one operation.

11:50:52

Either it is an insertion, deletion or swap.

11:50:57

And what we can do is recursively compute the minimum number of steps required for each case: insertion, deletion, and swapping,

11:51:07

And simply pick the minimum one.

11:51:09

And to that we add one.

11:51:11

Then we get the minimum total number of steps we need to perform for the entire string, right?

11:51:16

So again recursion is very useful because you can simply assume that you have the function which solves the problem.

11:51:22

And you simply need to take the results of the subproblems and combine them.

11:51:25

So we take the minimum. The first option is that the first character of str1 has to be deleted.

11:51:33

So let's say we choose to delete i.

11:51:36

If we choose to delete i, then that means we have to solve the problem for these two.

11:51:41

So we say one plus recursively solve the problem after ignoring the first character of str1.

11:51:47

So we call min_steps for str1, str2.

11:51:52

Now since we've deleted the first character of str1, we can skip ahead to the next index, i1 plus 1,

11:51:57

because we are now solving the problem starting from the next index.

11:52:01

And i2 remains the same, right?

11:52:03

Remember, here we have not affected the second string.

11:52:05

So we need to solve this problem recursively.

11:52:08

So this was the case of deletion.

11:52:15

Next we have the option where you have swapped the first character.

11:52:21

So we have taken i and converted it into an e.

11:52:26

If we did that, then these two characters are now matching.

11:52:31

So now we can simply recursively solve the problem from the next character onwards, after ignoring the current character.

11:52:37

So this becomes min_steps of str1, str2, i1 plus 1, i2 plus 1.

11:52:47

So this is swap or replace.

11:52:54

And you might notice that this turns out to be the same recursive call as this one,

11:52:59

except that we add one to it because we have done the swap.

11:53:04

And finally, if you are inserting,

11:53:09

so if you are inserting something here, let's say e,

11:53:17

So in this case, what we'll do is now we'll recursively solve the problem for intention and execution without the e in front.

11:53:28

So we skipped the first character of the second string.

11:53:32

So we have min_steps of str1, str2, i1, and i2 plus 1.

11:53:39

So this is rather nice and symmetric.

11:53:45

And that's it. So this should be it. Let's run this.

11:53:50

Okay, there is a syntax error here that's perfectly fine.

11:53:56

There needs to be a comma here; that's fine too.

11:54:01

I make a lot of syntax errors all the time, and of course off-by-one errors; I'm sure there are a few.

11:54:07

But yes, this is the recursive function for the minimum number of steps. Not too bad.

11:54:14

Two, four, six, around eight lines of code.
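Putting the pieces together, the recursive function described above might look like this sketch. The name min_steps and the parameters str1, str2, i1, i2 follow the walkthrough; giving i1 and i2 default values of 0, so the function can be called with just the two strings, is our assumption.

```python
def min_steps(str1, str2, i1=0, i2=0):
    """Minimum insertions, deletions, and swaps to turn str1[i1:] into str2[i2:]."""
    if i1 == len(str1):
        # End case: str1 is exhausted, so insert what is left of str2
        return len(str2) - i2
    elif i2 == len(str2):
        # End case: str2 is exhausted, so delete what is left of str1
        return len(str1) - i1
    elif str1[i1] == str2[i2]:
        # First characters match: no operation needed, skip both
        return min_steps(str1, str2, i1 + 1, i2 + 1)
    else:
        # Pay one operation and take the cheapest of the three options
        return 1 + min(
            min_steps(str1, str2, i1 + 1, i2),      # delete str1[i1]
            min_steps(str1, str2, i1 + 1, i2 + 1),  # swap str1[i1] for str2[i2]
            min_steps(str1, str2, i1, i2 + 1),      # insert str2[i2] before str1[i1]
        )


print(min_steps("int", ""))            # 3
print(min_steps("", "int"))            # 3
print(min_steps("saturday", "sunday")) # 3
```

This version re-solves the same (i1, i2) pairs many times, so it is only practical for short strings.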

11:54:18

And let's test out some of the test cases here.

11:54:22

I'm just going to copy the test cases out here below.

11:54:26

And let's test a general case which is intention and exception.

11:54:31

So let's see: min_steps,

11:54:38

Intention and exception.

11:54:42

It says four.

11:54:47

Okay, why does it say four?

11:54:53

Maybe let's test.

11:54:56

Let's test a simpler case first, where one of the strings is empty.

11:55:00

Let's say we have intention and one of the strings is empty.

11:55:05

Let's just say int, with the other string empty.

11:55:10

This looks fine. We will need to delete all three of these.

11:55:13

And that tests out one of the end cases,

11:55:17

namely the case where the second string is empty.

11:55:22

Now we can test the other case, where the first string is empty.

11:55:26

In this case also, the solution is three. Great, looks fine.

11:55:33

Let's test the case where STR1 of i1 and STR2 of i2 are equal.

11:55:38

So if you have int and, let's say, india,

11:55:46

then i and n would be the same.

11:55:49

So these would get skipped and here is where the recursion would kick in.

11:55:53

So t would have to be changed to d, and then you would have to add i and a.

11:55:57

That looks fine too.

11:55:59

And let's check intention and exception once again. I don't know what's wrong here.

11:56:08

Let's see.

11:56:10

So, is it possible to do it with four? I don't know; maybe it's possible to do it with just four changes.

11:56:32

If you change i, or you delete i, and then you delete n, and then you delete p...

11:56:39

Delete i, substitute these two...

11:56:49

I don't think it is possible with four changes.

11:56:54

So there's probably an issue.

11:57:00

I don't know what's wrong here. It's possible I may have made a mistake here.

11:57:05

Let me try another one: saturday and sunday.

11:57:13

Okay. So saturday needs to be changed to sunday.

11:57:20

Now S is the same. So ATUR needs to be changed to UN.

11:57:26

So u remains the same. Now what we can do is delete a, delete t,

11:57:33

and replace r with n. So this seems to be fine.

11:57:44

All right. So, unless I'm missing something here...

11:57:48

So you have intention and you have exception.

11:57:55

Unless I'm not seeing something it seems like we may have made a mistake.

11:57:59

One thing we could do is we can simply print out the strings that we're checking.

11:58:03

So let's print STR1 from i1 onwards and STR2 from i2 onwards.

11:58:25

We were first checking intention and exception, then we check...

11:58:29

Let's also print the result here.
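One way to do this kind of print debugging is a traced copy of the recursive function. The name min_steps_traced and its depth parameter are our additions for illustration. It prints each subproblem as the remaining substrings, indented by recursion depth, followed by the result returned for it; slicing is used only for printing.

```python
def min_steps_traced(str1, str2, i1=0, i2=0, depth=0):
    indent = "  " * depth
    # Show the subproblem being solved at this call
    print(f"{indent}{str1[i1:]!r} vs {str2[i2:]!r}")
    if i1 == len(str1):
        result = len(str2) - i2
    elif i2 == len(str2):
        result = len(str1) - i1
    elif str1[i1] == str2[i2]:
        result = min_steps_traced(str1, str2, i1 + 1, i2 + 1, depth + 1)
    else:
        result = 1 + min(
            min_steps_traced(str1, str2, i1 + 1, i2, depth + 1),      # delete
            min_steps_traced(str1, str2, i1 + 1, i2 + 1, depth + 1),  # swap
            min_steps_traced(str1, str2, i1, i2 + 1, depth + 1),      # insert
        )
    # Show the answer returned for that subproblem
    print(f"{indent}-> {result}")
    return result


min_steps_traced("ab", "b")
```

For longer inputs the trace grows very quickly, so it is most useful on small examples.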

11:58:41

Okay. So at this point, I would probably look through the output here and see if it is

11:58:47

coming out properly. So you have intention and exception. First we delete i.

11:58:50

Then we delete n. Then we delete t. Then we delete... okay. Then we compare e and e.

11:58:55

So then we come back to ntention and exception, and so on.

11:59:00

I think this might take some time to fix.

11:59:03

We'll come back to intention and exception.

11:59:05

But supposing we've solved the...

11:59:13

Supposing we've written the recursive solution correctly.

11:59:16

We have the recursive solution here.

11:59:18

So let me just grab that and put that in here.

11:59:22

Let's see what's different.

11:59:37

Okay. Probably the answer is four because I'm still getting four.

11:59:40

But supposing we have the recursive solution here.

11:59:42

We have min edit distance. This is the recursive solution.

11:59:46

And now what you need to do is you need to find out the complexity of the recursive solution.

11:59:52

Now to find the complexity of the recursive solution.

11:59:55

What we can do is simply look at the recursive calls in the worst case.

12:00:00

You start out with a string of length N1

12:00:04

and, let's say, a string of length N2.

12:00:06

We have one string of length N1 and one string of length N2.

12:00:13

Then either you call min edit distance with I1 plus 1 and I2 plus 1.

12:00:20

So STR1 and STR2 you call them with I1 plus 1 and I2 plus 1.

12:00:25

So that's one possibility.

12:00:30

Or you make three recursive calls.

12:00:33

Now, the single recursive call is the good case, where these two match up.

12:00:36

So we want to look at the worst case where these two things don't match up.

12:00:40

So in that case you make three recursive calls.

12:00:42

So you make three recursive calls.

12:00:44

And in each recursive call you are then going to reduce the problem size by one.

12:00:49

So you're either going to decrease the size of the first string,

12:00:55

Or you're going to decrease the size of the second string.

12:00:58

Or you're going to decrease the sizes of both strings.

12:01:01

So just to keep things simple.

12:01:04

Let's assume that in all three cases we are decreasing the size of just one of the strings by one.

12:01:08

So we are decreasing the total problem size which is N1 plus N2 by one.

12:01:13

So the number of levels of recursion is going to be the total length of the two strings.

12:01:20

So let's just draw that recursion tree here as well.

12:01:25

So let's take this.

12:01:27

So here you have N1 comma N2.

12:01:30

So let's assume these are the lengths of the two strings.

12:01:38

Now, what happens to this problem of size N1 plus N2 is that it makes three recursive calls.

12:02:01

So there are three recursive calls.

12:02:03

So let's just draw those three recursive calls.

12:02:08

So we have those three recursive calls here.

12:02:16

Let's take this too.

12:02:23

And then, in those three recursive calls,

12:02:26

what we have is:

12:02:36

Either you reduce the size of the first string, or you reduce the size of the second string, or you reduce the sizes of both strings.

12:02:45

So either you end up with N1 minus 1 and N2.

12:02:52

And let's reduce the size of that.

12:03:01

We end up with N1 and N2 minus 1 or we end up with N1 minus 1 and N2 minus 1.

12:03:17

So these are the three recursive calls that we're doing.

12:03:20

And then each of these will once again make three more recursive calls.

12:03:28

And so on. Now, what is the overall depth of this recursion tree?

12:03:33

We can see that each time, the size of the problem reduces by at least one.

12:03:38

So if the size of the problem is N1 plus N2, in this case it reduces by one, in this case it reduces by one,

12:03:43

and in this case it reduces by two, but for simplification let's say it reduces by one here.

12:03:48

So the total number of levels in this tree is going to be at most N1 plus N2.

12:03:58

So you have three problems in the first layer; in the second layer, we'll have three squared problems.

12:04:02

In the third layer, we'll have three cubed problems: three times three times three.

12:04:06

And similarly, you'll find that at the last layer you'll have three to the power (N1 plus N2 minus 1) subproblems.

12:04:13

And if you add up all the layers, the total number of subproblems is on the order of three to the power N1 plus N2.

12:04:23

So you end up creating roughly three to the power N1 plus N2 subproblems.

12:04:31

And because of that, the complexity is of the order of three to the power of N1 plus N2 in this case.

12:04:42

So that's the complexity. Here we have a recursive solution, and its complexity is exponential: three to the power of N1 plus N2.
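The growth can be checked numerically. This sketch (count_calls is a hypothetical helper) counts the calls the plain recursion makes in the worst case described here, where no characters ever match, so every non-base call branches three ways. The exact count stays below three to the power of n1 plus n2, because branches that hit an end case stop early, but it still grows by a roughly constant factor for each character added to both strings, that is, exponentially.

```python
def count_calls(n1, n2):
    """Calls made by the plain recursion on strings of lengths n1 and n2
    when no characters match (every non-base call branches three ways)."""
    if n1 == 0 or n2 == 0:
        return 1                              # an end case: one call, no branching
    return (1                                 # this call itself
            + count_calls(n1 - 1, n2)         # delete
            + count_calls(n1 - 1, n2 - 1)     # swap
            + count_calls(n1, n2 - 1))        # insert


for n in range(1, 7):
    print(n, count_calls(n, n), 3 ** (2 * n))
```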

12:04:53

Now, at this point it makes sense to add memoization. Whenever you see a recursive solution with repeated subproblems, memoization helps; for example, right here you can see a repeated problem.

12:05:03

And then you can see that this problem will get repeated inside this problem and inside this problem too.

12:05:08

So there are a lot of repetitions, and all we need to do is remove them, which we can do with memoization.
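To see the repeated subproblems directly, this sketch (visit_counts is a hypothetical helper) runs the same branching as the plain recursion but only records, with a collections.Counter, how many times each (i1, i2) pair is visited. Any count above 1 is work that memoization would avoid.

```python
from collections import Counter


def visit_counts(str1, str2):
    """Count how often the plain recursion visits each (i1, i2) subproblem."""
    calls = Counter()

    def recurse(i1, i2):
        calls[(i1, i2)] += 1
        if i1 == len(str1) or i2 == len(str2):
            return                          # end cases make no further calls
        if str1[i1] == str2[i2]:
            recurse(i1 + 1, i2 + 1)
        else:
            recurse(i1 + 1, i2)             # delete
            recurse(i1 + 1, i2 + 1)         # swap
            recurse(i1, i2 + 1)             # insert

    recurse(0, 0)
    return calls


print(visit_counts("abc", "xyz").most_common(3))
```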

12:05:15

The memoized solution is exactly the same as the recursive solution, but before doing any computation we check a memo, a dictionary, to see if we already have the solution for the changing variables, which are I1 and I2.

12:05:33

And if we have those.

12:05:37

If we have those solutions what we need to do is just return them directly.

12:05:43

If we do not have those solutions we need to compute the solutions put them in the memo and then return the value from the memo.

12:05:50

So let's write the memoized version. We have min edit distance with STR1 and STR2.

12:06:01

And this we are calling memo.

12:06:16

Okay this we are calling memo.

12:06:19

Now we have a memo.

12:06:25

The memo is going to be a dictionary, initially empty, and then we define a function recurse.

12:06:34

So in memoization, you normally write a recursive helper function. You can write it either outside or inside.

12:06:42

Here it's inside, because then it has access to STR1 and STR2, and they do not need to be passed in.

12:06:49

So here we have I1 and I2, and the first thing we do is create a key: the key is I1 comma I2.

12:06:55

Now if key in memo which means if we have already computed the solution then we simply return memo of key.

12:07:02

If not, then we have all the other cases, so now we have elif.

12:07:06

Now we can check if I1 equals LEN of STR1.

12:07:17

In that case, don't return; set memo of key to LEN of STR2 minus I2.

12:07:28

elif I2 equals LEN of STR2,

12:07:39

then we set memo of key to LEN of STR1 minus I1.

12:07:52

Okay, in the next case we check if the first elements are equal.

12:07:56

We have the exact same logic you can see the same cases coming up here.

12:07:59

So if you have STR1 of I1 equals STR2 of I2.

12:08:06

In this case we have memo of key equals...

12:08:09

We simply ignore the first characters so we increment I1 and I2.

12:08:12

So exactly what we have done here.

12:08:14

So we simply call recurse, this time with I1 plus 1 and I2 plus 1.

12:08:19

So we always call the recursive function but inside the recursive function.

12:08:22

if it has already been computed, it will return from the memo.

12:08:26

And finally, the final case is where they are not equal.

12:08:32

So here memo of key becomes 1 plus min of let's see here.

12:08:44

So we have recurse. First, the deletion case:

12:08:49

We will ignore the first element.

12:08:52

In the deletion case, we ignore the first element of the current range from the first string.

12:08:57

So we call recurse with I1 plus 1 and I2.

12:09:01

Next, we call recurse with I1 plus 1 and I2 plus 1.

12:09:06

This is the case where we swap the first element of the first string.

12:09:10

So we can just recursively check after ignoring the first element of each.

12:09:15

And then we have recurse with I1 comma I2 plus 1, the insertion case.

12:09:22

And there we go, that's it.

12:09:26

So now we have stored it in the memo, and then we simply return memo of key at the end.

12:09:32

And finally we call recurse(0, 0), and that is our solution.
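The memoized version described above might look like this sketch. The names min_edit_distance_memo and recurse follow the walkthrough as closely as the audio allows; treat the exact spelling as an assumption. The logic is the recursive solution unchanged, except that every result is stored in the memo dictionary under the key (i1, i2) and reused on later calls.

```python
def min_edit_distance_memo(str1, str2):
    memo = {}  # maps (i1, i2) -> edit distance between str1[i1:] and str2[i2:]

    def recurse(i1, i2):
        key = (i1, i2)
        if key in memo:                      # already solved: reuse it
            return memo[key]
        if i1 == len(str1):                  # insert the rest of str2
            memo[key] = len(str2) - i2
        elif i2 == len(str2):                # delete the rest of str1
            memo[key] = len(str1) - i1
        elif str1[i1] == str2[i2]:           # first characters match: skip both
            memo[key] = recurse(i1 + 1, i2 + 1)
        else:                                # one operation plus the best option
            memo[key] = 1 + min(
                recurse(i1 + 1, i2),         # delete
                recurse(i1 + 1, i2 + 1),     # swap
                recurse(i1, i2 + 1),         # insert
            )
        return memo[key]

    return recurse(0, 0)


print(min_edit_distance_memo("intention", "exception"))  # 4
print(min_edit_distance_memo("intention", "execution"))  # 5
```

The recursion still goes as deep as n1 plus n2 calls, so for very long strings Python's default recursion limit becomes the practical constraint.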

12:09:41

And there is a syntax error; these syntax errors are easy to fix.

12:09:46

And I've just realized that the solution in this case might actually be four, because what we can do is change n to p.

12:09:56

So that's one step.

12:09:57

We can replace i n t with e x c.

12:10:01

So we replace i n t with e x c; that's three changes.

12:10:04

We don't change e and we replace n with p.

12:10:08

The solution is four, so our solution was correct.
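The hand check above can be reproduced in a couple of lines. intention and exception have the same length, so swapping every mismatched position is one valid plan, and counting the mismatches gives its cost. This only shows that four operations are enough; that fewer are impossible is what the recursive search establishes.

```python
str1, str2 = "intention", "exception"

# Equal lengths, so pair up the characters position by position.
mismatches = [(a, b) for a, b in zip(str1, str2) if a != b]
print(len(mismatches))   # 4
print(mismatches)        # [('i', 'e'), ('n', 'x'), ('t', 'c'), ('n', 'p')]
```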

12:10:11

There was no issue there.

12:10:14

In fact, what I was trying by hand earlier was not the best solution.

12:10:17

It was a suboptimal solution.

12:10:20

So this output should be four.

12:10:23

And that's okay.

12:10:24

This is something that happens all the time where you miss something.

12:10:28

And you just say that you're going to come back to it at the end, and then you move forward.

12:10:35

You assume that the code was right, and later you realize either that you were correct or what your mistake was.

12:10:44

It's probably going to happen in one of five interviews anyway.

12:10:48

Okay, so now we've written a memoized solution.

12:10:51

Great.

12:10:52

And we can start checking the memoized solution now.

12:10:54

So minimum edit distance memo.

12:10:56

Let's call min edit distance memo.

12:10:58

And we get back the value four. Looks fine.

12:11:01

Let's try saturday and sunday as well.

12:11:15

So that's three.

12:11:16

So what you will do is leave s as it is, and change ATUR to UN by removing A and T and changing R to N.

12:11:25

That seems fine.

12:11:26

Let's test out some cases like this.

12:11:29

Okay, this is three, six, eight characters.

12:11:31

So that seems right.

12:11:33

We simply delete all the characters.

12:11:35

Let's check out this.

12:11:38

Here also eight characters.

12:11:40

We have to add eight characters.

12:11:41

Let's say we have ABC and XYZ.

12:11:45

So this should be three.

12:11:47

If it is XYZK, then maybe that will be four.

12:11:52

What if it's XYZA?

12:11:54

In this case, also it's four.

12:11:57

So this seems to be working fine.

12:12:00

We have now taken the recursive solution.

12:12:03

We calculated the complexity, which was exponential,

12:12:07

identified the inefficiency, which was repeated subproblems,

12:12:11

and then fixed the inefficiency by using memoization.

12:12:17

And now, how do you compute the

12:12:21

time complexity of the memoized solution?

12:12:23

Well, the argument is that you only need to compute the solution for each key once,

12:12:28

and the computation, apart from the recursive calls, simply involves a fixed number of comparisons and an addition.

12:12:36

So the time required to compute assuming you have the recursive solutions is constant.

12:12:41

So if you simply count the number of memo keys that can possibly occur,

12:12:46

That gives you an upper bound on the total number of operations.

12:12:50

It will be some multiple of that some constant multiple.

12:12:54

So i1 can take the values 0 to n1, where n1 is the length of string 1.

12:12:59

And i2 can take the values 0 to n2, where n2 is the length of string 2.

12:13:03

So the keys in the memo are i1 comma i2.

12:13:07

So we have roughly n1 values for i1 and n2 values for i2.

12:13:10

So that makes it n1 times n2.

12:13:13

That's the number of keys, and because there's a constant amount of

12:13:18

additional time required to compute the solution for a key,

12:13:21

that is also the complexity. So the complexity is order n1 times n2.
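The key count can also be checked empirically. This variant swaps the hand-written dictionary for Python's functools.lru_cache, a different but equivalent memoization mechanism; the function name edit_distance_with_stats is hypothetical. cache_info().currsize reports how many distinct (i1, i2) pairs were stored, which can never exceed (n1 + 1) times (n2 + 1), since i1 runs from 0 to n1 and i2 from 0 to n2.

```python
from functools import lru_cache


def edit_distance_with_stats(str1, str2):
    """Return (edit distance, number of distinct (i1, i2) subproblems solved)."""

    @lru_cache(maxsize=None)               # unbounded cache keyed on (i1, i2)
    def recurse(i1, i2):
        if i1 == len(str1):
            return len(str2) - i2
        if i2 == len(str2):
            return len(str1) - i1
        if str1[i1] == str2[i2]:
            return recurse(i1 + 1, i2 + 1)
        return 1 + min(recurse(i1 + 1, i2),       # delete
                       recurse(i1 + 1, i2 + 1),   # swap
                       recurse(i1, i2 + 1))       # insert

    distance = recurse(0, 0)
    return distance, recurse.cache_info().currsize


for a, b in [("abc", "xyz"), ("intention", "execution")]:
    distance, keys = edit_distance_with_stats(a, b)
    print(a, b, distance, keys, (len(a) + 1) * (len(b) + 1))
```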

12:13:25

So we've gone from 3 to the power of n1 plus n2 which grows very quickly.

12:13:29

Even for 3 to the power of.

12:13:34

3 to the power of 10 is pretty high.

12:13:38

We can check it out here: 3 to the power of 10 is about 59,000. And 3 to the power of 100:

12:13:43

if n1 plus n2 is 100, then that's about 10 to the 47. That's going to be a lot of operations. On the other hand,

12:13:49

with memoization, if, let's say, the 100 is split into 2 strings of length 50 and 50,

12:13:58

it is only going to take about 2,500 operations.

12:14:01

So where it was taking 10 to the 47 operations, now it takes only

12:14:06

2,500 operations which is pretty small.
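The arithmetic above is easy to reproduce: the exponential count is for n1 plus n2 equal to 100, and the memoized count is for two strings of length 50.

```python
print(3 ** 10)             # 59049
print(f"{3 ** 100:.2e}")   # 5.15e+47
print(50 * 50)             # 2500
```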

12:14:09

And you can still work with much longer strings fairly easily using memoization.

12:14:18

So that covers this problem.

12:14:20

And keep talking through your solution even when you're stuck, even when you're confused, just as I was.

12:14:26

It's helpful to spend maybe 2 or 3 minutes trying to solve the issue.

12:14:31

And if you're not able to solve the issue just say that this is something I'll fix later and then move on assuming that you fixed it.

12:14:37

And then keep talking and keep working on the solution, and at some point later, it's possible that the fix might come to you.

12:14:49

Okay.

12:14:50

Now at this point.

12:14:52

You may be asked sometimes to implement a dynamic programming or an iterative solution.

12:14:58

Though, when you talk to the interviewer, you're telling them: this is how I'm thinking; I'm going

12:15:03

to do a recursive solution first, and I can see that maybe there are going to be some problems there, so then I'm going to apply dynamic programming.

12:15:13

So you can just check with them, and in most cases they will accept a memoized solution, because dynamic programming solutions can take a bit of time to write, too.

12:15:22

And there are always off-by-one errors, and it's also difficult to explain the solution, so in most cases you can get away with memoization. But if they do ask you to do it iteratively with dynamic programming, then you'll have to go ahead and implement the dynamic programming solution.

12:15:37

So once again, take a couple of minutes, work it out on a piece of paper, and then go back to them. Now, for dynamic programming, remember, you essentially have to create a table.

12:15:48

So let's see what the table will look like in this case; let's see if we can simulate a table.

12:15:56

So what the table will look like is.

12:16:01

Let's create a new sheet.

12:16:06

And in this sheet, let's put the two words. The first is intention.

12:16:13

Okay and let's put the word exception as well.

12:16:43

Move this down too, and let's also put in the indices. Ultimately, this is what a dynamic programming problem looks like.

12:17:08

You are ultimately going to create a table here.

12:17:17

And here's how we start filling the table. Take the (i, j)th element; let's say this element.

12:17:24

So this element represents the edit distance, or the number of operations required to convert i n t e into e x c e.

12:17:36

And how do you check what the solution is now you know that e and e are equal so the final elements are equal so what that means is we look at this value then this value should tell us.

12:17:49

What is the minimum edit distance between e x e and e x e now since we can simply add e to each string and get this solution.

12:17:55

That means this solution is equal this value should is equal to this value alright.

12:18:00

So in the case where the corresponding elements are equal we simply copy over the value diagonally left top left value onto the current cell.

12:18:10

The other case is when they are not equal. Let's say we are here, where you have "n" on one side and "p" on the other. Now there are three possibilities. You want to find

12:18:21

the minimum edit distance between "inten" and "excep".

12:18:28

"n" is not equal to "p", and "inten" is the original string. One option is to delete "n". If we delete "n", then we need to find the solution for "inte" and "excep".

12:18:41

So if we delete "n", this value will be one plus that value. That's one possibility.

12:18:47

Another possibility is that we replace "n" with "p".

12:18:53

Now both strings end in "p", so this value will be one plus the diagonal value, because we can ignore the "p" and simply reuse the previous solution for "inte" and "exce".

12:19:04

So this value becomes one plus that value.

12:19:06

The final option is that you can insert something.

12:19:18

Which character? It's going to be "p".

12:19:26

If you insert "p" just after "n", then both strings end in "p", so you can just look at the solution for "inten" and "exce".

12:19:35

This value is going to be one more than that value in the case where you insert "p" after "n".

12:19:42

Right. So there are three ways to arrive at this value: deleting "n", inserting "p", or changing "n" to "p". You take the minimum of these three values and add one to obtain this value.

12:19:58

So that's the logic, roughly speaking. You start from the top left: "e" and "i" are unequal, so you need one operation to change one into the other, and there's nothing else to consider. That's done.

12:20:09

Next, "e" and "n" are unequal. What you can do is delete "n".

12:20:21

If you delete "n", then you simply need to check "e" and "i".

12:20:26

And you know that the solution for "e" and "i" is one, so this would be two.

12:20:30

Another option is to insert something, but if you insert something, the length of "in" is going to increase, so that's not going to help.

12:20:38

So inserting doesn't help here. Another option is to change "n" to "e", but if you change "n" to "e", then you will no longer be able to

12:20:49

use this solution directly,

12:20:56

because now you will have to match "i" with the empty string.

12:21:01

That costs one as well, so overall you end up with two. This is how you start filling the table: you fill it from left to right,

12:21:07

row by row from top to bottom. As you fill out the table, you finally fill in the last cell, for "exception" and "intention", and that will be your solution.
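The table-filling procedure described above can be sketched as follows. This is one possible bottom-up version, assuming a table with an extra row and column for the empty prefix (the walkthrough starts directly at "e" vs "i"); that extra row and column removes most of the off-by-one special cases. The function name `edit_distance_dp` is hypothetical.

```python
def edit_distance_dp(str1: str, str2: str) -> int:
    """Bottom-up edit distance: table[i][j] is the edit distance
    between the prefixes str1[:i] and str2[:j]."""
    n1, n2 = len(str1), len(str2)
    table = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    # Row 0 and column 0 represent the empty prefix.
    for i in range(n1 + 1):
        table[i][0] = i  # delete all i characters
    for j in range(n2 + 1):
        table[0][j] = j  # insert all j characters
    # Fill left to right, row by row from top to bottom.
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            if str1[i - 1] == str2[j - 1]:
                # Last characters match: copy the top-left (diagonal) value.
                table[i][j] = table[i - 1][j - 1]
            else:
                # One operation plus the best of the three neighboring cells.
                table[i][j] = 1 + min(
                    table[i - 1][j],      # delete str1[i-1]
                    table[i][j - 1],      # insert str2[j-1]
                    table[i - 1][j - 1],  # replace str1[i-1] with str2[j-1]
                )
    return table[n1][n2]


print(edit_distance_dp("intention", "exception"))  # 4
```

With this layout, `table[1][1]` ("i" vs "e") is 1 and `table[2][1]` ("in" vs "e") is 2, matching the first cells worked out above.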

12:21:16

So that's the dynamic programming solution. You can see that it gets tricky to convey the entire solution because there are so many cases involved, so dynamic programming solutions are typically not requested in interviews, and it helps to just stick to memoization.

12:21:33

All right.

12:21:34

So with that, we have covered two common interview questions, and you can keep going. The idea here is to just apply the method.

12:21:44

Remember the method, the problem-solving template we've covered: state the problem,

12:21:51

identify the input and output formats, write a function signature, come up with some example inputs and outputs (or at least the scenarios), and come up with a correct solution stated in plain English.

12:22:00

Implement the solution.

12:22:02

Test it using example inputs and fix bugs if you face any. Then analyze the algorithm's complexity and identify inefficiencies, and finally apply the right technique to overcome the inefficiency. Then you repeat the process,

12:22:13

going back and stating the solution, implementing, analyzing, and repeating. In some cases you do not need to implement the brute force solution if you don't have the time.

12:22:23

But when you're working with recursive solutions, it always helps to implement brute force first, before you do memoization or dynamic programming.

12:22:29

And some tips: ask as many questions as you need to clarify the problem, show an example, follow the method, and don't panic.

12:22:42

If you get stuck at a certain point,

12:22:46

give it a couple of minutes. Sometimes you can even ask the interviewer, and they may be able to tell you

12:22:53

what your error is, or that you're not stuck at all and are simply assuming something incorrectly.

12:23:00

But beyond a few minutes, what you want to say is: "Let me fix this later. Assuming this is correct, let's move on," and then talk about complexity, optimization, and so on.

12:23:13

It is very important to state the brute force solution to the interviewer. If you are unable to figure out a more optimal solution,

12:23:21

then the best thing you can do is offer to implement the brute force solution, so that you can at least demonstrate that you are able to write code. And that's all right: in a lot of cases you will not be able to figure out the optimal solution, and in some cases there may not be a better approach.

12:23:35

There are certain problems where there is just one way, and that is the hard way, the brute force way. This is especially true of a family of problems called backtracking, something we've not really covered in a lot of detail.

12:23:49

But it is also a form of recursion.

12:23:55

So what do you do next? The next step for you is to review this lecture video and solve these problems yourself, or take on more problems. Ideally, you want to take all five of the techniques we've covered. Let's quickly review what those five techniques were.

12:24:15

The first one was binary search: we looked at linear search and binary search, which is a form of divide and conquer.

12:24:21

Along with that, we also covered complexity and Big O notation, and then you had some homework on linked lists and Python classes.

12:24:27

Binary search comes up often, and the hint for detecting it is simply to look for order: whenever you see something described as sorted,

12:24:39

that is an indication that this may be a binary search problem. Sometimes you may have to get things into a sorted form first, maybe by

12:24:49

replacing each element with the sum of the values up to that element, and so on. Once you get things into a sorted form, you can then apply binary search.
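A small sketch of the "make it sorted, then binary search" idea, assuming all values are non-negative so that the running sums never decrease (with negative values the prefix sums are not sorted and this trick does not apply). The function `first_index_reaching` is a hypothetical example problem: find the first index at which the running total reaches a target.

```python
from itertools import accumulate


def first_index_reaching(nums, target):
    """Smallest index i with nums[0] + ... + nums[i] >= target, or -1.
    Assumes nums are non-negative, so the prefix sums are sorted."""
    prefix = list(accumulate(nums))  # prefix[i] = nums[0] + ... + nums[i]
    lo, hi = 0, len(prefix) - 1
    answer = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if prefix[mid] >= target:
            answer = mid   # candidate found; keep looking further left
            hi = mid - 1
        else:
            lo = mid + 1   # running total still too small; look right
    return answer


print(first_index_reaching([3, 1, 4, 1, 5], 8))  # 2
```

The prefix sums here are `[3, 4, 8, 9, 14]`, so the search needs O(log n) comparisons after an O(n) preprocessing pass.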

12:24:59

That's one way to go about it. Once again, just do five to ten problems on binary search, and you will be able to identify pretty much any binary search question in an interview.

12:25:09

The next topic we looked at was binary search trees and traversals. This is something

12:25:17

that is generally asked very directly: you will be given a binary search tree and asked to do something with it.

12:25:25

And you can answer that question directly. We've covered a lot of different things here, so do check out lesson two for all the different things you can do with binary search trees: traversals, balancing, and so on.

12:25:35

Most of these are recursive solutions, so it's also good practice in recursion. We also looked at balanced binary trees and how we can optimize them further.
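A minimal sketch of the recursive BST operations mentioned here: insertion, in-order traversal (which visits keys in sorted order), height, and building a balanced tree from a sorted list. The class and function names are illustrative and may not match lesson two's notebook.

```python
class BSTNode:
    """A node in a binary search tree (illustrative)."""

    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None


def insert(node, key, value=None):
    """Insert key into the tree rooted at node and return the root.
    If the key already exists, its value is updated instead."""
    if node is None:
        return BSTNode(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
    return node


def inorder_keys(node):
    """Left subtree, then the node, then the right subtree: sorted keys."""
    if node is None:
        return []
    return inorder_keys(node.left) + [node.key] + inorder_keys(node.right)


def tree_height(node):
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(tree_height(node.left), tree_height(node.right))


def make_balanced_bst(sorted_keys):
    """Use the middle key as the root, then recurse on each half."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    root = BSTNode(sorted_keys[mid])
    root.left = make_balanced_bst(sorted_keys[:mid])
    root.right = make_balanced_bst(sorted_keys[mid + 1:])
    return root


root = None
for k in [5, 3, 8, 1, 4, 9]:
    root = insert(root, k)
print(inorder_keys(root))  # [1, 3, 4, 5, 8, 9]
```

Inserting already-sorted keys one at a time with `insert` produces a tree shaped like a linked list (height n), which is why building a balanced tree from the sorted keys keeps operations at O(log n).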

12:25:43

Then you had an assignment on hash tables. Hashing is, again, a common interview topic: we built hash tables from scratch in Python, and we also handled collisions using a technique called linear probing.

12:25:55

You can check this out in assignment two. You may be asked simply to implement a hash table in Python, or to implement

12:26:03

collision resolution in a hash table, in which case you can use linear probing.
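A compact sketch of a fixed-size hash table with linear probing. It assumes Python's built-in `hash()` in place of the custom hashing function from the lesson, and the class name `ProbingHashTable` and its method names are illustrative. Deletion is left out, since removing entries under linear probing needs extra bookkeeping (such as tombstone markers).

```python
class ProbingHashTable:
    """Fixed-size hash table using linear probing (illustrative sketch)."""

    def __init__(self, max_size=4096):
        self.data_list = [None] * max_size  # each slot: None or (key, value)

    def _get_valid_index(self, key):
        """Start at the hashed slot and step forward, wrapping around,
        until we reach the slot holding this key or an empty slot."""
        size = len(self.data_list)
        idx = hash(key) % size
        for _ in range(size):
            kv = self.data_list[idx]
            if kv is None or kv[0] == key:
                return idx
            idx = (idx + 1) % size
        raise RuntimeError("hash table is full")

    def insert(self, key, value):
        """Insert a new pair, or update the value if the key is present."""
        self.data_list[self._get_valid_index(key)] = (key, value)

    def find(self, key):
        """Return the value stored for key, or None if it is absent."""
        kv = self.data_list[self._get_valid_index(key)]
        return None if kv is None else kv[1]


table = ProbingHashTable(max_size=4)
table.insert(1, "a")
table.insert(5, "b")  # in CPython, hash(5) % 4 == hash(1) % 4 == 1: a collision
print(table.find(1), table.find(5))  # a b
```

Because key `5` collides with key `1`, it is stored in the next free slot (index 2), and `find` follows the same probing sequence to locate it.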

12:26:09

Then you have the sorting algorithms, where we looked at bubble sort, insertion sort, merge sort using divide and conquer, and quicksort, which has a quadratic worst-case complexity but

12:26:21

an O(n log n) average complexity. That's a good thing, because merge sort, although it is O(n log n) in the

12:26:29

worst case, still takes up a lot of extra space: space allocation is slow, and you may also not have the memory.

12:26:36

That's why we sometimes prefer quicksort over merge sort when we are constrained for space.
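A minimal in-place quicksort sketch. It uses the Lomuto partition scheme with the last element as the pivot, which may differ from the two-pointer partition in the lesson; both give O(n log n) average time and O(n^2) worst-case time. Because it sorts in place, the only extra space is the recursion stack, which is the space advantage over merge sort mentioned above.

```python
def partition(nums, start, end):
    """Lomuto partition: use nums[end] as the pivot, move smaller elements
    to its left, and return the pivot's final index."""
    pivot = nums[end]
    boundary = start  # next slot for an element smaller than the pivot
    for i in range(start, end):
        if nums[i] < pivot:
            nums[i], nums[boundary] = nums[boundary], nums[i]
            boundary += 1
    nums[boundary], nums[end] = nums[end], nums[boundary]
    return boundary


def quicksort(nums, start=0, end=None):
    """Sort nums in place and return it. Average O(n log n) time;
    worst case O(n^2), e.g. on already-sorted input with this pivot choice."""
    if end is None:
        end = len(nums) - 1
    if start < end:
        pivot_idx = partition(nums, start, end)
        quicksort(nums, start, pivot_idx - 1)
        quicksort(nums, pivot_idx + 1, end)
    return nums


print(quicksort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```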

12:26:43

Assignment three is pretty interesting: you implement an efficient algorithm for polynomial multiplication using divide and conquer, so do check out assignment three as well.
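For reference, here is one divide-and-conquer approach to polynomial multiplication, Karatsuba's method, which replaces the four half-size products of the naive split with three. This is not necessarily the solution expected in assignment three; it assumes polynomials are given as non-empty coefficient lists, lowest degree first, and the helper names are hypothetical.

```python
def poly_add(a, b):
    """Add two coefficient lists (lowest degree first)."""
    size = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(size)]


def poly_sub(a, b):
    """Subtract coefficient list b from a."""
    size = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)
            for i in range(size)]


def multiply(p, q):
    """Multiply two non-empty polynomials with Karatsuba's method."""
    out_len = len(p) + len(q) - 1
    n = max(len(p), len(q))
    if n == 1:
        return [p[0] * q[0]]
    # Pad both inputs (as new lists) to the same length n.
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    m = n // 2
    p_lo, p_hi = p[:m], p[m:]  # p = p_lo + x^m * p_hi
    q_lo, q_hi = q[:m], q[m:]
    z0 = multiply(p_lo, q_lo)
    z2 = multiply(p_hi, q_hi)
    # (p_lo + p_hi)(q_lo + q_hi) - z0 - z2 = p_lo*q_hi + p_hi*q_lo
    z_mid = multiply(poly_add(p_lo, p_hi), poly_add(q_lo, q_hi))
    z1 = poly_sub(poly_sub(z_mid, z0), z2)
    result = [0] * (2 * n - 1)
    for i, c in enumerate(z0):
        result[i] += c
    for i, c in enumerate(z1):
        result[i + m] += c
    for i, c in enumerate(z2):
        result[i + 2 * m] += c
    # Drop the trailing coefficients introduced by padding.
    return result[:out_len]


print(multiply([1, 2], [3, 4]))  # [3, 10, 8]
```

Three recursive calls on half-size inputs give roughly O(n^1.585) time, compared with O(n^2) for the schoolbook method.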

12:26:53

Then we looked at dynamic programming: recursion, memoization, and the subsequence and knapsack problems. We didn't cover backtracking and pruning, but there are some questions in the lesson notebook that you can try out which use backtracking and pruning as well.
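As an example of the table-based dynamic programming from that lesson, here is a sketch of the 0/1 knapsack problem, assuming integer weights and a non-negative integer capacity. The name `max_profit` and the argument order are illustrative.

```python
def max_profit(weights, profits, capacity):
    """0/1 knapsack: best total profit without exceeding capacity.
    table[i][c] = best profit using only the first i items with capacity c."""
    n = len(weights)
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, p = weights[i - 1], profits[i - 1]
        for c in range(capacity + 1):
            # Option 1: skip item i-1.
            table[i][c] = table[i - 1][c]
            # Option 2: take item i-1 if it fits in capacity c.
            if w <= c:
                table[i][c] = max(table[i][c], table[i - 1][c - w] + p)
    return table[n][capacity]


print(max_profit([1, 2, 3], [6, 10, 12], 5))  # 22
```

The table has (n + 1) x (capacity + 1) cells, each filled in constant time, so the running time is O(n * capacity).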

12:27:10

Last time, we looked at graph algorithms: graphs, adjacency lists, and adjacency matrices.

12:27:17

We looked at depth-first and breadth-first search and how to implement them, and we also looked at shortest paths in directed and weighted graphs.
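For shortest paths in a weighted, directed graph, a common approach is Dijkstra's algorithm with a min-heap. The sketch below uses Python's `heapq` module and assumes non-negative edge weights and nodes numbered `0` to `num_nodes - 1`; the function name and the `(u, v, weight)` edge format are illustrative.

```python
import heapq


def shortest_paths(num_nodes, edges, source):
    """Dijkstra's algorithm with a binary min-heap.
    edges: list of (u, v, weight) for a directed graph, weights >= 0.
    Returns the distance from source to every node (inf if unreachable)."""
    adj = [[] for _ in range(num_nodes)]
    for u, v, w in edges:
        adj[u].append((v, w))
    dist = [float("inf")] * num_nodes
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale entry: a shorter path was already found
        for nbr, w in adj[node]:
            new_d = d + w
            if new_d < dist[nbr]:
                dist[nbr] = new_d
                heapq.heappush(heap, (new_d, nbr))
    return dist


edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(shortest_paths(4, edges, 0))  # [0, 3, 1, 4]
```

Node 1 is reached more cheaply through node 2 (1 + 2 = 3) than directly (4), which is exactly the kind of update the heap-driven loop handles.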

12:27:25

Breadth-first and depth-first search are a very important topic. You will get many questions related to them, so do solve maybe five questions on each.

12:27:34

Then you should be good with most graph problems in interviews.
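A minimal sketch of both traversals on an adjacency list, assuming an undirected graph with nodes numbered from 0. BFS uses a `collections.deque` as a queue and also records each node's distance in edges; DFS uses an explicit stack instead of recursion. The function names are illustrative.

```python
from collections import deque


def build_adj_list(num_nodes, edges):
    # One fresh list per node; [[]] * num_nodes would alias a single list.
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs(adj, source):
    """Return nodes in breadth-first order and each node's distance
    (number of edges) from source; unreachable nodes get None."""
    visited = [False] * len(adj)
    distance = [None] * len(adj)
    visited[source] = True
    distance[source] = 0
    queue = deque([source])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in adj[node]:
            if not visited[nbr]:
                visited[nbr] = True
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return order, distance


def dfs(adj, source):
    """Iterative depth-first search using an explicit stack."""
    visited = [False] * len(adj)
    stack = [source]
    order = []
    while stack:
        node = stack.pop()
        if visited[node]:
            continue
        visited[node] = True
        order.append(node)
        # Push in reverse so lower-numbered neighbors are explored first.
        for nbr in reversed(adj[node]):
            if not visited[nbr]:
                stack.append(nbr)
    return order


adj = build_adj_list(5, [(0, 1), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (3, 4)])
print(bfs(adj, 0))  # ([0, 1, 4, 2, 3], [0, 1, 2, 2, 1])
print(dfs(adj, 0))  # [0, 1, 2, 3, 4]
```

Both traversals visit each node and edge a constant number of times, so each runs in O(V + E) time on an adjacency list.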

12:27:38

Now, the course project, if you haven't seen it already, is to pick a coding problem. You can pick a coding problem from an online source like LeetCode, HackerRank, GeeksforGeeks, etc.

12:27:48

Then use the problem-solving template that we've shared with you

12:27:52

as a starting point: just give it a name, write the problem statement, and implement the solution step by step.

12:28:01

Solve the problem using the method you've learned in the course, then document your solution: add explanations wherever required and perform the complexity analysis.

12:28:11

All of this should go in the Jupyter notebook, and then you publish the notebook to your Jovian profile.

12:28:16

And finally, you can submit the link to your Jovian notebook here.

12:28:21

You can also check out the discussion, where you can post what you're working on, so do post your notebook there as well.

12:28:36

And finally today we have looked at a couple of real interview questions from Amazon and Google and how to go about solving them.

12:28:45

And we also addressed a few issues that we faced along the way.

12:28:51

So that was a helpful exercise.

12:28:56

And that's it. So now you can review the lecture video, execute the Jupyter notebooks, complete the assignments, and attempt the optional questions,

12:29:03

so that the topics we've covered get consolidated and you never have to look at this lecture again.

12:29:09

The practice is what really reinforces and consolidates your learning.

12:29:14

Complete the assignments and attempt the optional questions to practice. Participating in forum discussions is also very useful.

12:29:23

By answering questions, a lot of your own doubts get cleared, so do participate in forum discussions. Then join or start a study group if possible; getting together with a group of four or five people is great.

12:29:34

It really helps you focus and improve your understanding by discussion.

12:29:43

So that's Data Structures and Algorithms in Python.

12:29:47

Thank you very much for joining us on this journey as we learn data structures and algorithms in Python.

12:29:53

It's a very useful topic for improving your coding skills, and also something you will almost certainly encounter in one of your interviews, no matter which company you're applying to.

12:30:03

So I hope this is helpful to you. Do let us know on the forum how this course helped you if it did.

12:30:10

You can let us know in the YouTube comments as well. If you have questions, or if something was not clear, do post that too, and we'll make sure to come up with clearer explanations and clearer examples next time.

12:30:22

And if you have any feedback for us, do post it in the comments or send us an email at support at www.ai.

12:30:32

With that, I will take leave and I will see you in the forums. This is not the end of our journey with you.

12:30:38

So do stay active on Jovian. There's a lot of great activity happening. Do check out the forums and the newsletter, and stay tuned for our next course. Thank you and goodbye.

00:00

Introduction to DS & Algos

01:48

Course Overview

03:46

Lesson Structure

08:40

Understanding Jupyter Notebooks

13:57

Problem Solving Strategy

15:52

Find Card Position in List

15:58

Minimizing Accesses

16:20

Defining Problem Clearly

17:05

Function Signature and Structure

19:03

Creating Test Cases

33:20

Brute Force Solution Overview

34:44

Understanding Linear Search

35:31

Implementing the Solution

37:27

Testing the Function

38:11

Utilizing Evaluate Test Case

49:29

Worst Case Analysis of Problems

50:38

Understanding Algorithm Efficiency

51:00

Algorithm Complexity Terms

52:09

Time Complexity Explained

53:09

Space Complexity Overview

53:44

Understanding Big O Notation

53:50

Dropping Constants in Complexity

54:01

Analyzing Time Complexity Trends

54:45

Understanding Linear Search Complexity

59:14

Introduction to Binary Search

01:22:49

Analyzing Iterations in Algorithms

01:23:37

Understanding Logarithmic Outcomes

01:24:01

Time Complexity of Binary Search

01:24:44

Space Complexity of Binary Search

01:25:04

Comparing Linear and Binary Search

01:26:44

Evaluating Test Case Performance

01:27:46

Linear vs Binary Search Timing

01:29:43

Understanding Algorithm Optimization

01:33:04

Implementing the Generic Strategy

01:35:01

Binary Search in Practice

02:27:00

Understanding Edge Cases in Binary Search

02:27:49

Importance of Debugging Techniques

02:28:12

Analyzing Algorithm Complexity

02:28:46

Submission Options for Binary Search Assignment

02:29:06

Final Steps in the Binary Search Practice

02:29:15

Handling Submission Errors

02:30:00

Testing Your Function Properly

02:30:24

Understanding Assignment Submission

02:31:11

Exploring Optional Interview Questions

02:37:58

Introduction to User Profiles in Python

02:46:06

Defining Special Functions in Classes

02:47:38

Creating a User Database Class

02:49:30

Implementing Methods in User Database

02:50:30

Testing User Profiles and Database

02:58:06

Analyzing Time Complexity of Operations

03:03:19

Using Jovian to Save Notebooks

03:05:03

Understanding Binary Trees

03:06:09

Properties of Binary Trees

03:07:11

Introduction to Binary Search Trees

03:14:03

Implementation of Binary Trees in Python

03:19:11

Creating a Tree from Tuple

03:20:06

Understanding Recursion Basics

03:24:47

Converting Tree to Tuple

03:25:14

Visualizing Tree Structures

03:27:23

Binary Tree Traversals Explained

03:36:10

Understanding Binary Search Trees

03:36:19

Properties of Binary Search Trees

03:37:15

Checking if a Tree is a BST

03:37:46

Finding Minimum and Maximum in a Binary Tree

03:45:29

Inserting Nodes in a BST

03:52:30

Finding Nodes in a BST

03:53:46

Updating a Node Value

03:55:39

Listing All Key-Value Pairs

03:57:39

Understanding Tree Balance

04:02:40

Creating Balanced BST from Sorted List

04:09:17

Performance of Balanced BSTs

04:09:44

Insertion Complexity Analysis

04:11:43

Improving Data Structure Efficiency

04:13:10

Defining a Tree Map Class

04:15:04

Exploring Special Methods in Classes

04:25:52

Understanding Binary Tree Efficiency

04:26:41

Key Properties of Binary Search Trees

04:27:25

Operations on Binary Search Trees

04:28:13

Understanding B Trees in Databases

04:30:37

Introduction to Hash Tables

04:42:31

Creating Custom Test Cases

04:42:53

Hashing Function Overview

04:43:21

Implementing a Hashing Algorithm

04:50:08

Retrieving Data Using Hashing

04:53:10

Using List Comprehension in Python

04:59:11

Creating a Hash Table

05:00:25

Inserting Key-Value Pairs

05:02:04

Updating Values in Hash Table

05:04:24

Handling Data Collisions

05:06:29

Implementing Linear Probing

05:15:56

Hash Tables vs. Binary Search Trees

05:16:42

Understanding Assignment Two

05:16:47

Introduction to Lesson Three

05:17:48

Overview of Sorting Algorithms

05:19:06

Executing Code in Jupyter Notebooks

05:32:31

Understanding Bubble Sort

05:34:13

Bubble Sort Steps and Explanation

05:35:00

Implementing Bubble Sort Functionality

05:38:30

Testing the Bubble Sort Implementation

05:46:00

Analyzing Bubble Sort's Time Complexity

05:49:13

Insertion Sort Complexity Analysis

05:49:30

Free Online Jupyter Notebooks

05:50:29

Capturing Jupyter Notebooks with Jovian

05:50:38

Understanding Divide and Conquer

05:52:34

Introduction to Merge Sort

06:05:55

Understanding Print Statements for Debugging

06:07:24

Analyzing Merge Sort Efficiency

06:08:54

Understanding Merge Sort Recursive Calls

06:10:05

Exploring Merge Operations in Depth

06:15:25

Understanding Time and Space Complexity of Merge Sort

06:25:31

Understanding QuickSort Basics

06:25:31

Partition Function Mechanics

06:25:42

Recursive Calls in QuickSort

06:33:23

QuickSort Performance Evaluation

06:38:26

Space Complexity of QuickSort

06:41:11

Sorting Objects by Likes

06:41:41

Creating a Notebook Class

06:42:38

Implementing Custom Comparison

06:45:00

Merging Sorted Notebooks

06:45:02

Using Merge Sort with Objects

06:45:40

Merging Sorted Lists with Custom Comparison

06:47:30

Sorting Notebooks by Title

06:49:10

Applying Comparison Operators in Sorting

06:49:40

Exercises on Sorting Implementations

06:54:30

Introducing Problem Solving Templates

07:12:35

Creating Test Cases for LCS

07:15:12

Identifying Recursive LCS Scenarios

07:18:25

Constructing the Recursive Function Logic

07:22:21

Understanding Recursive Solution Structure

07:23:54

Implementing the Recursive LCS Solution

07:36:29

Understanding Recursive Function Complexity

07:38:04

Analyzing Recursive Function Inefficiencies

07:38:35

Implementing Memoization Technique

07:39:17

Tracking Intermediate Results

07:43:20

Improving Performance with Memoization

07:47:33

Understanding Memoization vs Dynamic Programming

07:47:47

Dynamic Programming Fundamentals

07:48:53

Creating and Filling the Matrix

07:52:25

Comparing Elements for LCS

07:55:21

Filling Out the DP Table

08:03:26

Understanding Memoization vs Iterative Solutions

08:05:39

Dynamic Programming Problem Solving

08:06:23

Exploring the Knapsack Problem

08:08:51

Defining Input and Output Formats

08:11:48

Maximum Profit Calculation

08:12:45

Example Inputs & Test Cases

08:14:27

Identifying Optimal Solutions

08:18:27

Developing Recursive Solutions

08:20:15

Recursive Function - Max Profit

08:27:21

Dynamic Programming Approach

08:36:13

Filling the DP Table for Knapsack Problem

08:37:29

Handling Off by One Errors

08:41:45

Understanding Dynamic Programming Complexity

08:43:20

Navigating the Dynamic Programming Forum

08:44:11

Introduction to Graph Algorithms

08:53:05

Understanding Graph Representation

08:56:09

Paths and Neighbors in Graphs

08:56:31

Introduction to Adjacency Lists

08:58:25

Creating a Graph Class in Python

08:59:41

Building the Adjacency List

09:00:06

Implementing Graph Traversal Techniques

09:00:50

DFS - Depth First Search Introduction

09:02:01

Creating Lists of Empty Lists

09:05:08

Understanding Adjacency Lists in Graphs

09:07:09

Initializing the Graph with Edges

09:07:10

BFS - Breadth First Search Introduction

09:08:29

Using Enumerate to Simplify Code

09:11:42

Printing the Graph Structure

09:11:50

Comparing DFS and BFS

09:17:30

Real World Applications of Graph Traversal

09:22:35

Understanding Heaps and Priority Queues

09:25:07

Heap Operations Explained

09:28:45

Binary Heap Introduction

09:36:40

Building a Max Heap from Array

09:39:40

Heap Sort Algorithm Overview

09:39:40

Understanding Heap Sort Algorithm

09:46:20

Heap Sort Steps Explained

09:48:00

Performance Analysis of Heap Sort

09:49:10

Applications of Heap Sort

09:50:50

Comparing Heap Sort with Other Algorithms

10:34:15

Dijkstra's Algorithm Explained

10:34:31

Running Time Complexities Overview

10:35:14

Analyzing BFS Complexity

10:37:34

Understanding Shortest Path Algorithm

10:40:15

Improving Dijkstra's Algorithm with Min Heap

10:49:12

Understanding Sub-array Problems

10:49:25

Asking for Clarification in Interviews

10:52:07

Identifying Input and Output Formats

10:53:37

Developing Test Cases for Sub-arrays

10:58:36

Implementing Brute Force Solution

11:05:53

Brute Force Solution Analysis

11:07:56

Analyzing Complexity of Solutions

11:10:54

Identifying Inefficiencies in Logic

11:16:20

Optimizing the Brute Force Approach

11:21:43

Implementing Optimized Subarray Sum

11:22:31

Implementing Subarray Sum Optimization

11:24:29

Testing Subarray Sum Implementation

11:27:25

Understanding Complexity of the Algorithm

11:30:01

Introduction to Edit Distance Problem

11:39:13

Recursive Edit Distance Explanation

11:47:18

Analyzing Character Deletion and Insertion

11:48:32

Final Recursive Function Steps

11:53:49

Testing Edge Cases in Recursive Function

11:55:52

Implementing Edit Distance with Recursion

11:59:41

Analyzing Recursive Solution Complexity

12:05:02

Introduction to Memoization

12:10:19

Optimizing Edit Distance with Memoization

12:12:35

Understanding Memoization Complexity

12:14:49

Dynamic Programming Implementation Steps

12:16:57

Explaining the Edit Distance Table

12:21:43

Reviewing Problem-Solving Techniques

12:29:13

Participate in Forum Discussions

12:29:22

Collaborating in Study Groups

12:29:33

Completing Assignments and Optional Questions

12:29:46

Conclusion and Closing Remarks

12:30:50

Understanding Recursion Concepts

12:34:12

Base Case in Recursion

12:41:42

Recursive Function Implementation

12:50:50

Visualizing Recursive Process

12:57:32

Understanding Stack Overflow

00:00

What is the course structure and focus?

02:44

How to access the course resources easily?

09:16

What is the first programming problem we'll tackle?

10:36

Why learn data structures for interviews?

15:58

How many times should you access elements in the list?

16:20

What is the importance of stating the problem clearly?

19:00

Why write out test cases before coding?

24:50

How do you handle edge cases effectively?

33:35

Why is it crucial to share your brute force solution during the interview process?

35:02

How can you express your algorithm in your own words for better clarity?

41:20

What strategies help you pinpoint errors in your code effectively?

46:33

How to ensure your solution passes all test cases after changes?

49:46

What is the minimum number of cards Bob really needs to flip?

50:33

What does algorithm analysis focus on beyond execution time?

51:50

Why is worst-case complexity crucial in programming?

53:10

How does input size impact the space used by algorithms?

53:44

How do we represent worst-case complexity with Big O notation?

53:50

Why drop constants when analyzing algorithm complexity?

59:17

What are the steps for applying binary search effectively?

01:11:42

How to fix issues with binary search for multiple occurrences?

01:22:59

How is k related to n in binary search iterations?

01:24:01

What does the logarithmic time complexity look like in binary search?

01:25:04

Can you explain the difference in complexities between linear and binary search?

01:25:23

What happens when we use large test cases for algorithm comparisons?

12:30:50

What are the fundamental principles behind recursion?

12:34:12

How do you effectively establish a base case in your function?

12:41:42

What steps are needed to implement a recursive solution in Python?

12:50:50

How can visualizing recursion help in understanding the concept better?

01:27:46

How much faster is binary search compared to linear search in large datasets?

02:23:30

What is the significance of distinguishing between the left and right in binary search?

02:25:20

Why is discussing your understanding of problems crucial in interviews?

02:20:33

How does the number of iterations play into algorithm complexity analysis?

02:27:05

What edge case should we consider in binary search scenarios?

02:27:28

How can print statements assist in debugging during tests?

02:28:13

Why is it crucial to analyze your algorithm's performance?

02:29:06

What are the steps for submitting the binary search assignment?

02:30:00

What is the best strategy for testing your function before submission?

02:29:15

How can you effectively handle errors during code submission?

02:38:03

What are the key operations for managing user profiles?

02:40:39

How do you represent user profiles using classes in Python?

02:46:06

What are the special functions defined in a class?

02:47:38

How to create a user database class in Python?

02:50:16

What scenarios can be used to test user database methods?

02:58:12

How does time complexity impact database operations?

03:03:21

What is the main advantage of using Jovian for Jupyter notebooks?

03:06:12

How does a binary tree structure work in data organization?

03:07:13

What essential properties define a binary search tree?

03:14:08

How can you implement a binary tree structure using Python?

03:19:16

What method can convert tuples into tree structures effectively?

03:20:07

How does recursion play a role in creating binary trees?

03:27:28

What are the steps to perform in-order traversal of binary trees?

03:30:05

How does pre-order traversal differ from in-order traversal?

03:36:19

What defines a binary search tree's unique properties?

03:37:15

How do you verify if a binary tree is a BST?

03:37:46

What are efficient ways to find tree minimum and maximum keys?

03:45:29

How does the insertion process affect tree structure?

03:52:33

How do we find a node in a balanced tree efficiently?

03:54:03

What’s the best way to update a value in a binary search tree?

03:56:17

How does in-order traversal yield sorted key-value pairs?

04:10:47

How much faster is a balanced BST compared to a regular list search?

04:11:47

What is the biggest advantage of using the right data structure for users?

04:21:50

How can we encapsulate functionality for ease of use in Python classes?

04:26:02

How does maintaining balance improve binary tree efficiency?

04:27:19

What are the key properties that make binary search trees useful?

04:37:07

Why are hash tables crucial for data retrieval speed?

04:42:36

How can you create your own test cases effectively using asserts?

04:43:21

What makes a good hashing algorithm and how to implement one easily?

04:53:08

How does list comprehension simplify operations on lists in Python?

05:04:24

How do you prevent data loss in a hash table due to collisions?

05:03:47

What happens when two different keys hash to the same index?

05:05:17

How can you find an empty index in a hash table using probing?

05:15:56

When should you use hash tables over binary search trees?

05:23:52

How do we effectively test sorting algorithms with various cases?

05:20:38

What is the most efficient way to handle millions of notebook entries?

05:34:15

How does bubble sort push large numbers to the end?

05:37:27

What makes bubble sort simple in Python?

05:46:56

Why does bubble sort take significant time with large datasets?

05:49:21

How does insertion sort compare to bubble sort in efficiency?

05:50:41

What is the divide and conquer strategy and how does it work in sorting?

05:55:19

How does the merge operation play a critical role in merge sort?

06:05:55

How can print statements simplify debugging our algorithms?

06:06:41

What makes merge sort significantly faster than bubble sort?

06:09:23

How do merge operations lead to the final sorted array?

06:25:36

What is the role of the pivot in QuickSort's partitioning process?

06:27:31

How do pointers help in the partition operation of QuickSort?

06:36:00

Why is QuickSort efficient compared to Merge Sort in practice?

06:41:11

How do we sort notebooks based on likes efficiently?

06:42:38

What does a custom comparison function look like for objects?

06:45:02

How does merge sort handle object sorting in Python?

06:46:40

How do we sort notebooks by likes and titles effectively?

06:47:40

What makes a great custom comparison function for sorting?

06:53:40

How does this assignment template help streamline your coding tasks?

07:12:35

How do we create comprehensive test cases for the longest common subsequence algorithm?

07:15:12

What steps should we follow to build a recursive function for finding LCS?

07:22:21

How can visualizing the recursive tree help us understand the LCS algorithm better?

07:43:20

How does memoization significantly speed up recursive algorithms?

07:46:22

Can storing intermediate results really reduce computation from billions to hundreds?

07:46:39

What role do indices play in the efficiency of storing results?

07:47:47

What are the downsides of memoization in larger problems?

07:48:47

How does dynamic programming provide a solution to recursion issues?

07:51:54

What does the dynamic programming table represent in LCS?

08:04:40

How to visualize dynamic programming solutions effectively?

08:04:46

What strategies help avoid off-by-one errors when building tables?

08:07:14

How does the Knapsack problem illustrate decision-making in optimization?

08:13:48

How do we ensure our test cases cover all scenarios for the knapsack problem?

08:25:30

What are the risks of computing the maximum profit via recursion without memoization?

08:31:19

What is the structure and logic of the dynamic programming table for the knapsack problem?

08:37:29

What common mistakes occur in dynamic programming implementation?

08:41:45

How do we derive the time complexity in our dynamic programming solution?

08:47:52

Why is structuring questions important when representing graphs?

08:53:38

How to effectively represent graph connections with edges?

08:56:40

Why is adjacency list a more efficient graph representation?

08:58:25

What are the steps to create a graph class in Python?

09:00:50

What is the main difference between DFS and BFS in traversal?

09:07:10

How do traversal techniques affect data structure performance?

09:17:30

What are some practical applications of graph traversal algorithms?

09:07:10

How do Depth First Search and Breadth First Search differ in graph traversal?

09:11:50

How does DFS navigate through graphs compared to BFS?
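
A sketch contrasting the two traversals on an adjacency list, assumed here to be a plain list of neighbor lists; the function names are hypothetical. BFS uses a FIFO queue (`collections.deque`) and visits nodes in order of their distance from the source, while this DFS uses an explicit stack and follows one path as deep as it can before backtracking. Both run in O(V + E). The lecture's DFS may be written recursively or mark nodes at a different point, which can change the visiting order.

```python
from collections import deque


def bfs(adj, source):
    """Breadth-first search: visit nodes level by level using a FIFO queue."""
    visited = [False] * len(adj)
    order = []
    queue = deque([source])
    visited[source] = True
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adj[node]:
            if not visited[neighbor]:
                visited[neighbor] = True  # mark on enqueue so each node is queued once
                queue.append(neighbor)
    return order


def dfs(adj, source):
    """Depth-first search: go as deep as possible using a LIFO stack."""
    visited = [False] * len(adj)
    order = []
    stack = [source]
    while stack:
        node = stack.pop()
        if visited[node]:
            continue  # a node can be pushed more than once; visit it only the first time
        visited[node] = True
        order.append(node)
        # Push in reverse so neighbors are explored in adjacency-list order
        for neighbor in reversed(adj[node]):
            if not visited[neighbor]:
                stack.append(neighbor)
    return order


adj = [[1, 4], [0, 2, 3, 4], [1, 3], [1, 2, 4], [0, 1, 3]]
print(bfs(adj, 0))  # [0, 1, 4, 2, 3]
print(dfs(adj, 0))  # [0, 1, 2, 3, 4]
```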

09:02:17

What common bug should you avoid when creating lists in Python?

09:04:04

How can you use the range function to generate multiple empty lists efficiently?

09:08:49

What is the advantage of using enumerate for cleaner code?
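
The list-creation bug is easy to reproduce in a few lines. Multiplying a list repeats a reference to the same inner list, while a comprehension over `range` creates a fresh list on each iteration; `enumerate` then supplies each index alongside its item without manual counters.

```python
n = 3

# Bug: [[]] * n repeats a reference to ONE list, so every "row" is the same object
shared = [[]] * n
shared[0].append("x")
print(shared)  # [['x'], ['x'], ['x']]

# Fix: the comprehension evaluates [] once per iteration, creating n independent lists
separate = [[] for _ in range(n)]
separate[0].append("x")
print(separate)  # [['x'], [], []]

# enumerate yields (index, item) pairs, so no separate counter variable is needed
for index, row in enumerate(separate):
    print(index, row)
```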

09:22:35

What is the purpose of heaps in data structures?

09:25:07

How do priority queues utilize heaps for efficiency?

09:39:40

What steps are involved in heap sort?
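
A sketch of the standard in-place heap sort (hypothetical name `heap_sort`): first build a max-heap inside the array, then repeatedly swap the maximum to the end and restore the heap on the shrinking prefix. It runs in O(n log n) time with O(1) extra space but is not stable. The lecture may instead build on Python's `heapq` module, which provides a min-heap.

```python
def heap_sort(nums):
    """Sort a list in place in ascending order using a binary max-heap."""

    def sift_down(root, end):
        # Move nums[root] down until the max-heap property holds within nums[:end]
        while True:
            largest = root
            left, right = 2 * root + 1, 2 * root + 2
            if left < end and nums[left] > nums[largest]:
                largest = left
            if right < end and nums[right] > nums[largest]:
                largest = right
            if largest == root:
                return
            nums[root], nums[largest] = nums[largest], nums[root]
            root = largest

    n = len(nums)
    # Step 1: build a max-heap by sifting down every non-leaf node, bottom up
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # Step 2: move the current maximum to the end, then re-heapify the rest
    for end in range(n - 1, 0, -1):
        nums[0], nums[end] = nums[end], nums[0]
        sift_down(0, end)
    return nums


print(heap_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```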

09:46:20

What makes Heap Sort stand out among sorting algorithms?

09:50:00

How does Heap Sort compare in performance to QuickSort and Merge Sort?

09:49:10

What are practical applications of Heap Sort in real-world data sorting?

10:35:14

What are the complexities of BFS and Dijkstra's algorithm?

10:40:25

How can a min-heap optimize the shortest path algorithm?
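
A sketch of Dijkstra's algorithm with Python's `heapq` as the min-heap, assuming `adj[node]` is a list of `(neighbor, weight)` pairs with non-negative weights; the function name and input format are illustrative. Popping the closest unfinished node from the heap replaces a linear scan over all nodes, giving O((V + E) log V) instead of O(V²). BFS, by comparison, finds shortest paths only in unweighted graphs, in O(V + E). Outdated heap entries are skipped rather than removed, which is one common variant.

```python
import heapq


def dijkstra(adj, source):
    """Shortest distances from source; adj[node] is a list of (neighbor, weight) pairs."""
    distances = [float("inf")] * len(adj)
    distances[source] = 0
    heap = [(0, source)]  # (tentative distance, node); heapq pops the smallest first
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances[node]:
            continue  # stale entry: a shorter path to node was already found
        for neighbor, weight in adj[node]:
            new_dist = dist + weight
            if new_dist < distances[neighbor]:
                distances[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return distances


adj = [[(1, 4), (2, 1)], [(3, 1)], [(1, 2), (3, 5)], []]
print(dijkstra(adj, 0))  # [0, 3, 1, 4]
```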

10:48:47

Why is asking clarifying questions in interviews beneficial?

10:49:19

What questions should you ask when the problem is unclear?

10:53:54

How can you ensure your solution handles all edge cases?

10:58:33

What does a brute-force solution look like in practice?

11:09:15

How can maintaining a running sum optimize our solution?

11:21:11

What are the steps in implementing the optimized algorithm using two pointers?

11:12:01

Why is it essential to check for off-by-one errors during coding?

11:22:31

How does the running sum technique help optimize the subarray sum problem?
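
A sketch of two of the approaches, assuming the problem is to find indices `(i, j)` such that `arr[i:j]` sums to `target` in a list of non-negative integers, returning `None` if no such subarray exists (the function names are hypothetical). The running-sum version fixes a start index and extends the end while updating a single total, which is O(n²) instead of the O(n³) brute force that re-sums every subarray. The two-pointer version keeps a window `arr[i:j]` and shrinks it from the left whenever the sum overshoots, which is O(n) but correct only because no value is negative.

```python
def subarray_sum_running(arr, target):
    """O(n^2): fix start index i and extend j while keeping a running sum."""
    n = len(arr)
    for i in range(n):
        running = 0  # invariant: running == sum(arr[i:j])
        for j in range(i, n + 1):
            if running == target:
                return i, j
            if running > target or j == n:
                break  # values are non-negative, so the sum can only grow
            running += arr[j]
    return None


def subarray_sum_two_pointers(arr, target):
    """O(n) sliding window over arr[i:j]; relies on every value being non-negative."""
    i, running = 0, 0  # invariant: running == sum(arr[i:j]) at the top of each loop
    for j in range(len(arr) + 1):
        # Shrink the window from the left while its sum is too large
        while running > target and i < j:
            running -= arr[i]
            i += 1
        if running == target:
            return i, j
        if j < len(arr):
            running += arr[j]
    return None


arr = [1, 7, 4, 2, 1, 3, 11, 5]
print(subarray_sum_running(arr, 10))       # (2, 6)
print(subarray_sum_two_pointers(arr, 10))  # (2, 6)
```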

11:28:30

What is the greedy approach in solving algorithmic problems?

11:31:42

What is the minimum number of operations required to convert one string into another in the edit distance problem?

11:40:01

What key operations are performed in the recursive edit distance approach?

11:40:54

How does the solution differ depending on character match or mismatch?

11:45:02

How do we define the end cases for the recursive function effectively?

11:56:48

What corrections can improve the recursive solution for edit distance?

11:59:51

How do recursive calls impact the complexity of edit distance?

12:05:02

What are the benefits of using memoization for edit distance calculations?

12:12:35

How does memoization change the complexity?
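
A memoized sketch (hypothetical name `edit_distance`) that counts each insertion, deletion, and substitution as one operation. Without the `memo` dictionary every mismatch branches three ways, so the plain recursion is exponential; with it, each `(idx1, idx2)` pair is solved once, for O(m * n) time and space where `m` and `n` are the string lengths.

```python
def edit_distance(str1, str2):
    """Minimum insertions, deletions, and substitutions that turn str1 into str2."""
    memo = {}  # (idx1, idx2) -> edit distance between str1[idx1:] and str2[idx2:]

    def recurse(idx1, idx2):
        key = (idx1, idx2)
        if key in memo:
            return memo[key]
        if idx1 == len(str1):
            result = len(str2) - idx2  # insert the remaining characters of str2
        elif idx2 == len(str2):
            result = len(str1) - idx1  # delete the remaining characters of str1
        elif str1[idx1] == str2[idx2]:
            result = recurse(idx1 + 1, idx2 + 1)  # characters match: no operation
        else:
            result = 1 + min(
                recurse(idx1, idx2 + 1),      # insert str2[idx2]
                recurse(idx1 + 1, idx2),      # delete str1[idx1]
                recurse(idx1 + 1, idx2 + 1),  # substitute str1[idx1] with str2[idx2]
            )
        memo[key] = result
        return result

    return recurse(0, 0)


print(edit_distance("intention", "execution"))  # 5
```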

12:17:07

What key operations apply to the edit distance algorithm?

12:21:59

How does the problem-solving template improve coding interviews?

12:29:13

How can forum discussions clear up your doubts during the learning process?

12:29:22

What's the value of collaborating with peers in study groups for mastering concepts?

12:29:33

Why is it essential to complete assignments and tackle optional questions?


Recursion, String (computer science), Software development, Algorithm, Problem solving, Python (programming language), Object-oriented programming, Data structure, Problem statement (computer science)

Description

The content focuses on teaching the fundamental concepts of data structures and algorithms, with Python as the main language. This includes learning about different kinds of data structures such as arrays, linked lists, stacks, queues, trees, and graphs. The video also covers algorithmic techniques such as sorting, searching, and graph traversal. A major emphasis is on understanding how these concepts are applied in real-world scenarios to solve complex problems. By the end of this learning experience, learners should have a solid grasp of programming fundamentals, stronger problem-solving skills, and the confidence to handle technical interviews with ease.