Learniverse

Einführung in Datenstrukturen

00:30

All right, so this is CS50 and this is lecture four. So we're here in beautiful, low,

00:56

lecture halls and senders is in use today and we're joined by some friends that will soon

00:59

be clear and presence in just a moment. But before then, recall the last time we took a look

01:05

at CS50 IDE. This was a new web-based programming environment, similar in spirit,

01:09

to CS50 sandbox and CS50 lab, but added a few features. For instance, what features did it

01:14

add to you? To your capabilities. Yeah. What's that? The debugger. So debug 50, which opens that

01:22

side panel that allows you to step through your code, step by step, and see variables? Yeah. Sorry,

01:29

check 50 as well, which is a CS50 specific tool that allows you to check the correctness of your

01:33

code, much like the teaching fellows would when providing feedback on it, running a series of tests

01:38

that pretty much are the same tests that a lot of the homeworks will encourage you yourself to

01:42

run manually, but it just automates the process and anything else? So that is true, too. There's a little

01:53

hidden easter egg that we don't use this semester, but yes, indeed. If you look for a small

01:58

puzzle piece, you can actually convert your C code back to scratch, like puzzle pieces, and back,

02:03

and forth, and back to forth, thanks to cream and some of the team. So that is there. But by now,

02:08

it's probably better to get comfortable with text as well. So there's a couple of other

02:11

tools that we've used over time, of course. Besides, check 50 and debug 50, we've of course

02:16

used print-tap, and when is print-tap useful? Like, when might you want to use it, beyond needing

02:23

to just print something because the problem set tells you to. Yeah. Yeah, so to find where

02:28

your bug is. If you just kind of want to print out variables, value, or some kind of text,

02:33

just so you know what's going on. And you don't necessarily want to deploy debug 50, you can do that,

02:37

went out. Good, yeah. Indeed. Well, and you really, so you might want to use print-tap when

02:51

you have, maybe a nested loop, and you want to put a print-tap in the inside loop, so it's to see

02:56

when it kicks in. Of course, you could use debug 50, but you might end up running debug 50,

03:01

you're clicking next, next, next, next, next, next, there's so many times that that gets a little

03:04

tedious, but do keep in mind, you could just put a breakpoint deeper into your code as well,

03:08

and perhaps remove an earlier breakpoint as well. And honestly, all the time,

03:12

whether it's in C or other languages, do I find myself occasionally using print-tap,

03:16

just to type out print-tap in here, just so that I can literally see if my code got to a certain

03:21

point in here to see if something's printed. But the debugger you're going to find now,

03:25

and henceforth, so much more powerful, so much more versatile. So if you haven't already got into

03:29

the habit of using debug 50, biomeans start and use those breakpoints to actually walk through

03:34

your code, where you can, or see what's going on. So style 50, of course, checks the style of

03:38

your code, much like the teaching fellow's might, and it shows you in red or green,

03:42

what spaces you might want to delete, what spaces you might want to add, just a pretty

03:45

things up, so it's more readable for you and others. And then what about help 50?

03:49

When should you instinctively reach for help 50? Yeah, exactly. Yeah, when you don't understand

03:57

an error message, so you're compiling something, you're running a command, it doesn't really

04:00

quite work, and you're seeing a cryptic error message. Eventually you'll get the muscle memory

04:04

in the sort of exposure to just know, oh, I remember what that means, but until then, run

04:08

help 50 at the beginning of that same command, and it's going to try to detect what your error is

04:13

and provide TF like feedback on how to actually work around that. You'll see two on the course's

04:19

website is an wonderful handout made by Emily Hong, one of our own teaching fellows that introduces

04:25

all of these tools, and a few more, and gets you into the habit of thinking about things

04:30

as kind of a flowchart. If I have this problem, then do this, else if I have this problem,

04:33

do this other thing, so do check that out as well. But today, let's introduce really the last,

04:39

certainly for see of our command line tools that's going to help you chase down

04:43

problems in your code. Last week, recall that we talked about memory, a lot. We talked about

04:48

malloc, allocating memory, and we talked about freeing memory in the like, but it turns out you

04:52

can do a lot of damage when you start playing with memory. In fact, probably by now, almost everyone,

04:57

segmentation fault. Yeah, so that's just one of the errors that you might run into, and frankly,

05:03

you might have errors in your code now and henceforth that have bugs, but you don't even

05:08

realize it because you're just getting lucky, and the program is just not crashing, or it's not

05:12

freezing. But this can still happen, and so Valgrind is a command line program that is probably

05:18

looks the scariest of the tools we've used, but you can also use it with help 50 that just

05:22

tries to find what are called memory leaks in your program. We're called that last week,

05:26

we introduced malloc, and malloc lets you allocate memory. But if you don't free that memory,

05:31

by literally calling the free function, you're going to constantly ask your operating system,

05:35

macOS, Linux, Windows, whatever, can I have more memory? Can I have more memory? Can I have more

05:39

memory? And if you never literally hand it back by calling free, your computer may very well slow

05:44

down, or freeze, or crash, and frankly, if you've ever had that happen on your Mac or PC,

05:48

very likely, that's what some human accidentally did. You're she just allocated more and more

05:52

memory, but never really got around to freeing that memory. So Valgrind can help you find those

05:58

mistakes before you or your users do. So let's do a quick example. Let me go into CS50 IDE,

06:03

and let me go ahead and make one new program here. We'll call it memory.c, because we'll see

06:09

later today how I might chase down those memory leaks. But for now, let's start with something

06:13

even simpler, which all of you have maybe done by now, which is to accidentally touch memory

06:18

that you shouldn't, changing it, reading it, and let's see what this might mean. So let me do the

06:22

familiar at the top here, include standard IO. Well, let's not even do that yet. Let's just do this

06:31

first. Let's do int main void, just to start a simple program, and in here, let me go ahead and just

06:37

call a function called f. I don't really care what its name is for today. I just want to call

06:41

the function f, and then that's it. Now this function f, let me go ahead and define it as follows.

06:47

Void, f void, it's not going to do much of anything at all. But let's suppose just for the sake of

06:52

discussion that f's purpose and life is just to allocate memory. For whatever useful purpose,

06:56

but for now, it's just for demonstration sake. So what's the function with which you can allocate

07:01

memory? Malach. So suppose I want to malach space for, I don't know, something simple like just

07:07

one integer, like we're just doing this for demonstration purposes, or actually let's do more.

07:11

10 integers, 10 integers. I could of course do, well, give me 10, but how many bytes do I want?

07:18

How many bytes do I need for 10 integers? Yeah, so I can do literally size of int, and most likely,

07:24

the size of an int is going to be for probably on many systems today. It's just four bytes,

07:32

or 32 bits, but you don't want to hard code that list. Someone else is computer not

07:36

used those same values, so the size of an int. So 10 times the size of an int. Malach returns

07:41

what type of data. What is it going to hand me back? Yeah, returns an address or pointer.

07:49

Specifically, the address, 100, 900, whatever, of the chunk of memory, it just allocated for you.

07:55

So if I want to keep that around, I need to declare a pointer. Let's just call it x for today.

08:00

That stores that address. Could call it x, y, z, whatever, but it's not an int that it's

08:04

returning. It's the address of an int. And remember, that's what the star operator now means.

08:08

The address of some data type. It's just a number. All right, so now, if I were to first,

08:15

let's clean this up, turns out that to use Malach, I need to use standard lib.h. We've

08:20

saw that last week, albeit briefly. And then, of course, if I'm going to call f, what do I have to

08:25

do to fix this code? Yeah, I need to declare it up here, or I could just move f's implementation

08:32

up top. So I think this works, even though this program at the moment, it's completely stupid.

08:36

It doesn't do anything useful, but it will allocate memory and now do something with it as follows.

08:41

If I want to change the first value in this chunk of memory, well, how might I do that? Well,

08:46

I've asked the computer for 10 integers or rather space for 10 integers. What's interesting about Malach

08:54

is that when it returns a chunk of memory for you, it's continuous, back to back. And when you

08:59

hear continuous or back to back, what kind of data structure does that recall to mind?

09:05

In a ray. So it turns out we can treat this just random chunk of memory like it's an array.

09:10

So if we want to go to the first location in that array of memory, I can just do this and put

09:15

in the number, say, 50, or if I want to go to the next location, I can do this. Or if I want to do

09:22

the next location, I can do this. Or if I want to go to the last location, I might do this. But

09:29

is that good or bad? Why bad?

09:36

Yeah. So it's out of bounds. This is sort of week one style mistakes when it came to loops,

09:40

recall with four loops or while loops, you might go a little too far. And that's fine.

09:43

But now we actually will see we have a tool that can help us notice these things. So,

09:47

hopefully just visually it's apparent that what I have going on here is just a mind 12. I have a

09:53

variable X that's storing the address of that chunk of memory. And then online 13, I'm just

09:58

trying to access location 10 and set the value 50 there. But as you note, there is no location 10.

10:04

There's location 0, 1, 2, 3 all the way through 9. Of course. So how might we detect this with

10:10

a program? Well, let me go ahead and increase my terminal window just a bit here. Save my file.

10:15

And let me go ahead and compile, make memory. Okay. All is well. It compiled without any error messages.

10:20

And now let me go ahead and run memory. Enter. All right. So that worked pretty well. Let's actually

10:27

be a little more explicit here. Just for good measure. Let me go ahead and print something out.

10:31

So print F, percent I for an integer. And let's make it this more explicit. You

10:37

imported percent I. And then comma X bracket 10. And what do I have to include to use print F?

10:45

Yeah. So standard I. Oh, so that's just quickly out that standard I. Oh, dot H save. All right.

10:52

Let me recompile this. Make memory. Enter. And now let me go ahead and do dot slash. Whoops, dot slash

10:58

memory. Huh. Feels like it's a correct program. And yet for a couple weeks now, we've been claiming

11:05

that don't do that. Like don't go beyond the boundaries of your array. So how do we reconcile this?

11:11

Feels like buggy code, or at least we've told you it's buggy code, and yet it's working.

11:17

Yeah. That's a good way of putting it. Let's still there's something that I'm on that.

11:24

Okay. But yeah, and I think if I heard you correctly, you said C doesn't scream if you go too far.

11:36

Yeah, okay. So that's a good way of putting it. Like you can get lucky in C. And you can do something

11:40

that is objectively pedagogically like technically wrong, but the computer's not going to crash.

11:45

It's not going to freeze because you just get lucky because often for performance reasons,

11:49

when you allocate space for 10 integers, you're actually going to get a chunk of memory back.

11:53

That's a little bigger than you need. It's just not safe to resume that it's bigger than you need.

11:58

But you might just get lucky and you might end up having more memory that you can technically

12:02

get away with touching or accessing or changing. And the computer's not going to notice,

12:06

but that's not safe because on someone else's macro PC, their computer might just be operating

12:10

a little bit differently than yours. And bam, that bug is going to bite them and not you.

12:14

And those are the hardest, most annoying bugs to chase down, as some of you might have experienced.

12:18

Right? It works on your computer, but not a friend's or vice versa.

12:21

These are the kinds of explanations for that. So Valgrant can help us track down even these most

12:26

subtle errors. Program seems to be working. Check 50, or tools like it might even assume that it's

12:31

working because it is printing the right thing. But let's take a look at what this program

12:35

Valgrant thinks. Let me increase the size of the terminal window here and go ahead and type in

12:40

Valgrant dot slash memory. So same program name, dot slash memory, but I'm pre fixing it with the

12:46

name Valgrant. All right, unfortunately Valgrant is really quite ugly and it prints out a whole bunch of

12:52

stuff here. So let's take a look. At the very top, you'll see all these numbers on the left and that's just an

12:57

unfortunate aesthetic, but we do see some useful information. Invalid read of size four and then it has

13:03

these cryptic looking letters and numbers. What are those? They're just addresses and hexadec

13:10

small. It doesn't really matter what they are, but Valgrant can tell us where the memory is that's

13:14

acting up suspiciously. You can then see next to that that Valgrant is pointing to function f on

13:20

memory dot c's 15th line. So that's perhaps helpful and then main on line eight because that's the

13:25

function that was called. So Valgrant is actually kind of nice in that it's showing us all of the

13:29

functions that you called from bottom up, much like the stack from last week. And so something's going wrong

13:35

online 15. And if we go back to that, let's see, line 15 was sure enough. I'm actually trying to

13:42

access that memory location. And frankly, I did an online 14 as well. So hopefully fixing one or

13:47

both of those will address this issue. So and notice here, um, this frankly just gets overwhelming

13:54

pretty quickly. And then, oh, 40 bytes in one block are definitely lost in loss record.

13:59

I mean, this is the problem with Valgrant. Honestly, it was written some years ago, not particularly

14:03

user friendly. But that's fine. We have a tool to address this. Let me go ahead and rerun

14:08

Valgrant with help 50 enter and see if we can't just assist with this. All right. So still the

14:14

same amount of black and white input, but down here now, help 50 is noticing, oh, I can help you with

14:19

an invalid right of size four. So it's still at the same location, but this time, or rather,

14:24

same file, memory.c, but line 14. And we proposed, looks like you're trying to modify four

14:30

bytes of memory that isn't yours. Question mark. Did you try to store something beyond the

14:34

bounds of an array? Take a closer look at line 14 of memory.c. So hopefully, even though Valgrant's

14:39

output is crazy as a taric, like at least that yellow output will point you toward, ah, line 14.

14:44

I'm indeed touching four bytes in integer that I shouldn't be. And so let's go ahead and fix this.

14:50

If I go into my program and I don't do this, let's change it to location nine and location nine

14:56

here and save, then let me go ahead and rerun Valgrant without help 50. All right, progress,

15:05

except, oh, nope, not progress. I skipped a step. Yeah, I didn't compile it. It's a little puzzled

15:11

why I saw the same thing. So now let's rerun Valgrant. And here, it seems to be better. So I don't

15:19

see that same error message up at the very top, like we did before, but notice here 40 bytes in one

15:25

blocks, okay, that was bad grammar in the program, but are definitely lost in loss record one of one.

15:31

So I still don't quite understand that. No big deal. Let's go ahead and run help 50 and see what

15:35

the second of two errors apparently is here. So here, it's highlighting those lines, 40 bytes

15:41

and one blocks are definitely lost and looks like your program leaked, 40 bytes of memory.

15:45

Did you forget to free memory that you allocated with Malach? Take a closer look at line 13 of

15:50

memory.c. So in this case, line 13 indeed has a call to Malach. So what's the fix for this problem?

15:59

Per help 50 or your own intuition. What do I have to add to this program?

16:04

Yeah, free. And where does that go? Right here. So we can free the memory.

16:12

Why would this be bad?

16:17

Exactly. We're freeing the memory, which is like saying the operating system. I don't need this anymore

16:21

and yet two lines later, we're using it again and again. So bad. We didn't do that mistake last week,

16:27

but you should only be freeing memory when literally you're ready to free it up and give it back,

16:31

which would probably be at the end of the program. So let me go ahead and resave this.

16:35

Open up my terminal window, recompile at this time, and now let me run Valdron one last time without

16:41

help 50 and still a little verbose, but zero errors from zero context. That sounds pretty good.

16:48

And moreover, it also explicitly says all heap blocks were freed and we're called at the heap

16:54

is that chunk of memory that we drew visually up here, which is where Malach takes memory from. So

17:00

this is kind of the mentality with which to have when approaching the correctest of your code.

17:06

Like it's one thing to run sample inputs or run the program like I did all looked well.

17:10

It's one thing to run tools like Chick 50, which we humans wrote, but we too are fallible certainly

17:14

and we might not think of anything. And thankfully, smart humans have made tools that at first glance

17:18

might be a little hard to use like debug 50 as his valgron now, but they ultimately help you get your

17:24

code 100% correct without you having to struggle visually over just staring at the screen.

17:28

And we see this a lot in office hours, honestly, a lot of students to their credit sort of reasoning

17:33

through staring at the screen just trying to understand what's going wrong, but they're not taking

17:36

any additional input other than the characters on the screen. You have so many tools that can feed you

17:42

more and more hints along the way, so do acquire those instincts. Any questions on this?

17:48

Yeah. If you had a main function that took arguments, would you run Valdron with those arguments

17:56

as well? Yes indeed. So valgron works just like debug 50, just like help 50. If you have

18:02

command line arguments, just run them as usual, but prefix your command with valgron or maybe even

18:07

help 50 valgron to help one with the other. Good question. Other thoughts. Yeah.

18:19

Good question. So at the end of the day, think about what's inside the computer, which is just

18:23

something like this. So physically, it's obviously still there. It's just being treated by the

18:29

operating system, macOS, Windows, Linux, whatever, as like a pool of memory. We keep drawing it as a

18:34

grid that looks a little something like this. So the operating systems job is to just keep track

18:39

of which of those squares is in use, thanks to Malach, and which has been freed. And so you can

18:44

think of it as having little check marks next to them saying, this is in use, this is in use,

18:47

these others are not in use, so they just go back on the so-called free list into that pool of

18:53

good question. If you take a higher level course on operating systems in fact, or see a 61 or

18:57

one 61 at Harvard, you'll actually build these kinds of things yourself and implement tools like

19:02

Malach yourself. Yeah. Good question. Why did we have to allocate memory? In this case,

19:11

we did not. This was purely as mentioned for demonstration purposes, if we had some program in

19:16

which we wanted to allocate some amount of memory, then this is how we might do it. However,

19:22

a cleaner way to do all of this would have been to say, hey, computer, give me 10 integers like this,

19:30

and not have to worry about memory management. And that's where we began in week one, just using

19:33

arrays on the stack, so to speak, not using Malach at all. So the point is only that once you start

19:39

using Malach and free and memory more generally, you take on more responsibilities than we did

19:45

in week one. Good question. Any others? All right. So turns out, there's one more tool in all

19:52

seriousness. This is a thing, DDB 50. So D bug 50 is an illusion to a very popular tool called

20:00

GDB 50. Good new debugger. It's an older tool that you won't use at the command line, but it's

20:06

what makes debug 50 work. Turns out there's a thing and there's an actual Wikipedia article that

20:10

you might have clicked on in my email last night. The called rubber duck debugging. And frankly,

20:15

you don't have to go as all out as excessive as we did here. But the purpose of this technique of

20:20

rubber duck debugging is to keep literally like a rubber duck on your shelf or on your desk.

20:24

And when you have a bug and you don't have the luxury of a teaching fellow or a roommate who

20:28

took CS50 or more technical friend who can help walk you through your code, literally start walking

20:34

through your code verbally, talking to the duck saying, well, online too, I'm declaring main and

20:40

online three, I'm allocating space for an array, and then online four, I'm calling, oh, that's

20:45

what I'm doing wrong. So if any of you have ever had that kind of moment with an office hours

20:49

or alone where you're either talking in your head or you're talking through your code to someone

20:53

else, and here she doesn't even have to respond, you just hear yourself saying the wrong thing

20:59

or having that aha moment, you can approximate that by just keeping one of these little guys

21:04

on your desk and have that conversation. And it's actually not as crazy sounding as it actually

21:09

is. It's that process of just talking through your code logically step by step in a way that you

21:14

can't necessarily do in your own mind, at least I can't, when you hear yourself say something wrong

21:18

or that didn't quite follow logically, bam, you can actually have that aha moment. So on the

21:23

way out today, by all means, take any one of these ducks that took quite a while long time for

21:27

occult until I out today, and we'll have more at office hours in the weeks to come if you

21:31

would like. So you might, some of you might recall such a duck from Currier House last year too,

21:36

which was a cousin of his as well. All right, so that is rubber duck debugging. Now last week,

21:42

we call that we began to take off training wheels. We used for a few weeks to see a city library,

21:46

and that's kind of in the past now. That was just a technique, a tool via which we could get

21:50

user input a little more pleasantly than if we actually started doing it with memory early on.

21:55

And we revealed last week that a string, quote, unquote, is just what underneath the hood

22:00

in C? What, say again? In array of characters, and even more specifically, it's a synonym

22:08

STRIng for what actual data type. Char star, as we've called it. So a char star is just the computer

22:16

scientist way of describing a pointer to a character, or rather the address of a character,

22:22

which is functionally equivalent to saying in an array of memory or sequence of memory,

22:26

but it's kind of the more precise, more technical way of describing it. And so now that we know

22:31

that we have char stars underneath the hood, where is all of that coming from? Well indeed,

22:35

it maps directly to that memory. We keep pointing out that something like this is inside of your

22:40

computer, and we can think of the memory as just being chunks of memory, all of whose bytes are

22:45

numbered zero on up to two gigabytes, or two billion, whatever the value might be. But of course,

22:50

last week, we pointed out that you think about this memory, not as being hardware per say,

22:55

but it's just being this pool of memory that's divided into different regions. The very top of

22:59

your computer's memory, so to speak, is what we call the text segment. And what goes in the text

23:04

segment of your computer's memory when you're running a program? Text is like a poor choice of words,

23:10

frankly, but what is it? A second? Not the file headers in this case. This isn't the context of

23:18

running a program. This is early saving a file. Not string literals here, but they're nearby actually

23:24

in memory. Functions closer. Yeah, the text segment of your computer's memory is where,

23:32

when you double click a program to run it or in Linux, when you do dot slash something to run it,

23:38

that's where the zero's in ones of your actual program, the machine code that we talked about in

23:42

week zero is just loaded into RAM. So recall from last week that, you know, anything physical in

23:48

this world, hard drive, solid state drives is slow. So those devices are slow, but RAM,

23:53

the stuff that we keep pulling up on the screen is relatively fast. If only because it has no

23:57

moving parts, it's purely electronic. So when you double click a program on your Mac or PC,

24:01

or do dots slash something in Linux, that is loading from a slow device, your hard drive,

24:06

where the data is stored long term into RAM or memory, where it can run much more quickly and

24:12

pleasurably in terms of performance. And so what does this actually mean for us? Well,

24:17

it's got to go somewhere. We just decided humans years ago that it's going to go at the top,

24:21

so to speak, of this chunk of memory. Below that, though, are the more dynamic regions of memory,

24:26

the stack and the heap. And we said this a moment ago, in last week as well, what goes on the heap,

24:31

or who uses the heap? Dynamic memory. Anytime you call malloc, you're asking the operating

24:38

system for memory from the so-called heap. Anytime you call free, you're sort of conceptually

24:43

putting it back. Like, it's not actually going anywhere. You're just marking it as available for

24:47

other functions and variables to use. The stack meanwhile is used for what?

24:54

Local variables. And any of your functions. So main typically takes a sliver of memory at the

24:59

bottom. If main calls another function, it gets a sliver of memory above that. If that function

25:04

calls one, it gets a sliver of memory above that. So they each have their own different

25:07

regions of memory. But of course, these arrows, both pointing at each other, doesn't seem

25:12

like such a good design. But the reality is bad things can happen. You can allocate so much

25:17

memory that bam, the stack overflows, the heap, or the heap overflows, the stack. Thus was

25:23

born websites like stack overflow. And the like, but that's just a reality. If you have a finite

25:27

amount of memory, at some point, something's going to break, or the computer's going to have to

25:31

say, mm, no more memory, you're going to have to quit some programs, or close some files,

25:36

so that was only to say that that's how the memory is laid out. And we started to explore

25:40

this by way of a few programs. This one here, it's a little dark here. This one here was a swap

25:46

function, and how it's even darker. It was a swap function that actually did swap two values,

25:53

A and B, but it didn't actually work in the way we intended. What was broken about this swap

25:59

function last week? Like I'm pretty sure it worked. And when our brave volunteer came up and

26:06

swapped the orange juices in the milk, that worked. So like the logic was correct. But the

26:12

program itself did not work. Why? Change the values of copy variables. Exactly. It changed

26:18

values in the copies of the variable. So we're called it, when main was the function we called

26:23

and it had two values, X and Y. That chunk of memory was here. That chunk of memory was here.

26:28

And it had like the numbers one and two. But when it called the swap function,

26:32

that got its own chunk of memory. So main was at the bottom. Swap was above that. It had its own

26:36

chunks of memory called A and B, which initially got the values one and two. One and two were

26:41

indeed successfully swap, but that had no effect on X and Y. So we fixed that with the newer

26:47

version of this program. Of course, it looked a lot more cryptic at first glance, but in English,

26:51

could someone just describe what it is that happens in this example that was more correct?

26:56

Like, what does this program do line by line? Yeah. Exactly. Instead of passing the values of

27:05

the variables, they were by copying them. It passes the addresses of those variables. So that's

27:10

like saying, I don't technically care where it is in memory, but I do need to know that it is

27:15

somewhere in memory. So instead of passing an X and the number one, let's suppose that X is at

27:19

location 100. Might go to example. It's actually the number 100 that's going to go there. And if

27:24

Y is at the location like 104, well, it's 104 that's going to go there, which are not the

27:30

values we want to swap. But those are sort of like little maps or breadcrumbs, if you will,

27:34

that lead us to the right location so that when we execute this code, what we're ultimately

27:39

swapping in those three lines is this and this and all along the way, we're called we're using

27:45

a temporary variable there that can be just thrown away after. So that's what pointers allow

27:49

us to do. And that's what allowed us to actually change values on the so-called stack,

27:54

even though even by calling another function. Are any questions then on where we left off last

28:01

time with the stack and with swap? Now, all right. So recall, we introduced Binky as well,

28:10

who lost his head at one point, but why? What went horribly, horribly, or why, with the scene

28:16

from last week's film from Stanford? Binky was doing everything correctly, right? Like moving

28:23

values 42 was successful and then yeah. So D reference something that wasn't funny to

28:30

actually, I wasn't pointing to the images. Exactly. He tried to D reference a pointer and address

28:35

that wasn't actually pointing to a valid address. Recall that this was the line in code in question

28:40

that was unlucky and bad. Star Y means go to the address in Y and do something to it,

28:45

set it equal to the number 13. But the problem was that in the code we looked at last week,

28:51

all we did at the start was say, hey, computer, give me a pointer to an int and call it x,

28:56

do the same and give it y, allocate space and point x at it, but we never did the same for y. So

29:05

whereas x contained last week, the address of an actual chunk of memory, thanks to Mallock,

29:10

what did y contain at that point in the story, the yellow line there?

29:16

What did y contain? What value? No. No, maybe, maybe, but it's not obvious because there's no

29:27

mention of null on the program. We might get lucky, null is just zero and sometimes we've seen

29:32

that zero or the default values in a program. So maybe, but I say maybe an imaging y.

29:40

Yeah, and it doesn't allocate, well, allocates not quite the right word that suggests you are

29:44

allocating actual memory. It's a garbage value. There's something there. My Mac has been running

29:48

for a few hours and your Macs and PCs and phones are probably running all day long or certainly

29:52

when the lid is up. And so your memory is getting used and unused and used, like lots of stuff

29:57

is going on. So your computer is not filled with like all zeros or all ones. If you look at it

30:01

at some random point in the day, it's filled with like a bunches and bunches of zeros and ones

30:05

from previous programs that you quit long ago. Windows, you have in the background and the like.

30:10

So the short of it is, when you're running a program for the first time that's been running

30:14

now for some time, it's going to get messy. That big rectangle of memory is going to have

30:18

some ones over here, some zeros over here and vice versa. So their garbage values because those

30:24

bytes have some values in them. You just don't necessarily know what they are. So the point is you

30:29

should never, ever, dereference a pointer that you have not set yourself, maybe you'll crash, maybe

30:35

it won't crash. Valgrin can help you find these things, put sometimes, but it's just not a safe

30:40

operation. All right, and lastly, the last thing we introduced last week, which will be the stepping

30:45

stone for what problems we'll solve this week was struck. So struck is kind of cool. In that,

30:50

you can design your own custom data structures. See is pretty limited out of the box. So to speak,

30:56

you only have chars and bulls and floats and inks and doubles and longs and string, well,

31:01

we don't even have strings per se. So it doesn't really come with many features, like a lot of

31:05

languages do, like Python, which we'll see in a few weeks. So with struck and see,

31:09

you have the ability to solve some problems of your own. For instance, with the struck,

31:14

we can actually start to implement our own features, our own data types. For instance, let me go

31:21

up here, and let me go ahead and create a file called, say, a student, a rather just struck.h.

31:28

So we call that dot h as a header file. Thus far, you have used header files that other people

31:33

made, like CS50 dot h, and standard IO dot h, and standard lid about h, but you can make your own.

31:38

header files are just files that typically contain code that you want to share across multiple

31:43

programs. And we'll see more of this in time. So let me go ahead and just save this file and

31:47

suppose that I want to represent a student in memory. A student, of course, is probably going to

31:53

have what, like, a, for instance, how about a string for their name, a string for their dorm,

32:01

but string is kind of two weeks ago. Let's call this char star, and let's call name char star.

32:08

And so you might want to associate, like multiple pieces of data with a students, right? And

32:12

you don't want to have multiple variables per se. It'd be nice to kind of encapsulate these together,

32:15

and recall at the very end of last week we saw this feature where you can define your own type

32:21

with type death that is a structure itself, and you can give it a name. So in short, simply by

32:27

executing these lines of code, you have just created your own custom data type. It's now called

32:32

student, and every student in the world shall have per this code, a name, and a dorm associated

32:37

with them. Now, why is this useful? Well, the program we looked at at the very end of last time

32:42

looked a little something like this. In struct zero, dot c, we had the following. I first

32:50

allocated some amount of space for students. I asked the user what's the enrollment in the class,

32:54

or what not? That gives us an int. And then we allocated an array of type student called students.

33:01

plural. This was an alternative for call to doing something like this. String names enrollment,

33:08

and string dorms enrollment, which would work. You could have two separate arrays, and you just have

33:13

to remember that name zero and dorm zero is the same human, but why do that if you can keep

33:18

things together? So with structs, we were able to do this. Give me this many student structures

33:25

and call the whole array students. And the only new syntax we introduced to satisfy this goal was

33:31

what operator. The dot. Yeah, so in the past, recall from like week two, we introduced arrays,

33:40

and arrays allow you to square bracket notation. So that is no different from a couple of weeks back.

33:45

But if your array is not storing just integers, or chars, or floats, or whatever,

33:50

it's actually storing a structure, like a student, you can get at that student's name by literally

33:56

just saying dot name, and you can get at their dorm by doing dot dorm, and then everything else is the same.

34:01

This is what's called encapsulation, and it's kind of like a fundamental principle of programming, where

34:06

if you have some real world entity, like a student, and you want to represent students with code,

34:11

yeah, you could have a bunch of arrays that all have called denames, dorms, emails, phone numbers,

34:17

but that's just just messy. You can instead encapsulate all of that related information about a

34:22

student into one data structure so that now you have per week zero in abstraction. Like a student

34:28

is an abstraction, and if we break that abstraction, what is a student actually? Not in the real world,

34:35

but in our code world here. Student is an abstraction. It's a useful word, all of us can kind of

34:40

agree mean something, but technically what does it apparently mean? A student is actually a name

34:47

in a dorm, which really kind of is diminutive to everyone in this room, but we've distilled it in code

34:52

to just those two values. So there we have encapsulation, you're kind of encapsulating together

34:57

multiple values, and you're abstracting away, just have a more useful term, because no one's going to

35:01

want to talk in terms of lines of code to describe anything. So, same topic is in the past. So,

35:06

now we have the ability to come up with our own custom data structures it seems that we can store

35:11

anything inside of them that we want. So let's now see how poorly we've been designing some things

35:17

for the past few weeks. So it turns out that much of the code, hopefully we've been writing

35:23

in recent weeks, has been correct, but we've been not necessarily designing solutions in the

35:28

best way. We're called that when we have this chunk of memory, we've typically treated it as

35:33

at most in array. So just a continuous chunk of memory, and thanks to this very simple mental

35:37

model, do we get strings, do we get arrays of students now, but arrays aren't necessarily the

35:44

best data structure in the world. Like, what is a downside of an array if you've encountered

35:49

ones of us far? In see, what's a downside of an array? Yeah.

35:55

Tanner cannot. To cannot. That is true. So in see, you cannot mix data types inside of an array.

36:06

They must all be in, they must all be charged, they must all be students. It's a bit of a

36:11

white like, as technically you can have something called a void star, and you can actually map,

36:14

but yes, that is true, though strictly speaking, cannot mix data types, though frankly even

36:19

though other languages let you do that, it's not necessarily the best design decisions. But sure,

36:24

other thoughts. Yeah. The size cannot change. Let's focus on that one, because that's sort of

36:30

more even more constraining it with seam. So if you want an array for say two values, what do you do?

36:37

Well, you can do something like int x, bracket two, semicolon, and what is that actually give you

36:43

inside of your computer's memory? It gives you some chunk that will draw as a rectangle,

36:47

with this is location zero, this is location one. Suppose that, oh, a few minutes later,

36:52

you change your mind. Oh, darn, I just took a, I want to type in a third value, or I want to add

36:57

another student to the array. Where do you put that? Well, you don't. If you want to add a third

37:03

value to an array of size two, what's your only option in C? Make a new array. You make a new array.

37:09

So literally, and if this array had the number like 42, and this had the number 13, the only way

37:15

to add a third number is to allocate a second array, copy the values into the same locations 42,

37:23

13, and then we'll add another value 50, and then so that you're not using a twice as much space

37:28

almost permanently, now you can sort of free somehow or stop using that chunk of memory. So that's

37:34

fine. It's correct what we just did, but what's the running time of that process?

37:40

Call a couple of weeks ago, we started talking about efficiency and design. What's the running time?

37:45

Of resizing an array. Say again? Two long, fair. But let's be more precise. Biggo of,

37:59

Biggo of what? And what's n? Okay, true, but what is n represents?

38:09

Yeah, so you don't actually have to not know, like it's just a general answer in this case,

38:12

however long the array is, call it n, it is that many steps to resize it into that plus one.

38:18

Okay, technically it's Biggo of n plus one, but recall in our discussion of Biggo notation,

38:22

we just ignore the smaller terms, the plus ones, the divided by two, the plus n. We focus only

38:28

on the most powerful term in the expression, which is just n here. So yes, if you have an array of

38:33

size two and you resize it into an array of size three or really n plus one, that's going to take

38:39

me roughly n steps, technically n plus one steps, but n steps, Ergo, Biggo of n. So it's a linear

38:45

process. So possible, but not necessarily the fastest thing because we literally have to move

38:50

all those damn values around. So what would be better than this? And if you've programmed before,

38:57

you might have the right instincts already. How do we solve this problem?

39:02

Oh, yeah. We're going to allocate more memory at the end of the array.

39:07

Reallocate more memory at the end of the array. So it turns out C does have a function called

39:14

realloc, perfectly if not obviously named, that reallocates memory. And if you pass it,

39:20

the address of a chunk of memory you've allocated, and the operating system notices, oh yeah,

39:25

you've got lucky. I've got more memory at the end of this array. It will then allocate that

39:30

additional RAM for you and let you use it. Or worst case, if there's nothing available at

39:35

the end of the array in memory, because it's being used by something else in your program,

39:39

that's fine. Realloc will take take on the responsibility of creating another array,

39:44

somewhere in memory, copying all of that data for you into it, and returning the address of

39:50

that new chunk of memory. Unfortunately, that's still linear. Yeah. This is all being done in the heap,

39:57

malloc, and realloc, and free all operate on the heap. Yes. So that is a solution, but it doesn't

40:03

really speak to the efficiency. Yeah. Yeah. What is a linked list? Go ahead. Uh,

40:11

when you have an element at points like different elements. Okay, points to other element. Yes.

40:16

So let me speak to like, what's the fundamental issue here? The fundamental problem is that

40:20

like much like painting yourself into a corner, uh, so to speak, as the cliche goes,

40:25

within array, you're deciding in advance how big the data structure is and committing to it.

40:31

But what if you just do the opposite? Don't do that. If you want initially room for just one

40:37

value, say one integer, only ask the computer for that. Give me space for one integer, and I'll

40:43

put my number 42 in here. And then if and only if you want a second integer, do you ask the

40:49

computer for a second integer? And so the computer, as by a malloc or whatnot, will give you

40:54

another one, like the number 13. And if you want a third, just ask the same question of the

40:58

operating system, each time just getting back one chunk of memory. But there's a fundamental

41:04

gacha here. There's always a trade-off. So yes, this is possible. You can call malloc three times,

41:10

each time asking for a chunk of memory of size one instead of size three, for instance.

41:15

But what's the price you pay or what problem do we still need to solve? Yeah.

41:19

Yeah, they're not being stored next to each other. So even though I can think of this as being the

41:25

first element, the second and the third, you do not have in this story random access to elements.

41:32

And random access, Ergo random access memory, or RAM, just means that arithmetic, like mathematically,

41:38

you can jump to location zero, location one, location two randomly, or in constant time. Just

41:44

instantly, because if they're all back to back to back, all you have to do is like add one or

41:49

add four or whatever to the address and you're there. But the problem is, if you're calling malloc

41:54

again and again and again, there's no guarantee that these things are even going to be

41:59

proximal to one another. These second chunks of memory might end up, if this is the big chunk of

42:05

memory we've been talking about, where the heaps up but here and the stacks down here, 42 might end

42:10

up over here, the next chunk of memory 50 might end up over here, the third chunk might end up over here.

42:16

So there you can't just jump from location zero to one to two, because you have to somehow

42:21

remember where location zero and one and two are. So how do we solve this? Even if you haven't

42:28

programmed before, like what would a solution be here? Okay, somehow storing the addresses of

42:41

all right. So let's just suppose for the sake of discussion that this chunk of memory ended up at

42:44

location 100, this one ended up at like one hundred fifty, this one ended up at like four hundred and

42:50

seventy five, whatever, those values are. It would seem that somehow or other I need to remember three

42:56

values, a hundred one fifty and four seventy five. So where can I store that? Well it turns out I could

43:03

be a little clever but a little greedy. I could say to malloc, you know what? Every time I call

43:07

all you don't just give me space for an integer, give me space for an integer plus the address

43:14

of another integer. So if you've ever kind of seen like popcorn strong together on a string or

43:20

any kind of chain length fence where one link is linking to another, we could create the equivalent

43:27

of whoops, not that. We could create the equivalent of this kind of picture where each of these squares

43:35

or nodes will start calling them kind of links graphically to the other. Well we've seen these

43:40

links or these pointers, literally arrows that are pointing implemented in code, an arrow or a

43:45

pointer is just an address. So you know what? We should just ask malloc, not for enough space for just

43:52

the number 42, we should instead ask for a little more memory in each of these squares, making them

43:59

pictorially rectangles now, so that now, yes, we do have these arrows conceptually pointing from one

44:05

location to another, but what values do I actually want to put in these new additional boxes?

44:12

The addresses of the next. So they're like little breadcrums. So in this box here,

44:17

associated with the first value, should be the address of my second value, 475,

44:23

associated with my second value here per the arrow and let me draw the arrow from the right place.

44:29

From the arrow should be the address 150 because that's the last, and then from this extra box,

44:35

what should I put there? Yeah. Yeah, so probably the equivalent of slash arrow,

44:41

which in the world of pointers recall is N-U-L-L. So just a special value that means that's it,

44:46

this is the end of the line. That still leaves us with room to add a fourth value and point to it,

44:51

but it for now signifies very clearly to us, there's nothing actually there. So what did we just do?

44:58

We created a list of values, 50, oh sorry, 42, 50, 13, but we linked them together. First,

45:05

pictorially we just arrows like any human might with a piece of chalk, but technically in code,

45:10

we could do this by just storing addresses in each of these places. So just to be clear,

45:15

then what might this actually translate to in code? Well, what if I proposed this? In code,

45:23

we might do something like this. If we want to store an integer, we're of course going to need

45:31

to store like int and we'll call it. And we'll represent 42 or 50 or 13. But if we want to create

45:36

a data structure, we might want to start giving this data structure a name. I called it a

45:40

moment ago node, which is a CS term for like a node in a linked list, so to speak, and it looks

45:46

like this. So type-deft means give me my own type. Struct means make it a structure, like a student

45:51

was, and then node was going to be the name of this thing. And I'll explain in a moment why I have

45:55

the word node twice this time. But I have left room on the board for just one more line.

46:02

In addition to an int called n or whatever, I need to somehow represent in code the additional

46:09

memory that I want Malok to give me for the address. So first of all, these are addresses of

46:14

what data types, each of those three new boxes. They're the addresses of integers.

46:20

And that point in the story. But technically, what is this box really pointing to?

46:27

Is it pointing specifically to the int? It's pointing to that whole chunk of memory if you will.

46:33

So if you start thinking about each of these rectangles as being a node, and each of the arrows

46:38

as pointing to another node, we need to somehow express, I need to somehow store a pointer to a node.

46:45

In other words, each of these arrows needs to point to another node. And in code, we could

46:50

say this, right? Like let's give it a name instead of n, which is the number, let's call it next.

46:56

So next, shall be the name of this field that points to the next node in memory. And node star,

47:01

what does that mean in English, if you will? So again, pointing to an address, right? It looks

47:09

different. Node is a new word today, and that's fine. But node star just means a pointer to a node.

47:14

The address of a node, and it turns out that this is a custom structure, so we actually have to say this.

47:20

But it's the same principle, even though things are kind of escalating quickly here,

47:24

we just need two values, an int, and then a pointer to another thing. That other thing is going to be

47:31

another node, and we're just using node, frankly, to encapsulate two values, an int, and a pointer.

47:36

And the way you express in C, albeit somewhat cryptically, a pointer of one of those arrows,

47:41

is you say, give me a variable called next, have it point two, a structure called node,

47:48

or rather, have it be the address of a structure of type node. Yeah.

48:01

Good question. So this feels like a circular kind of definition, because I'm defining a node,

48:06

and yet inside of a node is a node, that is okay, because of the star. It is necessary in C.

48:14

I'll remember that C always is kind of red top to bottom. So accordingly, this very first line of code here,

48:21

typed F struck node, at that point in the story, when client has read that line,

48:26

it knows that a phrase struck node exists. Exactly. Exactly. We didn't need to do this with

48:33

student, because there were no pointers involved to other students, but yes, in this case.

48:38

So in short, this tells client, hey, client, give me a structure called node, and then in here,

48:43

we say, hey, client, each of those nodes shall have two things, an integer called N,

48:48

and a pointer to another one of these data structures of type node, and call the whole thing node.

48:56

It's a bit of a mouthful, but all this is the following. Let me go ahead and erase all of this.

49:00

All this data type is, if we get rid of the picture we drew on the fly there, is this says,

49:08

hey, client, give me a data structure that pictorially looks like this. It's divided into two parts.

49:14

The first part is called N. The second type is called next. This data type is of type N,

49:20

this is a pointer to another such node. And that's it. Even though the code looks complex,

49:26

the idea is exactly that. Yeah. Good question. When the reason is for it, as just came up a

49:40

moment ago, it's clang and see in general, or kind of dumb it, that they just read code top to bottom.

49:46

And the problem is you have to declare the name of this structure as being a struck node before.

49:52

You actually use it. It's similar in spirit to our discussion of prototypes,

49:55

why functions need to be mentioned way up top. This just says the clang,

49:59

give me a type called struck node. You don't know what it's going to look like yet,

50:03

but I'll finish my thought later. And then in here, we're just telling clang inside of that node

50:09

should be an integer, as well as a pointer to the very type of thing. I'm in the middle of the

50:13

finding. But if I had left off the word node up there and just said struck, you couldn't do that,

50:18

because it hasn't seen the word N-O-D-E yet. That's all. Other questions?

50:25

All right. So, if I now have a data structure called node, I can use it to kind of stitch

50:31

together these link lists. And maybe just to vary things a little bit and to start giving away

50:35

some ducks, would folks be comfortable with a volunteering to solve a problem here? Yeah. Okay,

50:41

come on up. One, two, sure. Or you can take a duck and run. Okay, one, two,

50:49

how about three? Come on over here. Three. So, if you want to be our first pointer, come and

50:53

you can be number five, come on over here. Want to be number nine. And one more, one more volunteer,

51:00

come on over here. Yeah. All right. So, how many of you over here?

51:08

Okay, 17. All right. So, if you'd like to just so we pick this up for those following along

51:14

home, if you would like to just say hello to the audience. Hi, I'm Andrea. Hi, I'm Cumm.

51:22

Wonderful. Okay, and if you wouldn't mind I was just taking a big step back over the ducks,

51:27

just so that we're a little farther back. Let's go ahead and do this. If you're our first

51:31

pointer, if you could come over here, for instance, and just stand outside the ducks. And if you

51:35

guys could come a little over here in front, it's still fine. So, here we have the making of a

51:40

length list. And what's your name again? Cummie. Cummie is our first pointer, if you will.

51:45

By a Cummie's variable, are we just going to keep track of the first elements of the length list.

51:49

So, if you could with your left hand, represent first, just point over at what was your name again?

51:54

Andrea. So, Andrea is the number nine. If you could use your left hand to point at number five,

51:59

and if you could use your left, to point at number 17, and your left hand to just point at

52:04

null, which we'll just call the ground. So, you don't want to just point it randomly, because that

52:08

would be like following a bogus pointer. So, here means null. All right. So, this is a length list.

52:13

All you need to store a length list of three values is three nodes inside of which are three

52:19

integers, and their left hand represents that next pointer. So, to speak, Cummie's a little different

52:24

in that she's not holding a value. She's not holding an integer, rather holding just the name

52:30

of the variable first. So, you're the only one that's different here fundamentally.

52:34

So, suppose I want to insert like the number 20. Could someone volunteer to be number 20?

52:39

Okay, come on up. All right. And what's your name? Eric. Eric. You're the number 20. And Eric,

52:47

actually, let's see. Let's see. Let's actually, can we do this? Let me give, let me make this a little

52:56

more different. Okay, that never happened. Okay. Eric, give me that please. I want to insert

53:03

Eric as number five. So, Eric, I'm keeping this list sorted. So, where obviously you're going to go?

53:10

All right. But before you do that, let's just consider what this looks like in code. In code

53:13

presumably, we have a mallocked Eric from the audience. I've given him a value n of number five.

53:20

And his left hand is like, it's garbage value right now. Because it's not pointing to anything

53:25

specific. So, he's got two values in integer and a left hand representing the next pointer.

53:30

If the goal is to put Eric in sorted order, what should our steps be? Like who's hands should

53:37

point where and in what order? Yeah, give us one step. Okay. So, you should point at number nine.

53:42

So, you should point at number nine, which is equivalent to saying point at whatever first or

53:47

pony is pointing at. So, go ahead and do that. All right. Next. What's the next step? Someone else?

53:54

Someone else. Almost there. Yeah. First, should point to five. Okay. So, first, your

53:58

combing could you point to five. And that's fine. You don't even have to move, right? This is the

54:02

beauty of a linked list. It doesn't matter where you are in memory. It's the whole beauty of these

54:07

pointers where you can literally point at that other location. It's not an array where they need to

54:11

be standing back to back to back. They can be pointing anywhere. All right. Let's go ahead and insert one

54:15

more who wants to be say 55. Big value. Yeah. Come on down. All right. What's your name?

54:22

Keon. Okay. So, come on over. So, we've just melocked Keon from the audience. I've given him his

54:27

end value of 55. His left hand is just some garbage value right now. How do we insert Keon

54:33

in the right order? Where is he obviously supposed to go? In sorted order, he obviously belongs

54:38

at the end. But here is the catch with a linked list. Just like when we've discussed searching

54:44

and sorting in the past, the computer is pretty blind to all but just one value. And the linked

54:49

list at the moment, I don't know that these three, these four exist. All I know really is that

54:54

Komi exists. Because via this first pointer is the only access to the rest of the elements.

55:00

And so what's cool about a linked list, but perhaps not obvious, is that you only the most important

55:05

value is the first. Because from the first value, you can get to everyone else. It's not useful.

55:10

Excuse me for me to remember Andrea. Andrea alone. Because if I do, I've just lost track of

55:16

Komi and more importantly, because of his number, Eric. So, all I have to do really is remember Komi.

55:21

So, if the goal now is to insert number 55, what steps should come first? No pun intended.

55:29

Okay, finding the first space. So, I'm going to start at Komi and I'm going to follow this

55:36

pointer number five. Does 55 belong here? No, so I'm going to follow this pointer and get to

55:42

Andrea. Does 55 belong here? No, I'm going to follow her pointer and 22 does a belong here? No,

55:47

I follow this pointer, 26. No, but you have a free hand that turns out. So, what steps should

55:52

come next? We could have you point at 55 and now done. So, relatively simple, but what was the

56:01

running time of this? Yeah, it's big all of end. It's linear because I had to start at the

56:08

beginning. Even though we humans have the luxury of just eyeballing and saying, oh, obviously,

56:11

he belongs way at the end. Not in code. Like we have to start at the beginning to reverse

56:15

the whole darn list until we get linear lead to the very end and now we're done. Let's try one last

56:20

one. How about 20? Yeah, come on down. What's your name? James. All right, James. All right,

56:27

so we just maloch James given him the number 20. If he obviously belongs roughly in the middle,

56:30

what's the first step? Sorry. All right, so we start with combing again. All right, first. Okay,

56:38

five. Do you belong here? No. Let me follow the link. Okay, nine. Do you belong here? No. Do you belong

56:44

22? Oh, but what did I just do wrong? I went too far. At least in this story, like I literally

56:51

Andrea's behind me now. Okay, so can I follow the pointer backwards? You can't. Like in

56:56

every picture we've drawn and every example we've done with an address, we only have the address of

57:00

the next pointer. We don't have what's called a doubly linked list, at least in this story,

57:04

where I can just turn around. So that was a bug. So I need to start over instead first. Okay, five.

57:08

Okay, 19. What I really need in code ultimately is to kind of peek ahead and not actually move

57:14

not that far. Just a 22. peek ahead at 22 and really it's, oh, that's going to be too far.

57:19

This is not yet far enough. So let's go ahead and bring James over. Actually, you can stay there physically,

57:24

but what step happens to happen first? I know now he belongs in here. You want to point it

57:31

him? Okay, point it him? Well, let's do that just because it is incorrect. That's fine.

57:36

Okay, Andrea proposed that we point here, but you just broke the whole linked list. Why?

57:41

Because there's a thing to point at, right? No one is remembering who was your name again.

57:45

No one remembered where Keon was, so can't do that. Your left hand has to stay there. So what

57:49

step should happen first instead? James should point at whatever Andrea is pointing at perhaps.

57:56

So a little redundantly at the moment, just like before. Okay, now what happens next? That's step one.

58:02

Now you can point at him. Okay, good do that. All right. And so now this looks like a complete mess,

58:08

but if we know that call me his first, we can follow these bread crumbs to Eric and then to Andrea

58:15

and then to James and then the rest of our list, step by step by step. So it's a huge amount of like

58:22

logic now, but what problem have we solved? And I think we identified it over here earlier. What was the

58:27

problem first and foremost with the raise? You have to decide on their size and advance. And once you do that,

58:35

if you want to add an additional element, you have to greaseize the whole darn thing, which is expensive,

58:39

because you have to move everyone around. Now frankly, I'm being a little greedy here and every time

58:44

we've inserted these new elements, I've been keeping them in sorted order. So it would seem that if you

58:48

insert things in sorted order, big all of then, every time, because in the worst case, the new element might

58:53

end up all the way at the end. But what if we relax that constraint? What if I'm not so uptight and need

58:58

everything nice and like orderly and sorted? What if I just want to keep growing the list in any random order?

59:03

And I allocate the number 34, where and I'll play the number 34. Malach 34, where is the quickest place for me to go?

59:11

Yeah, 0.5. Okay, 0.5. And then, call me if you could point to me. Done. One, well, two steps. All right.

59:19

So, post now, Malach 17, with someone else, the pretend is right here. Where's the best place for 17 to go?

59:27

Right after Call Me 2. So, now, Call Me can point at 17. 17 can point at me. I can point at Eric and so forth.

59:34

And that's two steps again. Two steps. If it's the same number of steps every time, we call that constant time. And we write it as big

59:40

goal of 1. And so here, too, it's just a tradeoff. If you want really fast insertions, don't worry about sorting.

59:47

Just put them at the beginning and deal with it later. If you want dynamic,

59:50

resizeability, don't use an array, use a link list, and just keep allocating more and more as you go without

59:55

wasting a huge amount of space, too. Which notice, that's another big problem within array. If you

59:59

overallocate space and only use part of it, you're just wasting space. So, there's no one solution here,

01:00:04

but we do now have the capabilities, things to the structure and point pointers to stitch together.

01:00:09

If you will these new problems. Yes, please. And who am I in this story?

01:00:21

Okay, absolutely. So, another very reasonable idea would be, well, why don't I just put the new

01:00:25

ones at the end? That's fine if I keep track of who is at the end. The problem is that the

01:00:31

moment in the story, and we'll ultimately see this in code, I'm only remembering Call Me and from

01:00:35

Call Me in my getting everywhere else. I could have another pointer, second pointer, and literally

01:00:41

call it last, that's equivalent to you, or that's always pointing at you. I just need then two pointers,

01:00:46

one literally called first one, literally called last, that's fine. That's a nice optimization,

01:00:50

if I want to throw all of the elements at the end. And frankly, I could get really fancy and

01:00:55

to solve the problem that Andrea cited earlier, if I store not just an end, an end, and a pointer,

01:01:00

but instead an end and two pointers, I can even have each of these guys pointing with their left

01:01:05

end right hands in a doubly linked list, so as to solve the problem Andrea identified, which was,

01:01:11

if I go too far, no big deal, take one step back. I don't have to think as hard about that logic,

01:01:16

so there are two to trade off. Let's go ahead and take a five minute break. I'll turn on some

01:01:19

music grab a duck now if you'd like and we'll return with some fancier data structure still. Thanks.

01:01:24

All right, we're back. So, let's now translate some of these ideas to code, so that we can actually

01:01:29

solve this problem a little more concretely than just having humans pointing at each other. So,

01:01:33

for instance, let's try to distill everything we've been talking about into just a goal in code

01:01:37

of storing a list of numbers. I would propose that we can take like three passes at this problem.

01:01:42

The first would be, let's just decide in advance how many numbers we want to store,

01:01:46

so we don't have to deal with all this complexity with the pointing and the pointers and all this,

01:01:50

and just hard code that value somehow and just stop when the user has inputted that many numbers

01:01:56

and no more. Two, we can improve upon that and at least let the user dynamically resize

01:02:01

their array so that if they decide to input more numbers than we intend, it's going to grow and deal

01:02:06

with that. Of course, arrays are not necessarily ideal because we have to do all that

01:02:09

damn copying from old to new. That's linear time. It seems smartest to get some version three,

01:02:15

which is actually going to use a link list, so we're just more modestly allocating space for

01:02:20

another number and another number or really a node, one number at a time. So, let me go ahead and

01:02:26

start as follows. I'm going to go ahead and include some familiar lines in list 0.c of the CS50

01:02:34

library just to make it easy to get some user input for this and standardio.h for printf. And

01:02:39

let me go ahead and declare my main function as usual and then in here let's do a couple things.

01:02:44

First let's ask the user for the capacity of the array that we're going to use. Or rather let's do

01:02:50

this first. Let me first rewind and say you know what, int numbers 50. Well, that's going to be

01:02:57

annoying to type in 50 numbers. We're going to give the user two numbers to at first that here

01:03:00

she can type in. Next let's go ahead and prompt the user for those numbers. So let me go ahead and

01:03:07

say let's do this. Let's at least clean this up a little bit so that we can reuse this value.

01:03:14

So we don't have a magic number as just came up in discussion actually. So while,

01:03:20

I'm not going to do that. Let me fix this. This will be my capacity of size two and that's

01:03:25

going to give me that size. And then I'm going to keep track of how many integers I've prompted

01:03:29

the user for so far. So initially the size of this structure is going to be zero but it's capacity.

01:03:34

So to speak is two. So size means how many things are in it. Capacity means how many things

01:03:38

can be in it. And while the size of this structure is less than its capacity, let's go ahead

01:03:44

and get some inputs from the user. Let's go ahead and ask them for a number using our old friend

01:03:49

get int and just say give me a number. And then let me go ahead and insert the number that they type

01:03:55

in into this array at location size like this and then do size plus plus. I think you know it's

01:04:05

pretty quickly, but let's consider what I just did. I initialized size to zero because there's

01:04:09

nothing in it initially. Then I say while size is less than the capacity of the whole thing and

01:04:14

capacity is two by default, go ahead and do the following. Give me an int from the user.

01:04:20

So int number gets int. Then put at location size in my numbers array whatever the human typed in

01:04:27

number and then increment size with plus plus. All right. So on the first iteration size is zero.

01:04:33

So numbers bracket zero gets the first number. Numbers bracket one gets the second number.

01:04:37

Then size equals capacity so it stops logically. Any questions on the logic of this code?

01:04:45

All right. So once we have those numbers, let's just do something simple. Like for int

01:04:50

I get zero. I is less than the actual size. I plus plus. Let's just go ahead and print out

01:04:56

the number you inputted percent i backslash n and type out numbers bracket i. All right.

01:05:08

So if I made no typos in list zero dot c, then I'm going to go ahead and do dot slash zero dot c.

01:05:14

I'm going to be prompted for a couple numbers. Let's go ahead and do one, two, you input one,

01:05:19

you input a two. All right. So not bad. But this is bad design arguably. Why?

01:05:27

Just find one fault. It's correct. But bad design.

01:05:34

Repetitive because I'm using a couple loop sure and it's fundamentally it's very limited in

01:05:39

functionality. Why? Like how useful is this program? It's hard code in a two. So let's at least

01:05:46

improve upon this a little bit and get rid of this hard coding. Why don't I at least ask the

01:05:50

user for something like this? Well, instead of just declaring the capacity, let me go ahead and say,

01:05:57

you know what? Let's just replace the two. Get in and just say capacity for instance. All right.

01:06:02

And now if I do this, I'm going to be prompted. So make list zero dot slash the list zero.

01:06:09

One, the capacity will be two, one, two. That's nice. But if I run it again and give it a

01:06:16

capacity of three, one, two, three, I get more capacity. So that's nice. It's an improvement for sure.

01:06:23

There is a bug here. Before I test it further, can anyone identify a bug or somehow crash this?

01:06:34

Oh, go ahead. If you don't input an integer. If I don't input an integer or was that the

01:06:39

same common up here? Oh, no, because I'm re-running it in each time. I don't need to

01:06:50

worry about previous runs of the, of the, of the program. Yeah. In the four loop, he just goes

01:06:56

like one, two, three, it doesn't actually care what you input it. I don't want two, three. Well,

01:07:01

I am iterating up to size, which could be capacity because now they do end up being equivalent

01:07:07

because I'm filling the whole thing. But let's try this. If you don't type in a value. So let me

01:07:10

go ahead and rerun this. My capacity shall be duck. All right. So it did handle that because

01:07:18

get in does that for me. But I bet I can still break this. Oh, yeah. Let's always try something

01:07:23

negative. Oh, okay. So bad. Like cryptic looking message, but clearly has to do with the negative

01:07:28

value. So I should probably be a little smarter about this. And recall from like week one, we did

01:07:33

do this with Mario. You might have done this. So I could do something like do while capacity is less

01:07:40

than one. I could go ahead and say capacity gets get in capacity. So just a little bit of air

01:07:48

checking to close the bug that you identified. All right. So let's go ahead and recompile this.

01:07:52

Make list zero. Whoops. Make, we're going to start hearing that a lot today, aren't we?

01:07:58

Make list zero dot slash list zero capacity will be three one two three. Now capacity will be

01:08:04

negative one doesn't allow it capacity zero doesn't allow it capacity one. Yes. So non-exhaustively

01:08:10

I've tested it feels like it's in better shape. Okay. But this program while correct and while

01:08:15

more featureful still has this fundamental limit. Wouldn't it be nice to allow the user to just

01:08:20

keep typing numbers as many as they want and then quit once they're done inputting numbers. Right.

01:08:27

If you're making a program to compute someone's GPA different students might have taken different

01:08:30

courses. You don't want to have them to type in all 32 courses if they're younger and haven't

01:08:35

taken all those courses like there's a lot of scenarios where you don't know in advance how many

01:08:39

numbers the user wants to provide but you want to support few numbers, lots of numbers or beyond.

01:08:44

So let's do this in a second version in list one dot C. Let me go ahead and improve upon that

01:08:51

example as follows. First let me give my familiar friends up here. CS50 dot H for IO standard IO dot H and

01:09:00

then in here int main void and then let's start writing this. So now I don't know when

01:09:07

advance necessarily how many numbers the user is going to type in. Like the goal is I want them to

01:09:13

be able to type in a number another number another number another number and then hit the equivalent

01:09:17

of like Q for quit when they're done inputting numbers. Like I don't want them to have to think

01:09:21

about an advance how many numbers it is they're inputting. But how do I do that? Like I can't just

01:09:27

come up with an array called numbers and say 50 because if the user wants to type in 51 numbers

01:09:32

I'm going to have to resize that. But how do you resize an array? How do you resize an array?

01:09:42

What's that? You can't, right? We've never seen an instance where you've resized an array.

01:09:48

We talked about it on the blackboard here while just like allocate a bigger one and copy everything

01:09:53

and we did identify realloc but you can't actually use realloc on an array. Realloc actually

01:09:59

accepts an address of a chunk of memory that you want to grow or shrink. So it turns out if we

01:10:05

now start to harness this sort of fundamental definition of what an array is a chunk of memory

01:10:10

we can actually build arrays ourselves. If an array is just a chunk of memory or more specifically

01:10:17

it's like the address of the first byte of a chunk of memory it would seem that I could declare my

01:10:23

array not with square brackets as we've been doing for weeks but I can say you know what numbers

01:10:28

really is it's really just a pointer and I'm initially going to initialize it to null because

01:10:34

there is no array but now I have the ability to point that pointer at any chunk of memory small

01:10:40

or big. Now why is this useful? Well initially let me claim that my capacity is zero because

01:10:45

nothing's going on yet I haven't called Malach or anything and initially my size is zero because

01:10:51

there's nothing in the array and it doesn't even have a size but let me just do this forever.

01:10:56

Much like in scratch we had the forever block you can use while true and see to just say keep doing

01:11:00

this until the user breaks out of this and let me go ahead and ask the user give me a number get in

01:11:07

and just ask them for a number and then we just need a place to put that. So where do I put this

01:11:13

number? Well do I have at the moment any place to put the number? No and technically speaking how

01:11:20

do you express that? Like in pseudocode I want to say if no place for number but technically I could

01:11:26

do this. Well if the size of the array at the moment equals its capacity that feels like a

01:11:34

lower level way of expressing the same thing if whatever the capacity is if the size is the same

01:11:39

there is no more room and that simple statement also covers the scenario where the capacity is zero

01:11:47

the size is therefore zero so it's the same question either we have no space at all or we have

01:11:52

some space but we've used it all size equals equals capacity. So if the size equals capacity or put

01:11:59

more casually if I don't have enough space what do I want to do intuitively? Allocate more

01:12:06

memory and it turns out you proposed or someone proposed earlier reallocating memory we can use

01:12:11

this function for the very first time. Let me go ahead and say this I the catch with realloc is you

01:12:16

have to be smart about it because it returns a pointer so let me propose this code first first

01:12:20

just give me a temporary variable call it temp that's going to store the following actually no let me

01:12:27

start this more simply let me go ahead and say numbers should be reallocated please realloc

01:12:36

by passing itself in and this time give me the size of an int times how many ints do I want this time

01:12:47

how many numbers did the human just input presumably just one right because literally we've only

01:12:56

called get it once in this story so whatever the size of this array is now we need to increase it by one

01:13:03

that's all so this line of code here is saying hey hey computer go ahead and out reallocate

01:13:12

this array from whatever its current size is and make it this size instead the size of whatever it is

01:13:21

plus one times the size of an int because that's what we're trying to store as an int so we have to do that

01:13:25

multiplication and realloc is mentioned earlier is pretty fancy it's going to take in a pointer whatever

01:13:30

chunk of memory you've already allocated and it's going to then reallocate a bigger chunk of memory hopefully

01:13:37

what's going to happen is this if your chunk of memory initially looks like this it's going to hopefully

01:13:42

notice all this memory's free let me just give you back the same address so if this is address 100

01:13:48

and you get lucky and this address is also available the realloc functions going to remember that

01:13:52

for the operating system it's going to return the number 100 again and you're good to go you can safely

01:13:56

touch memory here or if this is in use already this chunk of memory and therefore yeah we can't

01:14:04

fit another bite there because some other code you wrote is using that memory but there is twice

01:14:09

as much memory available down here what realloc will do is if you've stored the number 50 it will

01:14:15

handle the process of copying 50 to the new value this is going to be left as a garbage value

01:14:20

for you to deal with and it's going to return to you the address of the new chunk of memory having

01:14:25

done the copying for you so even though it's technically reallocating the array it's not

01:14:31

necessarily just going to grow it it might relocate it in memory to a bigger chunk and then give you

01:14:36

the new address of that memory question is that process really preferable to just create the

01:14:44

extra memory in this place and then saving the time and it's a really good question honestly

01:14:52

we could avoid this problem slightly by just doing you know what let's give me at least

01:14:58

go ahead and give me at least the size of an ant times I don't know most humans are not going to

01:15:03

type in more than 50 numbers let's just pick 50 so you could do this and that would indeed save

01:15:08

you time because the approach I'm currently taking is pretty inefficient because every damn time I

01:15:13

the user calls get ant and gives an ant we're resizing resizing resizing very expensive

01:15:18

as to what the best value is though 50 should be 25 should it be a thousand I'm either going to

01:15:24

underbets or overbet and it just depends on you to decide which of those is the worst decisions

01:15:31

in terms of programs does it also pretty expensive to have memory that they're not using

01:15:36

for generally is it usually more okay good question in programs you're writing is it better to have

01:15:41

more memory than you're using or should you really be conservative these days memory is cheap we all

01:15:46

have gigabytes of memory and so wasting 50 bytes or 200 bytes times 4 of memory not a big deal

01:15:52

like just get the job done quickly and easily but in resource constrained devices maybe things like

01:15:57

phones or little internet of things style devices that have a lot fewer resources you don't really

01:16:03

want to go wasting bytes but honestly the CPUs the brains and our computers are so darn fast these

01:16:08

even if you're calling mallock 10 times a thousand times it's happening so darn fast that the

01:16:12

human doesn't even notice so there too these are what are called design decisions and these are

01:16:16

the kinds of things that in the real world you might actually debate with someone at a whiteboard

01:16:19

saying no this is stupid because of this reason or here she might push back for other reasons and

01:16:24

no one's necessarily right the whole goal is to just have that thought process first so you're at

01:16:28

least confident in what you chose yeah when you were calling F read you were by definition

01:16:41

in the forensics problem said reading bytes from disk into memory when you were calling F right you

01:16:47

were copying bytes from memory back to disk if that answers the question okay other questions yeah

01:16:58

why do i say size plus one in line 16 because the whole goal is to make room in this array for the

01:17:05

newly imported number that the human just typed in and so whatever the current size of the array is

01:17:10

I clearly need one more space it does because it does repeat on and on because at the moment

01:17:20

I'm inside of this while loop so we do need to ask a question when is the human done inputting

01:17:26

and it turns out and this is non obvious and it's not the best user experience on a keyboard for the

01:17:31

human but we can actually detect the following sentiment if user is done inputting numbers then

01:17:40

let's go ahead and break but the question then is how do you express that pseudo code well you could

01:17:47

in some programs maybe type q for quit but is that going to work when using get in could we detect q

01:17:54

why not

01:18:02

exactly because get in immediately prompts you for another in so because of the way we design this is 50

01:18:06

library you can't detect q or you can't have the human type quit unless you don't use get in you instead use

01:18:16

we could use get string and then every time the human types in a number we could use like a to i

01:18:21

to convert it to an int but if the human types in q or q u i t a string also we could just have

01:18:28

an if condition with string compare and quit but honestly then you're reinventing like get in

01:18:33

so trade off anyhow a common way to work around this would be you know that control c quits programs

01:18:39

perhaps cancels out of your program there's another popular keystroke control d that sends what's

01:18:44

called end of file it simulates the end of a file it simulates the end of the human's input so it's kind of

01:18:51

like the period at the end of an English sentence so if you want to signal to a computer that's waiting

01:18:55

for input from you that you don't want to quit the program that would be control c but you just want to

01:19:00

be done input input to the computer you hit control d otherwise known as EOF and the way to express this

01:19:06

and you would only know this from documentation would be to say something like this if the number

01:19:11

the human type in equals end of file but there is no such thing in this context you actually do this

01:19:17

because of how the CS50 library works turns out that if the only values a function can return

01:19:23

our integers that means you can return zero one negative one two billion negative two billion

01:19:29

give or take what humans did for years with old programming languages is they just steal one

01:19:35

or few numbers for instance you steal like the number two billion and call it int max the maximum

01:19:40

integer and you just say you can never actually type two billion because we are using that as a special

01:19:45

value to signify that the human hit control d or you could do negative two billion or you could do zero

01:19:51

or 50 but at some point you have to steal one of the four billion available numbers to use as a

01:19:56

centenal value a special value that you can then check for as a constant so anyhow this just means

01:20:03

when the user's done a typing input go ahead and break out of this while loop and as on the side let me

01:20:08

fix one thing it turns out things can go wrong with realloc and if realloc fails to allocate memory

01:20:15

it can return null a special value that just means something went wrong it's an invalid pointer

01:20:20

it's the address zero and so it turns out there's a subtle bug here where technically I should

01:20:26

actually do this store realloc's return value in a temporary variable because if temp equals null

01:20:34

something went wrong and I should actually go ahead and quit out of this program but let me wave

01:20:40

my hand at that for now because it's more of a corner case but you'll see in the online version

01:20:44

of this program we have additional error checking that just checks in the rare case that realloc fails

01:20:50

clean it up and return properly but a wave to the online code for that all right any questions on that

01:20:57

example before we move on yeah yes good question when you call realloc and it ends up allocating more space

01:21:14

does it clear uh the original memory no and that is where garbage values come from for instance

01:21:21

because if they're just left in memory from the previous use other questions yeah what does it use

01:21:26

your user actually type to break oh control D control D and it's not break it's to send

01:21:34

end of file end of input control C kills or breaks out of the program itself

01:21:39

same as int max yes correct in the CS50 library int max yes is the symbol yes yeah good

01:21:52

you'll suggest us to use it to like like say do you want to enter another member yes or no absolutely

01:21:58

we could add more logic and you could use get string and we could prompt him or her hey do you

01:22:01

want to input another number only downside of that would be now I have to type in not only my number

01:22:06

but like yes or no constantly so just to trade off user interface wise all right so let me go

01:22:11

ahead and let me go ahead and return zero here just as my simple solutions to this problem

01:22:18

of something going wrong I've just compiled this program let me go ahead and run it I'm going to type in

01:22:24

one number two numbers three numbers and now I'm bored I don't want to keep doing this how do I tell

01:22:30

the computer I'm done control D and oh okay that's correct behavior because I forgot a key step

01:22:40

what's that yeah I'm not actually doing anything with the values I should probably like four

01:22:47

int I get zero I less than size I plus plus code we had before and I should probably print out

01:22:53

you input it percent I comma this save that make list one so all I did was read the printing code

01:23:03

now if I rerun this one two three control D damn it oh okay now I broke my code here let me do this

01:23:16

we're going to get rid of this error checking because I'm not actually ever resizing numbers gets

01:23:20

realloc oh and maybe someone was chiming chiming in with this numbers bracket size gets the

01:23:28

users input size plus plus was this a key detail someone wanted me to do okay so I didn't actually

01:23:36

finish the program earlier notice we left off as follows hey computer give me an array of size zero

01:23:42

initially that's null there's no memory for it therefore the size of this array is zero do the following

01:23:49

forever get a number from the human if the number equals the special value int max just break out

01:23:55

because the program is done and actually sorry this is why I write these in advance too

01:24:04

okay go ahead and prompt the user for number if they have inputed the control D just break out of this

01:24:12

loop however if the size of the array equals its current capacity go ahead and reallocate space for

01:24:18

this thing being one number bigger than it previously was now assuming that says succeeded and we have

01:24:25

memory go ahead and just like our list zero example store in the numbers array at the current location

01:24:31

which is zero whatever number the human typed in and then increment the size by one to remember what we have

01:24:38

done I'm also though going to need to do capacity plus plus here to remember that we've increased

01:24:43

the capacity of the array so again two new measures capacity is how much space there is in total

01:24:48

size is how much we're using they happen to be identical at the moment because we're growing this

01:24:54

thing step by step by step all right let me go ahead and hit save let me go ahead and compile this one

01:25:00

last time dot slash list one and input one two three control D okay now it's just an aesthetic bug

01:25:09

I forgot my back slash n so just to prove that I can actually program dot slash list one one two three

01:25:18

control D all right so you inputed one and the reason it didn't move to another line is because

01:25:23

control D gets sent immediately without hitting enter all right that's all using arrays now let's do the

01:25:31

sort of cake baked already and pull it out of the oven the third and final example here

01:25:37

is list two and actually before we get there let me note one thing yeah let's do one last thing here

01:25:43

let me go ahead and run per earlier our new friend valigrant on list one enter it's waiting for me

01:25:52

to type in one two three let me go ahead and hit control D interesting I seem to have a buggy program

01:25:59

even though I claimed a moment ago that I knew what I was doing 12 bytes in one blocks are

01:26:04

definitely lost and lost record one of one again I don't understand most of those words but 12 bytes

01:26:09

definitely lost probably my fault why is it 12 and what are those 12 bytes yeah you have I think you

01:26:18

made three integers yeah one two and three and each one is four bytes you know three the

01:26:24

exactly I typed in three numbers one two and three each of those is four bytes on this computer that's

01:26:28

12 three times four and so I'd never freed them seems to be the source of the issue so at the end

01:26:34

let's just prove that valigrant can detect correctness as well free my numbers semicolon let me go ahead

01:26:41

and rerun make list one and now let me increase the size of this and do valigrant again on list one

01:26:48

typing in the same values one two and three control D all heat blocks were freed no leaks are possible

01:26:55

so again valigrant is your friend it finds problems that you didn't even necessarily notice and you

01:26:58

didn't have to read through your lines of code again and again to identify the source of the

01:27:03

issue necessarily all right any questions then on these arrays that are dynamically allocated

01:27:08

and the bugs we find there in with valigrant all right so the last demonstration of code is going to

01:27:16

be this I have stolen for this final example some of the building blocks that we had on the screen earlier

01:27:23

in my code for list two dot c I need a structure called node and that node as we claimed earlier

01:27:30

with our human volunteers is going to contain a number called number we'll call it this time instead of

01:27:35

n and it's going to contain a pointer called next to another such node so that's copied and pasted

01:27:41

earlier albeit with the integer rename to number for clarity now notice in main what I'm doing first

01:27:48

go ahead and allocate in array of no space initially so this was like when combi was holding up

01:27:55

first and representing the beginning of our data structure this is the analog using an array

01:28:01

that the piece of paper that would be held up here would be numbers and it's just pointing at nothing

01:28:05

at all like left hand down on the floor because there's no memory yet allocated but then go ahead

01:28:11

and while true go ahead and get an integer from the user with this code here check if the user hit

01:28:17

control d as with this arcane technique and then our code is similar in spirit but we have to

01:28:25

stitch these things together allocate space for the number so when I mallocked an additional

01:28:31

volunteer from the audience and here she came down the equivalent in code is this hey computer

01:28:36

allocate with mallock enough space to fit the size of a node then store the results in a pointer

01:28:43

called n so node star n just means give me a pointer to a node call it n and store the address that was just

01:28:51

allocated from the audience as before why do I have these lines of code here that have highlighted in blue what's

01:28:56

that expressing if bang n or if not n would be how you pronounce it what's going on there

01:29:10

yeah if there's no more memory that you can point to then it fails exactly this isn't going to

01:29:17

happen all that often but if the computer is out of memory and therefore mallock fails you don't want

01:29:22

the program just to fret crash or freeze like all of us hate when that happens on macOS or windows

01:29:26

so check for it if not n or equivalently if n equals equals null just return one quick

01:29:33

gracefully even though annoyingly but don't just crash or do something unexpected so you can simplify

01:29:40

that check to just if not n if n is not a valid pointer return one now here's the code with which we

01:29:48

were implementing the demonstration with our humans and this is the scariest looking or most

01:29:52

cryptic at least looking code we're going to see in c today is our final day in c sort of the

01:29:58

we've been running up a really steep hill of late learning about memory and now data structures

01:30:03

and syntax the last of our syntax and c so what are the symbols to be aware of this line of code here

01:30:12

is how I handed one of our volunteers a piece of paper on the right hand side is the number

01:30:17

that was typed in 55 or 5 or 20 or whatever the value is on the left hand side is where you

01:30:23

want to put it n and then literally an arrow number does this it has with mallock a liner so

01:30:32

prior given me in memory just one of these big rectangles and again the top of this in this example

01:30:38

is called number and the bottom is called next so that's our human having stood up from the

01:30:44

back of the room when I hand that human a number like 55 it visually goes there the line of code

01:30:51

with which you achieve that is this here because notice on line 31 here when I mallocked that note

01:30:58

I stored it's address in a variable called n and that's a pointer as drawn with an arrow to that

01:31:05

big node or if we really want to be nitpicky if this is an address 100 yes then the pointer actually

01:31:10

has the value 100 in it but again that's rarely useful information so we can

01:31:14

abstract the way with just an arrow so line 31 is what creates those boxes on the screen line

01:31:22

38 is what puts the number for instance 55 into the box exactly much like I handed a piece of paper

01:31:30

over so what is this this is the only real new notation today even though we're using lots of

01:31:37

stars elsewhere arrow this is wonderfully the first time in see it actually maps to our pictures

01:31:44

if n is the variable and you do n arrow something that means follow the arrow kind of like

01:31:48

shoots and letters if you grew up playing that and then put the number where the arrow has led you

01:31:54

in the field called number so as an aside we can think about this a different way and is what

01:32:01

data type what is this thing in blue n it's a pointer and it's a pointer two one of these things

01:32:12

that we created earlier so we're not doing students anymore with our structures we're implementing

01:32:16

nodes which have numbers and next pointers so it turns out that if n is a node or n is a pointer

01:32:23

to a node recall that dot notation from before this is not how you access number in this case because

01:32:30

n is not a node itself it's a pointer but if n is a pointer how do you go to a pointer how do you go

01:32:36

to an address with what notation star so recall from last week if we want to go to an address

01:32:44

you could do syntax like this ignore the parentheses for a moment just star n means if n is an

01:32:49

address of a chunk of memory star n means go there once you're there you're like conceptually right

01:32:55

here top left hand corner how do you access individual fields like number or next you use dot notation

01:33:01

so if you literally do star n dot number that means go to the address and access the number field

01:33:09

there's nice syntactic sugar in c which is just a fancy way of saying short hand notation

01:33:15

with which is the arrow but that's all it is this arrow notation doesn't do anything new it just combines

01:33:21

go there with access a field in a struct all in one breath if you will and this just looks a little

01:33:27

prettier this when I told our volunteers earlier point your hand down at the floor that's all that

01:33:33

line of code is doing it's saying go to n's address which is here access the next field and right

01:33:39

in that field null which is just the address zero the default special address like pointing at the

01:33:46

this line of code 40 is just a quick error check if numbers what is that equivalent to that's

01:33:52

actually just saying if numbers not equals null so if numbers is legitimate if malloc worked correctly

01:33:58

then let's go ahead and do the following this is a mouthful what is going on here so this is a

01:34:06

for loop that's not using numbers well or is it's almost every for loop we've written a new

01:34:11

probably written just uses ij maybe k but just integers probably but that doesn't have to be the

01:34:18

case what is a pointer it's an address what is an address a place in memory or a number really

01:34:26

so you can certainly use four loops just involving addresses but how so consider this line of code

01:34:32

this here looks different today but it's everything before that first semicolon that's just

01:34:37

where you initialize a value so this is like saying hey computer go ahead and give me a variable

01:34:44

called ptr and initialize it to be the start of my list then I'm saying hey computer do this

01:34:54

so long as pointer does not equal null and then what am I doing if and let's ignore this for now

01:35:01

as an error check go ahead and sorry let me think for one second if okay let's do this

01:35:17

what is what are these lines of code doing this is the code that was actually suggested the very

01:35:22

end of our human example like what if we wanted to insert all of the elements at the end of the

01:35:27

link list how do you express that so in this highlighted lines of code we're asking the question

01:35:34

if the current pointers next field is null we found the end go ahead and update that next field

01:35:40

to equal n and then break so let me translate this to an actual picture when using smaller boxes

01:35:47

that makes clear where something is going so suppose that this program's been running for a

01:35:52

little while and we have a length list that looks like this where this one's pointing here and maybe

01:36:00

this one's pointing here and this says null here and this points here and the numbers are as we've

01:36:06

been using today a 42 50 13 so the start of this list is called numbers this points to the

01:36:18

start of the list what am I doing in this for loop I am just implementing the following logic with this loop

01:36:25

give me a variable called pointer as represented in the story by like my left finger here and

01:36:30

initialize that to be the start of the list if that nodes next pointer is not is equal to null

01:36:39

add a new node here but this is not null I want to follow the breadcrumbs to here and then oh

01:36:47

we're at the end of the list I want to insert this new thing here so how do I express this code

01:36:53

actually in C so if I look back up here this is the line of code that allocates my left finger here

01:37:04

called pointer and initialize it to equal numbers which is the same as pointing at the first element it's kind of like

01:37:10

combi was representing first earlier but now our array is called numbers next what am I doing

01:37:16

does pointer equal null will know if my left hand is pointing here it obviously doesn't equal null so we

01:37:22

don't have to worry yet then what do I want to do if pointer next equals null well what does that mean

01:37:28

well pointer is here pointer arrow next means here does this equal null in the story

01:37:35

I mean it literally doesn't because it's null is not written there null is way down there so the

01:37:40

condition does not pass so what do I do next if pointer is equal to null doesn't apply

01:37:48

here's a weird update pointer gets pointer next so it's cryptic looking syntax but if pointer is pointing here

01:37:57

what is pointer next that's just this right this is n this is next or this is number this is next

01:38:03

so pointer next is this so what is this value well this is a pointer pointing here so that

01:38:10

highlighted block of code pointer equals pointer next has the effect visually of doing this why

01:38:19

if the arrows are a little too magical just think about these being addresses if this is saying

01:38:23

uh the next address is location one hundred pointer equals pointer next is like saying well this

01:38:29

also equals one hundred whatever one hundred is for instance over here is what both arrows should

01:38:34

now point that and if you now repeat this process and repeat this process eventually that question

01:38:40

we asked earlier is going to apply if pointer next equals null what do I want to do well if pointer next

01:38:48

equals null there's two lines going on pointer next equals n so pointer next is no longer null

01:38:56

it should instead be pointing at n which is the new node and then that's it because this was

01:39:03

already initialized to null and let's propose this was 55 and we're done so so much easier to do

01:39:08

obviously in person with just humans moving around and pointing with their arrows with their left hands

01:39:13

but in code you just have to think about the basic building blocks what is each of these values

01:39:18

where is each of it pointing and which of those fields do need to update and the only new code

01:39:24

here even though we're kind of combining it all in one massive example is this we are actually

01:39:28

using arrow notation to say go to that address and access some value there in and this condition

01:39:35

down here which I'll wave my hand out for now just handles the situation where the list is initially

01:39:40

empty any questions on this thus far all right so let's take a look more graphically at some

01:39:51

final problems we can solve and what you'll see in the in the in the days ahead is the following

01:39:56

when it comes to these linked lists and more we now have the ability to actually allocate things in

01:40:01

memory dynamically we don't necessarily know in advance how many numbers we have or in the

01:40:04

case of the next problem set how many words we have we have the ability to use malloc and maybe even

01:40:09

re-alloc to grow and grow our data structure in memory and we have the ability in code to actually

01:40:15

traverse those values in such a way that we can access memory that's all over the board now

01:40:20

and not necessarily back to back to back but what happens if we want to combine these ideas into

01:40:26

fancier solutions still well let's take a look at that in particular if I go let's say over here

01:40:34

to the following let's consider a problem we might now solve if I want to restore everyone's name in

01:40:40

this room in a data structure I could do what well we could use an array so I could actually draw

01:40:47

it decide how many people are in the room let's call it in and actually draw in boxes on the board

01:40:52

and then iteratively ask everyone for their name and actually write it down if I then wanted to take

01:40:57

attendance at their after and say oh is Alice here or is Bob here or is Karim here or Brian

01:41:02

I could just look through that array and say yes or no that human is here but what's the running

01:41:07

time of that algorithm how long would it take to look up a name in a data structure where I just

01:41:12

drawn it as an array of big list on the board what's that a big oven right because if it's just a

01:41:18

list of names it's going to take big oven and frankly that seems a little slow how could I do

01:41:23

an optimization well what if we combine some of these ideas a razor nice because they give me random

01:41:28

sort of instant access to memory locations but link lists are nice because they allow me to dynamically

01:41:34

add or subtract elements even if I want from the list so you know what instead of writing down

01:41:40

everyone's names like Alice and Bob and Charlie like this in just one big array of some fixed

01:41:49

size that necessarily that might paint me into a corner now I have room for one more name

01:41:55

what if I instead do things a little more cleverly so when I'm actually jotting down everyone's

01:42:00

name in the room what if I instead did okay is Alice here all right Alice is here and then Brian

01:42:07

is here and put Brian here and then maybe Charlie is here all right so Charlie and then maybe

01:42:15

Arnold is here where should I put Arnold so also starts with A you know what let's just put Arnold

01:42:20

here Arnold and let's go and Abby is here so you know what let's just put Abby appear as well

01:42:28

Bob came as well so Bob so what's the pattern obviously following is I'm hearing names called out

01:42:34

I'll alphabetically sort it kind of like Abby kind of ended up in the weird place here but that's fine

01:42:42

because I didn't hear her name first but I did kind of bucketize people into different

01:42:48

rows of the board in other words all of the A names I seem to just write down for convenience at the top

01:42:53

and then all the B names together and C names and probably if I kept going I could do this all the way

01:42:56

through Z in English alphabet so what's nice about this is that yeah I'm making lists of names

01:43:02

but how long is each of those lists if there's N people in the room each of my list is not going to be

01:43:08

N long which is slow it's going to be what N divided by 26 give or take if we assume that there's an

01:43:16

equal number of people with Z names and A names it's going to be roughly N divided by 26 so that I have

01:43:21

these like chains of human names but they're much shorter than they would have been if I just grouped

01:43:26

everyone together and this is a fundamental technique in programming called hashing it turns out

01:43:32

there's things in this world called hash functions these are just mathematical or verbal or code

01:43:38

implemented functions that take us input something and produce as output a number typically a number

01:43:44

from zero to say 25 or from one to 26 but they can also output strings and other contexts as well

01:43:50

so my hash function here in my mind is if you hand me a name I'm going to look at the first letter in

01:43:55

your name and if it's A I'm putting you in location zero if it's B I'm going to put you in location one

01:44:00

if it's a Z I'm going to put you in location 25 at the end so these are all buckets I've got

01:44:05

so to speak in computer science like 26 buckets or room on the board that represent the

01:44:10

starts of people's names so what is that well it would seem that if I don't know in advance how many

01:44:15

A names I have that's kind of like drawing this as a linked list if you will that might just get

01:44:21

longer and longer but I do know that I only have a finite number of first letters so that

01:44:28

at the risk of drawing a little messy is kind of like drawing what data structure it's kind of like

01:44:34

drawing in array that just has 26 spots and what's nice about in array is that I have random access

01:44:43

I can jump right to any letter of the alphabet in constant time one step and once I get there

01:44:48

I'm still going to see a list of names thankfully thanks to linked lists that list can be

01:44:53

short or long but on average let's say it's going to be 126th the length that it would have

01:44:59

been if I just used one array or one linked list so this technique of using a hash function which

01:45:07

again I've defined as you give me a name I take that as input I look at the first letter and I

01:45:11

return his output a number from zero to 25 a hash function let's you create a hash table and

01:45:17

there's different ways to implement hash tables but perhaps one of the most common is indeed like this

01:45:22

you decide in advance on the size of an array but that array does not contain the strings or the

01:45:29

humans names that array actually contains a linked list and it's the linked lists that contain the

01:45:36

names so we borrow ideas from like week two we merge them with an idea today from week four of

01:45:42

adding arrays to linked list respectively and we kind of get the best of both worlds because I can

01:45:47

immediately jump to any letter the alphabet super fast and once I'm there yeah there's a list but

01:45:52

it's not nearly as long as it would have been if I didn't use this trick so what's the running

01:45:56

time of all of this well it turns out that a hash table in the worst case might still take you

01:46:02

how many steps to find someone's name once it's been added to the list very worst case how many

01:46:07

steps if there's n people in the room maybe n y it's kind of a perverse situation but can you

01:46:15

contrive a scenario in which even though we're doing this fanciness it still takes me n steps to

01:46:20

confirm or deny that someone's here yeah everyone's name starts with the same letter for some

01:46:26

weird reason now it's a little silly in the human world but it could happen if you're just talking

01:46:29

data or whatever in the in the computer world this can devolve into short an array would just one

01:46:36

really long linked list but in practice that's not likely gonna happen right if we actually spent the

01:46:41

time here and ask everyone for their name we'd probably get a reasonably uniform distribution

01:46:46

of letters at least as as statistically likely would just human names so that would kind of spread

01:46:51

things out and so there's this fundamental distinction between sort of real world running time

01:46:56

or wall clock time how many seconds are actually spinning on a clock versus asymptotic running

01:47:00

time we've talked for a couple weeks now about running time is being big O of N and that might be

01:47:05

still the case that a hash table yes in the worst case it's still a big O of N data structure

01:47:11

because in the worst case it's gonna take n steps but in the real world big O of N is really big O of N

01:47:17

divided by 26 even though we always ignore those lower order terms but when it's you the human

01:47:22

running the code and analyzing the data running 26 times faster is actually real time saved even

01:47:30

though a mathematician might say ah that's the same fundamentally and indeed one of the problems

01:47:35

ahead for the next problem said is gonna be such to so-so exactly what the implications are in

01:47:40

your own code for actual wall clock running time and making smarter design decisions like something

01:47:46

like this can actually really speed up your code to be 26 times as fast even though yes a

01:47:51

theoretician would say ah but that's still asymptotically or mathematically equivalent

01:47:56

to just something linear so it's this fine tuning that'll make your code even better and better

01:48:01

now frankly hashing on first names probably isn't the smartest thing alone right like it does

01:48:06

anyone's this is gonna be hard especially uh uh is it does anyone's name start with z here

01:48:13

is my less not here but thank you for that for a fair counter example but she's not here so look there's

01:48:17

no z so now we're down to 25 possible values and I could probably pick some less common letters too

01:48:22

the pointers there's probably a few more a's than there are z's or a few more b's and there are

01:48:26

cues just by nature of human names so maybe just choosing the first letter isn't good enough and frankly

01:48:33

with 26 names suppose we did this for all of Harvard and had thousands of names each of my

01:48:38

chains might still have hundreds or thousands of names so another design question is going to be

01:48:42

well how many buckets should you have how big should the array be maybe you shouldn't look at the

01:48:46

first letter what if you look at the first and the second letter together so a a and a b and a c

01:48:52

and then dot dot dot b a b c so you could come up with more and more buckets but what else how

01:48:58

else might we kind of uh uniformly distribute people what do all of you have that we could use

01:49:03

as input to a hash function okay what you're going to do last name which might give us a different

01:49:10

or similar distribution yeah what's that you can use your ID number and actually look at the

01:49:16

first digit of your ID and odds are it's 0 through 9 so we could probably at least get 10 buckets

01:49:21

that way and that's probably uniformly distributed I'm not sure we could use birth birth dates in

01:49:27

some way like we could put all of the freshman in one bucket all the seniors in another bucket

01:49:31

and everyone else and so forth in their own buckets which would also give us some input so again

01:49:35

a hash function is entirely up to you to program and design the goal though is to smooth things out

01:49:41

you want to have roughly the same number of things in each link list just so that you have found

01:49:47

about the same performance across all of these various inputs so let's take a look at a couple

01:49:52

of other data structures again in this abstract way now that we know that even though it's not

01:49:56

obvious at first attempt we know how to construct arrays we kind of know now how to construct

01:50:01

linked lists it stands to reason we could implement them together in code what else could we do

01:50:06

now with these building blocks so for instance this structure here is a very common one known as

01:50:13

a tree uh tree like a family tree where there's one patriarch or matriarch at the top and then

01:50:17

their children and then their grandchildren and great grandchildren and so forth and what's nice

01:50:22

about a tree structure is that if you're storing data you can actually store the data in

01:50:27

clever ways to the left child to the right child and so forth as follows notice here there's

01:50:35

something curious about all the numbers in this data structure what is not worthy about them

01:50:45

what is not worthy yeah what's that they are multiples of 11 that was just to make them look

01:50:51

pretty though uh by the author here yeah yeah there's a mathematical significance too like no matter

01:50:59

what node or circle you look at the value in it is bigger than the left child and it's smaller

01:51:06

than the right child so it's kind of in between any circle you look at the number to the left

01:51:11

to smaller the number to the right is bigger and I think that applies universally all over the

01:51:16

place yes so what does that mean well we're call from like week zero when we had a whole bunch of like

01:51:21

numbers that we were uh phone book pages that we were searching one two three four five six let's

01:51:26

give ourselves a seventh one recall that when we did divide and conquer or binary search we did it

01:51:31

on an array and what was nice about binary search was we started in the middle and then we

01:51:35

maybe went left or we maybe went right and we kind of divided and divided and divided and conquer the

01:51:39

problem much more efficiently in logarithmic time than it would have been if we did it linearly

01:51:44

but we know now weeks later that arrays kind of limiting right if I keep storing all of my

01:51:50

values in an array what can I not do with the array make it bigger right I can't add an elements

01:51:59

to it without copying every darn element as we've discussed thus far today but what if I was a

01:52:04

little smarter about it what if I stored my values not just in an array but I started storing them

01:52:09

in these circles let's call them nodes and each of those nodes is really just an integer plus

01:52:16

two additional values how would we implement this data structure in memory well here's an

01:52:22

int n could represent the number in question and we could put that in a data structure called a node

01:52:27

that just has the same syntax as earlier today but I've left room for two more fields what is

01:52:32

it that I want to represent in code if I want to start storing my numbers not in this old school

01:52:37

week zero or a but in a tree two two pointers right a tree as drawn here with literally with

01:52:48

arrows is just like saying every one of these nodes or circles has a left child and a right

01:52:53

child how do you implement children well you can literally just use pointer notation as well here

01:52:59

a left child is just a pointer to another struct on the left and a right child is just another

01:53:03

pointer to the child on the right and what's nice about this ultimately is that we can now traverse

01:53:09

this tree just as efficiently as we can traverse this array because notice if I want to search for the

01:53:16

number 66 how many steps does it take me if I start at the top just like combi represented the

01:53:25

start of our linked list so in the world of a tree does the root have special significance and that's

01:53:30

where we always begin so how many steps does it take me to find 66 given the top it looks like

01:53:36

two yeah like want two or three right I started the top I look at it say on 55 which way do I go

01:53:41

I go to the right then I see 77 okay which way do I go I go to the left so it's the same logic

01:53:47

as we zero in dividing and conquering the phone book or an array a couple weeks later but we get to the

01:53:52

number we care about pretty quickly and it's not linear and in fact if we actually did out the

01:53:57

math what's really cool about a binary search tree is that if you have n elements n circles

01:54:02

the height of that tree is by definition mathematically log n so the height of the tree just so happens

01:54:09

to correspond to exactly how many times you can take n and divide it divide it divide it divide it into

01:54:15

and you can actually see this if you think about it the reverse direction on the bottom row there are

01:54:20

how many elements all right and on the middle row there's two and on the top row there's one

01:54:26

so you can actually see it in the reverse direction this is like dividing conquer but in a different

01:54:30

conceptual way like it's being divide every every row in the tree has half as many elements as the one

01:54:37

below it and so the implication of that is just like from week zero in the phone book when we're

01:54:42

dividing and dividing and dividing in half and half and half so this is only to say now that we have

01:54:47

structures and pointers we can build something like this but let's try one other example here too

01:54:53

this is a crazy looking example but it's kind of amazing suppose that if we want to store

01:55:00

a dictionary of words so not humans names this time but like English words so mariam weptor

01:55:05

Oxford English dictionary has like what thousands hundreds of thousands of words these days and English

01:55:10

for instance how do you actually store those well if you just look upwards in the dictionary

01:55:14

back in yesterday year like that is linear like you have to start at the beginning and look through it

01:55:18

page by page looking for words or you could be a little smarter because the words in any

01:55:22

dictionary are hopefully alphabetized you can do the Mike Smith style divide and conquer by

01:55:26

going to the middle then the middle of the middle and so forth log event but what if I told you

01:55:31

you could look upwards in constant time some fixed number of steps none of these divide and

01:55:37

conquer complexity no log and just constant time you want to word go get it instantly that's where

01:55:44

this last structure comes in which is called a try TRI. TRI short for retrieval even though it's

01:55:50

pronounced the opposite so a try is a tree each of whose nodes is in a red so it's like this

01:55:58

weird Frankenstein's monster kind of data structure where just really combining lots of different

01:56:03

ideas as follows and the way a try works as is implied by this partial diagram on the board

01:56:10

is that if you want to store the name Brian for instance in your dictionary is the first word

01:56:16

what you do is you start by creating a tree with just one node but that node is effectively in a

01:56:22

ray that a ray is of size let's say for simplicity 26 so a through z this location here therefore

01:56:31

represents b for Brian so if I want to insert Brian into this tree I create one node at the top

01:56:37

and then for the second letter in his name r I create another node also in a ray a through z

01:56:44

and so here I put a pointer to this node here b r i so i should have drawn some more boxes

01:56:51

by a b c d e f g h i so here I'm going to draw another pointer to b wait Brian okay that's wrong

01:57:06

I miss Billy shall be our name Billy is that b wait that no that's no

01:57:15

damn it b b b i b yes this works this works okay sorry so here we go we're inserting

01:57:21

Billy into this the fancy data structure so the first node represents the first letter the second

01:57:26

node represents the second letter the third node represents the third letter and so forth but what's

01:57:31

cool about this is the reusability so notice if this is the second letter and I counted this

01:57:36

out correctly i this is going to lead to a third node deeper in the tree where it's L that we

01:57:43

care about next and then another one down here which represents another L and I'll start drawing

01:57:48

the letters L this is b this is i L and we'll call this L and then finally another one over here

01:57:56

which is a y and this gets pointing down here this gets pointing here and so forth so in short

01:58:04

we have one node essentially for every letter in the word that we're inserting into the data

01:58:10

structure now this looks stupidly inefficient at the moment because the store b i L L y how much

01:58:18

memory did i just use 26 plus 26 plus 26 plus 26 plus 26 just a store five characters i use 26 times

01:58:30

five but this is kind of a thematic in computer science spend a little more space and i bet

01:58:35

I can decrease the amount of time it takes to find anyone because now no matter how many other

01:58:41

students are in this data structure and for instance let's do another one if we had another one

01:58:47

like bob so b is the same first letter that leads us to the second node oh is somewhere else in this

01:58:54

array say over here so this represents oh and then bob has another one so there's going to be another

01:59:01

array here and this is why the picture above draws this so succinct this is how we might store bob

01:59:08

so b i l l y or you can follow a different route b oh b so we can start to reuse some of these arrays

01:59:19

so there's where you start to get some of the efficiency anytime name share a few letters

01:59:23

then you start reusing those same nodes so it's not super super wasteful but the question now is

01:59:28

if there's like a thousand students in the class or a thousand students in the room we're going to have

01:59:33

a lot of nodes there on the board but how many steps does it take to find billy or bob or any name

01:59:41

with this data structure and to conclude yes or no that student is in the class

01:59:48

so like five for billy three for bob and notice none of that math has any relationship to how many

01:59:57

students are in the room if we instead wrote out a long list of a thousand names in the worst case

02:00:02

it might take me a thousand steps to find billy or bob maybe i could be a little smarter if i

02:00:06

sort it but in the worst case big o event it's linear or if i used a hash table before and maybe there's

02:00:12

a thousand students in the room but okay there's 26 letters in the English alphabet at least so that's

02:00:16

26 buckets so maybe it's a thousand divided by 26 worst case if i'm using those chains those linked lists

02:00:22

inside my array but wait a minute if i'm using this structure a try where every node in the tree

02:00:28

is just in an array that leads me to the next node a lot of breadcrumbs b i l l y is five and always five

02:00:36

b o b is always three b r i a and would have been five as well none of these tutorials has any

02:00:44

impact or any influence from the number of total names in the data structure so a try in some

02:00:51

senses this amazing like this amazing holy grail in that by combining these various data structures

02:00:56

now you get constant time but you do pay a price and just to be clear what is the price we seem to be paying

02:01:04

memory and in fact this is why i'm not really drawing it much more because it just becomes a big

02:01:08

mess on the screen because it's hard to draw such wide data structures it's taking a huge amount of

02:01:12

memory but theoretically it's being coming faster yeah question oh the question so it's a bit of

02:01:24

simplification if you were storing both bob and bobby you would actually keep going so each of

02:01:29

these elements is not just one letter you also have essentially a node there or some other

02:01:35

data structure that says either stop here or continue and you'll see actually in the problems

02:01:40

that will propose to you how you can represent that idea if you choose to go this route indeed

02:01:44

the challenge ahead ultimately is something quite like this you will implement your very own spell

02:01:48

checker and we will give you code they could you start it with this process and of course

02:01:52

a spell checker these days in google docs and Microsoft Word just like underlines in red misspelled

02:01:56

words but what's going on and how is it that word or google docs can spell check your English or

02:02:01

whatever language so quickly well it has a dictionary in memory probably with tens of thousands

02:02:05

or hundreds of thousands of words and all they're doing constantly is every time you type a word and hit

02:02:11

the space bar or period or enter it's quickly looking up that new word or those words and it's

02:02:16

dictionary and saying yes or no should I squiggle red line underneath this word and so what we're

02:02:21

going to do is give you a big text file asky text containing like a hundred plus thousand words

02:02:26

you're going to have to decide how to load those into memory not just correctly but in a way

02:02:31

that's well designed and we'll even give you a tool if you choose to use it that times how long

02:02:36

your code takes and it even counts how much RAM you're actually using but the key goals for this

02:02:40

week and our final we can see as to take some of these basic building blocks like arrays and pointers

02:02:47

and structures and decide for yourselves how your most comfortable stitching them together to

02:02:53

what extent you want to really fine tune your code beyond just getting it correct and it

02:02:57

give you a better sense of the underlying code that people have had to write for years and libraries

02:03:02

to make programming doable a lot scratch because in just a few weeks we're going to transition

02:03:07

to Python and the dozens of lines of code you've been writing now are going to be widow down to

02:03:12

one line two line because we're going to get a lot more features from these newer fancier languages

02:03:16

but you'll hopefully have an appreciation of what is actually going on underneath that hood

02:03:20

so I'll stick around for anyone I want questions let's call it a day take a duck on your way out

02:03:24

for roommates as well we'll see you next time

00:00

Einführung in die CS50 IDE

01:52

Debugger und Code-Prüfwerkzeuge

04:52

Konzepte des Speichermanagements

06:13

Verwendung des Valgrind-Tools

07:00

Speicherzuweisung und Zugriff auf den Speicher

16:34

Abschliessende Valgrind-Prüfungen

18:01

Verantwortlichkeiten im Bereich Speicherverwaltung

19:39

Dynamischer Speicher und Zeiger

30:45

Verstehen von Strukturen in C

31:27

Erstellen benutzerdefinierter Datentypen

33:01

Benutzerdefinierte Datenstrukturen mit Strukturen

34:01

Kapselung in der Programmierung

35:44

Einschränkungen von Arrays

37:40

Ändern der Grösse von Arrays in C

40:10

Einführung in verkettete Listen

49:45

Erstellen einer Knotenstruktur

50:31

Das Verständnis des Konzepts der verketteten Liste

51:44

Zeigerdarstellung in verketteten Listen

52:30

Elemente in eine verkettete Liste einfügen

01:01:42

Dynamische Listen Erstellung in C

01:06:23

Eingabeverarbeitung und Fehler

01:08:00

Kapazitätsprüfung in der Programmierung

01:08:38

Dynamische Benutzereingabe

01:09:41

Konzept der Grössenänderung von Arrays

01:10:00

Bäume in Datenstrukturen verstehen

01:10:28

Verstehen der dynamischen Speicherzuweisung

01:10:55

Dynamische Array-Manipulation

01:12:20

Speicherverwaltung mit Realloc

01:13:40

Binärbäume erklärt

01:17:10

Grundlagen der Baumdurchläufe

01:17:16

Benutzereingabeverarbeitung in C

01:20:50

Implementierung von Binären Suchbäumen

01:22:20

Einführung in Hash-Tabellen

01:23:20

Erklärung der Hash-Funktion

01:25:10

Kollisionsauflösungstechniken

01:26:33

Einführung in Datenstrukturen

01:27:40

Implementierung von Hash-Tabellen in C

01:04

Was sind die Hauptfunktionen von CS50 IDE, die das Programmieren einfacher machen?

05:17

Wie kann Valgrind dabei helfen, Speicherlecks in Programmen zu finden?

09:38

Warum kann es ein Problem sein, auf nicht erlaubten Speicher in C zuzugreifen?

13:58

Was zeigt Valgrind über die Risiken von schlechtem Umgang mit Speicher?

16:49

Wie kannst du sicherstellen, dass der ganze zugewiesene Speicher in deinem Programm richtig freigegeben wird?

20:25

Was lehrt dich das Gummienten-Debugging über das Finden von Bugs?

30:29

Warum ist es gefährlich, einen Pointer nicht zu dereferenzieren, bevor man ihn initialisiert?

32:32

Was sind die Vorteile, wenn man eigene Datentypen in C definiert?

34:05

Was sind die Vorteile, wenn man verwandte Infos in Structs packt?

35:55

Warum können Arrays in C nicht richtig mit verschiedenen Datentypen umgehen?

39:13

Wie hilft es, den Speicher neu zuzuweisen, um mit Arrays in C besser klarzukommen?

49:45

Wie erklärst du einen Strukturtyp, bevor du ihn im C-Code benutzt?

52:34

Was sind die Schritte, um neue Elemente dynamisch in eine verkettete Liste einzufügen?

55:21

Warum ist es so wichtig, die Beziehungen zwischen den Knoten zu checken, wenn man verkettete Listen baut?

01:07:40

Wie kannst du die Eingabevalidierung für unbekannte Benutzereingaben umsetzen?

01:07:23

Was passiert, wenn die Leute unerwartete Werte wie negative Zahlen eingeben?

01:09:26

Wie gehst du mit Arrays um, wenn die Anzahl der Eingaben nicht festgelegt ist?

01:10:05

Was sind die wichtigsten Vorteile von Baumstrukturen im Datenmanagement?

01:17:15

Wie funktionieren eigentlich die verschiedenen Baumdurchlauf-Techniken in Datenstrukturen?

01:20:55

Was macht binäre Suchbäume so effizient, wenn's darum geht, Daten zu suchen und zu sortieren?

01:23:25

Wie beeinflusst eine gute Hash-Funktion die Geschwindigkeit, mit der man Daten in Hash-Tabellen abrufen kann?

01:25:15

Welche Strategien können helfen, Kollisionen in Hash-Tabellen zu vermeiden?

01:27:45

Warum sind Hash-Tabellen in manchen Fällen besser als Arrays?

01:12:13

Wie verbessert die realloc-Funktion die Speichernutzung in C?

01:17:35

Welche Strategien verbessern die Eingabeverarbeitung in C-Programmen?

01:21:15

Warum ist die Speicherneuordnung für dynamische Arrays so wichtig?


FehlersucheCS50Software-EntwicklungAlgorithmusComputerwissenschaftenLösung von ProblemenComputerprogrammierungSoftware-EntwicklungsprozessStruktur der DatenZeiger (Computerprogrammierung)Array (Datenstruktur)SpeicherverwaltungVerknüpfte Liste

Beschreibung

Die Sitzung beginnt mit einer Einführung in die CS50 IDE, eine webbasierte Programmierumgebung, die das Codieren und Debuggen erleichtert. Wichtige Funktionen wie debug 50, mit dem ihr durch den Code gehen und Variablen inspizieren könnt, werden hervorgehoben. Ausserdem wird check 50 als ein Werkzeug vorgestellt, um die Korrektheit des Codes durch automatisierte Tests zu überprüfen, ähnlich wie das Feedback von Tutoren. Die Diskussion geht auch auf die Nützlichkeit von print-tap zum Debuggen ein, mit dem Programmierer Variablenwerte und Texte ausgeben können, um Probleme in ihrem Code zu identifizieren. Darüber hinaus wird eine einzigartige Funktion erwähnt, die die Umwandlung von C-Code in Scratch ermöglicht und die Vielseitigkeit der IDE zeigt. Die Sitzung ermutigt dazu, sich mit diesen Werkzeugen vertraut zu machen, um die Programmierpraktiken und Problemlösungsfähigkeiten zu verbessern. Insgesamt liegt der Fokus darauf, die Lernenden mit den notwendigen Ressourcen auszustatten, um Programmierherausforderungen effektiv zu meistern.