All right, so this is CS50 and this is lecture four. So we're here in beautiful, low,
lecture halls and senders is in use today and we're joined by some friends that will soon
be clear and presence in just a moment. But before then, recall the last time we took a look
at CS50 IDE. This was a new web-based programming environment, similar in spirit,
to CS50 sandbox and CS50 lab, but added a few features. For instance, what features did it
add to you? To your capabilities. Yeah. What's that? The debugger. So debug 50, which opens that
side panel that allows you to step through your code, step by step, and see variables? Yeah. Sorry,
check 50 as well, which is a CS50 specific tool that allows you to check the correctness of your
code, much like the teaching fellows would when providing feedback on it, running a series of tests
that pretty much are the same tests that a lot of the homeworks will encourage you yourself to
run manually, but it just automates the process and anything else? So that is true, too. There's a little
hidden easter egg that we don't use this semester, but yes, indeed. If you look for a small
puzzle piece, you can actually convert your C code back to scratch, like puzzle pieces, and back,
and forth, and back to forth, thanks to cream and some of the team. So that is there. But by now,
it's probably better to get comfortable with text as well. So there's a couple of other
tools that we've used over time, of course. Besides, check 50 and debug 50, we've of course
used print-tap, and when is print-tap useful? Like, when might you want to use it, beyond needing
to just print something because the problem set tells you to. Yeah. Yeah, so to find where
your bug is. If you just kind of want to print out variables, value, or some kind of text,
just so you know what's going on. And you don't necessarily want to deploy debug 50, you can do that,
went out. Good, yeah. Indeed. Well, and you really, so you might want to use print-tap when
you have, maybe a nested loop, and you want to put a print-tap in the inside loop, so it's to see
when it kicks in. Of course, you could use debug 50, but you might end up running debug 50,
you're clicking next, next, next, next, next, next, there's so many times that that gets a little
tedious, but do keep in mind, you could just put a breakpoint deeper into your code as well,
and perhaps remove an earlier breakpoint as well. And honestly, all the time,
whether it's in C or other languages, do I find myself occasionally using print-tap,
just to type out print-tap in here, just so that I can literally see if my code got to a certain
point in here to see if something's printed. But the debugger you're going to find now,
and henceforth, so much more powerful, so much more versatile. So if you haven't already got into
the habit of using debug 50, biomeans start and use those breakpoints to actually walk through
your code, where you can, or see what's going on. So style 50, of course, checks the style of
your code, much like the teaching fellow's might, and it shows you in red or green,
what spaces you might want to delete, what spaces you might want to add, just a pretty
things up, so it's more readable for you and others. And then what about help 50?
When should you instinctively reach for help 50? Yeah, exactly. Yeah, when you don't understand
an error message, so you're compiling something, you're running a command, it doesn't really
quite work, and you're seeing a cryptic error message. Eventually you'll get the muscle memory
in the sort of exposure to just know, oh, I remember what that means, but until then, run
help 50 at the beginning of that same command, and it's going to try to detect what your error is
and provide TF like feedback on how to actually work around that. You'll see two on the course's
website is an wonderful handout made by Emily Hong, one of our own teaching fellows that introduces
all of these tools, and a few more, and gets you into the habit of thinking about things
as kind of a flowchart. If I have this problem, then do this, else if I have this problem,
do this other thing, so do check that out as well. But today, let's introduce really the last,
certainly for see of our command line tools that's going to help you chase down
problems in your code. Last week, recall that we talked about memory, a lot. We talked about
malloc, allocating memory, and we talked about freeing memory in the like, but it turns out you
can do a lot of damage when you start playing with memory. In fact, probably by now, almost everyone,
segmentation fault. Yeah, so that's just one of the errors that you might run into, and frankly,
you might have errors in your code now and henceforth that have bugs, but you don't even
realize it because you're just getting lucky, and the program is just not crashing, or it's not
freezing. But this can still happen, and so Valgrind is a command line program that is probably
looks the scariest of the tools we've used, but you can also use it with help 50 that just
tries to find what are called memory leaks in your program. We're called that last week,
we introduced malloc, and malloc lets you allocate memory. But if you don't free that memory,
by literally calling the free function, you're going to constantly ask your operating system,
macOS, Linux, Windows, whatever, can I have more memory? Can I have more memory? Can I have more
memory? And if you never literally hand it back by calling free, your computer may very well slow
down, or freeze, or crash, and frankly, if you've ever had that happen on your Mac or PC,
very likely, that's what some human accidentally did. You're she just allocated more and more
memory, but never really got around to freeing that memory. So Valgrind can help you find those
mistakes before you or your users do. So let's do a quick example. Let me go into CS50 IDE,
and let me go ahead and make one new program here. We'll call it memory.c, because we'll see
later today how I might chase down those memory leaks. But for now, let's start with something
even simpler, which all of you have maybe done by now, which is to accidentally touch memory
that you shouldn't, changing it, reading it, and let's see what this might mean. So let me do the
familiar at the top here, include standard IO. Well, let's not even do that yet. Let's just do this
first. Let's do int main void, just to start a simple program, and in here, let me go ahead and just
call a function called f. I don't really care what its name is for today. I just want to call
the function f, and then that's it. Now this function f, let me go ahead and define it as follows.
Void, f void, it's not going to do much of anything at all. But let's suppose just for the sake of
discussion that f's purpose and life is just to allocate memory. For whatever useful purpose,
but for now, it's just for demonstration sake. So what's the function with which you can allocate
memory? Malach. So suppose I want to malach space for, I don't know, something simple like just
one integer, like we're just doing this for demonstration purposes, or actually let's do more.
10 integers, 10 integers. I could of course do, well, give me 10, but how many bytes do I want?
How many bytes do I need for 10 integers? Yeah, so I can do literally size of int, and most likely,
the size of an int is going to be for probably on many systems today. It's just four bytes,
or 32 bits, but you don't want to hard code that list. Someone else is computer not
used those same values, so the size of an int. So 10 times the size of an int. Malach returns
what type of data. What is it going to hand me back? Yeah, returns an address or pointer.
Specifically, the address, 100, 900, whatever, of the chunk of memory, it just allocated for you.
So if I want to keep that around, I need to declare a pointer. Let's just call it x for today.
That stores that address. Could call it x, y, z, whatever, but it's not an int that it's
returning. It's the address of an int. And remember, that's what the star operator now means.
The address of some data type. It's just a number. All right, so now, if I were to first,
let's clean this up, turns out that to use Malach, I need to use standard lib.h. We've
saw that last week, albeit briefly. And then, of course, if I'm going to call f, what do I have to
do to fix this code? Yeah, I need to declare it up here, or I could just move f's implementation
up top. So I think this works, even though this program at the moment, it's completely stupid.
It doesn't do anything useful, but it will allocate memory and now do something with it as follows.
If I want to change the first value in this chunk of memory, well, how might I do that? Well,
I've asked the computer for 10 integers or rather space for 10 integers. What's interesting about Malach
is that when it returns a chunk of memory for you, it's continuous, back to back. And when you
hear continuous or back to back, what kind of data structure does that recall to mind?
In a ray. So it turns out we can treat this just random chunk of memory like it's an array.
So if we want to go to the first location in that array of memory, I can just do this and put
in the number, say, 50, or if I want to go to the next location, I can do this. Or if I want to do
the next location, I can do this. Or if I want to go to the last location, I might do this. But
is that good or bad? Why bad?
Yeah. So it's out of bounds. This is sort of week one style mistakes when it came to loops,
recall with four loops or while loops, you might go a little too far. And that's fine.
But now we actually will see we have a tool that can help us notice these things. So,
hopefully just visually it's apparent that what I have going on here is just a mind 12. I have a
variable X that's storing the address of that chunk of memory. And then online 13, I'm just
trying to access location 10 and set the value 50 there. But as you note, there is no location 10.
There's location 0, 1, 2, 3 all the way through 9. Of course. So how might we detect this with
a program? Well, let me go ahead and increase my terminal window just a bit here. Save my file.
And let me go ahead and compile, make memory. Okay. All is well. It compiled without any error messages.
And now let me go ahead and run memory. Enter. All right. So that worked pretty well. Let's actually
be a little more explicit here. Just for good measure. Let me go ahead and print something out.
So print F, percent I for an integer. And let's make it this more explicit. You
imported percent I. And then comma X bracket 10. And what do I have to include to use print F?
Yeah. So standard I. Oh, so that's just quickly out that standard I. Oh, dot H save. All right.
Let me recompile this. Make memory. Enter. And now let me go ahead and do dot slash. Whoops, dot slash
memory. Huh. Feels like it's a correct program. And yet for a couple weeks now, we've been claiming
that don't do that. Like don't go beyond the boundaries of your array. So how do we reconcile this?
Feels like buggy code, or at least we've told you it's buggy code, and yet it's working.
Yeah. That's a good way of putting it. Let's still there's something that I'm on that.
Okay. But yeah, and I think if I heard you correctly, you said C doesn't scream if you go too far.
Yeah, okay. So that's a good way of putting it. Like you can get lucky in C. And you can do something
that is objectively pedagogically like technically wrong, but the computer's not going to crash.
It's not going to freeze because you just get lucky because often for performance reasons,
when you allocate space for 10 integers, you're actually going to get a chunk of memory back.
That's a little bigger than you need. It's just not safe to resume that it's bigger than you need.
But you might just get lucky and you might end up having more memory that you can technically
get away with touching or accessing or changing. And the computer's not going to notice,
but that's not safe because on someone else's macro PC, their computer might just be operating
a little bit differently than yours. And bam, that bug is going to bite them and not you.
And those are the hardest, most annoying bugs to chase down, as some of you might have experienced.
Right? It works on your computer, but not a friend's or vice versa.
These are the kinds of explanations for that. So Valgrant can help us track down even these most
subtle errors. Program seems to be working. Check 50, or tools like it might even assume that it's
working because it is printing the right thing. But let's take a look at what this program
Valgrant thinks. Let me increase the size of the terminal window here and go ahead and type in
Valgrant dot slash memory. So same program name, dot slash memory, but I'm pre fixing it with the
name Valgrant. All right, unfortunately Valgrant is really quite ugly and it prints out a whole bunch of
stuff here. So let's take a look. At the very top, you'll see all these numbers on the left and that's just an
unfortunate aesthetic, but we do see some useful information. Invalid read of size four and then it has
these cryptic looking letters and numbers. What are those? They're just addresses and hexadec
small. It doesn't really matter what they are, but Valgrant can tell us where the memory is that's
acting up suspiciously. You can then see next to that that Valgrant is pointing to function f on
memory dot c's 15th line. So that's perhaps helpful and then main on line eight because that's the
function that was called. So Valgrant is actually kind of nice in that it's showing us all of the
functions that you called from bottom up, much like the stack from last week. And so something's going wrong
online 15. And if we go back to that, let's see, line 15 was sure enough. I'm actually trying to
access that memory location. And frankly, I did an online 14 as well. So hopefully fixing one or
both of those will address this issue. So and notice here, um, this frankly just gets overwhelming
pretty quickly. And then, oh, 40 bytes in one block are definitely lost in loss record.
I mean, this is the problem with Valgrant. Honestly, it was written some years ago, not particularly
user friendly. But that's fine. We have a tool to address this. Let me go ahead and rerun
Valgrant with help 50 enter and see if we can't just assist with this. All right. So still the
same amount of black and white input, but down here now, help 50 is noticing, oh, I can help you with
an invalid right of size four. So it's still at the same location, but this time, or rather,
same file, memory.c, but line 14. And we proposed, looks like you're trying to modify four
bytes of memory that isn't yours. Question mark. Did you try to store something beyond the
bounds of an array? Take a closer look at line 14 of memory.c. So hopefully, even though Valgrant's
output is crazy as a taric, like at least that yellow output will point you toward, ah, line 14.
I'm indeed touching four bytes in integer that I shouldn't be. And so let's go ahead and fix this.
If I go into my program and I don't do this, let's change it to location nine and location nine
here and save, then let me go ahead and rerun Valgrant without help 50. All right, progress,
except, oh, nope, not progress. I skipped a step. Yeah, I didn't compile it. It's a little puzzled
why I saw the same thing. So now let's rerun Valgrant. And here, it seems to be better. So I don't
see that same error message up at the very top, like we did before, but notice here 40 bytes in one
blocks, okay, that was bad grammar in the program, but are definitely lost in loss record one of one.
So I still don't quite understand that. No big deal. Let's go ahead and run help 50 and see what
the second of two errors apparently is here. So here, it's highlighting those lines, 40 bytes
and one blocks are definitely lost and looks like your program leaked, 40 bytes of memory.
Did you forget to free memory that you allocated with Malach? Take a closer look at line 13 of
memory.c. So in this case, line 13 indeed has a call to Malach. So what's the fix for this problem?
Per help 50 or your own intuition. What do I have to add to this program?
Yeah, free. And where does that go? Right here. So we can free the memory.
Why would this be bad?
Exactly. We're freeing the memory, which is like saying the operating system. I don't need this anymore
and yet two lines later, we're using it again and again. So bad. We didn't do that mistake last week,
but you should only be freeing memory when literally you're ready to free it up and give it back,
which would probably be at the end of the program. So let me go ahead and resave this.
Open up my terminal window, recompile at this time, and now let me run Valdron one last time without
help 50 and still a little verbose, but zero errors from zero context. That sounds pretty good.
And moreover, it also explicitly says all heap blocks were freed and we're called at the heap
is that chunk of memory that we drew visually up here, which is where Malach takes memory from. So
this is kind of the mentality with which to have when approaching the correctest of your code.
Like it's one thing to run sample inputs or run the program like I did all looked well.
It's one thing to run tools like Chick 50, which we humans wrote, but we too are fallible certainly
and we might not think of anything. And thankfully, smart humans have made tools that at first glance
might be a little hard to use like debug 50 as his valgron now, but they ultimately help you get your
code 100% correct without you having to struggle visually over just staring at the screen.
And we see this a lot in office hours, honestly, a lot of students to their credit sort of reasoning
through staring at the screen just trying to understand what's going wrong, but they're not taking
any additional input other than the characters on the screen. You have so many tools that can feed you
more and more hints along the way, so do acquire those instincts. Any questions on this?
Yeah. If you had a main function that took arguments, would you run Valdron with those arguments
as well? Yes indeed. So valgron works just like debug 50, just like help 50. If you have
command line arguments, just run them as usual, but prefix your command with valgron or maybe even
help 50 valgron to help one with the other. Good question. Other thoughts. Yeah.
Good question. So at the end of the day, think about what's inside the computer, which is just
something like this. So physically, it's obviously still there. It's just being treated by the
operating system, macOS, Windows, Linux, whatever, as like a pool of memory. We keep drawing it as a
grid that looks a little something like this. So the operating systems job is to just keep track
of which of those squares is in use, thanks to Malach, and which has been freed. And so you can
think of it as having little check marks next to them saying, this is in use, this is in use,
these others are not in use, so they just go back on the so-called free list into that pool of
good question. If you take a higher level course on operating systems in fact, or see a 61 or
one 61 at Harvard, you'll actually build these kinds of things yourself and implement tools like
Malach yourself. Yeah. Good question. Why did we have to allocate memory? In this case,
we did not. This was purely as mentioned for demonstration purposes, if we had some program in
which we wanted to allocate some amount of memory, then this is how we might do it. However,
a cleaner way to do all of this would have been to say, hey, computer, give me 10 integers like this,
and not have to worry about memory management. And that's where we began in week one, just using
arrays on the stack, so to speak, not using Malach at all. So the point is only that once you start
using Malach and free and memory more generally, you take on more responsibilities than we did
in week one. Good question. Any others? All right. So turns out, there's one more tool in all
seriousness. This is a thing, DDB 50. So D bug 50 is an illusion to a very popular tool called
GDB 50. Good new debugger. It's an older tool that you won't use at the command line, but it's
what makes debug 50 work. Turns out there's a thing and there's an actual Wikipedia article that
you might have clicked on in my email last night. The called rubber duck debugging. And frankly,
you don't have to go as all out as excessive as we did here. But the purpose of this technique of
rubber duck debugging is to keep literally like a rubber duck on your shelf or on your desk.
And when you have a bug and you don't have the luxury of a teaching fellow or a roommate who
took CS50 or more technical friend who can help walk you through your code, literally start walking
through your code verbally, talking to the duck saying, well, online too, I'm declaring main and
online three, I'm allocating space for an array, and then online four, I'm calling, oh, that's
what I'm doing wrong. So if any of you have ever had that kind of moment with an office hours
or alone where you're either talking in your head or you're talking through your code to someone
else, and here she doesn't even have to respond, you just hear yourself saying the wrong thing
or having that aha moment, you can approximate that by just keeping one of these little guys
on your desk and have that conversation. And it's actually not as crazy sounding as it actually
is. It's that process of just talking through your code logically step by step in a way that you
can't necessarily do in your own mind, at least I can't, when you hear yourself say something wrong
or that didn't quite follow logically, bam, you can actually have that aha moment. So on the
way out today, by all means, take any one of these ducks that took quite a while long time for
occult until I out today, and we'll have more at office hours in the weeks to come if you
would like. So you might, some of you might recall such a duck from Currier House last year too,
which was a cousin of his as well. All right, so that is rubber duck debugging. Now last week,
we call that we began to take off training wheels. We used for a few weeks to see a city library,
and that's kind of in the past now. That was just a technique, a tool via which we could get
user input a little more pleasantly than if we actually started doing it with memory early on.
And we revealed last week that a string, quote, unquote, is just what underneath the hood
in C? What, say again? In array of characters, and even more specifically, it's a synonym
STRIng for what actual data type. Char star, as we've called it. So a char star is just the computer
scientist way of describing a pointer to a character, or rather the address of a character,
which is functionally equivalent to saying in an array of memory or sequence of memory,
but it's kind of the more precise, more technical way of describing it. And so now that we know
that we have char stars underneath the hood, where is all of that coming from? Well indeed,
it maps directly to that memory. We keep pointing out that something like this is inside of your
computer, and we can think of the memory as just being chunks of memory, all of whose bytes are
numbered zero on up to two gigabytes, or two billion, whatever the value might be. But of course,
last week, we pointed out that you think about this memory, not as being hardware per say,
but it's just being this pool of memory that's divided into different regions. The very top of
your computer's memory, so to speak, is what we call the text segment. And what goes in the text
segment of your computer's memory when you're running a program? Text is like a poor choice of words,
frankly, but what is it? A second? Not the file headers in this case. This isn't the context of
running a program. This is early saving a file. Not string literals here, but they're nearby actually
in memory. Functions closer. Yeah, the text segment of your computer's memory is where,
when you double click a program to run it or in Linux, when you do dot slash something to run it,
that's where the zero's in ones of your actual program, the machine code that we talked about in
week zero is just loaded into RAM. So recall from last week that, you know, anything physical in
this world, hard drive, solid state drives is slow. So those devices are slow, but RAM,
the stuff that we keep pulling up on the screen is relatively fast. If only because it has no
moving parts, it's purely electronic. So when you double click a program on your Mac or PC,
or do dots slash something in Linux, that is loading from a slow device, your hard drive,
where the data is stored long term into RAM or memory, where it can run much more quickly and
pleasurably in terms of performance. And so what does this actually mean for us? Well,
it's got to go somewhere. We just decided humans years ago that it's going to go at the top,
so to speak, of this chunk of memory. Below that, though, are the more dynamic regions of memory,
the stack and the heap. And we said this a moment ago, in last week as well, what goes on the heap,
or who uses the heap? Dynamic memory. Anytime you call malloc, you're asking the operating
system for memory from the so-called heap. Anytime you call free, you're sort of conceptually
putting it back. Like, it's not actually going anywhere. You're just marking it as available for
other functions and variables to use. The stack meanwhile is used for what?
Local variables. And any of your functions. So main typically takes a sliver of memory at the
bottom. If main calls another function, it gets a sliver of memory above that. If that function
calls one, it gets a sliver of memory above that. So they each have their own different
regions of memory. But of course, these arrows, both pointing at each other, doesn't seem
like such a good design. But the reality is bad things can happen. You can allocate so much
memory that bam, the stack overflows, the heap, or the heap overflows, the stack. Thus was
born websites like stack overflow. And the like, but that's just a reality. If you have a finite
amount of memory, at some point, something's going to break, or the computer's going to have to
say, mm, no more memory, you're going to have to quit some programs, or close some files,
so that was only to say that that's how the memory is laid out. And we started to explore
this by way of a few programs. This one here, it's a little dark here. This one here was a swap
function, and how it's even darker. It was a swap function that actually did swap two values,
A and B, but it didn't actually work in the way we intended. What was broken about this swap
function last week? Like I'm pretty sure it worked. And when our brave volunteer came up and
swapped the orange juices in the milk, that worked. So like the logic was correct. But the
program itself did not work. Why? Change the values of copy variables. Exactly. It changed
values in the copies of the variable. So we're called it, when main was the function we called
and it had two values, X and Y. That chunk of memory was here. That chunk of memory was here.
And it had like the numbers one and two. But when it called the swap function,
that got its own chunk of memory. So main was at the bottom. Swap was above that. It had its own
chunks of memory called A and B, which initially got the values one and two. One and two were
indeed successfully swap, but that had no effect on X and Y. So we fixed that with the newer
version of this program. Of course, it looked a lot more cryptic at first glance, but in English,
could someone just describe what it is that happens in this example that was more correct?
Like, what does this program do line by line? Yeah. Exactly. Instead of passing the values of
the variables, they were by copying them. It passes the addresses of those variables. So that's
like saying, I don't technically care where it is in memory, but I do need to know that it is
somewhere in memory. So instead of passing an X and the number one, let's suppose that X is at
location 100. Might go to example. It's actually the number 100 that's going to go there. And if
Y is at the location like 104, well, it's 104 that's going to go there, which are not the
values we want to swap. But those are sort of like little maps or breadcrumbs, if you will,
that lead us to the right location so that when we execute this code, what we're ultimately
swapping in those three lines is this and this and all along the way, we're called we're using
a temporary variable there that can be just thrown away after. So that's what pointers allow
us to do. And that's what allowed us to actually change values on the so-called stack,
even though even by calling another function. Are any questions then on where we left off last
time with the stack and with swap? Now, all right. So recall, we introduced Binky as well,
who lost his head at one point, but why? What went horribly, horribly, or why, with the scene
from last week's film from Stanford? Binky was doing everything correctly, right? Like moving
values 42 was successful and then yeah. So D reference something that wasn't funny to
actually, I wasn't pointing to the images. Exactly. He tried to D reference a pointer and address
that wasn't actually pointing to a valid address. Recall that this was the line in code in question
that was unlucky and bad. Star Y means go to the address in Y and do something to it,
set it equal to the number 13. But the problem was that in the code we looked at last week,
all we did at the start was say, hey, computer, give me a pointer to an int and call it x,
do the same and give it y, allocate space and point x at it, but we never did the same for y. So
whereas x contained last week, the address of an actual chunk of memory, thanks to Mallock,
what did y contain at that point in the story, the yellow line there?
What did y contain? What value? No. No, maybe, maybe, but it's not obvious because there's no
mention of null on the program. We might get lucky, null is just zero and sometimes we've seen
that zero or the default values in a program. So maybe, but I say maybe an imaging y.
Yeah, and it doesn't allocate, well, allocates not quite the right word that suggests you are
allocating actual memory. It's a garbage value. There's something there. My Mac has been running
for a few hours and your Macs and PCs and phones are probably running all day long or certainly
when the lid is up. And so your memory is getting used and unused and used, like lots of stuff
is going on. So your computer is not filled with like all zeros or all ones. If you look at it
at some random point in the day, it's filled with like a bunches and bunches of zeros and ones
from previous programs that you quit long ago. Windows, you have in the background and the like.
So the short of it is, when you're running a program for the first time that's been running
now for some time, it's going to get messy. That big rectangle of memory is going to have
some ones over here, some zeros over here and vice versa. So their garbage values because those
bytes have some values in them. You just don't necessarily know what they are. So the point is you
should never, ever, dereference a pointer that you have not set yourself, maybe you'll crash, maybe
it won't crash. Valgrin can help you find these things, put sometimes, but it's just not a safe
operation. All right, and lastly, the last thing we introduced last week, which will be the stepping
stone for what problems we'll solve this week was struck. So struck is kind of cool. In that,
you can design your own custom data structures. See is pretty limited out of the box. So to speak,
you only have chars and bulls and floats and inks and doubles and longs and string, well,
we don't even have strings per se. So it doesn't really come with many features, like a lot of
languages do, like Python, which we'll see in a few weeks. So with struck and see,
you have the ability to solve some problems of your own. For instance, with the struck,
we can actually start to implement our own features, our own data types. For instance, let me go
up here, and let me go ahead and create a file called, say, a student, a rather just struck.h.
So we call that dot h as a header file. Thus far, you have used header files that other people
made, like CS50 dot h, and standard IO dot h, and standard lid about h, but you can make your own.
header files are just files that typically contain code that you want to share across multiple
programs. And we'll see more of this in time. So let me go ahead and just save this file and
suppose that I want to represent a student in memory. A student, of course, is probably going to
have what, like, a, for instance, how about a string for their name, a string for their dorm,
but string is kind of two weeks ago. Let's call this char star, and let's call name char star.
And so you might want to associate, like multiple pieces of data with a students, right? And
you don't want to have multiple variables per se. It'd be nice to kind of encapsulate these together,
and recall at the very end of last week we saw this feature where you can define your own type
with type death that is a structure itself, and you can give it a name. So in short, simply by
executing these lines of code, you have just created your own custom data type. It's now called
student, and every student in the world shall have per this code, a name, and a dorm associated
with them. Now, why is this useful? Well, the program we looked at at the very end of last time
looked a little something like this. In struct zero, dot c, we had the following. I first
allocated some amount of space for students. I asked the user what's the enrollment in the class,
or what not? That gives us an int. And then we allocated an array of type student called students.
plural. This was an alternative for call to doing something like this. String names enrollment,
and string dorms enrollment, which would work. You could have two separate arrays, and you just have
to remember that name zero and dorm zero is the same human, but why do that if you can keep
things together? So with structs, we were able to do this. Give me this many student structures
and call the whole array students. And the only new syntax we introduced to satisfy this goal was
what operator. The dot. Yeah, so in the past, recall from like week two, we introduced arrays,
and arrays allow you to square bracket notation. So that is no different from a couple of weeks back.
But if your array is not storing just integers, or chars, or floats, or whatever,
it's actually storing a structure, like a student, you can get at that student's name by literally
just saying dot name, and you can get at their dorm by doing dot dorm, and then everything else is the same.
This is what's called encapsulation, and it's kind of like a fundamental principle of programming, where
if you have some real world entity, like a student, and you want to represent students with code,
yeah, you could have a bunch of arrays that all have called denames, dorms, emails, phone numbers,
but that's just just messy. You can instead encapsulate all of that related information about a
student into one data structure so that now you have per week zero in abstraction. Like a student
is an abstraction, and if we break that abstraction, what is a student actually? Not in the real world,
but in our code world here. Student is an abstraction. It's a useful word, all of us can kind of
agree mean something, but technically what does it apparently mean? A student is actually a name
in a dorm, which really kind of is diminutive to everyone in this room, but we've distilled it in code
to just those two values. So there we have encapsulation, you're kind of encapsulating together
multiple values, and you're abstracting away, just have a more useful term, because no one's going to
want to talk in terms of lines of code to describe anything. So, same topic is in the past. So,
now we have the ability to come up with our own custom data structures it seems that we can store
anything inside of them that we want. So let's now see how poorly we've been designing some things
for the past few weeks. So it turns out that much of the code, hopefully we've been writing
in recent weeks, has been correct, but we've been not necessarily designing solutions in the
best way. We're called that when we have this chunk of memory, we've typically treated it as
at most in array. So just a continuous chunk of memory, and thanks to this very simple mental
model, do we get strings, do we get arrays of students now, but arrays aren't necessarily the
best data structure in the world. Like, what is a downside of an array if you've encountered
ones of us far? In see, what's a downside of an array? Yeah.
Tanner cannot. To cannot. That is true. So in see, you cannot mix data types inside of an array.
They must all be in, they must all be charged, they must all be students. It's a bit of a
white like, as technically you can have something called a void star, and you can actually map,
but yes, that is true, though strictly speaking, cannot mix data types, though frankly even
though other languages let you do that, it's not necessarily the best design decisions. But sure,
other thoughts. Yeah. The size cannot change. Let's focus on that one, because that's sort of
more even more constraining it with seam. So if you want an array for say two values, what do you do?
Well, you can do something like int x, bracket two, semicolon, and what is that actually give you
inside of your computer's memory? It gives you some chunk that will draw as a rectangle,
with this is location zero, this is location one. Suppose that, oh, a few minutes later,
you change your mind. Oh, darn, I just took a, I want to type in a third value, or I want to add
another student to the array. Where do you put that? Well, you don't. If you want to add a third
value to an array of size two, what's your only option in C? Make a new array. You make a new array.
So literally, and if this array had the number like 42, and this had the number 13, the only way
to add a third number is to allocate a second array, copy the values into the same locations 42,
13, and then we'll add another value 50, and then so that you're not using a twice as much space
almost permanently, now you can sort of free somehow or stop using that chunk of memory. So that's
fine. It's correct what we just did, but what's the running time of that process?
Call a couple of weeks ago, we started talking about efficiency and design. What's the running time?
Of resizing an array. Say again? Two long, fair. But let's be more precise. Biggo of,
Biggo of what? And what's n? Okay, true, but what is n represents?
Yeah, so you don't actually have to not know, like it's just a general answer in this case,
however long the array is, call it n, it is that many steps to resize it into that plus one.
Okay, technically it's Biggo of n plus one, but recall in our discussion of Biggo notation,
we just ignore the smaller terms, the plus ones, the divided by two, the plus n. We focus only
on the most powerful term in the expression, which is just n here. So yes, if you have an array of
size two and you resize it into an array of size three or really n plus one, that's going to take
me roughly n steps, technically n plus one steps, but n steps, Ergo, Biggo of n. So it's a linear
process. So possible, but not necessarily the fastest thing because we literally have to move
all those damn values around. So what would be better than this? And if you've programmed before,
you might have the right instincts already. How do we solve this problem?
Oh, yeah. We're going to allocate more memory at the end of the array.
Reallocate more memory at the end of the array. So it turns out C does have a function called
realloc, perfectly if not obviously named, that reallocates memory. And if you pass it,
the address of a chunk of memory you've allocated, and the operating system notices, oh yeah,
you've got lucky. I've got more memory at the end of this array. It will then allocate that
additional RAM for you and let you use it. Or worst case, if there's nothing available at
the end of the array in memory, because it's being used by something else in your program,
that's fine. Realloc will take take on the responsibility of creating another array,
somewhere in memory, copying all of that data for you into it, and returning the address of
that new chunk of memory. Unfortunately, that's still linear. Yeah. This is all being done in the heap,
malloc, and realloc, and free all operate on the heap. Yes. So that is a solution, but it doesn't
really speak to the efficiency. Yeah. Yeah. What is a linked list? Go ahead. Uh,
when you have an element at points like different elements. Okay, points to other element. Yes.
So let me speak to like, what's the fundamental issue here? The fundamental problem is that
like much like painting yourself into a corner, uh, so to speak, as the cliche goes,
within array, you're deciding in advance how big the data structure is and committing to it.
But what if you just do the opposite? Don't do that. If you want initially room for just one
value, say one integer, only ask the computer for that. Give me space for one integer, and I'll
put my number 42 in here. And then if and only if you want a second integer, do you ask the
computer for a second integer? And so the computer, as by a malloc or whatnot, will give you
another one, like the number 13. And if you want a third, just ask the same question of the
operating system, each time just getting back one chunk of memory. But there's a fundamental
gacha here. There's always a trade-off. So yes, this is possible. You can call malloc three times,
each time asking for a chunk of memory of size one instead of size three, for instance.
But what's the price you pay or what problem do we still need to solve? Yeah.
Yeah, they're not being stored next to each other. So even though I can think of this as being the
first element, the second and the third, you do not have in this story random access to elements.
And random access, Ergo random access memory, or RAM, just means that arithmetic, like mathematically,
you can jump to location zero, location one, location two randomly, or in constant time. Just
instantly, because if they're all back to back to back, all you have to do is like add one or
add four or whatever to the address and you're there. But the problem is, if you're calling malloc
again and again and again, there's no guarantee that these things are even going to be
proximal to one another. These second chunks of memory might end up, if this is the big chunk of
memory we've been talking about, where the heaps up but here and the stacks down here, 42 might end
up over here, the next chunk of memory 50 might end up over here, the third chunk might end up over here.
So there you can't just jump from location zero to one to two, because you have to somehow
remember where location zero and one and two are. So how do we solve this? Even if you haven't
programmed before, like what would a solution be here? Okay, somehow storing the addresses of
all right. So let's just suppose for the sake of discussion that this chunk of memory ended up at
location 100, this one ended up at like one hundred fifty, this one ended up at like four hundred and
seventy five, whatever, those values are. It would seem that somehow or other I need to remember three
values, a hundred one fifty and four seventy five. So where can I store that? Well it turns out I could
be a little clever but a little greedy. I could say to malloc, you know what? Every time I call
all you don't just give me space for an integer, give me space for an integer plus the address
of another integer. So if you've ever kind of seen like popcorn strong together on a string or
any kind of chain length fence where one link is linking to another, we could create the equivalent
of whoops, not that. We could create the equivalent of this kind of picture where each of these squares
or nodes will start calling them kind of links graphically to the other. Well we've seen these
links or these pointers, literally arrows that are pointing implemented in code, an arrow or a
pointer is just an address. So you know what? We should just ask malloc, not for enough space for just
the number 42, we should instead ask for a little more memory in each of these squares, making them
pictorially rectangles now, so that now, yes, we do have these arrows conceptually pointing from one
location to another, but what values do I actually want to put in these new additional boxes?
The addresses of the next. So they're like little breadcrums. So in this box here,
associated with the first value, should be the address of my second value, 475,
associated with my second value here per the arrow and let me draw the arrow from the right place.
From the arrow should be the address 150 because that's the last, and then from this extra box,
what should I put there? Yeah. Yeah, so probably the equivalent of slash arrow,
which in the world of pointers recall is N-U-L-L. So just a special value that means that's it,
this is the end of the line. That still leaves us with room to add a fourth value and point to it,
but it for now signifies very clearly to us, there's nothing actually there. So what did we just do?
We created a list of values, 50, oh sorry, 42, 50, 13, but we linked them together. First,
pictorially we just arrows like any human might with a piece of chalk, but technically in code,
we could do this by just storing addresses in each of these places. So just to be clear,
then what might this actually translate to in code? Well, what if I proposed this? In code,
we might do something like this. If we want to store an integer, we're of course going to need
to store like int and we'll call it. And we'll represent 42 or 50 or 13. But if we want to create
a data structure, we might want to start giving this data structure a name. I called it a
moment ago node, which is a CS term for like a node in a linked list, so to speak, and it looks
like this. So type-deft means give me my own type. Struct means make it a structure, like a student
was, and then node was going to be the name of this thing. And I'll explain in a moment why I have
the word node twice this time. But I have left room on the board for just one more line.
In addition to an int called n or whatever, I need to somehow represent in code the additional
memory that I want Malok to give me for the address. So first of all, these are addresses of
what data types, each of those three new boxes. They're the addresses of integers.
And that point in the story. But technically, what is this box really pointing to?
Is it pointing specifically to the int? It's pointing to that whole chunk of memory if you will.
So if you start thinking about each of these rectangles as being a node, and each of the arrows
as pointing to another node, we need to somehow express, I need to somehow store a pointer to a node.
In other words, each of these arrows needs to point to another node. And in code, we could
say this, right? Like let's give it a name instead of n, which is the number, let's call it next.
So next, shall be the name of this field that points to the next node in memory. And node star,
what does that mean in English, if you will? So again, pointing to an address, right? It looks
different. Node is a new word today, and that's fine. But node star just means a pointer to a node.
The address of a node, and it turns out that this is a custom structure, so we actually have to say this.
But it's the same principle, even though things are kind of escalating quickly here,
we just need two values, an int, and then a pointer to another thing. That other thing is going to be
another node, and we're just using node, frankly, to encapsulate two values, an int, and a pointer.
And the way you express in C, albeit somewhat cryptically, a pointer of one of those arrows,
is you say, give me a variable called next, have it point two, a structure called node,
or rather, have it be the address of a structure of type node. Yeah.
Good question. So this feels like a circular kind of definition, because I'm defining a node,
and yet inside of a node is a node, that is okay, because of the star. It is necessary in C.
I'll remember that C always is kind of red top to bottom. So accordingly, this very first line of code here,
typed F struck node, at that point in the story, when client has read that line,
it knows that a phrase struck node exists. Exactly. Exactly. We didn't need to do this with
student, because there were no pointers involved to other students, but yes, in this case.
So in short, this tells client, hey, client, give me a structure called node, and then in here,
we say, hey, client, each of those nodes shall have two things, an integer called N,
and a pointer to another one of these data structures of type node, and call the whole thing node.
It's a bit of a mouthful, but all this is the following. Let me go ahead and erase all of this.
All this data type is, if we get rid of the picture we drew on the fly there, is this says,
hey, client, give me a data structure that pictorially looks like this. It's divided into two parts.
The first part is called N. The second type is called next. This data type is of type N,
this is a pointer to another such node. And that's it. Even though the code looks complex,
the idea is exactly that. Yeah. Good question. When the reason is for it, as just came up a
moment ago, it's clang and see in general, or kind of dumb it, that they just read code top to bottom.
And the problem is you have to declare the name of this structure as being a struck node before.
You actually use it. It's similar in spirit to our discussion of prototypes,
why functions need to be mentioned way up top. This just says the clang,
give me a type called struck node. You don't know what it's going to look like yet,
but I'll finish my thought later. And then in here, we're just telling clang inside of that node
should be an integer, as well as a pointer to the very type of thing. I'm in the middle of the
finding. But if I had left off the word node up there and just said struck, you couldn't do that,
because it hasn't seen the word N-O-D-E yet. That's all. Other questions?
All right. So, if I now have a data structure called node, I can use it to kind of stitch
together these link lists. And maybe just to vary things a little bit and to start giving away
some ducks, would folks be comfortable with a volunteering to solve a problem here? Yeah. Okay,
come on up. One, two, sure. Or you can take a duck and run. Okay, one, two,
how about three? Come on over here. Three. So, if you want to be our first pointer, come and
you can be number five, come on over here. Want to be number nine. And one more, one more volunteer,
come on over here. Yeah. All right. So, how many of you over here?
Okay, 17. All right. So, if you'd like to just so we pick this up for those following along
home, if you would like to just say hello to the audience. Hi, I'm Andrea. Hi, I'm Cumm.
Wonderful. Okay, and if you wouldn't mind I was just taking a big step back over the ducks,
just so that we're a little farther back. Let's go ahead and do this. If you're our first
pointer, if you could come over here, for instance, and just stand outside the ducks. And if you
guys could come a little over here in front, it's still fine. So, here we have the making of a
length list. And what's your name again? Cummie. Cummie is our first pointer, if you will.
By a Cummie's variable, are we just going to keep track of the first elements of the length list.
So, if you could with your left hand, represent first, just point over at what was your name again?
Andrea. So, Andrea is the number nine. If you could use your left hand to point at number five,
and if you could use your left, to point at number 17, and your left hand to just point at
null, which we'll just call the ground. So, you don't want to just point it randomly, because that
would be like following a bogus pointer. So, here means null. All right. So, this is a length list.
All you need to store a length list of three values is three nodes inside of which are three
integers, and their left hand represents that next pointer. So, to speak, Cummie's a little different
in that she's not holding a value. She's not holding an integer, rather holding just the name
of the variable first. So, you're the only one that's different here fundamentally.
So, suppose I want to insert like the number 20. Could someone volunteer to be number 20?
Okay, come on up. All right. And what's your name? Eric. Eric. You're the number 20. And Eric,
actually, let's see. Let's see. Let's actually, can we do this? Let me give, let me make this a little
more different. Okay, that never happened. Okay. Eric, give me that please. I want to insert
Eric as number five. So, Eric, I'm keeping this list sorted. So, where obviously you're going to go?
All right. But before you do that, let's just consider what this looks like in code. In code
presumably, we have a mallocked Eric from the audience. I've given him a value n of number five.
And his left hand is like, it's garbage value right now. Because it's not pointing to anything
specific. So, he's got two values in integer and a left hand representing the next pointer.
If the goal is to put Eric in sorted order, what should our steps be? Like who's hands should
point where and in what order? Yeah, give us one step. Okay. So, you should point at number nine.
So, you should point at number nine, which is equivalent to saying point at whatever first or
pony is pointing at. So, go ahead and do that. All right. Next. What's the next step? Someone else?
Someone else. Almost there. Yeah. First, should point to five. Okay. So, first, your
combing could you point to five. And that's fine. You don't even have to move, right? This is the
beauty of a linked list. It doesn't matter where you are in memory. It's the whole beauty of these
pointers where you can literally point at that other location. It's not an array where they need to
be standing back to back to back. They can be pointing anywhere. All right. Let's go ahead and insert one
more who wants to be say 55. Big value. Yeah. Come on down. All right. What's your name?
Keon. Okay. So, come on over. So, we've just melocked Keon from the audience. I've given him his
end value of 55. His left hand is just some garbage value right now. How do we insert Keon
in the right order? Where is he obviously supposed to go? In sorted order, he obviously belongs
at the end. But here is the catch with a linked list. Just like when we've discussed searching
and sorting in the past, the computer is pretty blind to all but just one value. And the linked
list at the moment, I don't know that these three, these four exist. All I know really is that
Komi exists. Because via this first pointer is the only access to the rest of the elements.
And so what's cool about a linked list, but perhaps not obvious, is that you only the most important
value is the first. Because from the first value, you can get to everyone else. It's not useful.
Excuse me for me to remember Andrea. Andrea alone. Because if I do, I've just lost track of
Komi and more importantly, because of his number, Eric. So, all I have to do really is remember Komi.
So, if the goal now is to insert number 55, what steps should come first? No pun intended.
Okay, finding the first space. So, I'm going to start at Komi and I'm going to follow this
pointer number five. Does 55 belong here? No, so I'm going to follow this pointer and get to
Andrea. Does 55 belong here? No, I'm going to follow her pointer and 22 does a belong here? No,
I follow this pointer, 26. No, but you have a free hand that turns out. So, what steps should
come next? We could have you point at 55 and now done. So, relatively simple, but what was the
running time of this? Yeah, it's big all of end. It's linear because I had to start at the
beginning. Even though we humans have the luxury of just eyeballing and saying, oh, obviously,
he belongs way at the end. Not in code. Like we have to start at the beginning to reverse
the whole darn list until we get linear lead to the very end and now we're done. Let's try one last
one. How about 20? Yeah, come on down. What's your name? James. All right, James. All right,
so we just maloch James given him the number 20. If he obviously belongs roughly in the middle,
what's the first step? Sorry. All right, so we start with combing again. All right, first. Okay,
five. Do you belong here? No. Let me follow the link. Okay, nine. Do you belong here? No. Do you belong
22? Oh, but what did I just do wrong? I went too far. At least in this story, like I literally
Andrea's behind me now. Okay, so can I follow the pointer backwards? You can't. Like in
every picture we've drawn and every example we've done with an address, we only have the address of
the next pointer. We don't have what's called a doubly linked list, at least in this story,
where I can just turn around. So that was a bug. So I need to start over instead first. Okay, five.
Okay, 19. What I really need in code ultimately is to kind of peek ahead and not actually move
not that far. Just a 22. peek ahead at 22 and really it's, oh, that's going to be too far.
This is not yet far enough. So let's go ahead and bring James over. Actually, you can stay there physically,
but what step happens to happen first? I know now he belongs in here. You want to point it
him? Okay, point it him? Well, let's do that just because it is incorrect. That's fine.
Okay, Andrea proposed that we point here, but you just broke the whole linked list. Why?
Because there's a thing to point at, right? No one is remembering who was your name again.
No one remembered where Keon was, so can't do that. Your left hand has to stay there. So what
step should happen first instead? James should point at whatever Andrea is pointing at perhaps.
So a little redundantly at the moment, just like before. Okay, now what happens next? That's step one.
Now you can point at him. Okay, good do that. All right. And so now this looks like a complete mess,
but if we know that call me his first, we can follow these bread crumbs to Eric and then to Andrea
and then to James and then the rest of our list, step by step by step. So it's a huge amount of like
logic now, but what problem have we solved? And I think we identified it over here earlier. What was the
problem first and foremost with the raise? You have to decide on their size and advance. And once you do that,
if you want to add an additional element, you have to greaseize the whole darn thing, which is expensive,
because you have to move everyone around. Now frankly, I'm being a little greedy here and every time
we've inserted these new elements, I've been keeping them in sorted order. So it would seem that if you
insert things in sorted order, big all of then, every time, because in the worst case, the new element might
end up all the way at the end. But what if we relax that constraint? What if I'm not so uptight and need
everything nice and like orderly and sorted? What if I just want to keep growing the list in any random order?
And I allocate the number 34, where and I'll play the number 34. Malach 34, where is the quickest place for me to go?
Yeah, 0.5. Okay, 0.5. And then, call me if you could point to me. Done. One, well, two steps. All right.
So, post now, Malach 17, with someone else, the pretend is right here. Where's the best place for 17 to go?
Right after Call Me 2. So, now, Call Me can point at 17. 17 can point at me. I can point at Eric and so forth.
And that's two steps again. Two steps. If it's the same number of steps every time, we call that constant time. And we write it as big
goal of 1. And so here, too, it's just a tradeoff. If you want really fast insertions, don't worry about sorting.
Just put them at the beginning and deal with it later. If you want dynamic,
resizeability, don't use an array, use a link list, and just keep allocating more and more as you go without
wasting a huge amount of space, too. Which notice, that's another big problem within array. If you
overallocate space and only use part of it, you're just wasting space. So, there's no one solution here,
but we do now have the capabilities, things to the structure and point pointers to stitch together.
If you will these new problems. Yes, please. And who am I in this story?
Okay, absolutely. So, another very reasonable idea would be, well, why don't I just put the new
ones at the end? That's fine if I keep track of who is at the end. The problem is that the
moment in the story, and we'll ultimately see this in code, I'm only remembering Call Me and from
Call Me in my getting everywhere else. I could have another pointer, second pointer, and literally
call it last, that's equivalent to you, or that's always pointing at you. I just need then two pointers,
one literally called first one, literally called last, that's fine. That's a nice optimization,
if I want to throw all of the elements at the end. And frankly, I could get really fancy and
to solve the problem that Andrea cited earlier, if I store not just an end, an end, and a pointer,
but instead an end and two pointers, I can even have each of these guys pointing with their left
end right hands in a doubly linked list, so as to solve the problem Andrea identified, which was,
if I go too far, no big deal, take one step back. I don't have to think as hard about that logic,
so there are two to trade off. Let's go ahead and take a five minute break. I'll turn on some
music grab a duck now if you'd like and we'll return with some fancier data structure still. Thanks.
All right, we're back. So, let's now translate some of these ideas to code, so that we can actually
solve this problem a little more concretely than just having humans pointing at each other. So,
for instance, let's try to distill everything we've been talking about into just a goal in code
of storing a list of numbers. I would propose that we can take like three passes at this problem.
The first would be, let's just decide in advance how many numbers we want to store,
so we don't have to deal with all this complexity with the pointing and the pointers and all this,
and just hard code that value somehow and just stop when the user has inputted that many numbers
and no more. Two, we can improve upon that and at least let the user dynamically resize
their array so that if they decide to input more numbers than we intend, it's going to grow and deal
with that. Of course, arrays are not necessarily ideal because we have to do all that
damn copying from old to new. That's linear time. It seems smartest to get some version three,
which is actually going to use a link list, so we're just more modestly allocating space for
another number and another number or really a node, one number at a time. So, let me go ahead and
start as follows. I'm going to go ahead and include some familiar lines in list 0.c of the CS50
library just to make it easy to get some user input for this and standardio.h for printf. And
let me go ahead and declare my main function as usual and then in here let's do a couple things.
First let's ask the user for the capacity of the array that we're going to use. Or rather let's do
this first. Let me first rewind and say you know what, int numbers 50. Well, that's going to be
annoying to type in 50 numbers. We're going to give the user two numbers to at first that here
she can type in. Next let's go ahead and prompt the user for those numbers. So let me go ahead and
say let's do this. Let's at least clean this up a little bit so that we can reuse this value.
So we don't have a magic number as just came up in discussion actually. So while,
I'm not going to do that. Let me fix this. This will be my capacity of size two and that's
going to give me that size. And then I'm going to keep track of how many integers I've prompted
the user for so far. So initially the size of this structure is going to be zero but it's capacity.
So to speak is two. So size means how many things are in it. Capacity means how many things
can be in it. And while the size of this structure is less than its capacity, let's go ahead
and get some inputs from the user. Let's go ahead and ask them for a number using our old friend
get int and just say give me a number. And then let me go ahead and insert the number that they type
in into this array at location size like this and then do size plus plus. I think you know it's
pretty quickly, but let's consider what I just did. I initialized size to zero because there's
nothing in it initially. Then I say while size is less than the capacity of the whole thing and
capacity is two by default, go ahead and do the following. Give me an int from the user.
So int number gets int. Then put at location size in my numbers array whatever the human typed in
number and then increment size with plus plus. All right. So on the first iteration size is zero.
So numbers bracket zero gets the first number. Numbers bracket one gets the second number.
Then size equals capacity so it stops logically. Any questions on the logic of this code?
All right. So once we have those numbers, let's just do something simple. Like for int
I get zero. I is less than the actual size. I plus plus. Let's just go ahead and print out
the number you inputted percent i backslash n and type out numbers bracket i. All right.
So if I made no typos in list zero dot c, then I'm going to go ahead and do dot slash zero dot c.
I'm going to be prompted for a couple numbers. Let's go ahead and do one, two, you input one,
you input a two. All right. So not bad. But this is bad design arguably. Why?
Just find one fault. It's correct. But bad design.
Repetitive because I'm using a couple loop sure and it's fundamentally it's very limited in
functionality. Why? Like how useful is this program? It's hard code in a two. So let's at least
improve upon this a little bit and get rid of this hard coding. Why don't I at least ask the
user for something like this? Well, instead of just declaring the capacity, let me go ahead and say,
you know what? Let's just replace the two. Get in and just say capacity for instance. All right.
And now if I do this, I'm going to be prompted. So make list zero dot slash the list zero.
One, the capacity will be two, one, two. That's nice. But if I run it again and give it a
capacity of three, one, two, three, I get more capacity. So that's nice. It's an improvement for sure.
There is a bug here. Before I test it further, can anyone identify a bug or somehow crash this?
Oh, go ahead. If you don't input an integer. If I don't input an integer or was that the
same common up here? Oh, no, because I'm re-running it in each time. I don't need to
worry about previous runs of the, of the, of the program. Yeah. In the four loop, he just goes
like one, two, three, it doesn't actually care what you input it. I don't want two, three. Well,
I am iterating up to size, which could be capacity because now they do end up being equivalent
because I'm filling the whole thing. But let's try this. If you don't type in a value. So let me
go ahead and rerun this. My capacity shall be duck. All right. So it did handle that because
get in does that for me. But I bet I can still break this. Oh, yeah. Let's always try something
negative. Oh, okay. So bad. Like cryptic looking message, but clearly has to do with the negative
value. So I should probably be a little smarter about this. And recall from like week one, we did
do this with Mario. You might have done this. So I could do something like do while capacity is less
than one. I could go ahead and say capacity gets get in capacity. So just a little bit of air
checking to close the bug that you identified. All right. So let's go ahead and recompile this.
Make list zero. Whoops. Make, we're going to start hearing that a lot today, aren't we?
Make list zero dot slash list zero capacity will be three one two three. Now capacity will be
negative one doesn't allow it capacity zero doesn't allow it capacity one. Yes. So non-exhaustively
I've tested it feels like it's in better shape. Okay. But this program while correct and while
more featureful still has this fundamental limit. Wouldn't it be nice to allow the user to just
keep typing numbers as many as they want and then quit once they're done inputting numbers. Right.
If you're making a program to compute someone's GPA different students might have taken different
courses. You don't want to have them to type in all 32 courses if they're younger and haven't
taken all those courses like there's a lot of scenarios where you don't know in advance how many
numbers the user wants to provide but you want to support few numbers, lots of numbers or beyond.
So let's do this in a second version in list one dot C. Let me go ahead and improve upon that
example as follows. First let me give my familiar friends up here. CS50 dot H for IO standard IO dot H and
then in here int main void and then let's start writing this. So now I don't know when
advance necessarily how many numbers the user is going to type in. Like the goal is I want them to
be able to type in a number another number another number another number and then hit the equivalent
of like Q for quit when they're done inputting numbers. Like I don't want them to have to think
about an advance how many numbers it is they're inputting. But how do I do that? Like I can't just
come up with an array called numbers and say 50 because if the user wants to type in 51 numbers
I'm going to have to resize that. But how do you resize an array? How do you resize an array?
What's that? You can't, right? We've never seen an instance where you've resized an array.
We talked about it on the blackboard here while just like allocate a bigger one and copy everything
and we did identify realloc but you can't actually use realloc on an array. Realloc actually
accepts an address of a chunk of memory that you want to grow or shrink. So it turns out if we
now start to harness this sort of fundamental definition of what an array is a chunk of memory
we can actually build arrays ourselves. If an array is just a chunk of memory or more specifically
it's like the address of the first byte of a chunk of memory it would seem that I could declare my
array not with square brackets as we've been doing for weeks but I can say you know what numbers
really is it's really just a pointer and I'm initially going to initialize it to null because
there is no array but now I have the ability to point that pointer at any chunk of memory small
or big. Now why is this useful? Well initially let me claim that my capacity is zero because
nothing's going on yet I haven't called Malach or anything and initially my size is zero because
there's nothing in the array and it doesn't even have a size but let me just do this forever.
Much like in scratch we had the forever block you can use while true and see to just say keep doing
this until the user breaks out of this and let me go ahead and ask the user give me a number get in
and just ask them for a number and then we just need a place to put that. So where do I put this
number? Well do I have at the moment any place to put the number? No and technically speaking how
do you express that? Like in pseudocode I want to say if no place for number but technically I could
do this. Well if the size of the array at the moment equals its capacity that feels like a
lower level way of expressing the same thing if whatever the capacity is if the size is the same
there is no more room and that simple statement also covers the scenario where the capacity is zero
the size is therefore zero so it's the same question either we have no space at all or we have
some space but we've used it all size equals equals capacity. So if the size equals capacity or put
more casually if I don't have enough space what do I want to do intuitively? Allocate more
memory and it turns out you proposed or someone proposed earlier reallocating memory we can use
this function for the very first time. Let me go ahead and say this I the catch with realloc is you
have to be smart about it because it returns a pointer so let me propose this code first first
just give me a temporary variable call it temp that's going to store the following actually no let me
start this more simply let me go ahead and say numbers should be reallocated please realloc
by passing itself in and this time give me the size of an int times how many ints do I want this time
how many numbers did the human just input presumably just one right because literally we've only
called get it once in this story so whatever the size of this array is now we need to increase it by one
that's all so this line of code here is saying hey hey computer go ahead and out reallocate
this array from whatever its current size is and make it this size instead the size of whatever it is
plus one times the size of an int because that's what we're trying to store as an int so we have to do that
multiplication and realloc is mentioned earlier is pretty fancy it's going to take in a pointer whatever
chunk of memory you've already allocated and it's going to then reallocate a bigger chunk of memory hopefully
what's going to happen is this if your chunk of memory initially looks like this it's going to hopefully
notice all this memory's free let me just give you back the same address so if this is address 100
and you get lucky and this address is also available the realloc functions going to remember that
for the operating system it's going to return the number 100 again and you're good to go you can safely
touch memory here or if this is in use already this chunk of memory and therefore yeah we can't
fit another bite there because some other code you wrote is using that memory but there is twice
as much memory available down here what realloc will do is if you've stored the number 50 it will
handle the process of copying 50 to the new value this is going to be left as a garbage value
for you to deal with and it's going to return to you the address of the new chunk of memory having
done the copying for you so even though it's technically reallocating the array it's not
necessarily just going to grow it it might relocate it in memory to a bigger chunk and then give you
the new address of that memory question is that process really preferable to just create the
extra memory in this place and then saving the time and it's a really good question honestly
we could avoid this problem slightly by just doing you know what let's give me at least
go ahead and give me at least the size of an ant times I don't know most humans are not going to
type in more than 50 numbers let's just pick 50 so you could do this and that would indeed save
you time because the approach I'm currently taking is pretty inefficient because every damn time I
the user calls get ant and gives an ant we're resizing resizing resizing very expensive
as to what the best value is though 50 should be 25 should it be a thousand I'm either going to
underbets or overbet and it just depends on you to decide which of those is the worst decisions
in terms of programs does it also pretty expensive to have memory that they're not using
for generally is it usually more okay good question in programs you're writing is it better to have
more memory than you're using or should you really be conservative these days memory is cheap we all
have gigabytes of memory and so wasting 50 bytes or 200 bytes times 4 of memory not a big deal
like just get the job done quickly and easily but in resource constrained devices maybe things like
phones or little internet of things style devices that have a lot fewer resources you don't really
want to go wasting bytes but honestly the CPUs the brains and our computers are so darn fast these
even if you're calling mallock 10 times a thousand times it's happening so darn fast that the
human doesn't even notice so there too these are what are called design decisions and these are
the kinds of things that in the real world you might actually debate with someone at a whiteboard
saying no this is stupid because of this reason or here she might push back for other reasons and
no one's necessarily right the whole goal is to just have that thought process first so you're at
least confident in what you chose yeah when you were calling F read you were by definition
in the forensics problem said reading bytes from disk into memory when you were calling F right you
were copying bytes from memory back to disk if that answers the question okay other questions yeah
why do i say size plus one in line 16 because the whole goal is to make room in this array for the
newly imported number that the human just typed in and so whatever the current size of the array is
I clearly need one more space it does because it does repeat on and on because at the moment
I'm inside of this while loop so we do need to ask a question when is the human done inputting
and it turns out and this is non obvious and it's not the best user experience on a keyboard for the
human but we can actually detect the following sentiment if user is done inputting numbers then
let's go ahead and break but the question then is how do you express that pseudo code well you could
in some programs maybe type q for quit but is that going to work when using get in could we detect q
why not
exactly because get in immediately prompts you for another in so because of the way we design this is 50
library you can't detect q or you can't have the human type quit unless you don't use get in you instead use
we could use get string and then every time the human types in a number we could use like a to i
to convert it to an int but if the human types in q or q u i t a string also we could just have
an if condition with string compare and quit but honestly then you're reinventing like get in
so trade off anyhow a common way to work around this would be you know that control c quits programs
perhaps cancels out of your program there's another popular keystroke control d that sends what's
called end of file it simulates the end of a file it simulates the end of the human's input so it's kind of
like the period at the end of an English sentence so if you want to signal to a computer that's waiting
for input from you that you don't want to quit the program that would be control c but you just want to
be done input input to the computer you hit control d otherwise known as EOF and the way to express this
and you would only know this from documentation would be to say something like this if the number
the human type in equals end of file but there is no such thing in this context you actually do this
because of how the CS50 library works turns out that if the only values a function can return
our integers that means you can return zero one negative one two billion negative two billion
give or take what humans did for years with old programming languages is they just steal one
or few numbers for instance you steal like the number two billion and call it int max the maximum
integer and you just say you can never actually type two billion because we are using that as a special
value to signify that the human hit control d or you could do negative two billion or you could do zero
or 50 but at some point you have to steal one of the four billion available numbers to use as a
centenal value a special value that you can then check for as a constant so anyhow this just means
when the user's done a typing input go ahead and break out of this while loop and as on the side let me
fix one thing it turns out things can go wrong with realloc and if realloc fails to allocate memory
it can return null a special value that just means something went wrong it's an invalid pointer
it's the address zero and so it turns out there's a subtle bug here where technically I should
actually do this store realloc's return value in a temporary variable because if temp equals null
something went wrong and I should actually go ahead and quit out of this program but let me wave
my hand at that for now because it's more of a corner case but you'll see in the online version
of this program we have additional error checking that just checks in the rare case that realloc fails
clean it up and return properly but a wave to the online code for that all right any questions on that
example before we move on yeah yes good question when you call realloc and it ends up allocating more space
does it clear uh the original memory no and that is where garbage values come from for instance
because if they're just left in memory from the previous use other questions yeah what does it use
your user actually type to break oh control D control D and it's not break it's to send
end of file end of input control C kills or breaks out of the program itself
same as int max yes correct in the CS50 library int max yes is the symbol yes yeah good
you'll suggest us to use it to like like say do you want to enter another member yes or no absolutely
we could add more logic and you could use get string and we could prompt him or her hey do you
want to input another number only downside of that would be now I have to type in not only my number
but like yes or no constantly so just to trade off user interface wise all right so let me go
ahead and let me go ahead and return zero here just as my simple solutions to this problem
of something going wrong I've just compiled this program let me go ahead and run it I'm going to type in
one number two numbers three numbers and now I'm bored I don't want to keep doing this how do I tell
the computer I'm done control D and oh okay that's correct behavior because I forgot a key step
what's that yeah I'm not actually doing anything with the values I should probably like four
int I get zero I less than size I plus plus code we had before and I should probably print out
you input it percent I comma this save that make list one so all I did was read the printing code
now if I rerun this one two three control D damn it oh okay now I broke my code here let me do this
we're going to get rid of this error checking because I'm not actually ever resizing numbers gets
realloc oh and maybe someone was chiming chiming in with this numbers bracket size gets the
users input size plus plus was this a key detail someone wanted me to do okay so I didn't actually
finish the program earlier notice we left off as follows hey computer give me an array of size zero
initially that's null there's no memory for it therefore the size of this array is zero do the following
forever get a number from the human if the number equals the special value int max just break out
because the program is done and actually sorry this is why I write these in advance too
okay go ahead and prompt the user for number if they have inputed the control D just break out of this
loop however if the size of the array equals its current capacity go ahead and reallocate space for
this thing being one number bigger than it previously was now assuming that says succeeded and we have
memory go ahead and just like our list zero example store in the numbers array at the current location
which is zero whatever number the human typed in and then increment the size by one to remember what we have
done I'm also though going to need to do capacity plus plus here to remember that we've increased
the capacity of the array so again two new measures capacity is how much space there is in total
size is how much we're using they happen to be identical at the moment because we're growing this
thing step by step by step all right let me go ahead and hit save let me go ahead and compile this one
last time dot slash list one and input one two three control D okay now it's just an aesthetic bug
I forgot my back slash n so just to prove that I can actually program dot slash list one one two three
control D all right so you inputed one and the reason it didn't move to another line is because
control D gets sent immediately without hitting enter all right that's all using arrays now let's do the
sort of cake baked already and pull it out of the oven the third and final example here
is list two and actually before we get there let me note one thing yeah let's do one last thing here
let me go ahead and run per earlier our new friend valigrant on list one enter it's waiting for me
to type in one two three let me go ahead and hit control D interesting I seem to have a buggy program
even though I claimed a moment ago that I knew what I was doing 12 bytes in one blocks are
definitely lost and lost record one of one again I don't understand most of those words but 12 bytes
definitely lost probably my fault why is it 12 and what are those 12 bytes yeah you have I think you
made three integers yeah one two and three and each one is four bytes you know three the
exactly I typed in three numbers one two and three each of those is four bytes on this computer that's
12 three times four and so I'd never freed them seems to be the source of the issue so at the end
let's just prove that valigrant can detect correctness as well free my numbers semicolon let me go ahead
and rerun make list one and now let me increase the size of this and do valigrant again on list one
typing in the same values one two and three control D all heat blocks were freed no leaks are possible
so again valigrant is your friend it finds problems that you didn't even necessarily notice and you
didn't have to read through your lines of code again and again to identify the source of the
issue necessarily all right any questions then on these arrays that are dynamically allocated
and the bugs we find there in with valigrant all right so the last demonstration of code is going to
be this I have stolen for this final example some of the building blocks that we had on the screen earlier
in my code for list two dot c I need a structure called node and that node as we claimed earlier
with our human volunteers is going to contain a number called number we'll call it this time instead of
n and it's going to contain a pointer called next to another such node so that's copied and pasted
earlier albeit with the integer rename to number for clarity now notice in main what I'm doing first
go ahead and allocate in array of no space initially so this was like when combi was holding up
first and representing the beginning of our data structure this is the analog using an array
that the piece of paper that would be held up here would be numbers and it's just pointing at nothing
at all like left hand down on the floor because there's no memory yet allocated but then go ahead
and while true go ahead and get an integer from the user with this code here check if the user hit
control d as with this arcane technique and then our code is similar in spirit but we have to
stitch these things together allocate space for the number so when I mallocked an additional
volunteer from the audience and here she came down the equivalent in code is this hey computer
allocate with mallock enough space to fit the size of a node then store the results in a pointer
called n so node star n just means give me a pointer to a node call it n and store the address that was just
allocated from the audience as before why do I have these lines of code here that have highlighted in blue what's
that expressing if bang n or if not n would be how you pronounce it what's going on there
yeah if there's no more memory that you can point to then it fails exactly this isn't going to
happen all that often but if the computer is out of memory and therefore mallock fails you don't want
the program just to fret crash or freeze like all of us hate when that happens on macOS or windows
so check for it if not n or equivalently if n equals equals null just return one quick
gracefully even though annoyingly but don't just crash or do something unexpected so you can simplify
that check to just if not n if n is not a valid pointer return one now here's the code with which we
were implementing the demonstration with our humans and this is the scariest looking or most
cryptic at least looking code we're going to see in c today is our final day in c sort of the
we've been running up a really steep hill of late learning about memory and now data structures
and syntax the last of our syntax and c so what are the symbols to be aware of this line of code here
is how I handed one of our volunteers a piece of paper on the right hand side is the number
that was typed in 55 or 5 or 20 or whatever the value is on the left hand side is where you
want to put it n and then literally an arrow number does this it has with mallock a liner so
prior given me in memory just one of these big rectangles and again the top of this in this example
is called number and the bottom is called next so that's our human having stood up from the
back of the room when I hand that human a number like 55 it visually goes there the line of code
with which you achieve that is this here because notice on line 31 here when I mallocked that note
I stored it's address in a variable called n and that's a pointer as drawn with an arrow to that
big node or if we really want to be nitpicky if this is an address 100 yes then the pointer actually
has the value 100 in it but again that's rarely useful information so we can
abstract the way with just an arrow so line 31 is what creates those boxes on the screen line
38 is what puts the number for instance 55 into the box exactly much like I handed a piece of paper
over so what is this this is the only real new notation today even though we're using lots of
stars elsewhere arrow this is wonderfully the first time in see it actually maps to our pictures
if n is the variable and you do n arrow something that means follow the arrow kind of like
shoots and letters if you grew up playing that and then put the number where the arrow has led you
in the field called number so as an aside we can think about this a different way and is what
data type what is this thing in blue n it's a pointer and it's a pointer two one of these things
that we created earlier so we're not doing students anymore with our structures we're implementing
nodes which have numbers and next pointers so it turns out that if n is a node or n is a pointer
to a node recall that dot notation from before this is not how you access number in this case because
n is not a node itself it's a pointer but if n is a pointer how do you go to a pointer how do you go
to an address with what notation star so recall from last week if we want to go to an address
you could do syntax like this ignore the parentheses for a moment just star n means if n is an
address of a chunk of memory star n means go there once you're there you're like conceptually right
here top left hand corner how do you access individual fields like number or next you use dot notation
so if you literally do star n dot number that means go to the address and access the number field
there's nice syntactic sugar in c which is just a fancy way of saying short hand notation
with which is the arrow but that's all it is this arrow notation doesn't do anything new it just combines
go there with access a field in a struct all in one breath if you will and this just looks a little
prettier this when I told our volunteers earlier point your hand down at the floor that's all that
line of code is doing it's saying go to n's address which is here access the next field and right
in that field null which is just the address zero the default special address like pointing at the
this line of code 40 is just a quick error check if numbers what is that equivalent to that's
actually just saying if numbers not equals null so if numbers is legitimate if malloc worked correctly
then let's go ahead and do the following this is a mouthful what is going on here so this is a
for loop that's not using numbers well or is it's almost every for loop we've written a new
probably written just uses ij maybe k but just integers probably but that doesn't have to be the
case what is a pointer it's an address what is an address a place in memory or a number really
so you can certainly use four loops just involving addresses but how so consider this line of code
this here looks different today but it's everything before that first semicolon that's just
where you initialize a value so this is like saying hey computer go ahead and give me a variable
called ptr and initialize it to be the start of my list then I'm saying hey computer do this
so long as pointer does not equal null and then what am I doing if and let's ignore this for now
as an error check go ahead and sorry let me think for one second if okay let's do this
what is what are these lines of code doing this is the code that was actually suggested the very
end of our human example like what if we wanted to insert all of the elements at the end of the
link list how do you express that so in this highlighted lines of code we're asking the question
if the current pointers next field is null we found the end go ahead and update that next field
to equal n and then break so let me translate this to an actual picture when using smaller boxes
that makes clear where something is going so suppose that this program's been running for a
little while and we have a length list that looks like this where this one's pointing here and maybe
this one's pointing here and this says null here and this points here and the numbers are as we've
been using today a 42 50 13 so the start of this list is called numbers this points to the
start of the list what am I doing in this for loop I am just implementing the following logic with this loop
give me a variable called pointer as represented in the story by like my left finger here and
initialize that to be the start of the list if that nodes next pointer is not is equal to null
add a new node here but this is not null I want to follow the breadcrumbs to here and then oh
we're at the end of the list I want to insert this new thing here so how do I express this code
actually in C so if I look back up here this is the line of code that allocates my left finger here
called pointer and initialize it to equal numbers which is the same as pointing at the first element it's kind of like
combi was representing first earlier but now our array is called numbers next what am I doing
does pointer equal null will know if my left hand is pointing here it obviously doesn't equal null so we
don't have to worry yet then what do I want to do if pointer next equals null well what does that mean
well pointer is here pointer arrow next means here does this equal null in the story
I mean it literally doesn't because it's null is not written there null is way down there so the
condition does not pass so what do I do next if pointer is equal to null doesn't apply
here's a weird update pointer gets pointer next so it's cryptic looking syntax but if pointer is pointing here
what is pointer next that's just this right this is n this is next or this is number this is next
so pointer next is this so what is this value well this is a pointer pointing here so that
highlighted block of code pointer equals pointer next has the effect visually of doing this why
if the arrows are a little too magical just think about these being addresses if this is saying
uh the next address is location one hundred pointer equals pointer next is like saying well this
also equals one hundred whatever one hundred is for instance over here is what both arrows should
now point that and if you now repeat this process and repeat this process eventually that question
we asked earlier is going to apply if pointer next equals null what do I want to do well if pointer next
equals null there's two lines going on pointer next equals n so pointer next is no longer null
it should instead be pointing at n which is the new node and then that's it because this was
already initialized to null and let's propose this was 55 and we're done so so much easier to do
obviously in person with just humans moving around and pointing with their arrows with their left hands
but in code you just have to think about the basic building blocks what is each of these values
where is each of it pointing and which of those fields do need to update and the only new code
here even though we're kind of combining it all in one massive example is this we are actually
using arrow notation to say go to that address and access some value there in and this condition
down here which I'll wave my hand out for now just handles the situation where the list is initially
empty any questions on this thus far all right so let's take a look more graphically at some
final problems we can solve and what you'll see in the in the in the days ahead is the following
when it comes to these linked lists and more we now have the ability to actually allocate things in
memory dynamically we don't necessarily know in advance how many numbers we have or in the
case of the next problem set how many words we have we have the ability to use malloc and maybe even
re-alloc to grow and grow our data structure in memory and we have the ability in code to actually
traverse those values in such a way that we can access memory that's all over the board now
and not necessarily back to back to back but what happens if we want to combine these ideas into
fancier solutions still well let's take a look at that in particular if I go let's say over here
to the following let's consider a problem we might now solve if I want to restore everyone's name in
this room in a data structure I could do what well we could use an array so I could actually draw
it decide how many people are in the room let's call it in and actually draw in boxes on the board
and then iteratively ask everyone for their name and actually write it down if I then wanted to take
attendance at their after and say oh is Alice here or is Bob here or is Karim here or Brian
I could just look through that array and say yes or no that human is here but what's the running
time of that algorithm how long would it take to look up a name in a data structure where I just
drawn it as an array of big list on the board what's that a big oven right because if it's just a
list of names it's going to take big oven and frankly that seems a little slow how could I do
an optimization well what if we combine some of these ideas a razor nice because they give me random
sort of instant access to memory locations but link lists are nice because they allow me to dynamically
add or subtract elements even if I want from the list so you know what instead of writing down
everyone's names like Alice and Bob and Charlie like this in just one big array of some fixed
size that necessarily that might paint me into a corner now I have room for one more name
what if I instead do things a little more cleverly so when I'm actually jotting down everyone's
name in the room what if I instead did okay is Alice here all right Alice is here and then Brian
is here and put Brian here and then maybe Charlie is here all right so Charlie and then maybe
Arnold is here where should I put Arnold so also starts with A you know what let's just put Arnold
here Arnold and let's go and Abby is here so you know what let's just put Abby appear as well
Bob came as well so Bob so what's the pattern obviously following is I'm hearing names called out
I'll alphabetically sort it kind of like Abby kind of ended up in the weird place here but that's fine
because I didn't hear her name first but I did kind of bucketize people into different
rows of the board in other words all of the A names I seem to just write down for convenience at the top
and then all the B names together and C names and probably if I kept going I could do this all the way
through Z in English alphabet so what's nice about this is that yeah I'm making lists of names
but how long is each of those lists if there's N people in the room each of my list is not going to be
N long which is slow it's going to be what N divided by 26 give or take if we assume that there's an
equal number of people with Z names and A names it's going to be roughly N divided by 26 so that I have
these like chains of human names but they're much shorter than they would have been if I just grouped
everyone together and this is a fundamental technique in programming called hashing it turns out
there's things in this world called hash functions these are just mathematical or verbal or code
implemented functions that take us input something and produce as output a number typically a number
from zero to say 25 or from one to 26 but they can also output strings and other contexts as well
so my hash function here in my mind is if you hand me a name I'm going to look at the first letter in
your name and if it's A I'm putting you in location zero if it's B I'm going to put you in location one
if it's a Z I'm going to put you in location 25 at the end so these are all buckets I've got
so to speak in computer science like 26 buckets or room on the board that represent the
starts of people's names so what is that well it would seem that if I don't know in advance how many
A names I have that's kind of like drawing this as a linked list if you will that might just get
longer and longer but I do know that I only have a finite number of first letters so that
at the risk of drawing a little messy is kind of like drawing what data structure it's kind of like
drawing in array that just has 26 spots and what's nice about in array is that I have random access
I can jump right to any letter of the alphabet in constant time one step and once I get there
I'm still going to see a list of names thankfully thanks to linked lists that list can be
short or long but on average let's say it's going to be 126th the length that it would have
been if I just used one array or one linked list so this technique of using a hash function which
again I've defined as you give me a name I take that as input I look at the first letter and I
return his output a number from zero to 25 a hash function let's you create a hash table and
there's different ways to implement hash tables but perhaps one of the most common is indeed like this
you decide in advance on the size of an array but that array does not contain the strings or the
humans names that array actually contains a linked list and it's the linked lists that contain the
names so we borrow ideas from like week two we merge them with an idea today from week four of
adding arrays to linked list respectively and we kind of get the best of both worlds because I can
immediately jump to any letter the alphabet super fast and once I'm there yeah there's a list but
it's not nearly as long as it would have been if I didn't use this trick so what's the running
time of all of this well it turns out that a hash table in the worst case might still take you
how many steps to find someone's name once it's been added to the list very worst case how many
steps if there's n people in the room maybe n y it's kind of a perverse situation but can you
contrive a scenario in which even though we're doing this fanciness it still takes me n steps to
confirm or deny that someone's here yeah everyone's name starts with the same letter for some
weird reason now it's a little silly in the human world but it could happen if you're just talking
data or whatever in the in the computer world this can devolve into short an array would just one
really long linked list but in practice that's not likely gonna happen right if we actually spent the
time here and ask everyone for their name we'd probably get a reasonably uniform distribution
of letters at least as as statistically likely would just human names so that would kind of spread
things out and so there's this fundamental distinction between sort of real world running time
or wall clock time how many seconds are actually spinning on a clock versus asymptotic running
time we've talked for a couple weeks now about running time is being big O of N and that might be
still the case that a hash table yes in the worst case it's still a big O of N data structure
because in the worst case it's gonna take n steps but in the real world big O of N is really big O of N
divided by 26 even though we always ignore those lower order terms but when it's you the human
running the code and analyzing the data running 26 times faster is actually real time saved even
though a mathematician might say ah that's the same fundamentally and indeed one of the problems
ahead for the next problem said is gonna be such to so-so exactly what the implications are in
your own code for actual wall clock running time and making smarter design decisions like something
like this can actually really speed up your code to be 26 times as fast even though yes a
theoretician would say ah but that's still asymptotically or mathematically equivalent
to just something linear so it's this fine tuning that'll make your code even better and better
now frankly hashing on first names probably isn't the smartest thing alone right like it does
anyone's this is gonna be hard especially uh uh is it does anyone's name start with z here
is my less not here but thank you for that for a fair counter example but she's not here so look there's
no z so now we're down to 25 possible values and I could probably pick some less common letters too
the pointers there's probably a few more a's than there are z's or a few more b's and there are
cues just by nature of human names so maybe just choosing the first letter isn't good enough and frankly
with 26 names suppose we did this for all of Harvard and had thousands of names each of my
chains might still have hundreds or thousands of names so another design question is going to be
well how many buckets should you have how big should the array be maybe you shouldn't look at the
first letter what if you look at the first and the second letter together so a a and a b and a c
and then dot dot dot b a b c so you could come up with more and more buckets but what else how
else might we kind of uh uniformly distribute people what do all of you have that we could use
as input to a hash function okay what you're going to do last name which might give us a different
or similar distribution yeah what's that you can use your ID number and actually look at the
first digit of your ID and odds are it's 0 through 9 so we could probably at least get 10 buckets
that way and that's probably uniformly distributed I'm not sure we could use birth birth dates in
some way like we could put all of the freshman in one bucket all the seniors in another bucket
and everyone else and so forth in their own buckets which would also give us some input so again
a hash function is entirely up to you to program and design the goal though is to smooth things out
you want to have roughly the same number of things in each link list just so that you have found
about the same performance across all of these various inputs so let's take a look at a couple
of other data structures again in this abstract way now that we know that even though it's not
obvious at first attempt we know how to construct arrays we kind of know now how to construct
linked lists it stands to reason we could implement them together in code what else could we do
now with these building blocks so for instance this structure here is a very common one known as
a tree uh tree like a family tree where there's one patriarch or matriarch at the top and then
their children and then their grandchildren and great grandchildren and so forth and what's nice
about a tree structure is that if you're storing data you can actually store the data in
clever ways to the left child to the right child and so forth as follows notice here there's
something curious about all the numbers in this data structure what is not worthy about them
what is not worthy yeah what's that they are multiples of 11 that was just to make them look
pretty though uh by the author here yeah yeah there's a mathematical significance too like no matter
what node or circle you look at the value in it is bigger than the left child and it's smaller
than the right child so it's kind of in between any circle you look at the number to the left
to smaller the number to the right is bigger and I think that applies universally all over the
place yes so what does that mean well we're call from like week zero when we had a whole bunch of like
numbers that we were uh phone book pages that we were searching one two three four five six let's
give ourselves a seventh one recall that when we did divide and conquer or binary search we did it
on an array and what was nice about binary search was we started in the middle and then we
maybe went left or we maybe went right and we kind of divided and divided and divided and conquer the
problem much more efficiently in logarithmic time than it would have been if we did it linearly
but we know now weeks later that arrays kind of limiting right if I keep storing all of my
values in an array what can I not do with the array make it bigger right I can't add an elements
to it without copying every darn element as we've discussed thus far today but what if I was a
little smarter about it what if I stored my values not just in an array but I started storing them
in these circles let's call them nodes and each of those nodes is really just an integer plus
two additional values how would we implement this data structure in memory well here's an
int n could represent the number in question and we could put that in a data structure called a node
that just has the same syntax as earlier today but I've left room for two more fields what is
it that I want to represent in code if I want to start storing my numbers not in this old school
week zero or a but in a tree two two pointers right a tree as drawn here with literally with
arrows is just like saying every one of these nodes or circles has a left child and a right
child how do you implement children well you can literally just use pointer notation as well here
a left child is just a pointer to another struct on the left and a right child is just another
pointer to the child on the right and what's nice about this ultimately is that we can now traverse
this tree just as efficiently as we can traverse this array because notice if I want to search for the
number 66 how many steps does it take me if I start at the top just like combi represented the
start of our linked list so in the world of a tree does the root have special significance and that's
where we always begin so how many steps does it take me to find 66 given the top it looks like
two yeah like want two or three right I started the top I look at it say on 55 which way do I go
I go to the right then I see 77 okay which way do I go I go to the left so it's the same logic
as we zero in dividing and conquering the phone book or an array a couple weeks later but we get to the
number we care about pretty quickly and it's not linear and in fact if we actually did out the
math what's really cool about a binary search tree is that if you have n elements n circles
the height of that tree is by definition mathematically log n so the height of the tree just so happens
to correspond to exactly how many times you can take n and divide it divide it divide it divide it into
and you can actually see this if you think about it the reverse direction on the bottom row there are
how many elements all right and on the middle row there's two and on the top row there's one
so you can actually see it in the reverse direction this is like dividing conquer but in a different
conceptual way like it's being divide every every row in the tree has half as many elements as the one
below it and so the implication of that is just like from week zero in the phone book when we're
dividing and dividing and dividing in half and half and half so this is only to say now that we have
structures and pointers we can build something like this but let's try one other example here too
this is a crazy looking example but it's kind of amazing suppose that if we want to store
a dictionary of words so not humans names this time but like English words so mariam weptor
Oxford English dictionary has like what thousands hundreds of thousands of words these days and English
for instance how do you actually store those well if you just look upwards in the dictionary
back in yesterday year like that is linear like you have to start at the beginning and look through it
page by page looking for words or you could be a little smarter because the words in any
dictionary are hopefully alphabetized you can do the Mike Smith style divide and conquer by
going to the middle then the middle of the middle and so forth log event but what if I told you
you could look upwards in constant time some fixed number of steps none of these divide and
conquer complexity no log and just constant time you want to word go get it instantly that's where
this last structure comes in which is called a try TRI. TRI short for retrieval even though it's
pronounced the opposite so a try is a tree each of whose nodes is in a red so it's like this
weird Frankenstein's monster kind of data structure where just really combining lots of different
ideas as follows and the way a try works as is implied by this partial diagram on the board
is that if you want to store the name Brian for instance in your dictionary is the first word
what you do is you start by creating a tree with just one node but that node is effectively in a
ray that a ray is of size let's say for simplicity 26 so a through z this location here therefore
represents b for Brian so if I want to insert Brian into this tree I create one node at the top
and then for the second letter in his name r I create another node also in a ray a through z
and so here I put a pointer to this node here b r i so i should have drawn some more boxes
by a b c d e f g h i so here I'm going to draw another pointer to b wait Brian okay that's wrong
I miss Billy shall be our name Billy is that b wait that no that's no
damn it b b b i b yes this works this works okay sorry so here we go we're inserting
Billy into this the fancy data structure so the first node represents the first letter the second
node represents the second letter the third node represents the third letter and so forth but what's
cool about this is the reusability so notice if this is the second letter and I counted this
out correctly i this is going to lead to a third node deeper in the tree where it's L that we
care about next and then another one down here which represents another L and I'll start drawing
the letters L this is b this is i L and we'll call this L and then finally another one over here
which is a y and this gets pointing down here this gets pointing here and so forth so in short
we have one node essentially for every letter in the word that we're inserting into the data
structure now this looks stupidly inefficient at the moment because the store b i L L y how much
memory did i just use 26 plus 26 plus 26 plus 26 plus 26 just a store five characters i use 26 times
five but this is kind of a thematic in computer science spend a little more space and i bet
I can decrease the amount of time it takes to find anyone because now no matter how many other
students are in this data structure and for instance let's do another one if we had another one
like bob so b is the same first letter that leads us to the second node oh is somewhere else in this
array say over here so this represents oh and then bob has another one so there's going to be another
array here and this is why the picture above draws this so succinct this is how we might store bob
so b i l l y or you can follow a different route b oh b so we can start to reuse some of these arrays
so there's where you start to get some of the efficiency anytime name share a few letters
then you start reusing those same nodes so it's not super super wasteful but the question now is
if there's like a thousand students in the class or a thousand students in the room we're going to have
a lot of nodes there on the board but how many steps does it take to find billy or bob or any name
with this data structure and to conclude yes or no that student is in the class
so like five for billy three for bob and notice none of that math has any relationship to how many
students are in the room if we instead wrote out a long list of a thousand names in the worst case
it might take me a thousand steps to find billy or bob maybe i could be a little smarter if i
sort it but in the worst case big o event it's linear or if i used a hash table before and maybe there's
a thousand students in the room but okay there's 26 letters in the English alphabet at least so that's
26 buckets so maybe it's a thousand divided by 26 worst case if i'm using those chains those linked lists
inside my array but wait a minute if i'm using this structure a try where every node in the tree
is just in an array that leads me to the next node a lot of breadcrumbs b i l l y is five and always five
b o b is always three b r i a and would have been five as well none of these tutorials has any
impact or any influence from the number of total names in the data structure so a try in some
senses this amazing like this amazing holy grail in that by combining these various data structures
now you get constant time but you do pay a price and just to be clear what is the price we seem to be paying
memory and in fact this is why i'm not really drawing it much more because it just becomes a big
mess on the screen because it's hard to draw such wide data structures it's taking a huge amount of
memory but theoretically it's being coming faster yeah question oh the question so it's a bit of
simplification if you were storing both bob and bobby you would actually keep going so each of
these elements is not just one letter you also have essentially a node there or some other
data structure that says either stop here or continue and you'll see actually in the problems
that will propose to you how you can represent that idea if you choose to go this route indeed
the challenge ahead ultimately is something quite like this you will implement your very own spell
checker and we will give you code they could you start it with this process and of course
a spell checker these days in google docs and Microsoft Word just like underlines in red misspelled
words but what's going on and how is it that word or google docs can spell check your English or
whatever language so quickly well it has a dictionary in memory probably with tens of thousands
or hundreds of thousands of words and all they're doing constantly is every time you type a word and hit
the space bar or period or enter it's quickly looking up that new word or those words and it's
dictionary and saying yes or no should I squiggle red line underneath this word and so what we're
going to do is give you a big text file asky text containing like a hundred plus thousand words
you're going to have to decide how to load those into memory not just correctly but in a way
that's well designed and we'll even give you a tool if you choose to use it that times how long
your code takes and it even counts how much RAM you're actually using but the key goals for this
week and our final we can see as to take some of these basic building blocks like arrays and pointers
and structures and decide for yourselves how your most comfortable stitching them together to
what extent you want to really fine tune your code beyond just getting it correct and it
give you a better sense of the underlying code that people have had to write for years and libraries
to make programming doable a lot scratch because in just a few weeks we're going to transition
to Python and the dozens of lines of code you've been writing now are going to be widow down to
one line two line because we're going to get a lot more features from these newer fancier languages
but you'll hopefully have an appreciation of what is actually going on underneath that hood
so I'll stick around for anyone I want questions let's call it a day take a duck on your way out
for roommates as well we'll see you next time
Einführung in die CS50 IDE
Debugger und Code-Prüfwerkzeuge
Konzepte des Speichermanagements
Verwendung des Valgrind-Tools
Speicherzuweisung und Zugriff auf den Speicher
Abschliessende Valgrind-Prüfungen
Verantwortlichkeiten im Bereich Speicherverwaltung
Dynamischer Speicher und Zeiger
Verstehen von Strukturen in C
Erstellen benutzerdefinierter Datentypen
Benutzerdefinierte Datenstrukturen mit Strukturen
Kapselung in der Programmierung
Einschränkungen von Arrays
Ändern der Grösse von Arrays in C
Einführung in verkettete Listen
Erstellen einer Knotenstruktur
Das Verständnis des Konzepts der verketteten Liste
Zeigerdarstellung in verketteten Listen
Elemente in eine verkettete Liste einfügen
Dynamische Listen Erstellung in C
Eingabeverarbeitung und Fehler
Kapazitätsprüfung in der Programmierung
Dynamische Benutzereingabe
Konzept der Grössenänderung von Arrays
Bäume in Datenstrukturen verstehen
Verstehen der dynamischen Speicherzuweisung
Dynamische Array-Manipulation
Speicherverwaltung mit Realloc
Binärbäume erklärt
Grundlagen der Baumdurchläufe
Benutzereingabeverarbeitung in C
Implementierung von Binären Suchbäumen
Einführung in Hash-Tabellen
Erklärung der Hash-Funktion
Kollisionsauflösungstechniken
Einführung in Datenstrukturen
Implementierung von Hash-Tabellen in C
Was sind die Hauptfunktionen von CS50 IDE, die das Programmieren einfacher machen?
Wie kann Valgrind dabei helfen, Speicherlecks in Programmen zu finden?
Warum kann es ein Problem sein, auf nicht erlaubten Speicher in C zuzugreifen?
Was zeigt Valgrind über die Risiken von schlechtem Umgang mit Speicher?
Wie kannst du sicherstellen, dass der ganze zugewiesene Speicher in deinem Programm richtig freigegeben wird?
Was lehrt dich das Gummienten-Debugging über das Finden von Bugs?
Warum ist es gefährlich, einen Pointer nicht zu dereferenzieren, bevor man ihn initialisiert?
Was sind die Vorteile, wenn man eigene Datentypen in C definiert?
Was sind die Vorteile, wenn man verwandte Infos in Structs packt?
Warum können Arrays in C nicht richtig mit verschiedenen Datentypen umgehen?
Wie hilft es, den Speicher neu zuzuweisen, um mit Arrays in C besser klarzukommen?
Wie erklärst du einen Strukturtyp, bevor du ihn im C-Code benutzt?
Was sind die Schritte, um neue Elemente dynamisch in eine verkettete Liste einzufügen?
Warum ist es so wichtig, die Beziehungen zwischen den Knoten zu checken, wenn man verkettete Listen baut?
Wie kannst du die Eingabevalidierung für unbekannte Benutzereingaben umsetzen?
Was passiert, wenn die Leute unerwartete Werte wie negative Zahlen eingeben?
Wie gehst du mit Arrays um, wenn die Anzahl der Eingaben nicht festgelegt ist?
Was sind die wichtigsten Vorteile von Baumstrukturen im Datenmanagement?
Wie funktionieren eigentlich die verschiedenen Baumdurchlauf-Techniken in Datenstrukturen?
Was macht binäre Suchbäume so effizient, wenn's darum geht, Daten zu suchen und zu sortieren?
Wie beeinflusst eine gute Hash-Funktion die Geschwindigkeit, mit der man Daten in Hash-Tabellen abrufen kann?
Welche Strategien können helfen, Kollisionen in Hash-Tabellen zu vermeiden?
Warum sind Hash-Tabellen in manchen Fällen besser als Arrays?
Wie verbessert die realloc-Funktion die Speichernutzung in C?
Welche Strategien verbessern die Eingabeverarbeitung in C-Programmen?
Warum ist die Speicherneuordnung für dynamische Arrays so wichtig?
Die Sitzung beginnt mit einer Einführung in die CS50 IDE, eine webbasierte Programmierumgebung, die das Codieren und Debuggen erleichtert. Wichtige Funktionen wie debug 50, mit dem ihr durch den Code gehen und Variablen inspizieren könnt, werden hervorgehoben. Ausserdem wird check 50 als ein Werkzeug vorgestellt, um die Korrektheit des Codes durch automatisierte Tests zu überprüfen, ähnlich wie das Feedback von Tutoren. Die Diskussion geht auch auf die Nützlichkeit von print-tap zum Debuggen ein, mit dem Programmierer Variablenwerte und Texte ausgeben können, um Probleme in ihrem Code zu identifizieren. Darüber hinaus wird eine einzigartige Funktion erwähnt, die die Umwandlung von C-Code in Scratch ermöglicht und die Vielseitigkeit der IDE zeigt. Die Sitzung ermutigt dazu, sich mit diesen Werkzeugen vertraut zu machen, um die Programmierpraktiken und Problemlösungsfähigkeiten zu verbessern. Insgesamt liegt der Fokus darauf, die Lernenden mit den notwendigen Ressourcen auszustatten, um Programmierherausforderungen effektiv zu meistern.