Hello, it's been a while since I've done one of these videos, but this video is 8 bits
of image processing you should know.
And you might be thinking, well why do I need to know anything about image processing?
Well, images are just 2D arrays of data, and the algorithms that we apply to this data
can shape it in useful ways.
Obviously, some of the applications involve images and cameras and video footage.
But there are also other ways of manipulating 2D data to your advantage, for example,
in things like procedural generation.
On the whole, I think most programmers should have an awareness of image processing.
It is a very useful tool to have in your toolbox.
So let's get started.
Before I start, I'm going to show you the program that I've created to demonstrate the
8 bits, and it's quite nice because it allows us to quickly compare the algorithms.
So here it's going to show the first bit which will be thresholding and we can choose
different numbers to look at the different algorithms and see what their effects are.
Because it's working with video, you can see here's a live feed of my arm waving around.
I think it makes quite a nice interactive tool, which is great for learning.
But this video is going to be a bit different to some of my others, in that I'm not going
to go through the code line by line from scratch.
I've already created the code, and what I really want to emphasize is what is the image
processing that is going on and how does it work.
Bit one, thresholding.
This is the process of binarizing an image.
Here I have an image, and I'm going to assume it's a grey scale image, so the pixels
go from black to white.
Thresholding involves taking an individual pixel and classifying it as being above or below
the threshold.
If it's above the threshold, you output one.
If it's below the threshold, you output zero.
This green line represents a single row in our image.
If I take this row and plot the x position against the brightness of each pixel, I might
get something that looks like this.
Thresholding involves specifying an appropriate value to act as a cut-off.
So any pixels above that value will get classified as one, and any below it will get classified
as zero.
The red dashed line represents my threshold value.
So now, with my blue pen, I can indicate what the binary equivalent of this might be.
So it starts down here at zero, but then it goes above the threshold to one.
Below the threshold, above the threshold.
And we've binarized our image.
To demonstrate these programs, I'm using a pixel game engine application that I've already
created.
And I feel it's necessary to give you a brief overview of what this application is before
we get stuck into the algorithm code, just so it makes some sort of sense.
Fundamentally, it's based on the idea of a frame, which is a fixed 2D array of pixels
in this case 320 by 240.
The pixels are floating point type.
So instead of working with RGB values, I'm taking the RGB from the camera and converting
it to a floating point value between zero and one.
Converting from the integer domain to the floating point domain allows me to avoid
complexities such as integer division.
This simple frame class has some accessors, get and set, which will do boundary checks
for me.
So I can quite happily set a pixel's value beyond the edges of the image.
And if I get something from beyond the image, it just returns a black pixel.
So zero is black and one is white.
My frame class also overrides the assignment operator.
So I can have multiple frames in my application.
And I can transfer the contents of one frame to the other with ease.
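A frame class along these lines could be sketched as follows. This is only an illustration: the names Frame, width, height and pixels, the std::vector storage and the configurable size are assumptions made so that the example compiles on its own, whereas the class described above uses a fixed 320 by 240 array.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the frame described above: a 2D array of floats
// in the range 0.0 (black) to 1.0 (white) with bounds-checked access.
struct Frame {
    int width;
    int height;
    std::vector<float> pixels;

    Frame(int w = 320, int h = 240) : width(w), height(h), pixels(w * h, 0.0f) {
        assert(w > 0 && h > 0);
    }

    bool inside(int x, int y) const {
        return x >= 0 && x < width && y >= 0 && y < height;
    }

    // Reads beyond the edges of the image return a black pixel.
    float get(int x, int y) const {
        return inside(x, y) ? pixels[y * width + x] : 0.0f;
    }

    // Writes beyond the edges of the image are silently ignored.
    void set(int x, int y, float value) {
        if (inside(x, y)) pixels[y * width + x] = value;
    }
};
```

In this sketch the compiler-generated copy assignment does the job of transferring one frame into another, so no hand-written operator is needed.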
For this video, I'm not going to dwell on the image capture side of things.
I've already done that in other videos.
And it's enough to say that we simply use the ESCAPI library to capture a frame from
a webcam.
So in OnUserCreate, the webcam is initialized.
And in OnUserUpdate, I capture the image from the webcam per frame, convert the
pixels to floating point and store them in a frame called input.
This program shows eight different algorithms.
And so the bulk of the code shown here handles the selection of which algorithm is currently
being demonstrated.
And the algorithms also have a degree of user input, which allows the user to change the
values to play with the algorithm and see how they respond under different circumstances.
For example, when the user presses the one key on the keyboard, it changes the current
algorithm being demonstrated to threshold.
So let's continue looking at that algorithm.
Here it is.
And you'll see this on most of the algorithms.
We do a little bit of user input if there are values to change.
And then we actually perform the algorithm under demonstration.
And thresholding is very simple.
For all of the pixels in the frame, we read the input value of a pixel for that location,
compare it with a threshold value, which will give us a one or a zero in response.
And then we write that to an output frame.
At the end of the program, I then draw the input and output frames.
Hopefully you can see thresholding is very simple indeed.
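The thresholding loop just described might look something like the sketch below. The compact Frame struct is a stand-in written for this example, and treating a pixel exactly equal to the threshold as a one is an assumption, since only "above" and "below" are specified.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code):
// floats in 0..1, out-of-range reads give 0, out-of-range writes are ignored.
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// Binarize: 1.0 where the input is at or above the threshold, otherwise 0.0.
Frame threshold(const Frame& input, float level) {
    Frame output(input.w, input.h);
    for (int y = 0; y < input.h; y++)
        for (int x = 0; x < input.w; x++)
            output.set(x, y, input.get(x, y) >= level ? 1.0f : 0.0f);
    return output;
}
```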
So let's take a look at it.
This is thresholding.
Now my webcam has some automatic gain correction, which is what you saw then, as the
image sort of changed and faded.
I can't override those settings using the API for the camera.
But for this video, it doesn't really matter.
I'm in threshold mode now, and we can see the input image here on the left.
It's in gray scale, but the output image here on the right is in black and white.
It's been binarized.
It says here, I can use the Z and X keys to change the value of the threshold.
So currently it's at 0.5.
It's halfway between the minimum and maximum intensities for the gray scale.
As I increase the threshold value, we see fewer pixels being attributed to a binary one.
And as I decrease it, we see the opposite.
Thresholding is essentially the coarsest of filters.
And it's usually the first step in removing as much rubbish from an image as you can.
For example here, you can see on the notebook, the text OneLoneCoder comes through quite
clearly, but the lines and the slight grayness of it don't.
So if we were then to go on and extract this text, for example, it's much easier now that
we're not contaminated with this spatial background noise. We've thresholded it out.
Bit two, motion.
And for this video, I'm assuming the simplest kind of motion.
We won't be able to get any direction information from this.
The word motion implies that something has moved.
And for something to move takes time.
So to detect motion in an image, we need to allow time to have elapsed.
Fortunately, with video, this is quite simple because a video camera returns successive
frames in time, which means we have a built in delta time between each frame.
Alongside movement in time, motion also implies movement in space.
The object was in one location and now it's in another.
But for this bit, let's not think of objects as being the things we're looking at. Instead,
we're looking at pixel gray scale values.
So over time, if something is moving in the image, a particular pixel value is also changing.
So we can identify that motion has occurred by looking at the difference of pixel values
between successive frames of video input.
And so on this graph, we can see that the difference between A and B is related to
the change in that gray scale value.
The end result of this could be signed and in some applications, that's a useful thing.
It gives you additional information.
But for our application, I'm just going to take the absolute value of this, to tell us
that motion for that pixel is likely to have occurred.
The code for motion detection is just as simple as thresholding.
I'm going to go through every single pixel in my frames with these nested for loops.
And I'm going to look at the difference between the current frame and the previous frame
by subtracting them.
I'm then taking the absolute value of that result and setting that in the corresponding
location of the output frame.
I then draw the input and output frames.
I update the previous input frame before I acquire a new image in the input frame.
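As a rough sketch of that per-pixel frame difference, assuming a simple stand-in Frame struct and a hypothetical function name motion:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// Absolute difference between the current and previous frame, pixel by pixel.
// Bright output pixels mark places where the grey value changed.
Frame motion(const Frame& current, const Frame& previous) {
    assert(current.w == previous.w && current.h == previous.h);
    Frame output(current.w, current.h);
    for (int y = 0; y < current.h; y++)
        for (int x = 0; x < current.w; x++)
            output.set(x, y, std::fabs(current.get(x, y) - previous.get(x, y)));
    return output;
}
```

The caller would copy the current frame into the previous frame after each update, before the next image is captured.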
Here's the algorithm running and it's looking at a reasonably static scene.
But as soon as things start to move, as I bring my hand into the scene, we see the
difference between one frame and the previous frame.
We only see illumination in the output where there has been change.
So that signifies that motion has occurred in those locations.
Because the frame rate of my camera is reasonably quick, it's about 20 frames per second,
I get what looks like an edge around the object that's moving.
But don't be fooled by this, it's not strictly an edge, although you can use it as an
edge.
It is just the difference between the two frames.
Motion detection like this is usually a foundation algorithm.
It is used to guide your decisions in subsequent algorithms that you apply to the image.
For example, I might want a system to shut down if nothing in the scene is moving.
I mean, why bother taking more images if nothing has changed?
So I could detect that by accumulating the sum of all of the pixels in the output image
and then checking that against a threshold value, to tell me whether there has been enough
motion in the image for the system to switch on.
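That wake-up check could be sketched like this. The function name enoughMotion and the trigger level are hypothetical, and the right level would depend on image size and camera noise.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// Sum every pixel of a motion (frame difference) image and compare the total
// against a trigger level chosen by the caller.
bool enoughMotion(const Frame& motionFrame, float triggerLevel) {
    float total = 0.0f;
    for (float p : motionFrame.px) total += p;
    return total > triggerLevel;
}
```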
Bit three, low pass temporal filtering.
As we've just seen in bit two, the value of a pixel changes over time.
And if we look over a longer period of time, we might see that the pixels change values
quite rapidly between frames. This is called noise, and it's there because sensors aren't
perfect.
Lighting conditions, electronics and all sorts of things can influence the value of a pixel.
This noise can cause us problems, because what we actually want to see is the real value
of the pixel change over time, which is indicated by this green line.
We can assume that the real value lies somewhere in between all of these noisy values.
Noise can become a problem if you do things such as thresholding, because the noise
might just tip you above or below the threshold inappropriately.
We effectively want to run the grayscale value of the pixel through a low pass temporal
filter.
So the low frequency component of the pixel is allowed through and the high frequency components
are removed.
We can approximate this with a very simple equation.
For a given pixel value, p, we're going to update that pixel value by looking at the
difference between the input pixel value and the current pixel value and multiplying
that by a constant.
And, ultimately, if this difference is small, then the change in our output pixel is small,
and if it's large, then the change in our output pixel is large.
But we can regulate that change with this constant.
In engineering, this is also known as an RC filter, and its implementation is very simple.
In the low pass section of the program, I'm doing some user input so I can change the
value of this temporal constant.
And then I iterate through all of the pixels in a frame.
I look at the difference between the input and the output.
I scale the difference with our temporal coefficient.
And then I accumulate that difference back into the output frame with this plus symbol.
For this algorithm, the output frame is persistent between updates of the video camera
feed, meaning that output pixels are only changed by a small amount depending on how large
the change was of the input.
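One update step of that filter, output += (input - output) * constant, might be sketched as below. The function name lowPassUpdate and the stand-in Frame struct are assumptions; the important part is that the same output frame is passed in on every call so that it persists between updates.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// Nudge each output pixel towards the input by a fraction of their difference.
// A small constant gives a slow filter, a constant of 1 copies the input.
void lowPassUpdate(const Frame& input, Frame& output, float constant) {
    assert(input.w == output.w && input.h == output.h);
    for (int y = 0; y < input.h; y++) {
        for (int x = 0; x < input.w; x++) {
            float difference = input.get(x, y) - output.get(x, y);
            output.set(x, y, output.get(x, y) + difference * constant);
        }
    }
}
```

With a constant of 0.5 and a steady white input, a black output pixel moves to 0.5, then 0.75, then 0.875, approaching the input without ever jumping to it.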
So here in the program, I'm now running bit three, the low pass temporal filter.
And the two images look very similar.
It might not even be possible to see on the YouTube video, but the input image on
the left actually has quite a lot of per pixel noise.
But the output image on the right has no temporal noise visible to the naked eye.
This is a particularly slow filter, so watch what happens if I move my hand into the scene.
I can make rapid changes, sort of by wiggling my fingers around here, but we can see
that the output image doesn't change very much. It's ignoring those fast changes and only
allowing the really slow changes. If I leave my hand in a fixed position, eventually
it feeds into the image.
So this is exaggerated in a way.
I can use the Z and X keys to change the value of this constant, so I can make it very
slow indeed.
Which might not immediately seem a useful thing to do, but if you wanted to do some background
subtraction algorithms over moving images, this is quite a nice way to do it.
You can accumulate the background of an image over time and then use that as a way to
isolate things in the foreground.
If I increase the value of the constant, it becomes far more live. Let's keep going a bit,
until the two images look very similar indeed. But if you set this constant too high,
you'll start seeing the per pixel camera noise coming back into the output image.
So low pass temporal filtering is a great way to filter noise, and it also looks all
ghostly and cool.
Bit four, convolution.
Whereas the previous two bits have looked at filtering things in the time domain,
convolution looks at filtering things in the spatial domain.
Fundamentally, we're going to decide what to do with a pixel, by looking at its neighborhood.
For this example, I'm going to look at the immediate three by three neighborhood of our target
pixel, and this neighborhood is called a kernel.
And you can think of a kernel as a template of coefficients that are used in a dot product
between the neighboring pixels in that region and the values of the kernel, to give us a
result for the central pixel.
So my kernel might be defined here as a three by three matrix of values.
These values are overlaid over the corresponding pixel in that location, and we also include
the central value, which is the target pixel.
I can give these kernel values location information to identify the relationship to the
target pixel.
I work out my final pixel value by performing the dot product: each kernel coefficient is
multiplied by the gray scale value of the pixel at that location, and the products are
summed.
So this component, for example, is this component of the kernel multiplied by this pixel
value, and we go on to go through all of the kernel locations.
And so what effect do you think this kernel might have then?
Well, we can see it as being regions of influence. We're strongly influenced by the
target pixel, the one in the middle.
But we're also a little bit influenced by our immediate north, south, east, and west neighbors.
Finally in this kernel, all of these values add up to one, and this is quite deliberate.
So we take the bulk of our pixel's value from what it already is.
But then we take a little bit from its neighbors, but we still land within the approximate
range for that pixel.
This will give us the effect of blurring the image, because we go on to apply this kernel
for every single pixel in the image.
I've implemented convolution in a really naive way.
Here I'm going through every pixel in the frame.
And for each location, I'm accumulating into an fSum variable.
My kernels are 3 by 3.
But I want to get the corresponding offset location for a kernel coefficient in my image.
So I've got an additional two nested for loops, which iterate through the values in my
kernel.
The kernel we've just created is a blurring kernel, and I'm representing that as just
an array of nine floating point values.
I index into the appropriate location of that kernel using some simple 2D to 1D arithmetic.
Once I've got the right coefficient, I multiply it by the input grayscale value, and
accumulate it into my fSum variable.
I then set my output pixel for that location to the fSum variable.
In this demonstration, I've included two kernels, blurring and sharpening.
The coefficients of the two kernels are quite different.
There's a little bit of user input at the top to choose which kernel we're going to apply
to the image.
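The naive four-loop convolution could be sketched as follows. The kernel coefficients are assumptions, because the transcript does not give them: kBlur is one set of values that fits the description (centre-weighted, a little from the north, south, east and west neighbours, summing to one), and kSharpen is a commonly used sharpening kernel. The names Kernel3, kBlur, kSharpen and convolve3x3 are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

using Kernel3 = std::array<float, 9>;

// Assumed blur coefficients: they sum to one, so overall brightness is kept.
const Kernel3 kBlur = {{0.0f,   0.125f, 0.0f,
                        0.125f, 0.5f,   0.125f,
                        0.0f,   0.125f, 0.0f}};

// A common sharpening kernel (also an assumption); results are not clamped to 0..1.
const Kernel3 kSharpen = {{ 0.0f, -1.0f,  0.0f,
                           -1.0f,  5.0f, -1.0f,
                            0.0f, -1.0f,  0.0f}};

Frame convolve3x3(const Frame& input, const Kernel3& kernel) {
    Frame output(input.w, input.h);
    for (int y = 0; y < input.h; y++) {
        for (int x = 0; x < input.w; x++) {
            float sum = 0.0f;
            for (int n = -1; n <= 1; n++)        // kernel row offset
                for (int m = -1; m <= 1; m++)    // kernel column offset
                    // 2D to 1D index into the nine-element kernel array.
                    sum += input.get(x + m, y + n) * kernel[(n + 1) * 3 + (m + 1)];
            output.set(x, y, sum);
        }
    }
    return output;
}
```

The kernel is applied without flipping it, which makes no difference for symmetric kernels like these two.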
In the demonstration program, I've chosen bit 4 for convolution, and it's currently running
the blurring kernel.
And we can see that the input image on the left is sharper than the output image on the
right.
This blurring only occurs once, over a 3 by 3 neighborhood.
So it's a very delicate blur.
However, if I press the X key, I can change to the sharpening kernel.
And we can see that this kernel has the effect of enhancing any areas of high contrast
in the image.
You may have seen these filters in popular art programs.
The downside to sharpening, of course, is it also sharpens all of the noise, so we may want
to combine convolution with some of the previous filtering algorithms we've already seen.
In this convolution example, I've used a very small kernel 3 by 3, so it can only look
at its immediate neighbors.
For blurring, for example, if we wanted a more blurred image, there are two
ways to go about it.
The first is we could repeatedly blur the image, so once we've blurred it once, we then
use that as the input and blur it again, and again and again and again until we've got
the desired level of blur.
The second approach is to use a much larger kernel, so we can go 5 by 5, 7 by 7, 11 by
11, whatever you want.
But the convolution I've shown here is four levels of nested for loops, so with larger
kernels it will explode computationally and become very slow, making it difficult to get
any kind of real-time performance.
If you're serious about doing convolutions, and most image processing programmers are,
then you'll want to do your convolutions in the Fourier domain, the frequency domain,
instead. You take the fast Fourier transform of your input image and the fast Fourier
transform of your kernel, multiply them together element by element, and then take the
inverse Fourier transform of the result. This will allow you to use very large kernels
with a fixed computational overhead, a far more suitable approach for real-time image
processing.
Bit five, Sobel edge detection.
Edges in images indicate where the information of an image is, simplistically because
edges indicate where spatial change has occurred.
In this example image, we have background, and we have a box.
The background doesn't change locally, and the box doesn't change locally.
So there's no relevant information in either of those zones, so one could argue that
the box is really only defined by how it is different to the background, and this difference
lies along the edge of the box.
So detecting edges is quite an important step, and perhaps the most classical way to detect
edges is using Sobel edge detection, which is a pair of kernels used in a convolution.
The two kernels detect edges in the two main directions.
We've got horizontal, the kernel looks something like this.
And we've got vertical.
If we convolve the image with the horizontal kernel, we will see the horizontal edges,
and then we'll convolve the image with the vertical kernel, and we'll see the vertical
edges.
We'll then combine the horizontal edges and the vertical edges into a single output image
to show us all edges.
As before with the motion detection, the sign of the result of these convolutions
can contain useful and interesting information.
But I'm going to throw it away by taking the absolute value of the results of these convolutions.
The Sobel part of the code is exactly the same as the convolution part of the code before,
except I'm doing the two things at once, I'm maintaining two summation variables instead
of the one.
And when I'm writing the output summation, I'm taking the average of the sum for the vertical
and sum for horizontal components.
I've defined the kernels for Sobel in exactly the same way as I did before.
They're just arrays of floating point values.
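A sketch of that two-sum approach, using the standard Sobel coefficients. The names kSobelX, kSobelY and sobel are hypothetical, and the kernels are labelled here by the direction of change they respond to rather than by "horizontal" or "vertical".

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// Standard Sobel kernels, stored as plain arrays of nine floats.
const float kSobelX[9] = {-1.0f, 0.0f, 1.0f,     // responds to left-right change
                          -2.0f, 0.0f, 2.0f,
                          -1.0f, 0.0f, 1.0f};
const float kSobelY[9] = {-1.0f, -2.0f, -1.0f,   // responds to up-down change
                           0.0f,  0.0f,  0.0f,
                           1.0f,  2.0f,  1.0f};

Frame sobel(const Frame& input) {
    Frame output(input.w, input.h);
    for (int y = 0; y < input.h; y++) {
        for (int x = 0; x < input.w; x++) {
            float sumX = 0.0f, sumY = 0.0f;      // two sums maintained at once
            for (int n = -1; n <= 1; n++) {
                for (int m = -1; m <= 1; m++) {
                    float p = input.get(x + m, y + n);
                    int k = (n + 1) * 3 + (m + 1);
                    sumX += p * kSobelX[k];
                    sumY += p * kSobelY[k];
                }
            }
            // Discard the sign, then average the two directions.
            output.set(x, y, (std::fabs(sumX) + std::fabs(sumY)) / 2.0f);
        }
    }
    return output;
}
```

Because reads beyond the image return black, a bright region touching the border of the image will also register as an edge in this sketch.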
Here is the demonstration program running the Sobel edge detection algorithm.
And as we can see, edges are illuminated, and boring surfaces, those that have low frequency
spatial information, remain black.
The nice thing about Sobel is it works on this gray scale input, and you can see it starts
to highlight all of the areas of high frequency information.
So look at my really hairy arm. That's quite a visible thing, so it really indicates
texture.
Bit six, morphological operations.
Even though that sounds like a mouthful, morphological operations are really a study of
how things look spatially in the image.
We want to do things regarding the shape of objects in the image.
And we do this by working in the binary domain.
So we must threshold the image first to binarize our pixels to zero or one values.
And this bit is really split into three different bits.
The first I'm going to look at is called erosion.
In this simple demonstration image, I'm assuming that the background is zero, and my object
is one.
Erosion is the effect of eroding a single pixel from all edges of our object,
simply shrinking it.
And it's useful to remove erroneous, spurious pixels from other stages of image processing.
Because if I had a single pixel like this, or a cluster of single pixels in a line, and
I eroded it, then if we're removing one pixel from all edges, it's going to disappear
entirely.
Whereas for larger objects, the morphology remains intact.
The opposite to erosion is dilation.
And it is quite literally the opposite.
This time we grow a one pixel boundary around our shape.
So just going back to the previous example, if we had some spurious small information, and
we first erode it to remove it.
And then we dilate the image again.
There's nothing here to actually dilate, but our original shape will go back to something
very similar to how it was originally.
This is a nice way of removing spatial noise from an image.
In many ways, implementing morphological operations is very similar to convolutions, but
this time we use logic instead of dot products.
Looking at erosion, we use a very similar 3x3 kernel to the one we used for convolutions, but
this time my kernel is just going to be a 3x3 matrix of logic ones.
For every pixel in my source image, I overlay my morphological operation kernel for that
pixel.
So let's for example say, I put it here, centered around this pixel.
I then do a logical AND with all of the elements in the kernel and all of the elements in
the image.
In this case, we can see that one AND zero here is, well, zero, and this one's going
to be zero, this one's going to be zero, and once we've ANDed all of those together,
the end result is going to be zero, because we've got some zeros in there.
So I will write zero to my image. However, when I get round to operating on this section
of the image, the result is a logic one, because all of the kernel coefficients and all
of the surrounding neighborhood pixels are going to AND together to give me a one.
This pixel that was on its own has been eroded, but this pixel that is robustly supported
by its neighborhood stays intact.
Instead of just doing a logical AND, we could also do some simple arithmetic. We could count
how many of our neighbors are equal to one and come to the same conclusion.
So in this case, I've got eight neighbors all equal to one, so my current pixel value
is a one, but in the original scenario, only three of my neighbors were one.
So I could then look for fewer than eight neighbors, and in that condition, I set myself
to zero.
Now the interesting thing with erosion is that one man's erosion is another man's
dilation.
If I inverted all of these bits and applied the same kernel, I would have the effect
of dilating the image.
But I can also dilate in a second way without requiring this inversion, and this one is
quite simple.
If a given pixel exists, then set its neighbors to exist as well.
And this would have the effect of growing our object by one pixel along all edges.
Before I get stuck into the code on this, I just want to show visually another useful
thing for dilation, that if I can identify a particular point on an image, let's say here,
and I repeatedly dilate, I can effectively flood fill in the image.
Now there's an important condition here, which is that after every dilation, if we logically
AND that with the original image, we'll never go beyond the boundaries of the original
image.
So this allows me to fill a secondary image with just the space occupied by a single shape
in binary space of the input image.
And this is a great way for doing image segmentation, the extraction of objects or labeling
image parts.
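The dilate-then-AND idea can be sketched as below. This is an illustration rather than code from the demonstration program: the names dilateOnce and fillFromSeed are hypothetical, and because the dilation uses the full 3 by 3 neighbourhood, shapes touching only at a corner are treated as connected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// One dilation step: every lit pixel also lights its 3x3 neighbourhood.
Frame dilateOnce(const Frame& src) {
    Frame out = src;
    for (int y = 0; y < src.h; y++)
        for (int x = 0; x < src.w; x++)
            if (src.get(x, y) == 1.0f)
                for (int n = -1; n <= 1; n++)
                    for (int m = -1; m <= 1; m++)
                        out.set(x + m, y + n, 1.0f);
    return out;
}

// Grow outwards from a seed point, ANDing with the original binary image after
// every dilation, until nothing changes. The result holds just one shape.
Frame fillFromSeed(const Frame& original, int seedX, int seedY) {
    Frame filled(original.w, original.h);
    filled.set(seedX, seedY, original.get(seedX, seedY));
    bool changed = true;
    while (changed) {
        Frame grown = dilateOnce(filled);
        changed = false;
        for (std::size_t i = 0; i < grown.px.size(); i++) {
            float masked = (grown.px[i] == 1.0f && original.px[i] == 1.0f) ? 1.0f : 0.0f;
            if (masked != filled.px[i]) changed = true;
            filled.px[i] = masked;
        }
    }
    return filled;
}
```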
In fact, I find this to be really interesting, and I think it's worthy of a video on
its own right.
I've got some interesting examples of how we can demonstrate that.
So expect a follow-up video to this in the near future, just showing morphological operations.
In the code, the morphological section is the largest of them all.
It's got a great degree of user input: one to select which particular dilation or erosion
operation we're going to do.
There's also a third one called edge, which I'll talk about when we get to the code.
But I can also control the number of times that I perform the dilation or erosion.
So we'll be able to see those effects visually.
But the first thing and the most important thing is we need to take our image from the
gray-scale domain into the binary domain.
So I'm just thresholding it.
And I'm going to threshold it using the value that we've specified in the threshold algorithm.
So this will allow you to tweak the threshold first.
Then we go on to choose dilation, erosion, or this third one, which I'm calling edge,
because it's going to be an edge detect using morphological operations.
For dilation, it is simply a case of seeing if a given pixel value is equal to one, and
if it is, setting all of its neighbors to one, too.
Easy enough.
But I use this third frame, activity, because I don't want to alter the frame as I'm
going through the pixels.
It's important that the frames are treated homogeneously.
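A possible shape for that dilation, with a separate snapshot to read from, is sketched here. The function name dilate and the iterations parameter are assumptions standing in for the user-controlled repeat count.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// Binary dilation: wherever a pixel is one, set all of its neighbours to one.
Frame dilate(const Frame& input, int iterations) {
    assert(iterations >= 0);
    Frame output = input;
    for (int i = 0; i < iterations; i++) {
        // Read from an unmodified snapshot so that pixels lit during this pass
        // do not themselves spread further within the same pass.
        Frame activity = output;
        for (int y = 0; y < activity.h; y++)
            for (int x = 0; x < activity.w; x++)
                if (activity.get(x, y) == 1.0f)
                    for (int n = -1; n <= 1; n++)
                        for (int m = -1; m <= 1; m++)
                            output.set(x + m, y + n, 1.0f);
    }
    return output;
}
```

Without the snapshot, a scan from top-left to bottom-right would smear every lit pixel all the way down and to the right in a single pass.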
For erosion, rather than doing a set of logical operations, I decided to go with the
summation route.
So I'm looking at all of the values of my neighbors, and all of the ones that are one,
I'm summing together.
Now, I know that in my activity frame they're going to be ones or zeros anyway.
We've binarized the image.
So I'm just going to add them.
And I'm going to check to see if my activity is one.
And if it is, but not all of my neighbors are set to one, then I'm going to set myself
to zero, put myself out.
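The summation version of erosion could be sketched like this, again with a hypothetical function name and iterations parameter.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// Binary erosion by counting: a lit pixel survives only if all eight of its
// neighbours are lit too.
Frame erode(const Frame& input, int iterations) {
    assert(iterations >= 0);
    Frame output = input;
    for (int i = 0; i < iterations; i++) {
        Frame activity = output;   // unmodified snapshot to read from
        for (int y = 0; y < activity.h; y++) {
            for (int x = 0; x < activity.w; x++) {
                // Pixels are already 0 or 1, so adding them counts the lit ones.
                float neighbours = 0.0f;
                for (int n = -1; n <= 1; n++)
                    for (int m = -1; m <= 1; m++)
                        if (n != 0 || m != 0)
                            neighbours += activity.get(x + m, y + n);
                if (activity.get(x, y) == 1.0f && neighbours < 8.0f)
                    output.set(x, y, 0.0f);
            }
        }
    }
    return output;
}
```

Since reads beyond the image return zero, lit pixels on the very border of the image always erode in this sketch.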
This third one, which I've not drawn up in OneNote, I've called edge, and it's almost
exactly the same as erosion.
Instead of looking for fewer than eight neighbors, I'm looking for precisely eight neighbors.
In the situation that I am an illuminated pixel, and I have all of my neighbors, I'm going
to extinguish myself.
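As a sketch of that edge operation, assuming the hypothetical function name morphologicalEdge: interior pixels, the ones with all eight neighbours lit, are switched off, which leaves only the outline.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// Morphological edge: a lit pixel with precisely eight lit neighbours is
// extinguished, so only the boundary pixels of each shape remain.
Frame morphologicalEdge(const Frame& input) {
    Frame activity = input;   // unmodified snapshot to read from
    Frame output = input;
    for (int y = 0; y < activity.h; y++) {
        for (int x = 0; x < activity.w; x++) {
            float neighbours = 0.0f;
            for (int n = -1; n <= 1; n++)
                for (int m = -1; m <= 1; m++)
                    if (n != 0 || m != 0)
                        neighbours += activity.get(x + m, y + n);
            if (activity.get(x, y) == 1.0f && neighbours == 8.0f)
                output.set(x, y, 0.0f);
        }
    }
    return output;
}
```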
So let's take a look.
The program is started up with the thresholded image.
And this is just so I can tune the threshold value appropriately.
I can still tune it later on, but it's just a bit easier to do whilst looking at the
greyscale image.
Then I'm going to choose bit six, binary morphological operations.
Now the current operation is dilation.
And it looks like there's just a single iteration being applied.
So on the left, where these individual pixels are occurring, you can see they're expanding
into a three by three region.
This has had the effect of removing the OneLoneCoder text entirely from the book object.
So this might be a precursory stage in being able to extract where the book is this time,
if that's useful.
I can increase the number of dilations with the A and S keys.
And as you might expect, the regions just grow.
So dilation is a way of grouping things together.
Erosion is a little different.
Here on the left, those spurious single pixels have all but disappeared, as has a lot of
the noisy background.
So erosion has really just allowed us to focus on the core parts of the image.
If, for example, I eroded this image and then dilated it, well, I can't bring any of that
background noise back in.
We're looking at the eroded image now. If I dilated this, there are no source pixels
to seed anything for dilation.
So combining these two operations is a nice way to remove noise.
Let's have multiple erosions, and eventually we can erode the image away entirely.
Erosions and dilation, as I demonstrated very briefly, can be used for detecting and labeling
objects individually.
But in a primitive way, they can also be used for sizing objects.
If you continuously eroded the scene and looked for the final illuminated pixel, you
could then dilate that pixel and logically AND it back into the scene to highlight the
largest object.
So you could use erosion and dilation for sizing too.
Now, finally, there's one more mysterious one if I press the C key, and that is using
morphological operations to detect edges, and this gives you a really nice edge detect where
all of the edges are a single pixel.
If I bring my hand into the scene, it's a bit messy, so let's find a nice high contrast scene.
There we go.
So that's quite nice.
We're isolating all of the buttons.
Now, you'll see the camera's adjusted its gain, not much I can do about that, but this
is different to Sobel, in that it gives you a really crisp outline edge for an object.
Bit seven, median filtering.
When working with real world sensors, things don't always go to plan.
Sometimes your image will have little tiny artefacts in it.
These might be bad pixels or snow, depending on your situation. You might be in a
radioactive environment for all I know, but you end up with something which doesn't look
quite right.
And these are usually quite tricky to filter out, unless you use a median filter.
Median filters are conceptually very simple.
In a similar way to convolution, for a given pixel, we look at its immediate neighbourhood.
In this example, I'm using a five by five neighbourhood.
And we're going to make an assumption in our image that we're not really expecting spatial
change to happen over a single column of pixels or a single row of pixels.
That's quite unusual in natural images.
So it tends to be that in general, information in images is rather blurred out across the
image.
So up here, where I've got this single erroneous pixel, I can make an assumption that most
of my image in that region is behaving one particular way.
And this blank pixel is some form of outlier.
And we can statistically remove the outlier by looking at what the median pixel value
is across all 25 pixels in this five by five kernel.
If you don't remember what the median is, you take all of your values, and you sort
them in order, and you take the one that lies directly in the middle of that sorted set.
It's one of those quirky mathematical phenomena where it sounds really complicated mathematically,
but there's actually no maths involved at all.
It's just a sorting and an extraction.
For the median filter code, as usual, I'm iterating through all of the pixels in the image.
And I'm not trying to optimize this at all.
I want the code to be readable, so I'm going to do something really horrendous and create
a vector of the floating point values, which represents all of the pixels surrounding the
pixel I'm currently investigating over this five by five area.
And all I'm doing is extracting those pixels in my neighborhood and pushing them into
the vector.
Once I've got 25 pixels in my vector, I then sort them using std::sort.
With 25 sorted pixels, the one in the middle will be at location 12 of my vector,
counting from zero.
And that's what I'm choosing as my output value.
It's pretty simple, huh?
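The readable-but-slow approach just described might be sketched as follows. The function name medianFilter5x5 is hypothetical, and the window vector is reused between pixels here, which is a small departure from creating a fresh one each time.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimal stand-in for the frame class (an assumption, not the video's code).
struct Frame {
    int w, h;
    std::vector<float> px;
    Frame(int w_, int h_) : w(w_), h(h_), px(w_ * h_, 0.0f) { assert(w_ > 0 && h_ > 0); }
    bool inside(int x, int y) const { return x >= 0 && x < w && y >= 0 && y < h; }
    float get(int x, int y) const { return inside(x, y) ? px[y * w + x] : 0.0f; }
    void set(int x, int y, float v) { if (inside(x, y)) px[y * w + x] = v; }
};

// 5x5 median filter: gather the 25 surrounding values, sort them, and keep
// the middle one (index 12, counting from zero).
Frame medianFilter5x5(const Frame& input) {
    Frame output(input.w, input.h);
    std::vector<float> window;
    for (int y = 0; y < input.h; y++) {
        for (int x = 0; x < input.w; x++) {
            window.clear();
            for (int n = -2; n <= 2; n++)
                for (int m = -2; m <= 2; m++)
                    window.push_back(input.get(x + m, y + n));
            std::sort(window.begin(), window.end());
            output.set(x, y, window[12]);
        }
    }
    return output;
}
```

Near the border of the image the window is padded with black by the bounds-checked reads, so border pixels are biased towards zero in this sketch.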
And so here in the demonstration program, I'm now running the median filter.
And you might be at first thinking it's just blurring the image, but it's not.
It kind of looks like it's painted it a little bit, so I'm sure median filtering is
used in some of these art programs.
But I do have a test image that I've created here, where I've got the words median
filter written on my page, and you can see the median filter has filtered out the lines
because it sees those as anomalous.
And it's also filtered out the dots.
But there's sufficient information left in the text to then go and extract it using
thresholding or morphological operations or something else.
So it's just a precursory phase to help you with later stages of image processing.
Once the dots get large enough that they occupy a region... in fact, even zoomed in like
that, you really would struggle to identify the dots, but the text is just fine.
So when you get this sporadic salt and pepper noise in your image, median filter is the
thing to choose.
And finally, bit eight, locally adaptive threshold.
I started this video with bit one looking at thresholds that were applied globally across
the image.
More often than not, this is sufficient, but there are situations where you want to threshold
an image based on local information.
Using much the same principle as the median filter, we make an assumption that, overall,
for a small region of image, there's not going to be a great deal of spatial variance.
So it's better to choose a threshold value based on the information in your locality,
at least biased towards a threshold value found from the information of your neighbors.
So I know, for example, in this region of the image, if I take the average value of my
neighborhood, then things that are statistically interesting may be a certain level
above that average.
And that average might be different for different parts of the image, which means if I
used a global threshold, that threshold value may not be appropriate for different regions
of the image.
It might not be immediately obvious what locally adaptive thresholding buys you compared
to a global threshold.
But I find it really useful when you've got change in luminance across your scene.
Changing luminance is just a fancy word for shadows.
We're now getting quite familiar with the code for these algorithms.
We're going to iterate through all of the pixels, and in this case, I'm going to take
the average value of my immediate neighbors, 5 by 5.
But I'm then going to use that average as part of my thresholding calculation.
Region sum contains the average, but I'm going to bias that value with some user defined
constant.
And that constant is user configurable over the user interface before we go on to threshold
the image.
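A rough sketch of that calculation follows. The Frame struct, the function name AdaptiveThreshold5x5 and the edge clamping are assumptions rather than the video's code. In particular, the bias is applied here as a multiplier on the local average, which is only one way to do it; adding a fixed offset would be another reasonable choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical greyscale image: row-major floats, nominally 0.0 to 1.0.
struct Frame
{
    int width = 0;
    int height = 0;
    std::vector<float> pixels;

    // Assumption: reads outside the image are clamped to the nearest edge pixel.
    float Get(int x, int y) const
    {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Outputs 1 where a pixel exceeds the biased average of its 5x5 neighbourhood,
// and 0 everywhere else. A bias of 1.1 means "10% brighter than the local average".
Frame AdaptiveThreshold5x5(const Frame& input, float bias)
{
    assert(input.width > 0 && input.height > 0);
    assert(input.pixels.size() ==
           static_cast<std::size_t>(input.width) * input.height);

    Frame output = input;
    for (int y = 0; y < input.height; y++)
    {
        for (int x = 0; x < input.width; x++)
        {
            // Average of the 25 pixels surrounding (x, y)
            float regionSum = 0.0f;
            for (int dy = -2; dy <= 2; dy++)
                for (int dx = -2; dx <= 2; dx++)
                    regionSum += input.Get(x + dx, y + dy);
            regionSum /= 25.0f;

            // Threshold against the local average, not a global constant
            const bool above = input.Get(x, y) > regionSum * bias;
            output.pixels[static_cast<std::size_t>(y) * input.width + x] = above ? 1.0f : 0.0f;
        }
    }
    return output;
}
```

To see why this helps, picture an image whose left half sits at 0.2 and right half at 0.6, with one brighter feature in each half at 0.4 and 0.9. A single global cut-off of 0.5 would miss the left feature and light up the whole right half, whereas the local version picks out both features because each is compared only against its own surroundings.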
So let's take a look.
Here I've got the input and output showing just the regular threshold.
This was the one we started with in bit one.
And I can try and find a value which sensibly thresholds the image, but you can see it's
a global effect.
And one of the things I wanted to show here is that the shadow cast by my hand is influencing
that threshold decision.
I'm going to press the 8 key now to choose the adaptive threshold.
Now we can see straight away the difference is that my shadow has basically become irrelevant
to the scene.
The local area around each pixel is used to guide what value we're using to threshold
it against.
And so the areas in shadow have overall lowered the threshold value and the areas of brightness
have overall raised the threshold value.
And so we're choosing a value which is varying across the screen in order to make our
thresholding decision.
So if you wanted to make things that are shadow and luminance invariant, then you probably
do need to use some sort of locally adaptive thresholding algorithm.
It's also quite a cool visual effect too.
And so that's that.
Image processing is a huge field and for me I find it to be a very interesting one too.
In this video we've had a very quick look, a very cursory introduction to some of the
most fundamental techniques you need if you wanted to start doing image processing.
As I mentioned in the introduction, you don't always have to work with images; procedural
generation may also employ a lot of these techniques too, particularly the morphological
operations.
Anyway, that's it for now.
If you've enjoyed this video, give me a big thumbs up please.
Have a think about subscribing.
Come and have a chat on the Discord server and I'll see you next time.
Take care.