Learniverse

Fundamentals of Image Processing

00:00

Hello, it's been a while since I've done one of these videos, but this video is 8 bits

00:05

of image processing you should know.

00:07

And you might be thinking, well why do I need to know anything about image processing?

00:12

Well, images are just 2D arrays of data, and the algorithms that we apply to this data

00:17

can shape it in useful ways.

00:20

Obviously, some of the applications involve images and cameras and video footage.

00:25

But there's also other ways of manipulating 2D data to your advantage, for example,

00:30

in things like procedural generation.

00:32

On the whole, I think most programmers should have an awareness of image processing.

00:35

It is a very useful tool to have in your toolbox.

00:38

So let's get started.

00:40

Before I start, I'm going to show you the video that I've created to demonstrate the

00:44

8 bits, and it's quite nice because it allows us to quickly compare the algorithms.

00:49

So here it's going to show the first bit which will be thresholding and we can choose

00:52

different numbers to look at the different algorithms and see what their effects are.

00:59

Because it's working with video, you can see here's a live feed of my arm waving around.

01:05

I think it makes quite a nice interactive tool, which is great for learning.

01:09

But this video is going to be a bit different to some of my others, and I'm not going to

01:12

go through it line by line from scratch.

01:15

I've already created the code, and what I really want to emphasize is what is the image

01:19

processing that is going on and how does it work.

01:22

Bit one, thresholding.

01:23

This is the process of binarizing an image.

01:26

Here I have an image, and I'm going to assume it's a grey scale image, so the pixels

01:31

go from black to white.

01:33

Thresholding involves taking an individual pixel and classifying it as being above or below

01:38

the threshold.

01:39

If it's above the threshold, you output one.

01:41

If it's below the threshold, you output zero.

01:44

This green line represents a single row in our image.

01:47

If I take this row and plot its x position against the brightness of that pixel, I might

01:53

get something that looks like this.

01:58

Thresholding involves specifying an appropriate value to act as a cut-off.

02:03

So any pixels above that value will get classified as one, and any below it will get classified

02:08

as zero.

02:09

The red dashed line represents my threshold value.

02:12

So now, with my blue pen, I can indicate what the binary equivalent of this might be.

02:17

So it starts down here at zero, but that goes above the threshold to one.

02:22

Below the threshold, above the threshold.

02:26

And we've binarized our image.

02:29

To demonstrate these programs, I'm using a PixelGameEngine application that I've already

02:34

created.

02:35

And I feel it's necessary to give you a brief overview of what this application is before

02:40

we get stuck into the algorithm code, just so it makes some sort of sense.

02:44

Fundamentally, it's based on the idea of a frame, which is a fixed 2D array of pixels

02:49

in this case 320 by 240.

02:52

The pixels are floating point type.

02:55

So instead of working with RGB values, I'm taking the RGB from the camera and converting

03:01

it to a floating point value between zero and one.

03:04

Converting to the floating point domain from the integer domain allows me to avoid

03:08

complexities such as integer division.

03:11

This simple frame class has some accessors, get and set, which will do boundary checks

03:15

for me.

03:16

So I can quite happily set a pixel's value beyond the edges of the image.

03:20

And if I get something from beyond the image, it just returns a black pixel.

03:24

So zero is black and one is white.

03:27

My frame class also overrides the assignment operator.

03:31

So I can have multiple frames in my application.

03:34

And I can transfer the contents of one frame to the other with ease.
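
A minimal sketch of the frame idea described above, assuming a row-major buffer of floats. The names Frame, get and set follow the description, but the layout and everything else here is illustrative rather than the code shown in the video. The compiler-generated copy assignment is enough to transfer one frame to another in this version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative bounds-checked greyscale frame: 0.0f is black, 1.0f is white.
struct Frame {
    int width;
    int height;
    std::vector<float> pixels;  // row-major storage

    Frame(int w = 320, int h = 240)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0.0f) {
        assert(w > 0 && h > 0);
    }

    // Reading beyond the edges of the image returns a black pixel.
    float get(int x, int y) const {
        if (x < 0 || x >= width || y < 0 || y >= height) return 0.0f;
        return pixels[static_cast<std::size_t>(y) * width + x];
    }

    // Writing beyond the edges of the image is silently ignored.
    void set(int x, int y, float value) {
        if (x < 0 || x >= width || y < 0 || y >= height) return;
        pixels[static_cast<std::size_t>(y) * width + x] = value;
    }
};
```

Because the pixels live in a std::vector, writing `output = input;` copies the whole frame.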

03:37

For this video, I'm not going to dwell on the image capture side of things.

03:41

I've already done that in other videos.

03:43

And it's enough to say that we simply use the ESCAPI library to capture a frame from

03:48

a webcam.

03:49

So in OnUserCreate, the webcam is initialized.

03:53

And in OnUserUpdate, I capture the image from the webcam per frame and convert the

03:58

pixels to floating point and store it in a frame called input.
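
The transcript does not say exactly how the RGB value becomes a single float, so the sketch below assumes a plain average of the three 8-bit channels. The helper name toGrey is hypothetical, and a weighted luminance formula would work just as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average the three 8-bit channels and scale the
// result into the range 0.0f (black) to 1.0f (white).
float toGrey(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const float grey = (static_cast<float>(r) + g + b) / (3.0f * 255.0f);
    assert(grey >= 0.0f && grey <= 1.0f);
    return grey;
}
```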

04:03

This program shows eight different algorithms.

04:06

And so the bulk of the code shown here handles the selection of which algorithm is currently

04:11

being demonstrated.

04:12

And the algorithms also have a degree of user input, which allows the user to change the

04:16

values to play with the algorithm and see how they respond under different circumstances.

04:21

For example, when the user presses the one key on the keyboard, it changes the current

04:25

algorithm being demonstrated to threshold.

04:28

So let's continue looking at that algorithm.

04:31

Here it is.

04:32

And you'll see this on most of the algorithms.

04:34

We do a little bit of user input if there are values to change.

04:37

And then we actually perform the algorithm under demonstration.

04:41

And thresholding is very simple.

04:43

For all of the pixels in the frame, we read the input value of a pixel for that location,

04:49

compare it with a threshold value, which will give us a one or a zero in response.

04:54

And then we write that to an output frame.
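
The thresholding step can be sketched as follows, assuming the image is held as a flat vector of floats between zero and one. The function name is illustrative, and treating a pixel exactly equal to the threshold as "above" is a choice made here, not something the video specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binarize a greyscale image: 1.0f where the pixel is at or above the
// threshold, 0.0f where it is below.
std::vector<float> thresholdImage(const std::vector<float>& input, float threshold) {
    assert(threshold >= 0.0f && threshold <= 1.0f);
    std::vector<float> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
        output[i] = (input[i] >= threshold) ? 1.0f : 0.0f;
    return output;
}
```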

04:57

At the end of the program, I then draw the input and output frames.

05:01

Hopefully you can see thresholding is very simple indeed.

05:04

So let's take a look at it.

05:06

This is thresholding.

05:07

Now my webcam has some automatic gain correction, which is what you saw then, as the

05:12

image sort of changed and faded.

05:14

I can't override those settings using the API for the camera.

05:18

But for this video, it doesn't really matter.

05:20

I'm in threshold mode now, and we can see the input image here on the left.

05:25

It's in gray scale, but the output image here on the right is in black and white.

05:29

It's been binarized.

05:31

It says here, I can use the Z and X keys to change the value of the threshold.

05:35

So currently it's at 0.5.

05:37

It's halfway between the minimum and maximum intensities for the gray scale.

05:42

As I increase the threshold value, we see less pixels being attributed to a binary one.

05:49

And as I decrease it, we see the opposite.

05:52

Thresholding is essentially the coarsest of filters.

05:55

And it's usually the first step in removing as much rubbish from an image as you can.

06:00

For example here, you can see on the notebook, the text OneLoneCoder comes through quite

06:04

clearly, but the lines and the slight grayness of it doesn't.

06:10

So if we were then to go on and extract this text, for example, it's much easier now

06:14

we're not contaminated with this spatial background noise; we've thresholded it out.

06:19

Bit two, motion.

06:21

And for this video, I'm assuming the simplest kind of motion.

06:24

We won't be able to get any direction information from this.

06:28

The word motion implies that something has moved.

06:32

And for something to move takes time.

06:35

So to detect motion in an image, we need to allow time to have elapsed.

06:39

Fortunately, with video, this is quite simple because a video camera returns successive

06:45

frames in time, which means we have a built in delta time between each frame.

06:51

Alongside movement in time, motion also implies movement in space.

06:56

The object was in one location and now it's in another.

06:59

But for this bit, let's not think of objects as being the things we're looking at; instead

07:03

we're looking at pixel gray scale values.

07:06

So over time, if something is moving in the image, a particular pixel value is also changing.

07:13

So we can identify that motion has occurred

07:17

by looking at the difference of pixel values between successive frames of video input.

07:23

And so on this graph, we can see that the difference between A and B is related to

07:28

the change in that gray scale value.

07:30

The end result of this could be signed and in some applications, that's a useful thing.

07:35

It gives you additional information.

07:37

But for our application, I'm just going to take the absolute value of this

07:41

to tell us that motion for that pixel is likely to have occurred.

07:45

The code for motion detection is equally as simple as thresholding.

07:49

I'm going to go through every single pixel in my frames with these nested for loops.

07:53

And I'm going to look at the difference between the current frame and the previous frame

07:59

by subtracting them.

08:00

I'm then taking the absolute value of that result and setting that in the corresponding

08:05

location to the output frame.

08:07

I then draw the input and output frames.

08:10

I update the previous input frame before I acquire a new image in the input frame.
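
As a rough sketch of the frame differencing just described, assuming both frames are flat vectors of floats of the same size (the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-pixel absolute difference between the current and previous frames.
// Bright output pixels mark places where the grey level has changed.
std::vector<float> motionImage(const std::vector<float>& current,
                               const std::vector<float>& previous) {
    assert(current.size() == previous.size());
    std::vector<float> output(current.size());
    for (std::size_t i = 0; i < current.size(); ++i)
        output[i] = std::fabs(current[i] - previous[i]);
    return output;
}
```

After each call, the caller would copy the current frame into the previous frame before capturing the next image.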

08:15

Here's the algorithm running and it's looking at a reasonably static scene.

08:19

But as soon as things start to move, I bring my hand into the scene.

08:24

We're looking at the difference between one frame and the previous frame.

08:28

But we only see illumination in the output, where there has been change.

08:33

So that signifies that motion has occurred in those locations.

08:36

Because the frame rate of my camera is reasonably quick, it's about 20 frames per second,

08:41

I get what looks like an edge around the object that's moving.

08:45

But don't be fooled by this, it's not strictly an edge, although you can use it as an

08:49

edge.

08:50

It is just the difference between the two frames.

08:53

Motion detection like this is usually a foundation algorithm.

08:56

It is used to guide your decisions in subsequent algorithms that you apply to the image.

09:01

For example, I might want a system to shut down if nothing in the scene is moving.

09:06

I mean why bother taking more images if nothing has changed.

09:09

So I could detect that by accumulating the sum of all of the pixels in the output image

09:14

and then checking that against a threshold value

09:18

to tell me whether there has been enough motion in the image for the system to switch on.
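
One way that activity check might look, assuming the motion image is a flat vector of floats. The function name and the idea of passing the cut-off as a parameter are illustrative; a sensible cut-off depends on image size and camera noise.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sum every pixel of the motion image and compare the total against a
// cut-off to decide whether anything in the scene has really moved.
bool hasMotion(const std::vector<float>& motion, float activityThreshold) {
    assert(activityThreshold >= 0.0f);
    const float total = std::accumulate(motion.begin(), motion.end(), 0.0f);
    return total > activityThreshold;
}
```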

09:23

Bit three, low pass temporal filtering.

09:26

As we've just seen in bit two, the value of a pixel changes over time.

09:30

And if we look over a longer period of time, we might see that the pixels change values

09:35

quite rapidly between frames.

09:37

This is called noise, because sensors aren't perfect.

09:42

Lighting conditions, electronics and all sorts of things can influence the value of a pixel.

09:47

This noise can cause us problems, because what we actually want to see is the real value

09:52

of the pixel change over time, which is indicated by this green line.

09:57

We can approximate that that's somewhere in between all of these noise values.

10:01

And noise can become a problem if you do things such as thresholding, because the noise

10:05

might just tip you above or below the threshold inappropriately.

10:09

We effectively want to run the grayscale value of the pixel through a low pass temporal

10:14

filter.

10:15

So the low frequency component of the pixel is allowed through and the high frequency components

10:20

are removed.

10:21

We can approximate this with a very simple equation.

10:24

For a given pixel value, p, we're going to update that pixel value by looking at the

10:30

difference between the input pixel value and the current pixel value and multiplying

10:36

that by a constant.

10:38

And, ultimately, if this difference is small, then the change in our output pixel is small,

10:44

and if it's large, then the change in our output pixel is large.

10:48

But we can regulate that change with this constant.

10:52

In engineering, this is also known as an RC filter, and its implementation is very simple.

10:58

In the low pass section of the program, I'm doing some user input so I can change the

11:01

value of this temporal constant.

11:04

And then I iterate through all of the pixels in a frame.

11:07

I look at the difference between the input and the output.

11:11

I scale the difference with our temporal coefficient.

11:14

And then I accumulate that difference back into the output frame with this plus symbol.

11:19

For this algorithm, the output frame is persistent between updates of the video camera

11:24

feed, meaning that output pixels are only changed by a small amount depending on how large

11:30

the change was of the input.
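
The update described above amounts to output = output + k * (input - output), applied to every pixel on every camera frame. A minimal sketch, assuming flat vectors of floats and a constant k between zero and one (the names are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One step of a per-pixel low pass temporal filter. The output vector must
// persist between calls; each call moves it a fraction k towards the input.
void lowPassUpdate(std::vector<float>& output, const std::vector<float>& input, float k) {
    assert(output.size() == input.size());
    assert(k >= 0.0f && k <= 1.0f);
    for (std::size_t i = 0; i < output.size(); ++i)
        output[i] += (input[i] - output[i]) * k;
}
```

A small k gives the slow, ghostly response, and k equal to one simply copies the input, noise included.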

11:33

So here in the program, I'm now running bit three, the low pass temporal filter.

11:37

And the two images look very similar.

11:39

It might not even be that possible to see on the YouTube video, but the input image on

11:43

the left actually has quite a lot of per pixel noise.

11:47

But the output image on the right has no temporal noise visible to the naked eye.

11:52

If I move my hand into the scene, this is a particularly slow filter.

11:57

So I can make rapid changes, sort of by wiggling my fingers around here, but we can see

12:02

that the output image doesn't change very much; it's ignoring those fast changes, only

12:08

allowing the really slow changes if I leave my hand in a fixed position, eventually

12:13

it feeds into the image.

12:15

So this is exaggerated in a way.

12:17

I can use the Z and X keys to change the value of this constant, so I can make it very

12:22

slow indeed.

12:27

Which might not immediately seem a useful thing to do, but if you wanted to do some background

12:32

subtraction algorithms over moving images, this is quite a nice way to do it.

12:37

You can accumulate the background of an image over time and then use that as a way to

12:43

isolate things in the foreground.

12:45

If I increase the value of the constant, it becomes far more live, let's keep going a bit.

12:51

Until the two images look very similar indeed, but if you set this constant too high,

12:56

you'll start seeing the per pixel camera noise coming back into the output image.

13:01

So low pass temporal filtering is a great way to filter noise, and it also looks all

13:06

ghostly and cool. Bit four, convolution. Whereas the previous two bits have looked at filtering

13:11

things in the time domain, convolution looks at filtering things in the spatial domain.

13:16

Fundamentally, we're going to decide what to do with a pixel, by looking at its neighborhood.

13:22

For this example, I'm going to look at the immediate three by three neighborhood of our target

13:27

pixel, and this neighborhood is called a kernel.

13:30

And you can think of a kernel as a template of coefficients that are used in a dot product

13:36

of the neighboring pixels in that region, and values of the kernel to give us a result

13:41

for the central pixel.

13:43

So my kernel might be defined here as a three by three matrix of values.

13:49

These values are overlaid over the corresponding pixel in that location, and we also include

13:55

the central value, which is the target pixel.

13:58

I can give these kernel values location information to identify the relationship to the

14:04

target pixel.

14:08

I work out my final pixel value by performing the dot product between a kernel coefficient

14:14

and the gray scale value of the pixel at that location.

14:17

So this component, for example, is this component of the kernel multiplied by this pixel

14:23

value, and we go on to go through all of the kernel locations.

14:32

And so what effect do you think this kernel might have then?

14:35

Well, we can see it as being regions of influence, we're strongly influenced by the

14:39

target pixel, the one in the middle.

14:41

But we're also a little bit influenced by our immediate north, south, east, and west neighbors.

14:49

Finally in this kernel, all of these values add up to one, and this is quite deliberate.

14:54

So we take the bulk of our pixels value from what it already is.

14:58

But then we take a little bit from its neighbors, but we still land within the approximate

15:03

range for that pixel.

15:05

This will give us the effect of blurring the image, because we go on to apply this kernel

15:09

for every single pixel in the image.

15:12

I've implemented convolution in a really naive way.

15:16

Here I'm going through every pixel in the frame.

15:19

And for each location, I'm accumulating into an fSum variable.

15:24

My kernels are 3 by 3.

15:27

But I want to get the corresponding offset location for a kernel coefficient in my image.

15:33

So I've got an additional two nested for loops, which iterate through the values in my

15:39

kernel.

15:40

The kernel we've just created is a blurring kernel, and I'm representing that as just

15:44

an array of nine floating point values.

15:47

I index into the appropriate location of that kernel using some simple 2D to 1D arithmetic.

15:53

Once I've got the right coefficient, I multiply it by the input grayscale value, and

15:58

accumulate it into my fSum variable.

16:01

I then set my output pixel for that location to the fSum variable.
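
The four nested loops can be sketched like this, assuming a flat row-major vector of floats and treating reads beyond the image as black. The kernel values are assumptions chosen to match the description (a strong centre, weaker north, south, east and west neighbours, summing to one), and the sharpening kernel is a commonly used one rather than necessarily the one in the video.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Kernel3x3 = std::array<float, 9>;

// Assumed example kernels, stored row by row.
const Kernel3x3 kBlur    = {{0.0f, 0.125f, 0.0f, 0.125f, 0.5f, 0.125f, 0.0f, 0.125f, 0.0f}};
const Kernel3x3 kSharpen = {{0.0f, -1.0f, 0.0f, -1.0f, 5.0f, -1.0f, 0.0f, -1.0f, 0.0f}};

// Naive 3x3 convolution: for every pixel, accumulate kernel coefficient
// times neighbouring pixel over the 3x3 neighbourhood.
std::vector<float> convolve3x3(const std::vector<float>& input, int width, int height,
                               const Kernel3x3& kernel) {
    assert(width > 0 && height > 0);
    assert(input.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> output(input.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int ky = -1; ky <= 1; ++ky) {
                for (int kx = -1; kx <= 1; ++kx) {
                    const int sx = x + kx;
                    const int sy = y + ky;
                    const bool inside = sx >= 0 && sx < width && sy >= 0 && sy < height;
                    const float pixel = inside ? input[sy * width + sx] : 0.0f;
                    // 2D to 1D index into the kernel array.
                    sum += pixel * kernel[(ky + 1) * 3 + (kx + 1)];
                }
            }
            output[y * width + x] = sum;
        }
    }
    return output;
}
```

Because the blur coefficients sum to one, a uniformly grey region stays at the same grey level away from the image border.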

16:05

In this demonstration, I've included two kernels, blurring, and sharpening.

16:10

But the coefficients of the kernels are quite different.

16:13

There's a little bit of user input at the top to choose which kernel we're going to apply

16:17

to the image.

16:18

In the demonstration program, I've chosen bit 4 for convolution, and it's currently running

16:22

the blurring kernel.

16:24

And we can see that the input image on the left is sharper than the output image on the

16:28

right.

16:29

This blurring only occurs once, over a 3 by 3 neighborhood.

16:33

So it's a very delicate blur.

16:35

However, if I press the X key, I can change to the sharpening kernel.

16:40

And we can see that this kernel has the effect of enhancing any areas of high contrast

16:44

in the image.

16:47

You may have seen these filters in popular art programs.

16:50

The downside to sharpening, of course, is it also sharpens all of the noise, so we may want

16:54

to combine convolution with some of the previous filtering algorithms we've already seen.

17:01

In this convolution example, I've used a very small kernel 3 by 3, so it can only look

17:06

at its immediate neighbors.

17:08

For blurring, for example, if we wanted to have more of a blurring image, there's two

17:12

ways to go about it.

17:13

The first is we could repeatedly blur the image, so once we've blurred it once, we then

17:17

use that as the input and blur it again, and again and again and again until we've got

17:22

the desired level of blur.

17:24

The second approach is to use a much larger kernel, so we can go 5 by 5, 7 by 7, 11 by

17:29

11, whatever you want.

17:31

But the kernels and the convolution I've shown here are four levels of nested for loops.

17:37

It will explode computationally, and become very slow and difficult to get any kind

17:41

of real-time performance.

17:43

If you're serious about doing convolutions and most image processing techniques and

17:47

programmers are, then you'll want to do your convolutions in the Fourier domain, the

17:52

frequency domain instead, where you take the fast Fourier transform of your input image,

17:57

and your fast Fourier transform of your kernel, combine them, and then take the inverse

18:02

Fourier transform of the result, and this will allow you to get very large kernels

18:07

with a fixed computational overhead, a far more suitable approach for real-time image processing.

18:14

Bit five, Sobel edge detection. Edges in images indicate where the information of an image

18:21

is, simplistically because edges indicate where spatial change has occurred.

18:27

In this example image, we have background, and we have a box.

18:32

The background doesn't change locally, and the box doesn't change locally.

18:37

So there's no relevant information in either of those zones, so one could argue that

18:42

the box is really only defined by how it is different to the background, and this difference

18:48

lies along the edge of the box.

18:50

So detecting edges is quite an important step, and perhaps the most classical way to detect

18:55

edges is using Sobel edge detection, which is a pair of kernels used in a convolution.

19:02

The two kernels detect edges in the two main directions.

19:06

We've got horizontal, the kernel looks something like this.

19:11

And we've got vertical.

19:14

If we convolve the image with the horizontal kernel, we will see the horizontal edges,

19:19

and then we'll convolve the image with the vertical kernel, and we'll see the vertical

19:22

edges.

19:23

We'll then combine the horizontal edges and the vertical edges into a single output image

19:28

to show us all edges.

19:31

As before with the motion detection, the sign bit of the result of these convolutions

19:35

can contain useful and interesting information.

19:39

But I'm going to throw it away by taking the absolute value of the results of these convolutions.

19:44

The Sobel part of the code is exactly the same as the convolution part of the code before,

19:49

except I'm doing the two things at once, I'm maintaining two summation variables instead

19:53

of the one.

19:54

And when I'm writing the output summation, I'm taking the average of the sum for the vertical

19:59

and sum for horizontal components.

20:01

I've defined the kernels for Sobel in exactly the same way as I did before.

20:05

They're just arrays of floating point values.
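
A sketch of the two-kernel version, using the standard Sobel coefficients and assuming a flat row-major vector of floats with out-of-range reads treated as black. The names are illustrative. Note that the output can exceed one on strong edges, so it may need clamping before display.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sobel edge detection: two convolutions done in one pass, with the
// absolute responses averaged into a single edge image.
std::vector<float> sobelEdges(const std::vector<float>& input, int width, int height) {
    assert(width > 0 && height > 0);
    assert(input.size() == static_cast<std::size_t>(width) * height);

    // kernelX responds to change along x, kernelY to change along y.
    static const float kernelX[9] = {-1.0f, 0.0f, 1.0f, -2.0f, 0.0f, 2.0f, -1.0f, 0.0f, 1.0f};
    static const float kernelY[9] = {-1.0f, -2.0f, -1.0f, 0.0f, 0.0f, 0.0f, 1.0f, 2.0f, 1.0f};

    std::vector<float> output(input.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sumX = 0.0f;
            float sumY = 0.0f;
            for (int ky = -1; ky <= 1; ++ky) {
                for (int kx = -1; kx <= 1; ++kx) {
                    const int sx = x + kx;
                    const int sy = y + ky;
                    const bool inside = sx >= 0 && sx < width && sy >= 0 && sy < height;
                    const float pixel = inside ? input[sy * width + sx] : 0.0f;
                    const int k = (ky + 1) * 3 + (kx + 1);
                    sumX += pixel * kernelX[k];
                    sumY += pixel * kernelY[k];
                }
            }
            // Discard the sign and average the two directions.
            output[y * width + x] = (std::fabs(sumX) + std::fabs(sumY)) / 2.0f;
        }
    }
    return output;
}
```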

20:08

Here is the demonstration program running the Sobel edge detection algorithm.

20:12

And as we can see, edges are illuminated, and boring surfaces, those that have low frequency

20:18

spatial information, remain black.

20:20

The nice thing about Sobel is it works on this gray scale input, and you can see it starts

20:25

to highlight all of the areas of high frequency information.

20:28

So look at my really hairy arm; it's quite a visible thing, so that's really indicating

20:33

texture.

20:35

Bit six, morphological operations.

20:40

Even though that sounds like a mouthful, morphological operations are really a study of

20:44

how things look spatially in the image.

20:49

We want to do things regarding the shape of objects in the image.

20:53

And we do this by working in the binary domain.

20:56

So we must threshold the image first to binarize our pixels to zero one values.

21:02

And for this bit, it's really split into three different bits.

21:06

The first I'm going to look at is called erosion.

21:10

In this simple demonstration image, I'm assuming that the background is zero, and my object

21:15

is one.

21:16

Erosion is the effect of eroding a single pixel of all edges of our object.

21:26

Simply shrinking it.

21:28

And it's useful to remove erroneous, spurious pixels from other stages of image processing.

21:35

Because if I had a single pixel like this, or a cluster of single pixels in a line, and

21:40

I eroded it, then if we're removing one pixel from all edges, it's going to disappear

21:45

entirely.

21:47

Whereas larger objects, the morphology remains intact.

21:50

The opposite to erosion is dilation.

21:54

And it is quite literally the opposite.

21:56

This time we grow a one pixel boundary around our shape.

22:03

So just going back to the previous example, if we had some spurious small information, and

22:09

we first erode it to remove it.

22:13

And then we dilate the image again.

22:15

There's nothing here to actually dilate, but our original shape will go back to something

22:20

very similar to how it was originally.

22:23

This is a nice way of removing spatial noise from an image.

22:27

In many ways, implementing morphological operations is very similar to convolutions, but

22:33

this time we use logic instead of dot products.

22:37

Looking at erosion, we use a very similar 3x3 kernel to how we use the convolutions, but

22:43

this time my kernel is just going to be a 3x3 matrix of logic ones.

22:49

For every pixel in my source image, I overlay my morphological operation kernel for that

22:55

pixel.

22:57

So let's for example say, I put it here, centered around this pixel.

23:03

I then do a logical AND with all of the elements in the kernel and all of the elements in

23:08

the image.

23:10

In this case, we can see that one and zero here is, well, zero, and this one's going

23:15

to be zero, this one's going to be zero, and once we've ANDed all of those together,

23:20

the end result is going to be zero, because we've got some zeros in there.

23:23

So I will write to my image zero, however, when I get round to operating on this section

23:30

of the image, the result is a logic one, because all of the kernel coefficients and all

23:35

of the surrounding neighborhood pixels are going to AND together to give me a one.

23:40

This pixel that was on its own has been eroded, but this pixel that is robustly supported

23:45

by its neighborhood stays intact.
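
The logical AND form of erosion can be sketched as below, assuming a binary image stored as bytes holding zero or one, a 3x3 kernel of logic ones, and out-of-range neighbours treated as zero. The function name is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Erode a binary image: a pixel survives only if it and all eight of its
// neighbours are set, which is the AND of the whole 3x3 neighbourhood.
std::vector<std::uint8_t> erodeAnd(const std::vector<std::uint8_t>& input,
                                   int width, int height) {
    assert(width > 0 && height > 0);
    assert(input.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> output(input.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            bool all = true;
            for (int ky = -1; ky <= 1; ++ky) {
                for (int kx = -1; kx <= 1; ++kx) {
                    const int sx = x + kx;
                    const int sy = y + ky;
                    const bool inside = sx >= 0 && sx < width && sy >= 0 && sy < height;
                    const bool pixel = inside && input[sy * width + sx] != 0;
                    const bool kernelBit = true;  // every kernel element is a logic one
                    all = all && (kernelBit && pixel);
                }
            }
            output[y * width + x] = all ? 1 : 0;
        }
    }
    return output;
}
```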

23:49

Instead of just doing a logical AND, we could also do some simple arithmetic; we could count

23:53

how many of our neighbors are equal to one and come to the same conclusion.

23:59

So in this case, I've got eight neighbors all equal to one, so my current value pixel

24:04

is a one, but in the original scenario, only three of my neighbors were one.

24:09

So I could then look for less than eight neighbors, and in that condition, I set myself

24:14

to zero.

24:16

Now the interesting thing with erosion is that one man's erosion is another man's

24:20

dilation.

24:21

If I inverted all of these bits and applied the same kernel, I would have the effect

24:25

of dilating the image.

24:28

But I can also dilate in a second way without requiring this inversion, and this one is

24:32

quite simple.

24:33

If a given pixel exists, then set its neighbors to exist as well.

24:40

And this would have the effect of growing our object by one pixel along all edges.

24:46

Before I get stuck into the code on this, I just want to show visually another useful

24:49

thing for dilation, that if I can identify a particular point on an image, let's say here,

24:56

and I repeatedly dilate, I can effectively flood fill in the image.

25:01

Now there's an important condition here, is that after every dilation, if we logically

25:06

AND that with the original image, we'll never go beyond the boundaries of the original

25:11

image.

25:12

So this allows me to fill a secondary image with just the space occupied by a single shape

25:18

in binary space of the input image.
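
A sketch of that fill, assuming a binary image stored as bytes holding zero or one. It dilates a seed repeatedly, ANDs with the original after every step, and stops when nothing changes. The function names and the stopping rule are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Binary = std::vector<std::uint8_t>;

// One dilation step: a pixel is set if anything in its 3x3 neighbourhood is set.
Binary dilateOnce(const Binary& input, int width, int height) {
    Binary output(input.size(), 0);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int ky = -1; ky <= 1; ++ky)
                for (int kx = -1; kx <= 1; ++kx) {
                    const int sx = x + kx;
                    const int sy = y + ky;
                    if (sx >= 0 && sx < width && sy >= 0 && sy < height &&
                        input[sy * width + sx] != 0)
                        output[y * width + x] = 1;
                }
    return output;
}

// Grow a region from a seed, masking with the original image after each
// dilation so the region never leaves the shape containing the seed.
Binary fillFromSeed(const Binary& image, int width, int height, int seedX, int seedY) {
    assert(width > 0 && height > 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(seedX >= 0 && seedX < width && seedY >= 0 && seedY < height);
    Binary region(image.size(), 0);
    if (image[seedY * width + seedX] != 0) region[seedY * width + seedX] = 1;
    while (true) {
        Binary grown = dilateOnce(region, width, height);
        for (std::size_t i = 0; i < grown.size(); ++i)
            grown[i] = (grown[i] != 0 && image[i] != 0) ? 1 : 0;
        if (grown == region) break;  // nothing new was added
        region = grown;
    }
    return region;
}
```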

25:20

And this is a great way for doing image segmentation, the extraction of objects or labeling

25:25

image parts.

25:27

In fact, I find this to be really interesting, and I think it's worthy of a video on

25:32

its own right.

25:33

I've got some interesting examples of how we can demonstrate that.

25:36

So expect in the near future a follow-up video to this, just showing morphological operations.

25:42

In the code, the morphological section is the largest of them all.

25:46

It's got a great degree of user input, one to select, which particular dilation or erosion

25:50

operation we're going to do.

25:51

There's also a third one called edge, which I'll talk about when we get to the code.

25:55

But I can also control the number of times that I perform the dilation or erosion.

26:00

So we'll be able to see those effects visually.

26:02

But the first thing and the most important thing is we need to take our image from the

26:06

gray-scale domain into the binary domain.

26:08

So I'm just thresholding it.

26:10

And I'm going to threshold it using the value that we've specified in the threshold algorithm.

26:14

So this will allow you to tweak the threshold first.

26:17

Then we go on to choose dilation, erosion, or this third one, which I'm calling edge,

26:21

because it's going to be an edge detect using morphological operations.

26:24

For dilation, it was simply a case of seeing if a given pixel value is equal to one,

26:29

then set all of your neighbors to one, too.

26:32

Easy enough.

26:33

But I use this third frame, activity, because I don't want to alter the frame as I'm

26:40

going through the pixels.

26:42

It's important that the frames are treated homogeneously.
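
A sketch of the dilation loop with a separate read-only snapshot, assuming a binary image stored as bytes holding zero or one. The snapshot is named activity after the frame mentioned above, and the iteration count parameter is an illustrative stand-in for the user-controlled count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dilate a binary image the given number of times. Each pass reads from an
// unmodified snapshot and writes into a separate result, so pixels switched
// on during a pass cannot seed further growth within that same pass.
std::vector<std::uint8_t> dilate(const std::vector<std::uint8_t>& image,
                                 int width, int height, int iterations) {
    assert(width > 0 && height > 0 && iterations >= 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> result = image;
    for (int n = 0; n < iterations; ++n) {
        const std::vector<std::uint8_t> activity = result;  // snapshot to read from
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                if (activity[y * width + x] == 0) continue;
                // This pixel exists, so set all of its neighbours too.
                for (int ky = -1; ky <= 1; ++ky) {
                    for (int kx = -1; kx <= 1; ++kx) {
                        const int sx = x + kx;
                        const int sy = y + ky;
                        if (sx >= 0 && sx < width && sy >= 0 && sy < height)
                            result[sy * width + sx] = 1;
                    }
                }
            }
        }
    }
    return result;
}
```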

26:46

For erosion, rather than doing a set of logical operations, I decided to go with the

26:50

summation route.

26:51

So I'm looking at all of the values of my neighbors, and all of the ones that are one,

26:55

I'm summing together.

26:56

Now, I know that in my activity frame they're going to be ones or zeros anyway.

27:00

We've binarized the image.

27:01

So I'm just going to add them.

27:03

And I'm going to check to see if my activity is one.

27:06

And if it is, but not all of my neighbors are set to one, then I'm going to set myself

27:11

to zero, put myself out.

27:13

This third one, which I've not drawn up in OneNote, I've called edge, and it's almost exactly

27:17

the same as erosion.

27:20

Instead of looking for a less than eight neighbors, I'm looking for precisely eight neighbors.

27:25

In the situation that I am an illuminated pixel, and I have all of my neighbors, I'm going

27:31

to extinguish myself.
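
The neighbour-counting version of erosion, and the edge variant that differs only in its test, might look like this. It assumes a binary image stored as bytes holding zero or one, with out-of-range neighbours counted as zero. The enum and function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class MorphOp { Erode, Edge };

// Count the eight neighbours of every set pixel, reading from the untouched
// input. Erode clears pixels with fewer than eight set neighbours; Edge
// clears pixels with exactly eight, leaving a one pixel wide outline.
std::vector<std::uint8_t> erodeOrEdge(const std::vector<std::uint8_t>& image,
                                      int width, int height, MorphOp op) {
    assert(width > 0 && height > 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    std::vector<std::uint8_t> output = image;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (image[y * width + x] == 0) continue;
            int neighbours = 0;
            for (int ky = -1; ky <= 1; ++ky) {
                for (int kx = -1; kx <= 1; ++kx) {
                    if (kx == 0 && ky == 0) continue;  // skip the pixel itself
                    const int sx = x + kx;
                    const int sy = y + ky;
                    if (sx >= 0 && sx < width && sy >= 0 && sy < height)
                        neighbours += image[sy * width + sx];
                }
            }
            if (op == MorphOp::Erode && neighbours < 8) output[y * width + x] = 0;
            if (op == MorphOp::Edge && neighbours == 8) output[y * width + x] = 0;
        }
    }
    return output;
}
```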

27:33

So let's take a look.

27:34

The program is started up with the thresholded image.

27:37

And this is just so I can tune the threshold value appropriately.

27:40

I can still tune it later on, but it's just a bit easier to do whilst looking at the

27:45

greyscale image.

27:46

Then I'm going to choose bit six, morphological operations.

27:50

Now the current operation is dilation.

27:53

And it looks like there's just a single iteration being applied.

27:56

So on the left, where these individual pixels are occurring, you can see they're expanding

28:01

into a three by three region.

28:04

This has had the effect of removing the OneLoneCoder text entirely from the book object.

28:09

So this might be a precursory stage in being able to extract where is the book this time,

28:14

if that's useful.

28:15

I can increase the number of dilations with the A and S keys.

28:19

And as you might expect, the regions just grow.

28:22

So dilation is a way of grouping things together.

28:26

Erosion is a little different.

28:28

Here on the left, those spurious single pixels have all but disappeared, as has a lot of

28:32

the noisy background.

28:33

So erosion has really just allowed us to focus on the core parts of the image.

28:38

If, for example, I eroded this image and then dilated it, well, I can't bring any of that

28:43

background noise back in.

28:44

We're looking at the eroded image now, if I dilated this, there's no source pixels

28:49

to seed anything for dilation.

28:51

So combining these two operations is a nice way to remove noise.

28:55

Let's have multiple erosions, and eventually we can erode the image away entirely.

29:04

Erosions and dilation, as I demonstrated very briefly, can be used for detecting and labeling

29:08

objects individually.

29:10

But in a primitive way, they can also be used for sizing objects.

29:15

If you continuously eroded the scene and looked for the final illuminated pixel, you

29:20

could then dilate that pixel and logically AND it back into the scene to highlight the

29:26

largest object.

29:27

So you could use erosion and dilation for sizing too.
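
One possible reading of that sizing idea, as a sketch only: erode until one more pass would empty the image, dilate the surviving seed the same number of times, then AND the result with the original scene. All names here (`Image`, `morph3x3`, `anySet`, `highlightLargest`) are hypothetical, and "largest" in this sketch really means the object that survives the most erosions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::vector<std::uint8_t>>;  // image[y][x], values 0 or 1

// 3x3 pass: erosion keeps a pixel only if all nine neighbours are set,
// dilation sets it if any is. Outside the image counts as background.
Image morph3x3(const Image& src, bool erode) {
    assert(!src.empty() && !src[0].empty());
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Image dst(h, std::vector<std::uint8_t>(w, 0));
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int setCount = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && ny >= 0 && nx < w && ny < h) setCount += src[ny][nx];
                }
            }
            dst[y][x] = erode ? (setCount == 9) : (setCount > 0);
        }
    }
    return dst;
}

bool anySet(const Image& img) {
    for (const auto& row : img)
        for (std::uint8_t p : row)
            if (p) return true;
    return false;
}

Image highlightLargest(const Image& scene) {
    Image seed = scene;
    int erosions = 0;
    for (;;) {  // stop just before the image would vanish entirely
        Image next = morph3x3(seed, true);
        if (!anySet(next)) break;
        seed = next;
        ++erosions;
    }
    // Grow the seed back by the same number of passes...
    for (int i = 0; i < erosions; ++i) seed = morph3x3(seed, false);
    // ...and AND it with the original so only real scene pixels remain.
    for (std::size_t y = 0; y < scene.size(); ++y)
        for (std::size_t x = 0; x < scene[y].size(); ++x)
            seed[y][x] = seed[y][x] && scene[y][x];
    return seed;
}
```

For a scene with a five by five block and a three by three block, the smaller block disappears one erosion earlier, so only the larger one is highlighted in the output.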

29:30

Now, finally, there's one more mysterious one if I press the C key, and that is using

29:35

morphological operations to detect edges, and this gives you a really nice edge detect where

29:40

all of the edges are a single pixel.

29:42

If I bring my hand into the scene, it's a bit messy, so let's find a nice high contrast scene.

29:47

There we go.

29:48

So that's quite nice.

29:49

We're isolating all of the buttons.

29:50

Now, you'll see the camera's adjusted its gain, not much I can do about that, but this

29:55

is different to Sobel, in that it gives you a really crisp outline edge for an object.
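
The narration does not spell out which operations produce this outline. One common construction, shown here purely as an assumption, is to erode the binary image once and keep only the pixels that were set in the original but not in the eroded copy. `Image`, `erode` and `outline` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::vector<std::uint8_t>>;  // image[y][x], values 0 or 1

// One 3x3 erosion pass: a pixel survives only if all nine pixels in its
// neighbourhood are set. Pixels outside the image count as background.
Image erode(const Image& src) {
    assert(!src.empty() && !src[0].empty());
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    Image dst(h, std::vector<std::uint8_t>(w, 0));
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int setCount = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && ny >= 0 && nx < w && ny < h) setCount += src[ny][nx];
                }
            }
            dst[y][x] = (setCount == 9);
        }
    }
    return dst;
}

// Assumed construction: edge = original AND NOT eroded.
// What remains is the one-pixel-thick inner boundary of each object.
Image outline(const Image& src) {
    const Image eroded = erode(src);
    Image edges = src;
    for (std::size_t y = 0; y < src.size(); ++y)
        for (std::size_t x = 0; x < src[y].size(); ++x)
            edges[y][x] = src[y][x] && !eroded[y][x];
    return edges;
}
```

Because a single erosion strips exactly one layer of pixels, the difference is one pixel wide, which matches the crisp outline described here, whereas a gradient-based detector such as Sobel responds over a wider band.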

30:01

Bit seven, median filtering.

30:04

When working with real world sensors, things don't always go to plan.

30:09

Sometimes your image will have little tiny artefacts in it.

30:13

These are bad pixels or snow; you might be in a radioactive environment for

30:18

all I know, but you end up with something which doesn't look quite right.

30:22

And these are usually quite tricky to filter out, unless you use a median filter.

30:27

Median filters are conceptually very simple.

30:30

In a similar way to convolution, for a given pixel, we look at its immediate neighbourhood.

30:39

In this example, I'm using a five by five neighbourhood.

30:43

And we're going to make an assumption in our image that we're not really expecting spatial

30:48

change to happen over a single column of pixels or a single row of pixels.

30:53

That's quite unusual in natural images.

30:55

So it tends to be that in general, information in images is rather blurred out across the

31:01

image.

31:02

So up here, where I've got this single erroneous pixel, I can make an assumption that most

31:08

of my image in that region is behaving one particular way.

31:12

And this blank pixel is some form of outlier.

31:16

And we can statistically remove the outlier by looking at what the median pixel value

31:20

is across all 25 pixels in this five by five kernel.

31:25

If you don't remember what the median is, you take all of your values, and you sort

31:29

them in order, and you take the one that lies directly in the middle of that sorted set.

31:35

It's one of those quirky mathematical phenomena where it sounds really complicated mathematically,

31:40

but there's actually no maths involved at all.

31:43

It's just a sorting and an extraction.

31:45

For the median filter code, as usual, I'm iterating through all of the pixels in the image.

31:50

And I'm not trying to optimize this at all.

31:53

I want the code to be readable, so I'm going to do something really horrendous and create

31:57

a vector of the floating point values, which represents all of the pixels surrounding the

32:02

pixel I'm currently investigating over this five by five area.

32:06

And all I'm doing is extracting those pixels in my neighborhood and pushing them into

32:11

the vector.

32:12

Once I've got 25 pixels in my vector, I then sort them using standard sort.

32:17

With 25 sorted pixels, the one in the middle will be at index 12 (counting from zero) of

32:22

my vector.

32:23

And that's what I'm choosing as my output value.

32:26

It's pretty simple, huh?
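
The steps just described (gather the five by five neighbourhood into a vector, sort it, take the middle element) can be sketched as follows. This is not the demo program's code: `GreyImage` and `medianFilter5x5` are hypothetical names, and clamping coordinates at the image border is an assumption, since the narration does not say how edges are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical greyscale image: image[y][x], 0.0 = black, 1.0 = white.
using GreyImage = std::vector<std::vector<float>>;

GreyImage medianFilter5x5(const GreyImage& src) {
    assert(!src.empty() && !src[0].empty());
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    GreyImage dst(h, std::vector<float>(w, 0.0f));
    std::vector<float> window;
    window.reserve(25);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            window.clear();
            for (int dy = -2; dy <= 2; ++dy) {
                for (int dx = -2; dx <= 2; ++dx) {
                    // Assumption: coordinates are clamped to the image edge.
                    const int nx = std::min(std::max(x + dx, 0), w - 1);
                    const int ny = std::min(std::max(y + dy, 0), h - 1);
                    window.push_back(src[ny][nx]);
                }
            }
            std::sort(window.begin(), window.end());
            dst[y][x] = window[12];  // middle of 25 sorted values
        }
    }
    return dst;
}
```

Like the version described in the video, this favours readability over speed. If performance mattered, `std::nth_element` could find the middle value without fully sorting the window.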

32:27

And so here in the demonstration program, I'm now running the median filter.

32:31

And you might be at first thinking it's just blurring the image, but it's not.

32:35

It kind of looks like it's painted it a little bit, so I'm sure median filtering is

32:39

used in some of these art effects programs.

32:41

But I do have a test image that I've created here, where I've got the words median

32:46

filter written on my page, and you can see the median filter has filtered out the lines

32:50

because it sees those as anomalous.

32:52

And it's also filtered out the dots.

32:55

But there's sufficient information left in the text to then go and extract it using

32:59

thresholding or morphological operations or something else.

33:03

So it's just a precursory phase to help you with later stages of image processing.

33:08

Once the dots get large enough that they occupy a region, in fact, even so zoomed in like

33:13

that, you really would struggle to identify the dots, but the text is just fine.

33:18

So when you get this sporadic salt and pepper noise in your image, median filter is the

33:24

thing to choose.

33:25

And finally, bit eight, locally adaptive threshold.

33:29

I started this video with bit one looking at thresholds that were applied globally across

33:34

the image.

33:35

More often than not, this is sufficient, but there are situations where you want to threshold

33:39

an image based on local information.

33:42

Using a similar principle to the median filter, we make an assumption that, overall,

33:47

for a small region of image, there's not going to be a great deal of spatial variance.

33:52

So it's better to choose a threshold value based on the information in your locality,

33:57

at least biased towards a threshold value found from the information of your neighbors.

34:03

So I know, for example, in this region of the image, if I take the average value of my

34:07

neighborhood, then things that are statistically interesting

34:11

may be a certain level above that average.

34:14

And that average might be different for different parts of the image, which means if I

34:18

used a global threshold, that threshold value may not be appropriate for different regions

34:23

of the image.

34:24

It might not be immediately obvious what locally adaptive thresholding buys you compared

34:28

to a global threshold.

34:30

But I find it really useful when you've got change in luminance across your scene.

34:36

Changing luminance is just a fancy word for shadows.

34:39

We're now getting quite familiar with the code for these algorithms.

34:41

We're going to iterate through all of the pixels, and in this case, I'm going to take

34:44

the average value of my immediate neighbors, 5 by 5.

34:49

But I'm then going to use that average as part of my thresholding calculation.

34:55

Region sum contains the average, but I'm going to bias that value with some user defined

35:00

constant.

35:01

And that constant is user configurable over the user interface before we go on to threshold

35:06

the image.
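
A minimal sketch of that calculation follows. The narration says the local average is biased by a user-defined constant but not how, so multiplying the average by the bias is an assumption here, as is clamping coordinates at the image border. `GreyImage`, `BinaryImage` and `adaptiveThreshold5x5` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using GreyImage = std::vector<std::vector<float>>;           // image[y][x], 0.0 to 1.0
using BinaryImage = std::vector<std::vector<std::uint8_t>>;  // image[y][x], 0 or 1

BinaryImage adaptiveThreshold5x5(const GreyImage& src, float bias) {
    assert(!src.empty() && !src[0].empty());
    const int h = static_cast<int>(src.size());
    const int w = static_cast<int>(src[0].size());
    BinaryImage dst(h, std::vector<std::uint8_t>(w, 0));
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float regionSum = 0.0f;
            for (int dy = -2; dy <= 2; ++dy) {
                for (int dx = -2; dx <= 2; ++dx) {
                    // Assumption: coordinates are clamped to the image edge.
                    const int nx = std::min(std::max(x + dx, 0), w - 1);
                    const int ny = std::min(std::max(y + dy, 0), h - 1);
                    regionSum += src[ny][nx];
                }
            }
            regionSum /= 25.0f;  // now holds the local average
            // Assumption: the bias scales the average. A pixel is set only
            // if it is brighter than its biased local average.
            dst[y][x] = src[y][x] > regionSum * bias ? 1 : 0;
        }
    }
    return dst;
}
```

With a bias slightly above 1, for example 1.1, uniform regions come out as background whether they are dark or bright, and only pixels that stand out from their own surroundings are set.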

35:07

So let's take a look.

35:09

Here I've got the input and output showing just the regular threshold.

35:13

This was the one we started with in bit one.

35:15

And I can try and find a value which sensibly thresholds the image, but you can see it's

35:19

a global effect.

35:21

And one of the things I wanted to show here is that the shadow cast by my hand is influencing

35:26

that threshold decision.

35:28

I'm going to press the 8 key now to choose the adaptive threshold.

35:32

Now we can see straight away the difference is that my shadow has basically become irrelevant

35:37

to the scene.

35:38

The local area around each pixel is used to guide what value we're using to threshold

35:44

it against.

35:45

And so the areas in shadow have overall lowered the threshold value and the areas of brightness

35:51

have overall raised the threshold value.

35:53

And so we're choosing a value which is varying across the screen in order to make our

35:58

thresholding decision.

35:59

So if you wanted to make things that are shadow and luminance invariant, then you probably

36:05

do need to use some sort of locally adaptive thresholding algorithm.

36:08

It's also quite a cool visual effect too.

36:11

And so that's that.

36:13

Image processing is a huge field and for me I find it to be a very interesting one too.

36:18

In this video we've had a very quick look, a very cursory introduction to some of the

36:22

most fundamental techniques you need if you wanted to start doing image processing.

36:27

As I mentioned in the introduction, you don't always have to work with images; procedural

36:31

generation may also employ a lot of these techniques too, particularly the morphological

36:35

operations.

36:36

Anyway, that's it for now.

36:38

If you've enjoyed this video, give me a big thumbs up please.

36:40

Have a think about subscribing.

36:42

Come and have a chat on the Discord server and I'll see you next time.

36:45

Take care.

00:00

Introduction to image processing

06:20

Thresholding

06:20

Basics of motion detection

09:25

Low-pass temporal filtering

15:53

Kernel convolution techniques

18:14

Sobel edge detection

20:38

Overview of morphological operations

21:51

Erosion and dilation

32:30

Median filtering techniques

33:25

Locally adaptive thresholding

36:20

Conclusion and future applications

01:23

What is binarization, and how does it transform images?

06:21

How does motion detection use pixel changes over time?

09:25

Why is low-pass temporal filtering so important in image processing?

16:05

How does convolution make images better or blurrier?

18:14

What role do edges play in image processing?

21:50

How can erosion and dilation be used to improve image quality?

33:18

How does median filtering actually deal with salt-and-pepper noise? It works really well!

34:24

What is the advantage of locally adaptive thresholding over a global threshold?

36:10

How can adaptive thresholding improve image clarity in shadows?


Data and information visualization, Algorithm, Computer science, Data analysis, Digital image processing, Mathematical morphology

Description

In this video we cover the basics of image processing, including how images are represented as 2D arrays of data and how various algorithms can be applied to manipulate that data. We explore the usefulness of image processing in programming and its applications in areas such as video footage and procedural generation. The video demonstrates the eight "bits" of image processing using a live feed of the author's arm waving around, and highlights how the interactive demonstration helps with learning. Topics covered include thresholding, manipulation of the live feed, and more. By the end of the video you will have a solid understanding of the basics of image processing and its potential applications in programming.