Teaching convolutional neural networks to give me friends

756K+ views   |   21K+ likes   |   502 dislikes
May 03, 2017

Transcription

  • Hey guys, so this is—no joke—a list of
  • all my real life friends, but the problem
  • is just that- they're kind of ugly.
  • [ Illuminati Music ]
  • Since I have such ugly friends
  • and since computers are so darn powerful
  • why don't I just have my computer
  • generate prettier friends?
  • That means the goal is to have my
  • computer automatically generate a wide
  • range of human face images
  • without any human work required.
  • One option is to load up a videogame like The Sims
  • or Nintendo Miis and randomize the settings
  • of their avatar creators.
  • But that's not "ma-chine learn-y" enough and
  • I know-- I just know! that machine learning is
  • what you guys want. So let's take a look at
  • convolutional neural networks.
  • Some of you viewers will already know what this
  • is but I want my videos to be as
  • beginner-friendly as possible so I'll
  • assume you know nothing.
  • Say you have an image
  • meaning a two-dimensional array of
  • pixels that are all either black or white.
  • You want to find out where all the
  • donut shapes are. How would you go about
  • doing that? Well, let's make a donut filter.
  • It will specify the requirements for
  • something to be a donut.
  • Then we'll center our filter around the upper-left
  • pixel and ask:
  • “Is every condition of the filter satisfied?”
  • No?! Then it's NOT a donut.
  • We can move the filter over each
  • pixel, asking the same question. Most
  • pixels, like this one, will say, "no donut"
  • since not every condition is satisfied...
  • ...but a select few will say, "yes donut."
  • After that's all said and done, we now
  • have markers at the center of where all
  • the donuts are. Voila! Our goal is complete!
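
In code, that sliding yes/no check might look something like this minimal NumPy sketch (the image, the 3x3 donut filter, and all the values here are made up for illustration, not taken from the video):

    import numpy as np

    # Black/white image: 1 = white, 0 = black (values made up for illustration).
    image = np.array([
        [0, 1, 1, 1, 0, 0],
        [0, 1, 0, 1, 0, 0],
        [0, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ])

    # "Donut" filter: a ring of white pixels around a black center.
    donut = np.array([
        [1, 1, 1],
        [1, 0, 1],
        [1, 1, 1],
    ])

    h, w = image.shape
    fh, fw = donut.shape
    centers = []
    # Slide the filter everywhere it fits fully inside the image (edges come later).
    for r in range(h - fh + 1):
        for c in range(w - fw + 1):
            patch = image[r:r + fh, c:c + fw]
            if np.array_equal(patch, donut):     # is every condition satisfied?
                centers.append((r + 1, c + 1))   # mark the center pixel
    print(centers)   # [(1, 2)]: one donut, centered at row 1, column 2
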
  • However, most images aren't as
  • simple as black or white. For most images
  • the pixel's brightness exists on a
  • spectrum from 0 to 1, so it could be 0.5
  • or 0.1 (ignore color for now). So when
  • we're searching for donuts we can't use
  • a filter that's so simplistic it only
  • asks yes-or-no questions. Rather than asking:
  • ”is this a donut, or is this not a donut?”
  • Our new improved filter, should
  • instead ask the question:
  • ”How donutty is this pixel?“
  • On a continuous scale from
  • negative infinity to positive infinity
  • with higher values meaning this is more
  • like a donut and lower values meaning
  • this is less like a donut. How can we
  • engineer a filter that does this?
  • Well, let's imagine the filter is a set of
  • multipliers—like this—some multipliers
  • are higher than others
  • some are positive and some are negative
  • but let's see what they do.
  • We can center the filter around a single pixel and
  • then we can multiply those underlying
  • image pixel values by those multipliers,
  • add up all those products, and we get an
  • overall score of how "donutty" that pixel is.
  • You can think of the positive
  • multipliers as if they're saying:
  • ”If you want to be considered donutty,
  • you'd better have a high value for this pixel.”
  • And the negative multipliers like they're saying:
  • ”Ooh donutty pixels
  • don't typically have high values here.”
  • In the end we can apply this
  • continuous "donutty" filter to every pixel of the image.
  • So this pixel with a score of 3.64 is
  • the most "donutty." Which makes sense
  • because it's a dark pixel surrounded
  • by quite a few light pixels.
  • A few other contenders get pretty close.
  • Now this pixel has the worst donut score,
  • which kind of makes sense because it
  • looks like an inverted donut.
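
As a tiny worked example of that weighted sum, here is one pixel's score computed in NumPy (both the 3x3 patch of brightness values and the multipliers are made up for illustration):

    import numpy as np

    # 3x3 patch of brightness values around one pixel (0 = black, 1 = white; made up).
    patch = np.array([
        [0.9, 0.8, 0.9],
        [0.7, 0.1, 0.8],
        [0.9, 0.9, 1.0],
    ])

    # Continuous donut filter: positive multipliers want bright surroundings,
    # the negative multiplier in the middle wants a dark center.
    multipliers = np.array([
        [ 0.5,  0.5,  0.5],
        [ 0.5, -2.0,  0.5],
        [ 0.5,  0.5,  0.5],
    ])

    score = float(np.sum(patch * multipliers))
    print(score)   # about 3.25: high, because a dark pixel sits among bright neighbors
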
  • By the way—if you're curious—there are quite a few
  • methods to handle the literal edge cases.
  • You can cut them off,
  • fill the exterior with zeros,
  • extend the borders to infinity,
  • or just loop the image.
  • For our example, we'll just fill the exterior
  • with zeros because it's the easiest to understand.
  • And also, each application of
  • a filter is called a convolution, which
  • gives the convolutional neural network its name.
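
Putting those pieces together, one whole convolution (with the fill-the-exterior-with-zeros option) could be sketched like this; np.pad supplies the border of zeros, and the filter is whatever set of multipliers you like:

    import numpy as np

    def convolve(image, filt):
        """Slide `filt` over every pixel of `image`, padding the border with zeros."""
        fh, fw = filt.shape
        pad_r, pad_c = fh // 2, fw // 2
        padded = np.pad(image, ((pad_r, pad_r), (pad_c, pad_c)), constant_values=0.0)
        scores = np.zeros(image.shape)
        for r in range(image.shape[0]):
            for c in range(image.shape[1]):
                patch = padded[r:r + fh, c:c + fw]
                scores[r, c] = np.sum(patch * filt)   # "how donutty is this pixel?"
        return scores

    # Example: the continuous donut filter from before, applied to a random image.
    image = np.random.rand(8, 8)        # made-up grayscale image
    filt = np.full((3, 3), 0.5)
    filt[1, 1] = -2.0
    out = convolve(image, filt)         # same shape as the input: a new "grayscale image"
    print(out.shape)                    # (8, 8)
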
  • But hold on! The donutty score of
  • each pixel is a scalar.
  • Meaning: a number on a one-dimensional number line.
  • And guess what!? The original
  • brightness of each pixel was also a scalar.
  • What does this mean? It means that
  • applying this continuous donutty filter
  • converts data of one type into
  • data of the same type.
  • In other words:
  • it converts a grayscale image into another grayscale image.
  • So if we wanted to, we could apply this filter to
  • the image once, and then again, and again, and again.
  • Forever. To be honest that's actually not
  • very interesting. What is interesting, is
  • if you apply a different filter in the second layer,
  • and a different filter in the third layer
  • and so on-- and also:
  • If you apply multiple filters to each image,
  • creating this giant web of filters,
  • each looking for different things.
  • Since each filter can be different,
  • you don't have to be searching for just donuts.
  • You can have one filter that's good for
  • finding vertical lines, and maybe another is
  • good at finding horizontal lines.
  • At the second layer you can combine the two,
  • to create a filter that finds cross shapes.
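
As a rough sketch of that layering idea, here are two hand-made first-layer filters (one for vertical lines, one for horizontal lines) feeding a second layer. A real network would learn these values instead of having them written by hand, and this "layer" is simplified so each filter shares one set of multipliers across all input channels:

    import numpy as np
    from scipy.signal import correlate2d

    # Hand-made first-layer filters (a real network learns these instead).
    vertical = np.array([[-1.0, 2.0, -1.0]] * 3)    # likes a bright vertical stripe
    horizontal = vertical.T                          # same idea, rotated 90 degrees

    def conv_layer(channels, filters):
        """Simplified layer: slide each filter over every channel and sum the responses."""
        return [sum(correlate2d(ch, f, mode="same", boundary="fill", fillvalue=0)
                    for ch in channels)
                for f in filters]

    image = np.random.rand(16, 16)                        # made-up grayscale image
    layer1 = conv_layer([image], [vertical, horizontal])  # 2 channels: vertical-ness, horizontal-ness
    cross = np.ones((3, 3)) / 9.0                         # fires where both layer-1 responses pile up
    layer2 = conv_layer(layer1, [cross])                  # "edges of edges"
    print(len(layer1), len(layer2))                       # 2 1
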
  • Think of it this way:
  • Perhaps the first layer can find edges,
  • then the second layer takes those edges as input.
  • That means the second layer can find edges of edges,
  • meaning corners.
  • The third layer can find edges of edges, of edges.
  • Here, interpretation gets a little fuzzy,
  • because we humans don't really know how
  • a computer effectively uses its filters.
  • But I'd guess that edges of edges, of edges,
  • could be used to detect arrangements of corners;
  • in other words, simple shapes,
  • like equilateral triangles.
  • Perhaps, further layers could see
  • arrangements of triangles,
  • and further layers than that, can soon detect
  • whole objects. From pencils, to apples,
  • to chihuahuas, to humans. With more layers, and
  • more convolutions per layer, you can find
  • more and more advanced features in your
  • original image. Got three or four filters
  • that can find ridges of darkness at just
  • the right angles? Boom. You've got a nose detector!
  • Use a few other filters to find
  • pairs of dark ellipses that are twice as
  • far apart as their width, and there's
  • your eye detector. Add in the rest of the
  • body parts somewhere else, and then
  • combine them in a final convolution that
  • makes sure they're all in the right place.
  • And you've just got a web of convolutions that
  • tell you—exactly—where there are
  • human faces in the image.
  • hmm... Doesn't that look familiar...?
  • Okay. It doesn't tell you exactly where
  • the human faces are.
  • Since neural networks behave
  • randomly and unpredictably,
  • they'll never achieve 100% accuracy,
  • but they can get into the high 90s pretty easily now.
  • hmm... I brushed over this topic. But usually,
  • interspersed throughout the webs of convolutions,
  • you have points where you just downscale the
  • image by a factor of two,
  • and this is called pooling.
  • If you downscale enough you can
  • slowly convert your image, of thousands of pixels, into
  • an image of just one pixel.
  • Which can either be light or dark,
  • or anywhere in between.
  • Essentially, this can be used as a marker
  • to look at a whole image, not just one location,
  • and answer, "is there a human in this picture...?
  • ...or was this image taken indoors or outdoors...?"
  • If the final pixel's brightness is one, that means yes;
  • but if it's at zero, that means no;
  • and anything in between means maybe.
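
A minimal sketch of that pooling step, assuming the image side is a power of two so halving it over and over really does end at a single pixel (this uses 2x2 average pooling; max pooling is the other common choice):

    import numpy as np

    def pool2x(image):
        """Downscale by a factor of two: average each 2x2 block into one pixel."""
        h, w = image.shape
        return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    feature_map = np.random.rand(64, 64)     # made-up 64x64 feature map
    while feature_map.shape[0] > 1:
        feature_map = pool2x(feature_map)    # 64 -> 32 -> 16 -> ... -> 1
    answer = float(feature_map[0, 0])        # one number: near 1 means "yes", near 0 means "no"
    print(answer)
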
  • I also bet you're asking how to deal with colored images.
  • Simple.
  • Almost all photos have three color channels:
  • Red, Green, and Blue.
  • So you can just interpret that as
  • three different grayscale images overlaid
  • on top of each other. That means you can
  • just set up your convolutional neural network to
  • have three images in the earliest layer,
  • instead of one.
  • Pretty simple actually. Each color of RGB is called
  • a color channel, and convolutions in
  • further layers are also called channels.
  • More advanced CNNs can have like 40 or
  • 60 or even 100 channels in a
  • single layer because that's how many
  • features they're simultaneously trying
  • to search for. So yes, this is a
  • convolutional neural network. It takes in
  • an image of 𝘯-channels as an input and
  • outputs a scalar or an 𝘯-dimensional
  • vector if you're looking for multiple
  • things, or just whatever you want it to output.
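
For a concrete picture of that overall shape, here is a generic little network written with PyTorch: three input channels in, a handful of filters per layer, pooling in between, and a 10-dimensional vector out. This is only an illustration of the structure described above, not the network used in the video:

    import torch
    import torch.nn as nn

    # 3 input channels (R, G, B) -> a few conv + pool layers -> a 10-number output vector.
    net = nn.Sequential(
        nn.Conv2d(3, 40, kernel_size=3, padding=1),    # 40 filters (channels) in the first layer
        nn.ReLU(),
        nn.MaxPool2d(2),                               # pooling: downscale by a factor of two
        nn.Conv2d(40, 60, kernel_size=3, padding=1),   # 60 filters in the second layer
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),                       # pool all the way down to one "pixel" per channel
        nn.Flatten(),
        nn.Linear(60, 10),                             # 10 outputs: whatever you want it to predict
    )

    batch = torch.rand(1, 3, 64, 64)                   # one made-up 64x64 RGB image
    print(net(batch).shape)                            # torch.Size([1, 10])
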
  • That's great and all, but even if
  • you were to program in this whole
  • structure perfectly, you still wouldn't
  • have a working convolutional neural network
  • because you'd have no idea what to set the filters to.
  • I mean, the filters determine what the network is even
  • searching for, so they're pretty darn important.
  • Maybe you could set them up
  • manually using your own common sense to
  • figure out what elements each filter
  • should specifically be designated for?
  • That would be the hardest math puzzle
  • of all time-- please don't do that.
  • Instead we want to use a ton of training data with
  • labels of how we would want our network
  • to respond to this data,
  • and gradient descent,
  • and calculus,
  • and math,
  • BUT UH-OH!!!
  • This video is already getting long,
  • so I guess it'll have to wait for part two.
  • Besides, you guys are getting impatient
  • and probably just want to see what my
  • new prettier friends look like.
  • Okay, I can introduce you to them.
  • At the beginning all these filters I mentioned earlier
  • are set with random values,
  • so you'll see nonsensical images,
  • but then it'll train to get better.
  • The training data is 15,000 images of celebrities
  • from FamousBirthdays[dot]com.
  • I'll explain why I chose the source in part two.
  • The machine learning program I'm using is
  • called "HyperGAN" by 255bits,
  • which is Martin and Michael, and I'll also explain
  • why I chose this in part two.
  • Also, the timer at the top shows how long my
  • computer has been training for,
  • in hours:minutes:seconds format (HH:MM:SS).
  • Anyway, enough talking-- Let's go!
  • Yep! Yep! These are my new friends all right!
  • So much prettier than my old, real life friends.
  • I am so excited to hang out with
  • this beautiful, new crowd.
  • We can watch movies, go bowling,
  • rip out my brain cells and
  • replace them with neural networks,
  • go shopping, eat dinner.
  • It'll just be a blast!
  • Let me answer some questions
  • while an irrelevant time lapse plays.
  • “What was that music during the training time lapse?”
  • It's "Skyline" by "JujuMas"
  • who you should really go subscribe to.
  • “What happens when you train it for more than 7 hours?”
  • Not much.
  • I actually trained it for a day,
  • and the results didn't get significantly better.
  • Which brings me to...
  • “Shouldn't you remove the non-photographs from the training data?”
  • Yeah I should, but it takes too much work
  • to sift through 15,000 images,
  • and if the non-photographs are a small enough proportion,
  • they shouldn't affect the end result much, anyway.
  • “What was actually your procedure for setting this up?”
  • Again, I'll talk about the details in part two.
  • Before I end this video, I want to point out that
  • many other—actually smart—researchers
  • have gotten much better results than I have.
  • For example, the HyperGAN GitHub page itself
  • shows much larger, more realistic-looking
  • generated faces-- that just-- I mean look.
  • Can you even tell these aren't real?
  • And then, I keep seeing even better,
  • and better results, as time goes on,
  • on the r/MachineLearning subreddit.
  • That might lead you to ask,
  • “Cary, why would you spend so long showing your
  • own—mediocre work—when other people have
  • literally done exactly the same thing as you,
  • but 10 times better?”
  • And that's a valid question.
  • I'd like to think all my projects in the past
  • were unique in some way, but this one really isn't.
  • But one, I want to make it more visible to more people,
  • because I feel like not that many people read the academic papers,
  • but a lot of people are on YouTube.
  • And two, this whole journey has really just been
  • to prove to myself that the code that's
  • used to generate these images can indeed
  • work successfully on just my computer alone.
  • No more relying on
  • what other people post about what the results could be.
  • I want to see my computer reaching those
  • results myself. Anyway, I got a juicy
  • NVIDIA GTX 1080 GPU for this,
  • so I want to make sure I can use it to its full potential.
  • But don't worry, more original stuff
  • is coming in the future.
  • Like this!
  • What's this image?! I'm so confused!
  • This is unlike anything I've ever seen before~
  • hm-
  • I better subscribe to "carykh" to find out what all those
  • interesting lines are!
  • I can't believe I stooped that low...
  • Okay, end of the video, but I want to
  • promise to all the people who've been requesting.
  • I am going to make a ton of tutorial videos
  • from here on out.
  • For example, showing you how to program
  • a neural net completely from scratch,
  • assuming you know nothing;
  • or how to replicate the results I got in my
  • Baroque music video...
  • It's all coming, just be patient, and goodbye.

Description

The 1.5-month-long hiatus is over! Note to self: Never lip-sync things that don't need to be lip-synced. It takes forever

Here's HyperGAN, the tool I used to create the images: https://github.com/255BITS/HyperGAN

Skip to # if you just want to see the friends being generated!

There's a bug with the simulation at #... if a "must be black" pixel hovers over a white pixel that's also part of a donut, it will answer "yes" even though it should answer "no". Hopefully you can still see what's supposed to happen, though...

I probably should've clarified I'm not an expert in this field, so maybe I shouldn't have spoken with such confidence.

Music: "Kingdom of One" by Shogun Taira
/watch?v=4co-nL0rVas

"Mind Over Matter"
Jay Man - OurMusicBox
http://www.youtube.com/c/ourmusicbox

"No More Lasers" by me
/watch?v=FlfOcIDRKPQ

"Skyline" by JujuMas
I believe this is the right JujuMas? Not sure though: https://www.youtube.com/channel/UC9x3tPxNuXC-38CqcxY1yFA/featured
