1. Introduction, Threat Models

Transcription

  • The following content is provided under a Creative
  • Commons license.
  • Your support will help MIT OpenCourseWare
  • continue to offer high quality educational resources for free.
  • To make a donation or to view additional materials
  • from hundreds of MIT courses, visit MIT OpenCourseWare
  • at ocw.mit.edu
  • PROFESSOR: In this class, this semester,
  • the other co-lecturer is going to be
  • James Mickens, who is a visiting professor from Microsoft
  • Research.
  • He'll lecture on some other topics like web security
  • later on.
  • But we'll decide later what's going on exactly, in terms
  • of the lecture split up.
  • We also have four TAs this year, Stephen, Webb, [INAUDIBLE],
  • and James.
  • And hopefully you'll meet them in office hours over the year
  • if you need help.
  • So the plan for this class is to understand
  • how to build secure systems, why computer systems sometimes
  • are insecure and how we can make them better,
  • and what goes wrong.
  • And in order to do this, there's not really a great textbook
  • about this topic.
  • So instead, what we're going to do
  • is, each lecture other than this one is
  • going to be focused around some research, typically a paper,
  • that we'll assign on the website and you guys
  • should read ahead of time.
  • And there are some questions that you
  • should answer in the submission system about the paper.
  • You should also submit your own question by 10:00 PM
  • the night before lecture.
  • And then when you come to lecture,
  • we'll actually discuss the paper, figure out,
  • what is the system?
  • What problem does it solve?
  • When does it work?
  • When does it not work?
  • Are these ideas any good in other cases?
  • Et cetera.
  • And hopefully, through these case studies,
  • we'll get some appreciation of how do we actually build
  • systems that are secure.
  • And we have some preliminary schedule up on the website.
  • If there's other topics you guys are particularly interested in,
  • or if there's particular papers you're excited about,
  • just send us email and we'll see if we can
  • fit them in or do something.
  • We're pretty flexible.
  • So if there's anything that you'd
  • like to hear more about, just let us know.
  • And in a similar vein, if you ever have a question
  • or if there's some mistake, just interrupt and ask
  • us what's going on in lecture, anytime.
  • Security is, in many ways, all about the details
  • and getting everything right.
  • And I will inevitably make mistakes.
  • So if something doesn't seem right,
  • there's a good chance it's not.
  • And you should just interrupt and ask.
  • And we'll figure out what's going on
  • and what's the right way to do things.
  • And I guess in terms of the class organization,
  • the other large part of the class,
  • in addition to lectures, is going
  • to be a series of lab assignments.
  • The first one is already posted on the website.
  • And these lab assignments will help
  • you work through understanding a range of security
  • problems and how to prevent them in a simple web server.
  • So in lab one, which is out right now,
  • you'll actually take a web server
  • that we give you and find ways to exploit buffer overflow
  • vulnerabilities in it and take control of this website
  • by just sending it carefully-crafted requests
  • and packets.
  • And in other labs, you'll look at ways
  • to defend the web server, to find bugs in the code,
  • to write worms that run in the user's
  • browser, and other kinds of interesting security problems.
  • One thing that surprises many students
  • is that every lab uses a different language.
  • So lab one is all about C and Assembly.
  • Lab two involves a lot of Python coding.
  • Lab three will be something else.
  • Lab five will be JavaScript.
  • And so on.
  • This is sort of inevitable.
  • And I sort of apologize ahead of time
  • that you're going to have to learn
  • all these languages if you haven't seen them already.
  • In some ways it's useful, because the real world
  • is like this.
  • All the systems are complicated and composed
  • of different parts.
  • And in the long run, it'll be useful for you,
  • for your moral character or something
  • like that, to learn this stuff.
  • But it will take some preparation,
  • especially if you haven't seen these languages before.
  • It might be helpful to start early.
  • In particular, lab one is going to rely
  • on a lot of subtle details of C and Assembly
  • code that we don't really teach in other classes
  • here in as much detail.
  • So it's probably a good idea to start early.
  • And we'll try to get the TAs to hold office hours next week
  • where we'll do some sort of a tutorial session
  • where we can help you get started with understanding what
  • a binary program looks like, how to disassemble it,
  • how to figure out what's on the stack, and so on.
  • All right.
  • And I guess the one other thing, we're actually
  • videotaping lectures this year.
  • So you might be able to watch these online.
  • We'll post them as soon as we get them ourselves
  • from the video people.
  • And the last bit of administrivia:
  • if you have questions online, we're using Piazza.
  • I'm sure you've used this in other classes.
  • All right.
  • So before we dive into security, I need to tell you one thing.
  • MIT has a set of rules for accessing
  • MIT's network, and especially when you're doing security
  • research or playing with security problems,
  • you should be aware that not everything you can technically
  • do is legal.
  • And there's many things that you will learn in this class that
  • are technically possible.
  • We'll understand how systems can be broken or compromised.
  • Doesn't mean you should go out and do this everywhere.
  • And there's this link in the lecture notes
  • we'll post that has some rules that are good guidelines.
  • But in general, if you're in doubt,
  • ask one of the lecturers or a TA as to what you should do.
  • And hopefully it's not too puzzling, what's going on.
  • All right.
  • So any questions about all this administrivia
  • before we dive in?
  • Feel free to ask questions.
  • OK.
  • So what is security?
  • So we'll start with some basic stuff today.
  • And we'll look at just some general examples
  • of why security is hard and what it means to try
  • to build a secure system.
  • Because there's not really a paper,
  • this will not have sort of deep intellectual content, maybe,
  • but it'll give you some background and context for how
  • to think about secure systems.
  • So security, in general, is all about
  • achieving some goal when there is an adversary present.
  • So think of it as there's some bad guy out there that wants
  • to make sure you don't succeed.
  • They want to steal your files.
  • They want to delete your entire hard drive contents.
  • They want to make sure nothing works
  • and your phone doesn't connect, all these things, right?
  • And a secure system is one that can actually
  • do something, regardless of what the bad guy is
  • trying to do to you.
  • So it's kind of cool that we can actually potentially build
  • systems that are resilient to a whole range
  • of bad guys, adversaries, attackers,
  • whatever you want to call them.
  • And we can still build computer systems that
  • allow us to get our work done.
  • And the general way to think about security
  • is sort of break it up into three parts.
  • One part is roughly the policy that you
  • want your system to enforce.
  • This is roughly the goal that you want to achieve.
  • Like well, maybe, only I should be
  • able to read the grades file for 6.858.
  • Or maybe the TAs as well, and all the co-lecturers,
  • et cetera.
  • But there is some statement about what I want my system
  • to be able to do.
  • And then, if you want to think
  • about what kinds of policies you might write,
  • typical ones have to do with either confidentiality of data,
  • so the grades file is only accessible to the 6.858 course
  • staff.
  • Another example of a security policy
  • has something to do with integrity.
  • For example, only the course staff
  • can also modify the grades file.
  • Or only the course staff can upload the final grades
  • to the registrar's office.
  • That'll be great.
  • Then you can also think about things like availability.
  • So for example, a website should be available,
  • even if the bad guys try to take it down and mount
  • some sort of a DOS-- Denial of Service-- attack on it.
  • So this is all well and good.
  • So these are the policies that we might actually
  • care about from a system.
  • But because it's security, there's a bad guy involved.
  • We need to understand, what are we thinking
  • the bad guy is going to do?
  • And this is typically what we call a threat model.
  • And this is basically just a set of assumptions
  • about the bad guy or adversary.
  • And it's important to have some sort of assumptions
  • about the bad guy because, if the bad guy is omnipresent,
  • is everywhere at once, and can do anything they want,
  • it's going to be hard to achieve some semblance of security.
  • So for example, you probably want
  • to assume the bad guy doesn't exactly know your password,
  • or they don't actually have physical access to your phone
  • and your keys and your laptop.
  • Otherwise, it's going to be hard to make some sort of progress
  • in this game.
  • And it turns out this is actually quite tricky
  • to come up with, but I guess one general rule is it's
  • much better to err on the side of caution
  • and be conservative in picking your threat model,
  • because the bad guy might always surprise you
  • in terms of what they might be able to do in practice.
  • And finally, in order to achieve security, in order
  • to achieve our goal under the set of assumptions,
  • we're going to look at some mechanism.
  • And this is the, basically, software or hardware
  • or whatever part of system design,
  • implementation, et cetera, that's
  • going to try to make sure our policy is followed
  • as long as the bad guy follows the threat model.
  • So the end result is that, as long
  • as our threat model was correct, hopefully we'll
  • satisfy our policy.
  • And it has to be the case that the mechanism doesn't screw up.
  • Make sense?
  • Fairly high level story about how
  • to think about this kind of stuff.
  • So why is this so hard, right?
  • It seems like a simple plan.
  • You write down these three things,
  • and you're off and running.
  • But in practice, as you, I'm sure, have seen in the world,
  • computer systems are almost always compromised
  • in some way or another.
  • And break ins are pretty commonplace.
  • And the big reason why security tends
  • to be a difficult problem is because what we have here
  • is sort of, and this will be familiar to those of you who
  • took 6.033, a negative goal,
  • meaning that we have to make sure our security policy is
  • followed regardless of what the attacker can do.
  • So just by contrast, if you want to build a file system,
  • and you want to make sure that my TAs can access the grades
  • file, that's pretty easy.
  • I just ask them, hey, can you guys test and see?
  • Can you access the grades file?
  • And if they all can access it, done.
  • The system works.
  • But if I want to say that no one other than the TAs
  • can access the grades file, this is a much harder problem
  • to solve, because now I have to figure out
  • what all these non-TA people in the world could do to try
  • to get my grades file, right?
  • They could try to just open it and read it.
  • Maybe my file system will disallow it.
  • But they might try all kinds of other attacks,
  • like guessing the password for the TAs
  • or stealing the TAs' laptops or breaking into the room
  • or who knows, right?
  • This is all stuff that we have to really put
  • into our threat model.
  • For this class, I'm probably not concerned enough about
  • the grades file to worry about these guys' laptops being
  • stolen from their dorm rooms.
  • Although maybe I should be.
  • I don't know.
  • It's hard to tell, right?
  • And as a result, this security game
  • is often not so clear cut as to what
  • the right set of assumptions to make is.
  • And it's only after the fact that you often realize,
  • well, I should have thought of that.
  • All right.
  • And sort of, as a result, this is very much
  • an iterative process.
  • And the thing you end up realizing at every iteration
  • is, well, here's the weakest link into my system.
  • Maybe I got the threat model wrong.
  • Maybe my mechanism had some bugs in it, because it's software,
  • and these are going to be large systems.
  • They'll have lots of bugs.
  • And you sort of fix them up.
  • You change your threat model a bit.
  • And you iterate and try to design a new system,
  • and hopefully, make things better.
  • So one possible interpretation of this class-- well,
  • one danger-- is that you come away thinking, man, everything
  • is just broken.
  • Nothing works.
  • We should just give up and stop using computers.
  • And this is one possible interpretation.
  • But it's probably not quite the right one.
  • The reason this is going to come up
  • or you're going to think this way
  • is because, throughout this class,
  • we're going to look at all these different systems,
  • and we're going to sort of push them to the edge.
  • We're going to see, OK, well, what if we do this?
  • Is it going to break?
  • What if we do that?
  • Is it going to break then?
  • And inevitably, every system is going
  • to have some sort of a breaking point.
  • And we'll figure out, oh hey.
  • This system, we can break into it if we push this way.
  • And this system doesn't work under these set of assumptions.
  • And it's inevitable that every system
  • will have a breaking point.
  • But that doesn't mean that every system is worthless.
  • It just means you have to know when
  • to use every system design.
  • And it's sort of useful to do this pushing exercise
  • to find the weaknesses so that you
  • know when certain ideas work, when certain ideas are not
  • applicable.
  • And in reality, this is a little more of a fuzzy boundary, right?
  • The more secure you make your system, the less likely
  • you'll have some embarrassing story on the front page of the New
  • York Times saying your startup company
  • leaked a million people's social security numbers.
  • And the less money you pay to recover from that disaster.
  • And I guess one actually positive note on security
  • is that, in many ways, security enables cool things that you
  • couldn't do before, because security mechanisms,
  • especially ones that allow us to protect
  • against certain classes of attacks, are pretty powerful.
  • As one example, the browser used to be fairly boring in terms
  • of what you could do with it.
  • You could just view web pages, maybe
  • run some JavaScript code in it.
  • But now there's all these cool mechanisms
  • we'll learn about in a couple of weeks
  • that allow you to run arbitrary x86 native code in the web
  • browser and make sure it doesn't do
  • anything funny to your machine.
  • And there's a technique or system
  • called Native Client from Google that actually
  • allows us to do this securely.
  • And before, in order to run some native game on your machine,
  • you'd have to download and install it, click on lots of dialog
  • boxes, say yes, I allow this.
  • But now, you can just run it in a browser,
  • no clicking required.
  • It just runs.
  • And the reason it's so easy and powerful
  • is that our security mechanism can sandbox this program
  • without having to assume anything about the user choosing
  • the right game and not some malicious game to play
  • on their computer, or some other program to run.
  • So in many ways, good security mechanisms
  • are going to enable constructing cool new systems that weren't
  • possible to construct before.
  • All right.
  • Make sense?
  • Any questions about this story?
  • All right.
  • So I guess in the rest of the lecture,
  • I want to go through a bunch of different examples of how
  • security goes wrong.
  • So, so far, we've seen how you can think of it.
  • But inevitably, it's useful to see examples
  • of what not to do so that you can have a better mindset when
  • you're approaching security problems.
  • And in this sort of breakdown of a security system,
  • pretty much every one of these three things goes wrong.
  • In practice, people get the policy wrong,
  • people get the threat model wrong,
  • and people get the mechanism wrong.
  • And let's, I guess, start with policies and examples
  • of how you can screw up a system's policy.
  • Maybe the cleanest or sort of simplest example of this
  • are account recovery questions.
  • So typically, when you sign into a website,
  • you provide a password.
  • But what happens if you lose your password?
  • Some sites will send you email if you
  • lose your password with a link to reset your password.
  • So it's easy enough, if you have another email address.
  • But what if this is your email provider?
  • So at least, several years ago, Yahoo
  • hosted email, webmail, for anyone on the internet.
  • And when you forgot your Yahoo password,
  • they couldn't really send you email
  • because you couldn't get it.
  • So instead, they had you register
  • a couple of questions with them that hopefully only you know.
  • And if you forget your password, you can click on a link
  • and say, well, here's the answers to my questions.
  • Let me have my password again.
  • And what turns out to be the case, and what
  • some people failed to realize, is that this changes your policy,
  • because before, the policy of the system
  • was that the people who can log in are the people who
  • know the password.
  • And when you introduce these recovery questions,
  • the policy becomes, well, you can log in
  • if you know either the password or those security questions.
  • So it strictly weakens the security of your system.
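  • In code form, the weakened policy looks roughly like this; a hypothetical sketch with made-up names, not Yahoo's actual logic:

```c
#include <string.h>

/* Hypothetical sketch: once recovery questions exist, login succeeds
 * on EITHER path, so the system is only as strong as the weaker one. */
static int check_password(const char *supplied, const char *stored) {
    return strcmp(supplied, stored) == 0;
}

static int check_recovery_answers(const char *supplied, const char *stored) {
    return strcmp(supplied, stored) == 0;
}

int can_log_in(const char *pw, const char *stored_pw,
               const char *answers, const char *stored_answers) {
    return check_password(pw, stored_pw) ||               /* old policy */
           check_recovery_answers(answers, stored_answers); /* new path */
}
```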
  • And many people have actually taken advantage of this.
  • One sort of well known example is, I think a couple years ago,
  • Sarah Palin had an email account at Yahoo.
  • And her recovery questions were things like, well,
  • where'd you go to school?
  • What was your friend's name?
  • What's your birthday?
  • Et cetera.
  • These were all things written on her Wikipedia page.
  • And as a result, someone can quite easily,
  • and someone did, actually, get into her Yahoo email account
  • just by looking up on Wikipedia what her high school was
  • and what her birthday was.
  • So you really have to think carefully
  • about the implications of different security
  • policies you're making here.
  • Perhaps a more intricate and, maybe, interesting example,
  • is what happens when you have multiple systems that start
  • interacting with one another.
  • So there's this nice story about a guy called Mat Honan.
  • Maybe you read this story a year or two ago.
  • He's an editor at this wired.com magazine.
  • And he had a bit of a problem.
  • Someone basically got into his Gmail account
  • and did lots of bad things.
  • But how did they do it, right?
  • So it's kind of interesting.
  • So all parties in this story seem
  • to be doing reasonable things.
  • But we'll see how they add up to something unfortunate.
  • So we have Gmail.
  • And Gmail lets you reset your password
  • if you forget, as does pretty much every other system.
  • And the way you do a reset at Gmail
  • is you send them a reset request.
  • And in this case, they weren't
  • going to do these recovery questions, at least
  • not for this guy.
  • What they do is they send you a recovery link to a backup email
  • address, or some other email address that you have.
  • And helpfully, they actually print the email address for you.
  • So for this guy's account, someone
  • went and asked Gmail to reset the password.
  • And they said, well, yeah.
  • Sure.
  • We just sent the recovery link to this email,
  • foo@me.com, which was some Apple email service.
  • OK, but the bad guy doesn't have access to me.com, either.
  • But they want to get this password reset
  • link to get access to Gmail.
  • Well, the way things worked was that,
  • in Apple's case, this me.com site,
  • allowed you to actually reset your password if you know
  • your billing address and the last four digits of your credit
  • card number.
  • So it's still not clear how you're going to get this
  • guy's-- well, home address, maybe you could look it up
  • somewhere.
  • This guy was a well known person at the time.
  • But where do you get the last four digits of his credit card
  • number?
  • Well, not clear, but let's keep going further.
  • So you need to send these things to me.com to get access
  • to his email account there.
  • Well, it turns out this guy had an account at Amazon, which
  • is another party in this story.
  • Amazon really wants you to buy things.
  • And as a result, they actually have a fairly elaborate account
  • management system.
  • And in particular, because they really want you to buy stuff,
  • they don't require you to sign in
  • in order to purchase some item with a credit card.
  • So I can actually go on Amazon, or at least at the time,
  • I was able to go on Amazon and say, well, I'm this user.
  • And I want to buy this pack of toothbrushes.
  • And if I wanted to use the saved credit card
  • number in the guy's account, I shouldn't be able to do this.
  • But if I just provided a new credit card, what Amazon
  • would do is actually add that new credit
  • card to the guy's account.
  • So that seems not too bad, right?
  • I'm basically ordering toothbrushes
  • through one of your Amazon accounts.
  • But it's not your credit card anyway.
  • It's just my credit card number being used.
  • So it's not clear how things go wrong yet.
  • But Amazon had another interface.
  • All these are complicated systems.
  • And Amazon had an interface for password reset.
  • And in order to reset a password in Amazon,
  • what you had to provide is just one of the user's credit card
  • numbers.
  • So I can order stuff and add a credit card number
  • to your account.
  • And then I can say, hey, I want to reset my password.
  • This is one of my credit card numbers.
  • And this, in fact, worked.
  • So this is where the bad guy got a hold of this guy's, Mat's,
  • Amazon account.
  • But OK.
  • How do you fish out the credit card number
  • needed for the password reset on Apple's site?
  • Well, Amazon was actually very careful.
  • Even if you break into someone's Amazon account,
  • it will not print you the saved credit card
  • numbers from that person.
  • But it will show the last four digits.
  • Just so you know which credit card you're talking about.
  • So you can list all the credit cards, other than the one
  • you added.
  • You can then go and break into me.com.
  • You can click on this link and get access
  • to the guy's Gmail account.
  • This is all very subtle stuff.
  • And in isolation, each system seems
  • to be doing somewhat sensible things.
  • But it's actually quite hard to reason
  • about these vulnerabilities and weaknesses
  • unless you have this whole picture explained to you
  • and you've sort of put all the pieces together.
  • So this is actually fairly tricky stuff.
  • And unfortunately, well, much like for every one
  • of these three categories, the answer for how to avoid this
  • is often to think hard and be careful.
  • I guess the one general plan is, be conservative in terms
  • of what you set your policy to be,
  • to maybe not depend on things other sites might reveal.
  • So well, I'm not sure if any really great advice would
  • have prevented this problem.
  • But now you know.
  • And now you'll make other mistakes.
  • There's many other examples of policies
  • going wrong and allowing a system to be compromised.
  • That's interesting enough.
  • But let's look at how people might screw up threat models.
  • So let me turn off this blue square.
  • OK.
  • So what are examples of threat models that go wrong?
  • Well, probably a big one in practice is human factors.
  • So we often make assumptions about what
  • people will do in a system, like they
  • will pick a good, strong password,
  • or they will not click on random websites
  • that they get through email and enter their password there.
  • So these are-- well, as you probably suspect,
  • and in practice, happens to be the case,
  • these are not good assumptions in all cases.
  • And people pick bad passwords.
  • And people will click on random links.
  • And people will enter their password
  • on sites that are actually not the right site at all.
  • And they will not be paying a lot of attention.
  • So you probably don't want to have threat models that
  • make very strong assumptions about what
  • humans will do because inevitably, something
  • will go wrong.
  • Make sense?
  • Any questions?
  • All right.
  • Another sort of good thing to watch out for in threat models
  • is that they sometimes change over time.
  • Or whether something is a good assumption or not
  • changes over time.
  • One example of this is actually at MIT in the mid '90s-- mid
  • '80s, actually-- Project Athena developed
  • this system called Kerberos.
  • And we'll read about this in a couple of weeks in this class.
  • And at the time, they were sort of figuring out, well, Kerberos
  • is going to be based on cryptography.
  • So we need to pick some size keys
  • to make sure they're not going to be
  • guessed by arbitrary people.
  • And they said, OK.
  • Well you know, 56-bit keys, at the time,
  • for this cipher called DES, seemed like a plausible size.
  • Maybe not great, but certainly not entirely unreasonable.
  • And this was in the mid '80s.
  • But then you know, this system got popular and got used a lot.
  • MIT still uses it.
  • And they never really went back to seriously revisit
  • this assumption.
  • And then, a couple years ago, a group of 6.858 students
  • figured out that actually, yeah, you can just break this, right?
  • It's easy enough to enumerate all 2^56 keys these days.
  • Computers are so fast, you can just do it.
  • And as a result, they were able to,
  • with the help of some hardware from a particular web
  • service-- we'll have some links in the lecture notes-- they were
  • able to get, basically, anyone's Kerberos account key in roughly
  • a day.
  • And so this assumption was good in the mid 1980s.
  • No longer a good assumption today.
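  • To see the arithmetic behind that (the sustained cracking rate below is an illustrative back-of-the-envelope figure, not a measurement):

$$2^{56} \approx 7.2 \times 10^{16} \text{ keys}, \qquad \frac{7.2 \times 10^{16} \text{ keys}}{86{,}400 \text{ s/day}} \approx 8.3 \times 10^{11} \text{ keys/s}.$$

  • So finishing in about a day takes on the order of $10^{12}$ trials per second: hopeless for a 1980s machine, but feasible for modern special-purpose hardware such as FPGA clusters.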
  • So you really have to make sure your assumptions
  • sort of keep up with the times.
  • Maybe a more timely example is, if your adversary--
  • or if you're worried about government attacks,
  • you might realize that you shouldn't trust hardware even
  • these days, right?
  • There was all these revelations about what
  • the NSA is capable of doing.
  • And they have hardware back doors
  • that they can insert into computers.
  • And maybe up until a couple years ago, well, who knows?
  • I guess we didn't know about this stuff.
  • So maybe it was a reasonable assumption
  • to assume your laptop is not going
  • to be compromised physically, the hardware itself.
  • But now you know.
  • Actually, if you're worried about the government
  • being after you, you probably have a much harder problem
  • to deal with because your laptop might
  • be compromised physically, regardless
  • of what you install in it.
  • So you really have to be careful with your threat model
  • and really sort of balance it against who
  • you think is out to get you.
  • I think it's going to be a very expensive proposition if you're
  • going to try to protect yourself from the NSA, really.
  • On the other hand, if you're just protecting yourself
  • from random other students that are,
  • I don't know, snooping around in your Athena home directory
  • or whatnot, maybe you don't have to worry
  • about this stuff as much.
  • So it's really a balancing game and picking the right threat
  • model.
  • Another example of a bad threat model shows up in the way
  • secure websites these days check certificates of a website
  • that you're connecting to.
  • So in this SSL protocol or TLS, when you connect to a website
  • and it says HTTPS-- we'll talk much more about this
  • in later lectures-- but what happens
  • is that the site you're connecting to presents you
  • a certificate signed by one of the certificate authorities
  • out there that attests that, yep, this key
  • belongs to Amazon.com.
  • And architecturally, the sort of mistake
  • or the bad threat model that these guys assumed
  • is that all these CAs are going to be trustworthy.
  • They will never make a mistake.
  • And in fact, the way the system works
  • is that there's hundreds of these CAs out there.
  • The Indian postal authority, I think, has a CA.
  • The Chinese government has a CA.
  • Lots of entities are certificate authorities in this design.
  • And any of them can make a certificate
  • for any host name or a domain name.
  • And as a result, what happens if you're a bad guy,
  • if you want to compromise Gmail or if you want to impersonate
  • Gmail's website, you just have to compromise
  • one of these certificate authorities.
  • And it turns out the weakest link is probably
  • some poorly run authority somewhere in some,
  • you know, not particularly up to date country.
  • Who knows, right?
  • And as a result, it's probably a bad assumption
  • to build a system-- or it's a bad idea
  • to build a system around the assumption
  • that you'll manage to keep all 300 certificate
  • authorities spread out around the globe perfectly secure.
  • But yet, that's the assumption underpinning
  • the security mechanism of today's SSL protocol used
  • by web browsers.
  • And there's sort of many other, I guess, examples
  • that are things you might not have thought of.
  • Another sort of amusing example from the 1980s was DARPA.
  • This defense agency, at the time,
  • really wanted to build secure operating systems.
  • And they actually went so far as to get
  • a bunch of universities and researchers
  • to build secure OS prototypes.
  • And then they actually got a red team,
  • like a team of bad guys pretending to be the attackers,
  • and told them, well, go break into these secure operating
  • systems any way you can.
  • We actually want to know, is it secure?
  • And it's kind of amusing, some of the surprising ways
  • they compromised the systems.
  • One was that there was this OS research
  • team that seemed to have a perfectly secure OS,
  • but it got compromised.
  • And the way it happened is that the server in which the source
  • code of the operating system was stored
  • was some development machine in someone's office
  • that wasn't secured at all.
  • But that had all the source code.
  • So the bad guys broke into that server.
  • It was not protected very well.
  • Changed the source code of the operating system
  • to introduce a back door.
  • And then, when the researchers built their operating systems,
  • well, it had this back door.
  • And the bad guys were able to break in.
  • So you really have to think about all the possible sort
  • of assumptions you're making about
  • where your software is coming from,
  • about how the bad guy can get in,
  • in order to make sure your system is really secure.
  • And there's many other examples in the lecture notes,
  • if you want more anecdotes.
  • You can page through those.
  • Probably the most pervasive problem that shows up,
  • of course, is in mechanisms, though.
  • And in part, it's because mechanisms
  • are the most complicated part of the story.
  • It's the entirety of all the software and hardware
  • and all that sort of system components
  • that make up what is trying to enforce your security policy.
  • And there's no end of ways in which mechanisms can fail.
  • And, partly as a result, much of this class
  • will focus pretty heavily on mechanisms
  • and how do you make mechanisms that are secure,
  • that provide correct enforcement of security policies.
  • And we'll talk about threat models and policies as well.
  • But it turns out it's much easier to make clean,
  • sort of crisp statements about mechanisms and ways
  • they work and don't work, as opposed to policies and threat
  • models which, really, you have to figure out
  • how to fit them into a particular context
  • where you're using a system.
  • So let's look at some examples of, I guess, mechanism bugs.
  • One that you might have heard in the last couple of days
  • was a problem in the security mechanism in Apple's cloud
  • infrastructure called iCloud.
  • Well actually, any one of you that has an iPhone
  • might be using this iCloud service.
  • They basically provide storage for files
  • and let you find your iPhone if you lose it, and probably
  • lots of other useful features.
  • And I think it's some relative of this me.com service
  • that was implicated in this scheme a couple years back.
  • And the problem that someone discovered
  • in this iCloud service is that they
  • didn't enforce the same sort of mechanism at all interfaces.
  • OK, so what does iCloud look like?
  • Well, it basically provides lots of services for the same sort
  • of set of accounts.
  • So maybe you have your file storage on iCloud.
  • Maybe you have your photo sharing.
  • Maybe you have other interfaces.
  • And one of the interfaces into iCloud--
  • these are all sort of at different APIs
  • that they provide-- was this feature to find my iPhone,
  • I think.
  • And all these interfaces want to make sure
  • that you are the right user, you're authenticated correctly.
  • And unfortunately for the developers of
  • this iCloud system, you know, it's a giant piece of software.
  • I'm sure lots of developers worked on this.
  • But on this particular interface,
  • the find my iPhone interface, when
  • you tried to log in with a username and password,
  • they didn't keep track of how many times you had tried to log in.
  • And the reason this is important is that, as I mentioned earlier,
  • humans are not that great at picking good passwords.
  • So actually building a system that authenticates users
  • with passwords is pretty tricky.
  • We'll actually read a whole paper about this later on.
  • But one good attack strategy is: there's probably
  • a million passwords out there that will account
  • for 50 percent of accounts.
  • So if you can make a million attempts
  • at someone's account, then there's
  • a good chance you'll get their password, because people
  • actually pick predictable passwords.
  • And one way to try to defeat this
  • is to make sure that your system doesn't
  • allow an arbitrary number of attempts
  • to log in to an account.
  • Maybe after three or 10 tries, you
  • should say, well, you've had enough tries.
  • Time out.
  • You can try again in 10 minutes or in an hour.
  • And this way you really slow down the attacker.
  • So they can only make a handful of guesses a day,
  • instead of millions of guesses.
  • And as a result, even if you have not the greatest
  • of passwords, it's going to be pretty hard for someone
  • to guess it.
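  • A minimal sketch of this kind of back-off logic, with made-up names and limits; the key point is that every login interface has to call it:

```c
#include <time.h>

#define MAX_ATTEMPTS 10      /* assumed limit, for illustration */
#define LOCKOUT_SECS 600     /* assumed lockout window */

struct account {
    int failed_attempts;
    time_t locked_until;
};

/* Returns 1 if a login attempt may proceed, 0 if throttled. */
int attempt_allowed(const struct account *a) {
    return time(NULL) >= a->locked_until;
}

/* Call on every failed login; locks the account after too many. */
void record_failure(struct account *a) {
    if (++a->failed_attempts >= MAX_ATTEMPTS) {
        a->locked_until = time(NULL) + LOCKOUT_SECS;
        a->failed_attempts = 0;
    }
}
```

  • The iCloud bug was, in effect, one interface never calling something like attempt_allowed() before checking the password.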
  • What happened is that iCloud had this password-guessing
  • prevention, basically back-off, on some interfaces:
  • if you tried to log in through those interfaces
  • and you failed 10 times, it would say, well, sorry.
  • You have to wait until you can try again.
  • But on this find my iPhone interface,
  • they forgot this check.
  • That's probably, you know, some guy just
  • forgot to call this function in this API.
  • But the result is that, for the same set of accounts,
  • a bad guy would now be able to guess your password
  • through this interface at millions of attempts per day,
  • easily, because this is limited only by how fast they
  • can send packets to this iCloud thing.
  • And they can probably guess your password
  • with pretty good accuracy, or with pretty good success rate,
  • after making many guesses.
  • And this led to some unfortunate break ins.
  • And people's confidential data got stolen
  • from this iCloud service.
  • So this is sort of an example where you had the right policy.
  • Only the user with the right password
  • would get access to the files.
  • You even had the right threat model:
  • the bad guy might be able to guess the password,
  • so we'll have to rate limit the number of guess attempts.
  • But the mechanism had a bug in it.
  • Someone just forgot to enforce this right policy
  • at some interface.
  • And this shows up again and again in systems,
  • where someone just made a mistake, and it has pretty drastic
  • effects on the security of the overall system.
  • This make sense?
  • Any questions so far?
  • All right.
  • OK.
  • So another example-- this is sort of an example of you
  • forget to check for password guessing attempts.
  • There's many other things you can forget.
  • You could forget to check for access control altogether.
  • So one example is, Citibank had a website-- actually, still
  • has a website that allows you to look at your credit card
  • account information.
  • So if you have a credit card with Citibank,
  • you go to this website, it tells you,
  • yeah, you have this credit card.
  • Here's all the charges, all this great stuff.
  • And the workflow a couple of years ago was that you go
  • to some site, you provide a login username and password,
  • and you get redirected to another URL,
  • which is something like, I don't know, I'm guessing,
  • but basically like citi.com/account?id= you know,
  • whatever, one two three four.
  • And it turns out that some guy figured out, well,
  • if you change this number, you just
  • get someone else's account.
  • And it's not clear quite how to think of this.
  • One possibility is that these guys were just thinking right,
  • but they, again, forgot a check in this account
  • page: not only is this a valid ID number,
  • but is it also the ID number of the guy that's
  • currently logged in?
  • It's an important check to make.
  • But it's easy to forget.
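  • Here's a sketch of the check that was missing; the names are hypothetical, since the real code isn't public:

```c
/* Hypothetical sketch of the missing authorization check. */
struct session { int user_id; };

static int account_exists(int id) {
    return id > 0;   /* stand-in for a real database lookup */
}

int may_view_account(const struct session *s, int requested_id) {
    /* A valid ID is not enough; the account must also belong to
     * the logged-in user. Dropping the second clause is exactly
     * the change-the-number-in-the-URL bug. */
    return account_exists(requested_id) && requested_id == s->user_id;
}
```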
  • Another possibility is, maybe these guys were thinking
  • no one could hit these URLs.
  • Maybe they had a bad threat model, right?
  • Maybe they're thinking, the URL--
  • if I don't print this URL, no one can click on it.
  • That's a bad threat model.
  • So maybe that's it-- well, it's hard to tell exactly what went
  • wrong.
  • But anyway, these mistakes do happen.
  • And they show up a lot.
  • So it's easy to have small, seemingly innocuous bugs
  • in your mechanism lead to pretty unfortunate consequences.
  • Another example, one that's not so much about missing checks,
  • is a problem that showed up on Android
  • phones a couple of months ago.
  • Maybe I'll use this board over here.
  • So the problem was related to Bitcoin, which is this-- well,
  • I'm sure you've heard-- this electronic currency
  • system that's pretty popular these days.
  • And the way that Bitcoin works, at a very high level,
  • is that your balance of Bitcoins is
  • associated with a private key.
  • And if you have someone's private key
  • you can, of course, spend their Bitcoins.
  • So the security of Bitcoin relies quite heavily
  • on no one else knowing your private key.
  • It's kind of like a password, except it's even more
  • important, because people can probably make lots of guesses
  • at your private key.
  • And there's no real server that's checking your key.
  • It's just cryptography.
  • So any machine can try to make lots of guesses
  • at your private key.
  • And if they guess it, then they can transfer your Bitcoins
  • to someone else.
  • And as a result, it's critically important
  • that you generate good, random keys
  • that no one else can guess.
  • And there are people using Bitcoin on Android.
  • And the Android applications for Bitcoin were getting random
  • values for these keys using this Java API called SecureRandom(),
  • which sounds great, but as people figured out, well, OK.
  • What it is, right, is it doesn't really produce truly random numbers.
  • Inside of it, there's this construction
  • called a pseudorandom number generator,
  • or PRNG: given a particular seed
  • value, like maybe a couple of hundred
  • bits of randomness that you shove into this PRNG,
  • you can keep asking it for more randomness and sort of stretch
  • these seed bits into as many random bits as you want.
  • So you seed it initially, and then you
  • can generate as many random bits as you want.
  • And for various cryptographic reasons I won't go into here,
  • it actually works.
  • If you give it a couple of hundred really good random bits
  • initially, it's going to be very hard for anyone
  • to predict what the pseudorandom values it's generating are.
  • But the problem is that this Java library
  • had a small bug in it.
  • In some set of circumstances, it forgot
  • to initialize the PRNG with a seed,
  • so it was just all zeros, which means that everyone could just
  • figure out what your random numbers were.
  • If they start with zeros, they'll
  • produce the same random numbers as you,
  • which means they'll produce the same private key as you.
  • So they can just generate the same private key
  • and transfer your Bitcoins.
  • So this is, again, a small or not small bug,
  • depending on, I guess, who is asking.
  • But nonetheless, right?
  • Another example of small programming mistakes
  • leading to pretty catastrophic results.
  • Lots of people got their Bitcoin balances stolen
  • because of this weakness.
  • Of course, the fix is pretty simple at some level.
  • You change the Java implementation
  • of SecureRandom() to always seed this PRNG with random input
  • bits.
  • And then, hopefully, you're in good shape.
  • But still, that's yet another example of mechanism failure.
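  • Here's a toy illustration of the failure mode, using a simple xorshift generator rather than the actual SecureRandom internals:

```c
#include <stdio.h>
#include <stdint.h>

/* Toy PRNG (xorshift64): completely deterministic given its seed. */
static uint64_t state;

static uint64_t prng_next(void) {
    state ^= state << 13;
    state ^= state >> 7;
    state ^= state << 17;
    return state;
}

int main(void) {
    state = 1;  /* the "forgotten" seed: a fixed value, not secret at all */
    for (int i = 0; i < 4; i++)
        printf("key word %d: %016llx\n", i,
               (unsigned long long)prng_next());
    /* Every device that forgets to seed prints this exact output,
     * so an attacker can derive the same private key. */
    return 0;
}
```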
  • Yeah?
  • AUDIENCE: Just to be clear, is this a different attack
  • from the DSA signature randomness?
  • PROFESSOR: Well yeah.
  • So the actual problem is a little bit more
  • complicated, as you're hinting at.
  • The problem is, even if you didn't generate
  • your key on the Android device in the first place,
  • the particular signature scheme used by Bitcoin
  • assumes that every time you generate a new signature
  • with that key, you use a fresh, what's
  • called a nonce, for generating that signature.
  • And if you ever generate two signatures with the same nonce,
  • then someone can figure out what your key is.
  • The story is pretty similar.
  • But the details are a little different.
  • So yeah, even if you actually generated your key somewhere
  • else and your key was great, it's just that if you ever
  • generated two signatures with exactly the same nonce,
  • or random value, someone could apply some clever math
  • to your signatures and sort of extract your public key out
  • of it.
  • Or private key, more importantly.
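  • For reference, here's the clever math, a standard derivation for ECDSA, the signature scheme Bitcoin uses. Two signatures $(r, s_1)$ and $(r, s_2)$ on message hashes $z_1$ and $z_2$, made with the same nonce $k$ and private key $d$, share the same $r$; all arithmetic is mod the group order $n$:

$$s_1 = k^{-1}(z_1 + r d), \quad s_2 = k^{-1}(z_2 + r d) \;\;\Rightarrow\;\; k = \frac{z_1 - z_2}{s_1 - s_2}, \quad d = \frac{s_1 k - z_1}{r}.$$

  • So two nonce-sharing signatures plus public data are enough to recover the private key $d$.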
  • All right.
  • Other questions about these problems, examples, et cetera?
  • All right.
  • So I guess, one thing I wanted to point out is that actually,
  • well, as you're starting to appreciate,
  • is that in computer security, almost every detail has
  • a chance of really mattering.
  • If you screw up something seemingly inconsequential,
  • like forgetting to check something,
  • or forgetting to initialize the random seed,
  • it can have pretty dramatic consequences
  • for the overall system.
  • And you really have to be very clear about,
  • what is the specification of your system?
  • What is it doing?
  • Exactly what are all the corner cases?
  • And a good way to sort of think of breaking a system
  • or, conversely, figure out if your system is secure,
  • is to really push all the edge cases,
  • like what happens if my input is just large enough?
  • Or what is the biggest or the smallest input?
  • What is the sort of strangest set
  • of inputs I could provide to my program
  • and push it in all these corner cases?
  • One example of this ambiguity, sort of a good example
  • to keep in mind, is how SSL certificates, again,
  • encode names into the certificate itself.
  • So this is a different problem than the problem
  • about the certificate authorities being trusted.
  • So these SSL certificates are just sequences of bytes
  • that a web server sends to you.
  • And inside of this SSL certificate
  • is the name of the server you're connecting to,
  • so something like Amazon.com.
  • You know, you can't just put down those bytes.
  • You have to encode it somehow and specify, well,
  • it's Amazon.com.
  • And that's the end of the string.
  • So in SSL certificates, they use a particular encoding scheme
  • that writes down Amazon.com by first writing down
  • the number of bytes in the string.
  • So you first write down, OK.
  • Well, I'm going to have a 10 byte string called Amazon.com.
  • That's actually 10 bytes.
  • Great.
  • OK.
  • So this is like-- in the SSL certificate, somewhere
  • in there, there is this byte 10 followed by 10 bytes saying
  • what the host name is.
  • And there's other stuff afterwards, right, and before.
  • And when a browser takes it, well, the browser
  • is written in C. And the way C represents strings
  • is by null terminating them.
  • So in C, a string doesn't have a length count.
  • Instead, it has all the bytes.
  • And the end of the string is just the byte zero.
  • And in C, you write it with a backslash zero character.
  • So this is in memory in your browser.
  • Somewhere in memory there's this string
  • of 11 bytes, now, with an extra zero at the end.
  • And when a browser interprets this string,
  • it just keeps going until it sees an end of string
  • marker, which is a zero byte.
  • OK.
  • So, what could go wrong?
  • Any guesses?
  • Yeah?
  • AUDIENCE: You have a zero in the middle [INAUDIBLE]?
  • PROFESSOR: Yes.
  • This is great.
  • All right.
  • So, this is actually a bit of a discontinuity
  • in terms of how this guy represents
  • strings versus how this guy does.
  • So suppose that I own the domain foo.com.
  • So I can get certificates for anything dot foo dot com.
  • So what I could do is ask for a certificate for the name
  • amazon.com\0x.foo.com, where \0 is a zero byte.
  • That's a perfectly valid string.
  • It has a bunch of bytes.
  • I guess it's 10 bytes for amazon.com, one for the zero byte,
  • and another nine for x.foo.com, so 20, right?
  • So this is a 20 byte name with these 20 bytes.
  • So it used to be that if you go to a certificate authority,
  • in many cases, you could say, hey, I own foo.com.
  • Give me a certificate for this thing.
  • And they'd be perfectly willing to do it because it's
  • a subdomain of foo.com.
  • It's all yours.
  • But then, when a browser takes this string
  • and loads it in memory, well, what it does is the same thing
  • it did here.
  • It copies the string.
  • amazon.com\0x.foo.com.
  • It'll dutifully add the terminating zero at the end.
  • But then, when the rest of the browser software
  • goes and tries to interpret the string at this memory location,
  • it'll keep going up until it gets to zero and say, OK well,
  • that's the end of the string.
  • So this is Amazon.com.
  • That's it.
  • So this sort of disconnect between how C software
  • and how SSL certificates represent names
  • led to some unfortunate security problems.
  • This was actually discovered a number of years
  • ago now by this guy, Moxie Marlinspike.
  • But it's a fairly clever observation.
  • And these kinds of encoding bugs are actually also
  • pretty common in lots of software
  • because, unless you're very diligent about exactly how you
  • encode things, there might be different ways of encoding.
  • And whenever there's disagreement,
  • there's a chance the bad guy can take advantage of this.
  • One system thinks that's a fine name.
  • Another thinks it's something else entirely.
  • So these are good places to sort of push a system
  • to see how it might break.
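  • Here's a small self-contained sketch of exactly this mismatch between the certificate's length-prefixed name and C's zero-terminated strings:

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    /* The 20-byte name as it would sit in the certificate:
     * "amazon.com" + '\0' + "x.foo.com" */
    const char cert_name[] = "amazon.com\0x.foo.com";
    size_t cert_len = 20;        /* the certificate's length field */

    char in_memory[64];
    memcpy(in_memory, cert_name, cert_len);
    in_memory[cert_len] = '\0';  /* browser appends a terminator */

    /* C string functions stop at the embedded zero byte, so the
     * browser compares only "amazon.com" against the host name. */
    printf("cert length: %zu, C code sees: \"%s\" (%zu bytes)\n",
           cert_len, in_memory, strlen(in_memory));
    return 0;
}
```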
  • That make sense?
  • All right.
  • So maybe the last example of mechanism failure
  • I'm going to talk about today is a reasonably popular one.
  • It's this problem of buffer overflows.
  • So some of you have seen this, at least at some level,
  • in 6.033, if you did the undergrad course.
  • But for those of you that have forgotten or haven't taken
  • 6.033, we'll sort of go over buffer overflows
  • in more detail.
  • And this will be, actually, quite critical for you guys,
  • because lab one is all about buffer overflows.
  • And you're going to be exploiting
  • these vulnerabilities in a somewhat real web server.
  • So let's figure out, what is the setting?
  • What are we talking about here?
  • So the setting we're going to be considering
  • is a system which has, let's say, a web server.
  • So what we have is, we have some computer out there
  • that has a web server on it.
  • And the web server is a program that
  • is going to accept connections from the outside world,
  • take requests-- which are basically just packets--
  • and somehow process them, and do some checking, probably.
  • If it's an illegal URL or if they're
  • trying to access a file they are not authorized to access,
  • the web server is going to return an error.
  • But otherwise, it's going to access some files,
  • maybe on disk, and send them back out
  • in some sort of a reply.
  • So this is a hugely common picture, almost any system
  • you look at.
  • What's the policy?
  • Or what's the threat model?
  • So this is a bit of a problem in many real world systems,
  • namely that it's actually pretty hard
  • to pin down what is the exact policy or threat model
  • that we're talking about.
  • And this sort of imprecision or ambiguity about policies,
  • threat models, et cetera, is what sometimes
  • leads to security problems.
  • Not in this particular case, but we'll see.
  • But maybe just to give you a sense of how
  • to think of a typical web server in the context of this policy,
  • threat model kind of stuff, is that well, probably the policy
  • is, the web server should do what the programmer intended
  • it to do.
  • It's a little vague.
  • But that's probably what's going on because anything more
  • specific, such as, the web server should
  • do exactly what the code does, is going
  • to be a bit of an [INAUDIBLE]. And if your code has a bug,
  • well, your policy says, well, that's
  • exactly what I should do.
  • I should follow the bug.
  • So it's a little hard to state a policy precisely,
  • but in this case, let's go with some intuitive
  • version of, well, the web server should do what
  • the programmer wanted it to do.
  • And the threat model is probably,
  • the attacker doesn't have access to this machine,
  • can't log in to it remotely, doesn't have physical access
  • to it, but can send any packet they want.
  • So they're not restricted to certain kinds of packets.
  • Anything you can shape and sort of
  • deliver to this web server, that's fair game.
  • Seems like a reasonable threat model, in practice,
  • to have in mind.
  • And I guess the goal is that this web server shouldn't
  • allow arbitrary stuff to go wrong here.
  • I guess that sort of goes along with what
  • the programmer intended.
  • The programmer probably didn't intend any request
  • to be able to access anything on the server.
  • And yet, it turns out if you make certain kinds of mistakes
  • in writing the web server software, which is basically
  • the mechanism here, right?
  • The web server software is the thing that takes a request
  • and looks at it and makes sure that it's not
  • going to do something bad, sends a response back if everything's
  • OK.
  • The web server is the mechanism here.
  • It's enforcing your policy.
  • And as a result, if the web server software is buggy,
  • then you're in trouble.
  • And one sort of common problem, if you're
  • writing software in C which, you know,
  • many things are still written in C
  • and probably will continue to be written in C for a while,
  • you can mismanage your memory allocations.
  • And as we saw in this SSL certificate naming example,
  • even sort of a single byte can really
  • make a huge difference, in terms of what goes on.
  • And I guess for this example, we'll
  • look at a small piece of code that's not quite a real web
  • server.
  • In the lab, you'll have this whole picture to play with.
  • But for lecture, I just want to give you
  • a simplified example so we can talk
  • about what's sort of at the core of what's going wrong.
  • And, in particular, if this system wakes up,
  • I will show you sort of a very small C function.
  • And we can sort of see what goes wrong
  • if you provide different inputs to that piece of code.
  • All right.
  • So the C function that I have in mind is this guy.
  • Somewhere here.
  • Oh, yeah.
  • It's coming on.
  • All right.
  • So here's the sort of program I'm talking about,
  • or I want to use as an example here.
  • So this program is just going to read a request.
  • And you can sort of imagine it's going to read
  • a request from the network.
  • But for the purposes of this example,
  • it's just going to read a request from whatever
  • I'm typing in on the keyboard.
  • And it's going to store it in a buffer here.
  • And then it's going to parse it as an integer
  • and return the integer.
  • And the program will then print whatever integer I get back.
  • It's, like, far from a web server.
  • But we'll at least see some basics
  • of how buffer overflows work and what goes wrong.
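  • The source itself isn't reproduced in the transcript, so here's a minimal reconstruction of the kind of program being demonstrated; function and variable names are best guesses:

```c
#include <stdio.h>
#include <stdlib.h>

int read_req(void) {
    char buf[128];
    int i;
    gets(buf);       /* no bounds check: long input overflows buf */
    i = atoi(buf);   /* parse the request as an integer */
    return i;
}

int main(void) {
    int x = read_req();
    printf("x=%d\n", x);
    return 0;
}
```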
  • So let's see actually what happens if we run this program.
  • So I can compile it here.
  • And actually, you can sort of see
  • the-- it's already telling me what I'm screwing up, right?
  • The gets function is dangerous and should not be used.
  • And we'll see in a second why the compiler is
  • so intent on telling me this.
  • And it actually is true.
  • But for now, suppose we're a happy go lucky
  • developer that is willing to ignore this warning.
  • So OK.
  • I run this redirect function, I provide some input,
  • and it works.
  • Let's see if I provide large inputs.
  • If I type in some large number, well,
  • at least it gives me some large number.
  • It basically maxes out to two to the 31 and prints that
  • and doesn't go any higher.
  • So that's maybe not disastrous, right?
  • Whatever.
  • You provided this ridiculously large number.
  • You got something didn't quite work.
  • It's not quite a problem yet.
  • But if we provide some really large input,
  • we might get some other problem, right?
  • So suppose I provide some other input.
  • If I just provide things that are not numbers,
  • it prints zero.
  • That's not so bad.
  • But suppose I'm going to paste in a huge number of As.
  • OK, so now the program crashes.
  • Maybe not too surprising.
  • So if it was the case that if I send a bad request to the web
  • server, it just doesn't get back to me or doesn't send a reply,
  • that would be fine.
  • But we'll sort of look inside and see
  • what happens, and try to figure out how we can actually
  • take advantage of this crash to maybe do something much more
  • interesting, or, well, much more along with what a hacker might
  • be interested in doing.
  • So to do this, we're going to run this program
  • under a debugger.
  • You'll get super familiar with this in lab one.
  • But for now, what we're going to do
  • is set a breakpoint in that read_req function.
  • And we're going to sort of run along and see what happens.
  • So when I run the program, it's going
  • to start executing in the main function.
  • And pretty quickly, it calls read_req.
  • And the debugger is now stopped at the beginning of read_req.
  • And we can actually see what's going on here by, for example,
  • we can ask it to print the current CPU registers.
  • We're going to look at really low level stuff
  • here, as opposed to at the level of C source code.
  • We're going to look at the actual instructions
  • that my machine is executing because that's
  • what really is going on.
  • The C is actually maybe hiding some things from us.
  • So you can actually print all the registers.
  • So on x86, as you might remember.
  • Well, on [INAUDIBLE] architecture,
  • there's a stack pointer.
  • So let me start maybe drawing this diagram on the board
  • so we can try to reconstruct what's happening.
  • So what's going on is that my program, not surprisingly,
  • has a stack.
  • On x86, the stack grows down.
  • So it sort of is this stack like this.
  • And we can keep pushing stuff onto it.
  • So right now, the stack pointer points
  • at this particular memory location, 0xffffd010.
  • So some value.
  • So you can try to figure out, how did it get there?
  • One way to do it is to disassemble the code
  • of this read_req function.
  • Is this going to work better?
  • Really?
  • Convenience variable must have integer value.
  • Man.
  • What is going on with my debugger?
  • All right.
  • Well, we can disassemble the function by name.
  • So this is what the function is doing.
  • So first off, it starts by manipulating something
  • with this EBP register.
  • That's not super interesting.
  • But the first thing it does after that is
  • subtract a certain value from the stack pointer.
  • This is, basically, it's making space for all those variables,
  • like the buffer and the integer, i, we saw in the C source code.
  • So we're actually, now, four instructions
  • into the function, here.
  • So that stack pointer value that we
  • saw before is actually already in the middle, so to say,
  • of the stack.
  • And currently, there's stuff above it
  • that is going to be the buffer, that integer
  • value, and actually, also the return address
  • into the main function goes on the stack, as well.
  • So somewhere here, we'll have the return address.
  • And we actually try to figure out,
  • where are things on the stack?
  • So we can print the address of that buffer variable.
  • So the buffer variable is at address 0xffffd02c.
  • We can also print the address of that integer, i.
  • That guy is at 0xffffd0ac.
  • So the i is way up on the stack.
  • But the buffer is a bit lower.
  • So what's going on is that we have our buffer here
  • on the stack, and then followed above by i and maybe
  • some other stuff, and then finally, the return
  • address into the main function that called redirect.
  • And the buffer is-- this is going,
  • the stack is growing down.
  • So these are higher addresses.
  • So what this means is that the buffer-- we actually
  • have to decide, where is the zeroth element of the buffer,
  • and where is the 128th element of this buffer?
  • So where does the zeroth element of the buffer go?
  • Yeah?
  • Should be at the bottom, right, because yeah,
  • higher elements just keep going up.
  • So buf[0] is down here.
  • It just keeps going up.
  • And buf[127] is going to be up there.
  • And then we'll have i and other stuff.
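  • (The addresses from this particular run, collected into one picture;
  • the saved EBP and return address slots are examined a couple of
  • steps below, and the leading 0xffff is inferred from the full stack
  • pointer value seen earlier.)

      0xffffd0bc:  return address into main
      0xffffd0b8:  saved EBP
         ...       (a few bytes of padding / other stuff)
      0xffffd0ac:  i
      0xffffd0ab:  buf[127]
         ...
      0xffffd02c:  buf[0]        <- gets() starts writing here
                   (stack grows down, toward lower addresses)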
  • OK.
  • Well, let's see what happens now if we
  • provide that input that seemed to be crashing it before.
  • So I guess one thing we can actually
  • do before this is to see whether we can actually
  • find this return address.
  • Where it actually happens to live is at the EBP pointer.
  • This is just a convenient thing in the x86 calling convention,
  • that the EBP pointer, or register, actually
  • happens to point to something on the stack which is going
  • to be called the saved EBP.
  • It's a separate location, sort of after all the variables
  • but before the return address.
  • And this is the thing that's being
  • saved by those first couple of instructions at the top.
  • And you can actually examine it.
  • In GDB you can say x, then some address, so here the EBP pointer
  • value.
  • So that's the location on the stack, 0xffffd0b8.
  • Indeed, it's actually above even the i variable.
  • So it's great.
  • And it has some other value that happens
  • to be the EBP before this function was called.
  • But then, sort of one more memory location
  • up is going to be the return address.
  • So if we print EBP plus four, there's something else there,
  • this 0x08048E5F.
  • And let's actually see where that's pointing.
  • So this is something you're going to do a lot in the lab.
  • So you can take this address.
  • And you can try to disassemble it.
  • So what is this guy?
  • Where did we end up?
  • So GDB actually helpfully figures out which function
  • contains that address.
  • So 5F.
  • This is the guy that our return address is pointing to.
  • And as you can see, this is the instruction right
  • after the call to read_req.
  • So when we return from read_req, this
  • is exactly where we're going to jump and continue execution.
  • This is, hopefully, fairly straightforward
  • stuff from 6.004 or some standard OS class.
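  • (The GDB session so far, roughly; the addresses are from this
  • particular run and will differ on your machine.)

      (gdb) break read_req
      (gdb) run
      (gdb) info registers          # esp = 0xffffd010 after the prologue
      (gdb) disassemble read_req    # prologue: save EBP, sub from esp
      (gdb) print &buf              # 0xffffd02c
      (gdb) print &i                # 0xffffd0ac
      (gdb) x/x $ebp                # 0xffffd0b8: the saved EBP
      (gdb) x/x $ebp+4              # the return address, 0x08048e5f
      (gdb) disassemble 0x08048e5f  # right after main's call to read_req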
  • OK.
  • So where are we now?
  • Just to recap, we can try to disassemble our instruction
  • pointer.
  • So we're at the beginning of read_req right now.
  • And we can run for a bit, and maybe run that gets() function.
  • So OK, we run next.
  • What this does is it runs gets(), and it's waiting for gets()
  • to return.
  • We can provide our bad input to gets() and try to get it
  • to crash again and see what's going on, really, there, right?
  • So we can paste a bunch of As again.
  • OK.
  • So we got out of gets() and things are actually still OK,
  • right?
  • The program is still running.
  • But we can try to figure out, what is in memory right now
  • and why are things going to go wrong?
  • Actually, what do you guys think?
  • What happened, right?
  • So I printed out a bunch of As.
  • What did gets() do to the memory?
  • Yeah, yeah.
  • So it just keeps writing As here, right?
  • All we actually passed to gets() was a single pointer,
  • the start of this buffer, right?
  • So the argument to gets()
  • is a pointer to this memory location on the stack.
  • So it just kept writing As.
  • And it doesn't actually know what the length is,
  • so it just keeps going, right?
  • It's going to write As all the way up the stack,
  • past the return address, probably, and into whatever
  • was above us on the stack.
  • So we can check whether that's the case.
  • So we can actually print the buffer.
  • And in fact, it tells us, yeah, we
  • have 180 As there, even though the buffer
  • should be 128 elements large.
  • So this is not so great.
  • And we can actually, again, examine what's
  • going on in that EBP pointer.
  • Dollar sign, EBP.
  • So in fact, yeah.
  • It's all 0x41, which is the ASCII encoding of the letter A.
  • And in fact, the return address is probably going
  • to be the same way, right?
  • If we print the return address, it's also all As.
  • That's not so great.
  • In fact, what's going to happen if we return now is
  • the program will jump to that address, 0x41414141.
  • And there's nothing there.
  • And it'll crash.
  • That's the segmentation fault you're getting.
  • So let's just step up to it and see what happens.
  • So let's run next.
  • So we keep stepping through the program.
  • And we can see where we are.
  • OK.
  • We're getting close to the end of the function.
  • So we can step over two more instructions.
  • nexti.
  • And now we can disassemble again.
  • OK.
  • We're now just at the return instruction from this function.
  • And we can actually figure out.
  • So as you can see, at the end of the function,
  • it runs this leave x86 instruction,
  • which basically restores the stack back to where it was.
  • So it moves the stack pointer
  • all the way back up to the return address, using the saved EBP.
  • That's basically what it's for.
  • And now, the stack is pointing at the return address
  • that we're going to use.
  • And in fact, it's all A's.
  • And if we run one more instruction,
  • the CPU is going to jump to that exact memory address
  • and start executing code there and crash,
  • because it's not a valid address that's in the page table.
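  • (Concretely, here's roughly what those two x86 instructions do:)

      leave:  movl %ebp, %esp   ; point esp at the saved-EBP slot
              popl %ebp         ; restore caller's EBP (here: 0x41414141)
                                ; esp now points at the return address
      ret:    pop the word at esp into eip and jump there
                                ; here that word is also 0x41414141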
  • So let's actually see, just to double check, what's going on.
  • Let's print our buffer again.
  • Our buffer-- well, that's actually kind of interesting,
  • right?
  • So now, buffer, for some reason it
  • only says A repeats 128 times.
  • Whereas if you remember before, it said A repeated 180 times
  • in our buffer.
  • So what happened?
  • Yeah?
  • AUDIENCE: [INAUDIBLE].
  • PROFESSOR: Yeah, yeah.
  • Exactly.
  • So there's actually something going on
  • after the buffer overflow happens
  • that changes what's going on.
  • So actually, if you remember, we do
  • this atoi() conversion of the string to an integer.
  • And if you provide all As, atoi() returns zero, which actually
  • gets written to i's memory location.
  • So a zero, if you remember, terminates strings in C.
  • So GDB now thinks, yep, we have a perfectly well-terminated
  • 128 byte string of all As.
  • But you know, it doesn't really matter,
  • because we still have those As up top that
  • already corrupted our stack.
  • OK.
  • That was actually kind of an important lesson
  • that-- it's actually a little bit tricky, sometimes,
  • to explore these buffer overflows because, even
  • though you've already changed lots of stuff on the stack,
  • you still have to get to the point
  • where you use the value that you have somehow
  • placed on the stack.
  • So there's other code that's going
  • to run after you've managed to overflow
  • some buffer and corrupt memory.
  • You have to make sure that code doesn't do something silly.
  • If, say, atoi() had just exited the program right away
  • as soon as it saw a non-integer value,
  • we might never get to jump to this 0x41414141 address.
  • So you have to massage your input in some cases.
  • Maybe not so much in this case.
  • But in other situations, you'll have
  • to be careful in constructing this input.
  • OK, so just to see what happens, we can jump one more time.
  • Well, let's look at our registers.
  • So right now, our EIP, the sort of instruction pointer,
  • is pointing at the last thing in redirect.
  • And if we step one more time, hopefully we'll
  • jump to, finally, that unfortunate 0x41414141 address.
  • Over here.
  • And in fact, yep.
  • The program now seems to be executing there.
  • If we ask GDB to print the current set of registers,
  • yep, the current instruction pointer is this strange value.
  • And if we execute one more instruction,
  • it's going to crash because that's finally
  • trying to execute an instruction pointer that doesn't correspond
  • to a valid page in the operating system's page table
  • for this process.
  • Make sense?
  • Any questions?
  • All right.
  • Well, I've got a question for you guys, actually.
  • So what happens-- you know, it seems to be exploitable.
  • Or well, OK.
  • Maybe let's first figure out why this is particularly bad,
  • right?
  • So why is it a problem?
  • So not only does our program crash,
  • but presumably we're going to take it over.
  • So I guess, first simple question is, OK,
  • so what's the problem?
  • What can you do?
  • Yeah?
  • AUDIENCE: You can do whatever you want.
  • PROFESSOR: Yeah.
  • So I was actually pretty silly and just put in lots of As.
  • But if you were careful about knowing
  • where to put what values, you might
  • be able to put in a different value
  • and get it to jump somewhere else.
  • So let's see if we can actually do this, right?
  • We can retrace this whole thing.
  • OK.
  • Re-run the program again.
  • And I guess I have to reset the breakpoint.
  • So I can break on read_req again.
  • And run.
  • And this time, I'll, again, run next,
  • supply lots of As, and overflow things.
  • But I'm not going to try to carefully construct--
  • you know, figure out which point in these As corresponds
  • to the location in the stack.
  • That's something you guys are going
  • to have to do for lab one.
  • But suppose that I overflow the stack here.
  • And then I'm going to manually try
  • to change things on the stack to get it to jump to some point I
  • want to jump to.
  • And in this program, OK, so let's again-- nexti.
  • Where are we?
  • We're, again, at the very end of read_req.
  • And let's actually look at the stack, right?
  • So if we examine esp here, we see our corrupted return address.
  • OK.
  • Where could we jump to?
  • What interesting things could we do?
  • Unfortunately, this program is pretty limited.
  • There's almost nothing in the program's code
  • where you could jump and do anything interesting.
  • But maybe we can do a little bit of something interesting.
  • Maybe we'll find the printf in main
  • and jump directly there, and get it to print the x value,
  • or x equals something.
  • So we can do this.
  • We can actually disassemble the main function.
  • And main does a bunch of stuff, you
  • know, initializes, calls redirect, does some more stuff,
  • and then calls printf.
  • So how about we jump to this point, which
  • sets up the argument to printf, the string "x = %d",
  • and then actually calls printf.
  • So we can actually take this value
  • and try to stick it in the stack.
  • And we should be able to do this with the debugger
  • pretty easily, at least.
  • So you can do this: set {int}$esp = that value.
  • So we can examine esp again and, indeed, it actually
  • has this value.
  • So if we continue now, well, it printed out x
  • equals some garbage, which I guess
  • happens to be just whatever is on the stack that
  • was passed to printf.
  • We didn't correctly set up all the arguments
  • because we jumped in the middle of this calling sequence.
  • But yeah, we printed this value.
  • And then it crashed.
  • Why did it crash?
  • Why do you think?
  • What actually happens, right?
  • So we jump to printf.
  • And then, something went wrong.
  • Yeah?
  • Well, we changed the return address
  • so that when we return from read_req,
  • we now jump to this new address, which is that point up there,
  • right after printf.
  • So where's this crash coming from?
  • Yeah?
  • AUDIENCE: Is it restricted because your i
  • is supposed to be some sort of integer, but--
  • PROFESSOR: No, actually, well the i is like,
  • well it's a 32-bit register.
  • So whatever's in the register, it'll print.
  • In fact, that's the thing that's in the register.
  • So that's OK.
  • Yeah?
  • AUDIENCE: [INAUDIBLE] main returns.
  • PROFESSOR: Yes.
  • Actually, yeah.
  • What's going on is, you have to sort of-- OK,
  • so this is the point where we jumped.
  • It's set up some arguments.
  • It actually calls printf.
  • printf seems to work. printf is going to return.
  • Now actually, that's fine, because this call instruction
  • put a return address on the stack for printf to use.
  • That's fine.
  • Then main is going to continue running.
  • It's going to run the leave instruction, which doesn't
  • do anything interesting.
  • And then it does another return.
  • But the thing above it on the stack
  • isn't actually a valid return address.
  • So presumably, we return to some
  • who-knows-what memory location that's up on the stack
  • and jump somewhere else.
  • So unfortunately, here, our pseudoattack
  • didn't really work.
  • It ran some code.
  • But then it crashed.
  • That's probably not something you want to do.
  • So if you really wanted to be careful,
  • you would carefully plant not just this return address up
  • on the stack, but maybe you'd figure out,
  • where is this second ret going to get its return address from,
  • and try to carefully place something
  • else on the stack there that will ensure
  • that your program cleanly exits after it gets exploited
  • so that no one notices.
  • So this is all stuff you'll sort of
  • try to do in lab one in a little bit more detail.
  • But I guess one thing we can try to think about now
  • is, we sort of understand why it's bad to jump to the--
  • or to have these buffer overflows.
  • One problem, or one sort of way to think of this
  • is that, the problem is just because the return address is
  • up there, right?
  • So the buffer keeps growing and eventually runs
  • over the return address.
  • What if we flip the stack around?
  • You know, some machines actually have stacks that grow up.
  • So an alternative design we could sort of imagine
  • is one where the stack starts at the bottom
  • and keeps going up instead of going down.
  • So then, if you overflow this buffer,
  • you'll just keep going up on the stack,
  • and maybe there's nothing bad that will happen.
  • Yeah?
  • AUDIENCE: [INAUDIBLE].
  • PROFESSOR: So you're right.
  • It might be that, if you have-- well,
  • so let me draw this new stack diagram.
  • And we'll sort of try to figure out what it applies to and not.
  • But OK.
  • So we'll basically just invert the picture.
  • So when you call read_req on this alternative architecture,
  • what's going to happen is the return address
  • is going to go here on the stack.
  • Then we'll have our i variable, or maybe the saved EBP.
  • Then we'll have our i variable.
  • And then we'll have buff.
  • So we'll have buf[0], buf[127], and so on, right?
  • So then when we do the overflow, it overflows up there and maybe
  • doesn't hit anything bad.
  • I guess what you're saying is that,
  • well, maybe we had a buffer down there.
  • And if we had a buffer down there, then yeah,
  • that seems kind of unfortunate.
  • It could overrun this return address.
  • So you're right.
  • So you could still run into problems
  • on this stack growing up.
  • But what about this exact program?
  • Is this particular program safe on machines
  • where the stack grows up?
  • So just to recap, the program is this guy.
  • Yeah?
  • AUDIENCE: Still going to overwrite
  • [INAUDIBLE] as a return value.
  • PROFESSOR: Yeah.
  • So that's actually clever, right?
  • So this is the stack frame for read_req.
  • I guess it actually spans all the way up here.
  • But what actually happens when you call gets() is that
  • read_req makes a function call.
  • It actually saves its return address up here on the stack.
  • And then gets() starts running.
  • And gets() puts its own saved EBP up here.
  • And gets() is going to put its own variables higher up.
  • And then gets() is going to fill in the buffer.
  • So this is still problematic.
  • Basically, the buffer is surrounded by return addresses
  • on all sides.
  • Either way, you're going to be able to overflow something.
  • So at what point-- suppose we had a stack growing up machine.
  • At what point would you be able to take
  • control of the program's execution then?
  • Yes, and that is actually even easier in some ways.
  • You don't have to wait until read_req returns,
  • and maybe hit stuff that was going to mess you up,
  • like this atoi() call.
  • No.
  • It's actually easier, because gets() is going to overflow
  • the buffer.
  • It's going to change the return address,
  • then immediately return and immediately
  • jump to wherever you pointed it.
  • Makes sense.
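  • (So on this hypothetical stack-grows-up machine, the picture while
  • gets() is running would look roughly like this:)

      higher addresses             (stack grows up)
         gets()'s saved EBP and locals
         return address back into read_req   <- overflowed almost at once
         buf[127]
            ...
         buf[0]                               <- gets() starts writing here
         i
         saved EBP
         return address into main
      lower addresses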
  • So what happens if we have a program like this
  • that's pretty boring?
  • There's like no real interesting code to jump to.
  • All you can do is get it to print different x values here.
  • What if you want to do something interesting that you
  • didn't-- yeah?
  • AUDIENCE: I mean, if you have an executable stack,
  • you could put arbitrary code that,
  • for example, executes a shell?
  • PROFESSOR: Yeah yeah yeah.
  • So that's kind of clever, right, because you actually
  • can supply other inputs, right?
  • So at least, well-- there's some defenses against this.
  • And we'll go over these in subsequent lectures.
  • But in principle, you could have the return address here
  • that you overwrite, on either the stack-up or stack-down machine.
  • And instead of pointing it to some existing code,
  • like the printf inside of main, we
  • can actually have the return address point into the buffer.
  • So it's previously just some location on the stack.
  • But you could jump there and treat it as executable.
  • So as part of your request, you'll actually
  • send some bytes of data to the server,
  • and then have the return address or the thing you overwrite here
  • point to the base of the buffer, and you'll just
  • keep going from there.
  • So then you'll be able to sort of provide
  • the code you want to run, jump to it,
  • and get the server to run it.
  • And in fact, traditionally, in Unix systems,
  • what adversaries would often do is just ask the operating
  • system to execute /bin/sh, which
  • lets you sort of type in arbitrary shell commands
  • after that.
  • So as a result, this thing, this piece
  • of code you inject into this buffer,
  • was often called, sort of for historical reasons, shellcode.
  • And you'll try to construct some in lab one as well.
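  • (Schematically, the classic stack-smashing payload; the exact
  • offsets depend on the frame layout, which is the part you work out
  • in lab one.)

      [ shellcode ...... ][ padding ][ junk saved EBP ][ &buf ]
        ^ starts at buf[0]                               ^ overwritten
                                                           return address,
                                                           pointing back at
                                                           the shellcode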
  • All right.
  • Make sense, what you can do here?
  • Any questions?
  • Yeah?
  • AUDIENCE: Is there a separation between code and data?
  • PROFESSOR: Right.
  • So is there a separation between code and data here?
  • At least, well, historically, many machines
  • didn't enforce any separation of code and data.
  • You'd just have a flat memory address space.
  • The stack pointer points somewhere.
  • The code pointer points somewhere else.
  • And you just execute wherever the code pointer, instruction
  • pointer is pointing.
  • Modern machines try to provide some defenses
  • for these kinds of attacks.
  • And what modern machines often do is,
  • they actually associate permissions
  • with various memory regions.
  • And one of the permissions is execute.
  • So the part of your 32-bit or 64-bit address
  • space that contains code has the execute permission.
  • So if your instruction pointer points there,
  • the CPU will actually run those things.
  • And the stack and other data portions of your address space
  • typically don't have the execute permission.
  • So if you happen to somehow set your instruction pointer
  • to some non-code memory location, you can set it,
  • but the CPU will refuse to execute it.
  • So this is a reasonably nice way to defend
  • against these kinds of attacks.
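  • (A small illustration of per-region execute permissions; this is a
  • sketch assuming Linux on x86, where the single byte 0xc3 is a ret
  • instruction.)

      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>

      int main(void) {
          /* Get one page of readable, writable -- but not
             executable -- memory. */
          unsigned char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED) { perror("mmap"); return 1; }

          p[0] = 0xc3;  /* x86 "ret": the smallest possible function */

          /* Calling p now would fault: the page lacks the execute
             permission.  Flip the permission and the very same bytes
             become runnable. */
          if (mprotect(p, 4096, PROT_READ | PROT_EXEC) != 0) {
              perror("mprotect"); return 1;
          }
          ((void (*)(void))p)();  /* executes the ret, comes right back */
          printf("ran code from a page that started non-executable\n");
          return 0;
      }

  • Run it without the mprotect() call and the indirect call faults,
  • which is exactly the defense being described.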
  • But it doesn't prevent quite everything.
  • So just a question.
  • OK.
  • So how would you bypass this if you
  • had this non-executable stack?
  • You actually saw this example earlier, right,
  • when I actually jumped to the middle of main.
  • So that was a way of sort of exploiting this buffer
  • overflow without having to inject new code of my own.
  • So even if the stack was non-executable,
  • I would still be able to jump in the middle of main.
  • In this particular case, it's kind of boring.
  • It just prints x and crashes.
  • But in other situations, you might
  • have other pieces of code in your program
  • that are doing interesting stuff that you really
  • do want to execute.
  • And those are called return-to-libc attacks for, again,
  • somewhat historical reasons.
  • But it is a way to bypass these security measures.
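  • (The return-to-libc idea, schematically. Instead of injecting code,
  • you point the overwritten return address at code that's already
  • loaded; system() and "/bin/sh" are the classic targets, not
  • something shown in this lecture.)

      [ padding over buf, i, saved EBP ][ &system ][ fake ret ][ &"/bin/sh" ]
                                          ^ overwritten return address

  • When read_req returns, it "calls" system("/bin/sh"): on 32-bit x86,
  • system() finds its argument one word past its own return address,
  • which is why the fake return-address slot sits between the two.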
  • So in the context of buffer overflows,
  • there's not really a clear cut solution
  • that provides perfect protection against these mistakes
  • because, at the end of the day, the programmer did
  • make some mistake in writing this source code.
  • And the best way to fix it is probably just to change
  • the source code and make sure you don't call gets(),
  • like the compiler warned you.
  • And there's more subtle things that the compiler
  • doesn't warn you about.
  • And you still have to avoid making those calls.
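  • (For this particular bug the fix really is mechanical. Here's a
  • minimal sketch of read_req with gets() swapped for fgets(), which is
  • told how big the buffer is.)

      #include <stdio.h>
      #include <stdlib.h>

      int read_req(void) {
          char buf[128];
          /* fgets writes at most sizeof(buf)-1 bytes plus a NUL, so it
             can't run past the end of buf.  (It keeps the newline,
             which atoi happily ignores.) */
          if (fgets(buf, sizeof(buf), stdin) == NULL)
              return -1;   /* EOF or read error */
          return atoi(buf);
      }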
  • But because it's hard, in practice,
  • to change all the software out there,
  • many people try to devise techniques
  • that make it more difficult to exploit these bugs.
  • For example, making the stack non-executable,
  • so you can't inject the shell code onto the stack,
  • and you have to do something slightly more elaborate.
  • And in the next couple of lectures, the next two lectures,
  • actually, we'll look at these defense techniques.
  • They're not all perfect.
  • But they do, in practice, make it
  • much more difficult for that hacker to exploit things.
  • Question?
  • AUDIENCE: I just have a general administrative question.
  • PROFESSOR: Yeah?
  • AUDIENCE: I was wondering if there was a final?
  • And also if there are quizzes, and what dates--
  • PROFESSOR: Oh yeah.
  • Yeah, I think if you go to the schedule page,
  • there's two quizzes.
  • And there's no final during the final week,
  • but there's a quiz right before it.
  • So you're free for the final week,
  • but there's still something at the end of the class.
  • Yeah.
  • All right.
  • OK.
  • So I think that's probably it for buffer overflows.
  • I guess the one question is, so what
  • do you do about mechanism problems?
  • And the general answer is to probably have fewer mechanisms.
  • So as we saw here, if you're relying
  • on every piece of software to enforce your security policy,
  • you'll inevitably have mistakes that
  • allow an adversary to bypass your mechanism by exploiting
  • some bug in the web server.
  • And a much better design, one that
  • you will explore in lab two, is one
  • where you structure your whole system
  • so the security of the system doesn't
  • depend on all the pieces of software
  • enforcing your security policy.
  • The security policy is going to be
  • enforced by a small number of components.
  • And the rest of the stuff actually
  • doesn't matter, for security purposes,
  • if it's right or wrong.
  • It's not going to violate your security policy at all.
  • So this kind of minimizing of your trusted computing base
  • is a pretty powerful technique to get around these mechanism
  • bugs and problems that we've looked at today, at least
  • in a little bit of detail.
  • All right.
  • So read the paper for Monday.
  • And come to Monday's lecture.
  • And submit the questions on the website.
  • See you guys then.


Description

MIT 6.858 Computer Systems Security, Fall 2014
View the complete course: http://ocw.mit.edu/6-858F14
Instructor: Nickolai Zeldovich

In this lecture, Professor Zeldovich gives a brief overview of the class, summarizing class organization and the concept of threat models.

License: Creative Commons BY-NC-SA
More information at http://ocw.mit.edu/terms
More courses at http://ocw.mit.edu