3. Insertion Sort, Merge Sort

Jan 14, 2013

Transcription

  • The following content is provided
  • under a Creative Commons license.
  • Your support will help MIT OpenCourseWare continue
  • to offer high quality educational resources for free.
  • To make a donation or view additional materials
  • from hundreds of MIT courses, visit MIT OpenCourseWare
  • at ocw.mit.edu.
  • PROFESSOR: So today's lecture is on sorting.
  • We'll be talking about specific sorting algorithms today.
  • I want to start by motivating why
  • we're interested in sorting, which should be fairly easy.
  • Then I want to discuss a particular sorting
  • algorithm that's called insertion sort.
  • That's probably the simplest sorting algorithm
  • you can write-- it's five lines of code.
  • It's not the best sorting algorithm
  • that's out there and so we'll try and improve it.
  • We'll also talk about merge sort, which is a divide
  • and conquer algorithm and that's going
  • to motivate the last thing that I want to spend time on,
  • which is recurrences and how you solve recurrences.
  • Typically the recurrences that we'll
  • be looking at in 6.006 are going to come from divide
  • and conquer problems like merge sort
  • but you'll see this over and over.
  • So let's talk about why we're interested in sorting.
  • There's some fairly obvious applications
  • like if you want to maintain a phone book,
  • you've got a bunch of names and numbers corresponding
  • to a telephone directory and you want
  • to keep them in sorted order so it's
  • easy to search; also mp3 organizers, spreadsheets, et cetera.
  • So there's lots of obvious applications.
  • There's also some interesting problems
  • that become easy once items are sorted.
  • One example of that is finding a median.
  • So let's say that you have a bunch of items
  • in an array A[0 : n], and A[0 : n]
  • contains n numbers and they're not sorted.
  • When you sort, you turn this into B[0 : n],
  • where, if it's just numbers, then
  • you may sort them in increasing order or decreasing order.
  • Let's just call it increasing order for now.
  • Or if they're records, and they're not numbers,
  • then you have to provide a comparison function
  • to determine which record is smaller than another record.
  • And that's another input that you
  • have to have in order to do the sorting.
  • So it doesn't really matter what the items are
  • as long as you have the comparison function.
  • Think of it as less than or equal to.
  • And if you have that-- it's straightforward,
  • obviously, to check that 3 is less than 4, et cetera.
  • But it may be a little more complicated
  • for more sophisticated sorting applications.
  • But the bottom line is that if you have your algorithm that
  • takes a comparison function as an input,
  • you're going to be able to, after a certain amount of time,
  • get B[0 : n].
  • Now if you wanted to find the median of the set of numbers
  • that were originally in the array A,
  • what would you do once you have the sorted array B?
  • AUDIENCE: Isn't there a more efficient algorithm for median?
  • PROFESSOR: Absolutely.
  • But this is sort of a side effect of having a sorted list.
  • If you happen to have a sorted list,
  • there's many ways that you could imagine
  • building up a sorted list.
  • One way is you have something that's completely unsorted
  • and you run insertion sort or merge sort.
  • Another way would be to maintain a sorted list as you're
  • getting items put into the list.
  • So if you happened to have a sorted list
  • and you need to have this sorted list for some reason,
  • the point I'm making here is that finding the median
  • is easy.
  • And it's easy because all you have to do
  • is to look at-- depending on whether n is odd
  • or even-- B of n over 2.
  • That would give you the median because you'd
  • have a bunch of numbers that are less than that
  • and an equal set of numbers that are greater than that,
  • which is the definition of median.
  • So this is not necessarily the best way, as you pointed out,
  • of finding the median.
  • But it's constant time if you have a sorted list.
  • That's the point I wanted to make.
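
As a minimal Python sketch of that point (illustrative code, not the lecture's; averaging the two middle elements for even n is one common convention):

    def median_of_sorted(B):
        # B is assumed to be sorted in increasing order.
        n = len(B)
        if n % 2 == 1:
            return B[n // 2]                    # odd n: the middle element
        return (B[n // 2 - 1] + B[n // 2]) / 2  # even n: average the two middles

    B = sorted([7, 1, 5, 3, 9])   # the Theta(n log n) sort happens once, up front
    print(median_of_sorted(B))    # 5, found in Theta(1) time
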
  • There are other things that you could do.
  • And this came up in Erik's lecture,
  • which is the notion of binary search-- finding
  • an element in an array-- a specific element.
  • You have a list of items-- again, A[0 : n].
  • And you're looking for a specific number or item.
  • You could, obviously, scan the array,
  • and that would take you linear time to find this item.
  • If the array happened to be sorted,
  • then you can find this in logarithmic time
  • using what's called binary search.
  • Let's say you're looking for a specific item.
  • Let's call it k.
  • Binary search, roughly speaking, would
  • work like-- you go compare k to, again, B of n over 2,
  • and decide, given that B is sorted,
  • you get to look at 1/2 of the array.
  • If B of n over 2 is not exactly k, then-- well,
  • if it's exactly k you're done.
  • Otherwise, you look at the left half.
  • You do your divide and conquer paradigm.
  • And you can do this in logarithmic time.
  • So keep this in mind, because binary search
  • is going to come up in today's lecture
  • and again in other lectures.
  • It's really a great paradigm of divide
  • and conquer-- probably the simplest.
  • And it, essentially, takes something
  • that's linear-- a linear search--
  • and turns it into logarithmic search.
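
A minimal sketch of binary search in Python, assuming a sorted list B and a target k (illustrative code):

    def binary_search(B, k):
        """Return an index of k in the sorted list B, or -1 if k is absent."""
        lo, hi = 0, len(B) - 1
        while lo <= hi:
            mid = (lo + hi) // 2     # compare k against the middle element
            if B[mid] == k:
                return mid
            elif B[mid] < k:
                lo = mid + 1         # k, if present, is in the right half
            else:
                hi = mid - 1         # k, if present, is in the left half
        return -1

    binary_search([1, 2, 7, 9, 11, 12, 13, 20], 9)   # returns 3, in Theta(log n) compares
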
  • So those are a couple of problems
  • that become easy if you have a sorted list.
  • And there's some not so obvious applications
  • of sorting-- for example, data compression.
  • If you wanted to compress a file,
  • one of the things that you could do is to--
  • and it's a set of items-- you could sort the items.
  • And that automatically finds duplicates.
  • And you could say, if I have 100 items that are all identical,
  • I'm going to compress the file by representing the item once
  • and, then, having a number associated
  • with the frequency of that item-- similar to what
  • document distance does.
  • Document distance can be viewed as a way
  • of compressing your initial input.
  • Obviously, you lose the works of Shakespeare or whatever it was.
  • And it becomes a bunch of words and frequencies.
  • But it is something that compresses the input
  • and gives you a different representation.
  • And so people use sorting as a subroutine in data compression.
  • Computer graphics uses sorting.
  • Most of the time, when you render
  • scenes in computer graphics, you have many layers
  • corresponding to the scenes.
  • It turns out that, in computer graphics,
  • most of the time you're actually rendering
  • front to back because, when you have a big opaque
  • object in front, you want to render that first,
  • so you don't have to worry about everything that's
  • occluded by this big opaque object.
  • And that makes things more efficient.
  • And so you keep things sorted front to back,
  • most of the time, in computer graphics rendering.
  • But some of the time, if you're worried about transparency,
  • you have to render things back to front.
  • So typically, you have sorted lists
  • corresponding to the different objects in both orders--
  • both increasing order and decreasing order.
  • And you're maintaining that.
  • So sorting is a real important subroutine
  • in pretty much any sophisticated application you look at.
  • So it's worthwhile to look at the variety of sorting
  • algorithms that are out there.
  • And we're going to do some simple ones, today.
  • But if you go and look at Wikipedia
  • and do a Google search, there's all sorts
  • of sorts like cocktail sort, and bitonic sort,
  • and what have you.
  • And there's reasons why each of these sorting algorithms exist.
  • Because in specific cases, they end up
  • winning on types of inputs or types of problems.
  • So let's take a look at our first sorting algorithm.
  • I'm not going to write code but it will be in the notes.
  • And it is in your document distance Python files.
  • But I'll just give you pseudocode here
  • and walk through what insertion sort looks like
  • because the purpose of describing
  • this algorithm to you is to analyze its complexity.
  • We need to do some counting here,
  • with respect to this algorithm, to figure out
  • how fast it's going to run and what the worst case
  • complexity is.
  • So what is insertion sort?
  • For i equals 1, 2, up through n, given an input to be sorted,
  • what we're going to do is we're going to insert A of i
  • in the right position.
  • And we're going to assume that we
  • are sort of midway through the sorting process, where
  • we have sorted A 0 through i minus 1.
  • And we're going to expand this array
  • to have i plus 1 elements.
  • And A of i is going to get inserted
  • into the correct position.
  • And we're going to do this by pairwise swaps
  • down to the correct position for the number that is initially
  • in A of i.
  • So let's go through an example of this.
  • We're going to sort in increasing order.
  • Just have six numbers.
  • And initially, we have 5, 2, 4, 6, 1, 3.
  • And we're going to take a look at this.
  • And you start with the index 1, or the second element,
  • because the very first element-- it's a single element
  • and it's already sorted by definition.
  • But you start from here.
  • And this is what we call our key.
  • And that's essentially a pointer to where we're at, right now.
  • And the key keeps moving to the right
  • as we go through the different steps of the algorithm.
  • And so what you do is you look at this
  • and you have-- this is A of i.
  • That's your key.
  • And you have A of 0 to 0, which is 5.
  • And since we want to sort in increasing order,
  • this is not sorted.
  • And so we do a swap.
  • So what this would do in this step is to do a swap.
  • And we would obtain 2, 5, 4, 6, 1, 3.
  • So all that's happened here, in this step-- in the very
  • first step where the key is in the second position--
  • is one swap happened.
  • Now, your key is here, at item 4.
  • Again, you need to put 4 into the right spot.
  • And so you do pairwise swaps.
  • And in this case, you have to do one swap.
  • And you get 2, 4, 5.
  • And you're done with this iteration.
  • So what happens here is you have 2, 4, 5, 6, 1, 3.
  • And now, the key is over here, at 6.
  • Now, at this point, things are kind of easy,
  • in the sense that you look at it and you say, well, I
  • know this part is already sorted.
  • 6 is greater than 5.
  • So you have to do nothing.
  • So there's no swaps that happen in this step.
  • So all that happens here is you're
  • going to move the key to one step to the right.
  • So you have 2, 4, 5, 6, 1, 3.
  • And your key is now at 1.
  • Here, you have to do more work.
  • Now, you see one aspect of the complexity of this algorithm--
  • given that you're doing pairwise swaps-- the way
  • this algorithm was defined, in pseudocode, out there, was I'm
  • going to use pairwise swaps to find the correct position.
  • So what you're going to do is you're
  • going to have to swap first 1 and 6.
  • And then you'll swap-- 1 is over here.
  • So you'll swap this position and that position.
  • And then you'll swap-- essentially,
  • do 4 swaps to get to the point where you have
  • 1, 2, 4, 5, 6, 3.
  • So this is the result.
  • 1, 2, 4, 5, 6, 3.
  • And the important thing to understand, here,
  • is that you've done four swaps to get 1
  • to the correct position.
  • Now, you could imagine a different data structure
  • where you move this over there and you shift them
  • all to the right.
  • But in fact, that shifting of these four elements
  • is going to be computed in our model as four
  • operations, or four steps, anyway.
  • So there's no getting away from the fact
  • that you have to do four things here.
  • And the way the code that we have for insertion sort
  • does this is by using pairwise swaps.
  • So we're almost done.
  • Now, we have the key at 3.
  • And now, 3 needs to get put into the correct position.
  • And so you've got to do a few swaps.
  • This is the last step.
  • And what happens here is 3 is going to get swapped with 6.
  • And then 3 needs to get swapped with 5.
  • And then 3 needs to get swapped with 4.
  • And then, since 3 is greater than 2, you're done.
  • So you have 1, 2, 3, 4, 5, 6.
  • And that's it.
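
The actual code is in the document-distance files; what follows is a minimal Python sketch of the pairwise-swap version walked through above (an illustration written to match this description, not necessarily the course's exact code).

    def insertion_sort(A):
        """Sort A in place, in increasing order, using pairwise swaps."""
        for i in range(1, len(A)):               # invariant: A[0..i-1] is already sorted
            j = i                                # A[j] is the key
            while j > 0 and A[j - 1] > A[j]:
                A[j - 1], A[j] = A[j], A[j - 1]  # swap the key down one position
                j -= 1
        return A

    insertion_sort([5, 2, 4, 6, 1, 3])   # [1, 2, 3, 4, 5, 6], as in the example above
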
  • So, analysis.
  • How many steps do I have?
  • AUDIENCE: n squared?
  • PROFESSOR: No, how many steps do I have?
  • I guess that wasn't a good question.
  • If I think of a step as being a movement of the key,
  • how many steps do I have?
  • I have theta n steps.
  • And in this case, you can think of it as n minus 1 steps,
  • since you started with 2.
  • But let's just call it theta n steps,
  • in terms of key positions.
  • And you're right.
  • It is n squared because, at any given step,
  • it's quite possible that I have to do theta n work.
  • And one example is this one, right here,
  • where I had to do four swaps.
  • And in general, you can construct a scenario
  • where, towards the end of the algorithm,
  • you'd have to do theta n work.
  • For example, if you had a list that was reverse sorted,
  • you would, essentially, have to do, on average, n over 2
  • swaps as you go through each of the steps.
  • And that's theta n.
  • So each step is theta n swaps.
  • And when I say swaps, I could also
  • say each step is theta n compares and swaps.
  • And this is going to be important
  • because I'm going to ask you an interesting question
  • in a minute.
  • But let me summarize.
  • What I have here is a theta n squared algorithm.
  • The reason this is a theta n squared
  • algorithm is because I have theta n steps
  • and each step is theta n.
  • When I'm counting, what am I counting
  • in terms of operations?
  • The assumption here-- unspoken assumption--
  • has been that an operation is a compare and a swap
  • and they're, essentially, equal in cost.
  • And in most computers, that's true.
  • You have a single instruction in, say, the x86
  • or the MIPS architecture that can do a compare,
  • and the same thing for swapping registers.
  • So it's a perfectly reasonable assumption
  • that compares and swaps for numbers
  • have exactly the same cost.
  • But if you had a record and you were comparing records,
  • and the comparison function that you used for the records was
  • in itself a method call or a subroutine,
  • it's quite possible that all you're doing
  • is swapping pointers or references to do the swap,
  • but the comparison could be substantially more expensive.
  • Most of the time-- and we'll differentiate
  • if it becomes necessary-- we're going
  • to be counting comparisons in the sorting algorithms
  • that we'll be putting out.
  • And we'll be assuming either that compares and swaps are
  • roughly the same cost or-- and we'll
  • say which one, of course-- that compares
  • are substantially more expensive than swaps.
  • So if you had either of those cases for insertion sort,
  • you have a theta n squared algorithm.
  • You have theta n squared compares
  • and theta n squared swaps.
  • Now, here's a question.
  • Let's say that compares are more expensive than swaps.
  • And so, I'm concerned about the theta
  • n squared comparison cost.
  • I'm not as concerned, because of the constant factors involved,
  • with the theta n squared swap cost.
  • So here's the question.
  • What's a simple fix-- change to this algorithm that
  • would give me a better complexity in the case
  • where compares are more expensive,
  • or I'm only looking at the complexity of compares.
  • So the theta whatever of compares.
  • Anyone?
  • Yeah, back there.
  • AUDIENCE: [INAUDIBLE]
  • PROFESSOR: You could compare with the middle.
  • What did I call it?
  • I called it something.
  • What you just said, I called it something.
  • AUDIENCE: Binary search.
  • PROFESSOR: Binary search.
  • That's right.
  • Two cushions for this one.
  • So you pick them up after lecture.
  • So you're exactly right.
  • You got it right.
  • I called it binary search, up here.
  • And so you can take insertion sort
  • and you can sort of trivially turn it into a theta n log n
  • algorithm, if we are talking about
  • the number of compares.
  • And all you have to do to do that is to say,
  • you know what, I'm going to replace
  • this with binary search.
  • And you can do that-- and that was the key observation--
  • because A of 0 through i minus 1 is already sorted.
  • And so you can do binary search on that part of the array.
  • So let me just write that down.
  • Do a binary search on A of 0 through i minus 1,
  • which is already sorted.
  • And essentially, you can think of it as theta log i time
  • for each of those steps.
  • And so then you get your theta n log n
  • in terms of compares.
  • Does this help the swaps for an array data structure?
  • No, because binary search will require insertion
  • into A of 0 through i minus 1.
  • So here's the problem.
  • Why don't we have a full-fledged theta n log n algorithm,
  • regardless of the cost of compares or swaps?
  • We don't quite have that.
  • We don't quite have that because we need to insert our A of i
  • into the right position in A of 0 through i minus 1.
  • When you do that with an array structure,
  • the insertion point might be in the middle.
  • And you have to shift things over to the right.
  • And when you shift things over to the right,
  • in the worst case, you may be shifting a lot of things
  • over to the right.
  • And that gets back to worst case complexity of theta n.
  • So a binary search in insertion sort
  • gives you theta n log n for compares.
  • But it's still theta n squared for swaps.
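
As a sketch of how that binary-search variant might look in Python, using the standard-library bisect module (an illustration of the idea, not the course's code): the compares drop to theta log i per step, but shifting to make room is still up to i element moves.

    import bisect

    def binary_insertion_sort(A):
        """Theta(n log n) compares, but still Theta(n^2) element moves in the worst case."""
        for i in range(1, len(A)):
            key = A[i]
            # Binary search over the sorted prefix A[0..i-1]: Theta(log i) compares.
            pos = bisect.bisect_right(A, key, 0, i)
            # Shifting the tail of the prefix right: up to i moves -- the Theta(n^2) part.
            A[pos + 1:i + 1] = A[pos:i]
            A[pos] = key
        return A
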
  • So as you can see, there's many varieties
  • of sorting algorithms.
  • We just looked at a couple of them.
  • And they were both insertion sort.
  • The second one that I just put up
  • is, I guess, technically called binary insertion sort
  • because it does binary search.
  • And the vanilla insertion sort is
  • the one that you have the code for in the docdist program,
  • or at least one of the docdist files.
  • So let's move on and talk about a different algorithm.
  • So what we'd like to do, now-- this class
  • is about constant improvement.
  • We're never happy.
  • We always want to do a little bit better.
  • And eventually, once we run out of room
  • from an asymptotic standpoint, you
  • take these other classes where you try and improve
  • constant factors and get 10%, and 5%, and 1%,
  • and so on, and so forth.
  • But we'll stick to improving asymptotic complexity.
  • And we're not quite happy with binary insertion sort
  • because, in the case of numbers, our binary insertion sort
  • has theta n squared complexity, if you look at swaps.
  • So we'd like to go find an algorithm that is theta n log n.
  • And I guess, eventually, we'll have to stop.
  • But Erik will take care of that.
  • There's a reason to stop.
  • It's when you can prove that you can't do any better.
  • And so we'll get to that, eventually.
  • So merge sort is also something that you've probably seen.
  • But there probably will be a couple
  • of subtleties that come out as I describe this algorithm that,
  • hopefully, will be interesting to those of you who already
  • know merge sort.
  • And for those of you who don't, it's a very pretty algorithm.
  • It's a standard recursion algorithm-- recursive
  • algorithm-- similar to a binary search.
  • What we do, here, is we have an array, A. We split it
  • into two parts, L and R. And essentially, we kind of
  • do no work, really.
  • In terms of the L and R in the sense that we just call,
  • we keep splitting, splitting, splitting.
  • And all the work is done down at the bottom
  • in this routine called merge, where we are merging
  • a pair of elements at the leaves.
  • And then, we merge two pairs and get four elements.
  • And then we merge four tuples of elements, et cetera,
  • and go all the way up.
  • So while I'm just saying L turns into L prime, out here,
  • there's no real explicit code that you
  • can see that turns L into L prime.
  • It really happens later.
  • There's no real sorting code, here.
  • It happens in the merge routine.
  • And you'll see that quite clearly
  • when we run through an example.
  • So you have L and R turn into L prime and R prime.
  • And what we end up getting is a sorted array, A.
  • And we have what's called a merge routine that
  • takes L prime and R prime and merges them
  • into the sorted array.
  • So at the top level, what you see is split into two,
  • and do a merge, and get to the sorted array.
  • The input is of size n.
  • You have two arrays of size n over 2.
  • These are two sorted arrays of size n over 2.
  • And then, finally, you have a sorted array of size n.
  • So if you want to follow the recursive execution
  • of this in a small example, then you'll
  • be able to see how this works.
  • And we'll do a fairly straightforward example
  • with 8 elements.
  • So at the top level-- before we get there, merge
  • is going to assume that you have two sorted arrays,
  • and merge them together.
  • That's the invariant in merge sort, or for the merge routine.
  • It assumes the inputs are sorted-- L and R. Actually
  • I should say, L prime and R prime.
  • So let's say you have 20, 13, 7, and 2.
  • You have 12, 11, 9, and 1.
  • And this could be L prime.
  • And this could be R prime.
  • What you have is what we call a two-finger algorithm.
  • And so you've got two fingers and each of them
  • point to something.
  • And in this case, my left finger is pointing
  • to some element in L prime.
  • My right finger is pointing to some element in R prime.
  • And I'm going to compare the two elements
  • that my fingers are pointing to.
  • And I'm going to choose, in this case,
  • the smaller of those elements.
  • And I'm going to put it into the sorted array.
  • So start out here.
  • Look at that and that.
  • And I compared 2 and 1.
  • And which is smaller?
  • 1 is smaller.
  • So I'm going to write 1 down.
  • This is the two-finger algorithm for merge.
  • And I put 1 down.
  • When I put 1 down, I had to cross out 1.
  • So effectively, what happens is-- let
  • me just circle that instead of crossing it out.
  • And my finger moves up to 9.
  • So now I'm pointing at 2 and 9.
  • And I repeat this step.
  • So now, in this case, 2 is smaller.
  • So I'm going to go ahead and write 2 down.
  • And I can cross out 2 and move my finger up to 7.
  • And so that's it.
  • I won't bore you with the rest of the steps.
  • It's essentially walking up.
  • You have a couple of pointers and you're
  • walking up these two arrays.
  • And you're writing down 1, 2, 7, 9, 11, 12, 13, 20.
  • And that's your merge routine.
  • And all of the work, really, is done in the merge routine
  • because, other than that, the body is simply
  • a recursive call.
  • You have to, obviously, split the array.
  • But that's fairly straightforward.
  • If you have an array, A[0 : n]-- and depending on
  • whether n is odd or even-- you could
  • imagine that you set L to be A[0 : n/2 - 1],
  • and R similarly.
  • And so you just split it halfway in the middle.
  • I'll talk about that a little bit more.
  • There's a subtlety associated with that
  • that we'll get to in a few minutes.
  • But to finish up in terms of the computation of merge sort.
  • This is it.
  • The merge routine is doing most, if not all, of the work.
  • And this two-finger algorithm is going
  • to be able to take two sorted arrays
  • and put them into a single sorted array
  • by interspersing, or interleaving, these elements.
  • And what's the complexity of merge
  • if I have two arrays of size n over 2, here?
  • What do I have?
  • AUDIENCE: n.
  • PROFESSOR: n.
  • We'll give you a cushion, too.
  • theta n complexity.
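
A minimal Python sketch of the two-finger merge and the recursive wrapper around it (illustrative code; the slicing makes the theta n auxiliary space explicit, which comes up again below).

    def merge(Lp, Rp):
        """Two-finger merge of sorted lists Lp and Rp into one sorted list."""
        out, i, j = [], 0, 0
        while i < len(Lp) and j < len(Rp):
            if Lp[i] <= Rp[j]:               # compare the two fingered elements,
                out.append(Lp[i]); i += 1    # take the smaller, advance that finger
            else:
                out.append(Rp[j]); j += 1
        out.extend(Lp[i:])                   # one list is exhausted; copy the rest
        out.extend(Rp[j:])
        return out

    def merge_sort(A):
        if len(A) <= 1:                      # a single element is sorted by definition
            return A
        mid = len(A) // 2
        return merge(merge_sort(A[:mid]), merge_sort(A[mid:]))

    merge([2, 7, 13, 20], [1, 9, 11, 12])    # [1, 2, 7, 9, 11, 12, 13, 20]
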
  • So far so good.
  • I know you know the answer as to what
  • the complexity of merge sort is.
  • But I'm guessing that most of you
  • won't be able to prove it to me because I'm kind of a hard guy
  • to prove something to.
  • And I could always say, no, I don't believe you
  • or I don't understand.
  • The complexity-- and you've said this before, in class,
  • and I think Erik's mentioned it--
  • the overall complexity of this algorithm is theta n log n.
  • And where does that come from?
  • How do you prove that?
  • And so what we'll do, now, is take a look at merge sort.
  • And we'll look at the recursion tree.
  • And we'll try and-- there are many ways
  • of proving that merge sort is theta n log n.
  • The way we're going to do this is
  • what's called proof by picture.
  • And it's not an established proof technique,
  • but it's something that is very helpful
  • to get an intuition behind the proof
  • and why the result is true.
  • And you can always take that and you
  • can formalize it and make this something
  • that everyone believes.
  • And we'll also look at substitution, possibly
  • in section tomorrow, for recurrence solving.
  • So where we are right now is that we have a divide and conquer
  • algorithm that has a merge step that is theta n.
  • And so, if I just look at this structure that I have here,
  • I can write a recurrence for merge sort
  • that looks like this.
  • So when I say complexity, I can say
  • T of n, which is the work done for n items,
  • is going to be some constant time in order
  • to divide the array.
  • So this could be the part corresponding
  • to dividing the array.
  • And there's going to be two problems of size n over 2.
  • And so I have 2 T of n over 2.
  • And this is the recursive part.
  • And I'm going to have c times n, which is the merge part.
  • And that's some constant times n, which is what we have,
  • here, with respect to the theta n complexity.
  • So you have a recurrence like this and I know some of you
  • have seen recurrences in 6.042.
  • And you know how to solve this.
  • What I'd like to do is show you this recursion tree expansion
  • that not only tells you how to solve this recurrence,
  • but also gives you a means of solving recurrences where,
  • instead of having cn, you have something else out here.
  • You have f of n, which is a different function
  • from the linear function.
  • And this recursion tree is, in my mind,
  • the simplest way of arguing the theta n log n
  • complexity of merge sort.
  • So what I want to do is expand this recurrence out.
  • And let's do that over here.
  • So I have cn on top.
  • I'm going to ignore this constant factor because cn
  • dominates.
  • So I'll just start with cn.
  • I want to break things up, as I do the recursion.
  • So when I write cn, at the top level-- that's
  • the work I have to do for the merge, at the top level.
  • And then when I go down to two smaller problems, each of them
  • is size n over 2.
  • So I do c times n divided by 2 [INAUDIBLE].
  • So this is just a constant c.
  • I didn't want to write thetas up here.
  • You could.
  • And I'll say a little bit more about that later.
  • But think of this cn as representing the theta n
  • complexity.
  • And c is this constant.
  • So c times n, here. c times n over 2, here.
  • And then when I keep going, I have c times n over 4,
  • c times n over 4, et cetera, and so on, and so forth.
  • And when I come down all the way here,
  • n is eventually going to become 1-- or essentially a constant--
  • and I'm going to have a bunch of c's here.
  • So here's another question, that I'd like you to answer.
  • Someone tell me what the number of levels in this tree are,
  • precisely, and the number of leaves in this tree are,
  • precisely.
  • AUDIENCE: The number of levels is log n plus 1.
  • PROFESSOR: Log n plus 1.
  • Log of n to the base 2, plus 1.
  • And the number of leaves?
  • You raised your hand back there, first.
  • Number of leaves.
  • AUDIENCE: I think n.
  • PROFESSOR: Yeah, you're right.
  • You think right.
  • So 1 plus log n and n leaves.
  • When n becomes 1, how many of them do you have?
  • You're down to a single element, which is, by definition,
  • sorted.
  • And you have n leaves.
  • So now let's add up the work.
  • I really like this picture because it's just
  • so intuitive in terms of getting us the result
  • that we're looking for.
  • So you add up the work in each of the levels of this tree.
  • So the top level is cn.
  • The second level is cn because I added cn over 2 and cn over 2; then cn, cn.
  • Wow.
  • What symmetry.
  • So you're doing the same amount of work modulo
  • the constant factors, here, with what's
  • going on with the c1, which we've ignored,
  • but roughly the same amount of work in each of the levels.
  • And now, you know how many levels there are.
  • It's 1 plus log n.
  • So if you want to write an equation for T of n,
  • it's 1 plus log n times cn, which is theta of n log n.
  • So I've mixed in constants c and thetas.
  • For the purposes of this description,
  • they're interchangeable.
  • You will see recurrences that look like this, in class.
  • And things like that.
  • Don't get confused.
  • It's just a constant multiplicative factor
  • in front of the function that you have.
  • And it's just a little easier, I think,
  • to write down these constant factors
  • and realize that the amount of work done
  • is the same in each of the levels.
  • And once you know the dimensions of this tree,
  • in terms of levels and in terms of the number of leaves,
  • you get your result.
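
Written out as a worked equation, the recurrence and the level-by-level sum from the tree are:

    T(n) = c_1 + 2\,T(n/2) + cn,
    T(n) = \sum_{i=0}^{\log_2 n} 2^i \cdot c\,\frac{n}{2^i}
         = cn\,(1 + \log_2 n) = \Theta(n \log n).

Each of the 1 plus log n levels has 2^i nodes of cost cn/2^i, so every level contributes cn.
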
  • So we've looked at two algorithms, so far.
  • And insertion sort, if you talk about numbers,
  • is theta n squared for swaps.
  • Merge sort is theta n log n.
  • Here's another interesting question.
  • What is one advantage of insertion sort over merge sort?
  • AUDIENCE: [INAUDIBLE]
  • PROFESSOR: What does that mean?
  • AUDIENCE: You don't have to move elements outside
  • of [INAUDIBLE].
  • PROFESSOR: That's exactly right.
  • That's exactly right.
  • So the two guys who answered the questions
  • before with the levels, and you.
  • Come to me after class.
  • So that's a great answer.
  • It's in-place sorting, which is something
  • that has to do with auxiliary space.
  • And so what you see, here-- and it was a bit hidden, here.
  • But the fact of the matter is that you
  • had L prime and R prime.
  • And L prime and R prime are different from L and R, which
  • were the initial halves of the inputs to the sorting
  • algorithm.
  • And what I said here is, we're going to dump this into A.
  • That's what this picture shows.
  • This says sorted array, A. And so you
  • had to make a copy of the array-- the two halves L
  • and R-- in order to do the recursion,
  • and then to take the results and put them
  • into the sorted array, A.
  • So you needed-- in merge sort-- you
  • needed theta n auxiliary space.
  • So merge sort, you need theta n extra space.
  • And the definition of in-place sorting
  • implies that you have theta 1-- constant-- auxiliary space.
  • The auxiliary space for insertion sort
  • is simply that temporary variable
  • that you need when you swap two elements.
  • So when you want to swap a couple of registers,
  • you gotta store one of the values in a temporary location,
  • override the other, et cetera.
  • And that's the theta 1 auxiliary space for insertion sort.
  • So there is an advantage of the version of insertion sort
  • we've talked about, today, over merge sort.
  • And if you have a billion elements, that's potentially
  • something you don't want to store in memory.
  • If you want to do something really fast and do everything
  • in cache or main memory, and you want
  • to sort billions or maybe even trillions of items,
  • this becomes an important consideration.
  • I will say that you can reduce the constant factor
  • of the theta n.
  • So in the vanilla scheme, you could
  • imagine that you have to have a copy of the array.
  • So if you had n elements, you essentially
  • have n extra items of storage.
  • You can make that n over 2 with a simple coding trick
  • by keeping 1/2 of A.
  • You can throw away one of the L's or one of the R's.
  • And you can get it down to n over 2.
  • And that turns out-- it's a reasonable thing
  • to do if you have a billion elements
  • and you want to reduce your storage by a constant factor.
  • So that's one coding trick.
  • Now it turns out that you can actually go further.
  • And there's a fairly sophisticated algorithm
  • that's sort of beyond the scope of 6.006
  • that's an in-place merge sort.
  • And this in-place merge sort is kind of
  • impractical in the sense that it doesn't do very well
  • in terms of the constant factors.
  • So while it's in-place and still theta n log n,
  • the problem is that the running time of an in-place merge sort
  • is much worse than the regular merge sort that
  • uses theta n auxiliary space.
  • So people don't really use in-place merge sort.
  • It's a great paper.
  • It's a great thing to read.
  • Its analysis is a bit sophisticated for 6.006.
  • So we won't go there.
  • But it does exist.
  • So you can take merge sort, and I just
  • want to let you know that you can do things in-place.
  • In terms of numbers, some experiments we ran a few years
  • ago-- so these may not be completely valid
  • because I'm going to actually give you numbers--
  • but merge sort in Python, if you write a little curve fit
  • program to do this, is 2.2n log n microseconds for a given n.
  • So this is the merge sort routine.
  • And if you look at insertion sort, in Python,
  • that's something like 0.2 n squared microseconds.
  • So you see the constant factors here.
  • If you do insertion sort in C, which is a compiled language,
  • then, it's much faster.
  • It's about 20 times faster.
  • It's 0.01 n squared microseconds.
  • So a little bit of practice on the side.
  • We do ask you to write code.
  • And this is important.
  • The reason we're interested in algorithms
  • is because people want to run them.
  • And what you can see is that you can actually find an n-- so
  • regardless of whether you're using Python or C,
  • this tells you that asymptotic complexity is pretty important
  • because, once n gets beyond about 4,000,
  • you're going to see that merge sort in Python
  • beats insertion sort in C.
  • So the constant factors get subsumed
  • beyond certain values of n.
  • So that's why asymptotic complexity is important.
  • You do have a factor of 20, here,
  • but that doesn't really help you in terms
  • of keeping an n square algorithm competitive.
  • It stays competitive for a little bit longer,
  • but then falls behind.
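
Using the fitted constants just quoted, a quick check of that crossover (the constants are the lecture's rough measurements, so treat the exact numbers loosely):

    import math

    def merge_sort_python_us(n):
        return 2.2 * n * math.log2(n)   # fitted model, microseconds

    def insertion_sort_c_us(n):
        return 0.01 * n * n             # fitted model, microseconds

    for n in [1000, 2000, 4000, 8000]:
        print(n, merge_sort_python_us(n) < insertion_sort_c_us(n))
    # 1000 False, 2000 False, 4000 True, 8000 True: around n of a few
    # thousand, the n^2 term takes over and Python merge sort wins.
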
  • That's what I wanted to cover for sorting.
  • So hopefully, you have a sense of what
  • happens with these two sorting algorithms.
  • We'll look at a very different sorting algorithm next time,
  • using heaps, which is a different data structure.
  • The last thing I want to do in the couple minutes I have left
  • is give you a little more intuition as to recurrence
  • solving based on this diagram that I wrote up there.
  • And so we're going to use exactly this structure.
  • And we're going to look at a couple of different recurrences
  • that I won't really motivate in terms
  • of having a specific algorithm, but I'll just
  • write out the recurrence.
  • And we'll look at the recursion tree for that.
  • And I'll try and tease out of you the complexity associated
  • with these recurrences of the overall complexity.
  • So let's take a look at T of n equals 2 T of n over 2
  • plus c n squared.
  • Let me just call that c-- no need for the brackets.
  • So constant c times n squared.
  • So if you had a crummy merge routine,
  • and it was taking n squared time, and you coded it up wrong.
  • It's not a great motivation for this recurrence,
  • but it's a way this recurrence could have come up.
  • So what does this recursive tree look like?
  • Well it looks kind of the same, obviously.
  • You have c n squared; you have c n squared divided by 4;
  • c n squared divided by 4; c n squared divided
  • by 16, four times.
  • Looking a little bit different from the other one.
  • The levels and the leaves are exactly the same.
  • Eventually n is going to go down to 1.
  • So you will see c all the way here.
  • And you're going to have n leaves.
  • And you will have, as before, 1 plus log n levels.
  • Everything is the same.
  • And this is why I like this recursive tree formulation so
  • much because, now, all I have to do
  • is add up the work associated with each of the levels
  • to get the solution to the recurrence.
  • Now, take a look at what happens, here.
  • c n squared; c n squared divided by 2; c n squared divided by 4.
  • And this is n times c.
  • So what does that add up to?
  • AUDIENCE: [INAUDIBLE]
  • PROFESSOR: Yeah, exactly.
  • Exactly right.
  • So if you look at what happens, here, this dominates.
  • All of the other things are actually less than that.
  • And you said bounded by two c n squared
  • because this part is bounded by c n squared
  • and I already have c n squared up at the top.
  • So this particular algorithm that corresponds to this crummy
  • merge sort, or wherever this recurrence came from,
  • is a theta n squared algorithm.
  • And in this case, all of the work done
  • is at the root-- at the top level of the recursion.
  • Here, there was a roughly equal amount
  • of work done in each of the different levels.
  • Here, all of the work was done at the root.
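
As a worked sum over the levels, this is a decreasing geometric series dominated by its first term, the root:

    T(n) = \sum_{i=0}^{\log_2 n} c\,\frac{n^2}{2^i} \le 2\,c\,n^2 = \Theta(n^2).
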
  • And so to close up shop, here, let
  • me just give you real quick a recurrence where
  • all of the work is done at the leaves, just for closure.
  • So if I had, magically, a merge routine that actually happened
  • in constant time, either through buggy analysis,
  • or because of it was buggy, then what
  • does the tree look like for that?
  • And I can think of this as being theta 1.
  • Or I can think of this as being just a constant c.
  • I'll stick with that.
  • So I have c, c, c.
  • Whoa, I tried to move that up.
  • That doesn't work.
  • So I have n leaves, as before.
  • And so if I look at what I have, here, I
  • have c at the top level.
  • I have 2c, and so on and so forth.
  • 4c.
  • And then I go all the way down to nc.
  • And so what happens here is this dominates.
  • And so, in this recurrence, the whole thing runs in theta n.
  • So the solution to that is theta n.
  • And what you have here is all of the work
  • being done at the leaves.
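
And the corresponding sum here is an increasing geometric series dominated by its last term, the leaves:

    T(n) = \sum_{i=0}^{\log_2 n} 2^i \cdot c = c\,(2n - 1) = \Theta(n).
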
  • We're not going to really cover the theorem that gives you
  • a mechanical way of figuring this out-- the Master Theorem--
  • because we think the recursion tree is a better way of looking at it.
  • But you can see that, depending on what that function is,
  • in terms of the work being done in the merge routine,
  • you'd have different versions of recurrences.
  • I'll stick around, and people who answered questions, please
  • pick up your cushions.
  • See you next time.

Description

MIT 6.006 Introduction to Algorithms, Fall 2011
View the complete course: http://ocw.mit.edu/6-006F11
Instructor: Srini Devadas

License: Creative Commons BY-NC-SA
More information at http://ocw.mit.edu/terms
More courses at http://ocw.mit.edu