## Colorado State University — Pueblo, Fall 2018 Math 345, Algorithms and Data Structures Course Schedule & Homework Assignments

Here is a link back to the course syllabus/policy page.

This schedule will be changing very frequently, so please check it at least every class day, and before starting work on any assignment (in case the content of the assignment has changed).

If you see the symbol below, it means that class was videoed and you can get a link by e-mailing me. Note that if you know ahead of time that you will miss a class, you should tell me and I will be sure to video that day for you.

#### Week 1

• :
• A lot of bureaucracy and introductions.
• Some content we discussed:
• Ideas of what an algorithm is
• Mention of the origin of the word algorithm in the name of the great Persian scholar Al-Khwarizmi, mathematician and astronomer at the House of Wisdom in the early 9th century, located in what is now Baghdad. [One of the words, al-jabr, in the title of Al-Khwarizmi's book on solving linear and quadratic equations also gave rise to the modern term algebra.]
• An example of a very naïve algorithm to scan through a list of courses in the university with the number of students enrolled in each, to compute the average class size for each department on campus, described in pseudocode:
Input:
• A list $\Cc$ of classes, each entry saying to which department the class belongs and how many students are registered.
• A list $\Dd$ of all departments at the university.
Output:
• A list $\Aa$ consisting of pairs: a department name and the average number of students in that department's classes.
Naïve algorithm:
• set $\Aa$ to be the empty list
• for each d in $\Dd$:
• total_number_students = 0
• number_classes = 0
• for each c in $\Cc$:
• if c.department == d:
• total_number_students += c.students
• number_classes += 1
• append to $\Aa$ the pair [d, total_number_students/number_classes]
[Note: the notation x += y is shorthand for x = x + y .]
• Starting discussion of how long it would take for this algorithm to run, at least by counting the number of steps
• :
• Some content we discussed:
• We have a plan for the content of the week, which is:
• Different ideas of efficiency for algorithms: time complexity, in which we compute how long it will take for an algorithm to run [in some idealized sense where each simple statement, or piece thereof, in a computer language is thought of as taking a certain unit of time; note this is not completely realistic, since some operations actually take much more clock time than others], and space complexity, which measures the amount of computer memory an algorithm would use. We shall most frequently be concerned with time complexity in this class.
• When discussing time complexity, we usually have these considerations in mind:
• We want to know how the time complexity [=running time!] depends upon the input size $n$.
• We are interested in having an upper bound, which will tell us that the running time will be no more than a certain amount — which will be a formula in terms of $n$.
• We are also interested in the behavior of the upper bound as the input size $n$ tends to infinity. That is, we don't care so much how much time the algorithm will take for very small problem sizes. Instead we care what might happen if we have to solve the problem for a huge input size... and then even if we're running on a huge input, what if it gets even bigger? In short, we want to know the behavior in the limit as $n\to\infty$.
• When giving such a formula for an upper bound, we tend not to care about multiplicative constants in the formula. That is, $n^2$ and $\frac12n^2$ would be pretty much the same bound. The reason for this is that practical experience tells us that many implementation details, such as which computer we use, what language, and the intricacies of how the algorithm is realized in code, will change the run time by such a constant factor. Therefore, the time complexity of the underlying algorithm (and not its concrete realization in code) should be described in some way that doesn't care about multiplicative constants. This motivates the following definition (in which we should think of the $f(n)$ as the actual, complicated formula for how long the algorithm takes for input of size $n$ and $g(n)$ is some generalized measure of complexity which we are using as an upper bound).
• Definition: For two positive functions $f(n)$ and $g(n)$, we say that $f(n)=O(g(n))$, pronounced "$f(n)$ is big Oh of $g(n)$", if there exist some numbers $k$ and $C$ such that for all $n>k$ the inequality $f(n)\le C g(n)$ is true.
In other words, for all sufficiently large $n$ (those bigger than the cut-off $k$), $f(n)$ is bounded above by a constant $C$ times $g(n)$ (which constant $C$ doesn't depend upon $n$).
• Examples:
• $x+1=O(x)$; similarly, $100x+17=O(x)$.
• $x=O(x^2)$; although this is a pretty stupid bound to use for $x$, since $x=O(x)$, also.
• Common time complexities in computer science include $O(1)$, $O(n)$, $O(n^{1+\epsilon})$, $O(n^2)$, $O(2^n)$, $O(n\log n)$, ....
• Some discussion of the median in statistics. Remember: if there are an odd number of data values, the median is the middle number of the dataset after it has been put in increasing [or decreasing] order; if there are an even number of data values, the median is the average of the middle two values of the dataset, again after it has been put in increasing [or decreasing] order.
• So the importance of the process of putting a dataset in numerical order appears for the first time in this course: let us start to SORT!
• A very naïve algorithm for SORT would do the following:
1. Find the minimum element in the list of values and switch it with the first element of the list.
2. Now repeat the above with the part of the list from the second element to the last, then the third to last, etc.
Here is pseudocode for that algorithm:
Input:
• A list $\Aa=\{a_1,\dots,a_n\}$ of numerical values (let's say real numbers or integers), in no particular order.
Output:
• A reordering of $\Aa$ into the list $\Bb=\left(b_1,\dots,b_n\right)$ satisfying $b_1\le\dots\le b_n$.
Naïve algorithm:
• for i in range(0,n):
• find the minimum value in $\left(a_i,\dots,a_n\right)$ — say it is at location j
• switch contents of locations i and j
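As a sketch (with variable names of my own choosing, not necessarily those used in class), that pseudocode might be rendered in Python as:

```python
def naive_sort(a):
    """Sort the list a in place by repeatedly moving the minimum
    of the unsorted tail to the front of that tail."""
    n = len(a)
    for i in range(n):
        # find the location j of the minimum value in a[i], ..., a[n-1]
        j = i
        for k in range(i + 1, n):
            if a[k] < a[j]:
                j = k
        # switch contents of locations i and j
        a[i], a[j] = a[j], a[i]
    return a
```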
• Please send your instructor an email at jonathan@poritz.net to show that we can communicate by email. Please send it from an account you use regularly so that I will then have in my contacts a good way to get in touch with you.
• :
• Read before class any (or all, if you like) of the following for definitions of the "big Oh" notation: [In many ways, the last one is the best; others are pretty good, too, though.]
Note: In any of these sources, you don't need to look at the other, related, notations like little Oh, omega, etc.
• Some content we discussed:
• Note that $f(n)=O(1)$ means that, from a certain point onward (on the $x$-axis), the function $f(n)$ is bounded above by a constant.
• Useful Python syntax:
from random import *
allows you then to use things like the program
randint(begin,end)
which generates one integer in the range from begin to end (inclusive!). Of course, if you just wanted this one program randint, you could instead use
from random import randint
• More useful Python syntax:
[<some expression, optionally using the variable i> for i in list]
which basically runs the loop
for i in list:
<some expression, optionally using the variable i>

and puts the values into a list. Note: often, this is used with the list being range(n) for some n. For example:
[1 for i in range(10)]
makes a vector of ten 1's, while
[i**2 for i in range(10)]
makes the vector
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
and
[randint(0,5) for i in range(3)]
makes a vector whose three entries are randomly chosen integers in the range from 0 to 5.
• We wrote Python code to implement the naïve algorithm from this Monday which computed the average class sizes for each department at a university.
For this implementation, we assumed the departments were given small integer code numbers, not names. So the list $\Dd$ of all departments was of the form range(n) where n was the number of departments, and their code numbers were from 0 to n-1.
We were then able to generate toy data [i.e., randomly chosen data which could be used to test the code] by the command
classlist=[[randint(0,4),randint(1,100)] for i in range(100)]
making 100 "class records" with department code numbers from 0 to 4 and class sizes from 1 to 100.
• But that approach is absurdly slow: it goes through the class list way too many times! Can you think of a way to do this which only goes through the class list once? Does your new method have the same space complexity?
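The naïve implementation from that class might have looked something like the following (a sketch; the variable names are my own guesses, and it assumes every department has at least one class, since otherwise the division would fail):

```python
from random import randint

def average_class_sizes(classlist, num_depts):
    """Naive approach from Monday: one full pass over classlist
    per department.  Each class record is [dept code, enrollment].
    Assumes each department has at least one class."""
    averages = []
    for d in range(num_depts):
        total_number_students = 0
        number_classes = 0
        for c in classlist:
            if c[0] == d:
                total_number_students += c[1]
                number_classes += 1
        averages.append([d, total_number_students / number_classes])
    return averages

# toy data, as generated in class
classlist = [[randint(0, 4), randint(1, 100)] for i in range(100)]
print(average_class_sizes(classlist, 5))
```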
• :
• We will be using matplotlib to do basic plotting. If you don't have it on your personal computer, the easiest way to install it seems to be with the program pip; I don't know much about it, though, so asking your classmates for suggestions is probably your best bet — along with, of course, just searching the Internet.
• Once matplotlib is installed, you will find this is a nice tutorial about how to use it for basic plotting.
• Here is the most basic thing we will do with this package for plotting:
• First, do
import matplotlib.pyplot as plt
• Put the $x$ coordinates of some interesting data values into a list x and the corresponding $y$ coordinates into y. [It is important that len(x) is the same as len(y).]
• Create a basic plot of the data with
plt.plot(x,y)
• Display the plot with
plt.show()
• Some basic formatting notes:
• That basic command just makes lines connecting the points, in the order they appear in the lists of $x$ and $y$ coordinates.
• If you want just to show the data points themselves, without the connecting line, do something like
plt.plot(x,y,"o")
which will make round dots at the locations of the data points. You can also use "+", "*", and "^" to get data points represented as a plus, a star, or a triangle, respectively.
• In addition to specifying the format of the markers of data points, you can specify the type of the lines connecting them: giving a format string without any mention of lines will make a plot without lines; using "-" will make normal (solid) lines, while ":" will make dotted lines.
• You can specify the color of the point markers and/or lines by adding a color specifier to the format string, where "b" is blue, "k" is black, "r" is red, "g" is green, etc.
• If you call the plt.plot function more than once, it will display the separate datasets, with whatever formatting you specify, on the same graph. If the formatting does not specify colors, then successive calls to plt.plot will be automatically assigned different colors.
• Therefore, the commands
plt.plot(x,y,"go")
plt.plot(x,y,"r:")
will first plot the data points with green [round] dots, then connect those dots with a dotted red line.
• To clear the current plot, so you can start building up an interesting new one, you can use
plt.clf()
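Putting those pieces together, a minimal complete session might look like this (a sketch; the data here is just $y=x^2$, chosen for illustration):

```python
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [n**2 for n in x]    # some interesting data: y = x squared

plt.plot(x, y, "go")     # green round dots at the data points
plt.plot(x, y, "r:")     # dotted red line connecting those dots
plt.show()               # display the figure
plt.clf()                # clear, ready to build the next plot
```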
• A basic way to get timing information about a program in Python is to save the current time, run the program, save the current time again, and print out the difference of the two times. A useful tool for this is the time.time() program, which returns the number of seconds since the Unix epoch, which was 00:00:00 UTC on 1 January 1970. You would use this as follows:
import time
start=time.time()
<do some work>
end=time.time()
print(end-start)
• :
• Some content we discussed:
• An issue we haven't talked very much so far, but which is obviously very important, is that the algorithms we design and then implement in code should be correct. We often take a fairly mathematical approach in which we essentially make a proof of program correctness.
• Proving program correctness obviously becomes harder the more complex the program is, particularly its control flow. We look first at one of the simplest types of program statements which are not simple declaratory or computational ones, the loop. A common approach for loops is to find a loop invariant, that is, a [logical or mathematical] statement which is true about the state of [all the variables of] the program before and after each iteration of the loop.
• Why do loop invariants work? Because of The Principle of Mathematical Induction.
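As a tiny example of my own (not from class), here is a minimum-finding loop with its invariant recorded as a comment; induction on the number of iterations, using the invariant, proves the returned value is correct:

```python
def find_min(a):
    """Return the minimum of a nonempty list, with the loop
    invariant needed for a correctness proof noted inline."""
    m = a[0]
    for i in range(1, len(a)):
        # Invariant: at the top of this iteration,
        #   m == min(a[0], ..., a[i-1])
        if a[i] < m:
            m = a[i]
        # and at the bottom, m == min(a[0], ..., a[i])
    # When the loop ends, the invariant gives m == min(a).
    return m
```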
• Some notation we shall use in class, and future assigned readings may use:
• $\forall x\, P(x)$ — means that the statement $P(x)$ is true for all values of $x$; the $\forall$ is pronounced "for all". Often there is some restriction, like $\forall x>17\ P(x)$, which means $P(x)$ is true for all $x$ which are greater than 17.
• $\sum_{i=a}^b f(i)$ or $\displaystyle\sum_{i=a}^b f(i)$ — this formula computes the value sum also found by this code snippet:

sum=0
for i in range(a,b+1):
sum += f(i)
• Discussion of how it would be great to send your HW solutions, even the ones which have some calculations and special [mathematical] symbols to your instructor by e-mail. The way to include interesting typesetting in an email is to use the notation of the LATEX mathematics typesetting system. Here is a nice introduction to LATEX, while here is a reasonable tutorial — but there are many other free LATEX resources, including the program itself and many thousands of additional packages to make it do additional, wonderful things.
• Today [Friday] is the last day to add classes.

#### Week 2

• :
• The divide and conquer approach to algorithm design, which has a quite simple structure.
To solve a complex problem — or, at least, a problem which seems computationally expensive on large input — do the following:
1. break up the original problem into smaller problems which are similar to the original one;
2. or, if the problem actually is small enough to solve immediately by consideration of some special case, just report the solution;
3. solve those smaller problems as new instances of the whole algorithm;
4. combine the solutions of the smaller problems into one which solves the original, larger instance.
• Very often, we use recursion rather than looping to solve problems with the divide and conquer approach. With recursion, the program which implements an algorithm calls itself ... which seems like a recipe for an infinite loop, but in fact is not: every time the program calls itself, the new version is running on an instance of the problem which is strictly smaller than the original. Since
• the starting input was not infinitely large,
• each layer of the recursion is applied to a problem which is smaller than the previous by an amount which is not getting smaller and smaller, and
• a particular instance of the problem will not generate a new layer of recursion once it is down to a small enough size that the special case immediate solution applies
• the program will terminate in a finite time.
If the above considerations did not apply, then either one would have to deal with getting infinitely far with finite steps, or else with an instance of Zeno's Paradox. Therefore, it is important to check that they are true in any real-world problem and proposed algorithm/implementation.
• Digression on Zeno's Paradox, just because it's fun.
• Examples of the divide and conquer approach in action, including:
1. Computing values which are in fact defined recursively, such as the factorial or the Fibonacci sequence.
2. Finding the minimum of a numerical dataset.
• Actually, this one is so simple it is unclear if recursion or looping is best.
3. The "naïve sorting algorithm" from last Monday can be thought of as a [very simple version of] a divide and conquer approach.
4. Starting merge sort.
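The first two examples above might be sketched in Python like this (my own renderings; note how each recursive call is on a strictly smaller instance, and each has a base case which is solved immediately):

```python
def factorial(n):
    """A value defined recursively: n! = n * (n-1)!, with 0! = 1."""
    if n == 0:                    # small enough: solve immediately
        return 1
    return n * factorial(n - 1)   # strictly smaller subproblem

def minimum(a):
    """Divide and conquer minimum of a nonempty list: split in half,
    find each half's minimum, combine by comparing the two."""
    if len(a) == 1:               # base case
        return a[0]
    mid = len(a) // 2
    left = minimum(a[:mid])
    right = minimum(a[mid:])
    return left if left <= right else right
```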
• HW1: either hand or email to your instructor the following:
1. Do any single problem from the Exercises for §1.1 here.
2. Suppose $f(n)=O(g(n))$ and $g(n)=O(f(n))$. Does that mean that $f(n)=g(n)$ for all $n\in\NN$? If so, explain why. If not, give a counter-example.
3. Write a Python program which implements each of the naïve algorithms discussed in Week 1: the average class size computation from Monday and the sorting algorithm from Tuesday. Please make readable code, which has comments and reasonable variable names. Include a way of generating both small toy data (for which the correctness can be checked by eye) and large toy data (for which we can see if the program takes an impossibly long amount of time).
Email the code to your instructor at jonathan@poritz.net.
4. As part of a proof of correctness of the naïve sorting algorithm we've been discussing, find a loop invariant for the main loop of the pseudocode we used to describe it.
Extra credit: Use the loop invariant method and the Principle of Mathematical Induction to prove that the algorithm we described does indeed correctly sort a collection of $n$ numbers, for any $n\in\NN$.
Note: if you don't finish every part of this homework assignment, that's ok, you can turn in the rest as part of HW2, next Monday.
• :
• Read §11.1.1 here and/or Merge Sort [ignore the section "Analysis" in the second of these references]
• Some hands-on coding of merge sort, of some ways to collect timing data on running programs, and graphing the results.
• :
• There are lots of tutorials on how to use matplotlib, just use your favorite search engine to get one if you need a little help with this. — Or, email your instructor for some personalized suggestions!
• Start working on HW2, due this coming Monday — it's quite long and you can use these class times to work on it (even working together, which is fine, so long as you each turn in your own work).
• :
• Keep working on HW2. Two thoughts:
1. You should make sure you have a Python environment in which you can work at home, just as you do at school. So, spend a little time to install matplotlib on your personal device, if it is not already installed, and to make sure that you can develop and run things both at school and at home.
2. If you get stuck at any point on the HW, don't hesitate to email your instructor. You may not get a particularly speedy response, particularly during the week, but at least on the weekend there should be faster help available this way.
• :
• Keep working on HW2. Wouldn't it be nice to be finished even before the weekend?

#### Week 3

• :
• HW2: either hand or email to your instructor the following:
1. It's time we stopped simply calling the SORT algorithm discussed above on the second day of this class the naïve sorting algorithm — in fact, this is called selection sort.
In this exercise, you will study the time complexity of selection sort both theoretically and in your Python implementation.
1. Look over your Python code for selection sort and derive as precise a formula for the time complexity in theory of your implementation as you can. That is, look at your code, line by line, and see how much computation it has to do. You may assume that each basic operation takes one unit of time. This means
a=b
takes one unit of time, while
a=b[j]
takes two units — one for the determination of the memory location defined by b[j], which usually involves addition of j times the memory size of an integer to the address in memory of the beginning of the list b, and one for the assignment to the variable a; similarly,
a=b[j+k]
takes three units of time, because it also has to do the addition j+k.
Building on this, derive your careful formula for the time complexity of your program.
Note that when there are if statements in your code, the program may take a different amount of time depending upon the result of that test. So you should actually find two time complexity formulæ:
• A worst case complexity, where you just assume that each time there is an if statement, it comes out in whichever way makes for the most computation.
• A best case complexity, where you just assume that each time there is an if statement, it comes out in whichever way makes for the least computation.
2. Take your two complexity formulæ (best and worst case) and then for each give the simplest function $f(n)$ you can find with the property that your careful complexity formula is $O(f(n))$. Explain how you know this simple formula works as desired.
3. Extra credit: Give an example of an input of size $n$ to your selection sort program which would make it have the worst case running time. Next, give an example which would make it have the best case running time. Of course, explain how you know that your answers do what you claim they do.
4. Next write a program that takes inputs $n$ and $m$ and does the following: $m$ times it generates a random input of toy data of size $n$, does selection sort on the toy data, and computes the average over those $m$ times of the seconds it took to do the sorting. Be careful: when adding up the times of doing those $m$ sorts, make sure you only count the time of sorting, not the time it took to generate the toy data.
5. At this point, pick a pretty large value for $m$ (the number of runs you are averaging together to get the average running time on input size $n$), such as 100 or 1000 (don't pick one which makes this run so long you have to wait around a while for it to finish), then make two lists:
x, being the values of $n$ for which you run the average running time on inputs of size $n$ ... use something like x=[5, 10, 15, 20, ...] up as high as you have patience for (it might be nice to have a couple of very large values, like 100 or 500 or 1000) and y, the corresponding average runtime for inputs of those sizes. x and y should both be lists of around 20 or 30 values.
Make a graph of y vs x.
6. Finally, on the same graph you just made, graph the function $f(n)$ that you found above, multiplied by some constant so that it does show how the running time is $O(f(n))$.
2. Do exactly the same thing you just did for selection sort [with all the steps and parts!], only now for merge sort.
3. For selection sort, do a careful examination to figure out a precise formula for the space complexity of your program: that is, a formula telling how much memory it uses on inputs of size $n$.
Is there a difference between best and worst case space complexity? If so, give two different formulæ; if not, explain why not.
Give a simple formula $f(n)$ so that the space complexity you just found is $O(f(n))$ or, if you had different best and worst case space complexities, give two such functions, $f(n)$ and $g(n)$.
4. Do exactly the same thing you just did for the space complexity of selection sort, only now for merge sort.
5. Extra, extra credit: Can you prove that the merge sort algorithm we've been using does what it is supposed to do? You can't quite use a loop invariant, because we're using recursion rather than looping, but a quite similar approach should work.
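As a hedged sketch of the kind of timing harness problem 1.4 above asks for (the function name is my own; plug in your own sorting function for the sort argument), note how the toy-data generation sits outside the timed region:

```python
import time
from random import randint

def average_sort_time(sort, n, m):
    """Average, over m trials, of the seconds sort() takes on a
    random list of size n."""
    total = 0.0
    for trial in range(m):
        data = [randint(0, 10**6) for i in range(n)]  # NOT timed
        start = time.time()
        sort(data)                                    # only this is timed
        total += time.time() - start
    return total / m
```

You would then call this for each value of $n$ in your x list to build the y list for plotting.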
• Some discussions of HW2 issues and working to get matplotlib working on one student's personal laptop — for which this site was very helpful.
• Today [Monday] is the last day to drop classes without a grade being recorded.
• :
• Some individualized discussion of asymptotics of recursive algorithms — in particular, merge sort.
• :
• Review of big-O notation, discussion of related HW1 problem.
• Discussion of big-$\Omega$ and big-$\Theta$ notations.
• Mentioning The Master Theorem for asymptotics of recursive algorithms. Here is a nice reference for this theorem [or here is another place to find that same document, if the first one should happen to disappear from the 'net]. But note that your instructor mentioned in class that it is hard [and not so important] to keep the precise details of the theorem in your head. However:
• It is a good idea to be able to know how to check if the theorem does apply in a particular situation, and then to be able to state what conclusion comes from that application.
• E.g.: check for yourself if the Master Theorem applies to the work we've been doing for merge sort and, if so, what it tells us.
• :
• Read the resource linked yesterday about The Master Theorem and read (or at least skim) either this resource, this one, this one, or this one, on object-oriented programming in Python.
• In math and computer science, it is best to go for a mixture of text and equations/diagrams/code. Too many words misses out on the precision of equations — e.g., better than saying "yadda-yadda happens for all integers" is to say "$\forall k\in\ZZ$ we have $S(k)$" [where $S(k)$ is the precise statement you are going for, particularized to the integer $k$ and probably having or being an equation].
• Explanations in math and computer science tend to be structured more like recipes in cookbooks than like good essays or other well-written prose. For example, an induction proof should have certain pieces in a certain order (stating clearly what the pieces are and why!), and that's just fine.
• In this class, at least, you must always justify your work. E.g., if the assignment is "give an example of an even prime number," the answer "2" is woefully incomplete, while "2, because it is even, being clearly divisible by 2, and a prime, since the only natural numbers it has as divisors are 1 and 2, simply because they are clearly divisors and there is no room for other divisors $d$ which satisfy $1\le d\le 2$" is complete.
• In order to make sure that the work we did yesterday about the time complexity of merge sort was valid, we looked fairly carefully at the steps it must perform, following the pseudocode here. Everything seemed to agree with what we did yesterday, except for the possible case of appending an element to the end of a list, and deleting an element from the beginning of another list.
If both of these processes take $O(1)$ time, we're OK; if it somehow takes more and more time to append to a larger and larger list, or to delete entries from a very large list, then we would be in trouble.
Therefore we wrote code to test this. That is, it should:
• Loop up to some very large number $N$.
• At each step, it should time the operation of appending one small item [like a small integer] to the end of a single list [which would therefore be getting very large by the end of the main loop].
• Make a plot using matplotlib of the time to append an item (as $y$) against the size of the list to which that item is being appended (as $x$).
• Joey pointed out that the time for a single append might be so small that it simply always registered as zero. So another thing to do would be to time how long it takes to append 100 items, or 1000, or some number like that, to the list which is getting longer and longer, with a small loop inside the main loop.
• You should get a picture like
But make sure you can do that yourself!
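A sketch of that experiment (constants and variable names are my own; you would want a much larger N in practice), incorporating Joey's suggestion of timing a batch of appends rather than a single one:

```python
import time
import matplotlib.pyplot as plt

N = 200        # number of measurements (use something much larger)
BATCH = 1000   # time a batch of appends, so each measurement isn't ~0

sizes, times = [], []
big_list = []
for step in range(N):
    start = time.time()
    for i in range(BATCH):
        big_list.append(7)          # append one small item, many times
    times.append(time.time() - start)
    sizes.append(len(big_list))     # list size after this batch

plt.plot(sizes, times, "o")  # roughly flat picture suggests O(1) appends
plt.show()
```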
• :
• Discussion about how it is a very nice feature of what some call "rapid prototyping languages" like Python that one can very quickly produce small bits of running code, to test new ideas and to see how well some approach performs. The timing graph from yesterday's class is an example.
In order to be able to take advantage of this kind of rapid prototyping, it is important to have a fairly high comfort level with Python syntax and a few other tools (such as some methods in the random and time modules), as well as with the structures in which you are interested (such as different sorting algorithms in this unit of the class).
If you do not feel you have that comfort level with any of those necessary pieces, please find some time to sit down with your instructor, soon, to figure out how to get there!
• Syntax to switch variable values in Python: oddly enough,
x,y=y,x
works, as does
a[i],a[j]=a[j],a[i]
as might appear in one of our sorting algorithms. This is very convenient; use it!
• Some object-oriented work in Python; see the links from yesterday. In particular, discussing making singly linked lists, and the complexity of adding to or deleting from the head or tail of the list.
• Insertion sort, yay!
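A minimal sketch of a singly linked list class (class and method names are my own, not necessarily those used in class), illustrating the complexity point: inserting at the head is $O(1)$, while reaching the tail takes an $O(n)$ walk unless we also keep a tail pointer:

```python
class Node:
    """One cell of a singly linked list."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        """Insert at the head: O(1), no walking required."""
        self.head = Node(value, self.head)

    def to_list(self):
        """O(n) walk from head to tail; handy for testing."""
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```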

#### Week 4

• :
• We've somehow gotten way behind on doing T&Qs!
Recall from the syllabus that these are short emails you should send to your instructor about the current readings you've been assigned and/or the material we've been discussing in class.
Their content should be a thought or question you might have about the topic in question, expressed in just a few sentences (or even one, very well crafted sentence!).
There is no requirement about depth or sophistication of your T&Qs, other than that they show real engagement with the material. For example, something like
"I think American politics suck."
does not show engagement, while
"This article says that the Electoral College has its roots in the racist origins of the US Constitution, and while I can see that is important, I don't see any path to changing the way we elect presidents in the near future."
does.
Also, don't be embarrassed if your T&Q is a question and not a statement. If there is something in the assigned T&Q topic which you don't understand, state clearly and explicitly what that confusing bit is. E.g.,
"This article talks about the problem with the Electoral College being that one person does not have one vote, but I don't understand the reasoning there, or the example of big states and small ones — don't big states have more Electoral College votes, so what's the problem with that?"
would be fine [on the same current event topic].
Each T&Q is graded 0 or 1 ... you should expect to get pretty much all 1s on your T&Qs, so long as you are actually engaging with the readings and class material, and keeping on schedule with them.
With that understood, please do the following T&Qs (send each in its own email to your instructor) before class on Monday (Tuesday is also OK, if this seems like too much, too soon, or if you have a question you want to ask in class about T&Qs on Monday):
• HW3: Implement insertion sort in Python.
• Use plain Python lists — no object-oriented monkey business with linked lists, as we had discussed in class on Friday.
• Write also code to collect data and display a graph of data on the time-complexity of your implementation [you should be able to steal a bit of your own code from HW2 for the complexity piece!].
• Send your code to your instructor as a <something>.py file attached to an email. As always, when sending code to someone else, make sure your variable names are informative and you have comments in your code. Your instructor should be able to execute your code and see the complexity graph[s], so there is no need for any other text or explanation beyond the code itself and its output.
• Some content we discussed:
• Using LATEX a little more seriously:
• Consider Overleaf, an online LATEX editor, if you don't have your own LATEX on your personal computer.
• It is very easy to put LATEX on web pages, just sneak the line
<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-MML-AM_CHTML' async></script>
between the <head> and </head> at the beginning of the HTML file. Then you can use LATEX snippets in the <body> of your HTML file, and it will get automagically processed into beautiful mathematics, courtesy of the MathJax display engine. E.g., this very page used
\$\forall x P(x)\$ ...
where it said (above, on the first Friday of the semester)
$\forall x P(x)$ ....
• Going over a bunch of the issues that came up in T&Qs.
• Quick description of bubble sort
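For reference, a hedged sketch of bubble sort (repeatedly swap adjacent out-of-order neighbors; after pass $k$, the $k$ largest values have "bubbled" into their final places):

```python
def bubble_sort(a):
    """Bubble sort a list in place, using the convenient Python
    swap syntax discussed on Friday."""
    n = len(a)
    for k in range(n - 1):
        for i in range(n - 1 - k):       # the last k slots are done
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```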
• :
• Read either this resource, this one, or this [much longer] one, or this one-page cheat sheet and submit T&Q6 on this topic.
• Some content we discussed:
• Using numpy: installing, basic use and features.
• Comparing numpy with other kinds of data structure, such as native Python lists and our own linked lists made with Python classes.
• In-class activity: take your code for insertion sort and change it to use numpy arrays, rather than native Python lists. Then do a timing graph (as we've done lots of times in the past) of the time complexity of this version as a function of input size, with different-sized (larger and larger) toy input data.
• Please email whatever you have at the end of class to your instructor -- code, graphs, code that makes graphs, etc.
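A few of the basic numpy features of the kind we compared against native Python lists (a sketch, assuming numpy is installed):

```python
import numpy as np

a = np.arange(10)       # array of 0, 1, ..., 9, like list(range(10))
b = a ** 2              # elementwise arithmetic, no Python-level loop
mask = a % 2 == 0       # a boolean array, True where a is even
evens = a[mask]         # boolean selection: keep entries where mask is True
```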
• :
• Selecting elements in numpy arrays
• In-class activity: follow these steps:
1. Make a numpy array of integers from 2 to some limit $n$.
2. Select all elements of the array which are either at an index less than or equal to 0 or are not multiples of the value in location 0 (which should be the integer 2), and update the array to have as its new value that subset of values.
3. Now do the same thing with the new version of the array, but for location 1.
4. Now make a loop that does this all the way through the array. What is a good criterion for when to stop?
5. What have you just done? Do you know what it's called?
6. Write some code which makes a time complexity graph of the above program as a function of the input number $n$.
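The steps above can be sketched in numpy roughly as follows (treat this as a sketch, not the official solution; the stopping criterion used here, stop once the square of the value at location $k$ exceeds $n$, is one reasonable answer to step 4):

```python
import numpy as np

def sieve(n):
    """Repeatedly keep only elements at index <= k or not divisible by a[k]."""
    a = np.arange(2, n + 1)
    k = 0
    # once a[k]^2 > n, every remaining composite would already have been removed
    while k < a.size and a[k] * a[k] <= n:
        keep = (np.arange(a.size) <= k) | (a % a[k] != 0)
        a = a[keep]
        k += 1
    return a
```

For example, `sieve(30)` gives `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`.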
• :
• Read this resource, this one, or this one, on operator overloading and submit T&Q7 on this subject.
• In-class (or at home, or in a room elsewhere on campus: your choice, since your instructor will not be physically present in the classroom today!) activity: make your own Python code for singly and doubly linked lists, then make versions of insertion sort which use each of these structures instead of native Python lists. Then build a time complexity graph (as we have done many times before!) for the performance of these new versions as you make the toy input data larger and larger.
• You will eventually have to email whatever you make to your instructor -- code, graphs, code that makes graphs, etc. -- so make sure you keep a copy.
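A possible starting point for the singly linked case (a sketch under my own naming; the doubly linked variant and the timing code are left to you):

```python
class Node:
    """One cell of a singly linked list."""
    def __init__(self, data, nxt=None):
        self.data = data
        self.next = nxt

def from_list(values):
    """Build a linked list from a Python list (for making toy inputs)."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head

def to_list(head):
    """Read a linked list back out into a Python list (for checking results)."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out

def insertion_sort_linked(head):
    """Move each node, one at a time, into its place in a growing sorted list."""
    sorted_head = None
    while head is not None:
        nxt = head.next
        if sorted_head is None or head.data <= sorted_head.data:
            head.next = sorted_head          # new smallest: becomes the head
            sorted_head = head
        else:
            cur = sorted_head                # walk to the insertion point
            while cur.next is not None and cur.next.data < head.data:
                cur = cur.next
            head.next = cur.next
            cur.next = head
        head = nxt
    return sorted_head
```

Here `to_list(insertion_sort_linked(from_list([3, 1, 2])))` gives `[1, 2, 3]`.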
• :
• No class meeting — a good opportunity to get caught up on things which will need to be handed in.

#### Week 5

• :
• Read the first sections of this article on the stack, only up to (and not including) the section labeled Applications of stacks, and submit T&Q9 on this subject.
• HW4 is due, consisting of:
1. On Tuesday last week you made (or, at least, started to make) a version of insertion sort using numpy arrays, along with code to test it for larger and larger input sizes and to make the resulting time complexity graph. Send this code to your instructor now!
2. Then on Thursday last week you made (or, at least, started to make) a version of insertion sort using linked lists, along with code to test it for larger and larger input sizes and to make the resulting time complexity graph. Send this code to your instructor as well.
3. Make one additional graph that superimposes all time complexities you have computed for insertion sort: with Python lists, with numpy arrays, and with your own linked lists. Send the code for this to your instructor.
4. On Wednesday last week you made (or, at least, started to make) code which implements the Sieve of Eratosthenes, and also to collect and graph time complexity information about this program for different input sizes. Send this code to your instructor.
Notes:
• Make sure that whenever you send code, it has good comments and variable names, code to generate toy test cases and graphs of time complexities. Your instructor should be able to cut and paste your code into a Python environment and then both fiddle with the basic code and, by running some program you have supplied, generate the graphs you were asked to make.
• You might want to make a legend when graphing several different things on the same graph. The way to do this is to put the argument "label='some text'" in each call to plot and have an additional command plt.legend(loc='upper left') before your plt.show(). Here is a simple example:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,20,1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, '-b', label='sine')
plt.plot(x, y2, '-r', label='cosine')
plt.legend(loc='upper left')
plt.show()
• Discussion of some points which came up in T&Qs.
• Discussion of some uses of stacks, particularly to be a place to put local variables in instances of programs when they recurse.
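As a tiny illustration of that idea (my own toy example, not code from class), here is a factorial computed with an explicit stack standing in for the call stack's saved local values:

```python
def factorial_iterative(n):
    """Compute n! with an explicit stack, mimicking how recursive calls
    save their local values on the call stack before recursing."""
    stack = []
    while n > 1:
        stack.append(n)   # "push" the local value, as a recursive call would
        n -= 1
    result = 1
    while stack:
        result *= stack.pop()  # "pop" as each pretend recursive call returns
    return result
```

So `factorial_iterative(5)` returns `120`, just as the recursive version would.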
• A thematic move: our course will move forward a bit in the themes of material we cover, in that we will start to discuss searching, having been working on sorting for a while.
• As an example of a searching problem, here is a bit of code to do a guessing game which searches for a hidden random number:
import random

def guess(n=1023):
    if n < 1:
        print("Input must be a natural number")
        return
    number = random.randint(0, n)
    guesses = 0
    while True:
        guess_text = input("Guess the number from 0 to %d: " % n)
        if guess_text == "I give up":
            print("Give up, eh?  Well, my number was %d." % number)
            return
        guess = int(guess_text)
        if guess < 0 or guess > n:
            print("Your guesses must be integers in the range from 0 to %d." % n)
            continue
        guesses += 1
        if guess > number:
            print("Too high")
        elif guess < number:
            print("Too low")
        else:
            print("Got it (in %d guesses)!!" % guesses)
            return
Try running this to play the game — come up with a good strategy!
• In class, we discussed the binary search strategy, which repeatedly guesses the midpoint of the possible region and then recurses to work on the lower or upper remaining half, depending upon the answer to the guess, "too high" or "too low."
• We mentioned how the time complexity of binary search is $O(\log n)$. This is in contrast with the most naïve searching approach — which is just to go through all the possible values and ask if each is correct — which has time complexity $O(n)$.
• In-class activity: write code which implements binary search. You may use the following class:
class Quizzer:
    def __init__(self, max_val):
        import random
        self.n = random.randint(0, max_val)
        self.max_val = max_val
    def try_guess(self, k):
        if k < 0 or k > self.max_val:
            return 17
        if k < self.n:
            return -1
        elif k > self.n:
            return 1
        else:
            return 0
    def __str__(self):
        return str(self.n)
For example, it can be used as in the following
q = Quizzer(65535)
for i in range(65536):
    ans = q.try_guess(i)
    if ans == -1:
        print("Too low, trying again")
    elif ans == +1:
        print("Too high, how did that happen?")
    elif ans == 0:
        print("Got it!  Answer was %d" % i)
        break
    else:
        print("weird.  quitting")
        break
which implements the naïve searching algorithm.
But you should implement binary search! Do so, and send your nice code to your instructor.
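For comparison, here is one way binary search against that class might look (a sketch, not the official solution; the Quizzer class is copied so the snippet runs on its own, and the function name is mine):

```python
import random

class Quizzer:  # copy of the class from above, so this snippet is self-contained
    def __init__(self, max_val):
        self.n = random.randint(0, max_val)
        self.max_val = max_val
    def try_guess(self, k):
        if k < 0 or k > self.max_val:
            return 17
        return -1 if k < self.n else (1 if k > self.n else 0)

def binary_search(q, max_val):
    """Repeatedly guess the midpoint, halving the candidate interval each time."""
    lo, hi = 0, max_val
    guesses = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        guesses += 1
        ans = q.try_guess(mid)
        if ans == 0:
            return mid, guesses
        elif ans == -1:      # mid was too low: discard the lower half
            lo = mid + 1
        else:                # mid was too high: discard the upper half
            hi = mid - 1
    return None, guesses     # should not happen if the number is in range
```

On a range of 65536 possible values this needs at most 17 guesses, in line with the $O(\log n)$ complexity discussed above.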
• :
• Discussing worst case, best case, and average case complexity, in particular for the two searching methods we discussed yesterday: the naïve and binary search algorithms.
• Important topic (with examples): when talking about average case complexity, it is very important to think about average over what population, or, even more precisely, average over what probability distribution.
• Adapting our binary search strategy from last class to situations when we're not searching among all integers from $0$ to $n$, but rather in a more random or sparse set of values. What changes need to be made?
• :
• In-class activity to test your binary search algorithm on input which is not just all integers from $0$ to $n$.
• Write some code which takes input $n$ and $p$ and generates $n$ numbers which lie in some randomly chosen interval of integers $[a,b]$, where $(b-a)/n=p$ (approximately).
• Write code which picks a random element from the above "sparse interval."
• Combine with your binary search to see if it all works.
• Make a time complexity graph.
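A possible sketch of the first two steps plus a standard binary search on the resulting array (the names, and the choice of a random left endpoint for the interval, are mine):

```python
import numpy as np

def sparse_interval(n, p, rng=None):
    """Return n sorted, distinct integers drawn from a random interval [a, b]
    with (b - a)/n approximately p, so p > 1 makes the values sparse."""
    if rng is None:
        rng = np.random.default_rng()
    a = int(rng.integers(0, 10 * n))          # random left endpoint (my choice)
    b = a + int(n * p)
    return np.sort(rng.choice(np.arange(a, b + 1), size=n, replace=False))

def binary_search(arr, target):
    """Standard binary search on a sorted array; returns an index, or None."""
    lo, hi = 0, arr.size - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

For step 2, picking a random element is just indexing the array at a random position; the search then finds it again by value.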
• Starting to talk about binary trees.
• :
• Read, in Open Data Structures, the introduction to Chapter 6, here, and its continuation as § 6.1, and submit T&Q10 on this subject.
• In class we talked about many of the things we will code tomorrow.
• :
• The depth of a node in a tree is the number of links from the root to that node.
• The height of a node in a tree is the largest number of links from that node to a leaf.
• The height of a tree is the height of its root. Equivalently, it is the maximum depth of any node of that tree.
• It's time to make some code for trees. [This will be a large part of the homework due next Tuesday.]
In particular, you could use this for the basic structure of a tree node:
class BSTnode:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right
    def is_leaf(self):
        return self.left is None and self.right is None
This code includes the possibly convenient method that tells if a node is a leaf, is_leaf().
Note that sometimes it is convenient to have each node "know" its parent node. That would be with a slightly improved class definition:
class BSTnode:
    def __init__(self, data, parent=None, left=None, right=None):
        self.data = data
        self.parent = parent
        self.left = left
        self.right = right
    def is_leaf(self):
        return self.left is None and self.right is None
    def is_root(self):
        return self.parent is None
This version of the code includes the possibly convenient method that tells if a node is the root of the tree, is_root().
Now please write the following methods for that class:
• insert inserts a new data value into a tree
• max_data_val finds the maximum value of a subtree starting at a certain node
• min_data_val finds the minimum value of a subtree starting at a certain node
• verify checks if a binary search tree is well-formed
• delete deletes a node
• depth_first_print prints data values in depth-first order
• ordered_print prints data values in increasing order
• depth returns the depth of a node
• height returns the height of a node; applied to the root, it gives the height of the whole tree
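To show the recursive pattern most of these methods share, here is a hedged sketch of just two of them, insert and an in-order traversal backing ordered_print (the helper ordered_values is my own addition, and the class is copied so the snippet runs on its own; the remaining methods are left for the homework):

```python
class BSTnode:  # copy of the basic class from above, so this snippet is self-contained
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

    def insert(self, data):
        """Insert a new value, preserving the binary-search-tree property:
        smaller values go left, larger (or equal) values go right."""
        if data < self.data:
            if self.left is None:
                self.left = BSTnode(data)
            else:
                self.left.insert(data)
        else:
            if self.right is None:
                self.right = BSTnode(data)
            else:
                self.right.insert(data)

    def ordered_values(self):
        """In-order traversal: left subtree, then this node, then right subtree."""
        left = self.left.ordered_values() if self.left else []
        right = self.right.ordered_values() if self.right else []
        return left + [self.data] + right

    def ordered_print(self):
        print(*self.ordered_values())
```

Because of the search-tree property, the in-order traversal comes out in increasing order: inserting 3, 8, 1, 4 under a root holding 5 and calling ordered_print prints 1 3 4 5 8.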

#### Week 6

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:

#### Week 7

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:

#### Week 8

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:

#### Week 9

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• Today [Friday] is the last day to withdraw (with a W) from classes.

#### Week 10

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:

#### Week 11

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:

#### Week 12

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:

#### Week 13

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:

#### Week 14

• Thanksgiving Break! No classes, of course.

#### Week 15

• :
• Some content we discussed:
• :
• Some content we discussed:
• :
• Some content we discussed:
• :