Colorado State University — Pueblo, Fall 2018
Math 345, Algorithms and Data Structures
Course Schedule & Homework Assignments
Here is a link back to the course
syllabus/policy page.
This schedule is will be changing very frequently, please check
it at least every class day, and before starting work on any assignment
(in case the content of the assignment has changed).
When there is a reading assignment, please read the named
section(s) before that day.
If you see the symbol below, it means that class was videoed and you can
get a link by emailing me. Note that if you know ahead of time that
you will miss a class, you should tell me and I will be sure to video
that day for you.
Week 1

:

A lot of bureaucracy and introductions.

Some content we discussed:

Ideas of what an algorithm is

Mention of the origin of the word algorithm in the
name of the great Persian scholar AlKhwarizmi, mathematician
and astronomer at the House of Wisdom in the early
9^{th} century, located in what is now Baghdad.
[One of the words, aljabr, in the title of
AlKhwarizmi's book on solving linear and quadratic
equations also gave rise to the modern term algebra.]

An example of a very naïve algorithm to scan through a
list of courses in the univesity with the number of students
enrolled in each, to compute the average class size for each
department on campus, described in pseudocode:
Input:

A list $\Cc$ of classes, each entry saying to which
department the class belongs and how many students
are registered.

A list $\Dd$ of all departments at the university.
Output:

A list $\Aa$ of consisting of pairs: a department
name and the average number of students in that
department's classes.
Naïve algorithm:

set $\Aa$ to be the empty list

for each d in $\Dd$:

total_number_students = 0

number_classes = 0

for each c in $\Cc$:

if c.deptartment == d:

total_number_students += c.students

number_classes += 1

append to $\Aa$ the pair
[d, total_number_students/number_classes]
[Note: the notation x += y is shorthand for x = x + y .]

Starting discussion of how long it would take for this
algorithm to run, at least by counting the number of steps

:

Some content we discussed:

We have a plan for the content of the week, which is:

Different ideas of efficiency for algorithms: time
complexity, in which we compute how long it will take
for an algorithm to run [in some idealized sense where each
simple statement, or piece thereof, in a computer language
is thought of as taking a certain unit of time; note this is
not completely realistic, since some operations actually take
much more clock time than others], and space
complexity, which measures the amount of computer
memory an algorithm would use. We shall most frequently be
concerned with time complexity in this class.

When discussing time complexity, we usually have these
considerations in mind:

We want to know how the time complexity [=running time!]
depends upon the input size $n$.

We are interested in having an upper bound, which
will tell us that the running will be no more than a
certain amount — which will be a formula in terms
of $n$.

We are also interested in the behavior of the upper
bound as the input size $n$ tends to infinity. That is,
we don't care so much how much time the algorithm will
take for very small problem sizes. Instead we care what
might happen if we have to solve the problem for a huge
input size... and then even if we're running on a huge
input, what if it gets even bigger In short, we
want to know the behavior in the limit as $n\to\infty$.

When giving such a formula for an upper bound, we tend not
to care about multiplicative constants in the formula.
That is, $n^2$ and $\frac12n^2$ would be pretty much the
same bound. The reason for this is that practical
experience tells us that many implementation details, such
as which computer we use, what language, and the
intricacies of how the algorithm is realized in code,
will change the run time by such a constant factor.
Therefore, the time complexity of the underlying algorithm
(and not its concrete realization in code) should be
described in some way that doesn't care about
multiplicative constants. This motivates the following
definition (in which we should think of the $f(n)$ as the
actual, complicated formula for how long the algorithm
takes for input of size $n$ and $g(n)$ is some generalized
measure of complexity which we are using as an upper
bound).

Definition: For two positive functions $f(n)$
and $g(n)$, we say that $f(n)=O(g(n))$, pronounced
"$f(n)$ is big Oh of $g(n)$", if there exist
some numbers $k$ and $C$ such that for all $n>k$ the
inequality $f(n)\le C g(n)$ is true.
In other words, for all sufficiently large $n$ (those
bigger than the cutoff $k$), $f(n)$ is bounded above by
a constant $C$ times $g(n)$ (which constant $C$ doesn't
depend upon $n$).

Examples:

$x+1=O(x)$; similarly, $100x+17=O(x)$.

$x=O(x^2)$; although this a pretty stupid bound to
use for $x$, since $x=O(x)$, also.

Common time complexities in computer science include
$O(1)$, $O(n)$, $O(n^{1+\epsilon})$, $O(n^2)$,
$O(2^n)$, $O(n\log n)$, ....

Some discussion of the median in statistics.
Remember, it is the middle number in a dataset, after it
has been put in increasing [or decreasing] order if
there are an odd number of data values; if there are an
even number of data values, the median is the average of
the middle two values of the dataset, again after it has
been put in increasing [or decreasing] order.

So the importance of the process of putting a dataset in
numerical order appears for the first time in this course:
let us start to SORT!

A very naïve algorithm for SORT would do the following:

Find the minimum element in the list of values and
switch it with the first element of the list.

Now repeat the above with the part of the list from
the second element to the last, then the third to last,
etc.
Here is pseudocode for that algorithm:
Input:

A list $\Aa=\{a_1,\dots,a_n\}$ of numerical values
(let's say real numbers or integers), in no particular
order.
Output:

An reordering of $\Aa$ into the list
$\Bb=\left(b_1,\dots,b_n\right)$ satisfying
$a_1\le\dots\le a_n$.
Naïve algorithm:

for i in range(0,n):

find the minimum value in
$\left(a_i,\dots,a_n\right)$ — say it is at
location j

switch contents of locations i and j

Please send your instructor an email at
jonathan@poritz.net to
show that we can communicate by email. Please send it from an
account you use regularly so that I will then have in my
contacts a good way to get in touch with you.

:

Read before class any (or all, if you like) of the following
for definitions of the "big Oh" notation:
[In many ways, the last one is the best; others are pretty good,
too, though.]
Note: In any of these sources, you don't need to look at
the other, related, notations like little Oh, omega,
etc.

Some content we discussed:

Note that $f(n)=O(1)$ means that, from a certain point
onward (on the $x$axis), the function $f(n)$ is bounded
above by a constant.

Useful Python syntax:
from random import *
allows you then to use things like the program
randint(begin,end)
which generates one integer in the range from begin
to end (inclusive!). (Of course, if you just wanted
this one program randint, you could use instead
from random import randint

More useful Python syntax:
[<some expression, optionally using the variable i> for i list]
which basically runs the loop
for i in list:
<some
expression, optionally using the variablei>
and puts the values into a list. Note: often, this is used
with the list being range(n) for some
n. For example:
[1 for i in range(10)]
makes a vector of ten 1's, while
[i**2 for i in range(10)]
makes the vector
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
and
[randint(0,5) for i in range(3)]
makes a vector whose three entries are randomly chosen
integers in the range from 0 to 5.

We wrote Python code to implement the naïve algorithm
from this Monday which computed the average class sizes for
each department at a university.
For this implementation, we assumed the departments were
given small integer code numbers, not names. So the list
$\Dd$ of all departments was of the form range(n)
where n was the number of departments, and their
code numbers were from 0 to n1.
We were then able to generate toy data [i.e.,
randomly chosen data which could be used to test the code,
by the command
classlist=[[randint(0,4),randint(1,100)] for i in range(100)]
making 100 "class records" with department code numbers from
0 to 4 and class sizes from 1 to
100

But that approach is absurdly slow: it goes through the
class list way too many times! Can you think of a way to do
this which only goes through the class list once? Does
your new method have the same space complexity?

:

We will be using matplotlib to do basic plotting. If
you don't have it on your personal computer, the easiest way to
install it seems to be with the program pip; I don't know
much about it, though, so asking your classmates for suggestions
is probably your best bet — along with, of course, just
searching the Internet.

Once matplotlib is install, you will find this is a nice tutorial
about how to use it for basic plotting.

Here is the most basic thing we will do with this package for
plotting:

First, do
import matplotlib.pyplot as plt

Put the $x$ coordinates of some interesting data values
into a list x and the corresponding $y$ coordinates
into y. [It is important that len(x) is
the same as len(y).]

Create a basic plot of the data with
plt.plot(x,y)

Display the plot with
plt.show()

Some basic formatting notes:

That basic command just makes lines connecting the
points, in the order they appear in the lists of $x$
and $y$ coordinates.

If you want just to show the data points themselves,
without the connecting line, do something like
plt.plot(x,y,"o")
which will make round dots at the locations of the data
points. You can also use "+", "*",
and "^" to get data points represented as
a plus, a star, or a triangle, respectively.

In addition to specifying the format of the markers of
data points, you can specify the type of the lines
connecting them: giving a format string witout any
mention of lines will make a plot without lines; using
"_" will make normal lines, while ":" will make dotted
lines.

You can specify the color of the point markers and/or
lines by adding a color specifier to the format string,
where "b" is blue, "k" is black, "r" is red, "g" is
green, etc.

If you call the plt.plot function more than
once, it will display the separate datasets, with whatever
formatting you specify, on the same graph. If the
formatting does not specify colors, then succesive
calls to plt.plot will be automatically assigned
different colors.

Therefore, the commands
plt.plot(x,y,"go")
plt.plot(x,y,"r:")
will first plot the data points with green [round] dots,
then connect those dots with a dotted red line.

To clear the current plot, so you can start building up
an interesting new one, you can use
plt.clf()

A basic way to get timing information about a program in Python
is to save the current time, run the program, save the current
time again, and print out the difference of the two times. A
useful tool for this is the time.time() program, which
returns the number of seconds since the Unix epcoh,
which was 00:00:00 UTC on 1 January 1970. You would use this
as follows:
import time
start=time.time()
<do some work>
end=time.time()
print(endstart)

:

Some content we discussed:

An issue we haven't talked very much so far, but which is
obviously very important, is that the algorithms we design
and then implement in code should be correct. We
often take a fairly mathematical approach in which we
essentially make a proof of
program correctness.

Proving program correctness obviously becomes harder the
more complex the program is, particularly its control flow.
We look first at one of the simplest types of program
statements which are not simple declaratory or computational
ones, the loop. A common approach for loops is to
find a loop invariant, that is, a [logical or
mathematical] statement which is true about the state of
[all the variables of] the program before and after each
iteration of the loop.

Why do loop invariants work? Because of The Principle of
Mathematical Induction.

Some notation we shall use in class, and future assigned
readings may use:

$\forall x \in P(x)$ — means that the statement
$P(x)$ is true for all values of $x$; the $\forall$ is
pronounced for all Often there is some
restriction, like $\forall x>17 P(x)$, which means
$P(x)$ is true for all $x$ which are greater than 17.

$\sum_{i=a}^b f(i)$ or $\displaystyle\sum_{i=a}^b f(i)$
— this formula computes the value sum also
found by this code snippet:
sum=0
for i in range(a,b+1)
sum += f(i)

Discussion of how it would be great to send your HW solutions,
even the ones which have some calculations and special
[mathematical] symbols to your instructor by email. The way
to include interesting typesetting in an email is to use
the notation of the L^{A}T_{E}X mathematics typesetting system.
Here is a nice introduction to L^{A}T_{E}X, while
here
is a reasonable tutorial — but there are many
other free L^{A}T_{E}X resources, including the program itself
and many thousands of additional packages to make it do
additional, wonderful things.

Today [Friday] is the last day to add classes.
Week 2

:

Read §1.1 here.

The divide and conquer approach to algorithm design,
which approach has a quite simple structure.
To solve a complex problem — or, at least, a problem which
seems computationally expensive on large input — do the
following:

break up the original problem into smaller problems which
are similar to the original one;

or, if the problem actually is small enough to solve
immediately by consideration of some special case, just
report the solution;

solve those smaller problems as new instances of the whole
algorithm;

combine the solutions of the smaller problems into one which
solves the original, larger instance.

Very often, we use recursion rather than looping to solve
problems with the divide and conquer. With
recursion, the program which implements an algorithm
calls itself ... which seems like a recipe for an infinite loop,
but in fact is not: every time the program calls itself, the new
version is running on an instance of the problem which is strictly
smaller than the original. Since

the starting input was not infinitely large,

each layer of the recursion is on applied to a problem which
is smaller than the previos by an amount which is not getting
smaller and smaller, and

a particular instance of the problem will not generate a new
layer of recursion once it is down to a small enough size that
the special case immediate solution applies

the proram will terminate in a finite time.
If the above considerations did not apply, then either one would
have to deal with getting infinitely far with finite steps, or
else with an instance of Zeno's Paradox. Therefore, it is
important to check that they are true in any realworld problem
and proposed algorithm/implementation.

Digression on Zeno's Paradox, just because it's fun.

Examples of the divide and conquer approach in action,
including:

Computing values which are in fact defined recursively,
such as the factorial or the Fibonacci sequence.

Finding the minimum of a numerical dataset.

Actually, this one is so simple it is unclear if recursion
or looping is best.

The "naïve sorting algorithm" from last Monday can be
thought of as a [very simple version of] a divide and
conquer approach.

Starting merge sort.

HW1: either hand or email to your instructor the following:

Do any single problem from the Exercises for
§1.1 here.

Suppose $f(n)=O(g(n))$ and $g(n)=O(f(n))$. Does that mean
that $f(n)=g(n)$ for all $n\in\NN$? If so, explain why. If
not, give a counterexample.

Write a Python program which implements each of the naïve
algorithms discussed in Week 1: the average class size
computation from Monday and the sorting algorithm from
Tuesday. Please make readible code, which has comments and
reasonable variable names. Include a way of generating
both small toy data (for which the correctness can be checked
by eye) and large toy data (for which we can see if the
program takes an impossibly long amount of time).
Email the code to your instructor at
jonathan@poritz.net.

As part of a proof of correctness of the naïve sorting
algorithm we've been disucssing, find a loop invariant for
the main loop of the pseudocode we used to describe it.
Extra credit: Use the loop invariant method and
the Principle of Mathematical Induction to prove that the
algorithm we described does indeed correctly sort a
collection of $n$ number, for any $n\in\NN$.
Note: if you don't finish every part of this homework
assignment, that's ok, you can turn in the rest as part of
HW2, next Monday.

:

Read §11.1.1 here and/or Merge Sort [ignore the section "Analysis" in the second of these references]

Some handson coding of merge sort, of some ways to
collect timing data on running programs, and graphing the results.

:

There are lots of tutorials on how to use matplotlib,
just use your favorite search engine to get one if you need a
little help with this. — Or, email your instructor for
some personalized suggestions!

Start working on HW2, due this coming Monday —
it's quite long and you can use these class times to work on
it (even working together, which is fine, so long as you each
turn in your own work).

:

Keep working on HW2. Two thoughts:

You should make sure you have a Python environment in which
you can work at home, just as you do at school. So, spend a
little timet to install matplotlib on your personal
device, if it is not already installed, and to make sure
that you can develop and run things both at school and at
home.

If you get stuck at any point on the HW, don't
hesitate to email your instructor. You may not get a
particularly speedy response, particularly during the week,
but at least on the weekend there should be faster help
available this way.

:

Keep working on HW2. Wouldn't it be nice to be finished
even before the weekend?
Week 3

:

HW2: either hand or email to your instructor the following:

It's time we stopped simply calling the SORT algorithm
discussed above on the second day of this
class the naïve sorting algorithm — in
fact, this is called selection sort.
In this exercise, you will study the time complexity of
selection sort both theoretically and in your Python
implementation.

Look over your Python code for selection sort and derive
as precise a formula for the time complexity
in theory of your implementation as you can. That
is, look at your code, line by line, and see how much
computation it has to do. You may assume that each
basic operation takes one unit of time. This means
a=b
takes one unit of time, while
a=b[j]
takes two units — one for the determination of
the memory location defined by b[j], which
usually involves addition of j times the
memory size of an integer to the address in memory of
the beginning of the list b, and one for the
assignment to the variable a; similarly,
a=b[j+k]
takes three units of time, because it also has
to do the addition j+k.
Building on this, build your careful formula for the
time complexity of your program.
Note that when there are if statements in your
code, the program may take a different amount of time
depending upon the result of that test. So you should
actually find two time complexity formulæ:

A worst case complexity, where you just
assume that each time there is an if
statement, it comes out in whichever way makes for
the most computation.

A best case complexity, where you just
assume that each time there is an if
statement, it comes out in whichever way makes for
the least computation.

Take your two complexity formulæ (best and worst
case) and then for each give the simplest function $f(n)$
you can find with the property that your careful
complexity formulat is $O(f(n))$. Explain how you know
this simple formula works as desired.

Extra credit: Give an example of an input
of size $n$ to your selection sort program which would
make it have the worst case running time. Next, give
an example which would make it have the best case
running time. Of course, explain how you know that
your answers do what you claim they do.

Next write a program that takes inputs $n$ and $m$ and
does the following: $m$ times it generate a random
input of toy data of size $n$, does selection sort on
the toy data, and computes the average over those $m$
times of the seconds it took to do the sorting.
Be careful: when adding up the times of doing those
$m$ sorts, make sure you only count the time of sorting,
not the time it took to generate the toy data.

At this point, pick a pretty large value for $m$ (the
number of runs you are averaging together to get the
average running time on input size $n$), such as 100 or
1000 (don't pick one which makes this run so long you have
to wait around a while for it to finish), then make
two lists:
x, being the values of $n$ for which you run
the average running time on inputs of size $n$ ... use
something like x=[5, 10, 15, 20, ...] up as
high as you have patience for (it might be nice to have
a couple of very large values, like 100 or 500 or 1000)
and y, the corresponding average runtime for
inputs of those sizes. x and y should
both be lists of around 20 or 30 values.
Make a graph of y vs x.

Finally, on the same graph you just made, graph the
function $f(n)$ that you found above, multiplied by some
constant so that it does show how the running time is
$O(f(n))$.

Do exactly the same thing you just did for selection sort
[with all the steps and parts!], only now for
merge sort.

For selection sort, do a careful examination to figure out
a precise formula for the space complexity of your
program: that is, a formula telling how much memory it uses
on inputs of size $n$.
Is there a difference between best and worst
case space complexity? If so, give two different
formulæ; if not, explain why not.
Give a simple formula $f(n)$ so that the space complexity
you just found is $O(f(n))$ or, if you had different
best and worst case space complexities, give
two such functions, $f(n)$ and $g(n)$.

Do exactly the same thing you just did for the space
complexity of selection sort, only now for merge sort.

Extra, extra credit: Can you prove that the
merge sort algorithm we've been using does what it is
supposed to do? You can't quite use a loop invariant,
because we're using recursion rather than looping, but
a quite similar approach should work.

Some discussions of HW2 issues and working to get
matplotlib working on one student's personal laptop
— for which this site was very helpful.

Today [Monday] is the last day to drop classes without a
grade being recorded.

:

Some individualized discussion of assymptotics of recursive
algorithms — in particular, merge sort.

:

Review of bigO notation, discussion of related HW1
problem.

Discussion of big$\Omega$ and big$\Theta$ notations.

Mentioning The Master Theorem for assymptotics of
recursive algorithms. Here is a nice reference for this theorem
[or here is another place to find that same document, if the first one should happen to disappear from the 'net]. But note that your
instructor mentioned in class that it is hard [and not so
important] to keep the precise details of the theorem in your
head. However:

It is a good idea to be able to know how to check if the
theorem does apply in a particular situation, and then to
be able to state what conclusion comes from that
application.

E.g.: check for yourself if the Master Theorem
applies to the work we've been doing for merge sort
and, if so, what it tell us.

:

Read the resource linked yesterday about The Master Theorem
and read (or at least skim) either this resource,
this one, this one, or this one, on objectoriented programming in Python.

Some stylistic comments about homework assignments, including:

In math and computer science, it is best to go for a
mixture of text and equations/diagrams/code. Too many
words misses out on the precision of equations
— e.g., better than saying "yaddayadda
happens for all integers" is to say "$\forall
k\in\ZZ$ we have $S(k)$" [where $S(k)$ is the precise
statement you are going for, particularized to the integer
$k$ and probably having or being an equation.

Explanations in math and computer science tend to be
structured more like recipes in cookbooks than like good
essays or other wellwritten prose. For example, an
induvtion proof should have certain pieces in a certain
order (stating clearly what the pieces are and why!), and
that's just fine.

In this class, at least, you must always justify your
work. E.g., if the assignment is "give an
example of an even prime number," the answer "2" is woefully
incomplete, while "2, because it is even, being clearly
divisible by 2, and a prime, since the only natural numbers
it has as divisors are 1 and 2, simply becuase they are
clearly divisors and there is no room for other divisors
$d$ which satisfy $1\le d\le 2$."

In order to make sure that the work we did yesterday about
the time complexity of merge sort was valid, we looked
fairly carefully at the steps it must perform, following the
pseudocode here. Everything seemed to agree with what
we did yesterday, except for the possible case of appending an
element to the end of a list, and deleting an element from the
beginning of another list.
If both of these processes take $O(1)$ time, we're OK; if it
somehow takes more and more time to append to a larger and
larger list, or to delete entries from a very large list, then
we would be in trouble.
Therefore we wrote code to test this. That is, it should:

Loop up to some very large number $N$.

At each step, it should time the operation of appending one
small item [like a small integer] to the end of a single
list [which would therefore be getting very large by the end
of the main loop].

Make a plot using matplotlib of the time to append
an item (as $y$) against the size of the list to which that
item is being appended (as $x$).

Joey pointed out that the time for a single append might be
so small that it simply always registered as zero. So
another thing to do would be to time how long it takes to
append 100 items, or 1000, or some number like that, to the
list which is getting longer and longer, with a small loop
inside the main loop.

You should get a picture like
But make sure you can do that yourself!

:

Discussion about how it is a very nice feature of what some call
"rapid prototyping languages" like Python that one can very
quickly produce small bits of running code, to test new ideas and
to see how well some approach performs. The timing graph from
yesterday's class is an example.
In order to be able to take advantage of this kind of rapid
prototyping, is important to have a fairly high comfort
level with Python syntax and a few other tools (such as some
methods in the random and time modules), as
well as with the structures in which you are interested (such as
different sorting algorithms in this unit of class).
If you do not feel you have that comfort level with any of those
necessary pieces, please find some time to sit down with your
instructor, soon, to figure out how to get there!

Syntax to switch variable values in Python: oddly enough,
x,y=y,x
works, as does
a[i],a[j]=a[j],a[i]
as might appear in one of our sorting algorithms. This is very
convenient, use it!

Some objectoriented work in Python; see the links from
yesterday. In particular, discussing making
singly linked lists, and the complexity of adding to or deleting
from the head or tail of the list.

Insertion sort, yay!
Week 4

:

We've somehow gotten way behind on doing T&Qs!
Recall from the syallabus that these
are short emails you should send to your instructor about the
current readings you've been assigned and/or the material
we've been discussing in class.
Their content should be a thought or question you might have
about the topic in question, expressed in just a few
sentences (or even one, very well crafted sentence!).
There is no requirement about depth or sophistication of your
T&Qs, other than that they show real engagement with the
material. For example, something like
"I think American politics suck."
does not show engagement [about a reading on current events],
while
"This article says that the Electoral College has its
roots in the racist origins of the US Consstitution, and
while I can see that is important, I don't see any path to
changing the way we elect presidents in the near
future."
does.
Also, don't be embarrassed if your
T&Q is a question and not a statement. If there is
something in the assigned T&Q topic which you don't
understand, state clearly and explicitly what that
confusing bit is. E.g.,
"This article talks about the
problem with the Electoral College being that one person
does not have one vote, but I don't understand the
reasoning there, or the example of big states and small
ones — don't big states have more Electoral College
votes, so what's the problem with that?"
would be fine [on the same current event topic].
Each T&Q is graded 0 or 1 ... you should expect to get
pretty much all 1s on your T&Qs, so long as you are
actually engaging with the readings and class material, and
keeping on schedule with them.
With that understood, please do the following T&Qs (send each
in its own email to your instructor) before class on Monday
(Tuesday is also OK, if this seems like too much, too soon,
or if you have a question you want to ask in class about T&Qs
on Monday):

Read the Wikipedia page about Insertion Sort
and submit T&Q5 on this reading.

HW3: Implement insertion sort in Python.

Use plain Python lists — no objectoriented monkey
business with linked lists, as we had discussed in class on
Friday.

Write also code to collect data and display a graph of data on
the timecomplexity of your implementation [you should be able
to steal a bit of your own code from HW2 for the
complexity piece!].

Send your code to your instructor as a
<something>.py file attached to an email.
As always, when sending code to someone else, make sure
your variable names are informative and you have comments
in your code. Your instructor should be able to execute
your code and see the compleity graph[s], so there is no
need for any other text or explanation beyond the code
itself and its output.

Some content we discussed:

Using L^{A}T_{E}X a little more seriously:

Consider Overleaf,
an online L^{A}T_{E}X editor, if you don't have your own L^{A}T_{E}X on your personal
computer.

It is very easy to put L^{A}T_{E}X on web pages, just sneak the line
<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeXMMLAM_CHTML async></script>
between the <head> and
</head> at the beginning of the HTML file.
Then you can use L^{A}T_{E}X snippets in the <body> of your
HTML file, and it will get automagically processed into
beautiful mathematics, courtesy of the MathJax display engine. E.g., this very page
used
\$\forall x P(X)\$ ...
where it said (above, on the first Friday
of the semester)
$\forall x P(x)$ ....

Going over a bunch of the issues that came up in
T&Qs.

Quick description of bubble sort

:

Read either this resource, this one, or this [much longer] one, or this onepage cheat sheet and submit T&Q6 on this topic.

Some content we discussed:

Using numpy: installing, basic use and features.

Comparing numpy with other kinds of data structure,
such as native Python lists and our own linked lists made
with Python classes.

Inclass activity: take your code for insertion sort
and change it to use numpy arrays, rather than
native Python lists. Then do a timing graph (as we've done
lots of times in the past) of the time complexity of this
version as a function of input size, with differentsized
(larger and larger) toy input data.

Please email whatever you have at the end of class to your
instructor  code, graphs, code that makes graphs,
etc.

:

Selecting elements in numpy arrays

Inclass activity: follow these steps:

Make a numpy array of integers from 2 to some
limit $n$.

Select all elements of the array which are either in
location less than or equal to 0 or are not multiples of
what is in location 0 (which should be the integer 2), and
update the array to have as its new value that subset of
values.

Now do the same thing with new verision of the array, but
for location 1.

Now make a loop that does this all the way through the
array. What is a good criterion for when to stop?

What have you just done? Do you know what it's called?

Write some code which makes a time complexity graph of the
above program as a function of the input number $n$.

If we have time, operator overloading

:

Read this resource, this one, or this one, on operator overloading and submit
T&Q7 on this subject.

Inclass (or at home, or in a room elsewhere on campus: your
choice, since your instructor will not be physically present
in the classroom today!) activity: make your own Python code
for singly and doubly linked lists, then make versions of
insertion sort which use each of these structures instead
of native Python lists. Then build a time complexity graph (as
we have done many times before!) for the performance of these
new versions as you make the toy input data larger and larger.

You will eventually have to email email whatever you make to
your instructor  code, graphs, code that makes graphs,
etc.  so make sure you keep a copy.

:

No class meeting — a good opportunity to get caught up on
things which will need to be handed in.
Week 5

:

Read this
article about the Sieve of Eratosthenes and
submit T&Q8 on this subject.

Read the first sections of this article on the stack &mdash
only up to (and not including) the section labeled
Applications of stacks, and submit T&Q9 on
this subject.
 HW4 is due, consisting of:

On Tuesday last week you made (or, at
least, started to make) a version of insertion
sort using numpy arrays, along with code to
test it for larger and larger input sizes and to make the
resulting time complexity graph. Send this code to your
instructor now!

Then on Thursday last week you made
(or, at least, started to make) a version of insertion
sort using linked listss, along with code to
test it for larger and larger input sizes and to make the
resulting time complexity graph. Send this code to your
instructor as well.

Make one additional graph that superimposes all time
complexities you have computed for insertion sort:
with Python lists, with numpy arrays, and with
your own linked lists. Send the code for this to your
instructor.

On Wednesday last week you made (or,
at least, started to make) code which implements the
Sieve of Eratosthenes, and also to collect and graph
time complexity information about this program for
different. Send this code to your instructor.
Notes:

Make sure that whenever you send code, it has good comments
and variable names, code to generate toy test cases and
graphs of time complexities. Your instructor should be
able to cut and paste your code into a Python environment
and then both fiddle with the basic code and, by running
some program you have supplied, generate the graphs you
were asked to make.

You might want to make a legend when graphing several
different things on the same graph. The way to do this is
to put the argument "label='some text'" in each
call to plot and, have an additional command
plt.legend(loc='upper left') before your
plt.show(). Here is a simple example:
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,20,1000)
y1 = np.sin(x)
y2 = np.cos(x)
plt.plot(x, y1, 'b', label='sine')
plt.plot(x, y2, 'r', label='cosine')
plt.legend(loc='upper left')
plt.show()

Discussion of some points which came up in T&Qs.

Discussion of some uses of stacks, particularly to be a
place to put local variables in instances of programs when they
recurse.

A thematic move: our course will move forward a bit in the
themes of material we cover, in that we will start to discuss
searching, having been working on sorting for a
while.

As an example of a searching problem, here is a bit of code to
do a guessing game which searches for a hidden random number:
import random
def guess(n=1023):
if n < 1:
print("Input must be a natural number")
return
number = random.randint(0,n)
guesses = 0
while True:
guess_text = input("Guess the number from 0 to %d: " % n)
if guess_text == "I give up":
print("Give up, eh? Well, my number was %d." % number)
return
guess = int(guess_text)
if guess < 0 or guess > n:
print("Your guesses must be integers in the range from 0 to %d." % n)
continue
guesses += 1
if guess > number:
print("Too high")
elif guess < number:
print("Too low")
else:
print("Got it (in %d guesses)!!" % guesses)
return
Try running this to play the game — come up with a good
strategy!

In class, we discussed the binary search strategy, which
repeatedly guesses the midpoint of the possible region, and then
recursing to work on the lower or upper remaining half, depending
upon the answer to the guess, "too high" or "too low."

We mentioned how the time complexity of binary search is
$O(n)$. This is in contrast with the most naïve searching
approach — which is just to go through all the possible
values and ask if it is correct — which has time complexity
$O(n)$.

Inclass activity: write code which implements binary search.
You may use the following class:
class Quizzer:
def __init__(self, max_val):
import random
self.n = random.randint(0,max_val)
self.max_val = max_val
def try_guess(self, k):
if k < 0 or k >self.max_val:
return 17
if k < self.n:
return 1
elif k > self.n:
return 1
else:
return 0
def __str__(self):
return str(self.n)
For example, it can be used as in the following
q=new Quizzer(65635)
for i in range(65535):
ans=q.try_guess(i)
if ans == 1:
print("Too low, trying again")
elif ans == +1:
print("Too high, how did that happen?")
elif ans == 0:
print("Got it! Answer was %d" % i)
break
else:
print("weird. quitting")
break
which implements the naïve searching algorithm.
But you should implement binary search! Do so, and send your
nice code to your instructor.

:

Discussing worst case, best case, and
average case complexity, in particular for the two
searching methods we discussed yesterday: the naïve and
binary search algorithms.

Important topic (with examples): hen talking about average case
complexity, it is very important to think about average over
what population, or, even more precisely, average over
what probability distribution.

Adapting our binary search strategy from last class to situations
when we're not searching among all integers from $0$ to $n$, but
rather in a more random or sparse set of values. What changes
need to be made?

:

Inclass activity to test your binary search algorithm on input
which is not just all integers from $0$ to $n$.

Write some code which takes input $n$ and $p$ and generates
$n$ numbers which lie in some randomly chosen interval of
integers $[a,b]$, where $(ba)/n=p$ (approximately).

Write code which picks a random element from the above
"sparse interval."

Combine with your binary search to see if it all works.

Make a time complexity graph.

Starting to talk about binary trees.

:

Read,
in Open
Data strutures, the introduction to Chapter
6, here, and its continuation
as § 6.1, and submit T&Q10 on this
subject.

In class we talked about many of the things we will code tomorrow.

:

The depth of a node in a tree is the number of links from
the root to that node.

The height of a node in a tree is the largest number of
links from that node to a leaf.

The height of a tree is the height of its root.
Equivalently, it is the maximum depth of any node of that tree.

It's time to make some code for trees. [This will be a large
part of the homework due next Tuesday.]
In particular, you could use this for the basic structure of
a tree node:
class BSTnode:
def __init__(self,data,left=None,right=None):
self.data = data
self.left = left
self.right = right
def is_leaf(self):
return self.left==None and self.right==None
This code includes the possibly convenient method that tells if
a node is a leaf, is_leaf().
Note that sometimes it is convenient to have each node "know" its
parent node. That would be with a slightly improved class
definition:
class BSTnode:
def __init__(self,data,parent=None,left=None,right=None):
self.data = data
self.parent = parent
self.left = left
self.right = right
def is_leaf(self):
return self.left==None and self.right==None
def is_root(self):
return self.parent==None
This version of the code includes the possibly convenient
method that tells if a node is the root of the tree,
is_root().
Now please write the following methods for that class:

insert inserts a new data value into a tree

max_data_val finds the maxium value of a subtree
starting at a certain node

min_data_val finds the minimum value of a subtree
starting at a certain node

verify checks if a binary search tree is wellformed

delete deletes a node

depth_first_print prints data values in depthfirst
order

ordered_print prints data values in increasing
order

depth returns the depth of a node

height prints the height of a node

height prints the height of a tree
Week 6

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
Week 7

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
Week 8

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
Week 9

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
 Today [Friday] is the last day to withdraw (with a
W) from classes.
Week 10

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
Week 11

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
Week 12

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
Week 13

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
Week 14
 Thanksgiving Break! No classes, of course.
Week 15

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:

:

Read

Some content we discussed:
Week 16