**You are being recorded and transcribed.**

*Approximate overview*

- Administrivia
- Check in on Wednesday’s class
- Some notes on sorting
- Testing sorting algorithms
- Loop invariants

- Registration preview

- Friday, 2024-03-15, 11pm, Third set of LAs.
- Mostly posted.
- Let me know if any look wrong.
- Let me know if you’d like to see any others posted.
- Grading is still on my priority queue, near the front.

Academic/Scholarly

Cultural

- *Middle of Everywhere*. Friday at 4pm in the Global Living Room.
- I don’t know whether or not that’s happening.

Peer

Wellness

Misc

- You are awesome. Please take care of yourselves.
- And don’t get me fired for sharing information with you.

What did you learn?

- Merge sort.
- Merge algorithm: Merge two sorted lists (or subarrays)
- Merge sort: If the list has more than one element, divide the unsorted list in half, sort the two halves, merge them together.
- Can I merge in place (e.g., [1, 4, 6, 9, 2, 3, 5, 8])?
- Nope. I need to make a helper array to copy things into.
- This is a disadvantage of merge sort: It requires O(n) extra space.

- Do I have to use merge sort when I sort each subarray?
- Nope.
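Here’s one way that description might look in Java (a sketch; the names and the choice of `int[]` are mine). The copies are the O(n) extra space, and taking from the left on ties is what keeps the merge stable:

```java
import java.util.Arrays;

public class MergeSortDemo {
  /** Sort values using merge sort. */
  public static void mergeSort(int[] values) {
    if (values.length > 1) {
      // Divide the unsorted array in half (the O(n) extra space).
      int[] left = Arrays.copyOfRange(values, 0, values.length / 2);
      int[] right = Arrays.copyOfRange(values, values.length / 2, values.length);
      // Sort the two halves.
      mergeSort(left);
      mergeSort(right);
      // Merge them back together.
      merge(left, right, values);
    }
  }

  /** Merge the sorted arrays left and right into result. */
  static void merge(int[] left, int[] right, int[] result) {
    int l = 0;
    int r = 0;
    for (int i = 0; i < result.length; i++) {
      // On ties, take from the left array; that keeps the sort stable.
      if (r >= right.length || (l < left.length && left[l] <= right[r])) {
        result[i] = left[l++];
      } else {
        result[i] = right[r++];
      }
    }
  }

  public static void main(String[] args) {
    int[] a = {1, 4, 6, 9, 2, 3, 5, 8};
    mergeSort(a);
    System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5, 6, 8, 9]
  }
}
```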

- How do you prove it correct?
- Experimentally, we run tests
- We could do a proof. With invariants.

- Is merge sort stable?
- Stable: If there are two “equal” elements, they maintain their order. (That may have been in the reading.)
- Not necessarily: it depends a bit on how you merge.
- Can we decide on an approach that maintains stability? Yes! When we merge, always take the element from the left array first.

- Sorting by hand
- We all sort all the time.
- MSort: Divide the cards into two piles: Sorted and Unprocessed. For each card in unprocessed, put it into the right place in sorted.
- ASort: Similar, with a probabilistic (ML-based) approach for finding the right location.
- Insertion sort

- Analysis of insertion sort (incorrect)
- We have to insert `n` cards from the unprocessed list into the sorted list.
- Finding the right location takes O(logn) time because we can use binary search.
- Inserting it into that position is constant time.
- This algorithm is O(nlogn).

- Why is this incorrect? (Hint: It may be O(nlogn) in the physical world,
but not in the computer world.)
- If the data are stored in an array, it will take O(n) to insert, not O(1), because you have to shift stuff over to make room.
- If the data are stored in a linked list, it only takes O(1) to insert once you’ve found the spot, but we can’t use binary search on lists, since binary search requires quick access based on index.

- Analysis: O(n^2) for arrays because O(n) to insert.
- Analysis: O(n^2) for lists because O(n) to find.
- Reminder: Once you’ve correctly analyzed an algorithm, you have two questions
to ask:
- Can I do better?
- If not: Can I prove that I can’t do better?
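For reference, here is the array version of insertion sort in Java (my sketch); the shifting loop is the O(n) step that makes the whole thing O(n^2):

```java
public class InsertionSortDemo {
  /** Sort values using insertion sort. */
  public static void insertionSort(int[] values) {
    // Positions [0, i) are the sorted "pile"; the rest are unprocessed.
    for (int i = 1; i < values.length; i++) {
      int card = values[i];
      int j = i;
      // Shift larger elements right to make room: the O(n) step.
      while (j > 0 && values[j - 1] > card) {
        values[j] = values[j - 1];
        j--;
      }
      values[j] = card;
    }
  }

  public static void main(String[] args) {
    int[] a = {9, 2, 7, 2, 5};
    insertionSort(a);
    System.out.println(java.util.Arrays.toString(a)); // [2, 2, 5, 7, 9]
  }
}
```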

- Detour: For the “can I prove”, Travelling Salescritter Problem
- Given a set of `n` cities, find the shortest path connecting them all.
- Easy solution: Make a list of all the paths, take the shortest.
- Unfortunately, there are O(n!) paths, which grows way too quickly.

- In 301, you’ll get to consider whether a sorting algorithm can be faster than O(nlogn). (Preview: Not for a compare/swap algorithm.)
- In 341, you’ll get to explore questions like TSP.
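To get a feel for how quickly n! grows, a quick computation (a `long` overflows just past 20!, so we stop there):

```java
public class PathCount {
  /** Compute n! as a long (only valid for n <= 20). */
  public static long factorial(int n) {
    long result = 1;
    for (int i = 2; i <= n; i++) {
      result *= i;
    }
    return result;
  }

  public static void main(String[] args) {
    // Each of these is the number of paths through n cities.
    for (int n : new int[] {5, 10, 15, 20}) {
      System.out.println(n + "! = " + factorial(n));
    }
    // 20! is already about 2.4 quintillion.
  }
}
```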
- For insertion sort’s “can I do better”
- Design a different algorithm
- Design a new data structure, one that has both quick insertion and quick searching.

- A better data structure: The binary search tree
- A binary tree is a structure in which you have a node (with a value), left subtree, and right subtree.
- Usually the left subtree has values less than the node and the right subtree has values greater than the node.
- Both left and right subtrees have the same property.

- If the tree is mostly balanced (that is, we have both left and right children at most levels, and the two subtrees are about the same size), the number of levels is O(logn).
- Search is O(logn)
- Insert is O(logn)
- Maintaining the property is hard.
- We will return to this issue in a few weeks.
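A minimal sketch of a binary search tree in Java (ints for simplicity, no balancing; the names are mine):

```java
public class BSTDemo {
  static class Node {
    int value;
    Node left;
    Node right;
    Node(int value) { this.value = value; }
  }

  /** Insert value into the tree rooted at node, returning the new root. */
  public static Node insert(Node node, int value) {
    if (node == null) {
      return new Node(value);
    }
    if (value < node.value) {
      node.left = insert(node.left, value);   // smaller values go left
    } else if (value > node.value) {
      node.right = insert(node.right, value); // larger values go right
    } // equal: already present, do nothing
    return node;
  }

  /** Search: one comparison per level, so O(logn) if the tree is balanced. */
  public static boolean contains(Node node, int value) {
    if (node == null) {
      return false;
    }
    if (value == node.value) {
      return true;
    }
    return contains(value < node.value ? node.left : node.right, value);
  }

  public static void main(String[] args) {
    Node root = null;
    for (int v : new int[] {5, 2, 8, 1, 3, 7, 9}) {
      root = insert(root, v);
    }
    System.out.println(contains(root, 3)); // true
    System.out.println(contains(root, 6)); // false
  }
}
```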

You’ve written a sorting algorithm. Proofs are hard. You want tests. What kind of “write lots of tests” algorithm would you write?

- Systematically
- Random

Systematically

- Use for loops to build arrays of different sizes
- For each size array
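One plausible version of the systematic approach (I’m guessing at the specific inputs: sorted, reversed, and constant arrays at each size, and the insertion sort here is just a stand-in for whatever sort you’re testing):

```java
public class SystematicTests {
  /** Stand-in for the sort under test; here, a simple insertion sort. */
  static void sortUnderTest(int[] values) {
    for (int i = 1; i < values.length; i++) {
      int tmp = values[i];
      int j = i;
      while (j > 0 && values[j - 1] > tmp) {
        values[j] = values[j - 1];
        j--;
      }
      values[j] = tmp;
    }
  }

  /** Is the array in nondecreasing order? */
  static boolean inOrder(int[] values) {
    for (int i = 1; i < values.length; i++) {
      if (values[i - 1] > values[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Use for loops to build arrays of different sizes.
    for (int size = 0; size <= 20; size++) {
      int[] ascending = new int[size];
      int[] descending = new int[size];
      int[] constant = new int[size];
      for (int i = 0; i < size; i++) {
        ascending[i] = i;
        descending[i] = size - i;
        constant[i] = 7;
      }
      // Sort each array and check the result.
      for (int[] arr : new int[][] {ascending, descending, constant}) {
        sortUnderTest(arr);
        if (!inOrder(arr)) {
          System.out.println("Failed at size " + size);
        }
      }
    }
    System.out.println("Done");
  }
}
```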

Random

- Use a for loop to build arrays of different sizes
- Sort each
- And make sure it’s in order AND a permutation of the original
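A sketch of the random approach (`Arrays.sort` stands in for the sort under test). Comparing against a sorted copy checks both conditions at once: if the result equals the sorted copy, it is in order AND a permutation of the original:

```java
import java.util.Arrays;
import java.util.Random;

public class RandomTests {
  /** Stand-in for the sort under test. */
  static void sortUnderTest(int[] values) {
    Arrays.sort(values);
  }

  /** Does sorting original give something in order AND a permutation
      of the original? We compare against a sorted copy (the oracle). */
  static boolean sortsCorrectly(int[] original) {
    int[] expected = original.clone();
    Arrays.sort(expected);
    int[] actual = original.clone();
    sortUnderTest(actual);
    return Arrays.equals(actual, expected);
  }

  public static void main(String[] args) {
    Random rand = new Random(42); // fixed seed, so failures are reproducible
    for (int trial = 0; trial < 100; trial++) {
      // Build arrays of random sizes with random contents.
      int[] original = new int[rand.nextInt(50)];
      for (int i = 0; i < original.length; i++) {
        original[i] = rand.nextInt(100);
      }
      if (!sortsCorrectly(original)) {
        System.out.println("Failed on " + Arrays.toString(original));
      }
    }
    System.out.println("Done");
  }
}
```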

Sam’s Sorting algorithm (which gives you an array in order)

```
public static
```

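The snippet above is truncated in my notes. Given the emphasis on checking for a permutation, I’d guess Sam’s algorithm is a cheat along these lines (my reconstruction, not the actual code):

```java
public class SamSort {
  /** "Sort" values: the result is always in order,
      but it is NOT a permutation of the input! */
  public static void sort(int[] values) {
    for (int i = 0; i < values.length; i++) {
      values[i] = i;
    }
  }

  public static void main(String[] args) {
    int[] a = {42, 7, 19};
    sort(a);
    System.out.println(java.util.Arrays.toString(a)); // [0, 1, 2]
    // A test that only checks for order passes;
    // a permutation check catches the cheat.
  }
}
```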
More ideas

- Start with a sorted array.
- Permute it (or permute a copy) [shuffle; rearrange]
- Sort the permutation
- Compare the two arrays.
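That idea in Java, using `Collections.shuffle` on a copy (`Collections.sort` stands in for the sort under test):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ShuffleTest {
  /** Shuffle a copy of sorted, sort the copy, and report whether
      we get the original sorted list back. */
  static boolean shuffleSortMatches(List<Integer> sorted) {
    List<Integer> shuffled = new ArrayList<>(sorted); // permute a copy
    Collections.shuffle(shuffled);
    Collections.sort(shuffled); // sort the permutation
    return shuffled.equals(sorted); // compare the two lists
  }

  public static void main(String[] args) {
    // Start with a sorted list.
    List<Integer> sorted = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      sorted.add(i);
    }
    System.out.println(shuffleSortMatches(sorted)); // true
  }
}
```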

Even more

- Start with a sorted array.
- Sort it.
- See if it’s the same. (Some non-stable sorting algorithms will rearrange equal elements, so for this test we should compare elements by value, treating equal elements as interchangeable.)

Whoops!