CSC362 2004S.L02 (Class 41): Code Improvement Admin: * Online grading has arrived. YA reason to hack PioneerWeb. * Custom course evaluation forms now available. Please bring a *filled out* form to class tomorrow. * Class Friday meets in the normal place. Ananta and Sam will pick up the food. Get your orders to Ananta asap. * You have now reached the top of my grading heap. * It's Pub Grading Night! * Final questions on the translator? * Yes, it is due tomorrow at 1:15 p.m. * Yes, starting yesterday was a bad idea. Overview: * Improvement basics * Goals * Places to improve * Side note: Loops [* Detour: Flow Graphs and Dataflow] * From intermediate to actual code (code generation) * Favorite peephole optimizations Improvement is as old as compilation * Fortran: * In the longagos: 1950's at IBM * First goal: Transportability * Second goal: Ease of programming * Third goal: Produce code "(almost) as good as the average programmer" * Today: * Typical commercial compilers produce code better than almost any assembly language programmer (architectures are more complicated, lots of code improvement techniques developed) * Unfortunate name for the process of improving code to make it more efficient: Optimization * Most improvement focuses on running time * Code footprint (length of the code) * Memory use * Power usage (if you have a laptop; kinds of operations) * Observation: Lots of subtle issue type matrix = array[1..N,1..N] of integer; var a, b: matrix; begin ... { copy b to a; the details of b := a; } for i := 1 to N do for j := 1 to N do b[i,j] := a[i,j] vs. for i := 1 to N do for j := 1 to N do b[j,i] := a[j,i] Claim: The way the arrays are arranged in memory affects which of these runs faster. Observation: Two basic ways to arrange two-dimensional arrays Row-major: A[1,1],A[1,2],A[1,3]...A[1,N],A[2,1],A[2,2]....,A[2,N],... Column-major: A[1,1],A[2,1],A[3,1]...,A[N,1],A[2,2],... If your arrays are big enough (say, you do Physics programming or work for Google), the second for loop will do lots of paging. To solve this problem: * Programmer: Arrange the order of the array to match the loop * Programmer: Arrange the loop to match the order of the array * Compiler: Detrmine the access patterns and arrange the array to match the access pattern Observations: * Need to think about more than just the assembly code produced * Improvement can be done at different compilation levels The programmer can: * Choose the right algorithm for the problem! * [Analyze and determine the "hot spots" and work on those.] * Use integers rather than floats or doubles. * Turn tail-recursive functions into loops (unless your compiler does) * Lots of special-purpose stuff, like loop unrolling Loop unrolling: * Given a loop, reproduce the loop by writing the body multiple times for i := 1 to 5 do writeln(i); writeln(1); writeln(2); writeln(3); writeln(4); writeln(5); MOV $1 i LOOP: ARG i CALL writeln JEQ i $5 ENDLOOP IADD $1 i -> i JUMP LOOP ENDLOOP: Unroll ARG $1 CALL writeln ARG $2 CALL writeln ARG $3 CALL writeln ARG $4 CALL writeln ARG $5 CALL writeln ENDLOOP: Direct improvement: Fewer instructions Indirect improvement: Faster on pipelined architectures (no need to flush the pipe) The translator (from tree to intermediate code) * Should not do anything * Better to produce clear code you can prove correct than unclear fast code Intermediate code * Can be "peephole improved" * Can have some global analyses applied (like loop unrolling and function inlining) Intermediate to Assembly * Choose the best translation (which is often architecture-specific) At assembly level * Move peephole optimization Typical peephole optimizations: * Copy propagation: MOVE x y ... ADD x x -> z * Advantages: If we are lucky, we eliminate *all* uses of y, Eliminate the MOVE, save temporaries, increase chance of successful register allocation * "Shared code elimination" ADD x y -> q ... ADD x y -> r * Replace by ADD x y -> q ... MOVE q r * A move is usally faster than an add * Permits additional copy propagation Final optimization: Simplify math IMULT $2 a -> b can be LSHIFT a $1 -> b or IADD a a -> b