Subject: Pipelining and Scheduling. Memory Hierarchy.
Text: Read Appel pp. 474–488 (without Other Control Flow). In chapter 21 read pp. 503–504 (Alignment in the instruction cache) and pp. 511–514 (sections 21.4–21.5).
Information about software pipelining in the cl6x compiler can be found in the TMS320C6000 Programmer's Guide [SPRU198], section 2.4.3. Instruction scheduling constraints are described in the TMS320C6000 CPU and Instruction Set Reference Guide [SPRU189], section 4.5. Advanced (mostly manual) optimization techniques are described in chapter 5 of the Programmer's Guide [SPRU198].
Prerequisites: Dataflow optimizations. Loop unrolling.
Comments: The comments about scalar replacement on p. 479 are probably less helpful than the author assumed. Instead of program 20.4, simply assume that the following program is the starting point:
    b = V[0]
    for i = 1 to N
        a = j + b
        b = a + f
        c = e + j
        d = f + c
        e = b + d
        f = U[i]
        g: V[i] = b
        h: W[i] = d
        j = X[i]
Then Fig. 20.4b is basically this program annotated with iteration numbers, except that the first line is missing (most likely a typo in the book anyway).
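For readers who want to experiment, here is a minimal C sketch of the starting-point program above, next to a variant that still reads V[i - 1] from memory in every iteration. It assumes that this load is what scalar replacement removes. The names LiveIn, loop_with_load and loop_scalar_replaced are hypothetical, and so is the choice of passing the values of e, f and j that are live on entry as a parameter, since the notes do not specify them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the values of e, f and j that are live
   on entry to the loop; the notes do not say what they are. */
typedef struct { unsigned e, f, j; } LiveIn;

/* Assumed pre-replacement form: V[i - 1] is loaded in every iteration.
   All arrays have N + 1 elements; V and W must not overlap. */
void loop_with_load(unsigned *V, unsigned *W, const unsigned *U,
                    const unsigned *X, size_t N, LiveIn in)
{
    unsigned a, b, c, d, e = in.e, f = in.f, j = in.j;
    assert(V != W);
    for (size_t i = 1; i <= N; i++) {
        a = j + V[i - 1];   /* memory read of the previous result */
        b = a + f;
        c = e + j;
        d = f + c;
        e = b + d;
        f = U[i];
        V[i] = b;           /* statement g */
        W[i] = d;           /* statement h */
        j = X[i];
    }
}

/* Starting point from the notes: b carries V[i - 1] in a scalar. */
void loop_scalar_replaced(unsigned *V, unsigned *W, const unsigned *U,
                          const unsigned *X, size_t N, LiveIn in)
{
    unsigned a, b = V[0], c, d, e = in.e, f = in.f, j = in.j;
    assert(V != W);
    for (size_t i = 1; i <= N; i++) {
        a = j + b;          /* b still holds the value stored to V[i - 1] */
        b = a + f;
        c = e + j;
        d = f + c;
        e = b + d;
        f = U[i];
        V[i] = b;           /* statement g */
        W[i] = d;           /* statement h */
        j = X[i];
    }
}
```

Calling both functions on copies of the same input arrays should leave identical contents in V and W, which is a quick way to convince yourself that the two forms compute the same thing.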
I wholeheartedly recommend self-studying chapter 5 of the Programmer's Guide after this lecture. It contains a careful study of optimization opportunities for your code. Do not be fooled by the fact that the chapter expresses things in terms of assembly language. Many of the techniques are applicable at the C level too. If you see that a given technique (like software pipelining) cannot be applied at the C level, and the compiler-generated code is not good enough, then consider reimplementing the relevant fragment in assembly and applying the technique there.
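As a rough illustration of what software pipelining at the C level can look like, here is a minimal sketch of a three-stage loop (load, multiply, store) pipelined by hand into a prologue, a kernel and an epilogue. It is not taken from the Programmer's Guide. The function names scale_plain and scale_pipelined and the choice of loop body are hypothetical, and whether the rewrite actually helps depends on what the compiler already does with the plain loop.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: each iteration loads, multiplies and stores in sequence. */
void scale_plain(const int *restrict in, int *restrict out, size_t n, int k)
{
    for (size_t i = 0; i < n; i++) {
        int x = in[i];      /* stage 1: load    */
        int y = x * k;      /* stage 2: compute */
        out[i] = y;         /* stage 3: store   */
    }
}

/* Hand-pipelined version: in the kernel, the store of iteration i,
   the multiply of iteration i + 1 and the load of iteration i + 2
   are independent of each other and could be scheduled in parallel.
   Assumes in and out do not overlap (hence restrict). */
void scale_pipelined(const int *restrict in, int *restrict out, size_t n, int k)
{
    assert(n == 0 || (in != NULL && out != NULL));
    if (n == 0)
        return;
    if (n == 1) {           /* too short to fill the pipeline */
        out[0] = in[0] * k;
        return;
    }

    /* Prologue: start iterations 0 and 1. */
    int x = in[0];          /* load iteration 0    */
    int y = x * k;          /* compute iteration 0 */
    x = in[1];              /* load iteration 1    */

    /* Kernel: three iterations are in flight at once. */
    size_t i;
    for (i = 0; i + 2 < n; i++) {
        out[i] = y;         /* store iteration i       */
        y = x * k;          /* compute iteration i + 1 */
        x = in[i + 2];      /* load iteration i + 2    */
    }

    /* Epilogue: drain iterations n - 2 and n - 1 (here i == n - 2). */
    out[i] = y;
    y = x * k;
    out[i + 1] = y;
}
```

Both functions should produce the same output for any n, including 0 and 1. The pipelined form trades code size and readability for exposed instruction-level parallelism, so it is worth comparing the generated assembly of both versions before keeping it.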
Time: Wednesday, 11 May, 8:30–10:15. Room A5-006.
Tutorial: 10:30–12:00. Room: your group office.
Resources: