head 1.1; branch 1.1.1; access ; symbols start':1.1.1.1 cd16:1.1.1; locks ; strict; comment @# @; 1.1 date 2003.08.15.17.26.01; author beckert; state Exp; branches 1.1.1.1; next ; 1.1.1.1 date 2003.08.15.17.26.01; author beckert; state Exp; branches ; next ; desc @@ 1.1 log @Initial revision @ text @ Optimizer

The Optimizer

file: OP.F

To describe the stack model, two different virtual stacks are used to describe the initial and final stack configurations.

D is a model of the final stack configuration. 
X is a model of the initial stack configuration.
The stack models are shown here with the top of stack at the right. The parameters are numbered 1,2,3… with 1 being the top of the stack. 

The target VM may use general-purpose registers to hold the top of stack. The number of general-purpose registers used to represent the top of stack is called CACHE. P1, P2, P3, … are stack parameters. For example, if one register is used for the top of stack then P1 is a register, P2=[S], P3=cellsize[S], etc. CACHE=1 works well for most processors. If register access is cheap compared to memory access, CACHE=2 is a little better.

The target VM also needs some scratchpad registers for use when shuffling parameters. You can tell the compiler to place a hard limit on the number of scratchpad registers it will allow. You'll need at least one. 

To understand the relationship between CACHE and the stack models, let's look at the effects of depth changes on the virtual stacks:

Example 1. Stack growing.

After 2DUP, 0 regs  After 2DUP, 1 reg  After 2DUP, 2 regs
D: … 3 2 1 2 1 D: … 3 2 1 2 1 D: … 3 2 1 2 1
X: … 3 2 1 0 0 X: … 3 2 0 0 1 X: … 3 0 0 2 1
code code code
S=S-2 S=S-2 S=S-2
P2=P4 P3=P1 P4=P2
P1=P3 P2=P4 P3=P1

Before resolution, the stack pointer is bumped to make the stacks the same size. In this example, address units are cells and the stack grows downward. The X virtual stack represents the target's uninitialized stack memory with zeros. For each zero seen in the X list, the optimizer gets the corresponding D token (the one at the same depth as the zero in the X list). It scans the X list for this token and uses the index of the lowest seen match as the source operand. 

If a parameter is in the X list but can't be found in the D list, that parameter is free to be trashed. It will be treated the same way as a zero.


Example 2. Stack shrinking.

After 2DROP, 0 regs  After 2DROP, 1 reg  After 2DROP, 2 regs
D: … 7 6 5 4 3 D: … 7 6 5 4 3 D: … 7 6 5 4 3
X: … 7 6 5 4 3 X: … 7 6 5 4 1 X: … 7 6 5 2 1
code  code  code 
S=S+2 S=S+2 S=S+2
P1=P3 P1=P3
P2=P4

Before resolution, the optimizer shrinks the X virtual stack but keeps the cached parameters. For each stack position, X is the destination operand and D is the source operand. At the end of optimization, code is laid down to bump the stack pointer. 

Arithmetic

Stack elements are modeled using bit fields. The host machine is assumed to have 32-bit cells. The bit fields are assigned as follows:

Function b19..b12 0..255 = function type
Src2 b11..b6 0..63 = operand P0, P1, …
Src1 b5..b0 0..63 = operand P0, P1, …

The virtual stacks use two cells per element. The first cell is this packed type field. The second is optional literal data. When a function is performed on the virtual stack, one or two parameters are pulled off the stack and combined with the function number. The result is pushed back onto the stack.

That's the simple case, where the parameters are source operands. If a function has to perform a transform on another function, it lays down code to place the result if the first function in scratchpad registers. Then, it pushes the second function using the scratchpads as operands. Scratchpads are de-allocated when possible to minimize the number of them that accumulate during compilation.

During compilation, the stack is generally resolved left to right so that instructions using the carry flag are done in the right order. There is no guarantee of this, however, so you should examine the object code of words that depend on the carry flag being propagated between words. Carry is good to have, so +C is one of the primitives. Example: :D+ ( d1 d2 - d3 ) >r rot + swap r> +c ;
Results in the code: 4=fn60(4,2) 1=fn61(1,3) S=S+4 where fn60 and fn61 are ADD and ADC respectively. P1 is cached in a register, P2..P4 are locations on the stack.

When a two-parameter function consumes the results of other functions, operations will be performed out of order. In rare cases (such as R> R> -) this is bad. This kind of operation is inhibited with some functions to avoid funky behavior.

When the stack model is resolved to machine code, often the destination will be different from the source operand(s). 

Functions with similar behavior are grouped into sets. Some functions can have their S2 parameters replaced by a literal to form a new function.

Compile-time Stack Resolution

The D and X lists are first scanned to resolve anything that can be trashed. This is basically anything that isn't needed later. 

The next step is stack shuffling. This can be a little complicated, since multiple dual-operand items can generate a need for temporary storage. 

The procedure starts with saving the deepest unresolved item to a scratchpad register and resolving that item. This usually creates a free space, which is quickly resolved. The process continues until the first element is resolved. The source is checked against the scratchpad, so the scratchpad is used wherever possible. The last resolution pass usually has the scratchpad as the source.

When a dual-operand function is the source, resolving an element doesn't necessarily free up any space. Therefore, more scratchpad storage is required. The compiler provides an adjustable limit on the number of scratchpad registers allowed. Most CPUs can efficiently address a small part of memory, so the higher scratchpad location can easily be mapped onto these.

When a source operand is moved to a scratchpad location, all instances of that parameter in the D list is replaced by an index indicating the location. Scratchpad registers are numbered -1, -2, etc.

@ 1.1.1.1 log @Imported sources @ text @@