In the Simplesim assignment, we introduced a fictional machine called the Simplesim and the Simplesim Machine Language (SML). In this assignment, you are to write a compiler that converts programs written in a high-level programming language to SML. This will require you to use the techniques that you learned to convert infix expressions to postfix and to evaluate those postfix expressions. We provide you with programs in this new high-level language with the intention of compiling them on the compiler that you will write and running the resulting SML program on the simulator you wrote.
Before we begin building the compiler, we discuss a simple, yet powerful, high-level programming language called Simple.
Every Simple statement consists of a line number and a Simple instruction. Line numbers must appear in ascending order.
Each instruction begins with one of the following Simple commands: rem
, input
, data
, let
, print
, goto
, if...goto
, or end
.
Simple evaluates only integer expressions using the arithmetic operators +
, -
, *
, and /
. The operators have the same precedence as in C++. Parentheses can be used to change the order of evaluation of an expression.
All variables in a Simple program are lower case single letters. Simple uses only integer variables and constants. Simple does not have variable declarations - merely mentioning a variable name in a program causes the variable to be declared and initialized to zero automatically.
Simple uses the conditional if...goto
statement and the unconditional goto
statement to alter the flow of control during program execution. If the condition in the if...goto
statement is true, control is transferred to the specified line of the program. The following relational operators are valid in an if...goto
statement: <
, >
, <=
, >=
, ==
, or !=
. The precedence of these operators is the same as in C++.
Consider the following Simple program that accepts two input values and prints the maximum of the two:
10 rem 11 rem print the maximum of two numbers 12 rem 20 data 10 30 rem 31 rem get values 32 rem 40 input x 50 input y 60 rem 61 rem check x > y 62 rem 70 if x > y goto 111 80 rem 81 rem y is maximum, print y 82 rem 90 print y 100 goto 130 110 rem 111 rem x is maximum, print x 112 rem 120 print x 130 data 20 900 end
Each line in a Simple program must start with a line number. The line numbers must appear in ascending order and must always start in the leftmost column, i.e., no leading whitespace. Execution of a Simple program starts at the first line. In our example program above, this is line 10
, a rem
command. rem
commands are used to document the program and are not executed. Anything appearing after the word rem
is simply ignored. Execution continues through the other documentation lines 11
and 12
and we reach line 20
, a data
command. data
commands, like rem
commands are not executed. However, unlike rem
commands, data
commands do serve a purpose (explained below). data
commands always end with a single integer. Execution proceeds through lines 20
and 30-32
and we approach line 40
, an input
command.
input
commands acquire a value from a data
command and store that value in its variable. The very first input
command that is executed uses the very first data
command in the program. Subsequently executed input
commands use the remaining data
commands in the order in which they appear in the program. data
commands are never used more than once. The same input
command will consume many data
commands if it is executed many times (e.g., in a "loop").
Line 40
is the first input
command to be executed, so it acquires its value from the first data
command (line 20
) and stores the value 10
in the variable x
. Line 50
, our second input
command to be executed, acquires its value from the next data
command (line 130
) and stores the value 20
in the variable y
. Execution continues through the documentation lines 60-62
and we approach line 70
, an if...goto
command.
if...goto
commands are always of the form if <lop> <relop> <rop> goto <linenum>
where <lop>
and <rop>
are always either a single variable or constant (i.e., not expressions), <relop>
is one of the relational operators listed above, and <linenum>
is a line number appearing in the program.
Line 70
tests x > y
. In our example program, this is false, so execution continues to line 80
. Had the test been true, then execution would have continued at line 111
. Note that line 111
is a rem
command. Simple programs may branch (if...goto
or goto
) to rem
and/or data
lines, despite the fact that these lines are not executed. Execution continues through lines 80-82
and we approach line 90
, a print
command. print
commands print either a single variable or constant. In our example program, the number 20
is printed and execution continues to line 100
, a goto
command, which branches (unconditionally) to line 130
, our second data
command. Since data
commands are not executed, we proceed to line 900
, an end
command where program execution is terminated.
As a final example, consider the following Simple program that accepts a non-negative integer x
and computes/prints the sum of integers from 1 through x
.
10 rem sum 1 to x 15 input x 20 data 10 25 if x <= -1 goto 60 30 rem add x to total 35 let t = t + x 40 let x = x - 1 45 rem loop x 50 goto 25 55 rem output result 60 print t 99 end
As always, execution starts with first line of the program. The first executable statement is line 15
, an input
command, which stores the value 10
(from the first data
command on line 20
) in variable x
. Execution continues with line 25
, an if...goto
command where we test x <= -1
. In our program this test is false, so execution continues to lines 35
and 40
which add x
to t
and decrement x
, respectively. t
is the variable that we are using to accumulate the sum. The variable t
has not been explicitly initialized. Recall that variables that appear in a Simple program are initialized (implicitly) to 0. Line 50
unconditionally branches back to line 25
where we test x <= -1
. Eventually, that test will become true, in which case control will be transferred to line 60
, the value in t
(the sum) will be printed, and the program will terminate.
Simple does not provide a repetition structure (such as C++'s, for
or while
). However, Simple can simulate each of C++'s repetition structures using the if...goto
and goto
commands as we did in the example above.
The input to your program is a Simple program, just like the ones shown in the previous section. You may assume that all Simple programs given to your compiler are syntactically correct. Syntax checking is a critical component of any compiler, however, proper syntax checking is a very difficult problem and well beyond the scope of this course. Hence, when you write your compiler you may make the simplifying assumption that the programs you are compiling are syntactically correct. You may also assume
if...goto
and goto
commands are in the program,let
commands, there is at least one space before and after the =
sign (note there may or may not be spaces in the expression on the right of the =
sign),if...goto
commands, there will be at least one space separating the left and right operands from the relational operator.Your compiler will produce a complete SML program, including the -99999
line followed by any input (from any data in the Simple program) or an error message (error messages described in Section 4). The SML program generated by your compiler must be immediately suitable for execution on your Simplesim. In other words, you should be able to take the SML program generated by your compiler and "feed it" directly (without modification) as input to your simplesim
program from Assignment 4.
The result of the compiler is the SML program, which is composed of SML instructions and data, a line containing -99999
to mark the end of the program, and possibly some input data. Your compiler program should use a memory array identical to the one it used in the Simplesim assignment and a data array with a counter to collect the values from the Simple data
commands.
For example:
int memory[MEMSIZE]; int data[MEMSIZE]; int ndata;
Your program will also use an array of flags and a symbol table (both are described below). The symbol table will have 1,000 "rows", where each row is an instance of the structure below. The flags array must have a flag for each memory location.
#define SYMBOL_TABLE_SIZE 1000 struct table_entry { int symbol; char type; // 'C' constant, 'L' Simple line number, 'V' variable int location; // Simplesim address (00 to MEMSIZE-1) }; table_entry symbol_table[SYMBOL_TABLE_SIZE]; int flags[MEMSIZE];
After some minor initialization, the compiler performs two passes. The first pass constructs a symbol table in which every line number, variable name, and constant of the Simple program is stored with its type and and corresponding location in the final SML code (the symbol table is described in detail below).
Constants and variables are placed at the bottom of the Simplesim's memory (memory locations 99, 98, 97, ...) and the SML program (instructions) are placed at the top of the Simplesim's memory (locations 00, 01, 02, ...). The first pass also produces the corresponding SML instruction(s) for each Simple statement. This results in a "hole" in the middle of the Simplesim's memory (between the last SML instructions and the constants/variables) that will be used as stack space to evaluate the arithmetic expressions in the let
commands.
As we will see, if the Simple program contains let
statements or statements that transfer control to a line later in the program, the first pass results in an SML program containing some partial instructions and it uses the flags
array to mark those instructions.
Your program should use the following variables to record the Simplesim address for the next instruction, the next constant or variable, and an index for the next symbol in the symbol table.
int next_instruction_addr; int next_const_or_var_addr; int next_symbol_table_idx;
The second pass of the compiler locates (using the flags
array) and completes (using the symbol table and stack space) the partial instructions. After the second pass, the compiler prints the finished SML program, followed by any input data.
Your compiler must initialize all of the Simplesim's memory to 4444 and all the flags to -1. It should then initialize the next instruction address to the "top" of the Simplesim's memory (memory location 0), the next constant or variable address to the "bottom" of the Simplesim's memory (memory location MEMSIZE-1
), the index of the next entry into the symbol table to 0, and finally the counter for the Simple data commands to 0.
For example:
next_instruction_addr = 0; next_const_or_var_addr = MEMSIZE-1; next_symbol_table_idx = 0; ndata = 0;
The first pass of the compiler reads and processes the Simple program one line at a time. In processing a single line of the Simple program, the compiler can make addition(s) to the symbol table, add (possibly partial) SML instruction(s) to the Simplesim's memory, and add constants and/or variables to the Simplesim's memory.
The general structure of the first pass should look like something like this:
string buffer1, buffer2, line_number, command; while (getline(cin, buffer1)) { buffer2 = buffer1; // buffer2 used for 'let' istringstream ss(buffer1); ss >> line_number; // ... code to add line_number to symbol table, type 'L' ... ss >> command; if (command == "input") { // ... code to process 'input' command ... } else if (command == "data") { // ... code to process 'data' command ... } else if (command == "let") { // ... code to process 'let' command ... } else if (command == ...) { . . . } else if (command == "rem") { // ... code to process 'rem' command ... } }
(The istringstream
class is defined in the header file <sstream>
and is part of the standard namespace.)
Each time a new line is read from the Simple program, its line number must be added to the symbol table along with the corresponding address of the next instruction of the SML program.
Each time a constant and/or variable is encountered, first the symbol table is checked to see if the constant or variable already exists in the symbol table.
If the constant or variable does not exist within the symbol table (i.e., this is the first time that it has been encountered in the program) a Simplesim memory cell is allocated for it (initializing the memory to 0 if it was a variable) and the symbol table is updated by adding the variable or constant along with its memory location.
If the variable or constant already exists in the symbol table, then no additional memory is allocated for it and the symbol table is left unmodified.
4.2.1. Processing Simple Commands
Here we discuss the details of how to process each of the Simple commands. In discussing each, it is assumed that the line number of the command has already been added to the symbol table.
rem
. There is nothing left to do after adding the line number to the symbol table.
data
. Add the integer appearing at the end of the line to the data
array and increment ndata
. If there is not enough room in the data
array to add this element, print *** ERROR: too many data lines ***
and terminate the compilation immediately, i.e., without printing the SML program.
input
. Search the symbol table for the variable appearing at the end of the line. If the variable appears in the symbol table, then extract the variable's Simplesim memory location from the table. If the variable did not appear in the symbol table, allocate a Simplesim memory cell for it at next_const_or_var_addr
and add it to the symbol table. Be sure to initialize the Simplesim's memory for that variable to 0 and decrement next_const_or_var_addr
for the next variable or constant. Be sure to note the newly-allocated memory location of the new variable (by saving it in the symbol table).
Using the variable's address, whether it was extracted from the symbol table or it was newly allocated, place a READ
instruction at memory location next_instruction_addr
. Be sure to increment next_instruction_addr
for the next SML instruction.
print
. Process this in exactly the same manner as the input
command, using the address from the symbol table if the variable already existed there or creating space for it and adding to the symbol table. The only difference in processing a print
command is that you should place a WRITE instruction in memory rather than a READ.
goto
. Search the symbol table for the line number appearing at the end of the line. If the line number appears in the symbol table (i.e., it refers to a line "before" this goto
command), then extract the line number's memory location from the symbol table and place/generate a BRANCH
instruction at memory location next_instruction_addr
. If the line number was not in the symbol table, then it refers to a line "after" this goto
command. This is called a forward reference. In this case, we place/generate a partial BRANCH
instruction (omitting the address to branch to) and flag the instruction to indicate that the second pass of the compiler must complete the instruction. Use the flags
array and set flags[next_instruction_addr]
to the line number this goto
command is supposed to branch to. In either case, after writing the BRANCH
instruction (partial or otherwise) be sure to increment next_instruction_addr
for the next SML instruction.
if...goto
. Compilation of the if...goto
and let
statements is more complicated than the other statements - they are the only statements that produce more than one SML instruction. For an if...goto
statement, the constants and/or variables appearing in the test are searched for in the symbol table and added to the Simplesim's memory and symbol table if necessary (noting the Simplesim memory locations of each). Then the compiler produces SML instructions (typically a LOAD
followed by a SUBTRACT
) to test the condition and branch if appropriate. The result of the branch could be a forward reference and should be handled just like the forward references in the goto
command, i.e., writing partial BRANCHZERO
and/or BRANCHNEG
instructions and flagging them with the line number in the flags
array. Each of the relational operators can be simulated using SML's BRANCHZERO
and BRANCHNEG
instructions (or possibly a combination of both). Be sure to increment next_instruction_addr
after each SML instruction.
let
. Compilation of the let
statement is the most challenging. It not only produces more than one SML instruction but also the evaluation of the expression (rhs
) requires you to manage your own stack (as an array) using the "hole" in the Simplesim's memory between the SML instructions and the constants/variables. The problem is further complicated by the fact that during the first pass we do not know where that stack space starts, so we are forced to write partial SML instructions and flag them using the flags
array.
The first thing you must do in processing a let
statement is to search the symbol table for the variable being assigned. If it is not in the symbol table, then a Simplesim word must be allocated for it, initialized to 0, and it should be added to the symbol table. Then, using the copy of the statement (buffer2
), call your convert()
function from the previous assignment, passing it the address of the first non-whitespace character after the equal sign, to get a postfix expression. Evaluate the postfix expression using the stack algorithm to write the SML instructions (described in detail below).
The actual Simplesim memory location that will be the start of our stack cannot be known until after all the space has been allocated for constants and variables (i.e., after the first pass has been completed). Once that is done, then we can use the "hole" between the the last SML instruction and the last constant/variable. The stack will start above (but not immediately above) the last constant/variable added and grow "upward" toward the SML instructions each time we "push" a value. The first Simplesim word immediately above the last constant/variable is specially reserved for the non-commutative operators -
and /
(explained below), so the stack actually starts immediately above that specially reserved word.
Start the processing of the postfix expression by initializing a stack index integer to 0 (next_stack_idx = 0
), which represents the index into the stack where the next pushed value should go. Now you are ready to start processing the postfix expression.
Processing Operands
Each time you encounter an operand (constant or variable), you must search the symbol table for it, and if it is not there, you must add it to the Simplesim's memory and the symbol table. Extract the operand's memory location from the symbol table.
Operands are pushed onto the stack. Since there are no memory-to-memory SML instructions, pushing a value onto the stack requires the use of the accumulator, i.e., you must LOAD
and then STORE
. Write the LOAD
instruction using the operand's address from the symbol table, then write a partial STORE
instruction (omitting the address). Use the flags
array to flag this STORE
instruction by setting
flags[next_instruction_addr] = -3 - next_stack_idx;
Recall that the flags
array was initialized with -1
and that it uses positive integers to represent forward referenced line numbers. This leaves negative numbers < -1
to represent stack indices. When the flags
array element has the value -2
, that refers to the specially reserved word for processing non-commutative operators (explained below). Values < -2
are used to represent stack indices, e.g., -3
means stack + 0
, -4
means stack + 1
, -5
means stack + 2
, etc.
Thus, processing an operand requires finding its SML address from the symbol table (adding it if necessary), writing the LOAD
instruction using that address, writing a partial STORE
instruction, setting the flags
array element corresponding to that STORE
instruction to -3 - next_stack_idx
, and incrementing next_stack_idx
.
Processing Operators
Processing operators requires you to pop the two operands, perform the operation, and push the result.
Processing the commutative operators (+
and *
) is a little easier than the non-commutative operators (-
and /
). Commutative operators pop the stack, placing the value into the accumulator
(LOAD
), and then perform the operation by popping the stack a second time and applying the operation to the value in the accumulator
(ADD
or MULTIPLY
) leaving the result in the accumlator. That result is pushed back onto the stack (STORE
). Therefore, processing a commutative operator writes three partial SML instructions (omitting the address in all three).
For example, the code for processing the operator +
would look like this:
memory[next_instruction_addr] = LOAD * 100; // omit address next_stack_idx--; flags[next_instruction_addr] = -3 - next_stack_idx; next_instruction_addr++; memory[next_instruction_addr] = ADD * 100; // for addition, omit address next_stack_idx--; flags[next_instruction_addr] = -3 - next_stack_idx; next_instruction_addr++; memory[next_instruction_addr] = STORE * 100; // omit address flags[next_instruction_addr] = -3 - next_stack_idx; next_stack_idx++; next_instruction_addr++;
(Code for processing the operator *
is nearly identical - just change ADD
to MULTIPLY
.)
Processing the non-commutative operators requires more care. Recall that the operand on the top of the stack is the right operand. Therefore, we cannot simply load that value into the accumulator
and apply the operation using the next value on the stack. We must pop the right operand (LOAD
) and temporarily store (STORE
) it in the special memory location sitting just beneath the stack. Then pop the stack again, placing the left operand into the accumulator
(LOAD
). Apply the operator using the value in the special memory location (SUBTRACT
or DIVIDE
), leaving the result in the accumlator
, and then push the result back onto the stack (STORE
). Therefore, processing a non-commutative operator writes five partial SML instructions (omitting the address in all five).
For example, the code for processing the operator -
would look like this:
memory[next_instruction_addr] = LOAD * 100; // omit address next_stack_idx--; flags[next_instruction_addr] = -3 - next_stack_idx; next_instruction_addr++; memory[next_instruction_addr] = STORE * 100; // omit address flags[next_instruction_addr] = -2; next_instruction_addr++; memory[next_instruction_addr] = LOAD * 100; // omit address next_stack_idx--; flags[next_instruction_addr] = -3 - next_stack_idx; next_instruction_addr++; memory[next_instruction_addr] = SUBTRACT * 100; // for subtraction, omit address flags[next_instruction_addr] = -2; next_instruction_addr++; memory[next_instruction_addr] = STORE * 100; // omit address flags[next_instruction_addr] = -3 - next_stack_idx; next_stack_idx++; next_instruction_addr++;
(Code for processing the operator /
is nearly identical - just change SUBTRACT
to DIVIDE
.)
Once you have finished evaluating the postfix expression. the answer is sitting on the top of the stack, where it must be removed and placed into the memory location of the variable of the let
command. This is done with two SML instructions, a partial LOAD
followed by a full STORE
. The element of the flags
array that corresponds to the partial LOAD
instruction should be set to -3
(i.e., the top of the stack).
end
. Simply write a HALT
instruction at memory location next_instruction_addr
. Be sure to increment next_instruction_addr
after writing the instruction.
There are only 100 words in the Simplesim memory and it is possible to deplete that. If by adding instructions you find that you have run past the end of memory or have entered the variable/constant section of the memory, or if by allocating space for variables/constants you have entered the program section of the memory, simply print the message *** ERROR: ran out Simplesim memory ***
and terminate the compilation immediately, i.e., without printing the SML program.
4.2.2. First Pass Example
To illustrate the first pass of the compiler, we take you through its steps (see Figure 1 below) as it processes the summation example program from Section 1.
We start with the first line of the program, line 10. As is done with each line of the program, the line number is added to the symbol table (see Figure 3), and since it is a rem
command, nothing else is done.
Line 15 is an input
command. The symbol table is searched for the variable x
and it is not found (at this point there are only two entries in the symbol table, lines 10 and 15). Since x
was not in the symbol table, a Simplesim word is allocated for x
(at location 99), it is initialized to 0, and x
is added to the symbol table. We are now ready to write our first SML instruction, a READ
command using the address of x
from the symbol table. Since we were able to write a full SML instruction (i.e., complete with address) we do not flag the instruction, meaning we leave this element of the flags
array unchanged.
Compilation continues with line 20, a data
command. The value (10) is placed into the first element of the data
array and ndata
is incremented.
Proceeding to line 25, we reach an if...goto
command. The symbol table is searched for the left operand x
, which is found. The symbol table is then searched for the right operand, the constant -1
. It is not found in the symbol table, so the next available Simplesim word is allocated for -1
(at location 98), the word is initialized to -1, and the constant -1
is added to the symbol table. We are now ready to start writing the SML instructions. This particular if...goto
command is testing <=
. That test is performed by subtracting the right operand from the left and branching if the difference is zero or negative (other if...goto
tests are handled in different, yet similar ways). The addresses of the left and right operands (locations 99 and 98) are extracted from their symbol table entries and used in the LOAD
and SUBTRACT
instructions. Now to write the BRANCHNEG
and BRANCHZERO
instructions. The symbol table is searched for the specified line number (60
). Since this is a forward reference (referring to a line "below" us), line 60 does not appear in the symbol table. Hence, we are forced to write partial branching instructions (omitting the address) and we must flag these instructions by placing the line number (60) into the cells of the flags
array that correspond to the branch instructions' memory locations (flags[03] = 60;
, flags[04] = 60;
).
Continuing through line 30 (another rem
command), we encounter our first let
command in line 35. The symbol table is searched for the variable appearing on the left side of the equal sign (t
) and it is not found. Simplesim memory location 97 is assigned to the variable t
, it is initialized to 0, and t
is added to the symbol table. We extract its Simplesim address (97) from the symbol table and save it for future use (the STORE
operation at the end of processing this instruction). The infix expression t + x
is converted to its postfix equivalent t x +
and compilation continues by processing the postfix expression, one symbol at a time.
The first symbol (t
) is an operand, so we must push it onto the stack. This requires generating a LOAD
and a STORE
. The symbol table is searched for the symbol t
and it is found. Its Simplesim address is extracted from the symbol table (97) and used in the LOAD
instruction. Now we must STORE
the value in the stack. Since we do not know the Simplesim address where the stack will begin (we cannot know that until after the end of the first pass when space for all the variables/constants has been allocated), we are forced to write a partial STORE
instruction (omitting the address) and flagging this instruction by placing a representation of the stack index (0) into the cell of the flags
array corresponding to this instruction's memory location (flags[06] = -3 - next_stack_idx;
) and the stack index is incremented. This concludes "pushing" the first operand (t
) onto the stack and we continue with the next symbol in the postfix expression, x
.
x
is also an operand and must be pushed onto the stack. The symbol table is searched for the variable x
. It too is found, and its Simplesim address (99) is extracted from the symbol table and is used in the LOAD
instruction. Once again, we are forced to write a partial STORE
instruction, this time using the next slot in our stack (stack index 1), of course incrementing the stack index after writing the partial STORE
and flagging its corresponding cell of the flags
array. This concludes pushing the second operand (x
) onto the stack.
Processing of the postfix expression continues as we move on to the next symbol, the commutative binary operator +
. Processing such an operator requires us to pop the stack twice (performing the addition on the second pop) and pushing the result back onto the stack. We perform the first pop by decrementing the stack index and writing the partial LOAD
instruction. We perform the second by by decrementing the stack index and writing the partial ADD
instruction. The sum is now sitting in the accumulator
and must be pushed back onto the stack. This is performed by writing the partial STORE
instruction and then incrementing the stack index.
This concludes the processing of the postfix expression. The stack should still have one value left, the value of the entire postfix expression. We must pop the stack and place the value into the variable that is the target of the let
statement, in this case t
. We pop the stack with a partial LOAD
instruction and conclude with a full STORE
instruction (using the address 97 that we extracted from the symbol table when we started processing this let
command).
Compilation continues to line to 40 where we encounter our second let
instruction, this time involving the non-commutative binary operator -
. We start again by searching the symbol table for the variable appearing to the left of the =
sign (x
) and find it, noting its memory location (99). We continue by converting the infix expression x - 1
to its postfix equivalent x 1 -
and processing it one symbol at a time. The first two symbols of this postfix expression are also operands and they are both pushed onto the stack (SML instructions 14-17
) the same way we pushed the first two operands in the previous let
statement, with one minor difference. The constant 1
is not found in the symbol table, so Simplesim memory location 96 is allocated for it, set to 1, and the constant is added to the symbol table.
Now we must process the non-commutative binary operator -
. Recall that the value on the top of the stack is the right operand. We cannot simply load that value into the accumulator
and perform the subtraction operation using the second value from the stack. We must pop the right operand from the stack with a partial LOAD
instruction and store it into the special memory location sitting just below the stack with a partial STORE
instruction (specifying the special memory location in the flags
array, flags[19] = -2;
). Processing continues by popping the next value from the stack into the accumulator
with a partial LOAD
instruction and then writing the partial SUBTRACT
instruction using the special memory location where we temporarily stored the right operand (flags[21] = -2;
). Finally, the difference is pushed back onto the stack with a partial STORE
instruction. This concludes processing of the postfix expression and the result is popped off the stack and stored in x
(SML instructions 23-24
) in the same manner as the previous let
command.
We continue through line 45 to line 50 where we encounter our first goto
command, which always results in a BRANCH
instruction. The only thing to determine is whether or not we can write a full instruction. We search the symbol table for the specified line number (25) and it is found. This allows us to write a full BRANCH
instruction using the address that we extract from the symbol table. If the line number was not in the symbol table (e.g., a forward reference), we would have been forced to write a partial BRANCH
instruction (omitting the address) and flag the instruction, setting flags[25]
to the forward referenced line number.
Continuing to line 60, a print
command, we search the symbol table for the specified variable (t
). In this case, the variable was found. However, if the variable was not in the symbol table, space would have to be allocated for it and the symbol table would be updated. In either case, print
statements always result in a full WRITE
instruction. We conclude the first pass with line 99, an end
command which generates a HALT
instruction.
This concludes the compiler's first pass through the summation program. At the end of the first pass, data[0] = 10
and ndata = 1
(from the data
command in line 20), the symbol table appears as it is presented in Figure 3, and the Simplesim memory
and flags
arrays appear as presented in Figure 2. There we can see that the SML program will ultimately occupy Simplesim memory locations 00-27
and that the constants/variables occupy 96-99
, leaving a "hole" in the memory (28-95
) between the two. This "hole" will be used for the stack (starting at memory 94
and growing up to and including memory location 28
) and the special temporary holding cell for right operands when evaluating non-commutative binary operators -
and /
(memory location 95
). Now that we have identified the boundaries of the "hole", we are ready to commence the second pass of our compiler.
Simple program | SML | flags |
---|---|---|
10 rem sum 1 to x
|
||
15 input x
|
00 0199
|
|
99 0000
|
||
20 data 10
|
||
25 if x <= -1 goto 60
|
01 1299
|
|
02 2298
|
||
03 3200
|
60
|
|
04 3300
|
60
|
|
98 -0001
|
||
30 rem add x to total
|
||
35 let t = t + x
|
05 1297
|
|
06 1100
|
-3 (stack + 0)
|
|
07 1299
|
||
08 1100
|
-4 (stack + 1)
|
|
09 1200
|
-4 (stack + 1)
|
|
10 2100
|
-3 (stack + 0)
|
|
11 1100
|
-3 (stack + 0)
|
|
12 1200
|
-3 (stack + 0)
|
|
13 1197
|
||
97 0000
|
||
40 let x = x - 1
|
14 1299
|
|
15 1100
|
-3 (stack+0)
|
|
16 1296
|
||
17 1100
|
-4 (stack+1)
|
|
18 1200
|
-4 (stack+1)
|
|
19 1100
|
-2 (right operand)
|
|
20 1200
|
-3 (stack+0)
|
|
21 2200
|
-2 (right operand)
|
|
22 1100
|
-3 (stack + 0)
|
|
23 1200
|
-3 (stack + 0)
|
|
24 1199
|
||
96 0001
|
||
45 rem loop x
|
||
50 goto 25
|
25 3101
|
|
55 rem output result
|
||
60 print t
|
26 0297
|
|
99 end
|
27 3400
|
Figure 1: First pass of summation program
Location | memory |
flags |
---|---|---|
00
|
+0199
|
|
01
|
+1299
|
|
02
|
+2298
|
|
03
|
+3200
|
60
|
04
|
+3300
|
60
|
05
|
+1297
|
|
06
|
+1100
|
-3
|
07
|
+1299
|
|
08
|
+1100
|
-4
|
09
|
+1200
|
-4
|
10
|
+2100
|
-3
|
11
|
+1100
|
-3
|
12
|
+1200
|
-3
|
13
|
+1197
|
|
14
|
+1299
|
|
15
|
+1100
|
-3
|
16
|
+1296
|
|
17
|
+1100
|
-4
|
18
|
+1200
|
-4
|
19
|
+1100
|
-2
|
20
|
+1200
|
-3
|
21
|
+2200
|
-2
|
22
|
+1100
|
-3
|
23
|
+1200
|
-3
|
24
|
+1199
|
|
25
|
+3101
|
|
26
|
+0297
|
|
27
|
+3400
|
|
"hole" | ||
96
|
+0001
|
|
97
|
+0000
|
|
98
|
-0001
|
|
99
|
+0000
|
Figure 2:memory
andflags
arrays after first pass of summation program. Values stored inmemory
array are+4444
and values stored inflags
array are-1
unless otherwise noted.
Symbol | Type | SML Address |
---|---|---|
10
|
L
|
00
|
15
|
L
|
00
|
x
|
V
|
99
|
20
|
L
|
01
|
25
|
L
|
01
|
-1
|
C
|
98
|
30
|
L
|
05
|
35
|
L
|
05
|
t
|
V
|
97
|
40
|
L
|
14
|
1
|
C
|
96
|
45
|
L
|
25
|
50
|
L
|
25
|
55
|
L
|
26
|
60
|
L
|
26
|
99
|
L
|
27
|
Figure 3: Symbol table after first pass of summation program
At the end of the first pass, you will know the address of the last constant/variable that space was allocated for. The Simplesim word immediately before that (location 95
from the example in Section 4.2) becomes the location for the right operand of the non-commutative binary operators and the word immediately before that (location 94
from Section 4.2) is the starting location for the stack. You will need that information for the second pass.
The purpose of the second pass is to complete the partial instructions written in the first pass. This is done by traversing the flags
array and completing any instruction whose corresponding flags
value is != -1
.
A positive flags[i]
value represents a line number that was a forward reference. Search the symbol table for the line number, extract its address, and complete the SML instruction (memory[i]
) by adding the address.
A flags[i]
value of -2
represents the special right operand memory location. Complete the SML instruction (memory[i]
) by adding the address of that word (location 95
from Section 4.2).
When flags[i] < -2
, that represents a memory location in the stack. Calculate the stack index idx = -3 - flags[i]
and use that index to compute the address. Using the example from Section 4.2, flags[i] = -3
produces a stack index of 0, which corresponds to memory location 94
, flags[i] = -4
produces stack index 1 and that corresponds to memory location 93
, and so on. Complete the SML instruction (memory[i]
) by adding the calculated stack address.
There is one error that could possibly be detected during the second pass: the "hole" may not be big enough to accommodate the stack space necessary to evaluate the postfix expressions. In that case, your program should print *** ERROR: insufficient stack space ***
and terminate the compilation immediately, i.e., without printing the SML program.
After the second pass you will have a complete SML program (no partial instructions) and a data
array ready for printing. Print the entire memory
array (one word per line), followed by -99999
, and conclude by printing all the values read into the data
array.
By running the setup
command for this assignment, you will receive a makefile, a renamed version of the main routine for Assignment 4 (simplesim_main.cpp
) to be used in building your simplesim
executable, and a collection of Simple programs. They include six Simple programs that will compile (including the two example Simple programs sum.s
and max.s
) and a collection of Simple programs that will not compile. The file name and brief description of each Simple program is in the table below.
Filename | Expected Result | Description |
---|---|---|
Simple programs that will not compile: | ||
bigdata.s
|
*** ERROR: too many data lines ***
|
Simple program has too many data lines.
|
bigpgm.s
|
*** ERROR: ran out Simplesim memory ***
|
Cannot allocate space for an instruction. |
bigpgmvar.s
|
*** ERROR: ran out Simplesim memory ***
|
Cannot allocate space for a variable. |
bigpgmcmd.s
|
*** ERROR: ran out Simplesim memory ***
|
Cannot allocate space for an instruction. |
bigpgmstack.s
|
*** ERROR: insufficient stack space ***
|
Insufficient stack space for single let statement.
|
Simple programs that compile: | ||
end.s
|
no Simple output |
One line Simple program, end .
|
read.s
|
print READ: value
|
Reads an integer into a variable. |
rw.s
|
READ and output value then constant
|
Reads and prints a number and constant. |
max.s
|
max of two input values | Reads two numbers and prints max. |
sum.s
|
sum of from 1 to x
|
Reads number (x ) prints sum from 1 to x .
|
prime.s
|
1 if x is prime, else -1
|
Prints 1 if input is prime, otherwise -1. |
You will receive two executable files, simplesim_check
, an executable Simplesim, and scc_check
, an executable compiler solution to this assignment. You may use the Simple programs above along with the solutions simplesim_check
and scc_check
to debug your program. The output of your compiler and Simplesim must look exactly like the output produced by scc_check
and simplesim_check
. Your programs will be tested in the following manner,
z123456@turing:~$ ./scc_check < sum.s | ./simplesim_check > sum.key z123456@turing:~$ ./scc < sum.s | ./simplesim > sum.out z123456@turing:~$ diff sum.out sum.key z123456@turing:~$
where scc
and simplesim
are your solutions. To be eligible for full credit your output must exactly match (empty diff
) the output of simplesim_check
for each of the Simple programs listed in the table above. Note this is a minimum requirement. We may test your programs with Simple programs other than those listed above.
You must place simplesim.cpp
, simplesim.h
, sml.h
(from Assignment 4), inpost.cpp
, inpost.h
, mystack.cpp
, and mystack.h
(from Assignment 7) in your submission directory for this assignment. You will also submit a new file for this assignment, scc.cpp
which must #include "sml.h"
and use the #define
values for each of the Simplesim instructions (e.g., LOAD
, STORE
, ...), and it must #include "inpost.h"
and call the convert()
function in inpost.cpp
. We will re-make your simplesim
executable and your scc
compiler using the makefile that we supply you.
Note that your program will be tested using your solutions to the previous assignments. You will need to turn in working versions of them in order to receive full credit for this assignment (note that you must supply them again when turning in this assignment - if they don't work, then you must fix them).
The structure of this program is left largely up to you. If you want to write it as a class in a fashion similar to what was done on Assignment 4, you are welcome to modify the makefile to make that possible. You are equally welcome to write one giant main()
routine in scc.cpp
. In practice, you're probably going to want to write at least a few other functions besides main()
to avoid massive amounts of duplicated code.
The most complicated part about this assignment is the first pass of the compiler. The initialization, second pass, and outputting the program are all relatively straightforward. You might want to implement those three sections right away and start with just the while (gets(cin, buffer1))
loop of the first pass (described in Section 4.2), leaving the body of the loop initially empty. Continue by adding the code to process Simple commands, one at a time, to the body of the loop. Always convince yourself that what you have done to that point is working correctly before proceeding. Below is a suggested strategy you might take using the Simple programs we have provided.
Start with rem
and end
. This will get you to register each line number in the symbol table. Verify with end.s
and bigpgm.s
before proceeding.
Add input
and data
. You will also have to write the code to handle variables as well. Verify with read.s
, bigdata.s
, bigpgmvar.s
, and bigpgmcmd.s
before proceeding.
Add print
and goto
. Verify with rw.s
before proceeding. rw.s
requires you to process a constant and forward reference. These are the last Simple commands that produce only single SML instructions.
Add if...goto
. You might want to implment even this single Simple command in stages, adding one relational operator at a time. Verify with max.s
before proceeding. This will require you to at least implement the >
relational operator.
Conclude with let
. This is the most complicated Simple command to compile. Start by verifying with sum.s
and bigpgmstack.s
. This will require you to implement the arithmetic operators +
and -
and the <=
relational operator. After convincing yourself that your compiler works with both of those Simple programs, finish up with prime.s
which introduces the two arithmetic operators *
and /
and the relational operators ==
, <
, and >
.
The Simple programs that we have provided are there to help you. They are not intended to be a complete test suite for your program (note that some of the relational operators are not tested at all). You must write your own Simple programs to further test your compiler. When grading your program we may use your compiler to compile programs other than the ones we supplied.
There is one other warning we should issue. When searching the symbol table for a particular symbol, be sure that you check both the symbol and its type for a match. The ASCII value for the variable 'a'
is 97, 'b'
is 98, and so on. Those values can easily be mistaken for line numbers. Also, searching for a constant 10 could be mistaken for line number 10.