Disassembling Code

Here is a piece of object code:

X'5830F0144140F0185C2040005030F01C07FEF5F5000000190000000DF5F5F5F5'

Suppose this is the executable code for a small program. We want to know what it does, so we need to decode it. This is also called disassembling.

When the program begins, register 15 contains the address of the first byte of the first executable line in the program (the X'58' at the left side above). What will the machine do with this?

To make it easier to see, let's break it into bytes:

X'58 30 F0 14 41 40 F0 18 5C 20 40 00 50 30 F0 1C 07 FE F5 F5 00 00 00 19 00 00 00 0D F5 F5 F5 F5'
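Splitting the string by hand is error-prone, so here is a small Python sketch that does the same thing. It is only an illustration and not part of the program being decoded; the names OBJECT_HEX and split_bytes are invented for this example.

```python
# The object code from the text, written as a plain hex string.
OBJECT_HEX = "5830F0144140F0185C2040005030F01C07FEF5F5000000190000000DF5F5F5F5"

def split_bytes(hex_text):
    """Return the two-digit hex pairs (one per byte) that make up hex_text."""
    return [hex_text[i:i + 2] for i in range(0, len(hex_text), 2)]

pairs = split_bytes(OBJECT_HEX)
print(" ".join(pairs))          # the byte-by-byte listing
print(len(pairs), "bytes")      # how many bytes of object code we have
```

Two hex digits make one byte, so the 64 digits above give 32 bytes.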

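The machine will look at the first byte, X'58', and expect it to be the operation code of an instruction. We look it up in the Reference Summary: the instruction is L (Load). It is of type RX, so it is 4 bytes long. The first instruction is: X'58 30 F0 14'. In an RX instruction the hex digits after the operation code are, in order, R1 (1 digit), X2 (1 digit), B2 (1 digit) and D2 (3 digits), so here R1 = 3, X2 = 0, B2 = X'F' = 15 and D2 = X'014' = 20. We can decode it: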

L        3,20(0,15)
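The way the four bytes of an RX instruction break down into fields can be sketched in Python. This is a minimal illustration, assuming the standard RX layout (operation code, R1, X2, B2, 12-bit D2); the function name decode_rx is invented here.

```python
def decode_rx(instruction):
    """Split a 4-byte RX instruction into (opcode, r1, x2, b2, d2)."""
    opcode = instruction[0]
    r1 = instruction[1] >> 4            # first hex digit of the second byte
    x2 = instruction[1] & 0x0F          # second hex digit of the second byte
    b2 = instruction[2] >> 4            # first hex digit of the third byte
    # The displacement is the remaining 3 hex digits (12 bits).
    d2 = ((instruction[2] & 0x0F) << 8) | instruction[3]
    return opcode, r1, x2, b2, d2

opcode, r1, x2, b2, d2 = decode_rx(bytes.fromhex("5830F014"))
print(f"opcode X'{opcode:02X}', operands {r1},{d2}({x2},{b2})")
```

For X'5830F014' the operands come out as 3,20(0,15), matching the Load decoded by hand.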

We can safely put a few things around it:

MAIN     CSECT
         USING MAIN,15
         L     3,20(0,15)

         END   MAIN

In passing, we can observe that location 20(0,15) must contain some number, that is, data and not an instruction.

Next, we have the next byte after the Load: X'41'. We go through the same steps: look up the operation code, find out what type of instruction it is, get hold of the right number of bytes and decode it.

We do this repeatedly until we get to this point:

MAIN     CSECT
         USING MAIN,15
         L     3,20(0,15)
         LA    4,24(0,15)
         M     2,0(0,4)
         ST    3,28(0,15)

         END   MAIN

Now we find X'07', which is the operation code for BCR (Branch on Condition Register). It is of type RR and is 2 bytes long: X'07 FE', which we can decode as:

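         BCR   B'1111',14

With a mask of B'1111' the branch is always taken, so this is the same instruction as the extended mnemonic:

         BR    14

That should look familiar. Our programs all end with "BR 14".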
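The repeated look-up-and-decode steps can be sketched as a tiny disassembler in Python. This is a sketch under stated assumptions: the table OPCODES holds only the five operation codes met so far (a real Reference Summary has many more), and stopping at the first unknown byte is a simplification that happens to work for this program. All names are invented for the example.

```python
OBJECT_HEX = "5830F0144140F0185C2040005030F01C07FEF5F5000000190000000DF5F5F5F5"

# operation code -> (mnemonic, instruction type); only what this example needs
OPCODES = {
    0x07: ("BCR", "RR"),
    0x41: ("LA", "RX"),
    0x50: ("ST", "RX"),
    0x58: ("L", "RX"),
    0x5C: ("M", "RX"),
}
LENGTHS = {"RR": 2, "RX": 4}    # instruction length in bytes, by type

def disassemble(code):
    """Decode from offset 0 until a byte that is not a known operation code."""
    lines = []
    offset = 0
    while offset < len(code) and code[offset] in OPCODES:
        mnemonic, fmt = OPCODES[code[offset]]
        r1 = code[offset + 1] >> 4
        low = code[offset + 1] & 0x0F      # R2 for RR, X2 for RX
        if fmt == "RR":
            operands = f"{r1},{low}"
        else:
            b2 = code[offset + 2] >> 4
            d2 = ((code[offset + 2] & 0x0F) << 8) | code[offset + 3]
            operands = f"{r1},{d2}({low},{b2})"
        lines.append((offset, f"{mnemonic} {operands}"))
        offset += LENGTHS[fmt]
    return lines, offset

lines, stopped_at = disassemble(bytes.fromhex(OBJECT_HEX))
for offset, text in lines:
    print(f"{offset:08X}  {text}")
print(f"stopped at offset X'{stopped_at:08X}'")
```

The sketch prints the BCR mask in decimal (15), which is the same value as B'1111'. It stops at offset X'12', the first byte after the BCR.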

The next 2 bytes are X'F5 F5'. If you look up X'F5', you will find that it is not the operation code of any instruction. Is that a mistake? Will we have an operation exception, which is caused by trying to execute an invalid instruction?

The answer is no. As the preceding instruction is an unconditional branch, we will not be trying to execute X'F5 F5'. So why is it there?

We can find out if we keep track of the location counter:

00000000    MAIN     CSECT
00000000             USING MAIN,15
00000000             L     3,20(0,15)
00000004             LA    4,24(0,15)
00000008             M     2,0(0,4)
0000000C             ST    3,28(0,15)
00000010             BCR   B'1111',14          
00000012
                     END   MAIN

Notice that the first instruction loads a value from 20(0,15), which is at location counter X'00000014', a fullword boundary. Since we are at location X'00000012', which is not a fullword boundary, we have to advance 2 bytes to reach the next fullword boundary. What happens to the 2 bytes we skipped? Nothing--they still contain their previous values. We are using ASSIST, which initializes bytes in memory to the value X'F5', so the 2 bytes contain X'F5F5'. (If we were not using ASSIST, they might contain anything at all.) These are called slack bytes.
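The arithmetic behind the skipped bytes is just rounding the location counter up to a multiple of 4. A minimal Python sketch, with the helper name align_up invented for illustration:

```python
def align_up(location, boundary=4):
    """Round location up to the next multiple of boundary (4 = fullword)."""
    return (location + boundary - 1) // boundary * boundary

after_bcr = 0x12                          # location counter after the BCR
first_fullword = align_up(after_bcr)      # next fullword boundary
slack = first_fullword - after_bcr        # bytes skipped to get there
print(f"next fullword boundary: X'{first_fullword:08X}', skipped bytes: {slack}")
```

A location that is already on a boundary is left alone, so align_up(0x14) is still X'14'.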

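The next three fullwords are at addresses 20(0,15), 24(0,15) and 28(0,15). Presumably these are fullword variables: L and ST name the first and third directly, and M reaches the second through the address that LA put in register 4. The first two contain X'00000019' (25) and X'0000000D' (13). The third still contains X'F5F5F5F5', so it was never given an initial value.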
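To see what those three fullwords hold, we can read them out of the object code. The Python sketch below assumes the program is loaded so that register 15 points at its first byte, so 20(0,15) is simply offset 20 in the byte string; fullword_at is an invented helper name.

```python
OBJECT_HEX = "5830F0144140F0185C2040005030F01C07FEF5F5000000190000000DF5F5F5F5"
code = bytes.fromhex(OBJECT_HEX)

def fullword_at(storage, offset):
    """Interpret the 4 bytes at offset as a signed binary fullword."""
    return int.from_bytes(storage[offset:offset + 4], "big", signed=True)

for offset in (20, 24, 28):
    raw = code[offset:offset + 4].hex().upper()
    print(f"{offset}(0,15): X'{raw}' = {fullword_at(code, offset)}")
```

The first two fullwords are 25 and 13. The third is X'F5F5F5F5', the ASSIST fill pattern, which suggests storage that was reserved but never initialized.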

The whole tiny program looks like this:

00000000    MAIN     CSECT
00000000             USING MAIN,15
00000000             L     3,20(0,15)
00000004             LA    4,24(0,15)
00000008             M     2,0(0,4)
0000000C             ST    3,28(0,15)
00000010             BCR   B'1111',14          
00000012             DS    2C
00000014             DC    F'25'
00000018             DC    F'13'
0000001C             DS    F
                     END   MAIN

Notice that there are no labels here. All the addresses are explicit addresses. You might find it easier to read if you invented names for the variables:

00000000    MAIN     CSECT
00000000             USING MAIN,15
00000000             L     3,X
00000004             LA    4,Y
00000008             M     2,0(0,4)
0000000C             ST    3,Z
00000010             BCR   B'1111',14
00000012             DS    2C
00000014    X        DC    F'25'
00000018    Y        DC    F'13'
0000001C    Z        DS    F
                     END   MAIN

We have no way of knowing, however, what names the original programmer may have used. In fact, there may have been a literal:

         L     3,=F'25'

provided there is the line "LTORG" immediately after "BR 14".


The machine, of course, is not doing this to recreate the original code but to execute it one instruction at a time. It repeats an instruction cycle:

Look at a byte which should be an operation code.
If it is not an operation code 
  then
    ABEND with an operation exception
  else
    Figure out how many bytes are needed
    Update the "instruction length code" in the PSW
    Update the "address of next instruction" in the PSW
    Execute the instruction
    If appropriate, update the "condition code" in the PSW
End-If
Start over
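The cycle above can be sketched as a very small simulator in Python, run against the object code from the start of this section. This is a rough illustration under several assumptions, not how the hardware or ASSIST is built: only the five operation codes in this program are handled, the instruction length code and condition code are not modelled, the program is assumed to be loaded at address 0, and RETURN_ADDRESS is a made-up stand-in for the caller's next instruction so that the loop has somewhere to stop.

```python
OBJECT_HEX = "5830F0144140F0185C2040005030F01C07FEF5F5000000190000000DF5F5F5F5"
LENGTHS = {0x07: 2, 0x41: 4, 0x50: 4, 0x58: 4, 0x5C: 4}   # opcode -> bytes
RETURN_ADDRESS = 0x1000   # hypothetical address belonging to the caller

def signed(word):
    """Interpret an unsigned 32-bit value as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def run(storage, regs):
    """Repeat the instruction cycle until control returns to the caller."""
    next_address = regs[15]                  # "address of next instruction"
    while next_address != RETURN_ADDRESS:
        opcode = storage[next_address]
        if opcode not in LENGTHS:
            raise RuntimeError(f"operation exception at X'{next_address:06X}'")
        inst = storage[next_address:next_address + LENGTHS[opcode]]
        next_address += len(inst)            # updated before the instruction runs
        r1, low = inst[1] >> 4, inst[1] & 0x0F
        if opcode == 0x07:                   # BCR mask,R2
            if r1 != 15:
                raise NotImplementedError("condition code is not modelled")
            if low != 0:                     # R2 = 0 means "do not branch"
                next_address = regs[low] & 0xFFFFFF
            continue
        base, disp = inst[2] >> 4, ((inst[2] & 0x0F) << 8) | inst[3]
        address = disp + (regs[low] if low else 0) + (regs[base] if base else 0)
        address &= 0xFFFFFF                  # 24-bit addresses
        if opcode == 0x58:                   # L
            regs[r1] = int.from_bytes(storage[address:address + 4], "big")
        elif opcode == 0x41:                 # LA
            regs[r1] = address
        elif opcode == 0x50:                 # ST
            storage[address:address + 4] = regs[r1].to_bytes(4, "big")
        elif opcode == 0x5C:                 # M, using the even-odd pair R1, R1+1
            operand = int.from_bytes(storage[address:address + 4], "big")
            product = (signed(regs[r1 + 1]) * signed(operand)) & 0xFFFFFFFFFFFFFFFF
            regs[r1], regs[r1 + 1] = product >> 32, product & 0xFFFFFFFF
    return regs

storage = bytearray(bytes.fromhex(OBJECT_HEX))
regs = [0] * 16
regs[15] = 0                  # assumed load address of the program
regs[14] = RETURN_ADDRESS     # where "BR 14" will take us
run(storage, regs)
print("register 3 =", regs[3])
print("fullword at 28(0,15) =", int.from_bytes(storage[28:32], "big"))
```

Running the sketch leaves 325 (25 times 13) in register 3 and in the fullword at 28(0,15). The loop ends only because the sketch watches for RETURN_ADDRESS; the real machine would simply carry on with the caller's instructions.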

The cycle is an endless loop. Even when the program ends, it does so with "BR 14", which branches to the address of an instruction belonging to whoever launched the program (the operating system or ASSIST).

If we want to do fancier things, there are some refinements to the instruction cycle, but this will do for now.