MOV AX, 0 L: ADD AX, BX ADC DX, 0 LOOP L
The code does an unsigned multiply DX:AX = BX*CX
The following code does the same operation:
MUL CX
MOV AX, 0 L: SHR BX, 1 ADC AX, 0 ADD BX, 0 JNZ L
We wish to write this code without using any form of LOOP, and with no conditional jumps (we can use other forms of JMP, however). Assume that CS=SS=DS=SS for simplicity, and write a code segment that provides the same results for AX (but may use or destroy other registers). You may NOT "unfold" the above loop! Note: the basic idea suffices here. Bonus for the full implementation.
There are at least 3 ways to do it: (1) jump table (2) interrupt on
overflow (3) self-modifying code. These work as follows:
The jump table should be:
MOV DX, BX
ADD DX, 0FFFFH
MOV DX, 0
RCL DX ; DX is 1 if BX was not 0, and is 0 otherwise
RCL DX ; and is now 2 or 0, respectively
ADD DX, OFFSET TBL
JMP DX
L1:
TBL DW OFFSET L1, OFFSET L
MOV DX, BX
ADD DX, 0FFFFH
MOV DX, 0
CMC ; Invert carry flag
RCL DX
ADD DX, 07FFF ; Generates an oveflow unless DX is 0
INTO ; Interrupt on overflow
L0: JMP L
L1:
Several more things need to be done - (1) setting up the interrupt vector,
(saving its previous value elsewhere) and
(2) writing the interrupt routine. Now (1) is simple (and assumes that we
are ALLOWED to do it!), but the interrupt routine should do the following:
(1) Restore the interrupt vector to its original value, (2) Change the return
address on the stack to L1 (it was L0 during the interrupt), and (3)
return from the interrupt.
MOV DX, BX
ADD DX, 0FFFFH
CMC
MOV DL, 0
RCL DL, 5 ; Get 32 in DX if we want to exit the loop
ADD L0+1, DL ; Modify code - add 32 (or 0) to branch address
L0: JMP SHORT L ; Short is essential - so address is a byte!
NOP
... ; A sufficient number of NOPs so that L+32
NOP ; is in the NOPs...
SUB L0+1, DL ; Restore code to original
MOV AL,0 XOR AL, AL SUB AL, AL AND AL, 0 SHL AL, 8 MOV AX, 0 ; As well as anything else that zeroes AX
OR AL, 0FFH INC AL
X DB 1 Y DB 7
ADD Y, 5 SUB X, 2
If we know that subtracting 2 from X causes no overflow, is there a shorter way to do that? How?
Yes, we can do it with one instruction on a 16-bit WORD operand. The requirement for no overflow is to assure correctness, because we do NOT want a carry from Y into X! Code is an addition, as follows (can also be done with subtraction - a different immediate operand):
ADD WORD PTR X, 0FE05 ; Add 5 to low operand (Y), and ; add 0FE = -2 to high operand (X)
F: MOVE.L D0, -(A7) ; D0 to memory - predecrement mode SUBQ.L #1, D0 ; Subtract immediate - long (32 bit) operand BMI N ; Branch (jump) if result was negative JSR F ; CALL subroutine F (push PC then jump to F) MULS.L (A7)+, D1:D2 ; Signed multiply memory (postincrement) ; with D2, result in D1:D2 RTS ; Return from procedure/subroutine N: MOVE.L (A7)+, D0 ; Move memory to D0 - postincrement mode MOVEQ.L #0, D1 ; Move immediate to D1 MOVEQ.L #1, D2 RTS
What happens if we execute an instruction JSR F, with D0=0? D0=1? Other values of D0 (call that value k)?
The code pushes its argument, D0, onto the stack, and restores it before returning. Also, We get to N if D0 was 0 - and return a 64-bit quantity, 1 in this case. This is the base case of the recursive call. In all other cases, we multiply the argument D0 (popped from stack) by whatever value was returned by the same function, but with the argument, k=D0, decremented by 1. That is, the function computes: IF k=0, return 1. Else, return F(k-1)*k. This is exactly the definition of factorial, k!
a. ADD CL, AL Here and elsewhere, the extra result bit on the left denotes the carry. No overflow unless stated... 10000101 + 10101111 ----------- 1 00110100 b. ADC AL, 0E5H 1 + 10101111 + 11100101 ----------- 1 10010101 c. ADD AX, 0045H 0000000010101111 + 0000000001000101 -------------------- 0 0000000011110100 d. SUB BL, 69H 01001110 - 01101001 ------------ 1 11100101 e. RCR AX, 1 1 1000000001010111 OF = 1
Each set is a subset of [1..16] is represented by corresponding bits in the register (e.g., AX=1000000000000101 represents {1,3,16}).
Use the following table. Each entry contains a single bit. The index into the table selects which bit is set (e.g., the value at index zero has bit zero set).
BitTbl dw 1, 2, 4, 8 dw 10h, 20h, 40h, 80h dw 100h, 200h, 400h, 800h dw 1000h, 2000h, 4000h, 8000h a. Remove - removes the specified item from the set. BX contains a value in the range 0..15. AX contains a set. ADD BX, BX MOV BX, [BitTbl+BX] ; Get 1 in position specified by element XOR AX, BX ; remove from set b. Odd - Even sets the zero flag if AX contains a set of numbers that are all odd. AND AX, 5555H ; Mask (get rid) of odd bits, zero flag ; is set if there are no evens ADD AX, 0FFFFH ; create carry if AX was non-0 MOV AX, 0FFFFH ADC AX, 0 ; Get 0 only if AX was non-0 c. Member - Member clears the zero flag if BX is an element of the AX set, it sets the zero flag otherwise. BX contains a value in the range 0..15. AX contains a set. ADD BX, BX MOV BX, [BitTbl+BX] ; Get 1 in position specified by element AND BX, AX ; Mask to get only relevant bit (affects ZF) d. UnionSets - Union computes AX := AX union BX. AX and BX contain the sets. OR AX, BX e. Intersection - Intersection computes AX := AX intersect BX. AX and BX contain the sets. AND AX, BX f. Complement - Complement computes AX := - AX, that is, all elements in the set are removed, and all elements not in the set are added. NOT AX
The table is: BitTbl: .word 1, 2, 4, 8 .word 0x10, 0x20, 0x40, 0x80 .word 0x100, 0x200, 0x400, 0x800 .word 0x1000, 0x2000, 0x4000, 0x8000 .word 0x10000, 0x20000, 0x40000, 0x80000 .word 0x100000, 0x200000, 0x400000, 0x800000 .word 0x1000000, 0x2000000, 0x4000000, 0x8000000 .word 0x10000000, 0x20000000, 0x40000000, 0x80000000 a) Remove: la $t0, BitTbl ; Address of table to $t0 sll $t1, $a1, 2 ; Multiply $a1 by 4 add $t1, $t1, $t0 ; Compute address in table lw $t2, 0($t1) ; Get word from table and $v0, $a0, $a1 b) Odd: and $t0, $a0, 0x55555555 seq $v0, $zero, $t0 ; Set $v0 to 1 if result was 0 c) Member: la $t0, BitTbl ; Address of table to $t0 sll $t1, $a1, 2 ; Multiply $a1 by 4 add $t1, $t1, $t0 ; Compute address in table lw $t2, 0($t1) ; Get word from table and $v0, $t2, $a0 sne $v0, $zero, $v0 ; Set $v0 to 1 if $v0 not 0. d) Union: or $v0, $a0, $a1 e) Intersection: and $v0, $a0, $a1 f) Complement: not $v0, $a0
Initially, $t0 (which is the output value) is initialized to 0, and $t1 to 1. Now, the program loops as long as $t1 is less than $a0 (or n). Each time, the accumulator $t0 is incremented by $t1, which is then incremented by 2, in turn. Thus, the sequence of values of $t1 is: 1, 3, 5, 7, 9 and that of $t0 is 0, 1, 4, 9, 16 ...
To sum up, the code computes the square of n.
Note that the clock rate is irrelevant. Now, the answer depends on the semantics of the statement: "Any imporovment on the Graphics instructions worsen the Float instructions by 25% of the same improvement." which is ambiguous. Assuming we read this: improving Graphics by DECREASING the runtime of each instruction by s factor of x, we INCREASE the runtime of each Float instruction by 0.25x of its timing. In this case, the average time per instruction is LINEAR in x, and the result must be at some hard limit - in this case when Graphics instructions take ZERO time. Consider a typical time unit in the original processor. With the new processor the timing will be: 0.2 + 0.5 (1-x) + 0.3 (1+0.25)x. With x=1 (Graphic instructions infinitely fast) we have, the time for the same run will be 0.2+0+0.375 = 0.575, or saving 42.5% of the execution time. (Note that some people will call this: "74% faster"..., because 1/0.575=1.74...)
Again, clock rate is irrelevant. An average instruction in program Pa takes 3*0.35+4*0.4+6*0.25 = 4.15 cycles. An average instruction in program Pb takes 3*0.4+4*0.3+6*0.4 = 4.8 cycles. So, program Pa is faster.
We need to modify the hardware in 2 ways:
With the above changes, and assuming that there is already an entry in the DISPTCH 1 ROM for the jal instruction, we need to add just 1 micro instruction, for the third cycle of the instruction (i.e. fetch and decode are the same as all other istructions). This new microinstruction will have the fields:
Register control: RegDst = 2, MemtoReg = 2, RegWrite = 1, Sequencing = FETCH
Note that there are many other possible solutions as well.
addi $t0, $zero, 2 addi $t1, $zero, 0 loop: lw $t2, 50($t1) slt $t3, $t2, $s3 sw $t3, 300($t1) addi $t1, $t1, 1 bne $t1, $t0, loop
Address Instruction Mnemonic Assembler 0x00001000 0x20080002 addi $8, $0, 2 ; 1: addi $t0, $zero, 2 0x00001004 0x20090000 addi $9, $0, 0 ; 2: addi $t1, $zero, 0 0x00001008 0x8d2a0032 lw $10, 50($9) ; 3: lw $t2, 50($t1) 0x0000100c 0x0153582a slt $11, $10, $19 ; 4: slt $t3, $t2, $s3 0x00001010 0xad2b012c sw $11, 300($9) ; 5: sw $t3, 300($t1) 0x00001014 0x21290001 addi $9, $9, 1 ; 6: addi $t1, $t1, 1 0x00001018 0x15283ffc bne $9, $8, -16 [loop-0x1018] ; 7: bne $t1, $t0, loop
First, the individual instructions. addi should take the same as R-type, 4 cycles. Branch takes 3. sw takes 4, and lw takes 5. slt takes 4, as it is an R-type instruction. The loop is executed twice, as $t1 starts at 0 and gets incremented each pass, until it equals $t0, which is 2. Timing diagram is thus:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | addi | addi | lw | slt | sw | addi | bne | F D A R F D A R F D A M R F D A R F D A M F D A R F D J
And the entire string is repeated for 28 more cycles, requiring all in all 56 cycles to execute. Above, F stands for Fetch, D for Decode, A for ALU operation, M for memory access, and R for writing to register.
Sorry, this is hard to do in HTML, so come and ask...
f(0) = f(1) = 1; f(n) = f(n-1)+f(n-2)
Write a recursive procedure for 80X86 that gets n in EAX and returns f(n) in EAX, without destroying values in any other register. You may assume that f(n) will fit in 32 bits. CLUE: if your program is more than 20 instructions, you have done something wrong...
We will assume that the value in EAX is unsigned, and NOT TOO LARGE
(i.e. no overflow).
F: PROC NEAR
CMP EAX, 2
JC DONE
PUSH EBX ; General case. Save EBX (will be used as temp)
DEC EAX
MOV EBX, EAX ; Copy n-1 to EBX as temp
CALL F ; Computes f(n-1)
XCHG EAX, EBX ; Now EAX=n-1, EBX=f(n-1)
DEC EAX
CALL F ; Computes f(n-2)
ADD EAX, EBX ; Adds them
POP EBX ; Restore EBX and return
RET NEAR
DONE: MOV EAX, 1 ; Base case - EAX less than 2
RET NEAR ; I never count on defaults...
ENDP
Assuming that CX strictly greater than 0:
OutDev EQU 10H
MOV DX, 0 ; DX will keep checksum
MOV AH, 0 ; Use as high-order byte of AX operand later on...
L: MOV AL, [BX]
OUT OutDev, AL
ADD DX, AX ; Update checksum
INC BX
LOOP L
MOV AL, DL ; Now output checksum, least-significant first
OUT OutDev, AL
MOV AL, DH
OUT OutDev, AL
For memory mapped IO, if memory address of device register is OutDev,
replace all OUT instructions by
MOV BYTE PTR OutDev, AL
One bit error leads to a wrong checksum. Thus, distance is at least 2. But 2 "complementary" errors in different bytes in the same bit position do not change the checksum, and the distance is also AT MOST 2. Therefore, the distance of this code is exactly 2, i.e. it can be used to detect, but not fix, one error.
Due date: Thursday, January 31, 2002
Note: work and submission in this exercise is SOLO.