Fixing D0>¶
So here I am, the new maintainer of AmForth. I will do my very best.
Error Report¶
In August 2019 the mailing list received this simple error report.
From: Martin Nicholas via Amforth-devel
Subject: [Amforth] Missing DU<
...
Also, a bug in D0>:
Hmmm, something wrong here I feel:
> (ATmega2560)> decimal 1553994000. d0> . 1572137999. d0> .
> -1 0 ok
At the time I did not have a good idea, on how to handle this, let alone how to fix any error uncovered. But times are achanging. I decided to publish my path, attempts, insights, and decisions in the hope that interested folks can see it’s not rocket science at all. And I opted for English text in the hope to reach a larger audience. All errors in this text are mine, I’m afraid.
Reproducing the Error¶
When preparing for the annual (german speaking) Forth Tagung 2020 (see also Tagungen) — which was replaced by a video conference like so many others — I started to dig in. A few things were understood quickly:
- There is an assembly version of d0>, which exhibits the bug.
- There is a pure Forth version, which works correctly.
- The sign of the lower word was apparently used to derive the answer, which seemed odd.
At this point I understood the assembly code only partly.
Adding Test Cases¶
One way to better understand misbehaviour comes through the addition of a useful set of test cases. In order to compare the Forth word with the assembly function, we add the Forth word under a different name:
\ file: lib/forth2012/double/d-greater-zero.frt
\ #require d-less-zero.frt
: d0< nip 0< ;
: d0>v1 ( d -- f )
2dup or >r \ not equal zero
d0< 0= r> and \ and not less zero
0= 0= \ normalize to 0/-1 flag
;
There are simpler ways to declare d0>, however, we are not going to change two things at a time, do we? Thanks. Then we include the Hayes Tester
#include lib/forth2012/tester/tester-amforth.frt
The testcases I came up with, since we had already observed that the sign of the lower word did influence the result, are these:
TESTING d0>
t{ 0. d0> -> 0 }t
t{ 1. d0> -> -1 }t
t{ $7FFF. d0> -> -1 }t
t{ $8000. d0> -> -1 }t
t{ $8001. d0> -> -1 }t
t{ $10000. d0> -> -1 }t
t{ $00000000. d0> -> 0 }t
t{ $00000008. d0> -> -1 }t
t{ $00000080. d0> -> -1 }t
t{ $00000800. d0> -> -1 }t
t{ $00008000. d0> -> -1 }t
t{ $00080000. d0> -> -1 }t
t{ $00800000. d0> -> -1 }t
t{ $08000000. d0> -> -1 }t
t{ $80000000. d0> -> 0 }t
t{ $80000000. d0> -> 0 }t
t{ $80000008. d0> -> 0 }t
t{ $80000080. d0> -> 0 }t
t{ $80000800. d0> -> 0 }t
t{ $80008000. d0> -> 0 }t
t{ $80080000. d0> -> 0 }t
t{ $80800000. d0> -> 0 }t
t{ $88000000. d0> -> 0 }t
t{ $FFFFFFFF. d0> -> 0 }t
t{ $FFFF7FFF. d0> -> 0 }t
These testcases were repeated substituting d0> with d0>v1 or whatever word was going to be inspected. The result was as expected: Failed tests wherever the MostSignificantBit of both halfs of the double word argument were set.
\ somewhat edited for fewer lines
> ver
amforth 6.8 ATmega644P ok
> TESTING d0> ok
> t{ 0. d0> -> 0 }t ok
> t{ 1. d0> -> -1 }t ok
> t{ $7FFF. d0> -> -1 }t ok
> t{ $8000. d0> -> -1 }t INCORRECT RESULT: t{ $8000. d0> -> -1 }t ok
> t{ $8001. d0> -> -1 }t INCORRECT RESULT: t{ $8001. d0> -> -1 }t ok
> t{ $10000. d0> -> -1 }t ok
> t{ $00000000. d0> -> 0 }t ok
> t{ $00000008. d0> -> -1 }t ok
> t{ $00000080. d0> -> -1 }t ok
> t{ $00000800. d0> -> -1 }t ok
> t{ $00008000. d0> -> -1 }t INCORRECT RESULT: t{ $00008000. d0> -> -1 }t ok
> t{ $00080000. d0> -> -1 }t ok
> t{ $00800000. d0> -> -1 }t ok
> t{ $08000000. d0> -> -1 }t ok
> t{ $80000000. d0> -> 0 }t INCORRECT RESULT: t{ $80000000. d0> -> 0 }t ok
> t{ $80000000. d0> -> 0 }t INCORRECT RESULT: t{ $80000000. d0> -> 0 }t ok
> t{ $80000008. d0> -> 0 }t INCORRECT RESULT: t{ $80000008. d0> -> 0 }t ok
> t{ $80000080. d0> -> 0 }t INCORRECT RESULT: t{ $80000080. d0> -> 0 }t ok
> t{ $80000800. d0> -> 0 }t INCORRECT RESULT: t{ $80000800. d0> -> 0 }t ok
> t{ $80008000. d0> -> 0 }t ok
> t{ $80080000. d0> -> 0 }t INCORRECT RESULT: t{ $80080000. d0> -> 0 }t ok
> t{ $80800000. d0> -> 0 }t INCORRECT RESULT: t{ $80800000. d0> -> 0 }t ok
> t{ $88000000. d0> -> 0 }t INCORRECT RESULT: t{ $88000000. d0> -> 0 }t ok
> t{ $FFFFFFFF. d0> -> 0 }t ok
> t{ $FFFF7FFF. d0> -> 0 }t INCORRECT RESULT: t{ $FFFF7FFF. d0> -> 0 }t ok
time: 9.46132898331 seconds
Adding a new Function and the Joys of rjmp
¶
So I set out to add another assembly function d0>e0 to my
AmForth-System, starting with a copy of d0>. I created a
new file words/ew-d-greaterzero.asm
and added its name to
dict_appl.inc
. The first round of error messages:
.../avr8\words/d-greaterzero.asm(4): error: Duplicate label: 'VE_DGREATERZERO'
.../avr8\words/d-greaterzero.asm(9): error: Duplicate label: 'XT_DGREATERZERO'
.../avr8\words/d-greaterzero.asm(11): error: Duplicate label: 'PFA_DGREATERZERO'
This is ok, because these labels are now used twice. So we rename them in the additional definition. The second round of error messages is a little more subtle:
words/ew-d-greaterzero.asm(17): error: Relative branch out of reach
words/ew-d-greaterzero.asm(18): error: Relative branch out of reach
words/ew-d-greaterzero.asm(19): error: Relative branch out of reach
Oh my! After staring at it for a bit it dawned on me, that the tail
call optimization, i.e. rjmp PFA_ZERO1
did not work, because the
new word was included too far away for the available address range of
rjmp
; it could not reach PFA_ZERO1
or PFA_TRUE1
. I solved
this by copying the relevant code and changing the labels. Including
this function into the nrww
-section did not work immediately, so I
decided to copy the missing pieces.
VE_DGREATERZERO_E0:
.dw $ff05
.db "d0>e0",0
.dw VE_HEAD
.set VE_HEAD = VE_DGREATERZERO_E0
XT_DGREATERZERO_E0:
.dw PFA_DGREATERZERO_E0
PFA_DGREATERZERO_E0:
cp tosl, zerol
cpc tosh, zeroh
loadtos
cpc tosl, zerol
cpc tosh, zeroh
brlt PFA_ZERO_EW1 ; test negative flag
brbs 1, PFA_ZERO_EW1 ; test zero flag
rjmp PFA_TRUE_EW1
;;; FALSE
PFA_ZERO_EW1:
movw tosl, zerol
jmp_ DO_NEXT
;;; TRUE
PFA_TRUE_EW1:
ser tosl
ser tosh
jmp_ DO_NEXT
This code could be assembled and loaded. Test cases for d0>e0 did produce the same errors as the original d0> — so we were good to go.
Unveiling the Error¶
Reading the AVR Instruction Set Document did not immediately reveal,
why things went wrong. It occured to me that maybe loading the lower
half of the argument later was somehow producing an undesired effect.
So I copied the most significant word into temporary registers
temp0
and temp1
, then called loadtos
. Now all four bytes
were available for inspection.
Then I did the comparison against zerol
of all bytes, but in a
different order: from least significant byte to most significant byte.
This was a change from the original function!
VE_DGREATERZERO_E0:
.dw $ff05
.db "d0>e0",0
.dw VE_HEAD
.set VE_HEAD = VE_DGREATERZERO_E0
XT_DGREATERZERO_E0:
.dw PFA_DGREATERZERO_E0
PFA_DGREATERZERO_E0:
mov temp1, tosh ; copy high word to temp space
mov temp0, tosl
loadtos ; load low word
cp tosl, zerol ; compare against zero, start from LSByte
cpc tosh, zeroh ; . order is significant
cpc temp0, zerol ; . because we test "less than" (brlt)
cpc temp1, zeroh ; .
brlt PFA_ZERO_EW1 ; if the MSBit of d:arg is set (negative), we are done (false).
brbs 1, PFA_ZERO_EW1 ; if all 4 Bytes of d:arg are zero, we are done (false).
rjmp PFA_TRUE_EW1 ; if we get this far, d:arg was positive! (true)
;;; FALSE
PFA_ZERO_EW1:
movw tosl, zerol
jmp_ DO_NEXT
;;; TRUE
PFA_TRUE_EW1:
ser tosl
ser tosh
jmp_ DO_NEXT
And to my surprise and relief, this function passed all tests! But why?
Well, after some more staring it dawned on me. The original code did
inspect the four bytes in the order word_H.l word_H.h word_L.l
word_L.h
. The last byte inspected would determine, whether the MSBit
was set or not. If it was set, then the argument was negative, right?
The last byte inspected originally was word_L.h
— that explains
the error.
Testing the zero flag
does not depend on the order of inspection,
but testing the less than flag
does.
But can we do better?¶
Now we could commit this function and be done. However: copying the
high word seems like a waste of cycles somehow, doesn’t it? Yes it
does. If we just inspect word_H.h
and see if that is negative, we
are done already, right? Yes. So can’t we exit prematurely then? Of
course, we can.
...
PFA_DGREATERZERO_E1:
cp tosh, zeroh
brlt PFA_ZERO_EW1 ; if the MSBit of d:arg ist negative, we are done (false).
...
Well — the test cases produced funny results, of course. That is why they are repeatable with almost no effort! While we can certainly decide on the MSBit, we should clean up the stack before exiting.
...
PFA_DGREATERZERO_E1:
cp tosh, zeroh
brlt PFA_DGREATERZERO_E2 ; if the MSBit of d:arg ist negative, we are done (false).
...
PFA_DGREATERZERO_E2:
loadtos
rjmp PFA_ZERO_EW1
This works, the new branch corresponds to drop 0
.
But then Bernd came along and said: Why don’t you use zero nip
instead? Well, yes I could indeed. In the end, I counted the
instructions and decided for that.
...
PFA_DGREATERZERO_E2:
movew tosl, zerol
rjmp PFA_NIP_EW1
;;; NIP
PFA_NIP_EW1:
adiw yl, 2
jmp_ DO_NEXT
Fixing avr8/words/g-greaterzero.asm
¶
So, what does d0> really need to do?
- If the highest bit of the double word argument on the stack is
set, this number is negative and we are done with the result
false
. Well almost — we either need todrop zero
or tozero nip
to get the stack right. - Else If all (four) bytes of the double word argument are zero,
then the argument was zero, the answer is
false
and we are done. - Else we have a positive argument and the result is
true
.
So the changed version looks like this now:
; ( d -- flag )
; Compare
; compares if a double double cell number is greater 0
VE_DGREATERZERO:
.dw $ff03
.db "d0>",0
.dw VE_HEAD
.set VE_HEAD = VE_DGREATERZERO
XT_DGREATERZERO:
.dw PFA_DGREATERZERO
PFA_DGREATERZERO:
cp tosh, zeroh
brlt PFA_DGREATERZERO_FALSE ; if MSBit is set, d:arg is negative, we are done (false).
cpc tosl, zerol
loadtos
cpc tosl, zerol
cpc tosh, zeroh
brbs 1, PFA_ZERO1 ; if all 4 Bytes of d:arg are zero, we are done (false).
rjmp PFA_TRUE1 ; if we get this far, d:arg was positive! (true)
PFA_DGREATERZERO_FALSE:
movw tosl, zerol ; ZERO
rjmp PFA_NIP ; NIP
This roughly corresponds to a Forth version like this
: d0> ( d -- f )
dup $8000 and if
drop false nip \ d is negative
else
0= swap 0= and if
false \ d is zero
else
true \ d is positive
then
then
;
Epilogue¶
As usual: Afterwards, everything is obvious!
I would like to thank Martin Nicholas for reporting this, Tristan for adding a few observations, Bernd and Anton for helpful comments. This code is going to be the first commit on the AmForth repository as the new maintainer.