Fixing D0>

So here I am, the new maintainer of AmForth. I will do my very best.

Error Report

In August 2019 the mailing list received this simple error report.

From: Martin Nicholas via Amforth-devel
Subject: [Amforth] Missing DU<
...
Also, a bug in D0>:
Hmmm, something wrong here I feel:

> (ATmega2560)> decimal  1553994000. d0> .   1572137999. d0> .
> -1 0  ok

At the time I did not have a good idea, on how to handle this, let alone how to fix any error uncovered. But times are achanging. I decided to publish my path, attempts, insights, and decisions in the hope that interested folks can see it’s not rocket science at all. And I opted for English text in the hope to reach a larger audience. All errors in this text are mine, I’m afraid.

Reproducing the Error

When preparing for the annual (german speaking) Forth Tagung 2020 (see also Tagungen) — which was replaced by a video conference like so many others — I started to dig in. A few things were understood quickly:

  • There is an assembly version of d0>, which exhibits the bug.
  • There is a pure Forth version, which works correctly.
  • The sign of the lower word was apparently used to derive the answer, which seemed odd.

At this point I understood the assembly code only partly.

Adding Test Cases

One way to better understand misbehaviour comes through the addition of a useful set of test cases. In order to compare the Forth word with the assembly function, we add the Forth word under a different name:

\ file: lib/forth2012/double/d-greater-zero.frt
\ #require d-less-zero.frt
: d0< nip 0< ;
: d0>v1 ( d -- f )
  2dup or >r     \ not equal zero
  d0<  0= r> and \ and not less zero
  0= 0=          \ normalize to 0/-1 flag
;

There are simpler ways to declare d0>, however, we are not going to change two things at a time, do we? Thanks. Then we include the Hayes Tester

#include lib/forth2012/tester/tester-amforth.frt

The testcases I came up with, since we had already observed that the sign of the lower word did influence the result, are these:

TESTING d0>

t{         0. d0> ->  0 }t
t{         1. d0> -> -1 }t
t{     $7FFF. d0> -> -1 }t
t{     $8000. d0> -> -1 }t
t{     $8001. d0> -> -1 }t
t{    $10000. d0> -> -1 }t

t{ $00000000. d0> ->  0 }t
t{ $00000008. d0> -> -1 }t
t{ $00000080. d0> -> -1 }t
t{ $00000800. d0> -> -1 }t
t{ $00008000. d0> -> -1 }t
t{ $00080000. d0> -> -1 }t
t{ $00800000. d0> -> -1 }t
t{ $08000000. d0> -> -1 }t
t{ $80000000. d0> ->  0 }t

t{ $80000000. d0> ->  0 }t
t{ $80000008. d0> ->  0 }t
t{ $80000080. d0> ->  0 }t
t{ $80000800. d0> ->  0 }t
t{ $80008000. d0> ->  0 }t
t{ $80080000. d0> ->  0 }t
t{ $80800000. d0> ->  0 }t
t{ $88000000. d0> ->  0 }t

t{ $FFFFFFFF. d0> ->  0 }t
t{ $FFFF7FFF. d0> ->  0 }t

These testcases were repeated substituting d0> with d0>v1 or whatever word was going to be inspected. The result was as expected: Failed tests wherever the MostSignificantBit of both halfs of the double word argument were set.

\ somewhat edited for fewer lines
> ver
amforth 6.8 ATmega644P ok
> TESTING d0>                ok
> t{         0. d0> ->  0 }t ok
> t{         1. d0> -> -1 }t ok
> t{     $7FFF. d0> -> -1 }t ok
> t{     $8000. d0> -> -1 }t INCORRECT RESULT: t{     $8000. d0> -> -1 }t ok
> t{     $8001. d0> -> -1 }t INCORRECT RESULT: t{     $8001. d0> -> -1 }t ok
> t{    $10000. d0> -> -1 }t ok
> t{ $00000000. d0> ->  0 }t ok
> t{ $00000008. d0> -> -1 }t ok
> t{ $00000080. d0> -> -1 }t ok
> t{ $00000800. d0> -> -1 }t ok
> t{ $00008000. d0> -> -1 }t INCORRECT RESULT: t{ $00008000. d0> -> -1 }t ok
> t{ $00080000. d0> -> -1 }t ok
> t{ $00800000. d0> -> -1 }t ok
> t{ $08000000. d0> -> -1 }t ok
> t{ $80000000. d0> ->  0 }t INCORRECT RESULT: t{ $80000000. d0> ->  0 }t ok
> t{ $80000000. d0> ->  0 }t INCORRECT RESULT: t{ $80000000. d0> ->  0 }t ok
> t{ $80000008. d0> ->  0 }t INCORRECT RESULT: t{ $80000008. d0> ->  0 }t ok
> t{ $80000080. d0> ->  0 }t INCORRECT RESULT: t{ $80000080. d0> ->  0 }t ok
> t{ $80000800. d0> ->  0 }t INCORRECT RESULT: t{ $80000800. d0> ->  0 }t ok
> t{ $80008000. d0> ->  0 }t ok
> t{ $80080000. d0> ->  0 }t INCORRECT RESULT: t{ $80080000. d0> ->  0 }t ok
> t{ $80800000. d0> ->  0 }t INCORRECT RESULT: t{ $80800000. d0> ->  0 }t ok
> t{ $88000000. d0> ->  0 }t INCORRECT RESULT: t{ $88000000. d0> ->  0 }t ok
> t{ $FFFFFFFF. d0> ->  0 }t ok
> t{ $FFFF7FFF. d0> ->  0 }t INCORRECT RESULT: t{ $FFFF7FFF. d0> ->  0 }t ok
time:  9.46132898331  seconds

Adding a new Function and the Joys of rjmp

So I set out to add another assembly function d0>e0 to my AmForth-System, starting with a copy of d0>. I created a new file words/ew-d-greaterzero.asm and added its name to dict_appl.inc. The first round of error messages:

.../avr8\words/d-greaterzero.asm(4): error: Duplicate label: 'VE_DGREATERZERO'
.../avr8\words/d-greaterzero.asm(9): error: Duplicate label: 'XT_DGREATERZERO'
.../avr8\words/d-greaterzero.asm(11): error: Duplicate label: 'PFA_DGREATERZERO'

This is ok, because these labels are now used twice. So we rename them in the additional definition. The second round of error messages is a little more subtle:

words/ew-d-greaterzero.asm(17): error: Relative branch out of reach
words/ew-d-greaterzero.asm(18): error: Relative branch out of reach
words/ew-d-greaterzero.asm(19): error: Relative branch out of reach

Oh my! After staring at it for a bit it dawned on me, that the tail call optimization, i.e. rjmp PFA_ZERO1 did not work, because the new word was included too far away for the available address range of rjmp; it could not reach PFA_ZERO1 or PFA_TRUE1. I solved this by copying the relevant code and changing the labels. Including this function into the nrww-section did not work immediately, so I decided to copy the missing pieces.

VE_DGREATERZERO_E0:
    .dw $ff05
    .db "d0>e0",0
    .dw VE_HEAD
    .set VE_HEAD = VE_DGREATERZERO_E0
XT_DGREATERZERO_E0:
    .dw PFA_DGREATERZERO_E0
PFA_DGREATERZERO_E0:
    cp tosl, zerol
    cpc tosh, zeroh
    loadtos
    cpc tosl, zerol
    cpc tosh, zeroh
    brlt PFA_ZERO_EW1           ; test negative flag
    brbs 1, PFA_ZERO_EW1        ; test zero flag
    rjmp PFA_TRUE_EW1

;;; FALSE
PFA_ZERO_EW1:
    movw tosl, zerol
    jmp_ DO_NEXT

;;; TRUE
PFA_TRUE_EW1:
    ser tosl
    ser tosh
    jmp_ DO_NEXT

This code could be assembled and loaded. Test cases for d0>e0 did produce the same errors as the original d0> — so we were good to go.

Unveiling the Error

Reading the AVR Instruction Set Document did not immediately reveal, why things went wrong. It occured to me that maybe loading the lower half of the argument later was somehow producing an undesired effect. So I copied the most significant word into temporary registers temp0 and temp1, then called loadtos. Now all four bytes were available for inspection.

Then I did the comparison against zerol of all bytes, but in a different order: from least significant byte to most significant byte. This was a change from the original function!

VE_DGREATERZERO_E0:
    .dw $ff05
    .db "d0>e0",0
    .dw VE_HEAD
    .set VE_HEAD = VE_DGREATERZERO_E0
XT_DGREATERZERO_E0:
    .dw PFA_DGREATERZERO_E0
PFA_DGREATERZERO_E0:

    mov temp1, tosh             ; copy high word to temp space
    mov temp0, tosl
    loadtos                     ; load low word
    cp  tosl,  zerol            ; compare against zero, start from LSByte
    cpc tosh,  zeroh            ; . order is significant
    cpc temp0, zerol            ; . because we test "less than" (brlt)
    cpc temp1, zeroh            ; .

    brlt PFA_ZERO_EW1           ; if the MSBit of d:arg is set (negative), we are done (false).
    brbs 1, PFA_ZERO_EW1        ; if all 4 Bytes of d:arg are zero, we are done (false).

    rjmp PFA_TRUE_EW1           ; if we get this far, d:arg was positive! (true)

;;; FALSE
PFA_ZERO_EW1:
    movw tosl, zerol
    jmp_ DO_NEXT

;;; TRUE
PFA_TRUE_EW1:
    ser tosl
    ser tosh
    jmp_ DO_NEXT

And to my surprise and relief, this function passed all tests! But why?

Well, after some more staring it dawned on me. The original code did inspect the four bytes in the order word_H.l word_H.h word_L.l word_L.h. The last byte inspected would determine, whether the MSBit was set or not. If it was set, then the argument was negative, right? The last byte inspected originally was word_L.h — that explains the error.

Testing the zero flag does not depend on the order of inspection, but testing the less than flag does.

But can we do better?

Now we could commit this function and be done. However: copying the high word seems like a waste of cycles somehow, doesn’t it? Yes it does. If we just inspect word_H.h and see if that is negative, we are done already, right? Yes. So can’t we exit prematurely then? Of course, we can.

...
PFA_DGREATERZERO_E1:
        cp  tosh, zeroh
        brlt PFA_ZERO_EW1       ; if the MSBit of d:arg ist negative, we are done (false).
...

Well — the test cases produced funny results, of course. That is why they are repeatable with almost no effort! While we can certainly decide on the MSBit, we should clean up the stack before exiting.

...
PFA_DGREATERZERO_E1:
        cp  tosh, zeroh
        brlt PFA_DGREATERZERO_E2 ; if the MSBit of d:arg ist negative, we are done (false).
...

PFA_DGREATERZERO_E2:
        loadtos
        rjmp PFA_ZERO_EW1

This works, the new branch corresponds to drop 0.

But then Bernd came along and said: Why don’t you use zero nip instead? Well, yes I could indeed. In the end, I counted the instructions and decided for that.

...
PFA_DGREATERZERO_E2:
        movew tosl, zerol
        rjmp PFA_NIP_EW1

;;; NIP
PFA_NIP_EW1:
    adiw yl, 2
    jmp_ DO_NEXT

Fixing avr8/words/g-greaterzero.asm

So, what does d0> really need to do?

  1. If the highest bit of the double word argument on the stack is set, this number is negative and we are done with the result false. Well almost — we either need to drop zero or to zero nip to get the stack right.
  2. Else If all (four) bytes of the double word argument are zero, then the argument was zero, the answer is false and we are done.
  3. Else we have a positive argument and the result is true.

So the changed version looks like this now:

; ( d -- flag )
; Compare
; compares if a double double cell number is greater 0
VE_DGREATERZERO:
    .dw $ff03
    .db "d0>",0
    .dw VE_HEAD
    .set VE_HEAD = VE_DGREATERZERO
XT_DGREATERZERO:
    .dw PFA_DGREATERZERO
PFA_DGREATERZERO:
    cp tosh, zeroh
    brlt PFA_DGREATERZERO_FALSE ; if MSBit is set, d:arg is negative, we are done (false).
    cpc tosl, zerol
    loadtos
    cpc tosl, zerol
    cpc tosh, zeroh
    brbs 1, PFA_ZERO1           ; if all 4 Bytes of d:arg are zero, we are done (false).
    rjmp PFA_TRUE1              ; if we get this far, d:arg was positive! (true)
PFA_DGREATERZERO_FALSE:
    movw tosl, zerol            ; ZERO
    rjmp PFA_NIP                ; NIP

This roughly corresponds to a Forth version like this

: d0> ( d -- f )
    dup $8000 and if
        drop false nip     \ d is negative
    else
        0= swap 0= and if
            false          \ d is zero
        else
            true           \ d is positive
        then
    then
;

Epilogue

As usual: Afterwards, everything is obvious!

I would like to thank Martin Nicholas for reporting this, Tristan for adding a few observations, Bernd and Anton for helpful comments. This code is going to be the first commit on the AmForth repository as the new maintainer.