.. _NewMaintainer_fixing-dgreaterzero: Fixing :command:`D0>` ===================== So here I am, the new maintainer of AmForth. I will do my very best. Error Report ............ In August 2019 the mailing list received this simple error report. :: From: Martin Nicholas via Amforth-devel Subject: [Amforth] Missing DU< ... Also, a bug in D0>: Hmmm, something wrong here I feel: > (ATmega2560)> decimal 1553994000. d0> . 1572137999. d0> . > -1 0 ok At the time I did not have a good idea, on how to handle this, let alone how to fix any error uncovered. But times are achanging. I decided to publish my path, attempts, insights, and decisions in the hope that interested folks can see *it's not rocket science at all*. And I opted for English text in the hope to reach a larger audience. All errors in this text are mine, I'm afraid. Reproducing the Error ..................... When preparing for the annual (german speaking) `Forth Tagung 2020 <http://wiki.forth-ev.de/doku.php>`__ (see also `Tagungen <http://wiki.forth-ev.de/doku.php/events:start>`__) --- which was replaced by a video conference like so many others --- I started to dig in. A few things were understood quickly: * There is an assembly version of :command:`d0>`, which exhibits the bug. * There is a pure Forth version, which works correctly. * The sign of the lower word was apparently used to derive the answer, which seemed odd. At this point I understood the assembly code only partly. Adding Test Cases ................. One way to better understand misbehaviour comes through the addition of a useful set of test cases. In order to compare the Forth word with the assembly function, we add the Forth word under a different name: .. code-block:: forth \ file: lib/forth2012/double/d-greater-zero.frt \ #require d-less-zero.frt : d0< nip 0< ; : d0>v1 ( d -- f ) 2dup or >r \ not equal zero d0< 0= r> and \ and not less zero 0= 0= \ normalize to 0/-1 flag ; There are simpler ways to declare :command:`d0>`, however, we are not going to change two things at a time, do we? Thanks. Then we include the Hayes Tester .. code-block:: forth #include lib/forth2012/tester/tester-amforth.frt The testcases I came up with, since we had already observed that the sign of the lower word did influence the result, are these: .. code-block:: forth TESTING d0> t{ 0. d0> -> 0 }t t{ 1. d0> -> -1 }t t{ $7FFF. d0> -> -1 }t t{ $8000. d0> -> -1 }t t{ $8001. d0> -> -1 }t t{ $10000. d0> -> -1 }t t{ $00000000. d0> -> 0 }t t{ $00000008. d0> -> -1 }t t{ $00000080. d0> -> -1 }t t{ $00000800. d0> -> -1 }t t{ $00008000. d0> -> -1 }t t{ $00080000. d0> -> -1 }t t{ $00800000. d0> -> -1 }t t{ $08000000. d0> -> -1 }t t{ $80000000. d0> -> 0 }t t{ $80000000. d0> -> 0 }t t{ $80000008. d0> -> 0 }t t{ $80000080. d0> -> 0 }t t{ $80000800. d0> -> 0 }t t{ $80008000. d0> -> 0 }t t{ $80080000. d0> -> 0 }t t{ $80800000. d0> -> 0 }t t{ $88000000. d0> -> 0 }t t{ $FFFFFFFF. d0> -> 0 }t t{ $FFFF7FFF. d0> -> 0 }t These testcases were repeated substituting :command:`d0>` with :command:`d0>v1` or whatever word was going to be inspected. The result was as expected: Failed tests wherever the MostSignificantBit of both halfs of the double word argument were set. .. code-block:: none \ somewhat edited for fewer lines > ver amforth 6.8 ATmega644P ok > TESTING d0> ok > t{ 0. d0> -> 0 }t ok > t{ 1. d0> -> -1 }t ok > t{ $7FFF. d0> -> -1 }t ok > t{ $8000. d0> -> -1 }t INCORRECT RESULT: t{ $8000. d0> -> -1 }t ok > t{ $8001. d0> -> -1 }t INCORRECT RESULT: t{ $8001. d0> -> -1 }t ok > t{ $10000. d0> -> -1 }t ok > t{ $00000000. d0> -> 0 }t ok > t{ $00000008. d0> -> -1 }t ok > t{ $00000080. d0> -> -1 }t ok > t{ $00000800. d0> -> -1 }t ok > t{ $00008000. d0> -> -1 }t INCORRECT RESULT: t{ $00008000. d0> -> -1 }t ok > t{ $00080000. d0> -> -1 }t ok > t{ $00800000. d0> -> -1 }t ok > t{ $08000000. d0> -> -1 }t ok > t{ $80000000. d0> -> 0 }t INCORRECT RESULT: t{ $80000000. d0> -> 0 }t ok > t{ $80000000. d0> -> 0 }t INCORRECT RESULT: t{ $80000000. d0> -> 0 }t ok > t{ $80000008. d0> -> 0 }t INCORRECT RESULT: t{ $80000008. d0> -> 0 }t ok > t{ $80000080. d0> -> 0 }t INCORRECT RESULT: t{ $80000080. d0> -> 0 }t ok > t{ $80000800. d0> -> 0 }t INCORRECT RESULT: t{ $80000800. d0> -> 0 }t ok > t{ $80008000. d0> -> 0 }t ok > t{ $80080000. d0> -> 0 }t INCORRECT RESULT: t{ $80080000. d0> -> 0 }t ok > t{ $80800000. d0> -> 0 }t INCORRECT RESULT: t{ $80800000. d0> -> 0 }t ok > t{ $88000000. d0> -> 0 }t INCORRECT RESULT: t{ $88000000. d0> -> 0 }t ok > t{ $FFFFFFFF. d0> -> 0 }t ok > t{ $FFFF7FFF. d0> -> 0 }t INCORRECT RESULT: t{ $FFFF7FFF. d0> -> 0 }t ok time: 9.46132898331 seconds Adding a new Function and the Joys of ``rjmp`` .............................................. So I set out to add another assembly function :command:`d0>e0` to my AmForth-System, starting with a copy of :command:`d0>`. I created a new file ``words/ew-d-greaterzero.asm`` and added its name to ``dict_appl.inc``. The first round of error messages: .. code-block:: none .../avr8\words/d-greaterzero.asm(4): error: Duplicate label: 'VE_DGREATERZERO' .../avr8\words/d-greaterzero.asm(9): error: Duplicate label: 'XT_DGREATERZERO' .../avr8\words/d-greaterzero.asm(11): error: Duplicate label: 'PFA_DGREATERZERO' This is ok, because these labels are now used twice. So we rename them in the additional definition. The second round of error messages is a little more subtle: .. code-block:: none words/ew-d-greaterzero.asm(17): error: Relative branch out of reach words/ew-d-greaterzero.asm(18): error: Relative branch out of reach words/ew-d-greaterzero.asm(19): error: Relative branch out of reach Oh my! After staring at it for a bit it dawned on me, that the *tail call optimization*, i.e. ``rjmp PFA_ZERO1`` did not work, because the new word was included too far away for the available address range of ``rjmp``; it could not reach ``PFA_ZERO1`` or ``PFA_TRUE1``. I solved this by copying the relevant code and changing the labels. Including this function into the ``nrww``-section did not work immediately, so I decided to copy the missing pieces. .. code-block:: asm VE_DGREATERZERO_E0: .dw $ff05 .db "d0>e0",0 .dw VE_HEAD .set VE_HEAD = VE_DGREATERZERO_E0 XT_DGREATERZERO_E0: .dw PFA_DGREATERZERO_E0 PFA_DGREATERZERO_E0: cp tosl, zerol cpc tosh, zeroh loadtos cpc tosl, zerol cpc tosh, zeroh brlt PFA_ZERO_EW1 ; test negative flag brbs 1, PFA_ZERO_EW1 ; test zero flag rjmp PFA_TRUE_EW1 ;;; FALSE PFA_ZERO_EW1: movw tosl, zerol jmp_ DO_NEXT ;;; TRUE PFA_TRUE_EW1: ser tosl ser tosh jmp_ DO_NEXT This code could be assembled and loaded. Test cases for :command:`d0>e0` did produce the same errors as the original :command:`d0>` --- so we were good to go. Unveiling the Error ................... Reading the AVR Instruction Set Document did not immediately reveal, why things went wrong. It occured to me that maybe loading the lower half of the argument later was somehow producing an undesired effect. So I copied the most significant word into temporary registers ``temp0`` and ``temp1``, then called ``loadtos``. Now all four bytes were available for inspection. Then I did the comparison against ``zerol`` of all bytes, but in a different order: from least significant byte to most significant byte. This was a change from the original function! .. code-block:: asm VE_DGREATERZERO_E0: .dw $ff05 .db "d0>e0",0 .dw VE_HEAD .set VE_HEAD = VE_DGREATERZERO_E0 XT_DGREATERZERO_E0: .dw PFA_DGREATERZERO_E0 PFA_DGREATERZERO_E0: mov temp1, tosh ; copy high word to temp space mov temp0, tosl loadtos ; load low word cp tosl, zerol ; compare against zero, start from LSByte cpc tosh, zeroh ; . order is significant cpc temp0, zerol ; . because we test "less than" (brlt) cpc temp1, zeroh ; . brlt PFA_ZERO_EW1 ; if the MSBit of d:arg is set (negative), we are done (false). brbs 1, PFA_ZERO_EW1 ; if all 4 Bytes of d:arg are zero, we are done (false). rjmp PFA_TRUE_EW1 ; if we get this far, d:arg was positive! (true) ;;; FALSE PFA_ZERO_EW1: movw tosl, zerol jmp_ DO_NEXT ;;; TRUE PFA_TRUE_EW1: ser tosl ser tosh jmp_ DO_NEXT And to my surprise and relief, this function passed all tests! But why? Well, after some more staring it dawned on me. The original code did inspect the four bytes in the order ``word_H.l word_H.h word_L.l word_L.h``. The last byte inspected would determine, whether the MSBit was set or not. If it was set, then the argument was negative, right? The last byte inspected originally was ``word_L.h`` --- that explains the error. Testing the ``zero flag`` does not depend on the order of inspection, but testing the ``less than flag`` does. But can we do better? ..................... Now we could commit this function and be done. However: copying the high word seems like a waste of cycles somehow, doesn't it? Yes it does. If we just inspect ``word_H.h`` and see if that is negative, we are done already, right? Yes. So can't we exit prematurely then? Of course, we can. .. code-block:: asm ... PFA_DGREATERZERO_E1: cp tosh, zeroh brlt PFA_ZERO_EW1 ; if the MSBit of d:arg ist negative, we are done (false). ... Well --- the test cases produced funny results, of course. That is why they are repeatable with almost no effort! While we can certainly decide on the MSBit, we should clean up the stack before exiting. .. code-block:: asm ... PFA_DGREATERZERO_E1: cp tosh, zeroh brlt PFA_DGREATERZERO_E2 ; if the MSBit of d:arg ist negative, we are done (false). ... PFA_DGREATERZERO_E2: loadtos rjmp PFA_ZERO_EW1 This works, the new branch corresponds to ``drop 0``. But then Bernd came along and said: Why don't you use ``zero nip`` instead? Well, yes I could indeed. In the end, I counted the instructions and decided for that. .. code-block:: asm ... PFA_DGREATERZERO_E2: movew tosl, zerol rjmp PFA_NIP_EW1 ;;; NIP PFA_NIP_EW1: adiw yl, 2 jmp_ DO_NEXT Fixing ``avr8/words/g-greaterzero.asm`` ....................................... So, what does :command:`d0>` really need to do? #. **If** the highest bit of the double word argument on the stack is set, this number is negative and we are done with the result ``false``. Well almost --- we either need to ``drop zero`` or to ``zero nip`` to get the stack right. #. **Else If** all (four) bytes of the double word argument are zero, then the argument was zero, the answer is ``false`` and we are done. #. **Else** we have a positive argument and the result is ``true``. So the changed version looks like this now: .. code-block:: asm ; ( d -- flag ) ; Compare ; compares if a double double cell number is greater 0 VE_DGREATERZERO: .dw $ff03 .db "d0>",0 .dw VE_HEAD .set VE_HEAD = VE_DGREATERZERO XT_DGREATERZERO: .dw PFA_DGREATERZERO PFA_DGREATERZERO: cp tosh, zeroh brlt PFA_DGREATERZERO_FALSE ; if MSBit is set, d:arg is negative, we are done (false). cpc tosl, zerol loadtos cpc tosl, zerol cpc tosh, zeroh brbs 1, PFA_ZERO1 ; if all 4 Bytes of d:arg are zero, we are done (false). rjmp PFA_TRUE1 ; if we get this far, d:arg was positive! (true) PFA_DGREATERZERO_FALSE: movw tosl, zerol ; ZERO rjmp PFA_NIP ; NIP This roughly corresponds to a Forth version like this .. code-block:: forth : d0> ( d -- f ) dup $8000 and if drop false nip \ d is negative else 0= swap 0= and if false \ d is zero else true \ d is positive then then ; Epilogue ........ As usual: *Afterwards, everything is obvious!* I would like to thank Martin Nicholas for reporting this, Tristan for adding a few observations, Bernd and Anton for helpful comments. This code is going to be the first commit on the AmForth repository as the new maintainer.