Architecture

Overview

amforth is an indirect threading Forth. The dictionary is on a memory that allows execution of code. The RAM contains buffers, variables and the stacks. Depending on the platform, either the EEPROM or a special section of the flash is used for vital data such as pointers or configuration settings.

The compiler is a classic compiler without any optimization support.

Amforth uses all of the CPU registers internally: The data stack pointer, the instruction pointer, the user pointer, and the Top-Of-Stack cell. The hardware stack is used as the return stack. Some registers are used for temporary data in primitives.

Core System

Threading Model

AmForth implements the classic indirect threaded variant of forth. The registers and their mappings are shown in table Register Mapping.

Inner Interpreter

For the indirect threading model an inner interpreter is needed. The inner interpreter does the interrupt handling too. It repeatedly reads the cell, the IP points to, takes this number as the address for the next code segment and jumps to that code. It is expected that this code segment does a jump back to the inner interpreter (NEXT). The IP is incremented by 1 just before the jumps are done to get the next cell.

Check_Interrupt
W  <- [IP]   ; read at IP
IP <- IP+1   ; advance IP
X  <- [W]    ; EXECUTE phase, W points to execution token
JMP [X]      ; read execution token and execute its code

EXECUTE

This operation is the JUMP. It reads the content of the cell the W register points to. The result is stored in a scratch pad register The data in it is the address of the machine code to be executed in the last step. This step is used by the forth command EXECUTE too. The forth command does not get the address of the next destination from the current IP but from the data stack.

This last step finally jumps to the machine code pointed to by the scratch pad register.

DO COLON

DO COLON (aka NEST) is the subroutine call. It pushes the IP onto the return stack. It then increments W by one cell, so that it points to the body of the (colon) word, and sets IP to that value. Then it continues with NEXT, which begins executing the words in the body of the (parent) colon word. Note that W points to the execution token of the current word, so the next cell points to the parameter field (body) of the forth word.

push IP
IP <- W+1
JMP NEXT

EXIT

The code for EXIT (aka UNNEST) is the return from a subroutine. It is defined in the forth word EXIT in the dictionary. It reads the IP from the return stack and jumps to NEXT.

pop IP
JMP NEXT

Text Interpreter

The interpreter is a line based command interpreter. It based upon :REFILL to acquire the next line of characters, located at a position SOURCE points to. While processing the line, the pointer >IN is adjusted accordingly. Both words REFILL and SOURCE are USER based deferred words which allows to use any input source on a thread specific level. The interpreter itself does not use any static buffers or variables (>IN is a USER variable as well).

A given string is handled by INTERPRET which splits it into whitespace delimited words. Every word is processed using a list of recognizers. Processing ends either when the string end is reached or an exception occurs.

SOURCE and REFILL

SOURCE provides an addr/len string pair that does not change during processing. The task of REFILL is to fill the string buffer, SOURCE points to when finished.

There is one default input source: The terminal input buffer. This buffer gets filled with REFILL-TIB that reads from the serial input buffers (KEY). SOURCE points to the Terminal Input Buffer itself. Another input source are plain strings, used by EVALUATE.

Recognizer

Recognizer are a part of the text (command) interpreter. They are responsible for analyzing a single word. The result consists of two elements: The actual data (if any) and an object like identifier connected with certain methods.

digraph Recognizer {
"Interpret" -> "Get Next Word"
"Get Next Word" -> "Recognize" [label="Got one"]
"Get Next Word" -> "End" [label="No More Words"];
"Recognize" -> "Check State"
"Check State" -> "Compile" [label="Compile"];
"Check State" -> "Execute" [label="Interpret"];
"Compile" -> "Get Next Word"
"Execute" -> "Get Next Word"
}

The Forth text interpreter reads from the input source and splits it into whitespace delimited words. Each word is fed into a list of actions which parse it. If the parsing is successful (e.g. it is a number or a word from the dictionary) the recognizer leaves the data and an method table to deal with it. Depending on the interpreter state one of the methods is executed to finally process the data. The first method is called in interpreter state. It is usually a noop, since the recognizer has done all the work already.

The 2nd method is responsible to perform the compile time semantics. That usually means to write it into the dictioanary or to execute immediate words.

The third method is used by :command`postpone` to compile the compilation semantics. It honors the immediate flags as well.

Recognize is an iteration over a recognizer stack until the first parsing methods returns something different than DT:NULL. If the recognizer stack is exhausted without a match, the DT:NULL return value is generated. The string location that is passed to the parse actions is preserved and is restored for every iteration cycle.

digraph Recognize {
"Get Recognizer Stack" -> "Rec-Stack Exhausted?"
"Rec-Stack Exhausted?" -> "RECTYPE-NULL"  [label="Yes"]
"Rec-Stack Exhausted?" -> "Call Parse Action"  [label="Consume Rec-TOS"]
"Call Parse Action" -> "Rec-Stack Exhausted?" [label="RECTYPE-NULL"]
"Call Parse Action" -> "End" [label="Success"]
"RECTYPE-NULL" -> "End"
}

A recognizer consists of a few words that work together. To ease maintenance, a naming convention is used: The recognizer itself is named with the prefix rec-. The method table name gets the prefix rectype- followed by the same name as the recognizer.

POSTPONE serialises the parsed data as literals and adds the compile action from the method table. This an almost generic operation, it depends only on the number of cells from the parsing actions.

Recognizer Stack

The interpreter uses a default stack of recognizers. It is managed with the words get-recognizers and set-recognizers.

The entries in the list are called in order until the first one returns a different result but RECTYPE-NULL. If the list is exhausted and no one succeeds, the RECTYPE-NULL is delivered nevertheless and leads to the error reactions.

The standard recognizer stack is defined as follows

: default-recs
  ['] rec:intnum ['] rec:word
  2 forth-recognizer set-recognizers
;

The standard word marker resets the recognizer list as well.

INTERPRET

The interpreter is responsible to split the source into words and to call the recognizers. It also maintains the state.

: interpret
  begin
    parse-name ?dup if drop exit then
    forth-recognizer recognize ( addr len -- i*x r:table )
    state @ if i-cell+ then \ get compile time action
    @i execute ?stack
  again
;

recognize always returns a valid method table. If no recognizer succeeds, the RECTYPE-NULL is returned with the addr/len of the unknown-to-handle word.

API

Every recognizer has a method table for the interpreter to handle the data and a word to check (and convert) whether a string matches the criteria for a certain data type.

\ order is important!
:noname ... ;  \ interpret action
:noname ... ;  \ compile action
:noname ... ;  \ postpone action
rectype: rectype-foo

: rec:foo ( addr len -- i*x rectype-foo | RECTYPE-NULL ) ... ;

The word rec-foo is the actual parsing action of the recognizer. It analyzes the string it gets. There are two results possible: Either the word is recognized and the address of the data token is returned or the NULL data token is used which is actually a predefined method table named RECTYPE-NULL.

The calling parameters to rec-foo are the address and the length of a word in RAM. The recognizer must not change it. The result (i*x) is the parsed and converted data and the method table to deal with it.

There is a standard method table that does not require additional data (i*x is empty) and which is used to communicate the “not-recognized” information: RECTYPE-NULL. Its method table entries throw the exception -13 if called.

Other pre-defined method tables are rectype-num to deal with single cell numeric data, rectype-dnum to work with double cell numerics and rectype-xt to execute, compile and postpone execution tokens XT from the dictionary.

The words in the method tables get the output of the recognizer as input on the data stack. They are excpected to consume them during their work.

Default (NULL)

This is a special system level recognizer. It is never called actually but its data token (RECTYPE-NULL) is used as both a error flag and for the final error actions. Its methods get the addr/len of a single word. They consume it by printing the string and throwing an exception when called. The effect is to get back to the command prompt if catched inside the quit loop.

:noname type -13 throw ; dup dup
rectype: RECTYPE-NULL

\ this definition is never called actually
: rec-null ( addr len -- rectype-null)
  2drop rectype-null
;

NUMBER

The number recognizer identifies numeric data in both single and double precision. Depending on the actual data width, two different data tokens are returned.

The postpone action follows the standard definitions by not allowing to postpone numbers. Instead the number is printed and an exception is thrown.

' noop
' literal
:noname . -48 throw ; \ subject to disput
recognizer: rectype-num

' noop
' 2literal
:noname d. -48 throw ; \ subject to dispute
recognizer: rectype-dnum

: rec:intnum ( addr len -- n rectype-num | d rectype-dnum | rectype-null )
  number if
   1 = if rectype-num else rectype-dnum then
  else
    rectype-null
  then
;

FIND

This recognizer tries to find the word in the dictionary. If sucessful, the execution token and the flags are returned. The data token contains words to execute and correctly deal with immediate words for compiling and postponing.

( XT flags -- )
:noname drop execute ;
:noname 0> if compile, else execute then ;
:noname 0> if postpone [compile] then , ;
recognizer: rectype-xt

: rec-find ( addr len -- XT flags rectype-xt | rectype-null )
  find-name ?dup if
    rectype-xt
  else
    rectype-null
  then
;

Multitasking

amforth does not implement multitasking directly. It provides the basic functionality however. Within IO words the deferred word PAUSE is called whenever possible. This word is initialized to do nothing (NOOP).

Exceptions

Amforth uses and supports exceptions as specified in the ANS wordset. It provides the CATCH and THROW commands. The outermost catch frame is located at the interpreter level in the word QUIT. If an exception with a negative value is catched, QUIT will print a message with this number and and re-start itself. Positive values silently restart QUIT.

The next table lists the exceptions, amforth uses itself.

Exception Meaning Thrown in
-1 silent abort ABORT
-2 abort with message ABORT”
-4 stack underflow ?STACK
-13 undefined word rec-notfound, tick
-16 Invalid word (create)
-50 search order exhausted SET-ORDER

Memory Allocation

The ANS 94 standard defines three major data regions: name space, code space and data space. The amforth system architecture maps these memory types to the built-in ones: Flash, RAM and (if available) EEPROM. These three memory types have their own address space independently from the others. Amforth does not unify these address spaces into one.

Amforth uses the flash memory as the location for all standard data spaces: name, code and data space. Contrary to the standard some words that should operate on the data space use RAM adresses instead. These words are HERE, @ (fetch), ! (store) and simimliar. Similiarly the so called transient regions are in RAM as well.

Other words like , (comma) operate on the flash address and thus directly in the dictionary.

User Area

The User Area is a special RAM storage area. It contains the USER variables and the User deferred definitions. Access is based upon the value of the user pointer UP. It can be changed with the word UP! and read with UP@ . The UP itself is stored in a register pair.

The size of the user area is determined by the size the system itself uses plus a configurable number at compile time. For self defined tasks this user supplied number can be changed for task local variables.

The first USER area is located at the first data address (usually RAMSTART).

Address offset (bytes) Purpose
0 Multitasker Status
2 Multitasker Follower
4 RP0
6 SP0
8 SP (used by multitasker)
10 HANDLER (exception handling)
12 BASE (number conversion)

The AVR8 and MSP430 support deferred words based in the USER area.

14 EMIT (deferred)
16 EMIT? (deferred)
18 KEY (deferred)
20 KEY? (deferred)
22 SOURCE (deferred)
24 >IN
26 REFILL (deferred)

The command line prompt can be changed with the following defers. More information is in the recipe Prompts

28 .OK (deferred)
30 .ERROR (deferred)
32 .READY (deferred)
34 .INPUT (deferred)

The User Area is used to provide task local information. Without an active multitasker it contains the starting values for the stackpointers, the deferred words for terminal IO, the BASE variable and the exception handler.

The multitasker uses the first 2 cells to store the status and the link to the next entry in the task list. In that situation the user area is/can be seen as the task control block.

The size available to application programs is determined at compile time. This size is set to 0 initially, can be changed in the application master file.