Recognizer¶
The Forth text interpreter is able to work with numbers and command words. Its main purpose is to transform the text representation into a format closer to the system level and deal with them. Numbers are converted to their binary form for the data stack, command words are found in the dictionary and are further dealt with their execution tokens (and header flags).
In standard Forth there is no easy way to add new data types to the text
interpreter and to associate actions with them for the different interpreter
states. For example there are no native string literals. They are mimicked
by using a command word (s"
).
A recognizer fills this gap. It consists of two major parts: A word which does the parsing and converting. And a group of three methods for dealing with the data, the parsing word produces. These methods are used in interpret and compile state, and to postpone the data in colon definitions.
Amforth has recognizers for dealing with numbers and words from the dictionary
built-in. To create and manage more recognizers, the generic words
get/set-order
are used.
The word rectype:
takes three execution tokens and defines the method table.
The word to parse the input stream takes a string as input and leaves either
the method table rectype-null
(and no further data) or some data together with the
method table defined with rectype:
. The interpreter takes care of the rest.
It is possible to modify >in
inside the parsing word if the data contains
whitespace. Debugging such words can be tricky however.
String Literals¶
A string is delimited by two "
symbols. The first one starts
the string and the next one is the end of it. Everything in between
is the string content. A string is denoted by its start address and
its length. When compiling, the string needs to copied to the
dictionary together with a runtime action.
Since a string can contain whitespace, the parsing word needs to deal
with >in
. The string address and length is valid for the lifetime
of the SOURCE buffer only, a refill
will change the content.
' noop
' sliteral
:noname type -48 throw ;
rectype: rectype-string
: rec-string ( addr len -- )
over c@ [char] " <> if 2drop rectype-null exit then
negate 1+ >in +! drop \ expand parse area
[char] " parse \ get trailing delimiter
-1 /string \ remove limiter
rectype-string
;
' rec-string forth-recognizer get-stack 1+ forth-recognizer set-stack
The first line is simply the method table definition. The first two methods are already defined in amforth so nothing special here. The third method is called when the data is beeing postponed. For now, a string cannot be postponed, which would essentially lead to a string copy from the defining word to the new one. Instead an exception -48 is thrown.
The rec-string definition is more complex. The first line
over c@ [char] " <> if 2drop rectype-null exit then
is the check whether the current word start with a "
character.
If it does not, the two arguments are dropped and the special
method table rectype-null
is returned.
If the first character is a "
the main task is to find the delimiting
next "
. Since the >in
needs to be set to the location of this
character as well, we use the word parse
which does this work for us.
negate 1+ >in +! drop \ reset parse area to SOURCE
This line re-adjusts the parsing area to the beginning of the word inside SOURCE. The code
[char] " parse \ get trailing delimiter
scans the whole input for the delimiting "
and returns it. Finally some address
cosmetics has to be done to include the very first character as well.
Finally the rectype-string
method table is returned together with the string itself.
The last command adds the string recognizer to the list of the recognizers the
interpreter uses and activates it this way. Now we can enter strings as native
data without the s"
command.
> "foo" type
foo ok
> " foo" type
foo ok
> " foo" type
foo ok
> " foo" type
foo ok
> " foo bar baz " type
foo bar baz ok
> : test " foo bar " itype ;
ok
> test
foo bar ok
>