.. _Recognizer:

==========
Recognizer
==========

The Forth text interpreter works with numbers and command words. Its
main purpose is to transform their text representation into a format
closer to the system level and then act on it. Numbers are converted to
their binary form and placed on the data stack; command words are
looked up in the dictionary and then handled via their execution tokens
(and header flags).

In standard Forth there is no easy way to add new data types to the
text interpreter or to associate actions with them for the different
interpreter states. For example, there are no native string literals;
they are mimicked with a command word (``s"``).

A recognizer fills this gap. It consists of two major parts: a word
that does the parsing and converting, and a group of three methods for
handling the data that the parsing word produces. These methods are
used in interpret state, in compile state, and to postpone the data in
colon definitions.

AmForth has built-in recognizers for numbers and for words from the
dictionary. To create and manage more recognizers, the generic stack
words ``get-stack`` and ``set-stack`` are applied to
``forth-recognizer``. The word ``rectype:`` takes three execution tokens
and defines the method table. The parsing word takes a string (address
and length) as input and leaves either the method table
``rectype-null`` (and no further data) or some data together with a
method table defined with ``rectype:``. The interpreter takes care of
the rest.

It is possible to modify ``>in`` inside the parsing word if the data
contains whitespace. Debugging such words can be tricky, however.

String Literals
---------------

A string is delimited by two ``"`` symbols. The first one starts the
string and the next one ends it. Everything in between is the string
content. A string is denoted by its start address and its length. When
compiling, the string needs to be copied to the dictionary together
with a runtime action.

Since a string can contain whitespace, the parsing word needs to deal
with ``>in``. The string address and length are valid only for the
lifetime of the SOURCE buffer; a ``refill`` changes its content.

.. code-block:: forth

   ' noop ' sliteral :noname type -48 throw ; rectype: rectype-string

   : rec-string ( addr len -- addr' len' rectype-string | rectype-null )
      over c@ [char] " <> if 2drop rectype-null exit then
      negate 1+ >in +! drop \ expand parse area
      [char] " parse        \ get trailing delimiter
      -1 /string            \ remove limiter
      rectype-string
   ;

   ' rec-string forth-recognizer get-stack 1+ forth-recognizer set-stack

The first line is simply the method table definition. The first two
methods are already defined in AmForth, so there is nothing special
here. The third method is called when the data is being postponed.
Postponing a string is not supported for now; it would essentially
require copying the string from the defining word to the new one.
Instead, exception -48 is thrown.

The ``rec-string`` definition is more complex. The first line

.. code-block:: forth

   over c@ [char] " <> if 2drop rectype-null exit then

checks whether the current word starts with a ``"`` character. If it
does not, the two arguments are dropped and the special method table
``rectype-null`` is returned.

If the first character is a ``"``, the main task is to find the next,
delimiting ``"``. Since ``>in`` needs to be set to the location of this
character as well, we use the word ``parse``, which does this work for
us.

.. code-block:: forth

   negate 1+ >in +! drop \ reset parse area to SOURCE

This line re-adjusts the parsing area to the beginning of the word
inside SOURCE. The code

.. code-block:: forth

   [char] " parse \ get trailing delimiter

scans the remaining input for the delimiting ``"`` and returns the text
up to it as address and length. Then some address cosmetics have to be
done to include the very first character as well. Finally, the
``rectype-string`` method table is returned together with the string
itself.

The last command adds the string recognizer to the list of the
recognizers the interpreter uses and activates it this way. Now we can
enter strings as native data without the ``s"`` command.

.. code-block:: console

   > "foo" type
   foo ok
   > " foo" type
   foo ok
   > " foo" type
   foo ok
   > " foo" type
   foo ok
   > " foo bar baz " type
   foo bar baz ok
   > : test " foo bar " itype ;
   ok
   > test
   foo bar ok
   >
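
The same pattern carries over to other data types. The following is a
minimal sketch, not part of AmForth, of a recognizer for character
literals such as ``'a'``. The names ``rectype-char`` and ``rec-char``
are hypothetical and chosen only for illustration. It assumes that the
standard word ``literal`` can serve as the compile method in the same
way ``sliteral`` does for strings above, and it refuses to be postponed
with the same exception -48.

.. code-block:: forth

   \ Sketch only: rectype-char and rec-char are hypothetical names.
   \ interpret: leave the character code on the stack (noop)
   \ compile:   compile it as a literal
   \ postpone:  not supported, throw -48 as in the string example
   ' noop ' literal :noname -48 throw ; rectype: rectype-char

   : rec-char ( addr len -- c rectype-char | rectype-null )
      3 <> if drop rectype-null exit then                   \ exactly 3 characters
      dup c@ [char] ' <> if drop rectype-null exit then     \ opening tick
      dup 2 + c@ [char] ' <> if drop rectype-null exit then \ closing tick
      1+ c@ rectype-char                                    \ character in between
   ;

   ' rec-char forth-recognizer get-stack 1+ forth-recognizer set-stack

Because the data contains no whitespace, this parsing word does not
need to touch ``>in``. Under the assumptions above, entering
``'a' emit`` at the prompt would be expected to print ``a``.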