
Parsing Examples

Parsing by Tokenization

Computer programs frequently split a string into its component words or tokens. This is accomplished with a template consisting entirely of variables (targets).

/*Assume "hammer 1 each $600.00" was entered*/
PULL item qty units cost .

In this example the input line from the PULL instruction is split into words and assigned to the variables in the template. Because PULL is shorthand for PARSE UPPER PULL, the input is translated to uppercase first: the variable item receives the value "HAMMER", qty is set to "1", units is set to "EACH" and cost gets the value "$600.00". The final placeholder (.) is given a null value, since there are only four words in the input. However, it forces the preceding variable cost to be given a tokenized value. If the placeholder were omitted, the remainder of the parse string would be assigned to cost, which would then have a leading blank.
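The effect of the trailing placeholder can be tried without keyboard input. The following is a minimal sketch, not part of the original example: PARSE VALUE ... WITH stands in for PULL (and, unlike PULL, does not translate the string to uppercase), and the variable line with its two extra words "net 30" is invented for illustration.

```rexx
/* Tokenizing a fixed string; PARSE VALUE stands in for PULL */
line = 'hammer 1 each $600.00 net 30'   /* hypothetical input line */
PARSE VALUE line WITH item qty units cost .
/* The placeholder absorbs "net 30", so cost holds a single word */
SAY 'item='item 'qty='qty 'units='units 'cost='cost
```

Here the placeholder soaks up everything after the fourth word, so cost should contain only "$600.00".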

answer = "Only Amiga makes it possible."
DO forever
PARSE VAR answer first answer
/*Place first word into 'first' and the rest into 'answer'.*/
IF first =='' THEN LEAVE
/*Stop if there are no more words*/
SAY answer
END

The first word of the string is removed and the remainder is placed back in the string. The process continues until no more words can be extracted. The output, followed by one empty line that is displayed once the last word has been removed, is:

Amiga makes it possible.
makes it possible.
it possible.
possible.
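The same loop can be adapted to collect the words instead of displaying the remainder. This is a sketch only: the names rest, w, count and the stem list. are chosen for illustration, and the loop assumes the string contains at least one word.

```rexx
/* Collect the words of a string into a stem (assumes a non-empty string) */
rest = 'Only Amiga makes it possible.'
count = 0
DO UNTIL rest = ''
   PARSE VAR rest w rest      /* first word into w, remainder back into rest */
   count = count + 1
   list.count = w             /* store the word under its index */
END
SAY count 'words, the last one is' list.count
```

Testing the remainder at the bottom of the loop avoids the extra pass in which an empty word is extracted.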

Parsing by Pattern

Pattern markers extract the desired fields. The "pattern" in this case is very simple - a single character - but could be an arbitrary string of any length. This form of parsing is useful whenever delimiter characters are present in the parse string.

/*Assume an argument string "12, 35.5,1" */
ARG hours ',' rate ',' withhold

The pattern is actually removed from the parse string when a match is found. If the parse string is scanned again from the beginning, the length and structure of the string may be different than at the start of the parsing process. The original source of the string, however, is never modified.
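A small sketch of the same template applied to a variable rather than to the argument string, so that it can be run directly. The variable name line is an assumption made for the example, and the STRIP call is an optional extra step that removes any blanks left around a field extracted between two patterns.

```rexx
/* Pattern parsing of a comma-delimited string */
line = '12, 35.5,1'
PARSE VAR line hours ',' rate ',' withhold
rate = STRIP(rate)            /* drop any blanks left around the field */
SAY 'hours='hours 'rate='rate 'withhold='withhold
SAY 'source is still:' line   /* the parsed variable itself is unchanged */
```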

Parsing by Positional Markers

Parsing with positional markers is used whenever the fields of interest are known to be in certain positions in a string.

/* Records look like: */
/* Start: 1-5 */
/* Length: 6-10 */
/* Name: @ (start,length)*/
PARSE VALUE record WITH 1 start +5 length +5 =start name +length

The records being processed contain a variable-length field. The starting position and length of the field are given in the first part of the record, and variable positional markers are used to extract the desired field.

The "=start" sequence is an absolute marker whose value is the position placed in the variable start earlier in the scan. The "+length" sequence supplies the effective length of the field.
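To make the record layout concrete, here is a sketch with an invented sample record. It assumes that the two numeric fields are zero-padded to five characters each and that the name field starts at column 11; neither detail is specified by the original example. The template uses the variable-marker syntax shown in this section, and other REXX dialects may require a different notation for variable markers.

```rexx
/* Extract a variable-length field located by two fixed-width fields */
/* Assumed layout: columns 1-5 start, columns 6-10 length, zero-padded */
record = '0001100006hammer 1 each'      /* hypothetical record */
PARSE VALUE record WITH 1 start +5 length +5 =start name +length
/* start = '00011', length = '00006', name = six characters from column 11 */
SAY 'start='start 'length='length 'name='name
```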

Multiple Templates

More than one template can be specified with an instruction by separating the templates with a comma. The ARG instruction (or PARSE UPPER ARG) accesses the argument strings provided when the program was called. Each template accesses the succeeding argument string. For example:

/*Assume arguments are ('one two',12,sort)*/
ARG first second,amount,action,option

The first template consists of the variables first and second. Because ARG translates its argument strings to uppercase, they are set to the values "ONE" and "TWO". In the next two templates, amount gets the value "12" and action is set to "SORT". The last template consists of the variable "option", which is set to the null string, since only three arguments were available.
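One way to try this without starting the program with arguments is to pass them to an internal routine, which receives its own argument strings. In the sketch below the routine name demo is hypothetical, and PROCEDURE is deliberately omitted so that the variables remain visible to the caller.

```rexx
/* Multiple templates, one per argument string */
CALL demo 'one two', 12, 'sort'
EXIT

demo:
   /* Four templates, but only three argument strings were passed */
   ARG first second, amount, action, option
   SAY 'first='first 'second='second
   SAY 'amount='amount 'action='action
   SAY 'option is' LENGTH(option) 'characters long'
   RETURN
```

Since ARG is equivalent to PARSE UPPER ARG, the string values arrive in uppercase; PARSE ARG with the same templates would preserve the original case.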

When multiple templates are used with the EXTERNAL or PULL source options, each additional template requests an additional line of input from the user:

/*Read last, first, and middle names and ssn*/
PULL last ',' first middle,ssn

Two lines of input are read. The first input line is expected to contain three words, the first of which is followed by a comma; they are assigned to the variables "last", "first", and "middle". The entire second input line is assigned to the variable "ssn".

Multiple templates can be useful even with a source option that returns the identical parse string. If the first template included pattern markers that altered the parse string, the subsequent templates could still access the original string. Subsequent parse strings obtained from the VALUE source do not cause the expression to be re-evaluated, but only retrieve the prior result.
