Copyright 1997 by Robert M. Keller, all rights reserved.
Tautology checker:
(a + b)' = a' * b' ?
a + a'*b = a + b ?
a + a*b = a*b ?
Decomposition:
An applet demonstrating a tautology checker may
be found at
http://www.cs.hmc.edu/~keller/javaExamples/taut.
Type a logical expression into the left text area. The applet will
tell you whether or not it is a tautology. Identifiers in this
version of the tautology checker are limited to single characters.
The operators are:
Parentheses may be used.
a + a'*b = a + b
The process of producing such a tree from an input sequence of characters is called
parsing.
An applet demonstrating parsing to create trees may be found at http://www.cs.hmc.edu/~keller/javaExamples/Parser. Enter arithmetic expressions involving multi-character variables, +, *, and parentheses and watch the tree being constructed.
Please note that syntax trees such as this are usually just an intermediate form used to achieve some other end, such as:
In a grammar, there are three types of symbols:
a
a + b
a + b + c
a + a + d
....
Here there are two syntactic categories:
V: identifier or "variable"
a, b, c, ...., z (say)
A: additive expressions themselves
Because it is additive expressions in which we are interested, we say that A is the root category.
For the additive expressions, two productions suffice:
(i) A -> V { '+' V}
read
"an A is a V followed by 0 or more occurrences of '+' then a V."
or more verbosely
"an additive expression is a variable followed by 0 or more occurrences of '+' then a variable".
(ii) V -> 'a' | 'b' | 'c' | .... | 'z'
read
"a V is an 'a' or a 'b' or a 'c' or .... or a 'z'"
Note: .... is not actually part of the grammar, but is just meant to abbreviate the letters between 'c' and 'z'.
analogous to => (expression replacement, in rex)
(i) A -> V { '+' V}
(ii) V -> 'a' | 'b' | 'c' | .... | 'z'
Start with the root symbol
A
Apply(i):
V '+' V '+' V
2 occurrences of '+' V
Apply(ii):
'a' '+' 'b' '+' 'c'
i.e.
"a+b+c"
is the resulting string
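The derivation above can be mimicked mechanically. The following is a minimal sketch, assuming a hypothetical class Derive (not part of the original notes): the caller picks the letters, which fixes both how many times { '+' V } repeats and which alternative of production (ii) replaces each V.

```java
// Hypothetical sketch: derive a string from the productions
//   (i)  A -> V { '+' V }
//   (ii) V -> 'a' | 'b' | 'c' | .... | 'z'
class Derive {
    // letters[0] replaces the leading V; each further letter accounts for
    // one occurrence of '+' V inside the braces.
    static String derive(char[] letters) {
        if (letters.length == 0) {
            throw new IllegalArgumentException("an A needs at least one V");
        }
        StringBuilder sb = new StringBuilder();
        sb.append(letters[0]);                  // the leading V
        for (int i = 1; i < letters.length; i++) {
            sb.append('+').append(letters[i]);  // one occurrence of '+' V
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(derive(new char[] {'a', 'b', 'c'})); // prints a+b+c
    }
}
```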
Example productions
(i) A -> V { '+' V}
(ii) V -> 'a' | 'b' | 'c' | .... | 'z'
Form one "function" (or procedure or method, depending on setup) for each auxiliary symbol (designating syntactic categories).
A
V
Each function is responsible for recognizing that category in an input string.
The input string should be scanned
left-to-right.
(i) A -> V { '+' V}
says
"To scan an A:
scan a V (if no V, fail).
As long as the next character is a '+':
absorb the '+' and scan another V (if none, fail)."
(ii) V -> 'a' | 'b' | 'c' | .... | 'z'
says
"To scan a V:
see if there is an 'a' or a 'b' or a 'c' or .....
If none, fail."
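The two scanning recipes translate almost word for word into code. The following is a minimal sketch of a recognizer that only answers yes or no and builds no tree. The class name AdditiveRecognizer, the use of '\0' as an end-of-input marker, and the requirement that the whole string be consumed are assumptions made for illustration.

```java
// Hypothetical recognizer for
//   A -> V { '+' V }
//   V -> 'a' | 'b' | ... | 'z'
class AdditiveRecognizer {
    private final String input;
    private int pos = 0;   // next unread position, scanning left-to-right

    AdditiveRecognizer(String input) { this.input = input; }

    // Look at the next character without consuming it ('\0' at end of input).
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // To scan a V: see if there is an 'a' .. 'z'. If none, fail.
    private boolean scanV() {
        char c = peek();
        if (c >= 'a' && c <= 'z') { pos++; return true; }
        return false;
    }

    // To scan an A: scan a V, then as long as a '+' follows, scan another V.
    private boolean scanA() {
        if (!scanV()) return false;
        while (peek() == '+') {
            pos++;                       // absorb the '+'
            if (!scanV()) return false;  // '+' not followed by a V
        }
        return true;
    }

    // Accept only if an A was scanned and no input remains.
    static boolean accepts(String s) {
        AdditiveRecognizer r = new AdditiveRecognizer(s);
        return r.scanA() && r.pos == s.length();
    }

    public static void main(String[] args) {
        System.out.println(accepts("a+b+c")); // true
        System.out.println(accepts("a+"));    // false
    }
}
```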
Our parse functions will return Objects, either
success:
a String, representing a variable (a leaf of tree)
OR
a List, representing a non-leaf tree
failure:
a ParseFailure object
We will construct the lists using Poly; they will print as S expressions.
// PARSE FUNCTION for A -> V { '+' V }
Object A()
{
    Object result;
    Object V1 = V();
    if( isFailure(V1) ) return failure;
    result = V1;
    while( peek() == '+' )
    {
        nextChar();
        Object V2 = V();
        if( isFailure(V2) ) return failure;
        result = Poly.List.list("+", result, V2);
    }
    return result;
}
Explanation: Each parse function (A or V in this case) scans characters left-to-right from the input stream, access to which is not shown explicitly. It returns either failure or a tree. A tree is either a single leaf, represented by a String, or a non-leaf tree, represented by a Poly.List.
In A(), V() is first called, corresponding to the first syntactic category on the right-hand side of the production A -> V { '+' V }. If the result is a failure, then the call to A() is a failure. Otherwise, the result value is started with the value of V(). The program then checks to see whether the next character is a '+'. If not, then the production for A is fulfilled, since {....} allows 0 occurrences of what is inside. In this case, the result is just returned. However, if there is a '+', then we absorb that '+' by calling nextChar() (the call to peek() only looked at the character, but did not take it from the input stream). We then call V() again. If that is a failure, the call to A must fail, since we have a '+' not followed by a V. If it succeeds, we build up the tree by forming a new tree with "+" as the root, the former result as the left sub-tree, and the value of V() as the right sub-tree. This build up continues as long as there are '+'s in the input stream.
// PARSE FUNCTION for V -> a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
Object V()
{
    if( isVar(peek()) )
    {
        return (new StringBuffer(1).append(nextChar())).toString();
    }
    return failure;
}
In V() we look at the next character and if it qualifies as a variable, we make a String out of it, using the Java incantation shown:
StringBuffer is a class representing a modifiable string (objects in class String cannot be modified once created).
We create a StringBuffer (1 character long), append the next character of the input to it, and return the contents of the StringBuffer as a String.
If the next character is not a variable, then we return failure to so indicate.
A straightforward way to check whether a character is one of a specific set is to use a switch statement, as shown.
// isVar indicates whether its argument is a variable
boolean isVar(char c)
{
    switch( c )
    {
        case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
        case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
        case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
        case 'v': case 'w': case 'x': case 'y': case 'z':
            return true;
        default:
            return false;
    }
}
Certain sets of characters are pre-defined in the libraries. If we wanted to allow any "letter" as a variable, we could use, in place of isVar(c):
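Character.isJavaLetter(c)
(Later Java releases deprecate isJavaLetter in favor of Character.isJavaIdentifierStart(c); Character.isLetter(c) accepts letters only.)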
The call to A() is embedded in a method parse()
which checks to see if there is any garbage remaining on the input
line after A() is called. If so, it informs the user, but still
regards the result returned by A() as something useable if it is not
an error.
Object parse() // TOP LEVEL: parse with check for residual input
{
    Object result = A();
    skipWhitespace();
    if( position < lastPosition )
    {
        System.out.print("*** Residual characters after input: ");
        while( !eof )
        {
            char c = nextChar();
            System.out.print(c);
        }
        System.out.println();
    }
    return result;
}
The complete program, which includes support methods such as peek() and nextChar(), is in http://www.cs.hmc.edu/~keller/javaExamples/Parse/additive.java.
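For readers without access to that file, the following is a minimal stand-alone sketch of how the pieces might fit together. It is not the author's additive.java: the class name AdditiveSketch, the FAILURE sentinel, the residual() helper, and the use of java.util.List in place of Poly.List are all assumptions, and the printed form only approximates an S expression.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch of the additive-expression parser.
class AdditiveSketch {
    // Stand-in for the ParseFailure object mentioned in the text.
    static final Object FAILURE = new Object();

    private final String input;
    private int position = 0;

    AdditiveSketch(String input) { this.input = input; }

    // Look at the next character without consuming it ('\0' at end of input).
    char peek() { return position < input.length() ? input.charAt(position) : '\0'; }

    // Consume and return the next character; callers check peek() first.
    char nextChar() { return input.charAt(position++); }

    // Whatever input has not been consumed yet.
    String residual() { return input.substring(position); }

    // V -> 'a' | 'b' | ... | 'z'
    Object V() {
        char c = peek();
        if (c >= 'a' && c <= 'z') return String.valueOf(nextChar());
        return FAILURE;
    }

    // A -> V { '+' V }
    Object A() {
        Object result = V();
        if (result == FAILURE) return FAILURE;
        while (peek() == '+') {
            nextChar();                        // absorb the '+'
            Object v = V();
            if (v == FAILURE) return FAILURE;  // '+' not followed by a V
            result = Arrays.asList("+", result, v);
        }
        return result;
    }

    // Render a tree in an S-expression-like form.
    static String show(Object tree) {
        if (tree == FAILURE) return "failure";
        if (!(tree instanceof List)) return tree.toString();
        StringBuilder sb = new StringBuilder("(");
        String sep = "";
        for (Object t : (List<?>) tree) {
            sb.append(sep).append(show(t));
            sep = " ";
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        AdditiveSketch p = new AdditiveSketch("a+b+c");
        System.out.println(show(p.A()));       // prints (+ (+ a b) c)
        AdditiveSketch q = new AdditiveSketch("a+b)");
        System.out.println(show(q.A()) + " residual: " + q.residual());
    }
}
```

A top-level parse() in the style shown earlier would call A() and then report residual() if it is non-empty.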
We use the productions to enforce precedence, e.g. we want * to take precedence over + (or bind more tightly than +).
In the syntax tree, this means that we want + to be closer to the root, i.e. we want
a + b * c
to parse as the tree on the left, not the one on the right below:
The syntactic categories in this case are:
A: Additive expressions
M: Multiplicative expressions
V: Variables
The corresponding Productions are:
A -> M { '+' M }
M -> V { '*' V }
V -> 'a' | 'b' | 'c' | .... | 'z'
Coding of a parser for additive and
multiplicative expressions:
Note the analogy between multiplicative and
additive in the current grammar:
A is to M
as
M is to V
Here is the code for the parse functions:
// PARSE FUNCTION for A -> M { '+' M }
Object A()
{
    Object result;
    Object M1 = M();
    if( isFailure(M1) ) return failure;
    result = M1;
    while( peek() == '+' )
    {
        nextChar();
        Object M2 = M();
        if( isFailure(M2) ) return failure;
        result = Poly.List.list("+", result, M2);
    }
    return result;
}

// PARSE FUNCTION for M -> V { '*' V }
Object M()
{
    Object result;
    Object V1 = V();
    if( isFailure(V1) ) return failure;
    result = V1;
    while( peek() == '*' )
    {
        nextChar();
        Object V2 = V();
        if( isFailure(V2) ) return failure;
        result = Poly.List.list("*", result, V2);
    }
    return result;
}
The complete program, which includes support methods such as peek() and nextChar(), is in http://www.cs.hmc.edu/~keller/javaExamples/Parse/addMult.java.
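To see the precedence come out of the productions, here is a minimal stand-alone sketch. It is not the author's addMult.java: the class name AddMultSketch is hypothetical, trees are built directly as S-expression strings rather than Poly lists, and null stands in for the failure object.

```java
// Hypothetical sketch for
//   A -> M { '+' M }
//   M -> V { '*' V }
//   V -> 'a' | 'b' | ... | 'z'
class AddMultSketch {
    private final String input;
    private int position = 0;

    AddMultSketch(String input) { this.input = input; }

    // Look at the next character without consuming it ('\0' at end of input).
    private char peek() { return position < input.length() ? input.charAt(position) : '\0'; }

    // Consume and return the next character.
    private char nextChar() { return input.charAt(position++); }

    // A -> M { '+' M }
    String A() {
        String result = M();
        if (result == null) return null;
        while (peek() == '+') {
            nextChar();
            String m = M();
            if (m == null) return null;
            result = "(+ " + result + " " + m + ")";
        }
        return result;
    }

    // M -> V { '*' V }
    String M() {
        String result = V();
        if (result == null) return null;
        while (peek() == '*') {
            nextChar();
            String v = V();
            if (v == null) return null;
            result = "(* " + result + " " + v + ")";
        }
        return result;
    }

    // V -> 'a' | 'b' | ... | 'z'
    String V() {
        char c = peek();
        return (c >= 'a' && c <= 'z') ? String.valueOf(nextChar()) : null;
    }

    static String parse(String s) { return new AddMultSketch(s).A(); }

    public static void main(String[] args) {
        System.out.println(parse("a+b*c")); // prints (+ a (* b c))
        System.out.println(parse("a*b+c")); // prints (+ (* a b) c)
    }
}
```

Because A() only ever sees whole M's, every '*' is absorbed inside M() before a '+' node is built, which is what puts '+' closer to the root.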
A -> M { '+' M }
is left grouping
A -> { M '+' } M
would be right grouping.
("Grouping" is sometimes called "associativity", but it is really independent of whether the operator is an associative operator or not.)
To see that
A -> M { '+' M }
is really left grouping and not right, as might be first inferred, consider the parsing of an expression
a + b + c + d
The a is first parsed as an M (after first parsing it as a V). Then the b is parsed and added to the first, then the c is parsed and added to that, and so on.
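The difference between the two groupings can be made concrete by building both trees from the same operands. The following is a minimal sketch, assuming hypothetical helpers groupLeft and groupRight that take the operands as already parsed; groupLeft mirrors the accumulation described above, while groupRight is shown only for contrast.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of left versus right grouping.
class Grouping {
    // Left grouping: each new operand is added to the tree built so far.
    static String groupLeft(List<String> operands) {
        String result = operands.get(0);
        for (int i = 1; i < operands.size(); i++) {
            result = "(+ " + result + " " + operands.get(i) + ")";
        }
        return result;
    }

    // Right grouping, for contrast: nest from the last operand backwards.
    static String groupRight(List<String> operands) {
        int last = operands.size() - 1;
        String result = operands.get(last);
        for (int i = last - 1; i >= 0; i--) {
            result = "(+ " + operands.get(i) + " " + result + ")";
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("a", "b", "c", "d");
        System.out.println(groupLeft(ops));  // prints (+ (+ (+ a b) c) d)
        System.out.println(groupRight(ops)); // prints (+ a (+ b (+ c d)))
    }
}
```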
( expression )
where the expression in the group should function as if a single variable.
space: ' '
tab: '\t'
form-feed: '\f' (control-L)
Whitespace is usually not indicated explicitly in grammars, although it could be:
W -> ' ' | '\t' | '\f'
is a production for a single whitespace character. Thus
{ W }
denotes any number of whitespace characters, e.g.
A -> {W} V {W} { '+' {W} V {W} }
would allow whitespace to be inserted before or after any variable.
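In a hand-written scanner, { W } usually becomes a small loop that is called wherever whitespace is permitted. The following is a minimal sketch, assuming a hypothetical class WhitespaceSketch that skips whitespace before and after every variable; it is a recognizer only and builds no tree.

```java
// Hypothetical recognizer for additive expressions with optional whitespace:
//   W -> ' ' | '\t' | '\f'
//   A -> {W} V {W} { '+' {W} V {W} }
class WhitespaceSketch {
    private final String input;
    private int position = 0;

    WhitespaceSketch(String input) { this.input = input; }

    // Look at the next character without consuming it ('\0' at end of input).
    private char peek() { return position < input.length() ? input.charAt(position) : '\0'; }

    // { W }: consume zero or more whitespace characters.
    private void skipWhitespace() {
        while (peek() == ' ' || peek() == '\t' || peek() == '\f') position++;
    }

    // V -> 'a' | 'b' | ... | 'z'
    private boolean scanV() {
        char c = peek();
        if (c >= 'a' && c <= 'z') { position++; return true; }
        return false;
    }

    private boolean scanA() {
        skipWhitespace();
        if (!scanV()) return false;
        skipWhitespace();
        while (peek() == '+') {
            position++;                  // absorb the '+'
            skipWhitespace();
            if (!scanV()) return false;
            skipWhitespace();
        }
        return true;
    }

    // Accept only if an A was scanned and no input remains.
    static boolean accepts(String s) {
        WhitespaceSketch r = new WhitespaceSketch(s);
        return r.scanA() && r.position == s.length();
    }

    public static void main(String[] args) {
        System.out.println(accepts(" a +\tb + c ")); // true
        System.out.println(accepts("a b"));          // false
    }
}
```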
In most languages, whitespace is allowed between most syntactic units, except within identifiers.
An exception is FORTRAN where
DO 10 I
could be a variable DO10I or the start of a DO statement.
[ .... ]
means 0 or 1 occurrence of ...., i.e. that .... is optional.
Example:
U is unsigned numerals
N is optionally signed numerals
D is a digit
Then the productions are:
N -> ['+' | '-'] U
U -> D {D}
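An optional part [ .... ] turns into a single if in a scanner, where { .... } turns into a loop. The following is a minimal sketch of the two productions above, assuming a hypothetical class SignedNumeral whose parseN returns the numeral's value, or null when the string is not an N; arithmetic overflow is deliberately ignored.

```java
// Hypothetical sketch for
//   N -> ['+' | '-'] U
//   U -> D {D}
class SignedNumeral {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Returns the value of the numeral, or null if s is not an N.
    static Integer parseN(String s) {
        int pos = 0;
        int sign = 1;
        // ['+' | '-']: consume a sign only if one is present.
        if (pos < s.length() && (s.charAt(pos) == '+' || s.charAt(pos) == '-')) {
            if (s.charAt(pos) == '-') sign = -1;
            pos++;
        }
        // U -> D {D}: at least one digit is required.
        if (pos >= s.length() || !isDigit(s.charAt(pos))) return null;
        int value = 0;
        while (pos < s.length() && isDigit(s.charAt(pos))) {
            value = value * 10 + (s.charAt(pos) - '0'); // overflow ignored in this sketch
            pos++;
        }
        if (pos != s.length()) return null;             // residual characters
        return sign * value;
    }

    public static void main(String[] args) {
        System.out.println(parseN("-42")); // prints -42
        System.out.println(parseN("+"));   // prints null
    }
}
```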
Give a grammar for the floating-point numerals, e.g.
123.
.456
123.456
1e-10
123.456e10
etc.
Write the grammar and parser for the tautology checker. The operator symbol precedence is:
' (not) tightest
* (and)
+ (or)
> (implies)
= (equals, if-and-only-if)
Make * and + associative and > and = non-associative (i.e. a>b>c is not allowed; it must be either a>(b>c) or (a>b)>c).
Parentheses are allowed. Variables are single letters. Constants are 0 and 1.
Sometimes { } is replaced by recursive productions:
{A}
is the same as
B
where
B -> empty string
B -> A B
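The equivalence shows up directly in code: the braces become a loop, while the recursive productions become a recursive function. The following is a minimal sketch, assuming for concreteness that the category A is a single digit (an assumption made only for this example); both hypothetical methods return the position just past the repeated A's.

```java
// Hypothetical comparison of {A} with  B -> empty string,  B -> A B
class Repetition {
    // Stand-in for the category A: a single digit (assumed for illustration).
    static boolean isA(char c) { return c >= '0' && c <= '9'; }

    // {A} as a loop: skip zero or more A's starting at pos.
    static int scanBraces(String s, int pos) {
        while (pos < s.length() && isA(s.charAt(pos))) pos++;
        return pos;
    }

    // B as a recursive function.
    static int scanB(String s, int pos) {
        if (pos < s.length() && isA(s.charAt(pos))) {
            return scanB(s, pos + 1);  // B -> A B
        }
        return pos;                    // B -> empty string
    }

    public static void main(String[] args) {
        System.out.println(scanBraces("123x", 0)); // prints 3
        System.out.println(scanB("123x", 0));      // prints 3
    }
}
```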
Sometimes superscript * denotes { }, e.g.
(R | S)*
is the same as
{R | S}
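The star notation is the one used by regular-expression libraries. The following is a minimal sketch using java.util.regex, with the single characters 'r' and 's' standing in for R and S (an assumption made for illustration), next to the loop that the brace notation suggests.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of (R | S)* with {R | S}, where R = 'r' and S = 's'.
class StarSketch {
    static final Pattern STAR = Pattern.compile("(r|s)*");

    // (R | S)* via the regular-expression library; the whole string must match.
    static boolean matchesStar(String s) {
        return STAR.matcher(s).matches();
    }

    // {R | S} as a loop: consume r's and s's, then require end of input.
    static boolean matchesBraces(String s) {
        int pos = 0;
        while (pos < s.length() && (s.charAt(pos) == 'r' || s.charAt(pos) == 's')) pos++;
        return pos == s.length();
    }

    public static void main(String[] args) {
        System.out.println(matchesStar("rsrr"));   // true
        System.out.println(matchesBraces("rsrr")); // true
        System.out.println(matchesStar("rxs"));    // false
    }
}
```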