The Notation Editor - Specification of the Parser

A few words about the lexical scanner (aka tokenizer):

Whitespace is used to delimit tokens. If the keywords «A», «B» and «AB» are defined, and the scanner encounters the character sequence «A B», then this is recognised as the token «A» followed by the token «B».

When no whitespace separates a sequence of characters, the scanner is greedy: it assigns as many characters to a token as possible. For example, if the keywords «A», «B» and «AB» are defined, and the scanner encounters the character sequence «ABB», then this is recognised as the token «AB» followed by the token «B».
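
The two scanning rules above can be illustrated with a minimal Python sketch. It is not the scanner's actual implementation: the function name `scan` and the idea of trying longer keywords first are assumptions made for this illustration only.

```python
def scan(text, keywords):
    """Split text into keyword tokens (hypothetical helper, not the real scanner)."""
    # Try longer keywords first so that the longest possible match wins.
    by_length = sorted(keywords, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace only delimits tokens
            continue
        for keyword in by_length:
            if text.startswith(keyword, pos):
                tokens.append(keyword)
                pos += len(keyword)
                break
        else:
            raise ValueError(f"no keyword matches at position {pos}")
    return tokens


print(scan("A B", ["A", "B", "AB"]))  # ['A', 'B']
print(scan("ABB", ["A", "B", "AB"]))  # ['AB', 'B']
```

Sorting by length is only one way to obtain the longest match; a trie over the keywords would give the same result with less rescanning.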

The scanner also recognises comments automatically. Two types of comments are supported: C++-style end-of-line comments, which start with two slashes and end at the next new-line character, and C-style multi-line comments, which start with slash and star and end with star and slash.
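
As a rough sketch of the comment handling, the following Python snippet removes both kinds of comments before tokenizing. The name `strip_comments` is hypothetical, and replacing each comment with a single space (so that it acts as a token delimiter) is an assumption, not something the specification states.

```python
import re

# Either '//' up to (but not including) the new-line character,
# or '/*' up to the first following '*/', possibly spanning several lines.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_comments(text):
    """Replace every comment with one space (assumed behaviour)."""
    return COMMENT.sub(" ", text)


script = "R U // first part\nR' /* undo\n the turn */ U'"
print(strip_comments(script).split())
```

An unterminated multi-line comment is left untouched by this sketch; how the real scanner reports that case is not covered here.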

Formal specification of the grammar:

The following productions, written in EBNF (ISO/IEC 14977), give the formal specification of the grammar used to build the parser.

The grammar shown here is LALR(1). That is, it can be parsed by a Look-Ahead LR parser, which reads its input from left to right, builds a rightmost derivation in reverse, and uses a look-ahead of only one (1) token.

Script = {Expression} ;

Expression = StmtDelimiter | ({Prefix} , Statement , {Suffix}) ;

StmtDelimiter = Keyword ;
Prefix = PrefixConjugator | PrefixCommutator | PrefixInvertor | PrefixRepetitor | PrefixReflector ;
Statement = Twist | Macro | ((GroupingBegin | CngrPrefix | CmtrPrefix) , Grouping) | (PermBegin , Permutation) ;
Suffix = SuffixConjugator | SuffixCommutator | SuffixInvertor | SuffixRepetitor | SuffixReflector ;

PrefixConjugator = (CngrBegin , Expression , {Expression} , CngrEnd) | CngrTransformation ;
SuffixConjugator = PrefixConjugator ;
PrefixCommutator = (CmtrBegin , Expression , {Expression} , CmtrEnd) | CmtrTransformation ;
SuffixCommutator = PrefixCommutator ;
PrefixInvertor = Keyword ;
SuffixInvertor = Keyword ;
PrefixRepetitor = [RptrBegin] , Integer , [RptrEnd] ;
SuffixRepetitor = PrefixRepetitor ;
PrefixReflector = Keyword ;
SuffixReflector = Keyword ;
Twist = Identifier ;
Macro = Identifier ;
Grouping = [HeaderConjugator | HeaderCommutator] , Expression , {Expression} , (GroupingEnd | CngrEnd | CmtrEnd) ;
HeaderConjugator = Expression , {Expression} , CngrDelimiter ;
HeaderCommutator = Expression , {Expression} , CmtrDelimiter ;
Permutation = [SidePerms | EdgePerms | CornerPerms] , PermEnd ;
Keyword = Character , {Character} ;
Character = ? any non-whitespace character in the Unicode character set ? ;
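
To make the shape of the Script and Expression productions more concrete, here is a deliberately reduced Python sketch. It handles only a Twist followed by any number of invertor or repetitor suffixes; prefixes, groupings, macros and permutations are left out. The twist names, the apostrophe as invertor keyword and the tuple-based tree are all assumptions chosen for the example, since the grammar leaves the concrete keywords open.

```python
TWISTS = {"R", "U", "F", "L", "D", "B"}  # hypothetical Twist identifiers
INVERTOR = "'"                           # hypothetical SuffixInvertor keyword


def parse_expression(tokens, pos=0):
    """Parse Twist , {SuffixInvertor | SuffixRepetitor}; return (node, next_pos)."""
    if pos >= len(tokens) or tokens[pos] not in TWISTS:
        raise ValueError(f"expected a twist at token {pos}")
    node = ("twist", tokens[pos])
    pos += 1
    # One token of look-ahead decides whether another suffix follows.
    while pos < len(tokens):
        token = tokens[pos]
        if token == INVERTOR:
            node = ("invert", node)
        elif token.isascii() and token.isdigit():
            node = ("repeat", int(token), node)  # Integer = Digit , {Digit}
        else:
            break
        pos += 1
    return node, pos


def parse_script(tokens):
    """Script = {Expression}, restricted to the subset described above."""
    nodes, pos = [], 0
    while pos < len(tokens):
        node, pos = parse_expression(tokens, pos)
        nodes.append(node)
    return nodes


print(parse_script(["R", "2", "'", "U"]))
# [('invert', ('repeat', 2, ('twist', 'R'))), ('twist', 'U')]
```

Each suffix wraps the node built so far, so suffixes apply from left to right; whether the actual parser builds its tree this way is not stated in the grammar.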

CmtrBegin = Keyword ;
CmtrEnd = Keyword ;
CmtrDelimiter = Keyword ;
CngrBegin = Keyword ;
CngrEnd = Keyword ;
CngrDelimiter = Keyword ;
RptrBegin = Keyword ;
Integer = Digit , {Digit} ;
Digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ;
RptrEnd = Keyword ;
Identifier = Keyword ;
GroupingBegin = Keyword ;
GroupingEnd = Keyword ;
PermBegin = Keyword ;
Sign = Identifier ;
SidePerms = [Sign] , SidePerm , {PrmDelimiter , [Sign] , SidePerm} ;
EdgePerms = [Sign] , EdgePerm , {PrmDelimiter , EdgePerm} ;
CornerPerms = [Sign] , CornerPerm , {PrmDelimiter , CornerPerm} ;
PermEnd = Keyword ;

SidePerm = Face ;
PrmDelimiter = Keyword ;
EdgePerm = Face , Face ;
CornerPerm = Face , Face , Face ;

Face = Identifier ;
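
The permutation productions can be read as follows in a small Python sketch. The concrete keywords used here (round brackets for PermBegin and PermEnd, a comma for PrmDelimiter, plus and minus for Sign, and lower-case face letters) are assumptions for illustration; the grammar itself only says that they are keywords or identifiers. The function name `parse_permutation` is likewise hypothetical.

```python
PERM_BEGIN, PERM_END, DELIM = "(", ")", ","  # hypothetical keywords
SIGNS = {"+", "-"}                            # hypothetical Sign identifiers
FACES = {"r", "u", "f", "l", "d", "b"}        # hypothetical Face identifiers
KINDS = {1: "side", 2: "edge", 3: "corner"}   # number of faces per item


def parse_permutation(tokens):
    """Parse PermBegin , Permutation from a token list; return (kind, items)."""
    if not tokens or tokens[0] != PERM_BEGIN:
        raise ValueError("expected PermBegin")
    pos, items = 1, []
    while True:
        if not items and pos < len(tokens) and tokens[pos] == PERM_END:
            return None, []  # empty permutation: PermBegin , PermEnd
        sign = None
        if pos < len(tokens) and tokens[pos] in SIGNS:
            sign, pos = tokens[pos], pos + 1
        faces = []
        while pos < len(tokens) and tokens[pos] in FACES:
            faces.append(tokens[pos])
            pos += 1
        if len(faces) not in KINDS:
            raise ValueError("expected one, two or three faces")
        if items and len(faces) != len(items[0][1]):
            raise ValueError("side, edge and corner items must not be mixed")
        if items and sign and len(faces) > 1:
            raise ValueError("only the first edge or corner item may carry a sign")
        items.append((sign, faces))
        if pos < len(tokens) and tokens[pos] == DELIM:
            pos += 1
            continue
        if pos < len(tokens) and tokens[pos] == PERM_END:
            return KINDS[len(items[0][1])], items
        raise ValueError("expected PrmDelimiter or PermEnd")


print(parse_permutation(["(", "u", "f", ",", "u", "r", ")"]))
# ('edge', [(None, ['u', 'f']), (None, ['u', 'r'])])
```

The sketch follows the productions as written: every item of a side permutation may carry its own sign, whereas edge and corner permutations accept a sign only before the first item.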