Expression Parser

This expression parser was a little pet-project of mine in 2000. It understands expressions (see explanation below). I've been using it for years to generate small reports.

Syntax

It is now Sunday 30-7-2017 at 14:20:50
Give an expression ('help' for help): GrIdentifier(help)
help = These expressions can do the following:

Types
- Integer, real and string expression types
- Boolean, date and time types are simulated
- Integer and real literals are denoted in decimal notation
- Real literals are denoted in fixed-point notation
- String literals are denoted between double-quotes ( " )
- Within string literals, the following can be used:
    - all characters, except double-quote ( " )
    - \t is interpreted as a horizontal tab
    - \n is interpreted as a newline
    - \" is interpreted as a double-quote character
    - \\ is interpreted as a backslash character
- Array literals are denoted between brackets ( [] )
- Integer-to-real, integer-to-string, real-to-string conversions

Operators and functions
- Arithmetic operators ( ** * / div mod % + - ) with their normal precedence and associativity
- Comparison operators ( == != <> < <= > >= ) with left-to-right associativity
- Boolean operators ( not ! and && or || ) with left-to-right associativity
- Assignment operators ( = ) with a right-to-left associativity
- String concatenation operator ( + ) with left-to-right associativity
- Array Indexer ( [] ) with left-to-right associativity
- Subexpressions (denoted between parenthesis)
- Compound (sub)expressions: expressions separated by ( ; ). The last value is the final result. Example: (1+2; 1)
- The following date/time functions are supported:
    - time(expr) returns a time string of the expression (expr in seconds)
    - date(expr) returns a date string of the expression (expr in days)
    - timeval(expr) returns number of seconds for "h:m:s" formated expression string
    - dateval(expr) returns number of days for "d-m-y" formated expression string
    The above functions use current date/time if argument is left empty.
    - datepart(dateval, part): part is one of:
      yyyy: 4-digit year
      yy: 2-digit year
      y: year without zero padding
      mmmm: full month-name
      mmm: abbreviated month-name
      mm: 2-digit month number
      m: month number without zero padding
      DDDD: full day-of-the-week name
      DDD: abbreviated day-of-the-week name
      D: day-of-the-week number (1=Monday, 7=Sunday, ISO-8601)
      dd: 2-digit day-of-the-month number
      d: day-of-the-month number
      w: week-of-the-year (1=week 1, ISO-8601)
    - timepart(timeval, part): Part is one of:
      HH: 2-digit hour in 24-hour format
      H: hour in 24-hour format
      hh: 2-digit hour in 12-hour format
      h: hour in 12-hour format
      ap: am or pm
      mm: 2-digit minutes
      m: minutes
      ss: 2-digit seconds
      s: seconds
    There are predefined special variables "__monthnameN" (N=1..12) and "__daynameN" (N=1..7) that contain the names to use, like:
      __monthname1="January;Jan"; __monthname2="February;Feb"; ...
      __dayname1="Monday;Mon"; __dayname2="Tuesday;Tue"; ...
- The following other functions are supported:
    - if(condition, then-expr, else-expr) returns the value of the then-expr if condition evaluates to non-zero, else-expr otherwise
    - for(initialize, condition, step) body executes expression while condition is true and returns the addition of all bodies
    - eval(expression) calculates the expression, evaluates that and returns that value
    - parsetree(compound expression) doesn't calculate the expression, but displays the parsetree
    - interval(argument0, ..., argumentN-1) returns the index (0..N-1) of the first argument that is true or N if none are true
    - table(expression0, ..., expressionN-1) creates an N-dimensional dictionary where the keys are determined by the expressions
      The expressions act as lambda expressions with identifier _ as placeholder for the actual argument.
      The expressions should return an integer, real or string. A real will be rounded to the nearest integer.
      An expression consisting of just _ is an ordinary dictionary, i.e. the actual argument functions as key.
      The _ expression is the default.
      The interval() function can be used to split a continuous value into discrete intervals.
    - size(array) returns the sizes of an array.
    - keys(array) returns the keys of an array.
    - sudoku_randomgame(difficulty): Generate and return a 9x9 Sudoku game. Default difficulty is 15.
    - sudoku_hint(game): Deduce one cell in the Sudoku game and return result.
    - sudoku_solve(game): Solve the entire Sudoku game and return result.
- Operator precedence is:
    1: functions
    2: subexpressions: (...)
    3: arrays: []
    4: unary operators: not ! + -
    5: power: **
    6: multiplication: * / div mod %
    7: addition: + -
    8: relative: == != <> < <= > >=
    9: boolean: and && or ||
   10: assignment: =
   11: compound expression: ;

Arrays
- Array literals are denoted between brackets: [1,2,3]
- Array elements can have any type, even arrays: [1,"a",1.5,[1,2]] is valid
- Array indexes are denoted between brackets: a[1] is valid
- Array indexes should be positive integers: a[1], but not a[1.5]
- Array indexes start at 0: a=[1,2,3] assigns a[0]==1, a[1]==2 and a[2]==3
- Arrays can have multiple dimensions: a[3,5] is a 2-dimensional array
- Array elements are created automatically: a[2,2]=1 creates a 3x3 matrix with all elements but a[2,2] uninitialized
- Array slices can be defined by leaving out indexes: a[0] is 1-dimensional array with elements of first row and a[,0] is 1-dimensional array with elements of first column
- Array slices can be assigned: a[0]=[1,2,3,4] assign first row and increase a to 3x4 matrix
- Arrays can be compared for equality and inequality ( == != )
- Copy array data into table while keeping table-expressions: table[]=array

Miscellaneous
- Value can be stored in global variables
- There are predefined variables that can be used:
    - help; evaluates to this help page
    - vars; evaluates to a list of all variables
    - __precision; can be set to an integer, used to render the number of decimal positions of reals
    - __monthnameN; names of the months to use (see above)
    - __daynameN; names of the days to use (see above)
- Comments are text between /* and */ and are ignored

Examples
    Loop that simulates an array:
  for (i=0, i<10, i=i+1) eval("variable"+i+"=i*10")
    Use of a table:
  a=table(_, if(_>=0, _, -_), interval(_<10, _<20, _==20));
  a["A", 1] = [ "less than 10", "less than 20", "equal to 20", "greater than 20" ];
  a["A", 1, 15];  /* "less than 20" */
  a["A", -1, 20];  /* "equal to 20" */
    Conditional string concatenation:
  if (totalTime!=0, "total time is: "+time(totalTime)+"\n", "")
    Faculty: Loop x times and multiply. Void each term and add final result:
  x=5; for (n=(fac=1)+1, n<=x, n=n+1) fac=fac*n; fac
    Number e: Iterate 15 times and add the inverse of the faculty to the total:
  __precision=8; for (e=n=0, n<15, n=n+1) (for (m=(fac=1)+1, m<=n, m=m+1) fac=fac*m; 1/fac);
    Friday 13th for the next 10 years:
  for (end=(d=dateval();d=d-datepart(d,"D")+5)+365*10,d<=end,d=d+7) if(datepart(d,"d")==13, date(d)+"\n", "");

Give an expression ('help' for help):

Download

These are the available binaries:

Development

Below are some ideas that can be added in the future

Additions:
* Logical operators:
  - xor, ^: +-priority
* for() can be extended with the following 2 syntaxes:
  - for(string): The string gets parsed at runtime and should contain an ExpressionList with 3 elements. This can be conveniently used to register iteration as just one variable string.
  - for(item in array): Should this iterate through all elements of array or just the first dimension, so that item is a slice? We could also loop the size() and keys().
* Function definitions (user functions) (for this, the IdentTable probably needs to be scoped).
* Currently, external variables (variable values set by the caller) are readonly. This could be made writable as well (this will need a change in the API). If external variables are writable, they can be used to simulate function calls (much like hardware registers). Writing a value (possibly a concatenation of values) triggers the call and reading it yields the result.
However, a convenient way to register external functions would provide this functionality as well.
These mechanisms can also be used to read/write external arrays.
* In addition to the parsetree() function, also make an xmltree() function that generates the same tree, but in XML form. (In strings, escape all control characters and non-ASCII with &#dddd;)

* When saving the parse-tree in the client, we get a problem with the global IdentTable. This can be solved in different ways, e.g.:
- Make functions that a client can call to enter and exit a scope. One could also make functions to (re-)initialize the global table to the initial state.
- Automatically enter a scope in a compound subexpression. Also the {} could be used for that. This means that the expressions should all be inside a big compound expression (as a block).
- Don't use the global IdentTable, but give a table with each evaluation.
* Take care that the evaluations are serialized (e.g. by mutexes), because results are kept in the parse-tree and should not be shared between threads.

** To overcome the above two points, the best course would be to separate the function (parse-tree) and data (node-value). These node-values can be maintained in an object, which acts like a key-value dictionary. The keys can be ids that identify the nodes in the parse-tree.
The (root-)IdentTable is still global (due to the global scope of the variables), but access to it is provided through this data-object. If necessary, locking must be performed by either the data-object or the IdentTable.
The execution will now be as follows:
1) Parse the expression.
2) Initialize an id generator visitor and send that through the parse-tree.
3) Initialize a data-object with a reference to the IdentTable.
4) Execute the parse-tree with the data-object as input. Each node can access the value of its inputs via their ids in the data-object.
The data-object could contain an hierarchical IdentTable. I.e. a global part and a local part where the local part is local to this data-object and is only used for this evaluation. There are two additional issues with this. Firstly, the update of existing variables. This should always be in the level where the original variable was found. Secondly, the creation of new variables. This could be 'always local' or 'always global' or 'dependent on a keyword, like 'var' (change default global to local) in JavaScript or 'global' (change default local to global) in PHP'.
What about a three-level IdentTable: global, thread and function/block (currently, function/block does not yet exist)?
It might be more convenient to use prefixes, like vim does (g: for global, etc).