Instead of giving a formal (and presumably rather boring) language specification, the most important and interesting aspects of MOSTflexiPL shall be introduced in the sequel with various examples, ranging from very simple operator definitions to rather advanced applications.
A simple operator such as the square operator mentioned in “Background and Motivation” can be defined as follows:
["n" : int] parameter list n "²" : int signature and result type { n * n } implementationSuch an operator declaration consists of a parameter list enclosed in square brackets, a signature given before the colon, a result type written afterwards, and an implementation enclosed in curly braces.
The signature is made up of parameters and string literals
and defines the operator's syntax,
i. e., the syntactic appearance of its applications:
Every parameter is a placeholder for an operand,
i. e., a subexpression of the corresponding type,
while a string literal (that might contain arbitrary characters)
defines a name of the operator
that must be written literally (but without the quotation marks)
in applications.
Therefore,
5²
and (2+3)²
will be legal applications of the square operator defined above,
because 5
and (2+3)
are subexpressions with type int
and ²
is the operator's name.
The parameter list consists of parameter declarations
(separated by semicolons if necessary),
which are actually simple operator declarations, too,
consisting (for the moment) only of a signature and a result type.
Therefore,
every occurrence of the parameter n
is actually an application of the nullary operator
whose signature is "n"
and whose result type is int
.
Finally,
the implementation is an arbitrary expression,
whose type must be equal to the result type
and that computes the value returned by an operator application.
For example,
the value of the expression (2+3)²
is computed
by initializing the parameter n
with the value of the corresponding operand (2+3)
(i. e., 5)
and afterwards evaluating the implementation n * n
,
yielding the int
value 25.
An operator |•|
returning the absolute value of its operand
(whose position is indicated by •
)
can be defined as follows:
["x" : int] "|" x "|" : int { if x >= 0 then x else −x end }Here, the implementation consists of an application of the predefined conditional operator
if • then • else • end
that returns either the value of its second or its third operand
depending on the truth value of its first operand.
Because the signature consists of a literal vertical bar,
the int
parameter x
,
and another literal vertical bar,
|0|
and |2−5|
are examples of applications of this operator.
To give another example,
the following declaration defines a •!
operator
that recursively computes the factorial of n
:
["n" : int] n "!" : int { if n <= 1 then 1 else (n−1)! * n end }Similar to recursion in other programming languages, it is possible to use the operator just being defined in its own implementation.
By combining the operators defined so far,
|2−5|!²
would be a legal expression
yielding the int
value 36.
The predefined left-associative infix operator •;•
denotes sequential composition of subexpressions,
i. e., it evaluates its operands from left to right
and returns the value of the second operand.
Because declarations are expressions, too, the semicolon can also be used to separate multiple successive declarations or to intermix declarations with expressions, e. g.:
["n" : int] n "²" : int { n * n }; ["x" : int] "|" x "|" : int { if x >= 0 then x else −x end }; ["n" : int] n "!" : int { if n <= 1 then 1 else (n−1)! * n end }; print |2−5|!²The above lines constitute a complete MOSTflexiPL program consisting of the three operator declarations introduced above and an application of the predefined
print•
operator
printing the value of its operand.
Because the semicolon operator is an infix operator that separates multiple subexpressions instead of a postfix operator terminating individual subexpressions (as in many other languages), there is no final semicolon at the end of the program.
The expression |2−5|!²
shown before
has been carefully chosen
as an example containing several different operators
– both predefined and user-defined ones –
while still possessing a unique interpretation.
Many other expressions involving these and other operators, however,
are ambiguous,
i. e., they have two or more possible interpretations.
For example,
2−5!
might either mean (2−5)!
or 2−(5!)
,
while −3²
might be interpreted as either (−3)²
or −(3²)
.
To rule out one of the possibilities in each case,
it can be placed in an exclude declaration as follows:
excl (2−5)! end; excl (−3)² endGenerally speaking, the operand inside
excl • end
is a prototypical expression
(i. e., the actual values of its leaf operands do not matter)
whose hierarchical structure as an operator tree
can only be constructed by using explicit parentheses.
Put differently,
if the same expression would be written without parentheses,
the compiler is not allowed to interpret it in the given way.
Because the operand of excl • end
might be any expression involving any number of parenthesized subexpressions,
it is possible to specify multiple exclude rules within a single declaration,
e. g.:
excl (0+0)²; (0−0)²; (0*0)²; (0/0)²; (0%0)²; (−0)² end; excl (0+0)!; (0−0)!; (0*0)!; (0/0)!; (0%0)!; (−0)! endAfter these declarations, every possible combination of the new square and factorial operators with any of the predefined arithmetic operators should have a unique interpretation, where the postfix operators bind more tightly than prefix and infix operators according to usual mathematical habits.
It should be noted
that many other expressions
do not require explicit exclude declarations to make them unique.
For example,
the only meaningful interpretation of |2−5|!²
is ((|(2−5)|)!)²
,
because the circumfix operator |•|
acts itself as a kind of parentheses,
while multiple postfix operators
can only be applied from left to right.
For expressions such as 0 < 3²
,
which might be interpreted
as either (0 < 3)²
or 0 < (3²)
from a purely syntactical point of view
(that is taken by most other languages),
the immediate consideration of typing rules
helps to rule out semantically meaningless interpretations
such as (0 < 3)²
in this example,
because the comparison operator •<•
has result type bool
,
but the square operator •²
requires an operand of type int
.
A constant is a nullary operator whose application always returns the same value. Its declaration consists of a signature (solely made up of string literals because there are no parameters), an optional result type, and an initialization, separated by a colon and an equals sign. If the result type is omitted, it is automatically deduced from the type of the initialization, e. g.:
"N" : int = 10; result type given explicitly "M" := N² result type deduced from initialization
If the initialization (and the equals sign) is omitted, the constant is initialized with an automatically generated unique value, e. g.:
"Color" : type; "red" : Color; "green" : Color; "blue" : ColorIn the first line,
Color
is defined as a unique constant of the predefined meta-type type
,
i. e., it becomes a new type.
In the subsequent lines,
red
, green
, and blue
are each defined as a unique constant of this new type Color
,
i. e., they can be used similar to enumeration values in other languages.
Viewed differently, unique constants of a particular type can also be used similar to objects in other languages, e. g.:
"Person" : type; "first" "person" : Person; "second" "person" : PersonAgain,
Person
denotes a new type,
while first person
and second person
(note that signatures might consist of multiple string literals)
denote unique instances of this type.
At the moment, however,
these instances are not very useful,
because Person
does not possess any attributes yet.
These will be defined later on in “Open Types.”
A variable is a constant whose type is a variable type
(denoted by a postfix question mark •?
)
and whose value is a unique mutable storage cell,
e. g.:
"x" : int?A variable's content is queried with a prefix question mark
?•
and changed with the assignment operator •=•
,
e. g.:
x = 10; assign 10 to the storage cell denoted by x x = ?x + 1 assign the value of x plus 1 (i. e., 11) to xBecause using the prefix question mark is rather tedious, there is an implicit type conversion that automatically converts a variable of type
T?
to its value of type T
on demand,
similar to the implicit L-value to R-value conversion
known from C and other languages.
Therefore,
x = ?x + 1
can be written more conveniently as x = x + 1
.
On the other hand, the explicit designation of variable types with the postfix question mark allows, for example, to pass variables as parameters and result values of operators and to clearly distinguish them from non-variable values, e. g.:
["x" : int?] "++" x : int? { x = x + 1; x }; ["x" : int?] x "++" : int { "xx" : int = x; ++x; xx }Similar to the increment operators of C, the prefix
++
operator increments the value of the variable x
and returns the variable itself
(in C it returns an L-value),
while the postfix operator also increments the variable's value,
but returns its original value xx
(in C it returns an R-value).
Therefore,
applying the prefix operator multiple times (e. g., ++ ++ x
)
would be type-correct,
because the type of the subexpression ++x
is again int?
,
while applying the postfix operator multiple times (e. g., x ++ ++
)
would be erroneous,
because the type of the subexpression x++
is int
while the operator requires an operand of type int?
.
Applying both operators at once, e. g., ++x++
,
would be uniquely interpreted as (++x)++
,
because ++(x++)
would not be type-correct
(even though a C compiler only tries to interpret it the latter way,
because postfix operators
generally bind more tightly than prefix operators there,
and therefore reports an error).
To give another example,
the following •↔•
operator
swaps the contents of two int
variables x
and y
and returns the second variable:
["x" : int?; "y" : int?] x "↔" y : int? { "z" : int = x; x = y; y = z; y }For simple applications such as
a ↔ b
,
the operator's result type and value is irrelevant.
But to allow chained applications
such as a ↔ b ↔ c ↔ d
,
it is necessary that the result type is again int?
.
And because the actual result value is the second variable,
the overall effect of such a chain
will be a circular right-shift of all variable values.
However,
to make chained applications of the •↔•
operator
syntactically unique,
the operator must be defined either left- or right-associative,
i. e., a ↔ b ↔ c ↔ d
should be either interpreted as ((a ↔ b) ↔ c) ↔ d
or as a ↔ (b ↔ (c ↔ d))
.
(Interestingly,
the kind of associativity does not influence the effect of chained applications.
To achieve a left- instead of a right-shift,
the operator had to return the first instead of the second variable.)
This can also be achieved with an exclude declaration,
e. g., to make the operator left-associative
by excluding the right-to-left grouping:
"x" : int?; "y" : int?; "z" : int?; excl x ↔ (y ↔ z) end
The swap operator •↔•
can easily be turned into a generic operator
that works for variables of any type T?
by extending it with a deduced parameter T
:
["T" : type; "x" : T?; "y" : T?] x "↔" y : T? { "z" : T = x; x = y; y = z; y }To express that the type of the parameters
x
and y
as well as the operator's result type
shall be T?
for some arbitrary type T
,
T
is defined beforehand as a parameter with type type
.
However,
in contrast to the other parameters
(which will be called explicit parameters now),
T
does not appear anywhere in the operator's signature,
which implies that its value will not be passed as an explicit operand
in applications of the operator.
Instead,
T
appears in the type of one or more other parameters
(x
and y
in this example)
and therefore its value can be automatically deduced
from the types of the corresponding operands.
For that reason,
parameters of this kind
(which are quite similar to, e. g., template parameters in C++
and type parameters in Java)
are called deduced parameters.
Therefore,
if a
and b
are variables of type int?
,
T
will be deduced as int
in the operator application a ↔ b
.
If they would have type float?
,
T
would accordingly be deduced as float
.
If they had different types
– or if their type would not match the “pattern” T?
–,
the expression a ↔ b
would be erroneous.
If a variable has not been assigned any value yet, its storage cell is logically empty, i. e., it does neither contain an undefined random value (as, e. g., local variables in C) nor a specific default value (as, e. g., zero for instance variables of numeric type in Java). Instead, it actually contains nothing or nil, which is different from any real value.
Nevertheless,
querying the content of such a variable
(either explicitly by applying the prefix question mark operator
or implicitly by means of the type conversion mentioned above)
is well-defined and actually returns nil.
More generally,
nil is always returned in situations
where no sensible real value is available.
For example,
an operator whose implementation is empty,
always returns nil.
Similarly,
an application of the one-way conditional operator if • then • end
returns nil,
if its condition is false.
Furthermore,
semantically undefined computations such as integer division by zero
return nil.
(For other situations,
cf. “Sequences.”)
In some sense, the concept of nil is similar to the null pointer or reference in other languages, but it is not restricted to pointer or reference types (which are not available as such in MOSTflexiPL anyway), but is applicable to all (predefined and user-defined) types. Furthermore, there is no predefined generic constant that directly returns nil; if it is explicitly required as a “value,” it can be obtained indirectly from one of the situations mentioned above.
To find out whether the value of an expression is a real value or nil,
it can be used as the first operand of one of the conditional operators
if • then • end
and if • then • else • end
,
which treat all real values (including, e. g., 0
) of any type
as being “true”
and nil as being “false”
(which implies that the bool
value false
is identical to nil).
For example:
["x" : int] "is" "nil" x : bool { if x then false else true end }; "a" : int?; is nil ?a; true a = 0; is nil ?a false
Because the value of expressions of almost any kind might be nil, operators must handle this case appropriately. For the predefined operators, the following rules apply:
If at least one operand of an arithmetic operation is nil, the result will be nil, i. e., nil values propagate through arithmetic operations.
Nil is incomparable with real values,
i. e., it is neither less than nor equal to nor greater than any real value,
i. e., all respective comparisons will return false.
However,
nil is equal to itself,
which distinguishes it from the otherwise similar concept of NaN
(not a number),
where NaN is considered different from NaN.
Querying the content of a nil variable returns nil,
while assigning to a nil variable has no effect.
This means that a nil variable
behaves similar to the Unix null device /dev/null
,
which always returns EOF (i. e., nothing or nil) when being read
and discards everything being written into it.
Printing a nil value with the predefined print•
operator
prints nothing
(which is more logical than a surrogate word such as nil
or null
).
Further rules can be found in “Sequences.”
A parameterless operator can be defined by simply omitting the parameter list (and the square brackets) in an operator declaration, e. g.:
"a" := 257; "b" := 17; "m" := 256 * 256; "x" : int?; x = 1; "???" : int { x = (a * x + b) % m }; print ???Every application of the operator
???
returns a pseudo-random number x
computed by the formula used in its implementation
from the previous value of x
and the constants a
, b
, and m
.
(As in C,
the assignment operator returns the assigned value.)
Even though the application of this operator
looks basically the same as the use of a constant such as m
,
there is of course a significant difference:
The expression used to initialize a constant
is evaluated once when the constant is defined,
and every subsequent use of the constant returns the value computed there.
In contrast,
the expression used as the implementation of an operator
is not evaluated when the operator is defined,
but rather each time the operator is applied afterwards.
The intuitive substring syntax mentioned in “Background and Motivation” is an example of an operator possessing multiple parameters and multiple names:
["s" : string; "i" : int; "j" : int] s "[" i ".." j "]" : string { "i'" := if i >= 1 then i else 1 end; "j'" := if j <= #s then j else #s end; "t" : string?; t = ""; "k" : int?; k = i'; while k <= j' do t = t, s[k]; ++k end; t }Its implementation uses several predefined operators not introduced so far: For some string
s
,
#s
yields its length,
i. e., the number of characters it contains,
while s[k]
returns its k
-th character counted from 1
(because humans usually start counting there).
The •,•
operator can be used (amongst others)
to concatenate a string with a single character.
while • do • end
denotes a loop
that repeatedly evaluates its operands in turn
until the first operand's value becomes nil.
The constants i'
and j'
(demonstrating again that operator names might contain arbitrary characters)
are used to make the substring operator
robust against out of range index values,
i. e., substrings starting before position 1
actually start at the first character,
while substrings exceeding the length of s
actually stop at its last character.
Based on this operator, it is easily possible to define the variations also mentioned in “Background and Motivation”, e. g.:
["s" : string; "i" : int; "j" : int] s "[" i ".." j ")" : string { s[i..j−1] }Even further variations with rather self-explanatory meaning are also possible, e. g.:
["s" : string; "j" : int] s "[" ".." j "]" : string { s[1..j] }; ["s" : string] s "(" ".." "]" : string { s[2..#s] }
A static operator is an operator whose result value is not computed by a user-defined implementation, but rather internally by the run-time system according to the following rule: Two applications of static operators will return the same value if and only if the operators are the same and they are applied to the same operand values. This implies that two static expressions, i. e., expressions containing only static operators, will return the same value if and only if they are structurally equal, i. e., if they have the same top-level operator and operands which are also pairwise structurally equal.
Constants without initialization (cf. “Constants and Types”) are a simple special case of static operators, which can be used, amongst others, to define new atomic types and objects or values of these types, e. g.:
"Color" : type; "red" : Color; "green" : Color; "blue" : Color; "Person" : type; "first" "person" : Person; "second" "person" : PersonAccording to the above rules,
Color
and Person
denote different types,
because they are different static operators.
Likewise,
red
, green
, and blue
represent different Color
values,
while first person
and second person
denote different Person
values
(or “objects”).
Static operators with parameters can be used, amongst others, as type constructors to define parameterized or structured types, e. g.:
["T" : type] "List" T : type; ["T" : type; "M" : int; "N" : int] T "{" M "x" N "}" : typeAccording to the above rules, the static operator
List•
maps each type T
to a unique other type List T
,
e. g., List int
, List string
, or List List Person
.
In the same way,
the static operator •{•x•}
maps each combination of some type T
and some int
values M
and N
to some unique type T{MxN}
intended to represent MxN
matrices with elements of type T
,
e. g., int{3x2}
or float{3x4}
.
In contrast to static operators, operators possessing a user-defined implementation (in curly braces) are called dynamic operators.
Even though it is possible in principle
to define dynamic operators with result type type
,
this is not very useful in practice,
because expressions that shall be used as types
(e. g., as the parameter or result types of operators)
must be static expressions
in order to make equality of types decidable at compile time
by simply comparing type expressions for structural equality (cf. above).
As another example of static operators,
the following operator
maps each pair of types X
and Y
to a unique type X → Y
,
intended to represent mappings from X
to Y
or attributes of type X
with target type Y
:
["X" : type; "Y" : type] X "→" Y : typeSimilarly, the following static operator maps each combination of an object or value
x
of some type X
and an attribute a
of this type with some target type Y
to a unique variable with content type Y
:
["X" : type; "Y" : type; "x" : X; "a" : X → Y] x "." a : Y?These operators can be used to define flexibly extensible data structures, e. g.:
"Person" : type; "name" : Person → string; "p1" : Person; "p2" : Person; p1.name = "Alice"; p2.name = "Bob"; print p2.name; "father" : Person → Person; p1.father = p2; print p1.father.nameBecause every parameterless static operator has a unique result value,
Person
denotes a unique type,
name
denotes a unique attribute with type Person → string
,
and p1
and p2
denote unique objects of type Person
.
According to the definition of the dot operator,
the subexpression p1.name
returns a unique variable with type string?
,
which is assigned the string literal "Alice"
.
Likewise,
p2.name
returns another unique variable with type string?
,
which is assigned the string literal "Bob"
.
Generally speaking,
for every person object p
,
p.name
returns a different unique variable with type string?
,
which can be used to store the name of person p
.
On the other hand,
because a static operator always returns the same value
when being applied to the same operand value(s),
repeated invocations of p.name
for the same person p
will always return the same variable.
Therefore,
printing (the content of) the variable returned by p2.name
will print the string Bob
that has been previously assigned to this variable.
Similarly,
father
denotes a unique attribute with type Person → Person
,
and thus p1.father
returns a unique variable with type Person?
,
which is assigned the value (returned by the operator) p2
.
Therefore,
printing p1.father.name
will also print Bob
.
As the example demonstrates, it is not necessary to define the attributes of a type all at once, because it is always possible to add new attributes later on. Therefore, data types defined that way are not “closed” like records, structs, or classes in most other languages, but “open” for modular, incremental extensions, just like “open types” in the predecessor project APPLEs.
Because a newly created variable that has not been assigned any value yet
contains nil,
querying an attribute of an object that has not been assigned yet
(e. g., p2.father
) also returns nil.
Likewise,
querying an attribute of a nil object (e. g., p2.father.father
)
returns nil.
If one does not like the dot operator for accessing attributes, one might define arbitrary other operators possessing the same behaviour, e. g.:
["X" : type; "Y" : type; "x" : X; "a" : X → Y] x "'s" a : Y? { x.a }; ["X" : type; "Y" : type; "x" : X; "a" : X → Y] "the" a "of" x : Y? { x.a }Using these operators, one might write, e. g.,
print the name of p1's father
instead of print p1.father.name
.
It is also possible to define constructor-like operators with arbitrary syntax, e. g.:
["name" : string] "new" "person" "with" "name" name : Person { "p" : Person; p.name = name; p }; "p3" := new person with name "Claire"
For some type T
,
the type T*
denotes sequences with element type T
,
in the same way the Kleene closure Σ*
denotes the set of all finite sequences (or “words”)
of elements from the set Σ
.
The type string
that has been used in “Substrings”
is actually just a synonym for char*
,
i. e., it denotes sequences of characters,
and the operators for strings mentioned there
can be used for other types of sequences, too.
More specifically,
for elements x
and y
of some type T
,
sequences s
and t
of the corresponding sequence type T*
,
and an int
value i
:
#s
returns the length of s
,
i. e., the number of elements it contains.
s[i]
returns the i
-th element of s
counted from 1.
If i
is not between 1 and #s
(both inclusively),
s[i]
returns nil.
x, y
returns a new sequence of length 2
containing the elements x
and y
.
s, y
returns a new sequence of length #s + 1
containing the elements of s
followed by the element y
.
x, t
returns a new sequence of length 1 + #t
containing the element x
followed by the elements of t
.
s ++ t
returns a new sequence of length #s + #t
containing the elements of s
followed by the elements of t
.
By combining the first two comma operators,
sequences can be easily constructed by enumerating their elements,
e. g., 2, 3, 5, 7
.
In this expression,
the first comma combines the int
values 2
and 3
into a sequence of type int*
,
while the other commas append the int
value 5
resp. 7
to the previously constructed sequence.
(The expression is unambiguous,
because the operators are left-associative.)
An empty sequence s
can be obtained by a constant declaration "s" : T*
,
and a one-element sequence can then be constructed as s, x
or x, s
.
The concatenation of two sequences
is not provided by another overloaded comma operator,
because then s, t
would be ambiguous:
It could mean the concatenation of s
and t
returning a sequence of the same type T*
,
or it could mean the combination of s
and t
into a two-element sequence of type T**
.
In particular,
this ambiguity would arise for strings,
e. g., "hello", "world"
,
because strings are sequences of characters.
A nil sequence is almost indistinguishable from an empty sequence, i. e., the length of a nil sequence is 0 (zero, which is a real value, not nil), and concatenating a nil sequence with another sequence or an element in either order returns the other sequence or a sequence containing this element, respectively. Nil elements are treated just like other elements in concatenations, i. e., nil does not propagate through concatenations as it does through arithmetic operations. However, querying any element of a nil sequence returns nil.
The only noticeable difference between an empty and a nil sequence is the fact that an empty sequence is a real value that is treated as “true” by the conditional operators, while a nil sequence is treated as “false” (cf. “Nil”).
To give an example,
the following operator [•..•]
recursively constructs and returns a sequence
that contains all integers between i
and j
, both inclusively:
["i" : int; "j" : int] "[" i ".." j "]" : int* { if i <= j then i, [i+1..j] end }If
i
is less than or equal to j
,
the result sequence [i..j]
consists of the element i
followed by the elements of the sequence [i+1..j]
.
Otherwise,
the one-way conditional operator returns nil,
i. e., a sequence without elements.
Similar to the substring operators defined earlier,
it would be easy to add further operators
[•..•)
, (•..•]
, and (•..•)
to construct sequences where one or both of the border elements are excluded.
By adding a deduced parameter T
and replacing all occurrences of string
with T*
,
the substring operators defined in “Substrings”
can easily be generalized to generic subsequence operators,
e. g.:
["T" : type; "s" : T*; "i" : int; "j" : int] s "[" i ".." j "]" : T* { "i'" := if i >= 1 then i else 1 end; "j'" := if j <= #s then j else #s end; "t" : T*?; "k" : int?; k = i'; while k <= j' do t = t, s[k]; k end; t }; ["T" : type; "s" : T*; "i" : int; "j" : int] s "[" i ".." j ")" : T* { s[i..j−1] }; ["T" : type; "s" : T*; "j" : int] s "[" ".." j "]" : T* { s[1..j] }; ["T" : type; "s" : T*] s "(" ".." "]" : T* { s[2..#s] }Replacing
string
with T*
results in t
having type T*?
,
i. e., it is a variable whose content is a sequence of elements of type T
.
Therefore,
the assignment t = ""
has been removed,
because the string literal ""
with type char*
cannot be assigned to this variable.
In fact,
the assignment is not necessary at all,
because the initial value of t
is nil,
i. e., a sequence without elements.
Quite similar to functions in functional languages, operators are “first-class values” which can be passed as parameters to other operators, returned as the result of operator applications, stored in variables, and so forth. Therefore, it is possible to define “higher-order operators,” e. g.:
[ "X" : type; "Y" : type; "s" : X*; "t" := ["x" : X] "trans" x : Y {} ] s "map" t : Y* { if #s > 0 then trans s[1], s(..] map t end }According to its signature, the operator
•map•
is applied to a sequence s
of some type X*
and an operator t
,
which is actually an alias for a unary operator trans•
with parameter type X
and some result type Y
.
Note that a simple parameter such as s
can be used directly in an operator's signature,
because s
constitutes a complete and correct subexpression,
viz an application of the nullary operator s
(cf. “Square”).
In contrast,
a parameter such as trans•
that has itself parameters
cannot be used that way,
because trans
without an operand
would be an incomplete and therefore erroneous expression.
Therefore,
the alias t
is required
to reference the parameter from within the signature.
Formally,
t
is a constant
(i. e., a nullary operator, cf. “Constants and Types”)
that is initialized
with the value of the declaration ["x" : X] "trans" x : Y {}
,
and since a declaration is an expression
that returns the operator declared by it,
applications of t
actually yield the operator trans•
.
The curly braces at the end of its declaration
designate this operator as a dynamic in contrast to a static operator
(cf. “Static Operators and Expressions”);
if they were omitted,
•map•
would not allow dynamic operators
as actual values for this parameter.
Because a declaration returns the operator declared by it,
a sample application of map
might look as follows:
(p1, p2) a sequence of Persons map ["p" : Person] "name" p : string { p.name } an operator that maps Persons to stringsThe result of
•map•
is a sequence of type Y*
(where Y
represents the result type
of the actual transformation operator trans•
),
whose elements are computed from the elements of s
by applying the operator trans•
to them.
Therefore,
the above application of •map•
returns a sequence
containing the names of p1
and p2
,
i. e., the string
values "Alice"
and "Bob"
(cf. “Open Types”).
The result sequence is computed recursively
by applying the sequence concatenation operator •,•
(cf. “Sequences”)
to trans s[1]
,
i. e., the result of applying trans•
to the first element s[1]
,
and s(..] map t
,
i. e., the result of applying •map•
to the remaining elements s(..]
(cf. “Subsequences”)
and the same operator t
.
(Here,
the alias t
,
whose application returns the actual operator trans•
,
is useful once more.)
If the length of the input sequence s
is zero,
the one-way conditional operator if • then • end
returns a nil sequence of type Y*
(cf. “Nil”),
whose length is zero, too.
In a similar way,
higher-order operators filter
and fold
can be defined as follows:
[ "T" : type; "s" : T*; "p" := ["x" : T] "pred" x : bool {} ] s "filter" p : T* { if #s > 0 then "t" := s(..] filter p; if pred s[1] then s[1], t else t end end }; [ "T" : type; "s" : T*; "z" : T; "c" := ["x" : T; "y" : T] x "comb" y : T {} ] s "fold" z c : T { if #s > 0 then s(..] fold (z comb s[1]) c else z end }
filter
is applied to a sequence s
of some type T*
and a predicate pred•
and returns a sequence of the same type
that contains all elements of s
satisfying the predicate.
fold
is also applied to a sequence s
of some type T*
,
a “zero value” z
of type T
,
and a binary combination operator •comb•
.
It effectively computes and returns the value of
z comb s[1] comb s[2] comb ... comb s[#s]
(if •comb•
would be left-associative).
As well-known from functional programming,
map
, filter
, and fold
typically unfold their maximum power when used in combination,
e. g.:
[1..100] filter ["x" : int] "odd" x : bool { x % 2 == 1 } map ["x" : int] "square" x : int { x² } fold 0 ["x" : int; "y" : int] x "add" y : int { x + y }This computes the sum of the squares of the odd numbers from 1 to 99.
The following sort
operator
is another example of a higher-order operator
that also uses some of the subsequence operators introduced earlier:
[ "T" : type; "s" : T*; "lt" := ["x" : T; "y" : T] x "<" y : bool {} ] s "sort" lt : T* { if #s <= 1 then s else ["s" : T*; "t" : T*] s "merge" t : T* { if #s == 0 then t else if #t == 0 then s else if s[1] < t[1] then s[1], (s(..] merge t) else t[1], (t(..] merge s) end end end }; "m" := #s/2; (s[..m] sort lt) merge (s(m..] sort lt) end }In the non-trivial case, the sequence
s
is sorted
by splitting it into (almost) equally sized parts s[..m]
and s(m..]
,
which are sorted recursively and merged afterwards
by a local auxiliary operator •merge•
that is also implemented recursively.
In the subexpression s[1] < t[1]
,
the parameter •<•
of the enclosing sort
operator is applied
to compare the first elements of the sequences s
and t
.
The following example employs the sort
operator
to sort a sequence of strings in descending order of their lengths:
("This", "is", "a", "sequence", "of", "strings") sort ["x" : string; "y" : string] x "cmp" y : bool { #x > #y }
If a sequence s
of numbers shall be sorted in the “natural way”
according to the predefined •<•
operator,
there are different possibilities,
none of which is really convenient:
s sort ["x" : int; "y" : int] x "cmp" y : bool { x < y }; s sort ^ ["x" : int; "y" : int] x "<" y : bool; "lt" := ^ ["x" : int; "y" : int] x "<" y : bool; s sort ltIn the first case, an auxiliary operator
•cmp•
is used
that simply calls the predefined •<•
operator.
In the second case,
the predefined •<•
operator
is directly referenced by the redeclaration
^ ["x" : int; "y" : int] x "<" y : bool
.
In general,
a redeclaration is a declaration without an implementation (curly braces)
that is prefixed with a caret;
it does not declare a new operator,
but rather returns an operator with the same syntax and type
that has been declared earlier.
Therefore,
redeclarations can be used to uniquely identify existing operators.
In the third case,
the redeclaration of the predefined •<•
operator
is used to initialize the constant lt
whose value can then be passed to the sort operator.
However,
the most convenient way of sorting a sequence s
according to the natural order of its element type
would be to simply write s sort
.
In that case,
the appropriate comparison operator •<•
should be passed implicitly.
This can be achieved by replacing the explicit parameter •<•
of the sort operator with an implicit one as follows:
[ "T" : type; "s" : T*; ["x" : T; "y" : T] x "<" y : bool {} ] s "sort" : T* { if #s <= 1 then s else ...... local operator merge as before "m" := #s/2; (s[..m] sort) merge (s(m..] sort) end }Because the parameter
•<•
(or an alias for it)
does not appear in the operator's signature anymore,
it is no longer an explicit parameter of the operator
and therefore it need and must not be passed as an explicit operand anymore.
In contrast to a deduced parameter, however,
that appears in the type of one or more other parameters,
the value of this parameter cannot be deduced from the type of other operands.
In fact,
it constitutes a third kind of parameters called an implicit parameter.
When an operator possessing an implicit parameter is applied,
the compiler looks for an operator
that has the same syntax and type as the parameter
and that is visible at the point of application
and passes it as the actual value of the parameter.
(If no such operator exists,
the entire application would be erroneous.)
Therefore,
if s
denotes a sequence of type int*
,
s sort
is a correct application of the new sort operator
that implicitly receives the predefined •<•
operator for int
values.
If it would denote a sequence of floating point numbers,
s sort
would implicitly receive
the predefined •<•
operator for float
values.
If s
would be of type Person*
,
s sort
would be erroneous,
because there is no •<•
operator with parameters of type Person
that could be passed implicitly.
To provide a maximum degree of flexibility,
it is in fact advisory to provide both kinds of sort operators,
one with an implicit parameter
and another one with an explicit parameter.
However,
to avoid nasty code duplication,
one of the operators should simply call the other one.
The most convenient way to do this,
is to call the one with the implicit parameter
from the one with the explicit parameter,
because inside the latter's implementation
there is a suitable •<•
operator that can be passed implicitly,
viz the parameter aliased to lt
:
[ "T" : type; "s" : T*; ["x" : T; "y" : T] x "<" y : bool {} ] s "sort" : T* { ...... complete implementation as above }; [ "T" : type; "s" : T*; "lt" := ["x" : T; "y" : T] x "<" y : bool {} ] s "sort" lt : T* { s sort simply call the other sort operator }
The syntactic flexibility of MOSTflexiPL actually removes the distinction between language designers and users: Every user of the language can become a language designer if he wants. Every operator declaration introduces new syntax along the way and therefore constitutes a – more or less significant – language extension.
For advanced applications it is necessary
to precisely control the visibility (and thus the applicability) of operators
resp. language extensions.
For example,
we might want to define an operator let • in • end
,
where operators declared in its first operand
shall be visible in its second operand,
but not beyond an application of the operator.
Operators declared in its second operand, however,
should be visible afterwards,
e. g.:
let "n" : int?; n = 0 in "inc" : int { n = n + 1 }; "dec" : int { n = n − 1 } end; inc; decThe operators
inc
and dec
declared between in
and end
shall be visible after the end
(i. e., they shall be “public”),
while the variable n
declared between let
and in
shall only be visible between in
and end
(i. e., it shall be “private”).
To achieve that aim, import and export declarations can be included into the parameter list of an operator declaration, e. g.:
[ "X" : type; "Y" : type; deduced parameters "x" : X; "y" : Y; explicit parameters y :+ ^ | x; import declaration for y ^ :+ y export declaration ] "let" x "in" y "end" : Y { y }To precisely understand the meaning of these declarations, the following definitions are required:
Every (sub)expression of a program imports particular operators from its “parent” or from its preceding “siblings,” and it exports particular operators to its parent.
The operators imported by an expression are exactly those which are visible and thus applicable in this expression.
The top-level expression of a program imports all predefined operators.
A subexpression of another expression usually imports the same operators as the enclosing expression, i. e., imported operators are usually passed down to subexpressions.
For applications of a particular operator such as let • in • end
,
this default behaviour can be changed
by including one ore more import declarations
(denoted by the predefined infix operator •:+•
)
into the parameter list of its declaration,
e. g.:
y :+ ^ | xThis declares that the second operand (denoted by the corresponding parameter
y
)
of an application of let • in • end
imports the union
(denoted by the predefined infix operator •|•
)
of the operators imported by the entire application
(denoted by the predefined nullfix operator ^
)
and the operators exported by the first operand
(denoted again by the corresponding parameter x
).
A declaration exports the operator declared by it.
Other expressions usually export nothing.
For applications of a particular operator,
this default behaviour can be changed
by including an export declaration
(also denoted by the operator •:+•
)
into the parameter list of its declaration,
e. g.:
^ :+ ySimilar to an import declaration, this declares that the entire operator application (denoted by
^
,
this time appearing on the left hand side of •:+•
)
exports the operators exported by the second operand (y
).
let • in • end
as follows:
[ "X" : type; "Y" : type; "x" : X; "y" : Y; y :+ ^ | x; ^ :+ x | y ] x ";" y : Y { y }The only difference to
let • in • end
is the fact that applications of the semicolon operator
export the operators exported by both of its operands.
Therefore,
chained applications of the semicolon operator
actually “accumulate” all operators declared “earlier,”
e. g. (the artificial indentation shall visualize the actual operator tree):
"a" : int? ; a = 1 ; "b" : int? ; b = aThe declaration
"a" : int?
exports the operator declared by it,
i. e., the variable a
.
The first semicolon application passes it
(due to its import declaration y :+ ^ | x
)
from its left to its right operand
so that it can be used there.
Furthermore,
it exports it (due to the export declaration ^ :+ x | y
)
to its parent,
i. e., to the second semicolon application,
which in turn passes it from its left to its right operand,
where it is not used, however.
There,
the declaration "b" : int?
exports the variable b
.
Because a semicolon application
exports the operators exported by both of its operands,
the second semicolon application exports a
and b
.
Finally,
the third semicolon application
passes these operators from its left to its right operand
so that both a
and b
can be used there.
To give yet another example, the predefined control structure operators possess the following import (and no export) declarations:
[ "X" : type; "Y" : type; "x" : X; "y" : Y {}; y :+ ^ | x ] "if" x "then" y "end" : Y { ...... }; [ "X" : type; "Y" : type; "x" : X; "y1" : Y {}; "y2" : Y {}; y1 :+ ^ | x; y2 :+ ^ | x ] "if" x "then" y1 "else" y2 "end" : Y { ...... }; [ "X" : type; "Y" : type; "x" : X {}; "y" : Y {}; y :+ ^ | x ] "while" x "do" y "end" : int { ...... }
As already mentioned in “Approach,” control structures are operators, too, that can be easily defined by users.
For example,
an iteration for • in • do • end
through the elements of a sequence
can be defined and used as follows:
["T" : type; "var" : T?; "seq" : T*; "X" : type; "body" : X {}] "for" var "in" seq "do" body "end" : int { "i" : int?; i = 1; while i <= #seq do var = seq[i]; body; ++i end }; "x" : int?; for x in [1..100] do print x endBecause every operator must have a result type and yield a value,
int
is usually used as the result type of loop operators,
and by convention they return the number of iterations that have been performed.
Because the predefined while
loop adheres to this convention,
its value can be directly used as the result value of the new for
loop.
Because the loop body shall be re-evaluated in every iteration,
it cannot be passed by value as the other parameters.
Instead,
it must be passed as an unevaluated expression
that is evaluated each time
the parameter body
is used in the operator's implementation.
In fact,
a “call by value” parameter such as seq
corresponds to a constant that is initialized once per operator application
with the corresponding operand,
while a “call by expression” parameter such as body
corresponds to a parameterless dynamic operator
whose implementation is the corresponding operand.
Due to this similarity,
“call by expression” parameters
are designated with a trailing pair of curly braces.
(This usage of {}
is similar,
but not identical to that mentioned earlier in “Higher-Order Operators,”
where it has been used with parameters having themselves parameters,
while here it is used with a parameterless parameter.)
The operator's implementation
uses an auxiliary int
variable i
that is initialized to 1.
Then,
in every iteration of the predefined while
loop,
the actual loop variable var
(i. e., x
in the example application)
is set to the i
-th element of seq
(i. e., of the sequence constructed by [1..100]
,
cf. “Sequences”).
Afterwards,
the loop body (i. e., print x
) is evaluated,
and the auxiliary variable is incremented.
The loop terminates
when i
becomes greater than the length of seq
.
Because all names appearing in the actual loop body
(i. e., print
and x
)
are resolved and bound to the corresponding operators
before the expression is passed to the loop operator,
there is no danger of inadvertent name capture or “unhygiene.”
The following operator select • from • in • where • end
can be used to combine filtering and mapping the elements of a sequence
with a syntax resembling SQL.
Its implementation uses the loop operator defined above:
[ "X" : type; "Y" : type; "expr" : X {}; "var" : Y?; "seq" : Y*; "cond" : bool {} ] "select" expr "from" var "in" seq "where" cond "end" : X* { "res" : X*?; for var in seq do if cond then res = res, expr end end; res }; "x" : int?; select x² from x in [1..100] where x % 2 == 1 endThis operator can in turn be used to implement the “destructive” remove operator mentioned in “Background and Motivation” (the predefined prefix operator
~•
denotes logical negation):
[ "T" : type; "var" : T?; "seq" : T*?; "cond" : bool {} ] "remove" var "from" seq "where" cond "end" : T* { seq = select var from var in seq where ~cond end }; "s" : int*?; s = [1..100]; "x" : int?; remove x from s where x % 2 == 1 end