Language Tutorial with Examples

Instead of giving a formal (and presumably rather boring) language specification, the most important and interesting aspects of MOSTflexiPL shall be introduced in the sequel with various examples, ranging from very simple operator definitions to rather advanced applications.

Square

A simple operator such as the square operator mentioned in “Background and Motivation” can be defined as follows:

    ["n" : int]                     parameter list
    n "²" : int                     signature and result type
    { n * n }                       implementation
Such an operator declaration consists of a parameter list enclosed in square brackets, a signature given before the colon, a result type written afterwards, and an implementation enclosed in curly braces.

The signature is made up of parameters and string literals and defines the operator's syntax, i. e., the syntactic appearance of its applications: Every parameter is a placeholder for an operand, i. e., a subexpression of the corresponding type, while a string literal (that might contain arbitrary characters) defines a name of the operator that must be written literally (but without the quotation marks) in applications. Therefore, and (2+3)² will be legal applications of the square operator defined above, because 5 and (2+3) are subexpressions with type int and ² is the operator's name.

The parameter list consists of parameter declarations (separated by semicolons if necessary), which are actually simple operator declarations, too, consisting (for the moment) only of a signature and a result type. Therefore, every occurrence of the parameter n is actually an application of the nullary operator whose signature is "n" and whose result type is int.

Finally, the implementation is an arbitrary expression, whose type must be equal to the result type and that computes the value returned by an operator application. For example, the value of the expression (2+3)² is computed by initializing the parameter n with the value of the corresponding operand (2+3) (i. e., 5) and afterwards evaluating the implementation n * n, yielding the int value 25.

Absolute Value

An operator |•| returning the absolute value of its operand (whose position is indicated by ) can be defined as follows:

    ["x" : int]
    "|" x "|" : int
    { if x >= 0 then x else −x end }
Here, the implementation consists of an application of the predefined conditional operator if • then • else • end that returns either the value of its second or its third operand depending on the truth value of its first operand. Because the signature consists of a literal vertical bar, the int parameter x, and another literal vertical bar, |0| and |2−5| are examples of applications of this operator.

Factorial

To give another example, the following declaration defines a •! operator that recursively computes the factorial of n:

    ["n" : int]
    n "!" : int
    { if n <= 1 then 1 else (n−1)! * n end }
Similar to recursion in other programming languages, it is possible to use the operator just being defined in its own implementation.

By combining the operators defined so far, |2−5|!² would be a legal expression yielding the int value 36.

Sequential Composition

The predefined left-associative infix operator •;• denotes sequential composition of subexpressions, i. e., it evaluates its operands from left to right and returns the value of the second operand.

Because declarations are expressions, too, the semicolon can also be used to separate multiple successive declarations or to intermix declarations with expressions, e. g.:

    ["n" : int]
    n "²" : int
    { n * n };
 
    ["x" : int]
    "|" x "|" : int
    { if x >= 0 then x else −x end };
 
    ["n" : int]
    n "!" : int
    { if n <= 1 then 1 else (n−1)! * n end };
 
    print |2−5|!²
The above lines constitute a complete MOSTflexiPL program consisting of the three operator declarations introduced above and an application of the predefined print• operator printing the value of its operand.

Because the semicolon operator is an infix operator that separates multiple subexpressions instead of a postfix operator terminating individual subexpressions (as in many other languages), there is no final semicolon at the end of the program.

Exclude Declarations

The expression |2−5|!² shown before has been carefully chosen as an example containing several different operators – both predefined and user-defined ones – while still possessing a unique interpretation. Many other expressions involving these and other operators, however, are ambiguous, i. e., they have two or more possible interpretations. For example, 2−5! might either mean (2−5)! or 2−(5!), while −3² might be interpreted as either (−3)² or −(3²). To rule out one of the possibilities in each case, it can be placed in an exclude declaration as follows:

    excl (2−5)! end;
    excl (−3)² end
Generally speaking, the operand inside excl • end is a prototypical expression (i. e., the actual values of its leaf operands do not matter) whose hierarchical structure as an operator tree can only be constructed by using explicit parentheses. Put differently, if the same expression would be written without parentheses, the compiler is not allowed to interpret it in the given way.

Because the operand of excl • end might be any expression involving any number of parenthesized subexpressions, it is possible to specify multiple exclude rules within a single declaration, e. g.:

    excl (0+0)²; (0−0)²; (0*0)²; (0/0)²; (0%0)²; (−0)² end;
    excl (0+0)!; (0−0)!; (0*0)!; (0/0)!; (0%0)!; (−0)! end
After these declarations, every possible combination of the new square and factorial operators with any of the predefined arithmetic operators should have a unique interpretation, where the postfix operators bind more tightly than prefix and infix operators according to usual mathematical habits.

It should be noted that many other expressions do not require explicit exclude declarations to make them unique. For example, the only meaningful interpretation of |2−5|!² is ((|(2−5)|)!)², because the circumfix operator |•| acts itself as a kind of parentheses, while multiple postfix operators can only be applied from left to right.

For expressions such as 0 < 3², which might be interpreted as either (0 < 3)² or 0 < (3²) from a purely syntactical point of view (that is taken by most other languages), the immediate consideration of typing rules helps to rule out semantically meaningless interpretations such as (0 < 3)² in this example, because the comparison operator •<• has result type bool, but the square operator •² requires an operand of type int.

Constants and Types

A constant is a nullary operator whose application always returns the same value. Its declaration consists of a signature (solely made up of string literals because there are no parameters), an optional result type, and an initialization, separated by a colon and an equals sign. If the result type is omitted, it is automatically deduced from the type of the initialization, e. g.:

    "N" : int = 10;                 result type given explicitly
    "M" := N²                       result type deduced from initialization

If the initialization (and the equals sign) is omitted, the constant is initialized with an automatically generated unique value, e. g.:

    "Color" : type;
    "red" : Color;
    "green" : Color;
    "blue" : Color
In the first line, Color is defined as a unique constant of the predefined meta-type type, i. e., it becomes a new type. In the subsequent lines, red, green, and blue are each defined as a unique constant of this new type Color, i. e., they can be used similar to enumeration values in other languages.

Viewed differently, unique constants of a particular type can also be used similar to objects in other languages, e. g.:

    "Person" : type;
    "first" "person" : Person;
    "second" "person" : Person
Again, Person denotes a new type, while first person and second person (note that signatures might consist of multiple string literals) denote unique instances of this type. At the moment, however, these instances are not very useful, because Person does not possess any attributes yet. These will be defined later on in “Open Types.”

Variables

A variable is a constant whose type is a variable type (denoted by a postfix question mark •?) and whose value is a unique mutable storage cell, e. g.:

    "x" : int?
A variable's content is queried with a prefix question mark ?• and changed with the assignment operator •=•, e. g.:
    x = 10;                         assign 10 to the storage cell denoted by x
    x = ?x + 1                      assign the value of x plus 1 (i. e., 11) to x
Because using the prefix question mark is rather tedious, there is an implicit type conversion that automatically converts a variable of type T? to its value of type T on demand, similar to the implicit L-value to R-value conversion known from C and other languages. Therefore, x = ?x + 1 can be written more conveniently as x = x + 1.

On the other hand, the explicit designation of variable types with the postfix question mark allows, for example, to pass variables as parameters and result values of operators and to clearly distinguish them from non-variable values, e. g.:

    ["x" : int?]
    "++" x : int?
    {
        x = x + 1;
        x
    };
 
    ["x" : int?]
    x "++" : int
    {
        "xx" : int = x;
        ++x;
        xx
    }
Similar to the increment operators of C, the prefix ++ operator increments the value of the variable x and returns the variable itself (in C it returns an L-value), while the postfix operator also increments the variable's value, but returns its original value xx (in C it returns an R-value). Therefore, applying the prefix operator multiple times (e. g., ++ ++ x) would be type-correct, because the type of the subexpression ++x is again int?, while applying the postfix operator multiple times (e. g., x ++ ++) would be erroneous, because the type of the subexpression x++ is int while the operator requires an operand of type int?. Applying both operators at once, e. g., ++x++, would be uniquely interpreted as (++x)++, because ++(x++) would not be type-correct (even though a C compiler only tries to interpret it the latter way, because postfix operators generally bind more tightly than prefix operators there, and therefore reports an error).

To give another example, the following •↔• operator swaps the contents of two int variables x and y and returns the second variable:

    ["x" : int?; "y" : int?]
    x "↔" y : int?
    {
        "z" : int = x; x = y; y = z;
        y
    }
For simple applications such as a ↔ b, the operator's result type and value is irrelevant. But to allow chained applications such as a ↔ b ↔ c ↔ d, it is necessary that the result type is again int?. And because the actual result value is the second variable, the overall effect of such a chain will be a circular right-shift of all variable values.

However, to make chained applications of the •↔• operator syntactically unique, the operator must be defined either left- or right-associative, i. e., a ↔ b ↔ c ↔ d should be either interpreted as ((a ↔ b) ↔ c) ↔ d or as a ↔ (b ↔ (c ↔ d)). (Interestingly, the kind of associativity does not influence the effect of chained applications. To achieve a left- instead of a right-shift, the operator had to return the first instead of the second variable.) This can also be achieved with an exclude declaration, e. g., to make the operator left-associative by excluding the right-to-left grouping:

    "x" : int?; "y" : int?; "z" : int?;
    excl x ↔ (y ↔ z) end

Deduced Parameters

The swap operator •↔• can easily be turned into a generic operator that works for variables of any type T? by extending it with a deduced parameter T:

    ["T" : type; "x" : T?; "y" : T?]
    x "↔" y : T?
    {
        "z" : T = x; x = y; y = z;
        y
    }
To express that the type of the parameters x and y as well as the operator's result type shall be T? for some arbitrary type T, T is defined beforehand as a parameter with type type. However, in contrast to the other parameters (which will be called explicit parameters now), T does not appear anywhere in the operator's signature, which implies that its value will not be passed as an explicit operand in applications of the operator. Instead, T appears in the type of one or more other parameters (x and y in this example) and therefore its value can be automatically deduced from the types of the corresponding operands. For that reason, parameters of this kind (which are quite similar to, e. g., template parameters in C++ and type parameters in Java) are called deduced parameters.

Therefore, if a and b are variables of type int?, T will be deduced as int in the operator application a ↔ b. If they would have type float?, T would accordingly be deduced as float. If they had different types – or if their type would not match the “pattern” T? –, the expression a ↔ b would be erroneous.

Nil

If a variable has not been assigned any value yet, its storage cell is logically empty, i. e., it does neither contain an undefined random value (as, e. g., local variables in C) nor a specific default value (as, e. g., zero for instance variables of numeric type in Java). Instead, it actually contains nothing or nil, which is different from any real value.

Nevertheless, querying the content of such a variable (either explicitly by applying the prefix question mark operator or implicitly by means of the type conversion mentioned above) is well-defined and actually returns nil. More generally, nil is always returned in situations where no sensible real value is available. For example, an operator whose implementation is empty, always returns nil. Similarly, an application of the one-way conditional operator if • then • end returns nil, if its condition is false. Furthermore, semantically undefined computations such as integer division by zero return nil. (For other situations, cf. “Sequences.”)

In some sense, the concept of nil is similar to the null pointer or reference in other languages, but it is not restricted to pointer or reference types (which are not available as such in MOSTflexiPL anyway), but is applicable to all (predefined and user-defined) types. Furthermore, there is no predefined generic constant that directly returns nil; if it is explicitly required as a “value,” it can be obtained indirectly from one of the situations mentioned above.

To find out whether the value of an expression is a real value or nil, it can be used as the first operand of one of the conditional operators if • then • end and if • then • else • end, which treat all real values (including, e. g., 0) of any type as being “true” and nil as being “false” (which implies that the bool value false is identical to nil). For example:

    ["x" : int]
    "is" "nil" x : bool
    { if x then false else true end };
 
    "a" : int?;
    is nil ?a;              true
    a = 0;
    is nil ?a               false

Because the value of expressions of almost any kind might be nil, operators must handle this case appropriately. For the predefined operators, the following rules apply:

Further rules can be found in “Sequences.”

Random Numbers

A parameterless operator can be defined by simply omitting the parameter list (and the square brackets) in an operator declaration, e. g.:

    "a" := 257;
    "b" := 17;
    "m" := 256 * 256;
    "x" : int?; x = 1;
 
    "???" : int
    { x = (a * x + b) % m };
 
    print ???
Every application of the operator ??? returns a pseudo-random number x computed by the formula used in its implementation from the previous value of x and the constants a, b, and m. (As in C, the assignment operator returns the assigned value.)

Even though the application of this operator looks basically the same as the use of a constant such as m, there is of course a significant difference: The expression used to initialize a constant is evaluated once when the constant is defined, and every subsequent use of the constant returns the value computed there. In contrast, the expression used as the implementation of an operator is not evaluated when the operator is defined, but rather each time the operator is applied afterwards.

Substrings

The intuitive substring syntax mentioned in “Background and Motivation” is an example of an operator possessing multiple parameters and multiple names:

    ["s" : string; "i" : int; "j" : int]
    s "[" i ".." j "]" : string
    {
        "i'" := if i >= 1 then i else 1 end;
        "j'" := if j <= #s then j else #s end;
 
        "t" : string?; t = "";
        "k" : int?; k = i';
        while k <= j' do
            t = t, s[k];
            ++k
        end;
        t
    }
Its implementation uses several predefined operators not introduced so far: For some string s, #s yields its length, i. e., the number of characters it contains, while s[k] returns its k-th character counted from 1 (because humans usually start counting there). The •,• operator can be used (amongst others) to concatenate a string with a single character. while • do • end denotes a loop that repeatedly evaluates its operands in turn until the first operand's value becomes nil.

The constants i' and j' (demonstrating again that operator names might contain arbitrary characters) are used to make the substring operator robust against out of range index values, i. e., substrings starting before position 1 actually start at the first character, while substrings exceeding the length of s actually stop at its last character.

Based on this operator, it is easily possible to define the variations also mentioned in “Background and Motivation”, e. g.:

    ["s" : string; "i" : int; "j" : int]
    s "[" i ".." j ")" : string
    { s[i..j−1] }
Even further variations with rather self-explanatory meaning are also possible, e. g.:
    ["s" : string; "j" : int]
    s "[" ".." j "]" : string
    { s[1..j] };
 
    ["s" : string]
    s "(" ".." "]" : string
    { s[2..#s] }

Static Operators and Expressions

A static operator is an operator whose result value is not computed by a user-defined implementation, but rather internally by the run-time system according to the following rule: Two applications of static operators will return the same value if and only if the operators are the same and they are applied to the same operand values. This implies that two static expressions, i. e., expressions containing only static operators, will return the same value if and only if they are structurally equal, i. e., if they have the same top-level operator and operands which are also pairwise structurally equal.

Constants without initialization (cf. “Constants and Types”) are a simple special case of static operators, which can be used, amongst others, to define new atomic types and objects or values of these types, e. g.:

    "Color" : type; "red" : Color; "green" : Color; "blue" : Color;
    "Person" : type; "first" "person" : Person; "second" "person" : Person
According to the above rules, Color and Person denote different types, because they are different static operators. Likewise, red, green, and blue represent different Color values, while first person and second person denote different Person values (or “objects”).

Static operators with parameters can be used, amongst others, as type constructors to define parameterized or structured types, e. g.:

    ["T" : type]
    "List" T : type;
 
    ["T" : type; "M" : int; "N" : int]
    T "{" M "x" N "}" : type
According to the above rules, the static operator List• maps each type T to a unique other type List T, e. g., List int, List string, or List List Person. In the same way, the static operator •{•x•} maps each combination of some type T and some int values M and N to some unique type T{MxN} intended to represent MxN matrices with elements of type T, e. g., int{3x2} or float{3x4}.

In contrast to static operators, operators possessing a user-defined implementation (in curly braces) are called dynamic operators.

Even though it is possible in principle to define dynamic operators with result type type, this is not very useful in practice, because expressions that shall be used as types (e. g., as the parameter or result types of operators) must be static expressions in order to make equality of types decidable at compile time by simply comparing type expressions for structural equality (cf. above).

Open Types

As another example of static operators, the following operator maps each pair of types X and Y to a unique type X → Y, intended to represent mappings from X to Y or attributes of type X with target type Y:

    ["X" : type; "Y" : type]
    X "→" Y : type
Similarly, the following static operator maps each combination of an object or value x of some type X and an attribute a of this type with some target type Y to a unique variable with content type Y:
    ["X" : type; "Y" : type; "x" : X; "a" : X → Y]
    x "." a : Y?
These operators can be used to define flexibly extensible data structures, e. g.:
    "Person" : type;
    "name" : Person → string;
 
    "p1" : Person;
    "p2" : Person;
    p1.name = "Alice";
    p2.name = "Bob";
    print p2.name;
 
    "father" : Person → Person;
    p1.father = p2;
 
    print p1.father.name
Because every parameterless static operator has a unique result value, Person denotes a unique type, name denotes a unique attribute with type Person → string, and p1 and p2 denote unique objects of type Person. According to the definition of the dot operator, the subexpression p1.name returns a unique variable with type string?, which is assigned the string literal "Alice". Likewise, p2.name returns another unique variable with type string?, which is assigned the string literal "Bob". Generally speaking, for every person object p, p.name returns a different unique variable with type string?, which can be used to store the name of person p. On the other hand, because a static operator always returns the same value when being applied to the same operand value(s), repeated invocations of p.name for the same person p will always return the same variable. Therefore, printing (the content of) the variable returned by p2.name will print the string Bob that has been previously assigned to this variable.

Similarly, father denotes a unique attribute with type Person → Person, and thus p1.father returns a unique variable with type Person?, which is assigned the value (returned by the operator) p2. Therefore, printing p1.father.name will also print Bob.

As the example demonstrates, it is not necessary to define the attributes of a type all at once, because it is always possible to add new attributes later on. Therefore, data types defined that way are not “closed” like records, structs, or classes in most other languages, but “open” for modular, incremental extensions, just like “open types” in the predecessor project APPLEs.

Because a newly created variable that has not been assigned any value yet contains nil, querying an attribute of an object that has not been assigned yet (e. g., p2.father) also returns nil. Likewise, querying an attribute of a nil object (e. g., p2.father.father) returns nil.

Natural Language Gimmicks

If one does not like the dot operator for accessing attributes, one might define arbitrary other operators possessing the same behaviour, e. g.:

    ["X" : type; "Y" : type; "x" : X; "a" : X → Y]
    x "'s" a : Y?
    { x.a };
 
    ["X" : type; "Y" : type; "x" : X; "a" : X → Y]
    "the" a "of" x : Y?
    { x.a }
Using these operators, one might write, e. g., print the name of p1's father instead of print p1.father.name.

It is also possible to define constructor-like operators with arbitrary syntax, e. g.:

    ["name" : string]
    "new" "person" "with" "name" name : Person
    {
        "p" : Person;
        p.name = name;
        p
    };
 
    "p3" := new person with name "Claire"

Sequences

For some type T, the type T* denotes sequences with element type T, in the same way the Kleene closure Σ* denotes the set of all finite sequences (or “words”) of elements from the set Σ.

The type string that has been used in “Substrings” is actually just a synonym for char*, i. e., it denotes sequences of characters, and the operators for strings mentioned there can be used for other types of sequences, too.

More specifically, for elements x and y of some type T, sequences s and t of the corresponding sequence type T*, and an int value i:

By combining the first two comma operators, sequences can be easily constructed by enumerating their elements, e. g., 2, 3, 5, 7. In this expression, the first comma combines the int values 2 and 3 into a sequence of type int*, while the other commas append the int value 5 resp. 7 to the previously constructed sequence. (The expression is unambiguous, because the operators are left-associative.) An empty sequence s can be obtained by a constant declaration "s" : T*, and a one-element sequence can then be constructed as s, x or x, s.

The concatenation of two sequences is not provided by another overloaded comma operator, because then s, t would be ambiguous: It could mean the concatenation of s and t returning a sequence of the same type T*, or it could mean the combination of s and t into a two-element sequence of type T**. In particular, this ambiguity would arise for strings, e. g., "hello", "world", because strings are sequences of characters.

A nil sequence is almost indistinguishable from an empty sequence, i. e., the length of a nil sequence is 0 (zero, which is a real value, not nil), and concatenating a nil sequence with another sequence or an element in either order returns the other sequence or a sequence containing this element, respectively. Nil elements are treated just like other elements in concatenations, i. e., nil does not propagate through concatenations as it does through arithmetic operations. However, querying any element of a nil sequence returns nil.

The only noticeable difference between an empty and a nil sequence is the fact that an empty sequence is a real value that is treated as “true” by the conditional operators, while a nil sequence is treated as “false” (cf. “Nil”).

To give an example, the following operator [•..•] recursively constructs and returns a sequence that contains all integers between i and j, both inclusively:

    ["i" : int; "j" : int]
    "[" i ".." j "]" : int*
    {
        if i <= j then
            i, [i+1..j]
        end
    }
If i is less than or equal to j, the result sequence [i..j] consists of the element i followed by the elements of the sequence [i+1..j]. Otherwise, the one-way conditional operator returns nil, i. e., a sequence without elements.

Similar to the substring operators defined earlier, it would be easy to add further operators [•..•), (•..•], and (•..•) to construct sequences where one or both of the border elements are excluded.

Subsequences

By adding a deduced parameter T and replacing all occurrences of string with T*, the substring operators defined in “Substrings” can easily be generalized to generic subsequence operators, e. g.:

    ["T" : type; "s" : T*; "i" : int; "j" : int]
    s "[" i ".." j "]" : T*
    {
        "i'" := if i >= 1 then i else 1 end;
        "j'" := if j <= #s then j else #s end;
 
        "t" : T*?;
        "k" : int?; k = i';
        while k <= j' do
            t = t, s[k];
            k
        end;
        t
    };
 
    ["T" : type; "s" : T*; "i" : int; "j" : int]
    s "[" i ".." j ")" : T*
    { s[i..j−1] };
 
    ["T" : type; "s" : T*; "j" : int]
    s "[" ".." j "]" : T*
    { s[1..j] };
 
    ["T" : type; "s" : T*]
    s "(" ".." "]" : T*
    { s[2..#s] }
Replacing string with T* results in t having type T*?, i. e., it is a variable whose content is a sequence of elements of type T. Therefore, the assignment t = "" has been removed, because the string literal "" with type char* cannot be assigned to this variable. In fact, the assignment is not necessary at all, because the initial value of t is nil, i. e., a sequence without elements.

Higher-Order Operators

Quite similar to functions in functional languages, operators are “first-class values” which can be passed as parameters to other operators, returned as the result of operator applications, stored in variables, and so forth. Therefore, it is possible to define “higher-order operators,” e. g.:

    [
        "X" : type; "Y" : type; "s" : X*;
        "t" := ["x" : X] "trans" x : Y {}
    ]
    s "map" t : Y*
    {
        if #s > 0 then
            trans s[1], s(..] map t
        end
    }
According to its signature, the operator •map• is applied to a sequence s of some type X* and an operator t, which is actually an alias for a unary operator trans• with parameter type X and some result type Y.

Note that a simple parameter such as s can be used directly in an operator's signature, because s constitutes a complete and correct subexpression, viz an application of the nullary operator s (cf. “Square”). In contrast, a parameter such as trans• that has itself parameters cannot be used that way, because trans without an operand would be an incomplete and therefore erroneous expression. Therefore, the alias t is required to reference the parameter from within the signature. Formally, t is a constant (i. e., a nullary operator, cf. “Constants and Types”) that is initialized with the value of the declaration ["x" : X] "trans" x : Y {}, and since a declaration is an expression that returns the operator declared by it, applications of t actually yield the operator trans•. The curly braces at the end of its declaration designate this operator as a dynamic in contrast to a static operator (cf. “Static Operators and Expressions”); if they were omitted, •map• would not allow dynamic operators as actual values for this parameter.

Because a declaration returns the operator declared by it, a sample application of map might look as follows:

    (p1, p2)                                        a sequence of Persons
    map
    ["p" : Person] "name" p : string { p.name }     an operator that maps Persons to strings
The result of •map• is a sequence of type Y* (where Y represents the result type of the actual transformation operator trans•), whose elements are computed from the elements of s by applying the operator trans• to them. Therefore, the above application of •map• returns a sequence containing the names of p1 and p2, i. e., the string values "Alice" and "Bob" (cf. “Open Types”).

The result sequence is computed recursively by applying the sequence concatenation operator •,• (cf. “Sequences”) to trans s[1], i. e., the result of applying trans• to the first element s[1], and s(..] map t, i. e., the result of applying •map• to the remaining elements s(..] (cf. “Subsequences”) and the same operator t. (Here, the alias t, whose application returns the actual operator trans•, is useful once more.) If the length of the input sequence s is zero, the one-way conditional operator if • then • end returns a nil sequence of type Y* (cf. “Nil”), whose length is zero, too.

In a similar way, higher-order operators filter and fold can be defined as follows:

    [
        "T" : type; "s" : T*;
        "p" := ["x" : T] "pred" x : bool {}
    ]
    s "filter" p : T*
    {
        if #s > 0 then
            "t" := s(..] filter p;
            if pred s[1] then s[1], t else t end
        end
    };
 
    [
        "T" : type; "s" : T*; "z" : T;
        "c" := ["x" : T; "y" : T] x "comb" y : T {}
    ]
    s "fold" z c : T
    {
        if #s > 0 then
            s(..] fold (z comb s[1]) c
        else
            z
        end
    }
filter is applied to a sequence s of some type T* and a predicate pred• and returns a sequence of the same type that contains all elements of s satisfying the predicate.

fold is also applied to a sequence s of some type T*, a “zero value” z of type T, and a binary combination operator •comb•. It effectively computes and returns the value of z comb s[1] comb s[2] comb ... comb s[#s] (if •comb• would be left-associative).

As well-known from functional programming, map, filter, and fold typically unfold their maximum power when used in combination, e. g.:

    [1..100]
    filter ["x" : int] "odd" x : bool { x % 2 == 1 }
    map ["x" : int] "square" x : int { x² }
    fold 0 ["x" : int; "y" : int] x "add" y : int { x + y }
This computes the sum of the squares of the odd numbers from 1 to 99.

Mergesort

The following sort operator is another example of a higher-order operator that also uses some of the subsequence operators introduced earlier:

    [
        "T" : type; "s" : T*;
        "lt" := ["x" : T; "y" : T] x "<" y : bool {}
    ]
    s "sort" lt : T*
    {
        if #s <= 1 then s else
            ["s" : T*; "t" : T*]
            s "merge" t : T*
            {
                if #s == 0 then t else
                    if #t == 0 then s else
                        if s[1] < t[1] then
                            s[1], (s(..] merge t)
                        else
                            t[1], (t(..] merge s)
                        end
                    end
                end
            };
 
            "m" := #s/2;
            (s[..m] sort lt) merge (s(m..] sort lt)
        end
    }
In the non-trivial case, the sequence s is sorted by splitting it into (almost) equally sized parts s[..m] and s(m..], which are sorted recursively and merged afterwards by a local auxiliary operator •merge• that is also implemented recursively. In the subexpression s[1] < t[1], the parameter •<• of the enclosing sort operator is applied to compare the first elements of the sequences s and t.

The following example employs the sort operator to sort a sequence of strings in descending order of their lengths:

    ("This", "is", "a", "sequence", "of", "strings")
    sort
    ["x" : string; "y" : string] x "cmp" y : bool { #x > #y }

Implicit Parameters

If a sequence s of numbers shall be sorted in the “natural way” according to the predefined •<• operator, there are different possibilities, none of which is really convenient:

    s sort ["x" : int; "y" : int] x "cmp" y : bool { x < y };
 
    s sort ^ ["x" : int; "y" : int] x "<" y : bool;
 
    "lt" := ^ ["x" : int; "y" : int] x "<" y : bool;
    s sort lt
In the first case, an auxiliary operator •cmp• is used that simply calls the predefined •<• operator.

In the second case, the predefined •<• operator is directly referenced by the redeclaration ^ ["x" : int; "y" : int] x "<" y : bool. In general, a redeclaration is a declaration without an implementation (curly braces) that is prefixed with a caret; it does not declare a new operator, but rather returns an operator with the same syntax and type that has been declared earlier. Therefore, redeclarations can be used to uniquely identify existing operators.

In the third case, the redeclaration of the predefined •<• operator is used to initialize the constant lt whose value can then be passed to the sort operator.

However, the most convenient way of sorting a sequence s according to the natural order of its element type would be to simply write s sort. In that case, the appropriate comparison operator •<• should be passed implicitly. This can be achieved by replacing the explicit parameter •<• of the sort operator with an implicit one as follows:

    [
        "T" : type; "s" : T*;
        ["x" : T; "y" : T] x "<" y : bool {}
    ]
    s "sort" : T*
    {
        if #s <= 1 then s else
            ......                  local operator merge as before
 
            "m" := #s/2;
            (s[..m] sort) merge (s(m..] sort)
        end
    }
Because the parameter •<• (or an alias for it) does not appear in the operator's signature anymore, it is no longer an explicit parameter of the operator and therefore it need and must not be passed as an explicit operand anymore. In contrast to a deduced parameter, however, that appears in the type of one or more other parameters, the value of this parameter cannot be deduced from the type of other operands. In fact, it constitutes a third kind of parameters called an implicit parameter.

When an operator possessing an implicit parameter is applied, the compiler looks for an operator that has the same syntax and type as the parameter and that is visible at the point of application and passes it as the actual value of the parameter. (If no such operator exists, the entire application would be erroneous.) Therefore, if s denotes a sequence of type int*, s sort is a correct application of the new sort operator that implicitly receives the predefined •<• operator for int values. If it would denote a sequence of floating point numbers, s sort would implicitly receive the predefined •<• operator for float values. If s would be of type Person*, s sort would be erroneous, because there is no •<• operator with parameters of type Person that could be passed implicitly.

To provide a maximum degree of flexibility, it is in fact advisory to provide both kinds of sort operators, one with an implicit parameter and another one with an explicit parameter. However, to avoid nasty code duplication, one of the operators should simply call the other one. The most convenient way to do this, is to call the one with the implicit parameter from the one with the explicit parameter, because inside the latter's implementation there is a suitable •<• operator that can be passed implicitly, viz the parameter aliased to lt:

    [
        "T" : type; "s" : T*;
        ["x" : T; "y" : T] x "<" y : bool {}
    ]
    s "sort" : T*
    {
        ......                      complete implementation as above
    };
 
    [
        "T" : type; "s" : T*;
        "lt" := ["x" : T; "y" : T] x "<" y : bool {}
    ]
    s "sort" lt : T*
    {
        s sort                      simply call the other sort operator
    }

Visibility Declarations

The syntactic flexibility of MOSTflexiPL actually removes the distinction between language designers and users: Every user of the language can become a language designer if he wants. Every operator declaration introduces new syntax along the way and therefore constitutes a – more or less significant – language extension.

For advanced applications it is necessary to precisely control the visibility (and thus the applicability) of operators resp. language extensions. For example, we might want to define an operator let • in • end, where operators declared in its first operand shall be visible in its second operand, but not beyond an application of the operator. Operators declared in its second operand, however, should be visible afterwards, e. g.:

    let
        "n" : int?; n = 0
    in
        "inc" : int { n = n + 1 };
        "dec" : int { n = n − 1 }
    end;
 
    inc;
    dec
The operators inc and dec declared between in and end shall be visible after the end (i. e., they shall be “public”), while the variable n declared between let and in shall only be visible between in and end (i. e., it shall be “private”).

To achieve that aim, import and export declarations can be included into the parameter list of an operator declaration, e. g.:

    [
        "X" : type; "Y" : type;     deduced parameters
        "x" : X; "y" : Y;           explicit parameters
        y :+ ^ | x;                 import declaration for y
        ^ :+ y                      export declaration
    ]
    "let" x "in" y "end" : Y
    { y }
To precisely understand the meaning of these declarations, the following definitions are required: To give another example, the predefined semicolon operator is defined quite similar to let • in • end as follows:
    [
        "X" : type; "Y" : type;
        "x" : X; "y" : Y;
        y :+ ^ | x;
        ^ :+ x | y
    ]
    x ";" y : Y
    { y }
The only difference to let • in • end is the fact that applications of the semicolon operator export the operators exported by both of its operands. Therefore, chained applications of the semicolon operator actually “accumulate” all operators declared “earlier,” e. g. (the artificial indentation shall visualize the actual operator tree):
                "a" : int?
            ;
                a = 1
        ;
            "b" : int?
    ;
        b = a
The declaration "a" : int? exports the operator declared by it, i. e., the variable a. The first semicolon application passes it (due to its import declaration y :+ ^ | x) from its left to its right operand so that it can be used there. Furthermore, it exports it (due to the export declaration ^ :+ x | y) to its parent, i. e., to the second semicolon application, which in turn passes it from its left to its right operand, where it is not used, however. There, the declaration "b" : int? exports the variable b. Because a semicolon application exports the operators exported by both of its operands, the second semicolon application exports a and b. Finally, the third semicolon application passes these operators from its left to its right operand so that both a and b can be used there.

To give yet another example, the predefined control structure operators possess the following import (and no export) declarations:

    [
        "X" : type; "Y" : type;
        "x" : X; "y" : Y {};
        y :+ ^ | x
    ]
    "if" x "then" y "end" : Y
    { ...... };
 
    [
        "X" : type; "Y" : type;
        "x" : X; "y1" : Y {}; "y2" : Y {};
        y1 :+ ^ | x; y2 :+ ^ | x
    ]
    "if" x "then" y1 "else" y2 "end" : Y
    { ...... };
 
    [
        "X" : type; "Y" : type;
        "x" : X {}; "y" : Y {};
        y :+ ^ | x
    ]
    "while" x "do" y "end" : int
    { ...... }

User-Defined Control Structures

As already mentioned in “Approach,” control structures are operators, too, that can be easily defined by users.

For example, an iteration for • in • do • end through the elements of a sequence can be defined and used as follows:

    ["T" : type; "var" : T?; "seq" : T*; "X" : type; "body" : X {}]
    "for" var "in" seq "do" body "end" : int
    {
        "i" : int?; i = 1;
        while i <= #seq do
            var = seq[i];
            body;
            ++i
        end
    };
 
    "x" : int?;
    for x in [1..100] do
        print x
    end
Because every operator must have a result type and yield a value, int is usually used as the result type of loop operators, and by convention they return the number of iterations that have been performed. Because the predefined while loop adheres to this convention, its value can be directly used as the result value of the new for loop.

Because the loop body shall be re-evaluated in every iteration, it cannot be passed by value as the other parameters. Instead, it must be passed as an unevaluated expression that is evaluated each time the parameter body is used in the operator's implementation. In fact, a “call by value” parameter such as seq corresponds to a constant that is initialized once per operator application with the corresponding operand, while a “call by expression” parameter such as body corresponds to a parameterless dynamic operator whose implementation is the corresponding operand. Due to this similarity, “call by expression” parameters are designated with a trailing pair of curly braces. (This usage of {} is similar, but not identical to that mentioned earlier in “Higher-Order Operators,” where it has been used with parameters having themselves parameters, while here it is used with a parameterless parameter.)

The operator's implementation uses an auxiliary int variable i that is initialized to 1. Then, in every iteration of the predefined while loop, the actual loop variable var (i. e., x in the example application) is set to the i-th element of seq (i. e., of the sequence constructed by [1..100], cf. “Sequences”). Afterwards, the loop body (i. e., print x) is evaluated, and the auxiliary variable is incremented. The loop terminates when i becomes greater than the length of seq.

Because all names appearing in the actual loop body (i. e., print and x) are resolved and bound to the corresponding operators before the expression is passed to the loop operator, there is no danger of inadvertent name capture or “unhygiene.”

Sequence Comprehensions

The following operator select • from • in • where • end can be used to combine filtering and mapping the elements of a sequence with a syntax resembling SQL. Its implementation uses the loop operator defined above:

    [
        "X" : type; "Y" : type;
        "expr" : X {}; "var" : Y?; "seq" : Y*; "cond" : bool {}
    ]
    "select" expr "from" var "in" seq "where" cond "end" : X*
    {
        "res" : X*?;
        for var in seq do
            if cond then
                res = res, expr
            end
        end;
        res
    };
 
    "x" : int?;
    select x² from x in [1..100] where x % 2 == 1 end
This operator can in turn be used to implement the “destructive” remove operator mentioned in “Background and Motivation” (the predefined prefix operator ~• denotes logical negation):
    [
        "T" : type;
        "var" : T?; "seq" : T*?; "cond" : bool {}
    ]
    "remove" var "from" seq "where" cond "end" : T*
    { seq = select var from var in seq where ~cond end };
 
    "s" : int*?; s = [1..100];
    "x" : int?;
    remove x from s where x % 2 == 1 end


Christian Heinlein, 2016-10-12