(Note: This is an incomplete document at this time.)
Gender is a programming language possessing no reserved words or symbols and having no predefined operators, which treats constants and variables interchangeably. Any syntactic object in a Gender program can be redeclared at any time, with certain restrictions. Only the syntax itself is fixed. Gender makes no functional distinctions between constants and variables except in how they are declared, and refers to both simply as objects. All three types of entity available in Gender exhibit context-sensitive semantics. It allows the values of what would in another language be considered literal constants to be effectively redefined, or even undefined. The Gender language exists to answer the following question:
"What would happen if you had a programming language that allowed every semantic object to be redefined at will, and in which nothing at all except syntax was initially defined? Would it be possible to write useful code in such a language?"
(It should be noted that the latter part of this question, in particular, has not yet been fully answered, since no Gender interpreter yet exists, and thus by definition there can be no useful Gender code as yet except as a theoretical demonstration of principles.)
Gender recognizes three classes of entity: objects (which can be either declared or literal), function names (which are also function pointers), and operators. What other languages call variables, Gender calls declared objects; what other languages call constants, Gender calls literal objects. With the exception of literal objects, no Gender entity, once declared, may be redeclared as an entity of a different class.
All objects to be used as variables must be predeclared. A variable object is declared by writing a statement consisting solely of the object name. Declaring a variable object does not assign a type to it. The statement:
ABC
in the absence of any previous declaration of ABC declares an object named ABC possessing no defined type or value. The object named ABC may previously have been a string constant possessing the value ABC.
Any sequence of characters, including a single digit, is a valid object name. Any statement consisting only of a single unbroken sequence of digits, for instance the statement:
42
will be interpreted as either declaring or using an object named 42. Anywhere within a specified lexical scope of the program that a syntactic object matching the name of an object previously declared in that or a more global scope appears, it will be preferentially assumed to refer to the declared object.
It will be apparent that in this example, assigning a value to this object will now have the effect on the rest of the program of having redefined the numeric constant 42. (Constants could be, usually accidentally, redefined in a similar manner in FORTRAN.) It is less apparent, but no less true, that declaring an object with the name of a numeric constant, but then never assigning a value to it, has the effect of undefining that constant. (Eat your heart out, FORTRAN.)
Any object can be used as a literal object without first declaring it; doing so does not negate the ability to redeclare it as an assigned object later. When used in this way, an object's implicit value defaults to its name (i.e., all literal objects are self-referential). Thus, any sequence of characters encountered in a program that has not previously been declared as an assigned object or an operator is effectively treated as either a string or numeric literal object. Note, however, that the undeclared object having the name -1 does not possess the value of the integer having negative sign and magnitude one; it possesses the value of the string consisting of the two characters '-' and '1'.
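The lookup rule described here can be sketched in Python. This is only an illustration under stated assumptions: the names UNDEFINED and resolve, and the use of a dict as the table of declared objects, are hypothetical and not part of Gender, and only integer literals are handled.

```python
# Hypothetical sketch of Gender's object lookup; none of these names come from Gender.
UNDEFINED = object()  # marks an object that is declared but has no type or value


def resolve(token, declared):
    """Return the value a token stands for, given a dict of declared objects."""
    if token in declared:
        # A declared object shadows the literal of the same name,
        # even if it has never been assigned (then it is undefined).
        return declared[token]
    if token.isascii() and token.isdigit():
        # An undeclared, unbroken run of digits is a numeric literal.
        return int(token)
    # Anything else undeclared is a string literal whose value is its own name.
    return token


declared = {}
print(resolve("42", declared))               # numeric literal
print(resolve("-1", declared))               # the two-character string, not minus one
declared["42"] = UNDEFINED                   # the statement `42` declares the object
print(resolve("42", declared) is UNDEFINED)  # the constant is now undefined
declared["42"] = 7                           # the statement `42 7` redefines it
print(resolve("42", declared))
```

Because the table of declared objects is consulted first, declaring 42 without assigning it is enough to hide the numeric literal from the rest of the program.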
At this time, Gender assumes all numeric constants to be base-10. Support for arithmetic or numeric constants in bases other than 10 does not currently exist. As a corollary to this, Gender does not at this time support bitfields or bitwise operations.
All assignment is implied. Given the declaration of ABC above, the statement:
ABC 5
assigns the value and type of the object 5 (which is not necessarily the integer 5) to the object ABC. It is not possible for an object to be typed, but not possess a defined value, or vice versa; an object must always have either both a type and a value, or neither. Any declared object takes on the type of the most recent value assigned to it. Its type may be changed at any time by reassigning it a value of a different type. In this example, if the object 5 has not previously been defined, it is interpreted as a literal object having the numeric value 5, and thus this statement sets the value of the object ABC to the integer 5.
String values are assigned in the same manner as numeric values. The following statement will be instantly recognizable:
ABC "Hello World"
Any object can possess a set value. Sets are much like arrays in other languages, only ... well ... different. Set objects can be declared using the .. construct. For example, the statements:
A
A 1..4
B
B 1.0..4.0
declare an object A, having as its value an array of integers from 1 to 4, and a similar object B possessing real values. All reals in Gender possess arbitrary precision. If the initial and final objects of a set are such that a range cannot be meaningfully interpolated between them, then the assignment fails, and the type and value of the object are left unchanged (or, as the case may be, undefined).
It is possible to declare an object named 1..4. It need not necessarily possess the value 1..4.
Set objects can also be assigned by enumeration of their values. The following statements:
vowels
vowels a e i o u
declare vowels to be a set object whose values are the five unaccented English vowels. Discontinuous ranges can also be assigned in this manner:
foo
foo 1..6 8 14 21..30
or this one:
a
b
c
a 0..6
b 50 60
c a b
A set object may possess member values of dissimilar types. In the example above, assuming no other statements prior to the declaration of a exist, then after executing the statement c a b, the object c possesses the value 0..6 50 60. If, however, under the same assumption, the last line were changed to:
c a b d
then the final value of c would be 0..6 50 60 d.
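One way to picture how such a set value is assembled is the following Python sketch. It rests on several assumptions that the text does not settle: ranges are expanded eagerly into their members, only ascending integer ranges are supported, and every declared object in the table holds a list. The names expand_range and build_set are hypothetical.

```python
def expand_range(token):
    """Expand 'lo..hi' into a list of integers, or return None if it is not such a range."""
    lo, sep, hi = token.partition("..")
    if sep and lo.isdigit() and hi.isdigit() and int(lo) <= int(hi):
        return list(range(int(lo), int(hi) + 1))
    return None


def build_set(tokens, declared):
    """Flatten a list of tokens into the member values of a set object."""
    members = []
    for token in tokens:
        if token in declared:
            members.extend(declared[token])  # splice in a declared set's members
            continue
        expanded = expand_range(token)
        if expanded is not None:
            members.extend(expanded)
        else:
            # An undeclared token is a literal: a number, or its own name.
            members.append(int(token) if token.isdigit() else token)
    return members


declared = {}
declared["a"] = build_set(["0..6"], declared)         # a 0..6
declared["b"] = build_set(["50", "60"], declared)     # b 50 60
declared["c"] = build_set(["a", "b", "d"], declared)  # c a b d
print(declared["c"])
```

The undeclared d survives as a string member alongside the integers, which is the mixed-type set described above.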
In order to declare any object as a null string, it is first necessary to construct a null string using string operations. That object can then be used either as a value to assign the null-string value to other objects, or to define an operator which assigns the null string value to an object. The null string and the empty string are equivalent and indistinguishable in Gender, but are not the same as an object having no value.
Like objects, operators in Gender must be declared before use. There are no reserved words or symbols in Gender. Operators can be declared in either of two ways: by example, or by derivation. Declaring an operator by example is done using objects whose values are already known, by stating one or more operands, followed by the result of the desired operation on that set of operands, followed by the name to which the operation is to be assigned. For example, the following statement:
2 3 6 mult
declares the word mult to be a multiplication operator. (The statement can be recognized as a declaration of an operator by the fact that it contains more than two objects, but does not contain any syntactic object yet declared to be an operator and does not begin with a declared assigned object.) All operators are assumed to have prefix syntax, in order to permit non-ambiguity between declarations and invocations of unary operators.
Care must be taken to ensure that operator declarations are unambiguous. A properly constructed Gender interpreter should always attempt to apply Occam's razor when evaluating operator declarations, but its behavior in the event that two equally simple operations are equally likely is undefined, as Gender, having no predefined operators, also has no rules of operator precedence. For example, the mult declaration above is unambiguous, but the following declaration:
2 2 4 sum
is equally likely to assign the addition or the multiplication operations to the operator sum.
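The ambiguity can be made concrete with a small Python sketch. Gender prescribes no algorithm for declaration by example, so everything here is an assumption: the pool of four candidate operations, the name consistent_operations, and the idea of simply listing every candidate that reproduces the example.

```python
import operator

# Assumed candidate pool; a real interpreter would need a far richer one.
CANDIDATES = {
    "addition": operator.add,
    "subtraction": operator.sub,
    "multiplication": operator.mul,
    "division": operator.truediv,
}


def consistent_operations(left, right, result):
    """Names of the candidate operations that map (left, right) to result."""
    matches = []
    for name, func in CANDIDATES.items():
        try:
            if func(left, right) == result:
                matches.append(name)
        except ZeroDivisionError:
            pass  # division by zero cannot explain the example
    return matches


print(consistent_operations(2, 3, 6))  # ['multiplication']
print(consistent_operations(2, 2, 4))  # ['addition', 'multiplication']
```

With a single match the declaration is unambiguous; with two, the interpreter has nothing left to choose between them, which is the undefined case described above.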
For another, slightly more complex example of operator assignment:
3 1 2 sub
A
A sub 0 1
A 1 not
first declares a subtraction operator sub, uses sub to assign the value -1 to a variable A, then uses A and 1 to declare a negation operator not.
Operator declarations can be done using either variables or constants. Thus, the subtraction operator example above could have been written as:
A
B
C
A 1
B 2
C 3
C B A sub
and the negation operator not subsequently declared as:
C sub A B
C A not
String operators are declared in the same way:
A
A len
A 3 len
first declares a variable A, then assigns the string 'len' to it, then declares 'len' to be an operator returning the length of a string. This operation could be simplified in a single statement as follows:
len 3 len
It should be noted that the following statement is precisely equivalent:
abc 3 len
However, if the following two statements have previously been executed:
3
3 c
then the statement:
abc 3 len
could also now result in declaring 'len' to be an operator returning the last character of a string constant. Thus, the meaning of any statement in Gender can be dependent upon the flow of execution in the program and, therefore, which other statements have been executed prior to the statement. This makes it possible (extremely difficult for non-trivial programs, but possible) to write programs in Gender in which the same code performs completely different functions depending upon the initial data supplied to the program.
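The dependence on earlier statements can be illustrated in Python. The candidate pool of three unary string operations and the name consistent_string_operations are hypothetical; the sketch only shows that the same example statement selects a different operation once the object 3 carries the value 'c'.

```python
# Hypothetical candidate pool of unary string operations.
STRING_CANDIDATES = {
    "length": len,
    "first character": lambda s: s[:1],
    "last character": lambda s: s[-1:],
}


def consistent_string_operations(operand, result):
    """Names of the candidate operations that map operand to result."""
    return [name for name, func in STRING_CANDIDATES.items() if func(operand) == result]


# `abc 3 len` while 3 is still the numeric literal three:
print(consistent_string_operations("abc", 3))    # ['length']
# `abc 3 len` after the statements `3` and `3 c` have run:
print(consistent_string_operations("abc", "c"))  # ['last character']
```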
Sometimes, as in the negation example above, some operators can be declared only by declaring other operators first. The negation operator is a particularly useful one, as it can be used to shortcut the declaration of many other operators, as in the following examples:
4 4 16 mult
div not mult
4 16 square
sqrt not square
10 2 100 exp
root not exp
Note that in this usage, instead of declaring the divide, sqrt and root operators by example, they are declared by applying an operator to another operator. All these are examples of operator declaration by derivation.
Since assignment in Gender is implicit, no equals operator is required. However, if you really miss the equals operator, then once you have a negation operator such as 'not', you can use it to declare an equals operator, in this case 'is':
is not not
Every operator takes the object immediately following it as its primary object and acts upon that object using the next object. Thus, in the statement:
div A B
B is parsed as the divisor, and the statement is parsed as divide A by B. Likewise the statement:
sub A B
is parsed as subtract B from A, and:
exp A B
is parsed as raise A to the power B.
In the special case of unary operators, there is no second object, of course.
Note that no standard algorithm for declaration by example is provided. Selection of a suitable algorithm is left to the judgement of the interpreter implementor.
Input and output in Gender are performed via a statement consisting solely of a single declared object, not followed by an indented block. This statement has two possible effects.
If the object has been declared, but not yet assigned and typed, then a statement containing just that object causes program execution to be suspended until the user supplies input, causing the value and type of that input to be assigned to the object. It is the programmer's responsibility to verify that the data entered is of the type desired for the object.
If the object already possesses a type and a value at that point in program execution, then the statement causes the program to output the value of the object and continue without pausing.
Thus, the following code:
FOO
FOO 42
FOO
causes the program to generate the output 42.
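The two effects of a bare-object statement can be sketched in Python as follows. The marker UNSET, the function bare_statement and its read and write parameters are all hypothetical names introduced for illustration; Gender itself defines none of them.

```python
UNSET = object()  # hypothetical marker for "declared but not yet assigned"


def bare_statement(name, declared, read=input, write=print):
    """Execute a statement consisting solely of the declared object `name`."""
    if declared[name] is UNSET:
        declared[name] = read()   # suspend until the user supplies a value
    else:
        write(declared[name])     # output the current value and continue


declared = {"FOO": UNSET}         # the statement `FOO` (declaration)
declared["FOO"] = 42              # the statement `FOO 42`
bare_statement("FOO", declared)   # the statement `FOO` now prints 42
```

The read and write hooks are injectable only so that the sketch can be exercised without a terminal.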
It is not possible to directly output a numeric or string literal in Gender, as the interpreter may be unable to distinguish the output statement from a declaration of an object or an operation on an object. All output must be assigned to an object before being output.
Knowing this, we can now modify the above example slightly and write the complete "Hello World" program in Gender:
HELLO
HELLO "Hello world!"
HELLO
As can be seen, simple programs in Gender can be extremely terse. (But don't get cocky yet.)
No predefined formatted-output construct such as the C printf() exists in Gender. A Gender programmer requiring formatted output must create it himself, either by hand-assembling the output or by defining a formatted-output operator or function.
Execution of a Gender program begins with the first statement and continues to the last, unless altered by conditionals, loop structures, switch constructs and other control constructs. If the last statement in the program is executed and does not branch execution back elsewhere within the program, execution ends after executing that statement.
All whitespace is significant in Gender, up to a point. Indents denote lexical scoping, while blank lines delimit the beginning and end of blocks of code within a lexical scope. (A closing blank line at the end of a lexically scoped block is optional.) A block of code demarcated by blank lines is either executed completely from beginning to end (unless a branch of execution is performed within the block) or skipped completely. However, the depth of an indent is not significant except relative to other indents; all code in the same lexical scope must be indented to the same depth. Even a single-column difference in indent depth will cause a change of lexical scope.
This can be readily seen in Gender's conditional construct. A conditional in Gender is a statement consisting of either one or two objects, followed immediately by either one or two indented blocks. All conditional tests are implicit depending upon the objects in the conditional statement.
If any different test is desired, it must be separately calculated and the result of the calculation tested.
Whatever the test, the first block immediately following the conditional is executed if the test returns true; if the test returns false, the second block is executed if present. If either test object has not yet been assigned, or if the types of the test objects differ, then the test fails, neither block is executed, and execution passes directly to the next statement following the conditional construct.
The following examples illustrate the conditional construct:
4 A
  B 99
The above conditional assigns the value of 99 to the object B, if and only if the value of the object A matches the value of the object 4. If it does not, nothing is executed and execution passes to the next statement.
4 A
  B 99

  C + A B

D A
In this example, the statement B 99 is executed if the object A has the same value as the object 4. If the values of the two objects differ, but their types are the same, then the statement C + A B will be executed. If either object is untyped or their types differ, execution proceeds directly to the statement D A, assigning the value of A to D.
Compare this to the equivalent Perl or C construct:
Gender:          C or Perl:

4 A              if (A == 4) {
  B 99             B = 99;
                 } else {
  C + A B          C = A + B;
                 }
D A              D = A;
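The C rendering cannot express the third outcome of a Gender conditional, in which neither block runs. A minimal Python sketch of the three-way test follows, assuming that None stands in for an unassigned object and that Python types stand in for Gender types; the function name conditional is hypothetical.

```python
def conditional(left, right):
    """Return 'first', 'second' or 'skip' for a two-object conditional."""
    if left is None or right is None:
        return "skip"    # an unassigned object makes the test fail outright
    if type(left) is not type(right):
        return "skip"    # differing types also skip both blocks
    return "first" if left == right else "second"


print(conditional(4, 4))     # first block runs
print(conditional(4, 5))     # second block runs, if present
print(conditional(4, "4"))   # types differ: neither block runs
print(conditional(4, None))  # unassigned: neither block runs
```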
The sense of the test in a conditional can be reversed by preceding the entire comparison statement with a negation operator. In this case, the negation operator is taken to act upon the result of the test as a whole, inverting it so that true becomes false and false, true. See the following example:
not 4 A
  C + A B

  B 99

D A
Because the sense of the conditional test is reversed by the negation operator, but the if-true and if-false statements are exchanged, the example above is functionally precisely the same as the previous example.
A more complex but perhaps clearer example of syntactic blocks is seen in the Gender switch construct, which is declared by stating a list of three or more objects, followed by a number of blocks of code. The list of objects may be a set object or contain set objects; the last object in the list is assumed to be the object to be tested against the values of the preceding objects. This final object may also be a set object. The number of code blocks following the switch statement is completely arbitrary, but the useful range is from one, to one plus the number of possible values. If the number of code blocks exceeds the number of possible values of the test object, as determined at runtime based upon the values of the test object and the preceding objects in the value list, then the final code block (which, effectively, matches the switch variable itself at the end of the value list) is used as a fallthrough in the event that no value matches. After executing any block of a switch construct, execution (if not branched elsewhere by the code block) passes to the statement immediately following the last code block of the switch construct.
For example, consider the following code:
pet
p
o
p cat, dog, fish, horse, rabbit, hamster, other?
p
pet
cat dog fish horse rabbit hamster pet
  o You like cats.

  o You like dogs.

  o You like fish.

  o You like horses.

  o You like rabbits.

  o You like hamsters.

  o I don't know what kind of pets you like.

o
This code declares variables pet, p (prompt) and o (output), prompts the user with a list of possible pets by loading the list into p then displaying p, reads the user's input into pet, then chooses exactly one statement about the user's choice of pet, and finally prints it.
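The choice of block can be sketched in Python. The function select_block is a hypothetical name, and one behaviour here is an assumption the text does not cover: when a value matches but there are fewer blocks than values, the sketch runs no block at all.

```python
def select_block(values, test, block_count):
    """Index of the block to run, or None if no block applies."""
    if test in values:
        index = values.index(test)
        # Assumption: a matching value with no corresponding block runs nothing.
        return index if index < block_count else None
    # No value matched: a block beyond the value list acts as the fallthrough.
    if block_count > len(values):
        return len(values)
    return None


pets = ["cat", "dog", "fish", "horse", "rabbit", "hamster"]
print(select_block(pets, "dog", 7))     # 1: the "You like dogs." block
print(select_block(pets, "iguana", 7))  # 6: the fallthrough block
print(select_block(pets, "iguana", 6))  # None: no fallthrough block was supplied
```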
In any case where a block of code is expected in a construct, it can be replaced by a blank line. A blank line where a block was expected is interpreted as the null block, or in fact as an arbitrary number of null blocks, thus the construct remains syntactically correct.
Gender allows two basic types of loop constructs: the enumerated loop and the conditional loop. The first type, the enumerated loop, is roughly equivalent to a Perl foreach loop, or to a C for loop that steps through a list of values. It is constructed as follows: a statement containing only a continuous or discontinuous set of values is followed by an indented block of statements beginning with the declaration of a loop variable object. This object, and any other object declared within that block, is lexically scoped only to the interior of the loop. Any object used, but not declared, within the loop is assumed to be global in scope relative to the loop, if a prior declaration global to the scope of the loop exists; otherwise, it is assumed to be a literal constant. The loop variable object is not persistent between successive executions of the loop block, and thus is effectively redeclared from scratch on each pass through the loop.
For example:
a
a 0
1..3
  b
  a + a b
a
This simple loop simply calculates the sum of the integers one through three. The loop is executed once for each value of the range specified in its opening statement, each value being automatically assigned in turn to the object b declared in the first statement in the loop body. The range need not be continuous -- the following is also a valid loop, which prints the numbers 1 through 4 and 7 through 10:
1..4 7..10
  a
  a
The types of the values in the set of enumerators need not all be the same. The following is also a valid loop:
0..9 A..F
  a
  a
The following short Gender program simply wraps an enumerated loop and uses it to calculate the c'th term of the Fibonacci series whose first two terms are 0 and 1.
1 1 2 + a b c d m m Desired term? m c 1..c e 1 e a 0 b 0 2 e b 1 d + a b a b b d m Result: b m
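For comparison, the intended computation can be sketched in Python. This is not a line-for-line translation of the Gender program: it reuses the names a, b, c and e, and it assumes that the update step runs only from the third term onward, which is what a series beginning 0, 1 requires. The function name fibonacci_term is hypothetical.

```python
def fibonacci_term(c):
    """Return the c'th term (counting from 1) of the series 0, 1, 1, 2, 3, 5, ..."""
    a, b = 0, 0
    for e in range(1, c + 1):   # the enumerated loop over 1..c
        if e == 1:
            a, b = 0, 0         # first term
        elif e == 2:
            b = 1               # second term
        else:
            a, b = b, a + b     # every later term is the sum of the previous two
    return b


print([fibonacci_term(n) for n in range(1, 11)])
```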
The second type of loop, the conditional loop, executes for as long as a specified condition remains true. It is the Gender equivalent to a while ... do construct. It is constructed as follows: An indented block of statements after a blank line contains a conditional test, either preceded or followed by a single more deeply indented block of statements which are to be executed so long as the conditional remains true. It can be made to perform in the manner of an until ... do construct by using a negator to reverse the sense of the test. See the following example:
8 4 2 /
2 3 inc
3 2 1 -
a
b
c
d
a - 2 3
a 1 !
a 1
b 30
c div b a

  not c 3
    inc a
    c div b a
Experienced Gender programmers will immediately recognize that this is simply a conditional in an isolated, lexically scoped block. When constructed in this way, the Gender interpreter will continue re-executing the conditional until it proves false (i.e., in this case, until the condition "c equals 3" becomes true). In this form, known as a preconditional loop, the loop body will never be executed if the conditional is immediately false when first executed.
In the example above, the loop acts as an until ... do loop. C has no until keyword, so the equivalent C construct is a while loop with the test negated:
a = 1;
b = 30;
while (b / a != 3) {
  a++;
}
To change the syntax to do ... while or do ... until, the conditional is moved from the beginning of the loop to the end, like so:
8 4 2 /
2 3 inc
3 2 1 -
a
b
c
d
a - 2 3
a 1 !
a 1
b 30

    inc a
    c div b a
  not c 3
This is known as a postconditional loop, and in this form, the loop body is guaranteed to always be executed at least once.
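The difference between the two forms shows up only when the condition is false on entry. A minimal Python sketch, in which the names preconditional and postconditional and the toy test and body are hypothetical:

```python
def preconditional(test, body, state):
    """Test first; the body may run zero times."""
    while test(state):
        state = body(state)
    return state


def postconditional(test, body, state):
    """Run the body first; it always runs at least once."""
    while True:
        state = body(state)
        if not test(state):
            return state


def below_three(n):
    return n < 3


def increment(n):
    return n + 1


print(preconditional(below_three, increment, 5))   # 5: the body never runs
print(postconditional(below_three, increment, 5))  # 6: the body runs once
```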
To declare a Gender function, an object is declared after a blank line. Immediately following this declaration, a list of additional objects in deeper lexical scope is declared. This construct is recognized by the interpreter as a function declaration; the object declared on the first line of the declaration becomes the name of the function, which can also be used as a function pointer, while the following listed objects become the parameter list to the function. Naturally, these parameters are untyped until the function is called, and need not be the same type on subsequent calls.
After the parameter list, a lexical block containing zero or more statements forms the body of the function. It may contain any valid Gender constructs, including declarations of additional objects, which will be lexically scoped to the function body and are not persistent across calls. Finally, after the function body block, an optional final statement is placed in the lexical scope of the parameter list. The value of this statement, if used, will be used as the return value of the function. It can use, but not alter, both parameter list objects and any objects declared in the body block of the function.
The body block of a function may be the null block (a blank line).
The following is a pseudocode illustration of the structure of a Gender function declaration:
name
  var1
  var2
  var3
    code
    code
    code
    code
    code
  returnvalue
Once declared, the function can be called using the following extremely simple syntax:
functionname arg1 arg2 ...
A function name, once declared, cannot be redeclared as another type. However, the same function name can be redeclared with a different body, or, using function names as function pointers, one function can be duplicated to another by assigning one declared function name to another as in the following example:
foo
  a
  b

  + a b

bar
  a
  b
    c
    d
    c exp a b
    d / b a
  + c d

foo bar
This looks like a normal assignment, but is in fact a special syntax valid only with function pointers. It can be extended to an arbitrary number of function pointers, and has the effect of duplicating the value (and therefore the function) of each function pointer onto the function pointer preceding it in the list. Functionally, all of these duplication operations are considered to occur simultaneously. Thus, the following statement:
foo bar foo
or the equivalent statement:
bar foo bar
will swap the functions foo and bar, as foo is copied to bar and bar is simultaneously copied to foo. This syntax can be extended to rotate the functions assigned to three, four, five or even more function pointers.
It should be noted when writing an interpreter that this means that whenever a chain of function pointer assignments is encountered, the prior value of each distinct function pointer in the chain must be cached before beginning to execute the statement.
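A minimal Python sketch of that caching requirement follows. The function assign_chain and the dict of function bodies are hypothetical; strings stand in for the functions themselves.

```python
def assign_chain(names, functions):
    """Copy each named function onto the name preceding it, all at once."""
    # Snapshot every function in the chain before any of them is overwritten.
    cached = {name: functions[name] for name in names}
    for target, source in zip(names, names[1:]):
        functions[target] = cached[source]


functions = {"foo": "body of foo", "bar": "body of bar"}
assign_chain(["foo", "bar", "foo"], functions)  # the statement `foo bar foo`
print(functions["foo"], "|", functions["bar"])
```

Without the snapshot, foo would be overwritten first and bar would then receive its own former body back, so no swap would take place.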
Function pointers can also be assigned to declared objects, as in the following example:
a
a foo
The value of the declared object a is now a pointer to the function foo, and the statement:
a 2 3
is equivalent to the function call:
foo 2 3
The declared object a can be a set object containing multiple function pointers, as long as all entities in the list assigned to a are function pointers. In the following example, a closed list of five function pointers to four functions is assigned to the declared object b:
b
b foo bar baz frobozz foo
Any statement consisting solely of the declared object b will now cycle the function assignments of the function pointers foo, bar, baz, and frobozz. If, however, the list also contains literal objects or declared objects whose values are not function pointers, then the list will be parsed as a function chain (see below).
Frequently, a programmer needs to execute a series of function calls, passing the results of each to the next. Gender supports this functionality via function chains. A function chain must begin with a function pointer and end with either a literal object or a declared object whose value is defined and is not a function pointer. The chain is evaluated from right to left, each function pointer swallowing the values of all objects to its right as its parameter list. The return value of the function then becomes a virtual parameter to the next function pointer to its left. Consider the following example, using the arbitrary function pointers from the previous example:
b foo a b bar b baz 1 c 2 frobozz 4 bar 6 d
This function chain, written as separate Gender statements, would expand to the following block:
v bar 6 d
v frobozz 4 v
v baz 1 c 2 v
v bar b v
b foo a b v
and is equivalent to the following Perl or C code:
b = foo (a, b, bar (b, baz (1, c, 2, frobozz (4, bar (6, d)))));
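The right-to-left evaluation rule can be sketched in Python. This is an illustration only: evaluate_chain and tracer are hypothetical names, Python callables stand in for function pointers, and the stand-in functions merely record how they were called. The leading assignment target b is left out, so the list holds only the chain itself.

```python
def evaluate_chain(items):
    """Evaluate a function chain from right to left."""
    args = []
    for item in reversed(items):
        if callable(item):
            # A function pointer swallows everything to its right;
            # its return value becomes the only pending argument.
            args = [item(*args)]
        else:
            args.insert(0, item)
    if len(args) != 1:
        raise ValueError("chain did not reduce to a single value")
    return args[0]


def tracer(name):
    """Hypothetical stand-in function that records how it was called."""
    return lambda *args: f"{name}({', '.join(map(str, args))})"


foo, bar, baz, frobozz = (tracer(n) for n in ("foo", "bar", "baz", "frobozz"))
chain = [foo, "a", "b", bar, "b", baz, 1, "c", 2, frobozz, 4, bar, 6, "d"]
print(evaluate_chain(chain))
```

The printed string shows the same nesting as the C call, with each function receiving the result of the one to its right as its final argument.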
A powerful and unique feature of Gender is that, by omitting the parameters to the last function pointer, the entire chain can be stored in a declared object, as in the following example:
b foo a b bar b baz 1 c 2 frobozz 4 bar
The statement:
b 27 j
is now equivalent to the statement:
b foo a b bar b baz 1 c 2 frobozz 4 bar 27 j
Moreover, these stored chains can be concatenated and chain-evaluated. If the object f contains the chain foo a b bar b baz, and the object g contains the chain baz j k 16 foo, then the statement:
p f m n 12 g m 16 p
is equivalent to the statement:
p foo a b bar b baz m n 12 baz j k 16 foo m 16 p
C programmers will immediately recognize that this functionality is similar to C macros. However, unlike a C macro, a stored function chain can be redefined as often as desired without incurring syntax errors, and can be manipulated and dynamically altered by set operators. It is thus enormously more flexible and powerful than a C macro.
Although functions and operators are in large part interchangeable in Gender, only function pointers can be stored in chains. A stored chain may not contain operators, since any operators present in the declaration of the chain would be resolved at the time of declaration.
It should be noted that Gender requires an extremely complex and sophisticated interpreter. It is not known whether a compiler for Gender can be written, since due to the constant dynamic redeclarations inherent in Gender, any compiled Gender program would in all likelihood have to contain its own compiler, which would therefore have to be written in Gender and would thus in turn have to contain its own compiler...
Implementation of a Gender interpreter is left to the programmer.
No BNF definition of Gender has yet been written. Have at it. Good luck. We will remember your name with honor.
The following are some suggested reference declarations in Gender of some common operators and functions. Some of these need work; I'm quite certain, for example, that there's a better way to define the square and sqrt operators. I just haven't figured out what it is yet.
A B AB concat ABCDEF 2 3 CDE substr 3 3 0 - 3 3 1 / 3 3 6 + 2 3 6 * 3 9 square sqrt ! square 3 3 27 ^ a a - 0 1 1 a ! swap a b c c b a c abs a sqrt square a gt a b a - a b a / a abs a a 1 b 1 b 0 b le ! gt lt a b gt b a ge ! lt
Note the explicit use of the null block in this lt definition.