Programming Style Guidelines

Functional programming is still at an early stage of development and some heterogeneity of programming style is therefore inevitable (and desirable). Nevertheless a certain amount is known, and there is no need for every newcomer to functional programming to discover all the pitfalls by trial and error. We give here a series of suggested guidelines for good programming style in Miranda. The list is not meant to be exhaustive.

These rules are also not intended to be followed rigidly in all cases, regardless of conflicting considerations. That is why they are only suggestions for good style and not grammar rules.

Avoid the indiscriminate use of recursion

A Miranda script that consists of a large number of functions which call each other in an apparently random fashion is no easier to understand than, say, a piece of FORTRAN code which is written as a rat's nest of GOTO statements. An excessive reliance on recursion (especially mutual recursion) can be an indication of a weak programming style. Some pointers:

Use list comprehensions, .. lists, and library functions, in preference to ad-hoc recursion. For example it is probably clearer to define factorial by writing

    fac n = product[1..n]

than to define it from first principles, as

    fac 0 = 1
    fac (n+1) = (n+1) * fac n

Similarly, to define the cartesian product of two lists by a list comprehension, thus

    cp x y = [(a,b)|a<-x;b<-y]

is certainly a lot clearer than the recursive definition,

    cp (a:x) y = f y ++ cp x y
                    where
                    f (b:y) = (a,b): f y
                    f [] = []
    cp [] y = []

The standard environment contains a number of useful list processing functions (e.g. map, filter, reverse, foldr, foldl) with whose properties it is worth becoming familiar. They capture common patterns of recursion over lists, and can often be used to simplify your code and reduce the reliance on ad-hoc recursion. Programs using list comprehensions and standard functions are also likely to run faster (on the current implementation) than equivalent programs using ad-hoc recursion.
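
As a small illustration of this point, here is a sketch of one function written three ways: by ad-hoc recursion, by a list comprehension, and with the standard functions map and foldr. The names sumsq, sumsq2, sumsq3 and sq are invented for this example and are not part of the standard environment. All three definitions should give the same result on any finite list of numbers.

    || ad-hoc recursion, from first principles
    sumsq :: [num]->num
    sumsq [] = 0
    sumsq (a:x) = a*a + sumsq x

    || the same function as a list comprehension
    sumsq2 :: [num]->num
    sumsq2 x = sum [a*a | a<-x]

    || and again using map and foldr
    sumsq3 :: [num]->num
    sumsq3 x = foldr (+) 0 (map sq x)

    || hypothetical helper, top level so it can be tested separately
    sq :: num->num
    sq a = a*a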

The standard environment is only a basic collection of useful general purpose functions. As you get used to programming in Miranda you will probably begin to discover other useful functions that express common patterns of recursion (perhaps over data structures other than lists). It is a good practice to collect such functions in libraries (together with some explanations of their properties) so that you can reuse them, and share them with others. Not all of them will survive the test of time, but it cannot hurt to experiment.
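
By way of example, here is a sketch of the kind of function one might collect in such a library: a fold over binary trees, which plays the same role for trees that foldr plays for lists. The type tree and all the function names below are assumptions made for this illustration; nothing here is taken from the standard environment.

    || a binary tree with a value at each node
    tree * ::= Leaf | Node (tree *) * (tree *)

    || foldtree f z t replaces every Leaf in t by z and every Node by f
    foldtree :: (**->*->**->**)->**->tree *->**
    foldtree f z Leaf = z
    foldtree f z (Node l a r) = f (foldtree f z l) a (foldtree f z r)

    || number of nodes in a tree, with no explicit recursion
    treesize :: tree *->num
    treesize t = foldtree countnode 0 t

    countnode :: num->*->num->num
    countnode l a r = l + 1 + r

    || the values of a tree in left-to-right order
    treelist :: tree *->[*]
    treelist t = foldtree joinnode [] t

    joinnode :: [*]->*->[*]->[*]
    joinnode l a r = l ++ [a] ++ r

Once the recursion has been captured in foldtree, further tree functions can often be written as one-line applications of it.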

To cause the definitions from such a library to be in scope in another script you would use a %include directive (see manual section on library directives).
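
A minimal sketch of such a script follows. It assumes a library script called listlib.m in the same directory, and assumes that this library defines a function chunks which splits a list into pieces of a given length; both names are hypothetical.

    || bring the definitions of the (hypothetical) library into scope
    %include "listlib.m"

    || chunks is assumed to be defined in listlib.m
    rows = chunks 3 [1..9]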

Avoid unnecessary nesting of definitions

Scripts that get deeply nested in where-clauses are harder to understand, harder to reason about formally, harder to debug (because functions defined inside where's cannot be exercised separately), slower to compile, and generally more difficult to work with.

A well structured script will consist of a series of top-level definitions, each of which (if it carries a where-clause at all) has a fairly small number of local definitions. A third level of definition (where inside where) should be used only very occasionally. [And if you find yourself getting nested four and five levels deep in block structure you can be pretty sure that your program has gone badly out of control.]

A function should normally be placed inside a where clause only if it is logically necessary to do so (which will be the case when it has a free variable which is not in scope outside the where clause). If your script consists of, say, six functions, one of which solves a problem and the other five of which are auxiliary to it, it is probably not good style to put the five subsidiary functions inside a where clause of the main one. It is usually better to make all six top level definitions, with the important one written first, say.

There are several reasons for this. First, it makes the program easier to read, since it consists of six separate chunks of information rather than one big one. Second, the program is much easier to debug, because each of its functions can be exercised separately, on appropriate test data, within a Miranda session. Third, this program structure is more robust for future development - for example if we later wish to add a second main function that solves a different problem by using the same five auxiliary functions in another way, we can do so without having to restructure any existing code.
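
The test of logical necessity can be seen in the following sketch, in which all names are invented for the example. The function square mentions no variables other than its own argument, so it belongs at top level. The function shift mentions d, which is bound only by the left hand side of shiftall, so it has to be local.

    || no free variables: a top level definition, which can be
    || tried out on its own within a Miranda session
    square :: num->num
    square a = a*a

    || shift refers to d, which is in scope only inside shiftall
    shiftall :: num->[num]->[num]
    shiftall d x
        = map shift x
          where
          shift a = a + d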

There is a temptation to use where to hide information that is not relevant at top-level. This may be misguided (especially if it leads to code with large and complex where-clauses). If you don't wish all of your functions or data structures to be "visible" from outside, the proper way to do this is to include a %export directive in the script.
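
A minimal sketch of this approach is given below, with invented names. All three functions are top level definitions, and so can be tested individually, but only mean is exported to scripts which include this one.

    || only mean is visible from outside this script
    %export mean

    mean :: [num]->num
    mean x = total x / size x

    || auxiliary functions: top level here, but not exported
    total :: [num]->num
    total x = sum x

    size :: [*]->num
    size x = #x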

Note also that (in the current implementation) functions defined inside a "where" clause cannot have their types explicitly specified. This is a further reason to avoid putting structure inside a where clause that does not logically have to be there.

Specify the types of top level identifiers

The Milner type discipline is an impressive advance in compiler technology. It is also a trap for the unwary. The fact that the Miranda compiler will accept several hundred lines of code without a single type specification, and correctly infer the types of all the identifiers does NOT mean that it is sensible to write code with no type information. (Compare: compilers will also accept large programs with no comments in, but that doesn't make such programs sensible.)

For other than fairly small scripts it is good style to insert an explicit specification of the type of any top level identifier whose type is not immediately apparent from its definition. Type specifications look like this

    ack::num->num->num

This says that ack is a function taking two numbers and returning a number. A type specification can occur anywhere in a script, either before or after the definition of the corresponding identifier, but common sense suggests that the best place for it is just before the corresponding definition.
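
For instance, assuming that ack is intended to be Ackermann's function (the text above gives only its type), the specification and the definition might be placed together as follows.

    || type specification first, immediately followed by the definition
    ack :: num->num->num
    ack 0 n = n+1
    ack (m+1) 0 = ack m 1
    ack (m+1) (n+1) = ack m (ack (m+1) n)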

If in doubt it is always better to put in a type specification than to leave it out. The compiler may not need this extra type information but human beings definitely do. The extra type information becomes particularly important when your code reaches the level of complexity at which you start to make type errors.

If your script contains a type error it is unreasonable to expect the compiler to correctly locate the real source of the error in the absence of explicit type declarations. A type error means different parts of your code are inconsistent with one another in their use of identifiers - if you have not given the compiler any information about the intended use of an identifier, you cannot expect it to know which of several conflicting uses are the wrong ones. In such a case it can only tell you that something is wrong, and indicate the line on which it first deduced an inconsistency - which may be many lines later than the real error. Explicit type declarations make it much more likely that the compiler will spot the real error on the line where it actually occurs.

Code containing explicit type information is also incomparably easier for other people to read.

Use safe layout

This is a point to do with the operation of the offside rule. It is most easily explained by means of an example. Consider the following definition, here assumed to be part of a larger script:

    hippo = (rhino - swan)/piglet
            where
            piglet = 17
            rhino = 63
            swan = 29

Some time after writing this we carry out a global edit to expand `hippo' to `hippopotamus'. The definition now looks like this:

    hippopotamus = (rhino - swan)/piglet
            where
            piglet = 17
            rhino = 63
            swan = 29

The where-clause has become offside, and the definition will no longer compile. Worse, it is possible (with a little ingenuity) to construct examples of layout where changing the length of an identifier will move a definition from one level of scope to another, so that the script still compiles but now has a different meaning! Replacing an identifier by a shorter one can cause similar difficulties with layout.

The layout of the hippo definition was unsafe, because the level of indentation depended on the length of an identifier. There are several possible styles of safe layout. The basic rule to follow is:

    Whenever a right hand side goes on for more than one line
    (because it consists of a set of guarded cases, or because it
    carries a where clause, or just because it is an expression too
    big to fit on one line), you should take a newline BEFORE
    starting the rhs, and indent by some standard amount (not
    depending on the width of the lhs).

There are two main styles of safe layout, depending on whether you take the newline before or after the = of the definition. Here are two possible safe layouts for the hippo definition

    hippo =
        (rhino - swan)/piglet
        where
        piglet = 17
        rhino = 63
        swan = 29

    hippo
        = (rhino - swan)/piglet
          where
          piglet = 17
          rhino = 63
          swan = 29

The reason that either style can be used is that the boundary, for offside purposes, of a right hand side, is set by the first symbol of the rhs itself, and not by the preceding = sign.

Both of these layouts have the property that the parse cannot be affected by edits which alter the lengths of one or more identifiers. Both layout styles also have the advantage that successive levels of indentation can move to the right by a fixed step - this makes code easier to read and lessens the danger that your layout will `fall off' the right hand edge of the screen (although if you follow the advice given earlier about avoiding deeply nested block structure this is in any case unlikely to be a problem).
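
The same rule applies to a right hand side made up of guarded cases. The following sketch (roots, disc and d are names invented for the example) uses the second style, taking the newline before the = sign. Each guarded case is indented by a fixed step, and the where-clause is lined up with the first symbol of the right hand side, so nothing depends on the width of the left hand side.

    || real roots of a*x^2 + b*x + c = 0, if there are any
    roots :: num->num->num->[num]
    roots a b c
        = [], if disc < 0
        = [(-b + d)/(2*a), (-b - d)/(2*a)], otherwise
          where
          disc = b*b - 4*a*c
          d = sqrt disc

Renaming roots to something longer or shorter leaves the layout of every other line undisturbed.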

It would be convenient if there were a program for reformatting Miranda scripts with a standard layout. Apart from ensuring that the layout was safe in the above sense, it might make it easier for people to read each other's code. A layout program of this kind may be provided in later releases of the system.

Acknowledgement: The hippopotamus example (and the problem of unsafe layout) was first pointed out by Mark Longley of the University of Kent.

Write order independent code

When defining functions by pattern matching it is best (except in a few cases where it leads to real clumsiness of expression) to make sure the patterns are mutually exclusive, so it does not matter in what order the cases are written.

For the same reason it is better style to use sets of guards which are composed of mutually exclusive boolean expressions. The keyword otherwise sometimes helps to make this less painful.
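
As a small sketch of the use of otherwise (the function describe is invented for this example), the first two guards below are mutually exclusive and otherwise covers exactly the remaining case, which saves writing out the final test n > 0 in full.

    describe :: num->[char]
    describe n
        = "negative", if n < 0
        = "zero", if n = 0
        = "positive", otherwise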

By way of illustration of some of the issues, here is a good definition of a function merge which combines two already sorted lists into a single sorted result, eliminating duplicates in the process:

    merge [] y = y
    merge (a:x) [] = (a:x)
    merge (a:x) (b:y)
        = a:merge x (b:y), if a<b
        = b:merge (a:x) y, if a>b
        = a:merge x y, if a=b

First note the use of mutually exclusive sets of patterns (it was tempting to write merge x [] = x as the second case, but the above is probably better style). Note also that we didn't use otherwise as the last guard here because it would have spoiled the symmetry of the three tests.

A related point is that where a function is not everywhere defined on its argument type, it is good practice to insert an explicit error case. For example the definition given in the standard environment for hd, the function which extracts the first element of a list, is

    hd (a:x) = a
    hd [] = error "hd []"

Of course if a function is applied to an argument for which no equation has been given, the Miranda system will print an error message anyway, but one advantage of putting in an explicit call to error is that the programmer gets control of the error message. The other (and perhaps main) advantage is that for someone else reading the script, it explicitly documents the fact that a certain use of the function is considered an error.
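
The same practice carries over to functions of your own. In the following sketch (average is a name invented for the example) the two patterns are mutually exclusive, and the case of the empty list is documented as an error rather than being left out.

    || arithmetic mean of a non-empty list of numbers
    average :: [num]->num
    average [] = error "average []"
    average (a:x) = sum (a:x) / #(a:x)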