Programming Style Guidelines
Functional programming is still at an early stage of development and some heterogenity of programming style is therefore inevitable (and desirable). Nevertheless a certain amount is known, and there is no need for every newcomer to functional programming to discover all the pitfalls by trial and error. We give here a series of suggested guidelines for good programming style in Miranda. The list is not meant to be exhaustive.
These rules are also not intended to be followed rigidly in all cases, regardless of conflicting considerations. That is why they are only suggestions for good style and not grammar rules.
Avoid the indiscriminate use of recursion A Miranda script that consists of large number of functions which call each other in an apparently random fashion is no easier to understand than, say, a piece of FORTRAN code which is written as a rat's nest of GOTO statements. An excessive reliance on recursion (especially mutual recursion) can be an indication of a weak programming style. Some pointers:
Use list comprehensions, ..
lists, and library functions, in
preference to ad-hoc recursion. For example it is probably clearer to
define factorial by writing
fac n = product[1..n]
than to define it from first principles, as
fac 0 = 1
fac (n+1) = (n+1) * fac n
and to define the cartesian product of two lists by a list comprehension, thus
cp x y = [(a,b)|a<-x;b<-y]
is certainly a lot clearer than the recursive definition,
cp (a:x) y = f y ++ cp x y
where
f (b:y) = (a,b): f y
f [] = []
cp [] y = []
The standard environment contains a number of useful list processing
functions (eg map filter reverse foldr foldl) with whose properties it
is worth becoming familiar. They capture common patterns of recursion
over lists, and can often be used to simplify your code, and reduce the
reliance on ad-hoc
recursion. Programs using list comprehensions and
standard functions are also likely to run faster (on the current
implementation) than equivalent programs using ad-hoc recursion.
The standard environment is only a basic collection of useful general purpose functions. As you get used to programming in Miranda you will probably begin to discover other useful functions that express common patterns of recursion (perhaps over data structures other than lists). It is a good practice to collect such functions in libraries (together with some explanations of their properties) so that you can reuse them, and share them with others. Not all of them will survive the test of time, but it cannot hurt to experiment.
To cause the definitions from such a library to be in scope in another
script you would use a %include
directive (see manual section on
library directives).
Avoid unnecessary nesting of definitions Scripts that get deeply nested in where-clauses are harder to understand, harder to reason about formally, harder to debug (because functions defined inside where's cannot be exercised seperately) slower to compile, and generally more difficult to work with.
A well structured script will consist of a series of top-level definitions, each of which (if it carries a where-clause at all) has a fairly small number of local definitions. A third level of definition (where inside where) should be used only very occasionally. [And if you find yourself getting nested four and five levels deep in block structure you can be pretty sure that your program has gone badly out of control.]
A function should normally be placed inside a where clause only if it is logically necessary to do so (which will be the case when it has a free variable which is not in scope outside the where clause). If your script consists, of say six functions, one of which solves a problem and the other five of which are auxiliary to it, it is probably not a good style to put the five subsidiary functions inside a where clause of the main one. It is usually better to make all six top level definitions, with the important one written first, say.
There are several reasons for this. First that it makes the program
easier to read, since it consists of six separate chunks of information
rather than one big one. Second that the program is much easier to
debug, because each of its functions can be exercised separately, on
appropriate test data, within a Miranda session. Third that this
program structure is more robust for future development - for example if
we later wish to add a second main
function that solves a different
problem by using the same five auxiliary functions in another way, we
can do so without having to restructure any existing code.
There is a temptation to use where
to hide information that is not
relevant at top-level. This may be misguided (especially if it leads to
code with large and complex where-clauses). If you don't wish all of
your functions or data structures to be "visible" from outside, the
proper way to do this is to include a %export
directive in the script.
Note also that (in the current implementation) functions defined inside a "where" clause cannot have their types explicitly specified. This is a further reason to avoid putting structure inside a where clause that does not logically have to be there.
Specify the types of top level identifiers The Milner type discipline is an impressive advance in compiler technology. It is also a trap for the unwary. The fact that the Miranda compiler will accept several hundred lines of code without a single type specification, and correctly infer the types of all the identifiers does NOT mean that it is sensible to write code with no type information. (Compare: compilers will also accept large programs with no comments in, but that doesn't make such programs sensible.)
For other than fairly small scripts it is good style to insert an explicit specification of the type of any top level identifier whose type is not immediately apparent from its definition. Type specifications look like this
ack::num->num->num
says that ack
is a function taking two numbers and returning a number.
A type specification can occur anywhere in a script, either before or
after the definition of the corresponding identifier, but common sense
suggests that the best place for it is just before the corresponding
definition.
If in doubt it is always better to put in a type specification than to leave it out. The compiler may not need this extra type information but human beings definitely do. The extra type information becomes particularly important when your code reaches the level of complexity at which you start to make type errors.
If your script contains a type error it is unreasonable to expect the
compiler to correctly locate the real source of the error in the absence
of explicit type declarations. A type error means different parts of
your code are inconsistent with one another in their use of identifiers
- if you have not given the compiler any information about the intended
use of an identifier, you cannot expect it to know which of several
conflicting uses are the wrong
ones. In such a case it can only tell
you that something is wrong, and indicate the line on which it first
deduced an inconsistency - which may be many lines later than the real
error. Explicit type declarations make it much more likely that the
compiler will spot the real error
on the line where it actually
occurs.
Code containing explicit type information is also incomparably easier for other people to read.
Use safe layout This is a point to do with the operation of the offside rule. It is most easily explained by means of an example. Consider the following definition, here assumed to be part of a larger script
hippo = (rhino - swan)/piglet
where
piglet = 17
rhino = 63
swan = 29
Some time after writing this we carry out a global edit to expand
hippo' to
hippopotamus`. The definition now looks like this.
hippopotamus = (rhino - swan)/piglet
where
piglet = 17
rhino = 63
swan = 29
the where-clause has become offside, and the definition will no longer compile. Worse, it is possible (with a little ingenuity) to construct examples of layout where changing the length of an identifier will move a definition from one level of scope to another, so that the script still compiles but now has a different meaning!!! Replacing an identifier by a shorter one can cause similar difficulties with layout.
The layout of the hippo
definition was unsafe, because the level of
indentation depended on the length of an identifier. There are several
possible styles of safe
layout. The basic rule to follow is:
Whenever a right hand side goes on for more than one line
(because it consists of a set of guarded cases, or because it
carries a where clause, or just because it is an expression too
big to fit on one line), you should take a newline BEFORE
starting the rhs, and indent by some standard amount (not
depending on the width of the lhs).
There are two main styles of safe layout, depending on whether you take
the newline before or after the =
of the definition. Here are two
possible safe layouts for the hippo
definition
hippo =
(rhino - swan)/piglet
where
piglet = 17
rhino = 63
swan = 29
hippo
= (rhino - swan)/piglet
where
piglet = 17
rhino = 63
swan = 29
The reason that either style can be used is that the boundary, for
offside purposes, of a right hand side, is set by the first symbol of
the rhs itself, and not by the preceding =
sign.
Both of these layouts have the property that the parse cannot be affected by edits which alter the lengths of one or more identifiers. Either of these layout styles also have the advantage that successive levels of indentation can move to the right by a fixed step - this makes code easier to read and lessens the danger that your layout will `fall off' the right hand edge of the screen (although if you follow the advice given earlier about avoiding deeply nested block structure this is in any case unlikely to be a problem).
It would be convenient if there was a program for reformatting Miranda
scripts with a standard layout. Apart from ensuring that the layout was
safe
in the above sense, it might make it easier for people to read
each other's code. A layout program of this kind may be provided in
later releases of the system.
Acknowledgement: The hippopotamus
example (and the problem of unsafe
layout) was first pointed out by Mark Longley of the University of Kent.
Write order independent code When defining functions by pattern matching it is best (except in a few cases where it leads to real clumsiness of expression) to make sure the patterns are mutually exclusive, so it does not matter in what order the cases are written.
For the same reason it is better style to use sets of guards which are
composed of mutually exclusive boolean expressions. The keyword
otherwise
sometimes helps to make this less painful.
By way of illustration of some of the issues here is a good definition
of a function merge
which combines two already sorted lists into a
single sorted result, eliminating duplicates in the process
merge [] y = y
merge (a:x) [] = (a:x)
merge (a:x) (b:y)
= a:merge x (b:y), if a<b
= b:merge (a:x) y, if a>b
= a:merge x y, if a=b
First note the use of mutually exclusive sets of patterns (it was
tempting to write merge x [] = x
as the second case, but the above is
probably better style). Note also that we didn't use otherwise
as the
last guard here because it would have spoiled the symmetry of the three
tests.
A related issue to these is that where a function is not everywhere
defined on its argument type, it is good practice to insert an explicit
error case. For example the definition given in the standard
environment for hd
, the function which extracts the first element of a
list, is
hd (a:x) = a
hd [] = error "hd []"
Of course if a function is applied to an argument for which no equation
has been given, the Miranda system will print an error message anyway,
but one advantage of putting in an explicit call to error
is that the
programmer gets control of the error message. The other (and perhaps
main) advantage is that for someone else reading the script, it
explicitly documents the fact that a certain use of the function is
considered an error.