My Rules for Fortran Programming
Clarity in programming is very important - computers may get faster every year, but my brain doesn't. Therefore, it's very important that code is written so that I (or someone else) can easily understand it. There are many generic rules for writing clear code (e.g. using whitespace to break up large blocks of code, adding relevant comments and splitting code into reusable procedures). On this page, I'm going to go through some rules which are particularly relevant to Fortran. Like most such lists of rules, these are really guidelines, to be disregarded when appropriate.
All scoping units contain IMPLICIT NONE
This is one that I actually regard as a rule; I consider its absence good evidence that code is buggy.
Using implicit declaration is highly dangerous, since it means that typos get elevated to the status of bugs.
Most compilers automatically initialise new variables to zero (but this is not guaranteed by the standard), and zero is often not an obviously incorrect value - at least not in the way that -10148 is almost always incorrect.
This means that a mis-entered variable name can wreak all sorts of subtle damage to your calculation, if implicit declaration is turned on.
Have the compiler do your proofreading for you, by using IMPLICIT NONE.
Compilers can often be forced to put local variables on the stack (look for options like -stackvar or -auto), which typically means that new variables end up with random values.
These will cause crashes more often, helping you to home in on the problem.
This isn't a substitute for IMPLICIT NONE, but it does help you find variables you declared, but forgot to initialise.
Ideally, compilers would initialise new variables to NaN (change as appropriate for each type of variable), but I've never used one that does.
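As an illustration of the point about typos, here is a minimal sketch (the program and variable names are hypothetical, not taken from any real code). The comment marks where a misspelt name would slip through under implicit typing, but is rejected at compile time once IMPLICIT NONE is present.

```fortran
program implicit_none_demo
  implicit none               ! every name must now be declared
  real    :: total, increment
  integer :: i

  total = 0.0
  increment = 0.5
  do i = 1, 4
    ! Under implicit typing, misspelling the next line as
    !   total = total + incremnt
    ! would compile, and silently add an undefined (often zero) value.
    ! With IMPLICIT NONE the compiler rejects the undeclared name instead.
    total = total + increment
  end do

  print *, 'total =', total
  if (abs(total - 2.0) > 1.0e-6) error stop 'unexpected total'
end program implicit_none_demo
```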
Everything goes into MODULEs
Doing this makes sure that all the interfaces are explicit, and generated for free by the compiler.
There is a slight cost to this, in that you can end up with recompile cascades if a low level MODULE has its internals altered.
This can happen even if the publicly accessible portions of the module are unchanged.
I believe that implementations could avoid this, but in practice they don't bother.
However, waiting a few minutes (at most) for a code to recompile is a small price for knowing that all variable types match.
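A minimal sketch of what the explicit interface buys you, using a hypothetical module called geometry_sketch. Because the function lives in a MODULE, the compiler checks the type of every actual argument at the call site.

```fortran
module geometry_sketch
  implicit none
  private
  public :: circle_area
contains
  function circle_area(radius) result(area)
    real, intent(in) :: radius
    real :: area
    real, parameter :: pi = 3.14159265
    area = pi * radius**2
  end function circle_area
end module geometry_sketch

program use_geometry
  use geometry_sketch, only: circle_area
  implicit none
  ! The interface of circle_area is known here, so a call such as
  ! circle_area(2) (an INTEGER argument) would be rejected at compile time.
  print *, 'area =', circle_area(2.0)
  if (abs(circle_area(2.0) - 12.566371) > 1.0e-4) error stop 'unexpected area'
end program use_geometry
```

Had circle_area been an external procedure in a separate file with no interface block, the integer call would typically compile and produce garbage at run time.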
Declare all argument INTENTs
Wouldn't you like to know if you accidentally try to alter the value of the speed of light in a function call?
By declaring an argument INTENT(IN), you can be sure that the compiler will whinge if you try to alter it.
The compiler should also make sure that the variable is defined before being passed to the routine.
However, this will be a `try' rather than a guarantee.
I suspect that making a guarantee would involve solving the Halting Problem.
For example: I think that you can pass an INTENT(IN) argument on to another procedure whose dummy argument has no declared INTENT.
This might let you modify the variable - or at least determine how it was passed.
Either of these would make the program non-standard, and can be avoided by declaring all INTENTs.
Arguments which are INTENT(OUT) are slightly more subtle.
Not assigning a value to an INTENT(OUT) argument is not an error (although the compiler will probably issue a warning).
For this reason, a compiler can't always be certain if you use an INTENT(OUT) variable before you define it.
It can catch some cases, but any further procedure calls will prevent the general case being caught.
Furthermore, the standard states that such arguments become undefined upon entry to the procedure.
This means that, if you don't modify the argument, you can't rely on the original value being left there for the calling procedure.
Finally, a declaration of INTENT(INOUT) is not the same as an undeclared INTENT.
If an argument is declared INTENT(INOUT) then it must be both defined upon entry and definable.
You can't pass an undefined variable as an INTENT(INOUT) argument; nor can you pass a literal constant (since you can't change the value of the number 3).
There are no such restrictions if no INTENT is declared.
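The three INTENTs can be sketched as follows. The module and procedure names are hypothetical, chosen only for illustration.

```fortran
module intent_sketch
  implicit none
  private
  public :: scale_in_place, sum_and_count
contains
  subroutine scale_in_place(values, factor)
    real, intent(inout) :: values(:)   ! must be defined on entry; modified here
    real, intent(in)    :: factor      ! an assignment to factor would not compile
    values = values * factor
  end subroutine scale_in_place

  subroutine sum_and_count(values, total, n)
    real,    intent(in)  :: values(:)
    real,    intent(out) :: total      ! undefined on entry, so set it here
    integer, intent(out) :: n
    total = sum(values)
    n = size(values)
  end subroutine sum_and_count
end module intent_sketch

program intent_demo
  use intent_sketch, only: scale_in_place, sum_and_count
  implicit none
  real    :: x(3), total
  integer :: n

  x = [1.0, 2.0, 3.0]
  call scale_in_place(x, 2.0)          ! a literal is fine for INTENT(IN)
  call sum_and_count(x, total, n)
  print *, 'total =', total, ' n =', n
  if (n /= 3 .or. abs(total - 12.0) > 1.0e-6) error stop 'unexpected result'
end program intent_demo
```

Passing the literal 2.0 as the first argument of scale_in_place would be an error, since an INTENT(INOUT) argument has to be definable.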
Use REAL with a KIND specifier
The KIND specifier itself goes in a module (see my Kinds module), which is USEd everywhere.
I use this instead of declaring variables DOUBLE PRECISION.
This makes it easy to alter the precision of all variables uniformly, without resorting to compiler options.
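The Kinds module itself is not reproduced on this page, so the following is only a hypothetical stand-in: the module name kinds_sketch and the parameter name wp are assumptions. It shows the general idea of one named constant controlling the precision of every REAL.

```fortran
module kinds_sketch
  implicit none
  private
  public :: wp
  ! Working precision: at least 15 significant digits, exponent range 307.
  ! Changing this one line changes the precision of every real(kind=wp).
  integer, parameter :: wp = selected_real_kind(15, 307)
end module kinds_sketch

program kinds_demo
  use kinds_sketch
  implicit none
  real(kind=wp) :: third

  third = 1.0_wp / 3.0_wp              ! literals carry the kind too
  print *, 'third =', third, ' digits =', precision(third)
  if (precision(third) < 15) error stop 'precision too low'
end program kinds_demo
```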
FUNCTIONs don't have side effects
Nor do they modify their arguments.
When I write code, I like FUNCTIONs to be as close to mathematical functions as possible.
Notionally, the restrictions I have described mean that all my FUNCTIONs should be PURE with INTENT(IN) arguments.
This doesn't quite work in practice, due to the problems of error handling.
I don't consider WRITE(*,*) or STOP to be side effects.
Technically they are, but they aren't unsafe side effects.
However, incorporating them means that I can't declare all my FUNCTIONs as being PURE.
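For a function that needs no error handling, the PURE declaration works as intended. A minimal sketch, with hypothetical names:

```fortran
module pure_sketch
  implicit none
  private
  public :: kinetic_energy
contains
  pure function kinetic_energy(mass, speed) result(energy)
    real, intent(in) :: mass, speed
    real :: energy
    ! Because of PURE, the compiler would reject a WRITE(*,*) or STOP here,
    ! as well as any assignment to mass or speed.
    energy = 0.5 * mass * speed**2
  end function kinetic_energy
end module pure_sketch

program pure_demo
  use pure_sketch, only: kinetic_energy
  implicit none
  print *, 'energy =', kinetic_energy(2.0, 3.0)
  if (abs(kinetic_energy(2.0, 3.0) - 9.0) > 1.0e-6) error stop 'unexpected energy'
end program pure_demo
```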
EXITs always go to a named label
Similarly for CYCLE.
It's often not strictly necessary to have a named label (if you're just exiting the innermost loop), but it helps make your meaning plain.
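A short sketch of named loops (the loop labels rows and cols are arbitrary). With the names in place, it is obvious which loop each EXIT and CYCLE refers to, and a single EXIT can leave both loops at once.

```fortran
program named_loops
  implicit none
  integer :: i, j, found_i, found_j

  found_i = 0
  found_j = 0
  rows: do i = 1, 5
    cols: do j = 1, 5
      if (j > i) cycle rows            ! skip the rest of this row
      if (i * j == 12) then
        found_i = i
        found_j = j
        exit rows                      ! leave both loops at once
      end if
    end do cols
  end do rows

  print *, 'found at', found_i, found_j
  if (found_i /= 4 .or. found_j /= 3) error stop 'unexpected indices'
end program named_loops
```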
No GOTO, COMMON or EQUIVALENCE
Use of GOTO can often lead to messy code.
Of course, this is not a certainty, and it is possible to imagine situations where a GOTO is the best option.
One suggestion I've seen is for error handling, given the lack of exceptions in Fortran.
COMMON blocks were the normal means of providing global variables prior to Fortran 90.
However, they are a good source of bugs, since each COMMON block ought to be declared identically wherever it is used, but nothing forces it to be (and the mismatches are a major source of bugs).
With Fortran 90, a much better option became available (namely MODULE variables).
In codes I've seen, EQUIVALENCE is often used to minimise memory footprint by reusing `work' arrays.
Modern computers have memory systems which remove this need.
Of course, COMMON and EQUIVALENCE will still have uses, since they force a particular layout in memory (it's actually this fact which can lead to all sorts of bugs, if you didn't require it).
This might be useful for low level communication with hardware.
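As a sketch of the MODULE alternative to a COMMON block, here is a hypothetical module holding some global simulation state. The variables are declared once, so there is no second declaration which can drift out of step.

```fortran
module sim_state_sketch
  implicit none
  private
  public :: n_steps, current_time, advance
  ! Declared in exactly one place, unlike a COMMON block, which is
  ! redeclared in every routine that uses it.
  integer, save :: n_steps = 0
  real,    save :: current_time = 0.0
contains
  subroutine advance(dt)
    real, intent(in) :: dt
    n_steps = n_steps + 1
    current_time = current_time + dt
  end subroutine advance
end module sim_state_sketch

program sim_state_demo
  use sim_state_sketch, only: n_steps, current_time, advance
  implicit none
  call advance(0.5)
  call advance(0.5)
  print *, 'steps =', n_steps, ' time =', current_time
  if (n_steps /= 2 .or. abs(current_time - 1.0) > 1.0e-6) error stop 'unexpected state'
end program sim_state_demo
```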
To save a variable, declare SAVE explicitly
This helps me make sure I don't trip myself up.
You don't need to SAVE variables if they don't go out of scope, and the SAVE attribute is implicit if you initialise a variable in its declaration.
However, I prefer to list things out, just to be sure.
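A minimal sketch of an explicitly SAVEd local variable, using a hypothetical identifier generator. The initialisation in the declaration already implies SAVE, but spelling it out makes the intention visible to the reader.

```fortran
module counter_sketch
  implicit none
  private
  public :: next_id
contains
  subroutine next_id(id)
    integer, intent(out) :: id
    ! Initialising here already implies SAVE; it is written out anyway.
    integer, save :: last_id = 0
    last_id = last_id + 1
    id = last_id
  end subroutine next_id
end module counter_sketch

program counter_demo
  use counter_sketch, only: next_id
  implicit none
  integer :: first, second
  call next_id(first)
  call next_id(second)
  print *, 'ids:', first, second
  if (first /= 1 .or. second /= 2) error stop 'value was not retained'
end program counter_demo
```

This is a SUBROUTINE rather than a FUNCTION because it changes hidden state, in keeping with the rule that FUNCTIONs have no side effects.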
All modules have an unqualified PRIVATE
Help cut namespace pollution!
This also helps ensure that your MODULEs remain black boxes, and that you don't accidentally zap the wrong variable.
Only things you want to be public should have a PUBLIC declaration.
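A sketch of the pattern, using a hypothetical stack module. The storage array and the item count are invisible outside the module, so calling code cannot zap them by accident.

```fortran
module stack_sketch
  implicit none
  private                              ! nothing is visible unless listed below
  public :: push, depth
  integer, parameter :: max_depth = 100
  integer, save :: n_items = 0
  real,    save :: items(max_depth)
contains
  subroutine push(x)
    real, intent(in) :: x
    if (n_items >= max_depth) stop 'stack full'
    n_items = n_items + 1
    items(n_items) = x
  end subroutine push

  function depth() result(n)
    integer :: n
    n = n_items
  end function depth
end module stack_sketch

program stack_demo
  use stack_sketch, only: push, depth
  implicit none
  call push(1.5)
  call push(2.5)
  ! "use stack_sketch, only: items" would fail to compile: items is PRIVATE.
  print *, 'depth =', depth()
  if (depth() /= 2) error stop 'unexpected depth'
end program stack_demo
```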
MODULE and 'file' are synonyms
This helps keep track of where things are.
I don't always follow it, though.
For example, in my implementation of flux viscosity for FLASH, it was natural for all the flux calculation routines to go in one module.
However, there were rather a lot of them, so I ended up using INCLUDE.
Deciding when to do the split is a matter of taste.
All MODULEs can print out their CVS information
Change to your own revision control system as appropriate. This helps ensure that you can always recreate your runs, because you know what version of the code was used.
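One possible way of doing this is sketched below. The module name, the routine name and the use of the CVS $Revision$ keyword (which CVS expands on checkout) are assumptions for illustration, not a description of any particular code.

```fortran
module flux_sketch
  implicit none
  private
  public :: print_revision
  ! Placeholder keyword: CVS rewrites this string when the file is checked out.
  character(len=*), parameter :: revision = '$Revision$'
contains
  subroutine print_revision(unit)
    integer, intent(in) :: unit        ! Fortran unit to write to
    write(unit, '(2a)') 'flux_sketch: ', revision
  end subroutine print_revision
end module flux_sketch

program revision_demo
  use flux_sketch, only: print_revision
  implicit none
  call print_revision(6)               ! unit 6 is conventionally standard output
end program revision_demo
```

Calling each module's routine at the start of a run leaves a record of the code version in the output file.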
USE statements go at MODULE level
Quite a bit of personal taste in this.
Since I think that each MODULE should represent a package of functionality, it's reasonable to assume that each routine in that MODULE will call similar parts of other MODULEs.
Hence, why keep listing out the same information?
USE statements are followed by ONLY
Again, this helps cut down on pollution of the namespace.
It's not always appropriate.
For example, my Kinds module is always going to be used in its entirety, so there's no call for an ONLY clause.
Also, if you have a large MODULE and frequently use a common portion of it, then that subset should probably go in its own MODULE.
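The last two rules can be sketched together. All the module names below are hypothetical. The USE statements appear once, at the top of the module, and the ONLY clause keeps two_pi out of the namespace of circles_sketch.

```fortran
module kinds_sketch
  implicit none
  private
  public :: wp
  integer, parameter :: wp = selected_real_kind(15, 307)
end module kinds_sketch

module constants_sketch
  use kinds_sketch
  implicit none
  private
  public :: pi, two_pi
  real(wp), parameter :: pi = 3.141592653589793_wp
  real(wp), parameter :: two_pi = 2.0_wp * pi
end module constants_sketch

module circles_sketch
  use kinds_sketch                     ! used in its entirety, so no ONLY
  use constants_sketch, only: pi       ! two_pi is deliberately not imported
  implicit none
  private
  public :: area, circumference
contains
  pure function area(r) result(a)
    real(wp), intent(in) :: r
    real(wp) :: a
    a = pi * r**2
  end function area

  pure function circumference(r) result(c)
    real(wp), intent(in) :: r
    real(wp) :: c
    c = 2.0_wp * pi * r
  end function circumference
end module circles_sketch

program circles_demo
  use kinds_sketch
  use circles_sketch, only: area, circumference
  implicit none
  print *, area(1.0_wp), circumference(1.0_wp)
  if (abs(area(1.0_wp) - 3.141592653589793_wp) > 1.0e-12_wp) error stop 'unexpected area'
end program circles_demo
```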
Other thoughts
All this said, if I'm given a working code, I'm not going to rewrite it simply to make it conform to these rules. I'll just make sure that any additions I write follow the outline above, possibly with a few extra bridging procedures, to get into my new portions.
Some coding styles I've seen suggest avoiding the array syntax introduced in Fortran 90, for performance reasons.
FORALL is also counselled against.
The reason for this is that in Fortran, the right hand side of expressions is evaluated before the left hand side is examined (at least notionally).
This can cause the compiler to allocate temporary arrays, if there is the potential for dependencies (if the same array appears on both sides of the expression).
While I accept that writing whole array expressions can lead to poor performance, I don't regard this as a reason to avoid them in general.
There are lots of simple cases where the compiler should be able to see there are no dependency problems - if a(1:n) = b(1:n) takes a performance hit relative to the equivalent loop, I'll be demanding a new compiler!
Furthermore, clarity of code should be paramount.
If writing array expressions helps make the code clearer, then do it.
Later, you can profile the code, and find out if a particular expression is causing a bottleneck.
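The dependency issue can be seen in a small sketch (the array names are arbitrary). In the overlapping assignment the right hand side is evaluated in full before anything is stored, so the result is a shifted copy. A naive forward loop is not equivalent, which is why the compiler may need a temporary array for the array form.

```fortran
program array_syntax_demo
  implicit none
  integer, parameter :: n = 5
  real    :: a(n), b(n), c(n)
  integer :: i

  b = [(real(i), i = 1, n)]

  ! No overlap between the two sides: no temporary should be needed.
  a(1:n) = 2.0 * b(1:n)                ! a = 2, 4, 6, 8, 10
  c = a

  ! Overlap: the old values of a(1:n-1) are used throughout.
  a(2:n) = a(1:n-1)                    ! a = 2, 2, 4, 6, 8

  ! A naive forward loop overwrites values before it reads them.
  do i = 2, n
    c(i) = c(i-1)                      ! c = 2, 2, 2, 2, 2
  end do

  print *, 'array form:', a
  print *, 'naive loop:', c
  if (abs(a(n) - 8.0) > 1.0e-6 .or. abs(c(n) - 2.0) > 1.0e-6) error stop 'unexpected values'
end program array_syntax_demo
```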
Finally, I also prefer to avoid preprocessors.
They can make debugging difficult, as you try to figure out what code the compiler actually saw.
You can really have some fun and games when you have a Fortran 77 code (where leading whitespace is rather important) which is trying to use a C preprocessor (which doesn't care about whitespace).
I try to avoid INCLUDE for similar reasons (but note my comments above).
I think that USEing MODULEs is normally a much better solution.