Pretty please: no more lower-case macros !!!

Adrian Bocaniciu a.bocaniciu at computer.org
Thu Jan 6 01:00:50 PST 2005


	As I have already mentioned in a follow-up to my recent bug report, the 
essential ports net/quagga and net/zebra have been totally broken by the 
use of lower-case macros in the DragonFly system headers.

	There is an old C tradition that macro names are written entirely in 
capital letters and that no other identifiers are written that way.  
This tradition did not arise for aesthetic reasons but from the need to 
keep the name space of macros separate from that of all other 
identifiers.

	The way in which preprocessing is implemented in C makes this name 
space separation mandatory: the preprocessor replaces macro names 
textually, before the compiler sees any scopes or declarations, so 
without the separation it is not possible to guarantee the correctness 
of any C program.  C is neither Lisp nor Scheme, nor a language that 
uses a special marker, e.g. $, for macro expansions, where an overlap of 
the name spaces causes no trouble.

	With more than 10,000 ports, written by more than 100,000 programmers, 
one can be certain that almost every reasonably short alphanumeric 
string has already been used by someone, somewhere, as the name of a 
local variable.  Therefore, it is truly naive to believe that a 
lower-case macro definition can be inserted in a system header used by 
all these ports without causing a name collision.

	The following pathological example, which is suitable for a C textbook, 
is not made up; it is extracted from the current DragonFly sources :-)

	Let's say that programmer Unlucky, who contributes code to port 
#13013, writes the following code fragment:

...
#include <some_system_header.h>
...
{
...
struct s1 var1;
struct s2 var2;
struct s3 var3;
...
memset(&var2, MAGIC_VALUE, sizeof var2);
...
}
...

	After testing his program, Unlucky declares proudly that his code is 
foolproof.  A thousand auditors read his source code and they also 
conclude that the code is foolproof.

	Then comes DragonFly and the following macro definition is inserted in 
some_system_header.h:

#define	  var2	  foo[0]

	When the port is recompiled, the declaration of var2 expands to 
"struct s2 foo[0];", which gcc accepts as an array of zero length.  No 
storage is allocated for it, so its address can be the same as that of 
one of the adjacent variables.  However, in the sizeof context, foo[0] 
is an array element of type struct s2; therefore the compiler will 
translate the memset invocation as if it had been, for example:

memset(&var1, MAGIC_VALUE, sizeof(struct s2));

	Thus, unexpectedly, var1 will be overwritten, resulting in more or less 
spectacular crashes.  Moreover, if struct s2 is large enough, the memset 
invocation will also overwrite other parts of the stack, including the 
return address, leading to even more spectacular crashes.  Even more 
perverse examples can easily be constructed.
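
	The size mismatch described above can be checked without actually 
smashing a stack.  The following is only a minimal sketch: the layout of 
struct s2 and the two function names are made up for illustration, and 
it relies on the zero-length array extension that gcc and clang accept.  
It compares the size of the object that the declaration really creates 
with the size that "sizeof var2" hands to memset.

```c
#include <stddef.h>

/* Hypothetical layout; the original post does not show struct s2. */
struct s2 { char payload[64]; };

/* Stand-in for the lower-case macro inserted in the system header. */
#define var2 foo[0]

/* Size of the object that "struct s2 var2;" really declares. */
size_t declared_size(void)
{
    struct s2 var2;      /* expands to: struct s2 foo[0];  (zero-length array) */
    return sizeof foo;   /* the whole array occupies no storage */
}

/* Size that "sizeof var2" passes to memset. */
size_t memset_size(void)
{
    struct s2 var2;      /* again: struct s2 foo[0]; */
    return sizeof var2;  /* expands to: sizeof foo[0], i.e. sizeof(struct s2) */
}
```

	With this layout, memset would be asked to write 64 bytes starting at 
the address of an object that owns none of them, so whatever lies next 
to it on the stack is overwritten.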

	One way to avoid the name collisions would be to precede each 
program block with "#undef" directives for all its local identifiers.  
It is obvious that such a stupid solution is not acceptable.  The 
traditional convention of using capital letters to enforce the name 
space separation is much more convenient.

	In the case of net/route.h, where I first encountered such macros, they 
were used to provide alternate names for some structure members.  That 
can be done in a perfectly safe way without macros, by using anonymous 
unions.  Anonymous unions have not yet made their way into the official 
C standard, for some very stupid reason (requiring unions to be named 
was really a mistake in the original C design), but gcc and most other C 
compilers support them in C as well, not only in C++.
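
	As a rough sketch of the anonymous-union approach: the structure and 
member names below are hypothetical and are not the ones used in 
net/route.h.  It assumes a compiler that accepts anonymous unions in C, 
such as gcc (they were later added to the standard in ISO C11).  Both 
names refer to the same storage, yet neither is visible to the 
preprocessor, so a local variable with the same name is left alone.

```c
/* Hypothetical structure, for illustration only. */
struct metrics {
    union {                      /* anonymous union: it has no member name */
        unsigned long pksent;    /* primary name */
        unsigned long use;       /* alternate name for the same storage */
    };
    int flags;
};

/* Write through one name, read through the other. */
unsigned long read_through_alias(unsigned long n)
{
    struct metrics m;
    m.pksent = n;
    return m.use;                /* same member storage, same type */
}

/* A local variable called "use" is an ordinary identifier here,
 * because no macro is involved in providing the alias. */
unsigned long local_is_safe(void)
{
    unsigned long use = 42;
    return use;
}
```

	The alias exists only as a member name, so it can clash with nothing 
outside an access of the form m.use.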

	Best regards !




