Portability

Writing Portable Programs · Translation-Time Issues · Character-Set Issues · Representation Issues · Expression-Evaluation Issues · Library Issues · Converting to Standard C · Function-Call Issues · Preprocessing Issues · Library Issues · Quiet Changes · Newer Dialects

A portable program is one that you can move with little or no extra investment of effort to a computer that differs from the one on which you originally developed the program. Writing a program in Standard C does not guarantee that it will be portable. You must be aware of the aspects of the program that can vary among implementations. You can then write the program so that it does not depend critically on implementation-specific aspects.

This document describes what you must be aware of when writing a portable program. It also tells you what to look for when you alter programs written in older dialects of C so that they behave properly under a Standard C implementation. It briefly summarizes the features added with Amendment 1 to the C Standard. And it suggests ways to write C code that is also valid as C++ code.

Writing Portable Programs

Although the language definition specifies most aspects of Standard C, it intentionally leaves some aspects unspecified. The language definition also permits other aspects to vary among implementations. If the program depends on behavior that is not fully specified or that can vary among implementations, then there is a good chance that you will need to alter the program when you move it to another computer.

This section identifies issues that affect portability, such as how the translator interprets the program and how the target environment represents files. The list of issues is not complete, but it does include the common issues that you confront when you write a portable program.

An implementation of Standard C must include a document that describes any behavior that is implementation defined. You should read this document to be aware of those aspects that can vary, to be alert to behavior that can be peculiar to a particular implementation, and to take advantage of special features in programs that need not be portable.

Translation-Time Issues

A program can depend on peculiar properties of the translator.

The filenames acceptable to an include directive can vary considerably among implementations. If you use filenames that consist of other than six letters (of a single case), followed by a dot (.), followed by a single letter, then an implementation can find the name unacceptable. Each implementation defines the filenames that you can create.

How preprocessing uses a filename to locate a file can also vary. Each implementation defines where you must place files that you want to include with an include directive.

If you write two or more of the operators ## within a macro definition, the order in which preprocessing concatenates tokens can vary. If any order produces an invalid preprocessing token as an intermediate result, the program can misbehave when you move it.

A translator can limit the size and complexity of a program that it can translate. Such limits can also depend on the environment in which the translator executes. Thus, no translation unit you write can assuredly survive all Standard C translators. Obey the following individual limits, however, to ensure the highest probability of success:

Nest statements -- such as if and while statements -- no more than fifteen levels deep. The braces surrounding a block add a level of nesting.
Nest conditional directives -- such as if and ifdef directives -- no more than eight levels deep.
Add no more than twelve decorations -- to derive pointer, array, and function types -- to a declarator.
Write no more than 31 nested pairs of parentheses in a declarator.
Write no more than 32 nested pairs of parentheses within an expression.
Ensure that all distinct names differ in their first 31 characters. Also ensure that all characters match for names that the translator should treat as the same.
Ensure that all distinct names with external linkage differ in the first six characters, even if the translator converts all letters to a single case. Also ensure that all characters match for such names that the translator should treat as the same.
Write no more than 511 distinct names with external linkage within a translation unit.
Write no more than 127 distinct names in block-level declarations that share a single name space.
Define no more than 1,024 distinct names as macros at any point within a translation unit.
Write no more than 31 parameters in a function decoration.
Write no more than 31 arguments in a function call.
Write no more than 31 parameters in a macro definition.
Write no more than 31 arguments in a macro invocation.
Write no logical source line that exceeds 509 characters.
Construct no string literal that contains more than 509 characters or wide characters.
Declare no object whose size exceeds 32,767 bytes.
Ensure that include directives nest no more than eight files deep.
Write no more than 257 case labels for any one switch statement. (Case labels within nested switch statements do not affect this limit.)
Write no more than 127 members in any one structure or union.
Write no more than 127 enumeration constants in any one enumeration.
Nest structure or union definitions no more than fifteen deep in any one list of member declarations.

Character-Set Issues

The program can depend on peculiar properties of the character set.

If you write in the source files any characters not in the basic C character set, a corresponding character might not be in another character set, or the corresponding character might not be what you want. The set of characters is defined for each implementation.

Similarly, if the program makes special use of characters not in the basic C character set when it executes, you might get different behavior when you move the program.

If you write a character constant that specifies more than one character, such as 'ab', the result might change when you move the program. Each implementation defines what values it assigns such character constants.

If the program depends on a particular value for one or more character codes, it can behave differently on an implementation with a different character set. The codes associated with each character are implementation defined.

Representation Issues

The program can depend on how an implementation represents objects. All representations are implementation defined.

If the program depends on the representation of an object type (such as its size in bits or whether type char or the plain bitfield types can represent negative values), the program can change behavior when you move it.

If you treat an arithmetic object that has more than one byte as an array of characters, you must be aware that the order of significant bytes can vary among implementations. You cannot write an integer or floating-point type object to a binary stream on one implementation, then later read those bytes into an object of the same type on a different implementation, and portably obtain the same stored value.
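One common way around the byte-order problem is to fix the byte order of the external format yourself and convert one byte at a time. The sketch below is only an illustration: the function names `encode_u32` and `decode_u32` are hypothetical, and the file format is assumed to use four 8-bit bytes, most significant byte first.

```c
/* Store the low-order 32 bits of value in buf, most significant
   byte first, regardless of the byte order of the host. */
void encode_u32(unsigned char buf[4], unsigned long value)
{
    buf[0] = (unsigned char)((value >> 24) & 0xFF);
    buf[1] = (unsigned char)((value >> 16) & 0xFF);
    buf[2] = (unsigned char)((value >> 8) & 0xFF);
    buf[3] = (unsigned char)(value & 0xFF);
}

/* Rebuild the value from four bytes produced by encode_u32. */
unsigned long decode_u32(const unsigned char buf[4])
{
    return ((unsigned long)buf[0] << 24)
         | ((unsigned long)buf[1] << 16)
         | ((unsigned long)buf[2] << 8)
         |  (unsigned long)buf[3];
}
```

You can then write the four bytes of buf to a binary stream with fwrite and read them back with fread on any implementation. Floating-point values need a separately defined external format, such as text.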

The method of encoding integer and floating-point values can vary widely. For signed integer types, negative values have several popular encodings. Floating-point types have numerous popular encodings. This means that, except for the minimum guaranteed range of values for each type, the range of values can vary widely.

Both signed integer and floating-point types can have values that represent an exceptional result on some implementations. Performing an arithmetic operation or a comparison on such a value can report a signal or otherwise terminate execution. Initialize all such objects before accessing them -- and avoid overflow, underflow, or zero divide -- to avoid exceptional results.

The alignment requirements of various object types can vary widely. The placement and size of holes in structures is implementation defined. You can portably determine the offset of a given member from the beginning of a structure, but only by using the offsetof macro.
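As a small illustration of offsetof, the following sketch recovers a pointer to an enclosing structure from a pointer to one of its members. The structure `record` and the function `record_from_weight` are hypothetical names chosen for the example.

```c
#include <stddef.h>

struct record {
    char tag;
    long count;
    double weight;
};

/* Given a pointer to the weight member of a struct record,
   return a pointer to the enclosing structure. The offset comes
   from offsetof, so holes in the structure do not matter. */
struct record *record_from_weight(double *pw)
{
    return (struct record *)((char *)pw - offsetof(struct record, weight));
}
```

The code makes no assumption about how many bytes precede weight. Only the first member is certain to have offset zero.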

Each implementation defines how bitfields pack into integer objects and whether bitfields can straddle two or more underlying objects. You can declare bitfields of 16 bits or fewer in all implementations.

How an implementation represents enumeration types can vary. You can be certain that all enumeration constants can be represented as type int.

Expression-Evaluation Issues

The program can depend on how an implementation evaluates expressions.

The order in which the program evaluates subexpressions can vary widely, subject to the limits imposed by the sequence points within and between expressions. Therefore, the timing and order of side effects can vary between any two sequence points. A common error is to depend on a particular order for the evaluation of argument expressions on a function call. Any order is permissible.
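The usual remedy is to evaluate each argument in a separate statement, so that a sequence point fixes the order. In this sketch, the functions `next_id`, `diff`, and `ordered_diff` are hypothetical names used only for illustration.

```c
static int counter = 0;

/* Each call has a side effect: it advances the counter. */
int next_id(void)
{
    return ++counter;
}

int diff(int a, int b)
{
    return a - b;
}

/* Unportable: diff(next_id(), next_id()) can yield -1 or 1,
   because either argument can be evaluated first.
   Portable: the end of each declaration's initializer is a
   sequence point, so first is always obtained before second. */
int ordered_diff(void)
{
    int first = next_id();
    int second = next_id();

    return diff(first, second);
}
```

Every call to ordered_diff returns -1, whatever order the implementation chooses for evaluating argument expressions.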

Whether you can usefully type cast a pointer value to an integer value or type cast a nonzero integer value to a pointer value depends on the implementation. Each implementation defines how it converts between scalar types.

If the quotient of an integer division is negative, the sign of a nonzero remainder can be either positive or negative. The result is implementation defined. Use the div and ldiv functions for consistent behavior across implementations.
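The div function always truncates the quotient toward zero, so the remainder has the sign of the numerator. A minimal sketch follows, with the hypothetical wrapper names `portable_quot` and `portable_rem`:

```c
#include <stdlib.h>

/* Quotient truncated toward zero on every implementation. */
int portable_quot(int numer, int denom)
{
    return div(numer, denom).quot;
}

/* Remainder that satisfies
   numer == portable_quot(numer, denom) * denom + remainder. */
int portable_rem(int numer, int denom)
{
    div_t qr = div(numer, denom);

    return qr.rem;
}
```

For example, portable_quot(-7, 2) is -3 and portable_rem(-7, 2) is -1. The operators / and % could instead yield -4 and 1 on some implementations. As with the operators, a zero denominator is an error.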

When the program right shifts a negative integer value, different implementations can define different results. To get consistent results across implementations, right shift only nonnegative (or unsigned) integer values.
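If you need a right shift that treats negative values consistently, you can map them to nonnegative values before shifting. The following sketch, with the hypothetical name `shift_right_floor`, divides by a power of two and rounds toward negative infinity. It assumes that count is less than the number of bits in type long.

```c
/* Divide value by 2 raised to the power count, rounding toward
   negative infinity, without right shifting a negative value. */
long shift_right_floor(long value, unsigned int count)
{
    if (value >= 0)
        return value >> count;

    /* For negative value, -1 - value is nonnegative and cannot
       overflow. Shift it, then map the result back. */
    return -1 - ((-1 - value) >> count);
}
```

For example, shift_right_floor(-7, 1) is -4, the result that many (but not all) implementations produce for -7 >> 1.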

When the program converts a long double value to another floating-point type, or a double to a float, it can round the result to either a nearby higher or a nearby lower representation of the original value. Each implementation defines how such conversions behave.

When the program accesses or stores a value in a volatile object, each implementation defines the number and nature of the accesses and stores. Three possibilities exist:

multiple accesses to different bytes
multiple accesses to the same byte
no accesses at all

You cannot write a program that assuredly produces the same pattern of accesses across multiple implementations.

The expansion of the null pointer constant macro NULL can be any of 0, 0L, or (void *)0. The program should not depend on a particular choice. You should not assign NULL to a pointer to a function, and you should not use NULL as an argument to a function call that has no type information for the corresponding parameter.
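The case of a function call with no type information arises most often with a varying number of arguments. In the sketch below, `count_strings` is a hypothetical function that accepts a list of strings terminated by a null pointer. The caller writes a type cast so that the terminator has a pointer type no matter how NULL expands.

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the string arguments that precede a terminating
   null pointer of type const char *. */
int count_strings(const char *first, ...)
{
    va_list ap;
    const char *s;
    int n = 0;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        ++n;
    va_end(ap);
    return n;
}
```

A portable call looks like count_strings("a", "b", (const char *)NULL). Writing a bare NULL as the last argument can pass an int or a long where the function expects a pointer.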

The actual integer types corresponding to the type definitions ptrdiff_t, size_t, and wchar_t can vary. Use the type definitions.

Library Issues

The behavior of the Standard C library can vary.

What happens to the file-position indicator for a text stream immediately after a successful call to ungetc is not defined. Avoid mixing file-positioning operations with calls to this function.

When the function bsearch can match either of two equal elements of an array, different implementations can return different matches.

When the function qsort sorts an array containing two elements that compare equal, different implementations can leave the elements in different order.

Whether or not floating-point underflow causes the value ERANGE to be stored in errno (as the result of a range error) can vary. Each implementation defines how it handles floating-point underflow.

Which library functions store values in errno varies considerably. To determine whether the function of interest reported an error, you must store the value zero in errno before you call a library function and then test the stored value before you call another library function.
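The following sketch applies this pattern to strtol, which stores ERANGE in errno when the converted value is out of range. The wrapper name `parse_long` and its return convention are assumptions made for the example.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert decimal text to a long. Return 0 and store the value
   in *out on success. Return -1 if the text has no digits or
   the value is out of range. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;                      /* clear any earlier error */
    value = strtol(text, &end, 10);
    if (errno == ERANGE)            /* test before any other call */
        return -1;
    if (end == text)                /* no conversion was performed */
        return -1;
    *out = value;
    return 0;
}
```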

You can do very little with signals in a portable program. A target environment can elect not to report signals. If it does report signals, any handler you write for an asynchronous signal can only:

make a successful call to signal for that particular signal
alter the value stored in an object of type volatile sig_atomic_t
return control to its caller

Asynchronous signals can disrupt proper operation of the library. Avoid using signals, or tailor how you use them to each target environment.
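A handler that stays within the three permitted actions might look like the sketch below. The names `interrupted`, `on_interrupt`, `install_interrupt_flag`, and `was_interrupted` are hypothetical, and SIGINT is only one possible choice of signal.

```c
#include <signal.h>

static volatile sig_atomic_t interrupted = 0;

/* The handler calls signal for the signal being handled, stores
   a value in a volatile sig_atomic_t object, and returns. */
static void on_interrupt(int sig)
{
    signal(sig, on_interrupt);
    interrupted = 1;
}

/* Install the handler. Return 0 on success, -1 on failure. */
int install_interrupt_flag(void)
{
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}

/* The rest of the program polls the flag at convenient points. */
int was_interrupted(void)
{
    return interrupted != 0;
}
```

All other work, such as writing a message or closing files, happens outside the handler after the program notices that the flag is set.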

Scan functions can give special meaning to a minus (-) that is not the first or the last character of a scan set. The behavior is implementation defined. Write this character only first or last in a scan set.

If you allocate an object of zero size by calling one of the functions calloc, malloc, or realloc, the behavior is implementation defined. Avoid such calls.

If you call the function exit with a status argument value other than zero (for successful termination), EXIT_FAILURE, or EXIT_SUCCESS, the behavior is implementation defined. Use only these values to report status.

Converting to Standard C

If you have a program written in an earlier dialect of C that you want to convert to Standard C, be aware of all the portability issues described earlier in this document. You must also be aware of issues peculiar to earlier dialects of C. Standard C tries to codify existing practice wherever possible, but existing practice varied in certain areas. This section discusses the major areas to address when moving an older C program to a Standard C environment.

Function-Call Issues

In earlier dialects of C, you cannot write a function prototype. Function types do not have argument information, and function calls occur in the absence of any argument information. Many implementations let you call any function with a varying number of arguments.

You can directly address many of the potential difficulties in converting a program to Standard C by writing function prototypes for all functions. Declare functions with external linkage that you use in more than one file in a separate file, and then include that file in all source files that call or define the functions.

The translator will check that function calls and function definitions are consistent with the function prototypes that you write. It will emit a diagnostic if you call a function with an incorrect number of arguments. It will emit a diagnostic if you call a function with an argument expression that is not assignment compatible with the corresponding function parameter. It will convert an argument expression that is assignment compatible but that does not have the same type as the corresponding function parameter.

Older C programs often rely on argument values of different types having the same representation on a given implementation. By providing function prototypes, you can ensure that the translator will diagnose, or quietly correct, any function calls for which the representation of an argument value is not always acceptable.

For functions intended to accept a varying number of arguments, different implementations provide different methods of accessing the unnamed arguments. When you identify such a function, declare it with the ellipsis notation, such as int f(int x, ...). Within the function, use the macros defined in <stdarg.h> to replace the existing method for accessing unnamed arguments.
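A minimal sketch of the <stdarg.h> approach follows. The function `sum_ints` is hypothetical. It assumes that the first argument gives the number of int arguments that follow.

```c
#include <stdarg.h>

/* Return the sum of count arguments of type int. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);        /* count is the last named parameter */
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) returns 6. The function prototype must be visible at every call, because a function that accepts a varying number of arguments can use a different calling sequence.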

Preprocessing Issues

Perhaps the greatest variation in dialects among earlier implementations of C occurs in preprocessing. If the program defines macros that perform only simple substitutions of preprocessing tokens, then you can expect few problems. Otherwise, be wary of variations in several areas.

Some earlier dialects expand macro arguments after substitution, rather than before. This can lead to differences in how a macro expands when you write other macro invocations within its arguments.

Some earlier dialects do not rescan the replacement token sequence after substitution. Macros that expand to macro invocations work differently, depending on whether the rescan occurs.

Dialects that rescan the replacement token sequence work differently, depending on whether a macro that expands to a macro invocation can involve preprocessing tokens in the text following the macro invocation.

The handling of a macro name during an expansion of its invocation varies considerably.

Some dialects permit empty argument sequences in a macro invocation. Standard C does not always permit empty arguments.

The concatenation of tokens with the operator ## is new with Standard C. It replaces several earlier methods.

The creation of string literals with the operator # is new with Standard C. It replaces the practice in some earlier dialects of substituting macro parameter names that you write within string literals in macro definitions.
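The following sketch shows both operators. All of the macro and function names are hypothetical. Note that # does not expand its argument first, so a second macro level (XSTR here) is needed to create a string literal from the expansion of a macro.

```c
#define STR(x)         #x
#define XSTR(x)        STR(x)
#define GLUE(a, b)     a ## b
#define FORMAT_FOR(x)  #x " = %d\n"
#define LIMIT          64

/* GLUE(item_, count) forms the single name item_count. */
static int GLUE(item_, count) = 7;

int get_item_count(void)
{
    return item_count;
}

/* # creates a string literal from the argument as written. */
const char *limit_name(void)
{
    return STR(LIMIT);              /* "LIMIT" */
}

/* The extra level expands LIMIT before # is applied. */
const char *limit_text(void)
{
    return XSTR(LIMIT);             /* "64" */
}

/* Replaces the older practice of writing the parameter name
   inside a string literal in the macro definition. */
const char *count_format(void)
{
    return FORMAT_FOR(item_count);  /* "item_count = %d\n" */
}
```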

Library Issues

The Standard C library is largely a superset of existing libraries. Some conversion problems, however, can occur.

Many earlier implementations offer an additional set of input/output functions with names such as close, creat, lseek, open, read, and write. You must replace calls to these functions with calls to other functions defined in <stdio.h>.

Standard C has several minor changes in the behavior of library functions, compared with popular earlier dialects. These changes generally occur in areas where practice also varied.

Quiet Changes

Most differences between Standard C and earlier dialects of C cause a Standard C translator to emit a diagnostic when it encounters a program written in the earlier dialect of C. Some changes, unfortunately, require no diagnostic. What was a valid program in the earlier dialect is also a valid program in Standard C, but with different meaning.

While these quiet changes are few in number and generally subtle, you need to be aware of them. They occasionally give rise to unexpected behavior in a program that you convert to Standard C. The principal quiet changes are discussed below.

Trigraphs do not occur in earlier dialects of C. An older program that happens to contain a sequence of two question marks (??) followed by one of the characters = ( / ) ' < ! > or - can change meaning in a variety of ways.

Some earlier dialects effectively promote any declaration you write that has external linkage to file level. Standard C keeps such declarations at block level.

Earlier dialects of C let you use the digits 8 and 9 in an octal escape sequence, such as in the string literal "\08". Standard C treats this as a string literal with two characters (plus the terminating null character).

Hexadecimal escape sequences, such as \xff, and the escape sequence \a are new with Standard C. In certain earlier implementations, they may have different meaning.

Some earlier dialects guarantee that identical string literals share common storage, and others guarantee that they do not. Some dialects let you alter the values stored in string literals. You cannot be certain that identical string literals overlap in Standard C, or that they do not. Do not alter the values stored in string literals in Standard C.

Some earlier dialects have different rules for promoting the types unsigned char, unsigned short, and unsigned bitfields. On most implementations, the difference is detectable only on a few expressions where a negative value becomes a large positive value of unsigned type. Add type casts to specify the types you require.
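As an illustration, consider comparing an unsigned char value with a negative int value. Under the Standard C rules the unsigned char operand promotes to int (when int can represent all of its values), while some earlier dialects promote it to unsigned int, which turns the negative operand into a large positive value. The function names in this sketch are hypothetical.

```c
/* Return nonzero if level is greater than threshold, where
   threshold can be negative. The type cast makes the comparison
   signed under either set of promotion rules. */
int exceeds(unsigned char level, int threshold)
{
    return threshold < (int)level;
}

/* Return a - b computed in type unsigned int, so that a smaller
   a wraps around to a large positive value under either set of
   promotion rules. */
unsigned int wrapped_difference(unsigned short a, unsigned short b)
{
    return (unsigned int)a - (unsigned int)b;
}
```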

Earlier dialects convert lvalue expressions of type float to double, in a value context, so all floating-point arithmetic occurs only in type double. A program that depends on this implicit increase in precision can behave differently in a Standard C environment. Add type casts if you need the extra precision.

On some earlier dialects of C, shifting an int or unsigned int value left or right by a long or unsigned long value first converts the value to be shifted to the type of the shift count. In Standard C, the type of the shift count has no such effect. Use a type cast if you need this behavior.

Some earlier dialects guarantee that the if directive performs arithmetic to the same precision as the target environment. (You can write an if directive that reveals properties of the target environment.) Standard C makes no such guarantee. Use the macros defined in <float.h> and <limits.h> to test properties of the target environment.
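For example, the sketch below uses a macro from <limits.h> to choose a type that can represent values up to at least 2,147,483,647. The names `counter32` and `COUNTER32_MAX` are hypothetical.

```c
#include <limits.h>

/* Choose the type from the documented limits of the target
   environment, not from arithmetic performed by preprocessing. */
#if INT_MAX >= 2147483647
typedef int counter32;
#define COUNTER32_MAX INT_MAX
#else
typedef long counter32;
#define COUNTER32_MAX LONG_MAX
#endif

/* Return nonzero if adding one to n would exceed the range. */
int counter32_is_full(counter32 n)
{
    return n == COUNTER32_MAX;
}
```

The macros in <float.h> need not be usable in an if directive. Test them in ordinary expressions instead.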

Earlier dialects vary considerably in the grouping of values within an object initializer, when you omit some (but not all) of the braces within the initializer. Supply all braces for maximum clarity.

Earlier dialects convert the expression in any switch statement to type int. Standard C also performs comparisons within a switch statement in other integer types. A case label expression that relies on being truncated when converted to int, in an earlier dialect, can behave differently in a Standard C environment.

Some earlier preprocessing expands parameter names within string literals or character constants that you write within a macro definition. Standard C does not. Use the string literal creation operator #, along with string-literal concatenation, to replace this method.

Some earlier preprocessing concatenates preprocessor tokens separated only by a comment within a macro definition. Standard C does not. Use the token concatenation operator ## to replace this method.

Newer Dialects

Making standards for programming languages is an ongoing activity. As of this writing, the C Standard has been formally amended. A standard for C++, which is closely related to C, is in the late stages of development. One aspect of portability is writing code that is compatible with these newer dialects, whether or not the code makes use of the newer features.

Most of the features added with Amendment 1 are declared or defined in three new headers -- <iso646.h>, <wchar.h>, and <wctype.h>. A few take the form of capabilities added to the functions declared in <stdio.h>. While not strictly necessary, it is best to avoid using any of the names declared or defined in these new headers.

Maintaining compatibility with C++ takes considerably more work. It can be useful, however, to write in a common dialect called typesafe C. Here is a brief summary of the added constraints:

Avoid using any C++ keywords. As of this writing, the list includes:

and         and_eq      asm         bitand      bitor
bool        catch       class       compl       delete
explicit    false       friend      inline      mutable
namespace   new         not         not_eq      operator
or          or_eq       private     protected   public
template    this        throw       true        try
typeid      typename    using       virtual     wchar_t
xor         xor_eq      const_cast  dynamic_cast
reinterpret_cast        static_cast

Write function prototypes for all functions you call.

Define each tag name also as a type, as in:

typedef struct x x;

Assume each enumeration type is a distinct type that promotes to an integer type. Type cast an integer expression that you assign to an object of enumeration type.

Write an explicit storage class for each constant object declaration at file level.

Do not write tentative declarations.

Do not apply the sizeof operator to an rvalue operand.
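The following sketch applies several of these constraints at once. It is intended to be valid both as C and as C++. The type names `point` and `color` and the function `next_color` are hypothetical.

```c
/* Define each tag name also as a type. */
typedef struct point point;
struct point {
    int x;
    int y;
};

typedef enum color { RED, GREEN, BLUE } color;

/* Explicit storage class for a constant object at file level. */
static const int color_count = 3;

/* Treat the enumeration as a distinct type: convert to int for
   the arithmetic, then type cast the result back. */
color next_color(color c)
{
    return (color)(((int)c + 1) % color_count);
}

/* The tag name serves as a type name in either language. */
int point_sum(const point *p)
{
    return p->x + p->y;
}
```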
