C Programming for C++ Programmers

Although C++ has largely supplanted C, the C language has not disappeared. Systems are still developed in C, both because C provides more direct control of the hardware and because C compilers exist for more platforms. Additionally, many supposed C++ systems use C features extensively, either because the system was originally developed in C years ago, or because the programmers have never fully switched over.

So C is a worthwhile language to learn professionally for two reasons. First, understanding C means you can work with those many systems out there that use it. But also, understanding C programming will strengthen you as a C++ programmer, as C programs exercise some features of C++ that C++ programs typically neglect.

Almost any C program can work as a C++ program. So if you are a C++ programmer, learning C is more a matter of learning what you cannot do than what you can do.

This handout describes some of the more important differences. (There are many tiny differences that we do not discuss.) The first of the two sections discusses genuine differences between C and C++: features that C++ added and that are not available in C. The second section discusses techniques essential to C programming that are often neglected in C++ programming because C++ provides more convenient alternatives.

Restrictions within C

C++ adds many features that are completely unavailable in C. The primary addition in C++ was object-oriented programming. Of course this includes the syntax for defining and using classes and objects, but it also includes all the library object classes, like string and I/O streams (including cin and cout). But C++ also added several innovations unrelated to objects; these include reference parameters, the new/delete constructs for dynamic memory, and operator overloading. (Constants declared with const, by contrast, were adopted by ANSI C and are available in C as well.)

Comments

C++ introduced the // commenting construct; it is not available in C89, the version of C standardized by ANSI (C99 later adopted it). In C89, all comments begin with /* and continue to the closing */.

/* This is a comment. Notice that
 * it proceeds over two lines with no problem.
 * And even onto a third. */

Variable declarations

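In C89 programs, local variables must be declared at the beginning of a block (usually the beginning of the function), before any statements in that block; C99 relaxed this rule. Consider the following function, which finds the value of the environment variable whose name is given by the name parameter.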

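#include <stddef.h>  /* for NULL */

/* Return the value of variable name in envp (entries look like
   "NAME=value"), or NULL if it is not present. */
char *findenv(const char *name, char **envp) {
  int i, j;  /* all locals declared before the first statement */
  for(i = 0; envp[i] != NULL; i++) {
    j = 0;
    /* stop at the end of name, so we never read past a NUL */
    while(name[j] != '\0' && name[j] == envp[i][j]) ++j;
    if(name[j] == '\0' && envp[i][j] == '=') {
      return envp[i] + (j + 1);  /* skip past the '=' */
    }
  }
  return NULL;
}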

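In a C++ program (or in C99), we could declare i inside the for statement itself and place declarations anywhere among the statements. But in C89, declarations must come before all statements in their block, and the simplest convention, used above, is to declare every local variable at the beginning of the function definition.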
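A small sketch contrasting the two placements. The function names count_char_c89 and count_char_c99 are hypothetical, and the second version needs a C99 (or later) compiler, since it declares i inside the for statement.

```c
/* C89 style: every local is declared before the first statement. */
int count_char_c89(const char *s, char c) {
  int i;
  int count = 0;   /* initializing in a declaration is fine in C89 */
  for(i = 0; s[i] != '\0'; i++) {
    if(s[i] == c) ++count;
  }
  return count;
}

/* C99 and later (like C++): i is declared in the for statement itself. */
int count_char_c99(const char *s, char c) {
  int count = 0;
  for(int i = 0; s[i] != '\0'; i++) {
    if(s[i] == c) ++count;
  }
  return count;
}
```

Both functions return 3 for count_char_c89("banana", 'a'); only the placement of the declarations differs.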

Structures

C still includes structures (as in C++). In fact, they are more important in C because of the lack of classes. But in C, the type name must include the keyword struct. For example, the stat() system call wants a pointer to a stat structure as a parameter. To declare a variable named data which is an instance of that stat structure, you would type the following.

struct stat data;

In C++, you can omit the struct keyword and just type "stat data;", but in C the keyword is required. (C programmers often work around this by using the typedef construct to give the structure type a new, one-word name.)
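Here is a sketch of the typedef workaround. The structure struct point, its alias Point, and the function manhattan are all hypothetical names chosen for illustration.

```c
#include <stdlib.h>  /* abs */

/* A hypothetical structure type for illustration. */
struct point {
  int x;
  int y;
};

/* typedef makes Point another name for struct point. */
typedef struct point Point;

/* With the typedef, parameters can be declared without the struct keyword. */
int manhattan(Point a, Point b) {
  return abs(a.x - b.x) + abs(a.y - b.y);
}
```

Both spellings name the same type, so a variable declared as struct point can be passed where a Point is expected.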

The stat() system call takes a pointer as its parameter because C does not have call-by-reference. We pass the address of our structure so that stat() can fill it in by modifying the values the pointer points to. (Passing a pointer is also faster than copying the entire structure to be passed to stat().)
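A minimal sketch of calling stat() this way, assuming a POSIX system (stat() and struct stat come from <sys/stat.h>, not from standard C). The helper name file_size is hypothetical.

```c
#include <sys/stat.h>

/* Return the size in bytes of the file at path, or -1 if stat() fails.
   stat() fills in our local structure through the pointer &data. */
long file_size(const char *path) {
  struct stat data;               /* C requires the struct keyword here */
  if(stat(path, &data) != 0) {
    return -1;                    /* e.g. the file does not exist */
  }
  return (long) data.st_size;
}
```

The & operator produces the address of data; that address is what lets stat() modify a variable that belongs to file_size().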

Function definitions

In the original definition of C, a function definition looked like this.

int main(argc, argv, envp)
  int argc;
  char **argv;
  char **envp;
{
  /* body of function */
}

That is, the parameters are specified by listing their names, and then listing the types of the parameters outside the parentheses. This style of defining functions is called the K&R technique, after Brian Kernighan and Dennis Ritchie, the authors of the original C textbook.

This technique is now discouraged, and the ANSI C standard introduced the more familiar function-definition technique, in which each parameter's type appears inside the parentheses.

int main(int argc, char **argv, char **envp) {
  /* body of function */
}

This is the ANSI technique, named after the American National Standards Institute, whose committee standardized C in 1989.

Use the ANSI technique. But you should be familiar with the K&R technique, because many systems (including Minix) still employ it, on the off-chance that some day somebody might try to compile them with a compiler that predates the ANSI standard. This is a legitimate concern for widely distributed code like Minix, but not for most of us. So stick with ANSI in this class. (The C23 standard finally removed K&R-style definitions from the language altogether.)

Features neglected in C++

In many instances, C++ added new features as preferred alternatives to techniques used in C. These techniques are frequently neglected in studying C++, but you need to know about them to use C.

Macros

C allows the user to define a macro. Formally, macros are handled not by the C compiler but by the C preprocessor, which processes the C source code to generate the C program passed on to the compiler.

The most common use of macros is to define a constant. For example, if you wanted to define a BUF_LEN constant, you would include the following.

#define BUF_LEN 1024

All subsequent uses of BUF_LEN in the source code will be automatically replaced by 1024, so that the compiler will never see the word BUF_LEN.

Macros can be considerably more powerful, with the use of parameters. The following defines a SQR macro that takes a parameter.

#define SQR(X) ((X) * (X))

int dist(int x, int y) {
  return SQR(x - y);
}

You can name your parameter as you like (X in this case). If you have multiple parameters, you can separate them by commas.
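A sketch of a two-parameter macro; MAX and max3 are hypothetical names, not library facilities. Like SQR, MAX may evaluate an argument twice, so avoid passing arguments with side effects such as i++.

```c
/* Two parameters, separated by a comma; every use is parenthesized. */
#define MAX(A, B) ((A) > (B) ? (A) : (B))

/* Macros can be nested: the inner MAX expands first into the outer one. */
int max3(int a, int b, int c) {
  return MAX(MAX(a, b), c);
}
```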

Given this code, the C preprocessor would generate the following.

int dist(int x, int y) {
  return ((x - y) * (x - y));
}

This is the code that would actually be compiled. Notice that this code indicates that two subtractions will occur, followed by a multiplication (although a good optimizer might observe and remove the redundant subtraction). If we defined a sqr() function instead, the code would indicate a single subtraction, but with the added overhead of a function call and the resulting call stack manipulations (although, again, a good optimizer would remove these).

Notice that the definition of SQR included some seemingly redundant parentheses. This is to get around the fact that macro arguments are simply textually substituted in place of macro parameters. Consider the following.

#define SQR(X)    X * X

If we used this, the preprocessor would textually substitute "x - y" for each occurrence of X in the macro value.

int dist(int x, int y) {
  return x - y * x - y;
}

This isn't what was intended, and it's not an easy bug to detect. Usually you should be generous with parentheses around macro parameters to avoid problems. Better yet, use functions.
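To see the difference concretely, this sketch compiles the unparenthesized macro (renamed SQR_BAD here, a hypothetical name, so it can sit beside the correct one), the parenthesized SQR, and a plain sqr() function. With x = 5 and y = 2, SQR_BAD expands to 5 - 2 * 5 - 2, which is -7 rather than 9.

```c
/* The unparenthesized macro from the text, renamed so both can coexist. */
#define SQR_BAD(X)  X * X
/* The parenthesized version. */
#define SQR(X)      ((X) * (X))

/* A plain function: the argument is evaluated once, then squared. */
int sqr(int v) {
  return v * v;
}

int dist_bad(int x, int y)  { return SQR_BAD(x - y); }  /* x - y * x - y */
int dist_good(int x, int y) { return SQR(x - y); }      /* ((x - y) * (x - y)) */
int dist_func(int x, int y) { return sqr(x - y); }      /* one subtraction */
```

The function version needs no defensive parentheses, because its argument is computed before the body runs.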

Strings

C does not provide an explicit type for strings; instead, it represents strings as arrays of characters. The characters are listed in the order they occur within the string, followed by a NUL character (represented by '\0' in C) to indicate the string's end. The earlier findenv() function illustrates one function that uses C strings.
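A sketch of this representation. The arrays word1 and word2 and the function my_strlen are hypothetical names; my_strlen mimics what the library's strlen() computes by walking the array until it finds the NUL.

```c
#include <stddef.h>  /* size_t */

/* A string literal is just a NUL-terminated char array:
   these two declarations create identical 4-element arrays. */
static const char word1[] = "cat";
static const char word2[] = {'c', 'a', 't', '\0'};

/* Count characters up to (not including) the terminating NUL. */
size_t my_strlen(const char *s) {
  size_t n = 0;
  while(s[n] != '\0') ++n;
  return n;
}
```

Note that each array holds four chars even though the string "cat" has only three characters; the extra byte is the NUL.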

Because it's a pain to write a loop over the array every time you want to work with a string (as findenv() does), C provides a variety of convenience functions for working with strings, with prototypes in <string.h>. For example, the strlen() function takes a single argument (a pointer to the first character of the array) and returns the number of characters before the terminating NUL character. The strcpy() function takes two strings and copies the second string (including its NUL) into the first; the destination array must be large enough to hold it. There are many more; the Unix man page for strlen explains several of them.
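A sketch combining strlen(), strcpy(), and strcat() (another <string.h> function, which appends its second string to the end of the first). The helper join_path is hypothetical. Because none of these library functions know how big the destination array is, the helper checks the total length before copying.

```c
#include <string.h>

/* Build "dir/file" in dest, which holds cap bytes.
   Returns 0 on success, or -1 if the result would not fit. */
int join_path(char *dest, size_t cap, const char *dir, const char *file) {
  size_t need = strlen(dir) + 1 + strlen(file) + 1;  /* '/' and the NUL */
  if(need > cap) return -1;
  strcpy(dest, dir);   /* copy dir, including its NUL */
  strcat(dest, "/");   /* append, starting at the current NUL */
  strcat(dest, file);
  return 0;
}
```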

The strdup() function defined in the next section illustrates the use of strlen() and strcpy().

Memory allocation

C provides memory allocation via two library functions named malloc() and free(). The malloc() function takes as its parameter the number of bytes needed and returns a pointer to the first byte of the allocated space (or NULL should the memory be unavailable). The free() function takes as its parameter a pointer to the first byte of the allocated space and returns nothing after freeing the memory for future allocations.
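To make the byte-count arithmetic concrete, here is a sketch that allocates an array of ints rather than chars; make_squares is a hypothetical name. In C, unlike C++, malloc()'s void* result converts to int* without a cast.

```c
#include <stdlib.h>  /* malloc, free */

/* Return a malloc'ed array holding 0, 1, 4, ..., (n-1)*(n-1),
   or NULL if n is not positive or memory is unavailable.
   The caller owns the array and must eventually pass it to free(). */
int *make_squares(int n) {
  int *a;
  int i;
  if(n <= 0) return NULL;
  a = malloc(sizeof(int) * (size_t) n);   /* bytes = element size * count */
  if(a == NULL) return NULL;
  for(i = 0; i < n; i++) a[i] = i * i;
  return a;
}
```

A caller might write int *sq = make_squares(5); and, once done with the array, free(sq);.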

The following function illustrates memory allocation using malloc(). It takes a string as a parameter, allocates a new, identical string, and returns a pointer to its first character.

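#include <stdlib.h>  /* malloc */
#include <string.h>  /* strlen, strcpy */

/* Note: POSIX (and C23) already declare a strdup() in <string.h>;
   a real program would give this version its own name. */
char *strdup(const char *str) {
  char *ret = (char*) malloc(sizeof(char) * (strlen(str) + 1));
  if(ret == NULL) return NULL;   /* out of memory */
  strcpy(ret, str);
  return ret;
}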

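For the parameter to malloc(), we multiply the number of bytes per character (computed using C's sizeof operator; sizeof(char) is always 1, but writing it out makes the intent clear) by the number of characters needed (the number of characters in the string, plus 1 for the NUL character). The return value is typecast to char*; in C this cast is optional, since malloc()'s void* result converts implicitly to any object pointer type, but C++ requires it. If malloc() fails, it returns NULL, and a robust version must check for this before copying. Then strcpy() copies the contents of the parameter string into the new space, and we return the pointer. The caller is responsible for eventually passing that pointer to free().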

Input and output

Since all the stream-based input and output in C++, including cin and cout, is built on top of C++'s object facilities, C cannot support the same I/O operations used in C++. In C you use the functions declared in <stdio.h> instead.

In C-style I/O, the standard input (from the keyboard) is referred to as stdin, and the standard output (to the monitor) is stdout. To read a line from the standard input, use the following.

result = fgets(buffer, BUF_LEN, stdin);

The buffer argument should be a pointer to the first character of a character array into which the input should be placed. The second argument says how large this array is, so that the function doesn't go off the end of the array. The final argument specifies which file you want to read from. The function reads characters into the buffer until it has stored the end-of-line character, it has reached the end of the file, or it has read one fewer character than the second argument (BUF_LEN) allows; it always adds a terminating NUL character after the characters it stores. The function usually returns buffer, but it returns NULL if the end of the file is reached before any characters are read (or if a read error occurs).
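A common pattern is to wrap fgets() so that the newline it stores is removed. This is a sketch; read_line is a hypothetical name, and it takes the FILE* as a parameter so it works with stdin or any other open file.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from in into buf (len bytes), removing the trailing
   newline if present. Returns buf, or NULL at end of file or on a
   read error, just as fgets() does. */
char *read_line(char *buf, int len, FILE *in) {
  char *nl;
  if(fgets(buf, len, in) == NULL) return NULL;
  nl = strchr(buf, '\n');     /* find the stored newline, if any */
  if(nl != NULL) *nl = '\0';  /* cut the string off there */
  return buf;
}
```

A call like read_line(buffer, BUF_LEN, stdin) reads one line from the keyboard. If a line is longer than the buffer, no newline is stored, and the rest of the line is left for the next call.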

Output to stdout is most convenient using the printf() function. This function takes a variable number of arguments, but the first argument is always a string specifying the format of what should be printed.

printf("Now we are %d.\n", i);

The characters of the format string are printed as they appear, except that each percent sign begins a conversion specification, which is replaced by the value of the next argument. So if i held the value 6 when reaching this statement, the program would print the following.

Now we are 6.

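The letter following the percent sign tells printf() the type of the argument; printf() cannot check this itself, so a mismatched letter causes undefined behavior (typically garbage output). The d stands for decimal, and indicates that the argument is an int. Other useful letters for following the percent sign are s for strings, x for integers to be printed in hexadecimal, and f for doubles. (With scanf(), reading a double requires lf; with printf(), f suffices.)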
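This sketch shows these conversions together in one format string. It uses snprintf(), which follows the same formatting rules as printf() but writes into a character array (at most cap bytes, NUL-terminated) instead of to stdout, so the result can be inspected; snprintf() requires C99 or later. The function format_record is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Format name, id, and score into buf (cap bytes) and return the length
   of the text actually stored, which may be truncated to fit. */
int format_record(char *buf, size_t cap, const char *name, int id, double score) {
  if(cap == 0) return 0;   /* nothing can be stored */
  /* %s string, %d decimal int, %x hex (expects unsigned), %f double */
  snprintf(buf, cap, "%s: id=%d (hex %x), score=%f",
           name, id, (unsigned) id, score);
  return (int) strlen(buf);
}
```

With name "ann", id 255, and score 3.5, the buffer receives the text ann: id=255 (hex ff), score=3.500000, since %f prints six digits after the decimal point by default.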

If you want to print multiple values with the same printf() statement, you can just put several formatting indicators in the format string and list the values in order in the following arguments.

printf("%d^2 = %d\n", 5, 5*5);  /* prints "5^2 = 25" */