Pointers and Indirection


References

Like most programming languages, C++ allows us to refer to storage locations (a.k.a. variables) using identifiers known as "variable labels". Such labels are often called references because they are used to refer to the contents of storage locations. Variable labels are declared using statements such as:

     char  L;

The declaration statement above allocates an unused storage location that is large enough to store a single symbol and allows us to refer to it within our source code using the identifier L. As a result, the identifier L is associated with a specific address in computer memory. The illustration below shows one possible result, in which the identifier L has been associated with memory location (address) 2048. The boxes in the illustration indicate 8-bit memory cells, and the numbers above them indicate their addresses in memory. Each storage location consists of a group of binary digits (bits) that represents the contents of a variable as a pattern of ones and zeros conforming to some encoding standard. Storage locations are never empty. Their contents might be unknown, but they always hold some pattern of bits. Thus, the bit pattern shown below at address 2048 is simply the pattern that happened to be in the unused storage location at the time that the identifier L was associated with it.

Illustration of 5 character memory locations addressed 2046 through 2050,
 with the identifier L attached to the address 2048

Once the variable has been declared, we can assign a value (such as the character capital A) to that storage location by using its identifier to reference it, as in the statement:

     L = 'A';

Because the binary code used to represent the character capital A is 01000001, that pattern will be stored at location 2048 resulting in the following memory status:

Illustration of 5 character memory locations addressed 2046 through 2050,
 with the identifier L attached to the address 2048 with contents of 01000001

After the assignment statement above, wherever the identifier L appears in our source code, the computer will look in location 2048 and the character (in this example: A) will be used. Thus, the following output statement will display the character contents of the variable L:

     cout << L;

Referencing Numeric Data

Number values are referenced in a manner similar to characters, except that the formats used to represent most numbers in memory often require more than one memory location to store the number. The declaration statement below locates enough unused storage locations to store an integer and allows us to refer to it within our source code using the identifier N.

     int  N;

The result of this statement is that the identifier N will then be associated with a single address in computer memory regardless of how many locations might be required to store the value. The illustration below provides an example of a possible result from the statement above in which N is stored using memory locations 2048 and 2049. This illustration uses two locations (16 bits) for an integer; many current compilers use four (32 bits), and the exact size depends on the compiler and processor. In either case, the address of N is its starting location, 2048. Once the variable is declared, we can assign an integer value (such as 5) to the variable by using its identifier to reference it, as in the statement:

     N = 5;

The binary code used to represent the integer value 5 (0000000000000101) will be stored starting at location 2048 (and running into 2049 also) resulting in the following memory status:

Illustration of 5 memory locations addressed 2046 through 2050,
 with the identifier N attached to the pair of addresses 2048 through 2049 with contents of 00000000 00000101

Note that the exact method used to store numeric data differs between different types of processors. This is not a problem because specific versions of the C++ compiler are developed separately for each type of processor and each compiler defines how the machine code will be written. The illustration above demonstrates only one possible method for representing integer data.

Address Referencing

Occasionally, we need to refer to the address of a variable. For this reason, C++ provides the "address of" operator & which, when used as a prefix (placed in front of a variable label), refers to that identifier's address instead of its contents. So, the following statement will display the address of variable N (which is 2048 in our example) instead of its contents (5).

     cout << &N;

The & operator is a unary operator, meaning that it operates on only one operand (the identifier that follows it). By convention it is written directly in front of the variable label. The statement above typically displays the address as a hexadecimal (base-16) integer.

Pointers

Variables are used to store a variety of different data types. Once in a while, we need to store the address of an already declared storage location. Although the & operator allows us to refer to an address within a C++ statement, it does not provide a storage location in which to save an address. For that we use a special type of variable called a pointer. Pointers are variables that hold memory addresses rather than the normal data content (such as numbers and characters) that we usually manipulate with our programs.

Example

We can declare a character variable named SYMBOL and initialize it as follows:

     char  SYMBOL = 'A';

After that, any use of the identifier SYMBOL will refer to the character "A" stored in that location. Use of &SYMBOL will refer to the address of SYMBOL. If we want to store that address for some future use, we will have to declare a pointer to hold the address. This is done by declaring another variable. We could use any valid identifier to name it, but programmers typically compose an identifier from all or part of the original variable name and the letters "ptr" (or just "p"). Thus, a pointer that is intended to store the address of variable SYMBOL is likely to be named something like SYMPTR (or perhaps SYMBOLP). It would be declared in this way:

     char *SYMPTR;

Notice the star in the declaration statement above. Despite the fact that the star touches the identifier (SYMPTR), the identifier is still just SYMPTR (not *SYMPTR). As the star appears in the statement above, it changes the meaning of the data type from just "char" to "char *" (or "pointer to a char"). The star will be used again in another way below.

Simply declaring the SYMPTR variable is not enough to make it a pointer to the SYMBOL variable. To do that we must assign the address of SYMBOL into the pointer variable SYMPTR as such:

     SYMPTR = &SYMBOL;

Notice that we did not use the star in the assignment statement above. This is because the pointer's name is SYMPTR (not *SYMPTR) and we wanted to assign the address (referenced by &SYMBOL) directly into the pointer variable named SYMPTR.

Indirect Assignment

Once the pointer SYMPTR is declared and assigned, we can use it to point at the storage location SYMBOL. This method of referencing is referred to as indirection. The reason for using it will be explained later.

Normally, if we wanted to assign the symbol "$" into the character variable SYMBOL, we would do it through direct assignment, as in the statement:

     SYMBOL = '$';

But, the pointer SYMPTR allows us another way of referencing the SYMBOL storage location. By preceding the pointer label with the special unary indirection operator *, we can now store the dollar sign in SYMBOL using indirect assignment via our pointer SYMPTR, as in:

     *SYMPTR = '$';

The statement above says "Assign a dollar sign into the variable pointed to by SYMPTR." Notice that it uses the indirection operator * in front of the pointer identifier SYMPTR. Without the star, the statement would say "Assign a dollar sign into the pointer variable SYMPTR", which a C++ compiler rejects because a character is not an address. The star prevents us from directly referencing the pointer's own storage location and instead uses its contents (the address of SYMBOL) to point at and reference the SYMBOL variable. For this reason, the indirection operator * is also often called the dereferencing operator. As the star appears in the statement above, it is a unary operator, meaning that it operates on only one operand (the identifier that follows it). By convention it is written directly in front of the pointer variable identifier.

Why use indirection?

You might ask "why use this apparently convoluted approach to referencing a variable?" The answer is "because sometimes it's the only way we can." Consider the process of passing parameters between functions. When we return a single value from a child function to the parent function that called it, the value is normally passed "by value", meaning that a copy of the value is passed back via the identifier of the child function and then assigned into a variable declared by the parent. This works fine if we need to return only a single value. But if we want to return multiple values, we need a different technique that does not depend upon the single identifier of the child function.

Indirection is the solution. We can use the "address of" operator & in the actual parameter list of a function call to pass the addresses of variables declared in the parent function into the child function. Then, when declaring the child function, we use the star symbol to declare pointer variables in the formal parameter list to receive the addresses. The child function can then use those pointers to refer indirectly to multiple variables in the parent function. This technique is commonly referred to as passing "by reference", meaning that references to the variables (in this case their addresses) are passed into the function rather than copies of their values.

"But why don't we just reference the variables in the parent function directly?" you ask. Because, those variable labels were declared locally in the parent function and don't exist in the child function. There is no way to pass variable identifiers. We can only pass their contents and (now through the use of pointers and indirection) their addresses.

For more information about parameter passing using pointers, see the web page entitled "Parameter Passing by Reference".
