C-String Data in C++


If you have not yet read the page on this site about arrays, you would benefit from doing so first to learn about how this data type is structured and stored.

BACKGROUND

This document was written under the assumption that its reader is already familiar with the fundamental concepts involving computer data types and languages. The computer data type that is most often used by people for input is the "character" data type. The most widely used character coding standard is ASCII (the American Standard Code for Information Interchange). Standard ASCII is a 7-bit code that provides for 128 different items, including: uppercase and lowercase letters, numerals, standard punctuation, and input/output device control codes (such as carriage return and tab). The 8-bit extensions of ASCII provide for 256 different possible items, adding a variety of symbols that are not normally found on a keyboard, but can be produced by the computer, such as the copyright symbol ©.

When using the character data type, we treat each individual symbol as one unit of data. For example, the phrase "abc123" could be stored as six separate characters, but doing so would require six separate storage locations (variables). If we want to store the entire phrase as a single unit of data in just one storage location, then we must use a data type that allows multiple characters to be stored under just one label. A data type that can be used for this in both C and C++ is referred to as a "C-string" (a chain of characters).

C++ allows programmers to define data objects from a class named string. The C programming language has no formal string data type. Both C and C++ allow programmers to implement strings using arrays of character storage locations that can be referenced collectively with one label or individually by subscript.

Individual character constants (such as 'a' or '5') are written in these languages enclosed in apostrophes (a.k.a. "single-quoted"). Character storage is declared one character at a time using the data type named char. String constants (such as "Hello" or "The answer is:") are written as groups of symbols enclosed in double quotes. String storage can be done either as string objects or as arrays of characters (known as C-strings).

These arrays must be declared with enough elements to store all of the characters in the string plus one additional special character used to indicate the end of the string. This "end-of-string" character is denoted as '\0' and is often referred to as the "null character". It is used by many string manipulating functions to indicate where a string ends. This is necessary because a string might not fill up all of the character elements available in the array that was declared to store it. The '\0' marks the first unused element in the array and prevents that element (and any after it) from being processed as part of the string data.
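The "plus one" rule can be illustrated with a short sketch. The helper names characters_used and elements_required are hypothetical and chosen only for this example; strlen is the standard "string length" function from the <cstring> header.

```cpp
#include <cstddef>
#include <cstring>

// Number of characters stored before the '\0' (the '\0' itself is not counted).
std::size_t characters_used(const char text[]) {
    return std::strlen(text);
}

// Smallest array size able to hold the same text plus its '\0'.
std::size_t elements_required(const char text[]) {
    return std::strlen(text) + 1;
}
```

For an array declared as char WORD[10]="Hello"; the first helper reports 5 characters in use and the second reports that at least 6 elements are required, while the array itself still occupies 10 elements.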

String Functions

Both the C and C++ languages provide a variety of pre-written functions to help programmers manipulate strings. Some of them copy string data. Others extract sub-strings from larger strings. And others help to attach strings together to form larger strings. Most of these functions receive string input and produce string output. However, some of them involve strings in some way, but don't use string input or produce string output. From the point of view of terminology, any function that involves strings in any way is referred to as a "string function". For more details on string functions, see Chapter 10 in your textbook.


STRING DECLARATION

In C++, C-string arrays can be declared in three different ways, depending on whether you know the contents of each element in advance or not.

Option 1 - String Declaration without Initialization

If you do not know the contents of the array in advance of the program's execution, then you would declare the array in the following manner:

     char label[size];

The statement used to declare an array is written in a manner similar to other variable declarations. The data type is written first, followed by the variable label, and finally an integer constant (or symbolic constant) in brackets indicating the size (or quantity of elements to be allocated). All elements will be of char data type. The identifiers used to label an array must conform to the same rules as any other identifier in C++ and cannot duplicate a name already in use by a scalar. An array to store a person's last name no longer than 15 characters would be declared with the statement:

     char LAST[16];

This would declare sixteen char storage locations identified as: LAST[0], LAST[1], LAST[2], through LAST[15]. The extra one would be provided to hold the end-of-string character '\0' that is appended to the end of all string data by many commands in C++.

Option 2 - String Declaration with String Constant Initialization

The act of storing a string into the newly created string variable (array of characters) could be accomplished in a variety of ways. If the value (data) was known at the time the array was declared, the array could be both declared and initialized in the same statement as follows:

     char LAST[16]="Andrews";

This would declare sixteen char storage locations identified as: LAST[0], LAST[1], LAST[2], through LAST[15]. The first seven elements (LAST[0] through LAST[6]) would be assigned the characters in the name "Andrews". LAST[7] would be assigned the end-of-string character '\0'. The remaining elements of the array (LAST[8] through LAST[15]) would also be set to '\0', because C++ fills any elements not covered by the initializer with zero. Either way, most functions that manipulate strings stop processing the array data when they encounter the first '\0'.

Option 3 - String Declaration with Element Initialization

Because arrays also can be declared and initialized in a manner that involves listing the values of individual elements, the LAST array also could be declared and initialized as follows:

     char LAST[16] = {'A','n','d','r','e','w','s','\0'};

This approach would have the same effect as the declaration above, but is not often used in place of the easier form above.
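As a quick check of Options 2 and 3, the sketch below initializes one array each way and compares the results. The function name same_initialization and the array names are hypothetical; strcmp and strlen are standard functions from <cstring>.

```cpp
#include <cstring>

// Returns true when both initialization styles produce the same string:
// seven letters followed by '\0' in element 7.
bool same_initialization() {
    char FROM_CONSTANT[16] = "Andrews";
    char FROM_ELEMENTS[16] = {'A','n','d','r','e','w','s','\0'};

    return std::strcmp(FROM_CONSTANT, FROM_ELEMENTS) == 0  // same text
        && std::strlen(FROM_CONSTANT) == 7                 // seven letters
        && FROM_CONSTANT[7] == '\0'                        // terminator follows
        && sizeof(FROM_CONSTANT) == 16;                    // array size unchanged
}
```

The function is expected to return true, which is one way to confirm that the two forms are interchangeable for string purposes.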


STRING STORAGE

String Storage as Individual Elements

Another (more difficult) approach to storing a name in the string LAST could be accomplished after the array was declared (as shown in Option 1 above) using individual assignment statements, such as:

     LAST[0]='A';
     LAST[1]='n';
     LAST[2]='d';
     LAST[3]='r';
     LAST[4]='e';
     LAST[5]='w';
     LAST[6]='s';

But the end-of-string character '\0' would have to be assigned manually as:

     LAST[7]='\0';

String Storage as an Entire String Constant

Novice C++ programmers are often surprised to discover that a string cannot be directly assigned to a C-string variable using a statement such as:

     LAST="Andrews";  /* Example of a typical coding error */

Remember that C-string variables are not scalar storage locations, but rather arrays of characters. In the C++ language, a reference to an array using just its label (LAST in this example) is interpreted as a reference to the address of the first element of the array (in other words, &LAST[0]), and an array as a whole cannot be the target of an assignment. For this reason a library function, strcpy (declared in the <cstring> header), is provided to copy string data into a C-string storage location. The name "Andrews" can be copied into the C-string variable LAST using this "string copy" function as follows:

     strcpy (LAST, "Andrews");  /* Store string data in LAST array */

This would work much like the combined declaration and initialization statement shown in Option 2 above. The first seven elements (LAST[0] through LAST[6]) would be assigned the characters in the name "Andrews", and LAST[7] would be assigned the end-of-string character '\0'. The remaining elements of the array would be left unchanged by strcpy, so if the array had not been initialized, their contents would be unknown.
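One point worth keeping in mind is that strcpy does not check the size of the destination array. The sketch below shows one possible way to guard the copy; the helper name copy_if_fits and its capacity parameter are assumptions made for this example, not part of the standard library.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `source` into `target` only when the text and
// its '\0' fit within `capacity` elements. Returns true when the copy is made.
bool copy_if_fits(char target[], std::size_t capacity, const char source[]) {
    if (std::strlen(source) + 1 > capacity) {
        return false;               // too long: target is left untouched
    }
    std::strcpy(target, source);    // copies the characters and the '\0'
    return true;
}
```

With the declaration char LAST[16]; the call copy_if_fits(LAST, sizeof LAST, "Andrews") would store the name and return true, while the same call on a four-element array would return false without writing anything.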

String Storage of Sub-Strings

An optional function was developed to help copy only a portion of a source string into another C-string storage location. For example, consider the following declarations:

     char STRING1[10]="Nathan";
     char STRING2[10];

STRING1 was initialized to have the contents "Nathan". STRING2 was declared, but given no initial value. If we now wanted to copy only the first three characters ("Nat") from STRING1 to STRING2, we could use the special "limited string copy" function as follows:

     strncpy (STRING2, STRING1, 3);  /* copy the first 3 characters of STRING1 to STRING2 */

Notice the difference in the name of the function. It has an 'n' in the middle of its name. Also notice the addition of a third actual parameter (3) in the call. This indicates the maximum quantity of characters to copy. The function will copy this number of characters unless it encounters a '\0' in STRING1 first, in which case it stops copying and fills the rest of the requested count with '\0' characters. When the full count is copied without reaching a '\0' (as in this example), no end-of-string character is written to STRING2, so the programmer must add a statement to write one, as in:

     STRING2[3]='\0';
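The copy and the manual termination can be packaged together so that the '\0' is never forgotten. This is a minimal sketch; the helper name copy_prefix is hypothetical, and it assumes the target array has room for count + 1 elements.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies at most `count` characters of `source` into
// `target` and always terminates the result with '\0'.
// Assumes `target` has at least count + 1 elements.
void copy_prefix(char target[], const char source[], std::size_t count) {
    std::strncpy(target, source, count);  // may or may not write a '\0'
    target[count] = '\0';                 // guarantee the terminator
}
```

With the declarations above, copy_prefix(STRING2, STRING1, 3) would leave "Nat" in STRING2. If the source is shorter than the count, the result is simply the whole source string.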

KEYBOARD ENTRY OF C-STRINGS

Although the entry of simple C-strings at the keyboard can be handled using the cin object, such an approach is risky because of that object's treatment of whitespace and the potential for inputting more characters than the defined size of the C-string (character array). To avoid these problems, use the getline member function of the cin object, as in:

     cin.getline(NAME,SIZE);

The first argument (NAME) in the example above is the identifier of the character array.
The second argument (SIZE) limits user input to prevent reading characters in excess of the array's size.
A third optional argument can be included to specify a delimiter (other than the default '\n') to signal the end of the input. Beware that when you specify a delimiter other than '\n' in the getline function, any '\n' entered in the input would be stored in the array just like any other character and would not terminate the input of the string.
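The three arguments can be tried out with the sketch below. It is only an illustration: the names read_text, entry_from, and fake_keyboard, and the array size of 16, are assumptions made for this example. A string stream stands in for the keyboard so the behavior can be examined without typing; passing cin as the first argument of read_text would read from the real keyboard instead.

```cpp
#include <iostream>
#include <sstream>
#include <string>

const int SIZE = 16;   // assumed array size for this sketch

// Reads text from `in` into NAME, storing at most SIZE - 1 characters plus
// the '\0'. Input ends at `delimiter`, which is removed from the stream but
// not stored. Returns false if nothing was read or the input was too long.
bool read_text(std::istream& in, char NAME[SIZE], char delimiter = '\n') {
    in.getline(NAME, SIZE, delimiter);
    return !in.fail();
}

// Hypothetical convenience for experimenting without a keyboard: treats
// `typed` as if the user had entered it and returns what was stored.
std::string entry_from(const std::string& typed, char delimiter = '\n') {
    std::istringstream fake_keyboard(typed);
    char NAME[SIZE];
    read_text(fake_keyboard, NAME, delimiter);
    return NAME;
}
```

For example, entry_from("Mary Ann Smith\n") yields the full name including its spaces, and entry_from("Smith,John\n", ',') yields only "Smith". An entry longer than 15 characters is cut off at 15, and read_text reports the problem by returning false.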

NOTE: The getline member function of the cin object should not be confused with the global getline function, which is used to input string objects. The source code below demonstrates how to define a string object and read data into it from the keyboard using the global getline function:

     #include <iostream>  // Load the header file that defines cin and cout
     #include <string>    // Load a header file that defines string objects and methods
     using namespace std;

     string NAME;                  // define a variable named NAME of class string
     cout << "Enter your name: ";  // Prompt for the user's name
     getline(cin, NAME);           // Read string input from the keyboard that could contain whitespace

OUTPUT OF C-STRINGS

Output of C-strings can be accomplished in two different ways in C++. One method involves treating the string data as a single unit and using the cout stream object, as in:

     cout << STRINGNAME;

where STRINGNAME represents the name of a C-string variable (character array).

The use of the cout object in this way depends on the C-string variable having been properly stored with a '\0' (end-of-string) character terminating it. Without that "null character", cout would have no way of determining how long the string was, and would continue to access data beyond the boundaries of the character array declared to hold the string.

The other method for outputting a C-string involves treating the string data formally as an array of characters and outputting each character separately in a loop as individual elements of the array. For example, after defining the C-string (character array)

     char WORD[10]="Hello";

we could display the five characters within that string with a counting loop as in

     for (C=0; C<5; C++) cout << WORD[C];

Note the use of an integer variable (C) to act as a subscript to each element of the array during each pass of the loop. The use of a counting loop for this purpose requires that we know the size of the string we plan to display. If we do not know this, we can determine it using the "string length" function as in:

     for (C=0; C<strlen(WORD); C++) cout << WORD[C];

Like most other string functions, the strlen function relies on the string having been properly terminated with a null character. The expected presence of the null character would allow us to employ a sentinel controlled loop rather than a counting loop to output the string in the following manner:

     C=0;
     while (WORD[C]!='\0')
     {
          cout << WORD[C];
          C=C+1;
     }

Of course this still relies on the string having been properly terminated with a null character. We can combine both control methods into a hybrid control method using the logical "and" operator (&&) to produce a highly reliable output technique of

     C=0;
     while (C<SIZE && WORD[C]!='\0')
     {
          cout << WORD[C];
          C=C+1;
     }

where SIZE represents a known size of the array used to store the string. The size test is written first so that WORD[C] is never examined once C has reached SIZE.
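The hybrid technique can be wrapped in a reusable function, sketched below. The function name print_bounded and its return value (the count of characters written) are assumptions added for this example. The sketch tests the subscript against SIZE before looking at the character, so no element beyond the array is ever read.

```cpp
#include <iostream>

// Writes WORD to `out`, stopping at the first '\0' or after SIZE elements,
// whichever comes first. Returns the number of characters written.
int print_bounded(std::ostream& out, const char WORD[], int SIZE) {
    int C = 0;
    while (C < SIZE && WORD[C] != '\0') {
        out << WORD[C];
        C = C + 1;
    }
    return C;
}
```

Calling print_bounded(cout, WORD, 10) for the array char WORD[10]="Hello"; would display the five letters and return 5. For an array that was filled without a '\0', the loop would still stop at the end of the array.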


For more information about strings, read Chapter 10 in your textbook.
