C strings


Declaration

A C string (also known as a null-terminated string) is usually declared as an array of char. However, an array of char is not by itself a C string. A valid C string requires the presence of a terminating "null character" (a character with ASCII value 0, usually represented by the character literal '\0').

Since char is a built-in data type, no header file needs to be included to create a C string. The C library header file <cstring> contains a number of utility functions that operate on C strings.

Here are some examples of declaring C strings as arrays of char:

char s1[20];             // Character array - can hold a C string, but is not yet a valid C string

char s2[20] = { 'h', 'e', 'l', 'l', 'o', '\0' };     // Array initialization

char s3[20] = "hello";                               // Shortcut array initialization

char s4[20] = "";        // Empty or "null" C string of length 0, equal to the string literal ""
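For an array holding a C string, the size of the array and the length of the string are different quantities. The sketch below repeats two of the declarations above at namespace scope. sizeof reports the capacity of the array, while strlen() from <cstring> counts the characters before the null character. The helper function names are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Two of the declarations from above. Elements after the copied
// characters are set to '\0' by the initialization.
char s3[20] = "hello";
char s4[20] = "";

// sizeof gives the capacity of the whole array, in chars.
std::size_t capacity_of_s3() { return sizeof(s3); }

// strlen() counts the characters before the first '\0'.
std::size_t length_of_s3() { return std::strlen(s3); }
std::size_t length_of_s4() { return std::strlen(s4); }
```

Here capacity_of_s3() returns 20, length_of_s3() returns 5, and length_of_s4() returns 0.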

It is also possible to declare a C string as a pointer to a char:

const char* s5 = "hello";

This creates an unnamed character array (the string literal) just large enough to hold the string, including the null character, and stores the address of its first element in the pointer s5. The characters of a string literal must not be modified, which is why the pointer is declared as const char*. This is a somewhat advanced way of working with C strings that should probably be avoided by inexperienced programmers who don't understand pointers yet. If used improperly, it can easily corrupt program memory or cause runtime errors.
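The practical difference between the array form and the pointer form is whether the characters may be changed. In the sketch below (arr, ptr, and change_first_letter are hypothetical names), the array receives its own modifiable copy of the literal, while the pointer refers to the literal itself.

```cpp
#include <cstring>

// The array gets its own copy of the literal. Its size (6) is
// deduced from the initializer, including the '\0'.
char arr[] = "hello";

// The pointer refers to the literal itself. const makes the
// compiler reject any attempt to write through it.
const char* ptr = "hello";

void change_first_letter() {
    arr[0] = 'j';          // fine: arr now holds "jello"
    // ptr[0] = 'j';       // would not compile: ptr points to const char
}
```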

Representation in Memory

Here is another example of declaring a C string:

char name[10] = "Karen";

The following diagram shows how the string name is represented in memory:

[Diagram: array-based C string]

The individual characters that make up the string are stored in the elements of the array. The string is terminated by a null character. Array elements after the null character are not part of the string, and their contents are irrelevant.
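The same layout can be printed as text. The hypothetical helper below builds one bracketed cell per element of name, showing the null character as \0. With this particular initialization the elements after "Karen" happen to be null characters too, because an array initialized from a shorter literal is padded with zeros; either way, they are not part of the string.

```cpp
#include <cstddef>
#include <string>

// Builds a text picture of every element of name, one bracketed cell
// per element. The null character is shown as \0 so it is visible.
std::string show_layout() {
    char name[10] = "Karen";   // elements 5..9 are all set to '\0'
    std::string picture;
    for (std::size_t i = 0; i < sizeof(name); ++i) {
        if (name[i] == '\0')
            picture += "[\\0]";
        else
            picture += std::string("[") + name[i] + "]";
    }
    return picture;
}
```

Calling show_layout() returns the text [K][a][r][e][n][\0][\0][\0][\0][\0].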

A "null string" or "empty string" is a string with a null character as its first character:

[Diagram: null C string]

The length of a null string is 0.

What about a C string declared as a char pointer?

const char* name = "Karen";

This declaration creates an unnamed character array just large enough to hold the string "Karen" (including room for the null character) and places the address of the first element of the array in the char pointer name:

[Diagram: pointer-based C string]

Subscripting

Like any other array, the subscript operator may be used to access the individual characters of a C string:

cout << s3[1] << endl;         // Prints the character 'e', the second character in the string "hello"

Since the name of an array of char is converted to a pointer to its first element when used in a value context, you can also use pointer notation to access the characters of the string:

cout << *s3 << endl;           // Prints the character 'h', the character pointed to by s3

cout << *(s3 + 4) << endl;     // Prints the character 'o', the fifth character in the string "hello"
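When the string is stored in a non-const array, subscripts can change characters as well as read them. The sketch below (reverse_in_place is a hypothetical name) swaps characters from both ends toward the middle. The null character at s[len] is never moved, so the result is still a valid C string.

```cpp
#include <cstddef>
#include <cstring>

// Reverses a C string in place using subscripts.
void reverse_in_place(char s[]) {
    std::size_t len = std::strlen(s);
    if (len < 2)
        return;                        // nothing to swap (also avoids len - 1 underflow)
    for (std::size_t i = 0, j = len - 1; i < j; ++i, --j) {
        char tmp = s[i];               // swap s[i] and s[j]
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

Calling reverse_in_place(s3) on the array s3 changes its contents to "olleh", after which *(s3 + 4) is 'h'. The function must not be given a string literal or a const char* that points to one.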

String Length

You can obtain the length of a C string using the C library function strlen(). This function takes as its argument a pointer to a C string and returns the number of characters in the string, not including the null character. The return type is size_t, an unsigned integer type defined by the library.

Examples

char s[20] = "Some text";

cout << "String length is " << strlen(s) << endl;     // Length is 9

size_t length = strlen(s);     // strlen() returns size_t, an unsigned type
// Loop through characters of string
for (size_t i = 0; i < length; i++)
    cout << s[i];
cout << endl;
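strlen() finds the length by scanning the array for the null character, so storing the result in a variable, as the example does, avoids repeating that scan on every loop test. A minimal sketch of the scan (my_strlen is a hypothetical name; real code should call strlen()):

```cpp
#include <cstddef>

// Counts characters until the terminating '\0'. Every character is
// visited, so the work grows with the length of the string.
std::size_t my_strlen(const char* s) {
    std::size_t count = 0;
    while (s[count] != '\0')
        ++count;
    return count;
}
```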

String Comparison

Comparing C strings using the relational operators ==, !=, >, <, >=, and <= does not work correctly, since the array names will be converted to pointers. For example, the expression

if (s1 == s2)
{
    ...
}

actually compares the addresses of the first elements of the arrays s1 and s2, not their contents. Since two different arrays never occupy the same address, the expression s1 == s2 is always false, even when both arrays hold exactly the same characters.

To compare the contents of two C strings, use the C library function strcmp(). This function takes two pointers to C strings as arguments, either or both of which can be string literals. It returns an integer less than, equal to, or greater than zero if the first argument is found, respectively, to be less than, to match, or to be greater than the second argument. The comparison is made character by character using the characters' numeric codes, so in ASCII "Zebra" is less than "apple" because 'Z' (90) comes before 'a' (97).
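The difference between the two kinds of comparison is easy to demonstrate. In the sketch below (both function names are hypothetical), arrays passed as arguments are converted to pointers, so the first function performs the same address comparison as s1 == s2, while the second compares contents with strcmp().

```cpp
#include <cstring>

// Compares only the addresses the two pointers hold.
bool same_address(const char* a, const char* b) {
    return a == b;
}

// Compares the characters, stopping at the first difference or '\0'.
bool same_contents(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

With char s1[20] = "hello"; and char s2[20] = "hello";, same_address(s1, s2) returns false while same_contents(s1, s2) returns true.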

The strcmp() function can be used to implement various relational expressions:

if (strcmp(s1, s2) < 0)      // If the C string s1 is less than the C string s2
{
    ...
}

if (strcmp(s1, s2) == 0)     // If the C string s1 is equal to the C string s2
{
    ...
}

if (strcmp(s1, s2) > 0)      // If the C string s1 is greater than the C string s2
{
    ...
}

if (strcmp(s1, s2) <= 0)     // If the C string s1 is less than or equal to the C string s2
{
    ...
}

if (strcmp(s1, s2) != 0)     // If the C string s1 is not equal to the C string s2
{
    ...
}

if (strcmp(s1, s2) >= 0)     // If the C string s1 is greater than or equal to the C string s2
{
    ...
}
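The less-than form is the one most often needed in practice, for example when sorting. The sketch below (sort_words is a hypothetical name) passes it to std::sort as a comparator. Only the pointers in the array are rearranged; the strings themselves do not move.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Sorts an array of pointers to C strings into ascending order.
// std::sort needs a "less than" test, which strcmp(a, b) < 0 provides.
void sort_words(const char* words[], std::size_t count) {
    std::sort(words, words + count,
              [](const char* a, const char* b) {
                  return std::strcmp(a, b) < 0;
              });
}
```

For example, sorting { "pear", "apple", "fig" } leaves the pointers in the order "apple", "fig", "pear".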

Assignment

An array cannot be assigned to with the = operator, so a character array (including one that holds a C string) cannot be given a new value by assignment after it is declared.

char s1[20] = "This is a string";
char s2[20];

s1 = "Another string";     // error: invalid array assignment

s2 = s1;                   // error: invalid array assignment

To change the contents of a character array, use the C library function strcpy(). This function takes two arguments: 1) a pointer to a destination array of characters that is large enough to hold the entire copied string (including the null character), and 2) a pointer to a valid C string or a string literal. The function returns a pointer to the destination array, although this return value is frequently ignored.

Examples

char s1[20];
char s2[20] = "Another new string";

strcpy(s1, "");               // Contents of s1 changed to null string

strcpy(s1, "new string");     // Contents of s1 changed to "new string"

strcpy(s1, s2);               // Contents of s1 changed to "Another new string"

If the string specified by the second argument, including its null character, needs more elements than the destination array has, strcpy() writes past the end of the array. This overflow is undefined behavior: it may silently corrupt other variables or crash the program.
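A simple guard against this overflow is to compare the string's length with the size of the destination before copying. copy_if_fits below is a hypothetical helper, not a library function. The caller passes the destination size, typically sizeof applied to the array itself (sizeof applied to a pointer would give the size of the pointer instead).

```cpp
#include <cstddef>
#include <cstring>

// Copies src into dest only if it fits. The copy needs
// strlen(src) + 1 elements, counting the '\0'.
bool copy_if_fits(char* dest, std::size_t dest_size, const char* src) {
    if (std::strlen(src) + 1 > dest_size)
        return false;          // would overflow; dest is left unchanged
    std::strcpy(dest, src);
    return true;
}
```

With char s1[20];, copy_if_fits(s1, sizeof(s1), "new string") copies the string and returns true, while a source of 20 or more characters is rejected and leaves s1 unchanged.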

Input and Output

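The stream extraction operator >> may be used to read data into a character array as a C string. It skips leading whitespace and stops at the next whitespace character, so it reads a single word. Before C++20, if the word read contains more characters than the array can hold (remembering that one element is needed for the null character), the string will overflow the array. C++20 replaced the char* version of >> with one that takes the array itself and stops reading when the array is full.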

The stream insertion operator << may be used to print a C string or string literal.
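For code compiled as C++17 or earlier, a common way to keep >> from overflowing an array is the manipulator std::setw from <iomanip>: when a width is set, >> stores at most width - 1 characters and then appends the null character. C++20 honors the width as well. In the sketch below, read_word and demo_truncated_read are hypothetical names, and std::istringstream stands in for cin so the example does not depend on keyboard input.

```cpp
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <istream>
#include <sstream>

// Reads one whitespace-delimited word into buf without overflowing it.
// The template deduces the array size N, and setw(N) limits >> to
// N - 1 characters. Characters that did not fit stay in the stream.
template <std::size_t N>
void read_word(std::istream& in, char (&buf)[N]) {
    in >> std::setw(static_cast<int>(N)) >> buf;
}

// Example: a 10-element array keeps at most 9 characters.
bool demo_truncated_read() {
    std::istringstream in("abcdefghijklmnop");
    char word[10];
    read_word(in, word);
    return std::strcmp(word, "abcdefghi") == 0;
}
```

Calling read_word(cin, name) reads one word into the array name. Because the size comes from the array's type, read_word only accepts an actual array, not a char pointer.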

Concatenation

The C library function strcat() can be used to concatenate C strings. This function takes two arguments: 1) a pointer to a destination character array that contains a valid C string, and 2) a pointer to a valid C string or string literal. The function returns a pointer to the destination array, although this return value is frequently ignored.

char s1[20] = "Hello";
char s2[20] = "friend";

strcat(s1, ", my ");     // s1 now contains "Hello, my "

strcat(s1, s2);          // s1 now contains "Hello, my friend"

The destination array must be large enough to hold the combined strings (including the null character). If it is not, the array will overflow.
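The related C library function strncat() takes a third argument, the maximum number of characters to append, and always adds a null character after the characters it copies. The sketch below (append_truncating is a hypothetical name) uses it to append only as much as fits, given the total size of the destination array.

```cpp
#include <cstddef>
#include <cstring>

// Appends as much of src as fits in dest. dest_size is the full size
// of the destination array; one element is reserved for the '\0'.
void append_truncating(char* dest, std::size_t dest_size, const char* src) {
    std::size_t used = std::strlen(dest);
    if (used + 1 >= dest_size)
        return;                                  // array already full
    std::strncat(dest, src, dest_size - used - 1);
}
```

Appending " and colleague" to the 20-element array from the example above, once it holds "Hello, my friend", would keep only " an", since just three free elements remain before the one reserved for the null character.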

Passing and returning

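Regardless of how a C string is declared, when you pass the string to a function, the parameter type can be specified as either char[] (array of char) or char* (pointer to char); in a parameter list the two mean the same thing. In both cases, the string is passed by address. If the function does not modify the string, declare the parameter as const char* (or const char[]) so that string literals and other constant strings can also be passed.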
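The sketch below (function names hypothetical) uses both parameter forms. Because only an address is passed, a function cannot use sizeof to learn the size of the caller's array; it finds the end of the string by looking for the null character.

```cpp
#include <cctype>
#include <cstddef>

// char s[] here means exactly the same as char* s. The function
// receives the address of the caller's first character, so the
// changes it makes are seen by the caller.
void make_upper(char s[]) {
    for (std::size_t i = 0; s[i] != '\0'; ++i)
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
}

// A const char* parameter promises not to modify the string,
// which also allows string literals to be passed.
std::size_t count_char(const char* s, char c) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}
```

After char name[10] = "Karen"; make_upper(name);, the caller's array holds "KAREN", and count_char("hello", 'l') returns 2.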

A C string that was passed into a function can be returned by that function (strcpy() and strcat() are examples of this being done). Similarly, a member function of a class may return a C string data member of that class. In all cases, the string is returned by address, and the return type should be coded as char* or const char*. Trying to return a C string declared as a local variable of a function will usually produce a compiler warning, and it won't work: the local array is destroyed when the function returns, leaving the caller with a pointer to memory that is no longer valid.
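A minimal sketch of both patterns, with hypothetical names: make_greeting() follows the strcpy()/strcat() convention of writing into a caller-supplied array and returning its address, and Person::getName() returns a C string data member as const char*. The commented-out function shows the local-array mistake.

```cpp
#include <cstring>

// Writes into an array supplied by the caller and returns that same
// address, so the result outlives the call. The caller must supply
// enough room for "Hello, " plus the name plus the '\0'.
char* make_greeting(char* dest, const char* name) {
    std::strcpy(dest, "Hello, ");
    std::strcat(dest, name);
    return dest;
}

// A class that hands out its C string data member by address.
// Returning const char* lets callers read it but not modify it.
class Person {
public:
    explicit Person(const char* n) {
        std::strncpy(name, n, sizeof(name) - 1);   // copy at most 19 chars
        name[sizeof(name) - 1] = '\0';             // strncpy may not terminate
    }
    const char* getName() const { return name; }
private:
    char name[20];
};

// Wrong: buf is destroyed when the function returns,
// so the returned pointer would dangle.
// char* bad() { char buf[20] = "oops"; return buf; }
```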

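A string literal like "hello" is considered a constant C string. Its actual type is const char[6] (an array of six constant characters, counting the null character), but it is usually handled through a pointer of type const char* (pointer to a char constant).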