Storing and manipulating text

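The most common type of data for variables is text. The most common types in .NET for working with text are:

char (System.Char): stores a single text character
string (System.String): stores multiple text characters
StringBuilder (System.Text.StringBuilder): efficiently manipulates strings
Regex (System.Text.RegularExpressions.Regex): efficiently pattern-matches strings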

Getting the length of a string

Add a new console application project named Ch04_ManipulatingText. Set the solution's startup project to be the current selection.

Sometimes, you need to find out the length of a piece of text stored in a string variable. Modify the code to look like this:

using static System.Console;

namespace Ch04_ManipulatingText
{
    class Program
    {
        static void Main(string[] args)
        {
            string city = "London";
            WriteLine($"{city} is {city.Length} characters long.");
        }
    }
}

Getting the characters of a string

A string stores its text as a sequence of char values. It also has a read-only indexer, which means that we can use array syntax to read (but not change) its characters. Indexes start at zero, so city[0] is the first character. Add the following statement:

WriteLine($"First char is {city[0]} and third is {city[2]}.");
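Because the indexer and the Length property work together, you can visit every character with a for loop. The following standalone sketch (the CharDemo class and SpellOut method are hypothetical names chosen for illustration) reads the last character with city[city.Length - 1] and spells out a word one character at a time. The indexer is read-only, so an assignment such as city[0] = 'l' would not compile.

```csharp
using static System.Console;

namespace Ch04_ManipulatingText
{
    class CharDemo
    {
        // Returns the characters of text separated by spaces, reading each
        // one through the string's zero-based indexer.
        public static string SpellOut(string text)
        {
            var parts = new string[text.Length];
            for (int i = 0; i < text.Length; i++)
            {
                parts[i] = text[i].ToString();
            }
            return string.Join(" ", parts);
        }

        static void Main(string[] args)
        {
            string city = "London";
            // Valid indexes run from 0 to city.Length - 1; reading
            // city[city.Length] would throw an IndexOutOfRangeException.
            char last = city[city.Length - 1];
            WriteLine($"Last char is {last}.");  // Last char is n.
            WriteLine(SpellOut(city));            // L o n d o n
        }
    }
}
```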

Splitting a string

Sometimes you need to split some text wherever there is a character such as a comma.

Add more lines of code to define a single string with comma-separated city names. The Split method takes the character that you want to treat as the separator and returns an array of strings, which you can enumerate using a foreach statement:

string cities = "Paris,Berlin,Madrid,New York";
string[] citiesArray = cities.Split(',');
foreach (string item in citiesArray)
{
    WriteLine(item);
}
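Split also accepts an array of separator characters, and by default it keeps the empty strings that appear between two adjacent separators. The following standalone sketch (SplitDemo and SplitCities are hypothetical names) splits on either a comma or a semicolon and passes StringSplitOptions.RemoveEmptyEntries to drop those empty entries.

```csharp
using System;
using static System.Console;

namespace Ch04_ManipulatingText
{
    class SplitDemo
    {
        // Splits on commas or semicolons and discards the empty entries
        // produced when two separators sit next to each other.
        public static string[] SplitCities(string text)
        {
            return text.Split(new[] { ',', ';' },
                StringSplitOptions.RemoveEmptyEntries);
        }

        static void Main(string[] args)
        {
            string cities = "Paris,Berlin;;Madrid,New York";
            foreach (string item in SplitCities(cities))
            {
                WriteLine(item); // Paris, Berlin, Madrid, New York on separate lines
            }
        }
    }
}
```

Without RemoveEmptyEntries, the ";;" in the input would produce an empty string between "Berlin" and "Madrid".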

Extracting part of a string

Sometimes you need to get part of some text. For example, if a person's full name is stored in a string with a space character between the first and last names, you can find the position of the space using the IndexOf method, and then extract the first and last names using the Substring method, as follows:

string fullname = "Alan Jones";
int indexOfTheSpace = fullname.IndexOf(' ');
string firstname = fullname.Substring(0, indexOfTheSpace);
string lastname = fullname.Substring(indexOfTheSpace + 1);
WriteLine($"{lastname}, {firstname}");
Tip

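If the format of the full name were different, for example, "Lastname, Firstname", then the code would need to search for the comma-and-space separator instead, and the last name would come before the separator rather than after it.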
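As a minimal sketch of that variation, assuming the separator is always exactly ", " (a comma followed by one space), the hypothetical ToFirstLast method below searches for the separator with IndexOf and extracts the two parts with Substring. It also guards against IndexOf returning -1, which would otherwise make Substring throw when no separator is present.

```csharp
using System;
using static System.Console;

namespace Ch04_ManipulatingText
{
    class NameDemo
    {
        // Converts "Lastname, Firstname" into "Firstname Lastname".
        // Returns the input unchanged if it contains no ", " separator.
        public static string ToFirstLast(string fullname)
        {
            const string separator = ", ";
            int index = fullname.IndexOf(separator, StringComparison.Ordinal);
            if (index < 0)
            {
                return fullname; // IndexOf returned -1: no separator found
            }
            string lastname = fullname.Substring(0, index);
            string firstname = fullname.Substring(index + separator.Length);
            return $"{firstname} {lastname}";
        }

        static void Main(string[] args)
        {
            WriteLine(ToFirstLast("Jones, Alan")); // Alan Jones
        }
    }
}
```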

Checking a string for content

Sometimes you need to check whether a piece of text starts or ends with some characters or contains some characters. For example, the following code checks whether the company variable starts with an uppercase M and contains an uppercase N (both checks are case-sensitive):

string company = "Microsoft";
bool startsWithM = company.StartsWith("M");
bool containsN = company.Contains("N");
WriteLine($"Starts with M: {startsWithM}, contains an N: {containsN}");
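To make such a check ignore case, you can pass a StringComparison value. The following standalone sketch (ContainsDemo and ContainsIgnoreCase are hypothetical names) uses the IndexOf overload that takes a StringComparison, because IndexOf returns -1 when the value is not found. Note that "Microsoft" contains a lowercase s, so a case-sensitive search for "S" fails while a case-insensitive one succeeds.

```csharp
using System;
using static System.Console;

namespace Ch04_ManipulatingText
{
    class ContainsDemo
    {
        // Case-insensitive "contains" check: any index other than -1 is a match.
        public static bool ContainsIgnoreCase(string text, string value)
        {
            return text.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;
        }

        static void Main(string[] args)
        {
            string company = "Microsoft";
            WriteLine(company.Contains("S"));                    // False
            WriteLine(ContainsIgnoreCase(company, "S"));         // True
            WriteLine(company.StartsWith("m",
                StringComparison.OrdinalIgnoreCase));            // True
        }
    }
}
```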

Press Ctrl + F5 to run the application and check the output:

London is 6 characters long.
First char is L and third is n.
Paris
Berlin
Madrid
New York
Jones, Alan
Starts with M: True, contains an N: False

Other string members

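Here are some other string members. Because strings are immutable, the instance methods return a new string rather than changing the original:

Trim, TrimStart, TrimEnd: remove whitespace (or specified characters) from both ends, the start, or the end
ToUpper, ToLower: convert all the characters to uppercase or lowercase
Insert, Remove: insert or remove some text at a specified position
Replace: replace every occurrence of some text with other text
String.Concat: concatenate two or more strings
String.Join: concatenate an array of strings with a separator between each one
String.IsNullOrEmpty: check whether a string is null or empty
String.Empty: a read-only field that contains an empty string
String.Format: an alternative to string interpolation for formatting output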
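The following standalone sketch exercises a few common string members: Trim, ToUpper, Replace, and String.Join. The MembersDemo class and its Tidy and JoinWithBars methods are hypothetical names chosen for illustration.

```csharp
using static System.Console;

namespace Ch04_ManipulatingText
{
    class MembersDemo
    {
        // Each call returns a new string: trim whitespace, upper-case,
        // then swap spaces for dashes.
        public static string Tidy(string text)
        {
            return text.Trim().ToUpper().Replace(' ', '-');
        }

        // Joins the items with " | " between each pair.
        public static string JoinWithBars(string[] items)
        {
            return string.Join(" | ", items);
        }

        static void Main(string[] args)
        {
            WriteLine(Tidy("  new york  "));                     // NEW-YORK
            string[] cities = { "Paris", "Berlin", "Madrid" };
            WriteLine(JoinWithBars(cities));                     // Paris | Berlin | Madrid
            WriteLine(string.IsNullOrEmpty(string.Empty));       // True
        }
    }
}
```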

Building strings efficiently

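You can concatenate two strings to make a new string using the String.Concat method or simply using the + operator. However, strings in .NET are immutable, so every concatenation creates a completely new string in memory. This might not be noticeable if you are only concatenating two strings, but if you concatenate inside a loop, it can have a significant negative impact on performance and memory use. In that situation, use the StringBuilder type from the System.Text namespace, which appends text to an internal buffer and only creates the final string when you call its ToString method.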
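As a minimal sketch of building a string inside a loop with System.Text.StringBuilder, the hypothetical NumberList method below appends each number (and a separator) to one builder and converts it to a string only once, at the end.

```csharp
using System.Text;
using static System.Console;

namespace Ch04_ManipulatingText
{
    class BuilderDemo
    {
        // Builds "1, 2, ..., count" with a single StringBuilder instead of
        // creating a new string on every pass through the loop.
        public static string NumberList(int count)
        {
            var builder = new StringBuilder();
            for (int i = 1; i <= count; i++)
            {
                if (i > 1)
                {
                    builder.Append(", ");
                }
                builder.Append(i);
            }
            return builder.ToString();
        }

        static void Main(string[] args)
        {
            WriteLine(NumberList(5)); // 1, 2, 3, 4, 5
        }
    }
}
```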

Validating input with regular expressions

Regular expressions are useful for validating input from the user. They are very powerful and can get very complicated. Almost all programming languages have support for regular expressions, and use a common set of special characters to define them.

Add a new console application project named Ch04_RegularExpressions. At the top of the file, import the following namespace and type:

using System.Text.RegularExpressions;
using static System.Console;

In the Main method, add the following statements:

Write("Enter your age: ");
// ReadLine returns null if the input stream ends, so fall back to an empty string
string input = ReadLine() ?? string.Empty;
Regex ageChecker = new Regex(@"\d");
if (ageChecker.IsMatch(input))
{
    WriteLine("Thank you!");
}
else
{
    WriteLine($"This is not a valid age: {input}");
}
Tip

Prefixing a string literal with the @ character makes it a verbatim string, in which a backslash (\) is just a backslash rather than the start of an escape sequence. In a normal string, for example, \t means a tab and \n means a new line. Regular expressions use backslashes heavily (as in \d), so verbatim strings make them much easier to write and read.
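The following standalone sketch (VerbatimDemo and its constants are hypothetical names) shows that a verbatim string and a normal string with doubled backslashes hold the same five characters. Writing "^\d+$" as a normal string would not compile, because \d is not a recognized escape sequence.

```csharp
using static System.Console;

namespace Ch04_RegularExpressions
{
    class VerbatimDemo
    {
        public const string Escaped = "^\\d+$";  // each \\ is one backslash
        public const string Verbatim = @"^\d+$"; // backslash taken literally

        static void Main(string[] args)
        {
            WriteLine(Escaped == Verbatim); // True
            WriteLine("Tab:\tEnd");         // \t becomes a tab character
            WriteLine(@"Tab:\tEnd");        // prints Tab:\tEnd literally
        }
    }
}
```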

Press Ctrl + F5 and view the output. If you enter a valid age, such as 34, it says "Thank you!":

Enter your age: 34
Thank you!

If you enter carrots, you will see the error message:

Enter your age: carrots
This is not a valid age: carrots

However, if you enter bob30smith, it also says "Thank you!":

Enter your age: bob30smith
Thank you!

The regular expression we used is \d, which matches a single digit anywhere in the input. Because nothing constrains the characters before or after that digit, any input containing at least one digit passes.

Change the regular expression to ^\d$, where ^ matches the start of the input and $ matches the end, like this:

Regex ageChecker = new Regex(@"^\d$");

Rerun the application. Now it rejects anything except a single digit, so even 34 is rejected.

We want to allow one or more digits. To do this, we add a + (plus) after the \d. Change the regular expression to look like this:

Regex ageChecker = new Regex(@"^\d+$");

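Rerun the application and see how the regular expression now only allows whole numbers of any length, that is, input made up of one or more digits and nothing else. (It still accepts 0, so if you need strictly positive ages, you must check the value separately.)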
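To compare the three patterns side by side without typing input repeatedly, the following standalone sketch runs each one against the sample inputs from this section using the static Regex.IsMatch method. PatternDemo and IsValidAge are hypothetical names.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

namespace Ch04_RegularExpressions
{
    class PatternDemo
    {
        // The final pattern from this section: one or more digits, nothing else.
        public static bool IsValidAge(string input)
        {
            return Regex.IsMatch(input, @"^\d+$");
        }

        static void Main(string[] args)
        {
            string[] patterns = { @"\d", @"^\d$", @"^\d+$" };
            string[] inputs = { "34", "carrots", "bob30smith", "7" };
            foreach (string pattern in patterns)
            {
                foreach (string input in inputs)
                {
                    bool ok = Regex.IsMatch(input, pattern);
                    WriteLine($"{pattern,-7} {input,-11} {ok}");
                }
            }
        }
    }
}
```

Two .NET-specific details are worth knowing: \d matches any Unicode decimal digit, not only 0 to 9, and $ also matches just before a final newline. If you need strictly ASCII digits with no trailing newline, [0-9] and \z are stricter alternatives.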

The syntax of a regular expression

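Here are some common special symbols that you can use in regular expressions:

Symbol       Meaning
^            Start of input
$            End of input (or just before a final newline)
\d           A single digit
\D           A single non-digit
\w           A single word character (letter, digit, or underscore)
\W           A single non-word character
\s           A single whitespace character
[aeiou]      A single character in the set
[^aeiou]     A single character not in the set
[A-Za-z0-9]  A single character in the range(s)
.            Any single character except a newline
\.           A literal dot (.)
+            One or more of the preceding item
*            Zero or more of the preceding item
?            Zero or one of the preceding item
{3}          Exactly three of the preceding item
{3,5}        Three to five of the preceding item
{3,}         Three or more of the preceding item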
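The following standalone sketch tries a few common symbols: the {n,m} quantifier, a negated set [^...], and \w. SymbolDemo and Show are hypothetical names; each Show call prints the pattern, the input, and whether Regex.IsMatch succeeds.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

namespace Ch04_RegularExpressions
{
    class SymbolDemo
    {
        static void Show(string pattern, string input)
        {
            WriteLine($"{pattern,-14} {input,-11} {Regex.IsMatch(input, pattern)}");
        }

        static void Main(string[] args)
        {
            Show(@"^\d{2,3}$", "7");            // False: only one digit
            Show(@"^\d{2,3}$", "42");           // True
            Show(@"^\d{2,3}$", "1234");         // False: four digits
            Show(@"^[^aeiou]+$", "rhythm");     // True: no a, e, i, o, or u
            Show(@"^[^aeiou]+$", "tree");       // False
            Show(@"^\w+$", "snake_case");       // True: underscore is a word character
            Show(@"^\w+$", "two words");        // False: space is not
        }
    }
}
```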

Examples of regular expressions

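Here are some examples of regular expressions:

Expression       Meaning
\d               A single digit somewhere in the input
a                The character a somewhere in the input
Bob              The text Bob somewhere in the input
^Bob             The text Bob at the start of the input
Bob$             The text Bob at the end of the input
^\d{2}$          Exactly two digits
^[0-9]{2}$       Exactly two digits from 0 to 9 only (in .NET, \d also matches digits from other scripts)
^[A-Z]{4,}$      At least four uppercase letters and nothing else
^[A-Za-z]{4,}$   At least four letters of either case and nothing else
^[A-Z]{2}\d{3}$  Two uppercase letters followed by three digits
^d.g$            The letter d, then any character, then g (matches dog, dig, and d.g)
^d\.g$           Exactly d.g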
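The following standalone sketch checks a few of these example patterns against sample inputs, which helps show the difference between anchored and unanchored text and between . and \. in particular. ExampleDemo and Show are hypothetical names; matching is case-sensitive by default.

```csharp
using System.Text.RegularExpressions;
using static System.Console;

namespace Ch04_RegularExpressions
{
    class ExampleDemo
    {
        static void Show(string pattern, string input)
        {
            WriteLine($"{pattern,-16} {input,-12} {Regex.IsMatch(input, pattern)}");
        }

        static void Main(string[] args)
        {
            Show(@"^Bob", "Bobby Smith");          // True: starts with Bob
            Show(@"Bob$", "Bobby Smith");          // False: does not end with Bob
            Show(@"Bob$", "Hi Bob");               // True
            Show(@"^d.g$", "dog");                 // True: . matches any character
            Show(@"^d\.g$", "dog");                // False: \. matches only a dot
            Show(@"^d\.g$", "d.g");                // True
            Show(@"^[A-Z]{2}\d{3}$", "AB123");     // True
            Show(@"^[A-Z]{2}\d{3}$", "ab123");     // False: lowercase letters
        }
    }
}
```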

Best Practice

Use regular expressions to validate input from the user. Many of the same regular expressions can be reused in other languages, such as JavaScript, although each language's regular expression dialect differs in some details.