T-SQL String Manipulation Tips and Techniques, Part 1
T-SQL is a language that was mainly designed to handle data manipulation tasks. Not much effort and attention were given to other kinds of tasks, such as string manipulation. Therefore, when you do need to manipulate strings in T-SQL, it can sometimes be quite challenging even for seemingly simple tasks. This article is the first of a two-part series in which I cover several common string manipulation needs. I d like to thank the following people who provided input regarding this topic: Ruben Garrigos, Kevin Boles, Fabiano Amorim, Milos Radivojevic, Peter Larsson, and Davide Mauri.
Counting Occurrences of a Substring Within a String
This code returns the value 3, indicating that the substring hello appears three times in the string abchellodehellofhello .
The steps in this technique are:
Because the solution is in the form of a single expression, it can be applied as part of a query using the underlying table s or view s attributes as inputs.
Exactly N Occurrences of a Substring Within a String
One obvious way to achieve this is to use the previous technique to count occurrences of a substring in a string, like so:
Another way to handle the task is to use the LIKE predicate, like so:
With both techniques, the validation is handled by a single expression. Therefore, you can easily embed the expression in a query filter or a CHECK constraint in a table definition.
Replacing Multiple Contiguous Spaces With a Single Space
) or even multiple characters (e.g.
(token plus space).
Replace in the result of the previous step each occurrence of
(space plus token) with (an empty string).
Replace in the result of the previous step each occurrence of
(token plus space) with (space).
All this translates to the following T-SQL expression:
The output of this code is: this is a string with lots of spaces.
Replacing Overlapping Occurrences
The next challenge was brought up by Davide Mauri; he initially found it in the Italian SQL Server forums. It involves a certain misconception that some people have regarding the way the REPLACE function works. Consider the following code. Before executing it, see if you can guess what the output will be:
Some people intuitively think that the output should be .y.y.y.y. however, it s actually .y.x.y.x. The reasoning behind the actual result is that REPLACE considers nonoverlapping occurrences from left to right. If we represent the string .x. with the symbol A, you can express the source string as AxAx.; then, replacing each occurrence of A with .y. gives you .y.x.y.x..
Suppose you want to handle the replacement task while considering overlapping occurrences. One way to achieve this is to nest two calls to the REPLACE function, like so:
Another option is replacing each occurrence of the separator with two, then applying the originally intended replacement, then replacing each two occurrences of the separator with one. Here s the expression in T-SQL:
String Formatting Numbers With Leading Zeros
There are several tasks related to formatting numbers as strings that people often inquire about. T-SQL perhaps isn t the best place to handle those, but I ll still present solutions mainly for their technical value.
The output of this code is -0000001759.
Another solution involves converting the absolute input value to a character string, concatenating a string with nine zeros (000000000) with the result of the previous step, extracting the 10 rightmost characters from the result, and finally adding the minus sign in front if the input is negative. Here s the expression that implements this logic:
SQL Server 2012 (formerly code-named Denali ), is planned to support a function called FORMAT that will make your life really easy when it comes to formatting strings representing numbers and other types of values. The FORMAT function accepts a format string compatible with .NET format strings for similar purposes, indicating how you want to format the value. For our specific task, the expression is very simple:
Left Trimming Leading Occurrences of a Character
The technique to achieve this is quite straightforward when the source string doesn t contain spaces to begin with. You first replace each occurrence of a zero with a space, then apply LTRIM to remove leading spaces, then replace each occurrence of a space with a zero, like so:
This code returns 1709.
If spaces are allowed in the input, you want to first replace each existing space with a token that you know can t appear in the source data (e.g.
), then apply the previous technique, then replace each occurrence of the token with a space, like so:
This code returns 1709 ABC.
This just gives you an idea of dealing with one complication. Of course there can be several additional complications, such as support for negative values, spaces at the beginning, and so on.
Checking That a String Is Made of Only Digits
One solution in which you don t need to worry about special cases is to generate a string pattern in which you replicate the character pattern [0-9] (basically, a digit) as many times as the number of characters in the input string, like so:
The problem with this approach is that if the input strings are very long, the pattern generated will be five times as long as the input.
There s another solution that also doesn t require you to deal with special cases, and it uses a very short pattern. You use two levels of negation, verifying that the string doesn t contain any character that s not a digit, like so:
Just like with the previous techniques I described, the last two validation techniques implement the logic using a single expression. This means that you can embed the predicate wherever predicates are supported in T-SQL, operating on a table or view attribute. For example, if you need to enforce this rule in a CHECK constraint as part of a table definition, you can, like so:
Make the Most of T-SQL s Tools
I covered several common string manipulation tasks: counting occurrences of a substring within a string, verifying that there are an exact number of occurrences of a substring within a string, replacing multiple contiguous spaces with a single space, replacing overlapping occurrences, formatting strings representing numbers with leading zeros, left trimming leading character occurrences, and verifying that a string is made of only digits. As you saw, T-SQL doesn t provide very rich capabilities in this area. You therefore need to be a bit creative with the limited set of tools that you do get. Fortunately, SQL Server Denali improves support for string manipulation by adding new functions, among them the FORMAT function that lets you format an input based on a format string.
Next month, I ll continue the discussion about string manipulation, and I ll cover additional challenges related to string manipulation involving tricky type conversions. If you have your own cool and creative techniques to handle common string manipulation needs, I d love to hear about them.
Posted In: NEWS
database, Development, Server, sql, t-sql
Leave a Comment