SQL string functions
Handling strings in SSIS expressions is different from dealing with string data in SQL Server. The previous section discussed some of the differences with handling NULL values. You also have to pay attention to the Unicode and non- Unicode strings. If a package is moving data between multiple Unicode string sources, you have to pay attention to the code pages between the strings. If you are comparing strings, you also have to pay attention to string padding, trimming, and issues with data truncations. Handling strings is a little involved, but you really only need to remember a few things.
Expression functions return Unicode string results. If you are writing an expression to return the uppercase version of a varchar-type column of data, the result will be a Unicode column with all capital letters. The string function Upper() returns a Unicode string. In fact, SSIS sets all string operations to return a Unicode string. For example, note the date expression in the Derived Column Transformation in shown in below screen shot.
Here you are just adding a string column that includes the concatenation of a date value. The function is using a DatePart() function whose results are cast to a non-Unicode string, but the default data type chosen in the editor is a Unicode string data type. This can be overridden, of course, but it is something to watch for as you develop packages. On the one hand, if the data type is reverted to non-Unicode, then the string has to be converted for each further operation. On the other hand, if the value is left as a Unicode string and the result is persisted in a non-Unicode format, then at some point it has to be converted to a non-Unicode value. The rule of thumb that usually works out is to leave the strings converted as Unicode and then convert back to non- Unicode if required during persistence. Of course, this depends on whether there is a concern about using Unicode data.
Comparing strings requires that you have two strings of the same padding length and case. The comparison is case and padding sensitive. Expressions should use the concatenation operator (+) to get the strings into the same padding style. Typically, this is done when putting together date strings with an expected type of padding, like this:
RIGHT(“0” + @Day, 2) + “/” + RIGHT(“0” + @Month, 2) + “/” +
RIGHT(“00” + @Year, 2)
This type of zero padding ensures that the values in both variables are in the same format for comparison purposes. By padding both sides of the comparison, you ensure the proper equality check:
RIGHT(“0” + @Day, 2) + “/”
+ RIGHT(“0” + @Month, 2) + “/”
+ RIGHT(“00” + @Year, 2)
== RIGHT(“0” + @FileDay, 2) + “/”
+ RIGHT(“0” + @FileMonth, 2) + “/”
+ RIGHT(“00” + @FileYear, 2)
A similar type of padding operation can be used to fill in spaces between two values:
Typically, space padding is used for formatting output, but it could be used for comparisons. More often than not, spaces are removed from strings for comparison purposes. To remove spaces from strings in expressions, use the trim functions: LTrim(), RTrim(), and Trim(). These functions are selfexplanatory, and they enable comparisons for strings that have leading and trailing spaces. For example, comparing the strings “Canterbury” and “Canterbury” return a false unless the expression is written like this:
Trim(“Canterbury”) == Trim(“Canterbury “)
This expression returns true because the significant spaces are declaratively removed. Be careful with these extra spaces in string expressions as well. Spaces are counted in all string functions, which can result in extra character counts for extra spaces when using the LEN() function and can affect carefully counted SUBSTRING() functions that do not expect leading and trailing spaces. If these issues are of importance, employ a Derived Column Transformation to trim these columns early in the Data Flow process.