Skip to content

String Manipulation Functions

Difficulty: Intermediate
Generated on: 2025-07-10 02:26:12
Category: SQL Cheatsheet for Database Development


SQL String Manipulation Cheatsheet (Intermediate)

Section titled “SQL String Manipulation Cheatsheet (Intermediate)”

This cheatsheet covers intermediate-level string manipulation functions in SQL. These functions allow you to extract, modify, and analyze string data within your database. Using these effectively improves data quality, enables complex queries, and enhances application functionality.

  • SUBSTRING(string, start, length) (Standard SQL)

    • Extracts a substring of length characters from string, starting at position start.
    SELECT SUBSTRING('Hello World', 7, 5); -- Output: World
    • MySQL: SUBSTR(string, start, length) or MID(string, start, length) are aliases.
    • SQL Server: SUBSTRING(string, start, length)
    • PostgreSQL: SUBSTRING(string FROM start FOR length) or SUBSTR(string, start, length)
    • Oracle: SUBSTR(string, start, length)
  • || (ANSI SQL) - Standard concatenation operator

    SELECT 'Hello' || ' ' || 'World'; -- Output: Hello World
  • CONCAT(string1, string2, ...) (Standard SQL)

    • Concatenates multiple strings into a single string.
    SELECT CONCAT('Hello', ' ', 'World'); -- Output: Hello World
    • MySQL: CONCAT(string1, string2, ...)
    • SQL Server: string1 + string2 + ... (Use CONCAT for NULL handling)
    • PostgreSQL: string1 || string2 || ... or CONCAT(string1, string2, ...)
    • Oracle: string1 || string2 || ... or CONCAT(string1, string2, ...)
  • LENGTH(string) (Standard SQL)

    • Returns the length (number of characters) of a string.
    SELECT LENGTH('Hello'); -- Output: 5
    • MySQL: LENGTH(string)
    • SQL Server: LEN(string) (Trailing spaces are ignored)
    • PostgreSQL: LENGTH(string) or CHAR_LENGTH(string)
    • Oracle: LENGTH(string)
  • REPLACE(string, old_substring, new_substring) (Standard SQL)

    • Replaces all occurrences of old_substring with new_substring in string.
    SELECT REPLACE('Hello World', 'World', 'SQL'); -- Output: Hello SQL
    • MySQL: REPLACE(string, old_substring, new_substring)
    • SQL Server: REPLACE(string, old_substring, new_substring)
    • PostgreSQL: REPLACE(string, old_substring, new_substring)
    • Oracle: REPLACE(string, old_substring, new_substring)
  • LPAD(string, length, pad_string) (Standard SQL / Oracle)

    • Pads the left side of string with pad_string until it reaches length.
    SELECT LPAD('SQL', 5, '0'); -- Output: 00SQL
  • RPAD(string, length, pad_string) (Standard SQL / Oracle)

    • Pads the right side of string with pad_string until it reaches length.
    SELECT RPAD('SQL', 5, '0'); -- Output: SQL00
    • MySQL: LPAD(string, length, pad_string), RPAD(string, length, pad_string)
    • SQL Server: Use LEFT(REPLICATE(pad_string, length) + string, length) for LPAD and LEFT(string + REPLICATE(pad_string, length), length) for RPAD.
    • PostgreSQL: LPAD(string, length, pad_string), RPAD(string, length, pad_string)
    • Oracle: LPAD(string, length, pad_string), RPAD(string, length, pad_string)
  • TRIM([LEADING | TRAILING | BOTH] [trim_character] FROM string) (Standard SQL)

    • Removes specified characters from the beginning, end, or both sides of a string.
    SELECT TRIM(BOTH ' ' FROM ' Hello World '); -- Output: Hello World
    SELECT TRIM(LEADING '0' FROM '000123'); -- Output: 123
    • MySQL: TRIM([LEADING | TRAILING | BOTH] [trim_character] FROM string), LTRIM(string), RTRIM(string)
    • SQL Server: TRIM(trim_character FROM string) (SQL Server 2017 and later), LTRIM(string), RTRIM(string)
    • PostgreSQL: TRIM([LEADING | TRAILING | BOTH] [trim_character] FROM string), LTRIM(string), RTRIM(string)
    • Oracle: TRIM([LEADING | TRAILING | BOTH] [trim_character] FROM string), LTRIM(string), RTRIM(string)
  • UPPER(string) (Standard SQL)

    • Converts a string to uppercase.
    SELECT UPPER('hello'); -- Output: HELLO
  • LOWER(string) (Standard SQL)

    • Converts a string to lowercase.
    SELECT LOWER('HELLO'); -- Output: hello
    • MySQL: UPPER(string), LOWER(string)
    • SQL Server: UPPER(string), LOWER(string)
    • PostgreSQL: UPPER(string), LOWER(string)
    • Oracle: UPPER(string), LOWER(string)
  • POSITION(substring IN string) (Standard SQL)

    • Returns the starting position of substring within string. Returns 0 if not found.
    SELECT POSITION('World' IN 'Hello World'); -- Output: 7
  • INSTR(string, substring) (Oracle)

    SELECT INSTR('Hello World', 'World'); -- Output: 7
  • CHARINDEX(substring, string) (SQL Server)

    SELECT CHARINDEX('World', 'Hello World'); -- Output: 7
  • LOCATE(substring, string) (MySQL)

    SELECT LOCATE('World', 'Hello World'); -- Output: 7
  • STRPOS(string, substring) (PostgreSQL)

    SELECT STRPOS('Hello World', 'World'); -- Output: 7
  • Data Cleansing: Trimming whitespace, replacing invalid characters, standardizing case.
  • Data Extraction: Extracting relevant information from text fields (e.g., extracting area codes from phone numbers).
  • Data Formatting: Formatting phone numbers, addresses, and other data for display or reporting.
  • Search and Filtering: Using string functions in WHERE clauses for more flexible searches (e.g., searching for names containing a specific substring).
  • Data Transformation: Combining data from multiple columns into a single, formatted string.
  • Generating Unique Identifiers: Using string functions combined with other data to create unique keys.
  • Parsing Log Files: Extracting specific information from log file entries.
  • Implementing Business Rules: Enforcing data validation rules based on string patterns.
  • Performance:
    • Avoid using string functions in WHERE clauses on large tables if possible. Consider creating indexes on the columns you’re filtering. Function-based indexes can sometimes help, but they add overhead.
    • Use LENGTH to pre-filter rows before applying more expensive string functions.
    • Use CONCAT and || carefully, especially when dealing with many strings. Repeated concatenation can be inefficient.
    • Consider using stored procedures or user-defined functions (UDFs) to encapsulate complex string manipulation logic. This can improve code reusability and performance.
  • Security:
    • Be mindful of SQL injection vulnerabilities when constructing dynamic SQL queries that involve string manipulation. Always sanitize user input to prevent malicious code from being injected into your queries.
    • Avoid storing sensitive information directly in strings if possible. Encrypt sensitive data when storing it in the database.
  • Readability and Maintainability:
    • Use comments to explain complex string manipulation logic.
    • Break down complex queries into smaller, more manageable steps.
    • Use meaningful variable names to improve code clarity.
    • Use consistent coding style.
  • Data Types:
    • Be aware of the data types you are working with. Implicit conversions can sometimes lead to unexpected results.
    • Use the correct data type for storing string data (e.g., VARCHAR, TEXT, NVARCHAR).
  • NULL Handling:
    • Be aware of how string functions handle NULL values. In many databases, concatenating a string with NULL results in NULL. Use COALESCE or similar functions to handle NULL values explicitly.
    • Use IS NULL and IS NOT NULL to check for NULL values.
  • Character Sets and Collations:
    • Be aware of the character set and collation of your database. These settings can affect how string functions behave.
    • Use appropriate collations when comparing strings to ensure correct results.

Sample Data (Employees Table):

employee_idfirst_namelast_nameemailphone_number
1JohnDoejohn.doe@example.com123-456-7890
2JaneSmithjane.smith@example.com987-654-3210
3PeterJonespeter.jones@example.com555-123-4567
4MaryBrownmary.brown@example.com111-222-3333
5DavidWilsondavid.wilson@example.com444-555-6666
-- Concatenate first and last names to create a full name.
SELECT
employee_id,
CONCAT(first_name, ' ', last_name) AS full_name
FROM
Employees;
-- Output:
-- | employee_id | full_name |
-- |-------------|-------------|
-- | 1 | John Doe |
-- | 2 | Jane Smith |
-- | 3 | Peter Jones |
-- | 4 | Mary Brown |
-- | 5 | David Wilson|

5.2. Extracting the Domain from Email Addresses

Section titled “5.2. Extracting the Domain from Email Addresses”
-- Extract the domain name from email addresses.
SELECT
employee_id,
SUBSTRING(email, POSITION('@' IN email) + 1) AS email_domain
FROM
Employees;
-- Output:
-- | employee_id | email_domain |
-- |-------------|------------------|
-- | 1 | example.com |
-- | 2 | example.com |
-- | 3 | example.com |
-- | 4 | example.com |
-- | 5 | example.com |
-- Format phone numbers to (XXX) XXX-XXXX.
SELECT
employee_id,
CONCAT(
'(',
SUBSTRING(phone_number, 1, 3),
') ',
SUBSTRING(phone_number, 5, 3),
'-',
SUBSTRING(phone_number, 9, 4)
) AS formatted_phone_number
FROM
Employees;
-- Output:
-- | employee_id | formatted_phone_number |
-- |-------------|--------------------------|
-- | 1 | (123) 456-7890 |
-- | 2 | (987) 654-3210 |
-- | 3 | (555) 123-4567 |
-- | 4 | (111) 222-3333 |
-- | 5 | (444) 555-6666 |
-- Sample data with leading/trailing spaces.
CREATE TABLE TempEmployees (
employee_id INT,
first_name VARCHAR(50),
last_name VARCHAR(50)
);
INSERT INTO TempEmployees (employee_id, first_name, last_name) VALUES
(1, ' John ', ' Doe ');
-- Trim leading and trailing whitespace from first and last names.
SELECT
employee_id,
TRIM(first_name) AS first_name,
TRIM(last_name) AS last_name
FROM
TempEmployees;
DROP TABLE TempEmployees;
-- Output:
-- | employee_id | first_name | last_name |
-- |-------------|------------|-----------|
-- | 1 | John | Doe |
-- Replace "example.com" with "company.com" in email addresses.
SELECT
employee_id,
REPLACE(email, 'example.com', 'company.com') AS updated_email
FROM
Employees;
-- Output:
-- | employee_id | updated_email |
-- |-------------|-------------------------------|
-- | 1 | john.doe@company.com |
-- | 2 | jane.smith@company.com |
-- | 3 | peter.jones@company.com |
-- | 4 | mary.brown@company.com |
-- | 5 | david.wilson@company.com |
-- Perform a case-insensitive search for employees whose last name starts with 's'.
SELECT
employee_id,
first_name,
last_name
FROM
Employees
WHERE
LOWER(last_name) LIKE 's%'; -- Using LOWER for case-insensitivity
-- Output:
-- | employee_id | first_name | last_name |
-- |-------------|------------|-----------|
-- | 2 | Jane | Smith |
-- Pad the employee ID with leading zeros to ensure it is always 5 digits long
SELECT employee_id, LPAD(employee_id, 5, '0') AS padded_id FROM Employees;
-- Result:
-- | employee_id | padded_id |
-- |-------------|-----------|
-- | 1 | 00001 |
-- | 2 | 00002 |
-- | 3 | 00003 |
-- | 4 | 00004 |
-- | 5 | 00005 |
  • Incorrect String Lengths: Double-check the length arguments in SUBSTRING and LPAD/RPAD functions. Off-by-one errors are common.
  • Case Sensitivity: Be aware of case sensitivity when using string functions, especially in WHERE clauses. Use UPPER or LOWER to normalize the case.
  • NULL Values: Remember that concatenating with NULL often results in NULL. Use COALESCE or ISNULL to handle NULL values.
  • Performance Issues: Avoid using string functions excessively in WHERE clauses on large tables without proper indexing.
  • Character Encoding Issues: Be aware of character encoding differences and use appropriate data types (e.g., NVARCHAR for Unicode characters).
  • Database-Specific Syntax: Remember that some string functions have different names or slightly different syntax in different database systems. Always consult the documentation for your specific database.
  • SQL Injection: Avoid creating dynamic SQL queries by concatenating strings directly from user input. Use parameterized queries or prepared statements to prevent SQL injection vulnerabilities.
  • Forgetting to TRIM: When comparing strings, especially when pulling data from various sources, remember to TRIM whitespace from both sides to ensure a consistent comparison.
  • Assuming Fixed Lengths: Don’t assume that strings will always have a fixed length. Use LENGTH to determine the actual length of a string before performing operations on it.
  • Incorrect POSITION Usage: When using POSITION (or its equivalent), remember that it returns the starting position of the substring. Add 1 to the position if you want to start extracting from the character after the substring. Also, make sure to handle the case where the substring is not found (returns 0).
  • Missing Error Handling: When manipulating strings, consider adding error handling to your queries to gracefully handle unexpected data formats or invalid input. For example, you might use CASE statements to check for specific conditions before performing a string operation.
FunctionMySQLPostgreSQLSQL ServerOracleNotes
SUBSTRINGSUBSTR, MID, SUBSTRINGSUBSTRING, SUBSTRSUBSTRINGSUBSTR
CONCATCONCATCONCAT, ``CONCAT, +
LENGTHLENGTHLENGTH, CHAR_LENGTHLENLENGTHSQL Server’s LEN excludes trailing spaces.
REPLACEREPLACEREPLACEREPLACEREPLACE
LPADLPADLPADLEFT(REPLICATE(pad,len) + str, len)LPADSQL Server requires a workaround using REPLICATE and LEFT.
RPADRPADRPADLEFT(str + REPLICATE(pad,len), len)RPADSQL Server requires a workaround using REPLICATE and LEFT.
TRIMTRIM, LTRIM, RTRIMTRIM, LTRIM, RTRIMTRIM, LTRIM, RTRIMTRIM, LTRIM, RTRIMTRIM may require a newer version of SQL Server (2017+). LTRIM and RTRIM are more widely supported.
UPPERUPPERUPPERUPPERUPPER
LOWERLOWERLOWERLOWERLOWER
POSITIONLOCATESTRPOSCHARINDEXINSTRPOSITION is standard SQL, other databases have their own equivalents. The order of arguments may differ. CHARINDEX searches string for substring, INSTR searches string for substring.
Case SensitivityDepends on collation settings.Depends on collation settings.Depends on collation settings.Depends on collation settings.Use COLLATE clause for explicit control.
NULL HandlingCONCAT(NULL, 'str') returns NULL.CONCAT(NULL, 'str') returns NULL.NULL + 'str' returns NULL, CONCAT handles NULLs.`NULL