String Manipulation Functions
Difficulty: Intermediate
Generated on: 2025-07-10 02:26:12
Category: SQL Cheatsheet for Database Development
SQL String Manipulation Cheatsheet (Intermediate)
Section titled “SQL String Manipulation Cheatsheet (Intermediate)”1. Quick Overview
Section titled “1. Quick Overview”This cheatsheet covers intermediate-level string manipulation functions in SQL. These functions allow you to extract, modify, and analyze string data within your database. Using these effectively improves data quality, enables complex queries, and enhances application functionality.
2. Syntax
Section titled “2. Syntax”2.1. Substring Extraction
Section titled “2.1. Substring Extraction”-
SUBSTRING(string, start, length)(Standard SQL)- Extracts a substring of
lengthcharacters fromstring, starting at positionstart.
SELECT SUBSTRING('Hello World', 7, 5); -- Output: World- MySQL:
SUBSTR(string, start, length)orMID(string, start, length)are aliases. - SQL Server:
SUBSTRING(string, start, length) - PostgreSQL:
SUBSTRING(string FROM start FOR length)orSUBSTR(string, start, length) - Oracle:
SUBSTR(string, start, length)
- Extracts a substring of
2.2. String Concatenation
Section titled “2.2. String Concatenation”-
||(ANSI SQL) - Standard concatenation operatorSELECT 'Hello' || ' ' || 'World'; -- Output: Hello World -
CONCAT(string1, string2, ...)(Standard SQL)- Concatenates multiple strings into a single string.
SELECT CONCAT('Hello', ' ', 'World'); -- Output: Hello World- MySQL:
CONCAT(string1, string2, ...) - SQL Server:
string1 + string2 + ...(UseCONCATfor NULL handling) - PostgreSQL:
string1 || string2 || ...orCONCAT(string1, string2, ...) - Oracle:
string1 || string2 || ...orCONCAT(string1, string2, ...)
2.3. String Length
Section titled “2.3. String Length”-
LENGTH(string)(Standard SQL)- Returns the length (number of characters) of a string.
SELECT LENGTH('Hello'); -- Output: 5- MySQL:
LENGTH(string) - SQL Server:
LEN(string)(Trailing spaces are ignored) - PostgreSQL:
LENGTH(string)orCHAR_LENGTH(string) - Oracle:
LENGTH(string)
2.4. String Replacement
Section titled “2.4. String Replacement”-
REPLACE(string, old_substring, new_substring)(Standard SQL)- Replaces all occurrences of
old_substringwithnew_substringinstring.
SELECT REPLACE('Hello World', 'World', 'SQL'); -- Output: Hello SQL- MySQL:
REPLACE(string, old_substring, new_substring) - SQL Server:
REPLACE(string, old_substring, new_substring) - PostgreSQL:
REPLACE(string, old_substring, new_substring) - Oracle:
REPLACE(string, old_substring, new_substring)
- Replaces all occurrences of
2.5. String Padding
Section titled “2.5. String Padding”-
LPAD(string, length, pad_string)(Standard SQL / Oracle)- Pads the left side of
stringwithpad_stringuntil it reacheslength.
SELECT LPAD('SQL', 5, '0'); -- Output: 00SQL - Pads the left side of
-
RPAD(string, length, pad_string)(Standard SQL / Oracle)- Pads the right side of
stringwithpad_stringuntil it reacheslength.
SELECT RPAD('SQL', 5, '0'); -- Output: SQL00- MySQL:
LPAD(string, length, pad_string),RPAD(string, length, pad_string) - SQL Server: Use
LEFT(REPLICATE(pad_string, length) + string, length)forLPADandLEFT(string + REPLICATE(pad_string, length), length)forRPAD. - PostgreSQL:
LPAD(string, length, pad_string),RPAD(string, length, pad_string) - Oracle:
LPAD(string, length, pad_string),RPAD(string, length, pad_string)
- Pads the right side of
2.6. String Trimming
Section titled “2.6. String Trimming”-
TRIM([LEADING | TRAILING | BOTH] [trim_character] FROM string)(Standard SQL)- Removes specified characters from the beginning, end, or both sides of a string.
SELECT TRIM(BOTH ' ' FROM ' Hello World '); -- Output: Hello WorldSELECT TRIM(LEADING '0' FROM '000123'); -- Output: 123- MySQL:
TRIM([LEADING | TRAILING | BOTH] [trim_character] FROM string),LTRIM(string),RTRIM(string) - SQL Server:
TRIM(trim_character FROM string)(SQL Server 2017 and later),LTRIM(string),RTRIM(string) - PostgreSQL:
TRIM([LEADING | TRAILING | BOTH] [trim_character] FROM string),LTRIM(string),RTRIM(string) - Oracle:
TRIM([LEADING | TRAILING | BOTH] [trim_character] FROM string),LTRIM(string),RTRIM(string)
2.7. String Case Conversion
Section titled “2.7. String Case Conversion”-
UPPER(string)(Standard SQL)- Converts a string to uppercase.
SELECT UPPER('hello'); -- Output: HELLO -
LOWER(string)(Standard SQL)- Converts a string to lowercase.
SELECT LOWER('HELLO'); -- Output: hello- MySQL:
UPPER(string),LOWER(string) - SQL Server:
UPPER(string),LOWER(string) - PostgreSQL:
UPPER(string),LOWER(string) - Oracle:
UPPER(string),LOWER(string)
2.8. String Position
Section titled “2.8. String Position”-
POSITION(substring IN string)(Standard SQL)- Returns the starting position of
substringwithinstring. Returns 0 if not found.
SELECT POSITION('World' IN 'Hello World'); -- Output: 7 - Returns the starting position of
-
INSTR(string, substring)(Oracle)SELECT INSTR('Hello World', 'World'); -- Output: 7 -
CHARINDEX(substring, string)(SQL Server)SELECT CHARINDEX('World', 'Hello World'); -- Output: 7 -
LOCATE(substring, string)(MySQL)SELECT LOCATE('World', 'Hello World'); -- Output: 7 -
STRPOS(string, substring)(PostgreSQL)SELECT STRPOS('Hello World', 'World'); -- Output: 7
3. Common Use Cases
Section titled “3. Common Use Cases”- Data Cleansing: Trimming whitespace, replacing invalid characters, standardizing case.
- Data Extraction: Extracting relevant information from text fields (e.g., extracting area codes from phone numbers).
- Data Formatting: Formatting phone numbers, addresses, and other data for display or reporting.
- Search and Filtering: Using string functions in
WHEREclauses for more flexible searches (e.g., searching for names containing a specific substring). - Data Transformation: Combining data from multiple columns into a single, formatted string.
- Generating Unique Identifiers: Using string functions combined with other data to create unique keys.
- Parsing Log Files: Extracting specific information from log file entries.
- Implementing Business Rules: Enforcing data validation rules based on string patterns.
4. Best Practices
Section titled “4. Best Practices”- Performance:
- Avoid using string functions in
WHEREclauses on large tables if possible. Consider creating indexes on the columns you’re filtering. Function-based indexes can sometimes help, but they add overhead. - Use
LENGTHto pre-filter rows before applying more expensive string functions. - Use
CONCATand||carefully, especially when dealing with many strings. Repeated concatenation can be inefficient. - Consider using stored procedures or user-defined functions (UDFs) to encapsulate complex string manipulation logic. This can improve code reusability and performance.
- Avoid using string functions in
- Security:
- Be mindful of SQL injection vulnerabilities when constructing dynamic SQL queries that involve string manipulation. Always sanitize user input to prevent malicious code from being injected into your queries.
- Avoid storing sensitive information directly in strings if possible. Encrypt sensitive data when storing it in the database.
- Readability and Maintainability:
- Use comments to explain complex string manipulation logic.
- Break down complex queries into smaller, more manageable steps.
- Use meaningful variable names to improve code clarity.
- Use consistent coding style.
- Data Types:
- Be aware of the data types you are working with. Implicit conversions can sometimes lead to unexpected results.
- Use the correct data type for storing string data (e.g.,
VARCHAR,TEXT,NVARCHAR).
- NULL Handling:
- Be aware of how string functions handle
NULLvalues. In many databases, concatenating a string withNULLresults inNULL. UseCOALESCEor similar functions to handleNULLvalues explicitly. - Use
IS NULLandIS NOT NULLto check forNULLvalues.
- Be aware of how string functions handle
- Character Sets and Collations:
- Be aware of the character set and collation of your database. These settings can affect how string functions behave.
- Use appropriate collations when comparing strings to ensure correct results.
5. Examples
Section titled “5. Examples”Sample Data (Employees Table):
| employee_id | first_name | last_name | phone_number | |
|---|---|---|---|---|
| 1 | John | Doe | john.doe@example.com | 123-456-7890 |
| 2 | Jane | Smith | jane.smith@example.com | 987-654-3210 |
| 3 | Peter | Jones | peter.jones@example.com | 555-123-4567 |
| 4 | Mary | Brown | mary.brown@example.com | 111-222-3333 |
| 5 | David | Wilson | david.wilson@example.com | 444-555-6666 |
5.1. Concatenating First and Last Names
Section titled “5.1. Concatenating First and Last Names”-- Concatenate first and last names to create a full name.SELECT employee_id, CONCAT(first_name, ' ', last_name) AS full_nameFROM Employees;
-- Output:-- | employee_id | full_name |-- |-------------|-------------|-- | 1 | John Doe |-- | 2 | Jane Smith |-- | 3 | Peter Jones |-- | 4 | Mary Brown |-- | 5 | David Wilson|5.2. Extracting the Domain from Email Addresses
Section titled “5.2. Extracting the Domain from Email Addresses”-- Extract the domain name from email addresses.SELECT employee_id, SUBSTRING(email, POSITION('@' IN email) + 1) AS email_domainFROM Employees;
-- Output:-- | employee_id | email_domain |-- |-------------|------------------|-- | 1 | example.com |-- | 2 | example.com |-- | 3 | example.com |-- | 4 | example.com |-- | 5 | example.com |5.3. Formatting Phone Numbers
Section titled “5.3. Formatting Phone Numbers”-- Format phone numbers to (XXX) XXX-XXXX.SELECT employee_id, CONCAT( '(', SUBSTRING(phone_number, 1, 3), ') ', SUBSTRING(phone_number, 5, 3), '-', SUBSTRING(phone_number, 9, 4) ) AS formatted_phone_numberFROM Employees;
-- Output:-- | employee_id | formatted_phone_number |-- |-------------|--------------------------|-- | 1 | (123) 456-7890 |-- | 2 | (987) 654-3210 |-- | 3 | (555) 123-4567 |-- | 4 | (111) 222-3333 |-- | 5 | (444) 555-6666 |5.4. Trimming Whitespace from Names
Section titled “5.4. Trimming Whitespace from Names”-- Sample data with leading/trailing spaces.CREATE TABLE TempEmployees ( employee_id INT, first_name VARCHAR(50), last_name VARCHAR(50));
INSERT INTO TempEmployees (employee_id, first_name, last_name) VALUES(1, ' John ', ' Doe ');
-- Trim leading and trailing whitespace from first and last names.SELECT employee_id, TRIM(first_name) AS first_name, TRIM(last_name) AS last_nameFROM TempEmployees;
DROP TABLE TempEmployees;
-- Output:-- | employee_id | first_name | last_name |-- |-------------|------------|-----------|-- | 1 | John | Doe |5.5. Replacing Substrings
Section titled “5.5. Replacing Substrings”-- Replace "example.com" with "company.com" in email addresses.SELECT employee_id, REPLACE(email, 'example.com', 'company.com') AS updated_emailFROM Employees;
-- Output:-- | employee_id | updated_email |-- |-------------|-------------------------------|-- | 1 | john.doe@company.com |-- | 2 | jane.smith@company.com |-- | 3 | peter.jones@company.com |-- | 4 | mary.brown@company.com |-- | 5 | david.wilson@company.com |5.6. Case-Insensitive Search
Section titled “5.6. Case-Insensitive Search”-- Perform a case-insensitive search for employees whose last name starts with 's'.SELECT employee_id, first_name, last_nameFROM EmployeesWHERE LOWER(last_name) LIKE 's%'; -- Using LOWER for case-insensitivity
-- Output:-- | employee_id | first_name | last_name |-- |-------------|------------|-----------|-- | 2 | Jane | Smith |5.7. Padding a String
Section titled “5.7. Padding a String”-- Pad the employee ID with leading zeros to ensure it is always 5 digits longSELECT employee_id, LPAD(employee_id, 5, '0') AS padded_id FROM Employees;
-- Result:-- | employee_id | padded_id |-- |-------------|-----------|-- | 1 | 00001 |-- | 2 | 00002 |-- | 3 | 00003 |-- | 4 | 00004 |-- | 5 | 00005 |6. Common Pitfalls
Section titled “6. Common Pitfalls”- Incorrect String Lengths: Double-check the length arguments in
SUBSTRINGandLPAD/RPADfunctions. Off-by-one errors are common. - Case Sensitivity: Be aware of case sensitivity when using string functions, especially in
WHEREclauses. UseUPPERorLOWERto normalize the case. - NULL Values: Remember that concatenating with
NULLoften results inNULL. UseCOALESCEorISNULLto handleNULLvalues. - Performance Issues: Avoid using string functions excessively in
WHEREclauses on large tables without proper indexing. - Character Encoding Issues: Be aware of character encoding differences and use appropriate data types (e.g.,
NVARCHARfor Unicode characters). - Database-Specific Syntax: Remember that some string functions have different names or slightly different syntax in different database systems. Always consult the documentation for your specific database.
- SQL Injection: Avoid creating dynamic SQL queries by concatenating strings directly from user input. Use parameterized queries or prepared statements to prevent SQL injection vulnerabilities.
- Forgetting to TRIM: When comparing strings, especially when pulling data from various sources, remember to
TRIMwhitespace from both sides to ensure a consistent comparison. - Assuming Fixed Lengths: Don’t assume that strings will always have a fixed length. Use
LENGTHto determine the actual length of a string before performing operations on it. - Incorrect
POSITIONUsage: When usingPOSITION(or its equivalent), remember that it returns the starting position of the substring. Add 1 to the position if you want to start extracting from the character after the substring. Also, make sure to handle the case where the substring is not found (returns 0). - Missing Error Handling: When manipulating strings, consider adding error handling to your queries to gracefully handle unexpected data formats or invalid input. For example, you might use
CASEstatements to check for specific conditions before performing a string operation.
7. Database Variations
Section titled “7. Database Variations”| Function | MySQL | PostgreSQL | SQL Server | Oracle | Notes |
|---|---|---|---|---|---|
SUBSTRING | SUBSTR, MID, SUBSTRING | SUBSTRING, SUBSTR | SUBSTRING | SUBSTR | |
CONCAT | CONCAT | CONCAT, ` | ` | CONCAT, + | |
LENGTH | LENGTH | LENGTH, CHAR_LENGTH | LEN | LENGTH | SQL Server’s LEN excludes trailing spaces. |
REPLACE | REPLACE | REPLACE | REPLACE | REPLACE | |
LPAD | LPAD | LPAD | LEFT(REPLICATE(pad,len) + str, len) | LPAD | SQL Server requires a workaround using REPLICATE and LEFT. |
RPAD | RPAD | RPAD | LEFT(str + REPLICATE(pad,len), len) | RPAD | SQL Server requires a workaround using REPLICATE and LEFT. |
TRIM | TRIM, LTRIM, RTRIM | TRIM, LTRIM, RTRIM | TRIM, LTRIM, RTRIM | TRIM, LTRIM, RTRIM | TRIM may require a newer version of SQL Server (2017+). LTRIM and RTRIM are more widely supported. |
UPPER | UPPER | UPPER | UPPER | UPPER | |
LOWER | LOWER | LOWER | LOWER | LOWER | |
POSITION | LOCATE | STRPOS | CHARINDEX | INSTR | POSITION is standard SQL, other databases have their own equivalents. The order of arguments may differ. CHARINDEX searches string for substring, INSTR searches string for substring. |
| Case Sensitivity | Depends on collation settings. | Depends on collation settings. | Depends on collation settings. | Depends on collation settings. | Use COLLATE clause for explicit control. |
| NULL Handling | CONCAT(NULL, 'str') returns NULL. | CONCAT(NULL, 'str') returns NULL. | NULL + 'str' returns NULL, CONCAT handles NULLs. | `NULL |