User-Defined Functions

Difficulty: Intermediate
Generated on: 2025-07-10 02:28:18
Category: SQL Cheatsheet for Database Development

SQL User-Defined Functions (UDFs) Cheatsheet

1. Quick Overview

User-Defined Functions (UDFs) are routines you create to extend the functionality of SQL. They encapsulate complex logic, promote code reuse, and improve query readability. Use UDFs when you need to perform calculations, data transformations, or validations repeatedly within your SQL queries.

When to Use:

Code Reusability: Avoid duplication by encapsulating logic in a function.
Complex Calculations: Simplify queries by offloading calculations to a function.
Data Transformations: Standardize data formats consistently.
Data Validation: Ensure data integrity by validating inputs within the function.
Improved Readability: Break down complex SQL logic into smaller, manageable functions.

2. Syntax

The specific syntax varies slightly between database systems, but the general structure remains consistent.

SQL Server:

-- Scalar Function (returns a single value)
CREATE FUNCTION dbo.FunctionName (
    @Parameter1 DataType,
    @Parameter2 DataType = DefaultValue -- Optional parameter with default value
)
RETURNS ReturnDataType
AS
BEGIN
    -- Function Body: SQL statements to perform the calculation
    -- or transformation.  Must end with a RETURN statement.
    DECLARE @Result ReturnDataType;

    -- Your logic here
    SET @Result = @Parameter1 * @Parameter2;

    RETURN @Result;
END;
GO

-- Table-Valued Function (returns a table)
CREATE FUNCTION dbo.FunctionName_Table (
    @Parameter1 DataType
)
RETURNS TABLE
AS
RETURN
(
    -- Select statement that defines the table structure and data
    SELECT Column1, Column2
    FROM TableName
    WHERE Condition = @Parameter1
);
GO

PostgreSQL:

-- Scalar Function
CREATE OR REPLACE FUNCTION FunctionName (
    Parameter1 DataType,
    Parameter2 DataType DEFAULT DefaultValue  -- Optional parameter with default value
)
RETURNS ReturnDataType
AS $$
BEGIN
    -- Function Body: SQL statements to perform the calculation
    -- or transformation.  Must end with a RETURN statement.
    DECLARE Result ReturnDataType;

    -- Your logic here
    Result := Parameter1 * Parameter2;

    RETURN Result;
END;
$$ LANGUAGE plpgsql;


-- Table-Valued Function
CREATE OR REPLACE FUNCTION FunctionName_Table (
    Parameter1 DataType
)
RETURNS TABLE (
    Column1 DataType,
    Column2 DataType
)
AS $$
BEGIN
    RETURN QUERY
    SELECT Column1, Column2
    FROM TableName
    WHERE Condition = Parameter1;
END;
$$ LANGUAGE plpgsql;

MySQL:

-- Scalar Function
DELIMITER //
CREATE FUNCTION FunctionName (
    Parameter1 DataType,
    Parameter2 DataType -- No default values allowed directly in parameters
)
RETURNS ReturnDataType
DETERMINISTIC  -- Required for functions that always return the same result for the same input
BEGIN
    -- Function Body: SQL statements to perform the calculation
    -- or transformation.  Must end with a RETURN statement.
    DECLARE Result ReturnDataType;

    -- Your logic here
    SET Result = Parameter1 * Parameter2;

    RETURN Result;
END //
DELIMITER ;

Oracle:

CREATE OR REPLACE FUNCTION FunctionName (
    Parameter1 DataType,
    Parameter2 DataType DEFAULT DefaultValue -- Optional parameter with default value
)
RETURN ReturnDataType
AS
    Result ReturnDataType;
BEGIN
    -- Function Body: SQL statements to perform the calculation
    -- or transformation.  Must end with a RETURN statement.

    -- Your logic here
    Result := Parameter1 * Parameter2;

    RETURN Result;
END;
/

3. Common Use Cases

Calculating Age from Date of Birth:

-- SQL Server Example
CREATE FUNCTION dbo.CalculateAge (@DateOfBirth DATE)
RETURNS INT
AS
BEGIN
    DECLARE @Age INT;
    SET @Age = DATEDIFF(year, @DateOfBirth, GETDATE());
    -- Adjust for leap years and birthdate not yet passed
    IF (MONTH(@DateOfBirth) > MONTH(GETDATE())) OR
       (MONTH(@DateOfBirth) = MONTH(GETDATE()) AND DAY(@DateOfBirth) > DAY(GETDATE()))
        SET @Age = @Age - 1;
    RETURN @Age;
END;
GO

SELECT dbo.CalculateAge('1990-05-15'); -- Returns the age as of today

Formatting Phone Numbers:

-- PostgreSQL Example
CREATE OR REPLACE FUNCTION FormatPhoneNumber (PhoneNumber TEXT)
RETURNS TEXT
AS $$
BEGIN
    -- Remove non-numeric characters and format as (XXX) XXX-XXXX
    PhoneNumber := regexp_replace(PhoneNumber, '[^0-9]+', '', 'g');
    IF LENGTH(PhoneNumber) = 10 THEN
        RETURN '(' || SUBSTRING(PhoneNumber, 1, 3) || ') ' ||
               SUBSTRING(PhoneNumber, 4, 3) || '-' ||
               SUBSTRING(PhoneNumber, 7, 4);
    ELSE
        RETURN PhoneNumber; -- Return original if not a valid phone number
    END IF;
END;
$$ LANGUAGE plpgsql;

SELECT FormatPhoneNumber('123-456-7890'); -- Returns (123) 456-7890
SELECT FormatPhoneNumber('1234567890');   -- Returns (123) 456-7890
SELECT FormatPhoneNumber('12345');         -- Returns 12345

Splitting a Comma-Separated String into a Table:

-- SQL Server Example (Table-Valued Function)
CREATE FUNCTION dbo.SplitString (@InputString VARCHAR(MAX), @Delimiter VARCHAR(1))
RETURNS @OutputTable TABLE (Value VARCHAR(MAX))
AS
BEGIN
    DECLARE @StartIndex INT, @EndIndex INT;
    SET @StartIndex = 1;
    IF SUBSTRING(@InputString, LEN(@InputString) - 1, 1) <> @Delimiter
    BEGIN
        SET @InputString = @InputString + @Delimiter;
    END;

    WHILE CHARINDEX(@Delimiter, @InputString, @StartIndex) > 0
    BEGIN
        SET @EndIndex = CHARINDEX(@Delimiter, @InputString, @StartIndex);

        INSERT INTO @OutputTable(Value)
        SELECT SUBSTRING(@InputString, @StartIndex, @EndIndex - @StartIndex);

        SET @StartIndex = @EndIndex + 1;
    END;

    RETURN;
END;
GO

SELECT * FROM dbo.SplitString('apple,banana,cherry', ',');

Calculating Sales Tax based on State:

-- MySQL Example
DELIMITER //
CREATE FUNCTION CalculateSalesTax (Price DECIMAL(10, 2), StateCode VARCHAR(2))
RETURNS DECIMAL(10, 2)
DETERMINISTIC
BEGIN
    DECLARE TaxRate DECIMAL(5, 4);
    SET TaxRate = CASE
        WHEN StateCode = 'CA' THEN 0.0725
        WHEN StateCode = 'NY' THEN 0.04
        ELSE 0.00
    END;
    RETURN Price * TaxRate;
END //
DELIMITER ;

SELECT CalculateSalesTax(100.00, 'CA'); -- Returns 7.25

4. Best Practices

Keep Functions Short and Focused: Each function should perform a single, well-defined task.
Avoid Excessive I/O: Minimize disk access and network calls within functions. Especially crucial for scalar functions, as they can be executed repeatedly within a query.
Use Deterministic Functions When Possible: Deterministic functions (always return the same result for the same input) can be optimized by the database engine. In MySQL, explicitly declare functions as DETERMINISTIC (or NOT DETERMINISTIC if they are not). In SQL Server, use SCHEMABINDING for deterministic functions.
Consider Table-Valued Functions over Scalar Functions for Large Datasets: Table-valued functions generally perform better for processing large amounts of data, especially in SQL Server.
Use Appropriate Data Types: Choose data types that accurately represent the data being processed to avoid data loss or unexpected behavior.
Handle Errors Gracefully: Implement error handling within the function to prevent unexpected failures.
Test Thoroughly: Thoroughly test your functions with various inputs, including edge cases and invalid data.
Use SCHEMABINDING in SQL Server (when possible): This improves performance by preventing changes to underlying objects that the function depends on.
Avoid Side Effects: Functions should ideally not modify data outside of their scope (e.g., updating tables). While technically possible, this can lead to unexpected behavior and make debugging difficult.
Index Usage: If a function is used in a WHERE clause, consider if the underlying table columns used in the function can be indexed to improve query performance.

5. Examples

Sample Data:

-- Example Table (SQL Server)
CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Salary DECIMAL(10, 2),
    DepartmentID INT
);

INSERT INTO Employees (EmployeeID, FirstName, LastName, Salary, DepartmentID) VALUES
(1, 'John', 'Doe', 60000.00, 1),
(2, 'Jane', 'Smith', 75000.00, 2),
(3, 'David', 'Johnson', 55000.00, 1),
(4, 'Emily', 'Brown', 80000.00, 2);

-- Example Table (MySQL)
CREATE TABLE Departments (
    DepartmentID INT PRIMARY KEY,
    DepartmentName VARCHAR(50)
);

INSERT INTO Departments (DepartmentID, DepartmentName) VALUES
(1, 'Sales'),
(2, 'Marketing');

Example 1: Calculating Bonus (SQL Server)

-- Function to calculate bonus based on salary
CREATE FUNCTION dbo.CalculateBonus (@Salary DECIMAL(10, 2), @BonusPercentage DECIMAL(5, 2))
RETURNS DECIMAL(10, 2)
AS
BEGIN
    DECLARE @Bonus DECIMAL(10, 2);
    SET @Bonus = @Salary * (@BonusPercentage / 100);
    RETURN @Bonus;
END;
GO

-- Use the function in a query
SELECT EmployeeID, FirstName, LastName, Salary, dbo.CalculateBonus(Salary, 10) AS Bonus
FROM Employees;

Example 2: Getting Department Name (PostgreSQL)

-- Function to get department name based on DepartmentID
CREATE OR REPLACE FUNCTION GetDepartmentName (DepartmentID INT)
RETURNS VARCHAR(50)
AS $$
DECLARE
    DepartmentName VARCHAR(50);
BEGIN
    SELECT DepartmentName INTO DepartmentName
    FROM Departments
    WHERE DepartmentID = GetDepartmentName.DepartmentID; -- Use GetDepartmentName.DepartmentID to avoid naming conflict

    RETURN DepartmentName;
END;
$$ LANGUAGE plpgsql;

-- Use the function in a query
SELECT EmployeeID, FirstName, LastName, Salary, GetDepartmentName(DepartmentID) AS DepartmentName
FROM Employees;

Example 3: Converting to Title Case (MySQL)

DELIMITER //
CREATE FUNCTION TitleCase (InputString VARCHAR(255))
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
    DECLARE Result VARCHAR(255);
    DECLARE i INT;
    DECLARE CharCode INT;

    SET Result = LOWER(InputString);
    SET i = 1;

    WHILE i <= LENGTH(InputString) DO
        IF i = 1 OR SUBSTRING(InputString, i - 1, 1) = ' ' THEN
            SET CharCode = ASCII(UPPER(SUBSTRING(InputString, i, 1)));
            IF CharCode >= 65 AND CharCode <= 90 THEN -- Check if it's a letter
                SET Result = INSERT(Result, i, 1, UPPER(SUBSTRING(InputString, i, 1)));
            END IF;
        END IF;
        SET i = i + 1;
    END WHILE;

    RETURN Result;
END //
DELIMITER ;

SELECT TitleCase('this is a test string'); -- Returns "This Is A Test String"

6. Common Pitfalls

Performance Issues with Scalar Functions: Scalar functions can significantly impact query performance if they are complex or called repeatedly within a query. Consider alternatives like table-valued functions or views for better performance.
Nondeterministic Functions: Using nondeterministic functions (e.g., those using GETDATE() or RAND()) in indexed views or computed columns can lead to incorrect results. MySQL requires you to explicitly declare functions as NOT DETERMINISTIC if they are not deterministic.
Incorrect Data Types: Using incorrect data types can lead to data loss, unexpected results, or errors.
Security Vulnerabilities (SQL Injection): When using user-supplied input in functions, sanitize the input to prevent SQL injection attacks. Use parameterized queries or appropriate escaping mechanisms.
Naming Conflicts: Avoid naming conflicts between function parameters and table columns. Use qualified names (e.g., FunctionName.ParameterName) to resolve conflicts.
Infinite Recursion: Be careful when calling functions recursively to avoid infinite recursion, which can crash the database server.
Lack of Error Handling: Failing to handle errors within a function can lead to unexpected failures and make debugging difficult. Use TRY...CATCH blocks (SQL Server), EXCEPTION blocks (PostgreSQL), or conditional logic to handle errors gracefully.
Implicit Conversions: Relying on implicit data type conversions can lead to unexpected results. Be explicit with CAST or CONVERT.

Troubleshooting Tips:

Check the Error Log: Examine the database server’s error log for detailed information about errors that occur within functions.
Use Debugging Tools: Use the database server’s debugging tools (if available) to step through the function’s code and identify the source of the problem.
Simplify the Function: If the function is complex, try simplifying it to isolate the problem.
Test with Different Inputs: Test the function with different inputs, including edge cases and invalid data, to identify potential issues.
Review the Function’s Definition: Carefully review the function’s definition to ensure that it is correct and that it is using the appropriate data types and logic.

7. Database Variations

SQL Server:
- Supports scalar, inline table-valued, and multi-statement table-valued functions.
- Uses SCHEMABINDING for performance optimization of deterministic functions.
- Uses GO to separate batches of SQL statements.
- DATEDIFF function is used for date calculations.
PostgreSQL:
- Uses CREATE OR REPLACE FUNCTION to create or update functions.
- Uses $$ to delimit the function body.
- Uses LANGUAGE plpgsql to specify the procedural language.
- Uses RETURN QUERY for table-valued functions.
- Supports more advanced data types and functions than some other databases.
MySQL:
- Requires DETERMINISTIC (or NOT DETERMINISTIC) to be specified.
- Does not allow default values for function parameters directly.
- Uses DELIMITER to change the statement delimiter.
- Has limitations on the types of statements that can be used within functions.
Oracle:
- Uses / to execute the function definition.
- Requires explicit RETURN statement with the return value.
- Uses PL/SQL as the procedural language.
- Has a strong focus on security and data integrity.

This cheatsheet provides a comprehensive overview of SQL User-Defined Functions, covering syntax, use cases, best practices, examples, common pitfalls, and database variations. It’s designed to be a practical resource for developers working with UDFs in various SQL environments. Remember to consult your specific database system’s documentation for the most up-to-date and detailed information.