Skip to content

JSON and XML Data Handling

Difficulty: Advanced
Generated on: 2025-07-10 02:35:42
Category: SQL Cheatsheet for Database Development


SQL Cheatsheet: JSON and XML Data Handling (Advanced)

Section titled “SQL Cheatsheet: JSON and XML Data Handling (Advanced)”

JSON (JavaScript Object Notation): A lightweight data-interchange format, easy for humans to read and write, and easy for machines to parse and generate. Useful for storing semi-structured data, APIs, and configuration files.

XML (Extensible Markup Language): A markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. Useful for complex data structures, document storage, and interoperability between systems.

When to Use:

  • JSON: Preferred for web APIs, mobile applications, and scenarios where simplicity and performance are crucial. Better choice for modern applications and microservices.
  • XML: Suitable for applications requiring strict schema validation, document-centric data, or legacy system integration. Often used in enterprise environments with existing XML infrastructure.
-- PostgreSQL
SELECT json_build_object('name', 'John Doe', 'age', 30, 'city', 'New York'); -- Creates a JSON object
SELECT json_array_elements('[1, 2, 3]'); -- Expands a JSON array into rows
-- MySQL 8.0+
SELECT JSON_OBJECT('name', 'John Doe', 'age', 30, 'city', 'New York');
SELECT JSON_ARRAYAGG(value) FROM (SELECT 1 AS value UNION ALL SELECT 2 UNION ALL SELECT 3) AS values; -- Creates a JSON array from rows
-- SQL Server 2016+
SELECT JSON_OBJECT('name': 'John Doe', 'age': 30, 'city': 'New York'); -- SQL Server 2022+
SELECT value FROM OPENJSON('[1, 2, 3]'); -- Expands a JSON array into rows
-- Oracle 12c+
SELECT JSON_OBJECT('name' VALUE 'John Doe', 'age' VALUE 30, 'city' VALUE 'New York') FROM dual;
SELECT value FROM JSON_TABLE('[1, 2, 3]', '$[*]' COLUMNS (value VARCHAR2(10) PATH '$')); -- Expands a JSON array into rows. Needs JSON_TABLE.
-- SQL Server
SELECT CAST('<root><name>John Doe</name><age>30</age></root>' AS XML); -- Cast a string to XML
SELECT Nodes.query('./name/text()').value('.', 'VARCHAR(100)') AS Name,
Nodes.query('./age/text()').value('.', 'INT') AS Age
FROM (SELECT CAST('<root><name>John Doe</name><age>30</age></root>' AS XML) AS XMLData) AS T
CROSS APPLY XMLData.nodes('/root') AS T1(Nodes);
-- Oracle (Requires XMLTABLE)
SELECT XMLType('<root><name>John Doe</name><age>30</age></root>') AS xml_data FROM dual;
SELECT xt.*
FROM XMLTable('/root'
PASSING XMLType('<root><name>John Doe</name><age>30</age></root>')
COLUMNS name VARCHAR2(100) PATH '/root/name',
age NUMBER PATH '/root/age') xt;
  • Storing Configuration Data: Store application configurations in JSON/XML format within database columns.
  • API Integration: Parse JSON/XML responses from external APIs.
  • Logging: Store log events as JSON documents for easier querying and analysis.
  • E-commerce Product Catalogs: Represent product attributes and variations in a flexible JSON structure.
  • Document Management: Store and retrieve XML documents with metadata.
  • Data Exchange: Facilitate data exchange between different systems using XML as a standard format.
  • Indexing: Index JSON/XML columns to improve query performance. Specifically, consider creating indexes on frequently accessed attributes within the JSON/XML document. (e.g. CREATE INDEX idx_product_name ON products((data->>'name')); in PostgreSQL)
  • Data Type Considerations: Choose the appropriate database data type for storing JSON/XML data (e.g., JSONB in PostgreSQL for optimized storage and querying). JSONB is generally preferred over JSON in PostgreSQL because it parses the JSON during insertion, making queries faster.
  • Validation: Validate JSON/XML documents against a schema to ensure data quality and consistency. Consider using JSON Schema or XML Schema Definition (XSD).
  • Avoid Large Documents: Break down very large JSON/XML documents into smaller, more manageable chunks.
  • Use Native Functions: Leverage native database functions for JSON/XML processing instead of relying on custom code. These functions are optimized for performance.
  • Security: Sanitize JSON/XML data before storing it in the database to prevent injection attacks.
  • Performance Testing: Thoroughly test queries that process JSON/XML data to identify and address performance bottlenecks.
  • Normalization (Considerations): While JSON/XML allows for denormalized data, consider the trade-offs. Over-denormalization can lead to data redundancy and inconsistencies. A balanced approach is often best.

Sample Table: products

iddata
1{"name": "Laptop", "price": 1200, "features": ["16GB RAM", "512GB SSD", "Intel Core i7"], "dimensions": {"width": 36, "height": 24, "depth": 2}}
2{"name": "Mouse", "price": 25, "features": ["Wireless", "Optical Sensor"], "dimensions": {"width": 6, "height": 4, "depth": 3}}
3{"name": "Keyboard", "price": 75, "features": ["Mechanical", "RGB Backlit"], "dimensions": {"width": 44, "height": 15, "depth": 4}, "variants": [{"color": "black", "layout": "US"}, {"color": "white", "layout": "UK"}]}

PostgreSQL Examples:

-- Extract product name
SELECT data ->> 'name' AS product_name FROM products;
-- Extract price as a number
SELECT (data ->> 'price')::numeric AS product_price FROM products;
-- Check if a product has a specific feature
SELECT id FROM products WHERE data @> '{"features": ["16GB RAM"]}';
-- Extract the width from the dimensions object
SELECT data -> 'dimensions' ->> 'width' AS width FROM products;
-- Extract all features from the JSON array
SELECT id, json_array_elements_text(data -> 'features') AS feature FROM products;
-- Filter products with a price greater than 100
SELECT id FROM products WHERE (data ->> 'price')::numeric > 100;
-- Update the price of a product
UPDATE products SET data = jsonb_set(data, '{price}', '1300'::jsonb) WHERE id = 1;
-- Add a new feature to the features array
UPDATE products SET data = jsonb_set(data, '{features}', (data -> 'features') || '["Bluetooth"]'::jsonb) WHERE id = 2;
-- Example using JSONB for faster lookups, assuming `data` column is of type `JSONB`
CREATE INDEX idx_product_name ON products((data ->> 'name'));
SELECT id FROM products WHERE data ->> 'name' = 'Laptop';

MySQL Examples:

-- Extract product name
SELECT JSON_EXTRACT(data, '$.name') AS product_name FROM products;
-- Extract price as a number (needs casting in application layer)
SELECT JSON_EXTRACT(data, '$.price') AS product_price FROM products;
-- Check if a product has a specific feature (more complex, requires JSON_CONTAINS)
SELECT id FROM products WHERE JSON_CONTAINS(data, '["16GB RAM"]', '$.features');
-- Extract the width from the dimensions object
SELECT JSON_EXTRACT(data, '$.dimensions.width') AS width FROM products;
-- Filter products with a price greater than 100 (needs casting in application layer)
SELECT id FROM products WHERE CAST(JSON_EXTRACT(data, '$.price') AS UNSIGNED) > 100;
-- Update the price of a product
UPDATE products SET data = JSON_SET(data, '$.price', 1300) WHERE id = 1;
-- Add a new feature to the features array (more complex, requires JSON_ARRAY_APPEND)
UPDATE products SET data = JSON_ARRAY_APPEND(data, '$.features', 'Bluetooth') WHERE id = 2;

SQL Server Examples:

-- Extract product name
SELECT JSON_VALUE(data, '$.name') AS product_name FROM products;
-- Extract price as a number (needs casting)
SELECT JSON_VALUE(data, '$.price') AS product_price FROM products;
-- Check if a product has a specific feature (requires OPENJSON and more complex logic)
SELECT p.id
FROM products p
CROSS APPLY OPENJSON(p.data, '$.features') WITH (feature VARCHAR(255) '$')
WHERE feature = '16GB RAM';
-- Extract the width from the dimensions object
SELECT JSON_VALUE(data, '$.dimensions.width') AS width FROM products;
-- Filter products with a price greater than 100
SELECT id FROM products WHERE CAST(JSON_VALUE(data, '$.price') AS DECIMAL(10,2)) > 100;
-- Update the price of a product
UPDATE products SET data = JSON_MODIFY(data, '$.price', 1300) WHERE id = 1;

Oracle Examples:

-- Extract product name
SELECT JSON_VALUE(data, '$.name') AS product_name FROM products;
-- Extract price as a number
SELECT JSON_VALUE(data, '$.price') AS product_price FROM products;
-- Check if a product has a specific feature (requires JSON_TABLE and more complex logic)
SELECT p.id
FROM products p,
JSON_TABLE(p.data, '$.features[*]' COLUMNS (feature VARCHAR2(255) PATH '$')) AS jt
WHERE jt.feature = '16GB RAM';
-- Extract the width from the dimensions object
SELECT JSON_VALUE(data, '$.dimensions.width') AS width FROM products;
-- Filter products with a price greater than 100
SELECT id FROM products WHERE TO_NUMBER(JSON_VALUE(data, '$.price')) > 100;
-- Update the price of a product
UPDATE products SET data = JSON_MERGEPATCH(data, '{"price": 1300}') WHERE id = 1;

Sample Table: articles

| id | xml_data | | 1 | <article><title>Introduction to XML</title><author>John Doe</author><content>This is the content of the article.</content></article>

SQL Server Examples:

-- Extract article title
SELECT xml_data.value('(/article/title)[1]', 'VARCHAR(100)') AS article_title FROM articles;
-- Extract article author
SELECT xml_data.value('(/article/author)[1]', 'VARCHAR(100)') AS article_author FROM articles;
-- Extract the entire XML content
SELECT CAST(xml_data AS VARCHAR(MAX)) AS xml_content FROM articles;
-- Filter articles by author
SELECT id FROM articles WHERE xml_data.exist('/article[author="John Doe"]') = 1;
-- Update the author of an article
UPDATE articles
SET xml_data.modify('replace value of (/article/author)[1] with "Jane Smith"')
WHERE id = 1;

Oracle Examples:

-- Extract article title
SELECT extract(xml_data, '/article/title/text()').getStringVal() AS article_title FROM articles;
-- Extract article author
SELECT extract(xml_data, '/article/author/text()').getStringVal() AS article_author FROM articles;
-- Extract the entire XML content
SELECT xml_data.getStringVal() AS xml_content FROM articles;
-- Filter articles by author (requires XMLTABLE)
SELECT p.id
FROM articles p,
XMLTable('/article'
PASSING p.xml_data
COLUMNS author VARCHAR2(100) PATH '/article/author') xt
WHERE xt.author = 'John Doe';
-- Update the author of an article (more complex, requires XMLTransform)
-- This is complex and might require using PL/SQL.
-- Example:
-- UPDATE articles SET xml_data = XMLTransform(xml_data, XMLType('<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
-- <xsl:template match="/"><xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy></xsl:template>
-- <xsl:template match="/article/author"><author>Jane Smith</author></xsl:template>
-- </xsl:stylesheet>')) WHERE id = 1;
  • Incorrect JSON/XML Path Syntax: Using incorrect path expressions can lead to failed extractions or updates. Always double-check your path syntax.
  • Type Mismatches: Ensure that the data type of the extracted value matches the expected data type. Explicitly cast values when necessary.
  • Performance Issues with Large Documents: Processing very large JSON/XML documents can be resource-intensive. Consider optimizing queries or breaking down the documents.
  • Security Vulnerabilities: Failing to sanitize JSON/XML data can lead to injection attacks. Always validate and sanitize data before storing it in the database.
  • Database-Specific Syntax: Be aware of the differences in syntax and functionality between different database systems. Test your queries thoroughly on the target database.
  • Not Handling Null Values: Ensure you handle null values correctly when extracting data or performing comparisons. Use IS NULL or IS NOT NULL appropriately.
  • Incorrectly handling nested structures: Complex JSON/XML documents with deeply nested structures can be difficult to query efficiently. Consider using recursive queries or breaking down the query into smaller steps.
  • Ignoring Character Encoding: Ensure that the character encoding of JSON/XML data is consistent with the database’s character encoding. Incorrect encoding can lead to data corruption.

| Feature | MySQL 8.0+ | PostgreSQL 9.3+ | SQL Server 2016+ | Oracle 12c+