☁️Cloud, Big Data, and SQL Extensions

This module covers SQL in the Cloud, NoSQL and Hybrid Models, SQL Extensions and Variants.

📘 SQL in the Cloud

When working with SQL in the cloud, understanding the differences between major platforms is crucial for optimizing performance and reducing costs.

💡 Cloud SQL Platforms Overview

  • `BigQuery`: Best for analytics on large datasets with pay-as-you-go pricing.
  • `Redshift`: Designed for petabyte-scale data warehouses with columnar storage.
  • `Snowflake`: Supports hybrid workloads with zero-copy cloning and elastic scaling.
  • `Azure Synapse`: Integrates analytics with cloud data integration services.

Key Differences Between Platforms

Each platform has unique features that make them better suited for specific use cases:

  • BigQuery uses a serverless architecture and supports nested and repeated fields.
  • Redshift offers materialized views, external tables, and Spectrum capabilities.
  • Snowflake provides time-travel queries and automatic tuning features.
  • Azure Synapse integrates with Power BI for visualization and reporting.

💡 Query Pricing Models Explained

Pricing varies based on usage patterns:

  • BigQuery charges per query byte processed and for storage.
  • Redshift uses a provisioned node pricing model.
  • Snowflake has credits-based pricing with daily usage limits.
  • Azure Synapse has both consumption-based and reserved capacity pricing.

Optimizing Cloud SQL Queries

Best practices for efficient query execution:

  • Use appropriate indexing strategies.
  • Partition tables to reduce query scan size.
  • Optimize join operations with proper table distribution.
  • Leverage caching mechanisms when available.

Example: Partitioning in BigQuery

CREATE TABLE sales (
  transaction_id INT,
  amount FLOAT,
  transaction_date DATE
)
PARTITION BY RANGE (transaction_date) (
  START ('2023-01-01') END ('2024-01-01') EVERY INTERVAL '1' MONTH);

Common Mistakes to Avoid

  • Don't use overly complex joins without proper testing.
  • Avoid full table scans on large datasets.
  • Don't ignore partitioning and distribution strategies.

📘 NoSQL and Hybrid Models

In this chapter, we'll explore the intersection of SQL and NoSQL databases. While traditional SQL databases are excellent for structured data, modern applications often require handling unstructured or semi-structured data at scale. This is where NoSQL systems shine. We'll learn how to use SQL-like queries in NoSQL systems such as <code>Cassandra CQL</code> and <code>MongoDB Atlas SQL</code>, enabling you to work with both relational and non-relational data effectively.

💡 What is NoSQL?

  • NoSQL databases store unstructured or semi-structured data in formats like JSON, XML, etc.
  • Common types include document stores (e.g., MongoDB), key-value stores (e.g., Redis), column-family stores (e.g., Cassandra), and graph databases (e.g., Neo4j)
  • NoSQL systems often trade consistency for availability and partition tolerance (CAP theorem)

💡 Hybrid Database Models

Hybrid databases combine the best of both worlds - they support both SQL and NoSQL operations. Examples include:

  • <code>MongoDB Atlas SQL</code>: Supports standard SQL queries alongside MongoDB's document model
  • <code>Cassandra CQL</code>: Uses a SQL-like language optimized for distributed systems
  • Amazon DynamoDB: Supports both key-value and document models with flexible querying

Using SQL-like Queries in NoSQL Systems

/* MongoDB Atlas SQL Example */
SELECT * FROM users WHERE age > 25;
/* Cassandra CQL Example */
CREATE TABLE IF NOT EXISTS users (
    user_id uuid PRIMARY KEY,
    name text,
    email text
);

💡 Best Practices for Working with Hybrid Models

  • Understand your data model and access patterns before choosing a database
  • Use SQL when you need complex queries, joins, or transactional consistency
  • Leverage NoSQL capabilities for high availability, scalability, and handling unstructured data
  • Index wisely - over-indexing can degrade performance in distributed systems

Common Pitfalls to Avoid

  • Don't force a relational model onto NoSQL data (avoid 'object-relational impedance mismatch')
  • Be cautious with cross-datacenter replication and latency
  • Avoid deep nesting in documents - denormalize where appropriate
  • Don't ignore data consistency entirely - find the right balance for your use case

💡 Key Considerations When Choosing a Database

  • Data structure and access patterns
  • Scalability requirements (horizontal vs vertical scaling)
  • Consistency needs (ACID vs BASE)
  • Latency tolerance
  • Cost considerations (e.g., managed services vs self-hosted)

📘 SQL Extensions and Variants

SQL extensions are procedural languages that extend the capabilities of SQL, allowing developers to create complex logic within the database itself. These extensions vary across different database systems such as PostgreSQL (PL/pgSQL), Microsoft SQL Server (T-SQL), and MySQL.

💡 Key Features of SQL Extensions

  • Procedural programming capabilities (variables, loops, conditionals)
  • Support for creating stored procedures, functions, and triggers
  • Ability to handle complex business logic directly in the database
  • Integration with database-specific features

💡 PL/pgSQL - PostgreSQL Procedural Language

PL/pgSQL is the procedural language for PostgreSQL. It provides full access to all PostgreSQL data types, functions, and operators.

💡 Key PL/pgSQL Features

  • Block structure with BEGIN/END
  • Support for variables (DECLARE)
  • Exception handling (EXCEPTION)
  • Cursor operations (FOR, USING)
  • Function overloading
CREATE OR REPLACE FUNCTION calculate_total(price DECIMAL, tax_rate DECIMAL) RETURNS DECIMAL AS $$
DECLARE
  total DECIMAL;
BEGIN
  total = price * (1 + tax_rate);
  RETURN total;
END;
$$ LANGUAGE plpgsql;

Best Practices for PL/pgSQL

  • Use SET search_path to control schema access
  • Avoid using dynamic SQL when possible
  • Optimize loops and cursors
  • Use proper exception handling

Common Mistakes in PL/pgSQL

  • Forgetting to declare variables
  • Using implicit transaction management
  • Not handling exceptions properly
  • Overusing dynamic SQL

💡 T-SQL - Transact-SQL for Microsoft SQL Server

T-SQL is the procedural language used in Microsoft SQL Server. It combines SQL with procedural programming elements.

💡 Key T-SQL Features

  • Support for variables using DECLARE and SET
  • Conditional logic (IF-ELSE, CASE)
  • Loop constructs (WHILE, FOR)
  • Transaction control (COMMIT, ROLLBACK)
  • Integration with SQL Agent jobs
CREATE PROCEDURE CalculateTotal (
  @price DECIMAL(10,2),
  @tax_rate DECIMAL(10,2)
)
AS
BEGIN
  DECLARE @total DECIMAL(10,2);
  SET @total = @price * (1 + @tax_rate);
  RETURN @total;
END;

Best Practices for T-SQL

  • Use proper indexing strategies
  • Avoid cursors unless necessary
  • Optimize query plans
  • Use TRY...CATCH blocks for error handling

💡 MySQL Procedural Extensions

MySQL supports procedural programming through stored routines (procedures and functions) using its own extension of SQL.

💡 Key MySQL Features

  • Procedural statements in stored routines
  • Variable declaration with DECLARE
  • Cursor operations (OPEN, FETCH)
  • Exception handling (DECLARE HANDLER)
  • Support for triggers and events
DELIMITER $$

CREATE PROCEDURE CalculateTotal (
  IN price DECIMAL(10,2),
  IN tax_rate DECIMAL(10,2)
)
BEGIN
  DECLARE total DECIMAL(10,2);
  SET total = price * (1 + tax_rate);
  SELECT total;
END$$

DELIMITER ;

Best Practices for MySQL Procedural Programming

  • Optimize stored routines for performance
  • Use proper error handling
  • Avoid using global variables
  • Test routines thoroughly before deployment

Quiz

Question 1 of 16

Which cloud SQL platform is best suited for nested and repeated fields?

  • BigQuery
  • Redshift
  • Snowflake
  • Azure Synapse