☁️Cloud, Big Data, and SQL Extensions

This module covers SQL in the Cloud, NoSQL and Hybrid Models, SQL Extensions and Variants.

📘 SQL in the Cloud

When working with SQL in the cloud, understanding the differences between major platforms is crucial for optimizing performance and reducing costs.

💡 Cloud SQL Platforms Overview

`BigQuery`: Best for analytics on large datasets with pay-as-you-go pricing.
`Redshift`: Designed for petabyte-scale data warehouses with columnar storage.
`Snowflake`: Supports hybrid workloads with zero-copy cloning and elastic scaling.
`Azure Synapse`: Integrates analytics with cloud data integration services.

✅ Key Differences Between Platforms

Each platform has unique features that make them better suited for specific use cases:

BigQuery uses a serverless architecture and supports nested and repeated fields.
Redshift offers materialized views, external tables, and Spectrum capabilities.
Snowflake provides time-travel queries and automatic tuning features.
Azure Synapse integrates with Power BI for visualization and reporting.

💡 Query Pricing Models Explained

Pricing varies based on usage patterns:

BigQuery charges per query byte processed and for storage.
Redshift uses a provisioned node pricing model.
Snowflake has credits-based pricing with daily usage limits.
Azure Synapse has both consumption-based and reserved capacity pricing.

✅ Optimizing Cloud SQL Queries

Best practices for efficient query execution:

Use appropriate indexing strategies.
Partition tables to reduce query scan size.
Optimize join operations with proper table distribution.
Leverage caching mechanisms when available.

✅ Example: Partitioning in BigQuery

CREATE TABLE sales (
  transaction_id INT,
  amount FLOAT,
  transaction_date DATE
)
PARTITION BY RANGE (transaction_date) (
  START ('2023-01-01') END ('2024-01-01') EVERY INTERVAL '1' MONTH);

❌ Common Mistakes to Avoid

Don't use overly complex joins without proper testing.
Avoid full table scans on large datasets.
Don't ignore partitioning and distribution strategies.

📘 NoSQL and Hybrid Models

In this chapter, we'll explore the intersection of SQL and NoSQL databases. While traditional SQL databases are excellent for structured data, modern applications often require handling unstructured or semi-structured data at scale. This is where NoSQL systems shine. We'll learn how to use SQL-like queries in NoSQL systems such as <code>Cassandra CQL</code> and <code>MongoDB Atlas SQL</code>, enabling you to work with both relational and non-relational data effectively.

💡 What is NoSQL?

NoSQL databases store unstructured or semi-structured data in formats like JSON, XML, etc.
Common types include document stores (e.g., MongoDB), key-value stores (e.g., Redis), column-family stores (e.g., Cassandra), and graph databases (e.g., Neo4j)
NoSQL systems often trade consistency for availability and partition tolerance (CAP theorem)

💡 Hybrid Database Models

Hybrid databases combine the best of both worlds - they support both SQL and NoSQL operations. Examples include:

<code>MongoDB Atlas SQL</code>: Supports standard SQL queries alongside MongoDB's document model
<code>Cassandra CQL</code>: Uses a SQL-like language optimized for distributed systems
Amazon DynamoDB: Supports both key-value and document models with flexible querying

✅ Using SQL-like Queries in NoSQL Systems

/* MongoDB Atlas SQL Example */
SELECT * FROM users WHERE age > 25;

/* Cassandra CQL Example */
CREATE TABLE IF NOT EXISTS users (
    user_id uuid PRIMARY KEY,
    name text,
    email text
);

💡 Best Practices for Working with Hybrid Models

Understand your data model and access patterns before choosing a database
Use SQL when you need complex queries, joins, or transactional consistency
Leverage NoSQL capabilities for high availability, scalability, and handling unstructured data
Index wisely - over-indexing can degrade performance in distributed systems

❌ Common Pitfalls to Avoid

Don't force a relational model onto NoSQL data (avoid 'object-relational impedance mismatch')
Be cautious with cross-datacenter replication and latency
Avoid deep nesting in documents - denormalize where appropriate
Don't ignore data consistency entirely - find the right balance for your use case

💡 Key Considerations When Choosing a Database

Data structure and access patterns
Scalability requirements (horizontal vs vertical scaling)
Consistency needs (ACID vs BASE)
Latency tolerance
Cost considerations (e.g., managed services vs self-hosted)

📘 SQL Extensions and Variants

SQL extensions are procedural languages that extend the capabilities of SQL, allowing developers to create complex logic within the database itself. These extensions vary across different database systems such as PostgreSQL (PL/pgSQL), Microsoft SQL Server (T-SQL), and MySQL.

💡 Key Features of SQL Extensions

Procedural programming capabilities (variables, loops, conditionals)
Support for creating stored procedures, functions, and triggers
Ability to handle complex business logic directly in the database
Integration with database-specific features

💡 PL/pgSQL - PostgreSQL Procedural Language

PL/pgSQL is the procedural language for PostgreSQL. It provides full access to all PostgreSQL data types, functions, and operators.

💡 Key PL/pgSQL Features

Block structure with BEGIN/END
Support for variables (DECLARE)
Exception handling (EXCEPTION)
Cursor operations (FOR, USING)
Function overloading

CREATE OR REPLACE FUNCTION calculate_total(price DECIMAL, tax_rate DECIMAL) RETURNS DECIMAL AS $$
DECLARE
  total DECIMAL;
BEGIN
  total = price * (1 + tax_rate);
  RETURN total;
END;
$$ LANGUAGE plpgsql;

✅ Best Practices for PL/pgSQL

Use SET search_path to control schema access
Avoid using dynamic SQL when possible
Optimize loops and cursors
Use proper exception handling

❌ Common Mistakes in PL/pgSQL

Forgetting to declare variables
Using implicit transaction management
Not handling exceptions properly
Overusing dynamic SQL

💡 T-SQL - Transact-SQL for Microsoft SQL Server

T-SQL is the procedural language used in Microsoft SQL Server. It combines SQL with procedural programming elements.

💡 Key T-SQL Features

Support for variables using DECLARE and SET
Conditional logic (IF-ELSE, CASE)
Loop constructs (WHILE, FOR)
Transaction control (COMMIT, ROLLBACK)
Integration with SQL Agent jobs

CREATE PROCEDURE CalculateTotal (
  @price DECIMAL(10,2),
  @tax_rate DECIMAL(10,2)
)
AS
BEGIN
  DECLARE @total DECIMAL(10,2);
  SET @total = @price * (1 + @tax_rate);
  RETURN @total;
END;

✅ Best Practices for T-SQL

Use proper indexing strategies
Avoid cursors unless necessary
Optimize query plans
Use TRY...CATCH blocks for error handling

💡 MySQL Procedural Extensions

MySQL supports procedural programming through stored routines (procedures and functions) using its own extension of SQL.

💡 Key MySQL Features

Procedural statements in stored routines
Variable declaration with DECLARE
Cursor operations (OPEN, FETCH)
Exception handling (DECLARE HANDLER)
Support for triggers and events

DELIMITER $$

CREATE PROCEDURE CalculateTotal (
  IN price DECIMAL(10,2),
  IN tax_rate DECIMAL(10,2)
)
BEGIN
  DECLARE total DECIMAL(10,2);
  SET total = price * (1 + tax_rate);
  SELECT total;
END$$

DELIMITER ;

✅ Best Practices for MySQL Procedural Programming

Optimize stored routines for performance
Use proper error handling
Avoid using global variables
Test routines thoroughly before deployment

Quiz

Question 1 of 16

Which cloud SQL platform is best suited for nested and repeated fields?

BigQuery
Redshift
Snowflake
Azure Synapse