Microsoft Certified: Azure Data Engineer Associate Fundamentals — Quiz 1
Understanding Azure's data services is essential for anyone building modern data pipelines and analytics platforms. Whether you're migrating on-premises workloads to the cloud or designing new architectures from scratch, knowing which Azure service fits which scenario — and why — is the foundation of the Data Engineer Associate certification.
Azure Storage Solutions
Blob Storage vs. Data Lake Storage Gen2
Azure Blob Storage is the go-to service for storing massive amounts of unstructured data — images, videos, backups, and log files. Think of it as a giant, cheap file cabinet in the cloud.
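Azure Data Lake Storage Gen2 (ADLS Gen2) builds on top of Blob Storage but adds a critical feature: the hierarchical namespace (HNS). HNS organizes data into a true directory tree, like a file system, instead of a flat namespace in which "folders" are only prefixes in blob names. This matters for two reasons. First, renaming or deleting a directory becomes a single atomic metadata operation rather than a copy or delete of every blob under that prefix. Second, you can set POSIX-style access control lists (ACLs) on individual directories and files.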
Analogy: Blob Storage is like a warehouse with numbered bins. ADLS Gen2 is like a warehouse with organized shelves, aisles, and labeled sections — much easier to navigate and secure.
Best landing zone for raw JSON data: ADLS Gen2. It stores semi-structured files as-is at low cost. Analytics engines such as Azure Synapse Analytics and Azure Databricks can then read those files directly from the data lake.
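The following minimal sketch shows that downstream step: a Synapse serverless SQL pool queries raw JSON that has been landed in ADLS Gen2. It assumes line-delimited JSON, meaning one document per line. The storage account `mydatalake`, the `raw/orders/` folder, and the `orderId`, `customerId`, and `total` properties are all hypothetical.

```sql
-- Serverless SQL pool: read line-delimited JSON from ADLS Gen2.
-- Hypothetical path and property names; adjust to your own data lake layout.
-- 0x0b (vertical tab) is used as field terminator and quote so each line
-- is read whole into the single nvarchar(max) column "doc".
SELECT TOP 10
    JSON_VALUE(doc, '$.orderId')    AS OrderId,
    JSON_VALUE(doc, '$.customerId') AS CustomerId,
    TRY_CAST(JSON_VALUE(doc, '$.total') AS decimal(18, 2)) AS Total  -- NULL if not numeric
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/raw/orders/*.jsonl',
    FORMAT = 'CSV',
    FIELDTERMINATOR = '0x0b',
    FIELDQUOTE = '0x0b'
) WITH (doc nvarchar(max)) AS rows;
```

This approach reads each line as text and parses it with `JSON_VALUE`. As a result, the raw files stay untouched in the landing zone, and their structure is interpreted only at query time.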
Data Tiering and Cost Optimization
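Blob Storage offers access tiers that trade storage cost against access cost. You match each tier to how often the data is read:
| Tier | Use Case | Minimum Storage Duration | Cost Profile |
|---|---|---|---|
| Hot | Frequently accessed data | None | Highest storage cost, lowest access cost |
| Cool | Infrequently accessed data | 30 days | Lower storage cost, higher access cost |
| Archive | Rarely accessed data (offline tier) | 180 days | Lowest storage cost, highest access cost; rehydrating data before it can be read can take hours |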
Relational Database Services
Azure SQL Database
Azure SQL Database is a fully managed, cloud-native relational database (PaaS). It handles patching, backups, and high availability automatically. It supports full ACID transactions — Atomicity, Consistency, Isolation, Durability — ensuring data integrity.
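The following minimal sketch illustrates atomicity. It moves an amount between two rows of a hypothetical `dbo.Accounts(AccountId int PRIMARY KEY, Balance decimal(18,2))` table, and either both updates commit or neither does.

```sql
-- Hypothetical table: dbo.Accounts(AccountId int PRIMARY KEY, Balance decimal(18,2))
SET XACT_ABORT ON;  -- a runtime error dooms the whole transaction, not just one statement

BEGIN TRY
    BEGIN TRANSACTION;
        -- Both updates succeed together or are undone together (atomicity).
        UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountId = 1;
        UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountId = 2;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo any partial work, then re-raise the original error to the caller.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH;
```

With `SET XACT_ABORT ON`, an error in either `UPDATE` sends control to the `CATCH` block. That block then rolls back the open transaction, so the table never shows money leaving one account without arriving in the other.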
Deployment Options
Azure SQL comes in three deployment options:
| Option | Control Level | Best For |
|---|---|---|
| Single Database | Low | Independent apps needing dedicated resources |
| Elastic Pools | Medium | Multiple databases with variable workloads |
| Managed Instance | High | Lift-and-shift migrations needing near 100% SQL Server compatibility |
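Managed Instance gives you the most control of the three options because it exposes instance-scoped features such as SQL Server Agent and cross-database queries. That makes it ideal for migrations where the application relies on SQL Server-specific features like linked servers or CLR.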
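Some deployment choices can be scripted in T-SQL. The following sketch runs against the `master` database of an Azure SQL logical server. It creates a single database and later moves that database into an elastic pool. `SalesDb` and `SalesPool` are hypothetical names. The pool must already exist, because elastic pools are created through the portal, Azure CLI, PowerShell, or ARM/Bicep rather than T-SQL.

```sql
-- Run in the master database of the logical server (hypothetical names throughout).
-- Single database with dedicated resources: General Purpose tier, Gen5 hardware, 2 vCores.
CREATE DATABASE SalesDb
    (EDITION = 'GeneralPurpose', SERVICE_OBJECTIVE = 'GP_Gen5_2');
GO

-- Move the database into an existing elastic pool so it shares resources
-- with other databases whose workloads peak at different times.
ALTER DATABASE SalesDb
    MODIFY (SERVICE_OBJECTIVE = ELASTIC_POOL(name = SalesPool));
GO
```

Each statement gets its own batch because, in Azure SQL Database, `CREATE DATABASE` must be the only statement in a batch.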
Security Features
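Row-Level Security (RLS) filters the rows a user can see based on a predicate function. The security policy below references `dbo.fn_SecurityPredicate`, which must already exist as an inline table-valued function created `WITH SCHEMABINDING`. The following is a minimal sketch of that function. It assumes that `SalesRegion` is an `nvarchar(50)` column of `dbo.Sales` and that the application stores the caller's region in session context under a hypothetical key, `SalesRegion`.

```sql
-- Inline table-valued function used as the RLS filter predicate.
-- It returns one row (making the Sales row visible) only when the row's region
-- matches the region stored in session context under the hypothetical key 'SalesRegion'.
CREATE FUNCTION dbo.fn_SecurityPredicate (@SalesRegion AS nvarchar(50))
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS fn_securitypredicate_result
    WHERE @SalesRegion = CAST(SESSION_CONTEXT(N'SalesRegion') AS nvarchar(50));
GO

-- Usage: on each connection, the application sets the caller's region before querying.
-- @read_only = 1 prevents the value from being changed later in the session.
EXEC sp_set_session_context @key = N'SalesRegion', @value = N'West', @read_only = 1;
```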
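```sql
-- Example: Row-Level Security policy
-- Binds the filter predicate to dbo.Sales. Queries against the table then return
-- only rows for which dbo.fn_SecurityPredicate returns a row.
CREATE SECURITY POLICY SalesFilter
ADD FILTER PREDICATE dbo.fn_SecurityPredicate(SalesRegion)
ON dbo.Sales
WITH (STATE = ON);
```
NoSQL and Distributed Databases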
Azure Cosmos DB
Cosmos DB is Azure's globally distributed NoSQL database. It's designed for applications requiring low latency (single-digit millisecond reads/writes) at any scale. It supports multiple APIs, including the API for NoSQL (formerly the SQL API) and APIs for MongoDB, Apache Cassandra, Apache Gremlin, and Table.
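Key strengths include turnkey global distribution, which lets you replicate data to any Azure region you add to the account. Cosmos DB also scales throughput and storage elastically, and its storage is schema-agnostic, so items do not need to share a fixed structure.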
Analogy: If Azure SQL is a precise, rule-following accountant, Cosmos DB is a fast, globally distributed courier — optimized for speed and reach over strict structure.
Analytics Services
Azure Synapse Analytics
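Azure Synapse Analytics is an integrated analytics platform that combines data warehousing, big data processing, and data integration in a single workspace. Its main components are:
- Dedicated SQL pools for provisioned data warehousing.
- Serverless SQL pools for querying files in the data lake on demand, billed by the amount of data processed.
- Apache Spark pools for big data processing.
- Synapse pipelines for data integration.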
```sql
-- Serverless SQL: query CSV files in ADLS Gen2
-- HEADER_ROW is only supported with PARSER_VERSION = '2.0'
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/data/sales/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE
) AS [result];
```
Partitioning in Synapse improves query performance by dividing large tables into smaller, manageable segments. Queries that filter on the partition column can skip irrelevant partitions entirely, a technique called *partition pruning*.
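The following minimal sketch shows partitioning in a dedicated SQL pool. It uses a hypothetical `dbo.FactSales` table with quarterly boundaries on `SaleDate`, and the column names, distribution column, and boundary dates are illustrative assumptions. The final query filters on the partition column, so only the partition holding the second quarter of 2024 needs to be scanned.

```sql
-- Dedicated SQL pool: hypothetical fact table, hash-distributed and partitioned by quarter.
CREATE TABLE dbo.FactSales
(
    SaleId      bigint        NOT NULL,
    SaleDate    date          NOT NULL,
    SalesRegion nvarchar(50)  NOT NULL,
    Amount      decimal(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(SaleId),
    CLUSTERED COLUMNSTORE INDEX,
    -- RANGE RIGHT: each boundary value is the first day of a new partition.
    PARTITION ( SaleDate RANGE RIGHT FOR VALUES
        ('2024-01-01', '2024-04-01', '2024-07-01', '2024-10-01') )
);

-- The filter on SaleDate lets the engine skip every partition outside Q2 2024.
SELECT SalesRegion, SUM(Amount) AS TotalAmount
FROM dbo.FactSales
WHERE SaleDate >= '2024-04-01' AND SaleDate < '2024-07-01'
GROUP BY SalesRegion;
```

Keep partitions coarse. A dedicated SQL pool already spreads each table across 60 distributions, and clustered columnstore compression works best with at least about 1 million rows per distribution and partition. Monthly or quarterly boundaries are therefore usually safer than daily ones.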
Data Governance and Management
Azure Purview
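Azure Purview (now part of Microsoft Purview) is Microsoft's unified data governance service. It helps organizations discover and catalog data across on-premises, multicloud, and SaaS sources by scanning them automatically. It also classifies sensitive data and traces data lineage from source systems to reports.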
Think of Purview as the "library card catalog" for all your organization's data.
Troubleshooting and Query Performance Tips
When diagnosing slow queries or ingestion issues, consider: