AWS Certified Data Analytics – Specialty Intermediate — Quiz 2
AWS Data Analytics Security & Governance — Intermediate Study Guide
Data is only as valuable as it is trustworthy and protected. In the AWS ecosystem, a robust data analytics platform isn't just about processing speed — it's about knowing *who* accessed *what*, *when*, and *why*. This lesson covers the security, governance, and compliance services you'll need to master for the AWS Certified Data Analytics Specialty exam, and more importantly, to build real-world data platforms that organizations can trust.
Data Governance Fundamentals
Data governance is the framework of policies, processes, and standards that ensure data is accurate, secure, and used appropriately. Think of it like a city's zoning laws — it defines who can build what, where, and under what conditions.
Key pillars of data governance in AWS:
Data Lineage
Data lineage is the "paper trail" of your data, from ingestion through transformation to consumption. AWS Glue can capture lineage metadata for ETL jobs, letting you trace a dashboard metric back to its raw source file.
Core Security Services
IAM — Identity and Access Management
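IAM is the central nervous system of AWS security. It controls *who* (users, roles, services) can do *what* (actions) on *which* resources. For example, this policy statement allows reading objects only under the `finance/` prefix of a data lake bucket:

    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-data-lake-bucket/finance/*"
    }

The prefix is scoped in the resource ARN because the `s3:prefix` condition key applies to list operations such as `s3:ListBucket`, not to `s3:GetObject`.
Use IAM roles (not users) for services like Glue, Redshift, and Lambda: roles provide temporary credentials, and granting each role only the permissions it needs follows the principle of least privilege.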
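A minimal sketch of prefix-scoped read access, assuming a hypothetical helper `build_read_policy` and a deliberately simplified matcher `resource_matches`. Real IAM evaluation also considers explicit denies, conditions, and other policy types, so treat this only as an illustration of how the resource ARN narrows access.

```python
import fnmatch
import json


def build_read_policy(bucket: str, prefix: str) -> dict:
    """Return a policy document allowing s3:GetObject under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            # Scope by resource ARN rather than by an s3:prefix condition.
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        }],
    }


def resource_matches(policy: dict, object_arn: str) -> bool:
    """Simplified check: does any Allow statement's Resource pattern match?"""
    for stmt in policy["Statement"]:
        resources = stmt["Resource"]
        if isinstance(resources, str):
            resources = [resources]
        if stmt["Effect"] == "Allow" and any(
            fnmatch.fnmatchcase(object_arn, pattern) for pattern in resources
        ):
            return True
    return False


policy = build_read_policy("my-data-lake-bucket", "finance/")
print(json.dumps(policy, indent=2))
```

The policy document is plain JSON, so the same dictionary could be attached to a role used by Glue or Lambda once it has been reviewed.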
AWS Organizations
When managing multiple AWS accounts (common in enterprise data lakes), AWS Organizations lets you apply Service Control Policies (SCPs) across member accounts. SCPs act as guardrails: even an account admin can't exceed the permissions the SCP allows.
Auditing & Monitoring
CloudTrail
AWS CloudTrail records API activity in your AWS account: who made the call, from where, and when. Management events are logged by default; data events, such as S3 object-level operations, must be enabled explicitly. It's your primary tool for answering: *"Who deleted that S3 bucket?"*

| Feature | Detail |
|---|---|
| Scope | AWS API calls (console, CLI, SDK) |
| Storage | Logs delivered to S3, optionally also to CloudWatch Logs |
| Use case | Security audits, compliance, incident response |
Exam tip: If a question asks how to audit API calls to AWS resources, the answer is CloudTrail.
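To make the "who deleted that bucket?" question concrete, here is a small sketch that filters a CloudTrail log file for one event name. The field names (`Records`, `eventName`, `eventTime`, `userIdentity`, `sourceIPAddress`, `requestParameters`) follow the CloudTrail record format; the sample records, account ID, and the helper name `who_did` are made up for illustration.

```python
import json

# Illustrative sample only: two invented records in CloudTrail's log file shape.
SAMPLE_LOG = json.dumps({"Records": [
    {
        "eventTime": "2024-01-15T10:02:11Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "DeleteBucket",
        "userIdentity": {"type": "IAMUser",
                         "arn": "arn:aws:iam::111122223333:user/alice"},
        "sourceIPAddress": "203.0.113.10",
        "requestParameters": {"bucketName": "old-reports"},
    },
    {
        "eventTime": "2024-01-15T10:05:40Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "CreateBucket",
        "userIdentity": {"type": "IAMUser",
                         "arn": "arn:aws:iam::111122223333:user/bob"},
        "sourceIPAddress": "203.0.113.22",
        "requestParameters": {"bucketName": "new-reports"},
    },
]})


def who_did(log_text: str, event_name: str) -> list:
    """Return who/when/where details for every record with the given event name."""
    records = json.loads(log_text)["Records"]
    return [
        {
            "when": r["eventTime"],
            "who": r.get("userIdentity", {}).get("arn", "unknown"),
            "from": r.get("sourceIPAddress"),
            "bucket": (r.get("requestParameters") or {}).get("bucketName"),
        }
        for r in records
        if r.get("eventName") == event_name
    ]


hits = who_did(SAMPLE_LOG, "DeleteBucket")
for hit in hits:
    print(hit)
```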
AWS Config
While CloudTrail records *actions*, AWS Config records *resource states*. It continuously monitors whether your resources comply with defined rules (e.g., "Are all S3 buckets encrypted?"). Think of CloudTrail as a security camera and Config as a building inspector.
Security Hub
AWS Security Hub aggregates findings from multiple services (GuardDuty, Macie, Config) into a single dashboard. It's your centralized security posture manager, useful when you need a bird's-eye view of compliance across accounts.
Encryption & Key Management
KMS — Key Management Service
AWS KMS lets you create and control the encryption keys used to protect your data. Benefits include:
Encrypting Data in S3
S3 already encrypts new objects with SSE-S3 by default. To encrypt with a KMS key instead, set the bucket's default encryption to SSE-KMS:

    aws s3api put-bucket-encryption \
      --bucket my-data-lake \
      --server-side-encryption-configuration '{
        "Rules": [{
          "ApplyServerSideEncryptionByDefault": {
            "SSEAlgorithm": "aws:kms"
          }
        }]
      }'

This ensures every object uploaded is automatically encrypted with KMS, even if the uploader forgets to specify it. Because no key ID is given, S3 uses the AWS managed key for S3 (`aws/s3`).
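If you want the default encryption to use a specific customer managed key, the configuration document gains a `KMSMasterKeyID` field, and `BucketKeyEnabled` can be set to reduce KMS request volume. The sketch below only builds that JSON; the helper name and the key ARN are hypothetical.

```python
import json


def default_encryption_config(kms_key_arn=None, bucket_key=True) -> dict:
    """Build a server-side-encryption configuration that defaults to SSE-KMS."""
    by_default = {"SSEAlgorithm": "aws:kms"}
    if kms_key_arn:
        # Without this field S3 falls back to the AWS managed key (aws/s3).
        by_default["KMSMasterKeyID"] = kms_key_arn
    return {"Rules": [{
        "ApplyServerSideEncryptionByDefault": by_default,
        "BucketKeyEnabled": bucket_key,
    }]}


config = default_encryption_config(
    # Hypothetical key ARN, for illustration only.
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
print(json.dumps(config, indent=2))
```

The printed JSON can be passed as the value of `--server-side-encryption-configuration` in the CLI command.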
| Encryption Type | Key Managed By | Use Case |
|---|---|---|
| SSE-S3 | AWS | Simplest, no extra cost |
| SSE-KMS | AWS KMS (your key) | Audit trail, fine-grained control |
| SSE-C | You (customer) | Full key control |
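The three options in the table differ mainly in what the uploader sends with each request. The sketch below maps each option to the S3 request headers it uses; the helper name `sse_headers` is hypothetical, and the all-zero key in the example is for demonstration only.

```python
import base64
import hashlib


def sse_headers(mode: str, kms_key_id=None, customer_key=None) -> dict:
    """Return the S3 request headers for a given server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        # The key travels with every request; S3 does not store it.
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode(),
        }
    raise ValueError(f"unknown mode: {mode}")


demo_key = bytes(32)  # all-zero key, never use in practice
for mode in ("SSE-S3", "SSE-KMS", "SSE-C"):
    print(mode, sorted(sse_headers(mode, customer_key=demo_key)))
```

With SSE-C the caller must resend the same key on every read, which is the operational cost of "full key control".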
Data Lake & Lake Formation
Data Lake on AWS
A data lake is a centralized repository storing structured, semi-structured, and unstructured data at scale, typically in S3. The challenge isn't storing data; it's governing it.
AWS Lake Formation
Lake Formation simplifies building secure data lakes by providing:
Without Lake Formation, you'd need to manage S3 bucket policies, IAM policies, and Glue permissions separately. Lake Formation unifies these into one governance layer.
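As a rough illustration of that single governance layer, the sketch below builds the request for a column-level `SELECT` grant. It assumes the request shape of Lake Formation's `GrantPermissions` API as exposed by boto3's `grant_permissions`; the role ARN, database, table, and column names are hypothetical, and the helper only builds the dictionary rather than calling AWS.

```python
def build_column_grant(principal_arn: str, database: str, table: str, columns) -> dict:
    """Build keyword arguments for a SELECT grant limited to specific columns."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


grant = build_column_grant(
    "arn:aws:iam::111122223333:role/AnalystRole",  # hypothetical role
    "sales_db",                                     # hypothetical database
    "orders",                                       # hypothetical table
    ["order_id", "order_total"],
)
# Assumption: with credentials configured, this could be sent as
# boto3.client("lakeformation").grant_permissions(**grant)
print(grant)
```

One grant like this stands in for what would otherwise be coordinated edits to a bucket policy, an IAM policy, and Glue Data Catalog permissions.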
PII, Data Masking & Macie
PII (Personally Identifiable Information)
PII is any data that can identify an individual: names, email addresses, social security numbers, etc. Regulations like GDPR require you to protect, minimize, and properly handle PII.
Data Masking
Data masking replaces sensitive values with obscured or realistic but fake data. For example:

    Original: John Smith, SSN: 123-45-6789
    Masked:   J*** S****, SSN: ***-**-6789

AWS Glue can apply masking transformations in ETL pipelines before data reaches analysts.
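The masking rule in the example can be sketched in a few lines. This is a plain Python illustration of the idea, not Glue's own transform API; the function names are hypothetical, and in a Glue job similar logic would run inside the ETL script.

```python
import re


def mask_name(name: str) -> str:
    """Keep each word's first letter and replace the rest with asterisks."""
    return " ".join(word[0] + "*" * (len(word) - 1) for word in name.split())


def mask_ssn(ssn: str) -> str:
    """Mask all but the last four digits of an NNN-NN-NNNN value."""
    if not re.fullmatch(r"\d{3}-\d{2}-\d{4}", ssn):
        raise ValueError("expected NNN-NN-NNNN")
    return "***-**-" + ssn[-4:]


record = {"name": "John Smith", "ssn": "123-45-6789"}
masked = {"name": mask_name(record["name"]), "ssn": mask_ssn(record["ssn"])}
print(masked)
```

Keeping the last four digits preserves enough of the value for support workflows while hiding the rest from analysts.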
Amazon Macie
Amazon Macie uses machine learning and pattern matching to automatically discover and help protect PII in S3. It scans buckets, identifies sensitive data, and generates findings (e.g., "This bucket contains 1,200 credit card numbers").
Analogy: Macie is like a smart mail sorter that flags envelopes containing sensitive documents before they're sent to the wrong department.
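To give a feel for what a "finding" is based on, here is a toy scanner that flags likely credit card numbers using a regular expression plus the Luhn checksum. This is not how Macie is implemented; it is a simplified stand-in for pattern-based discovery, and the function names and sample text are invented.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings that look like card numbers and pass Luhn."""
    hits = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            hits.append(digits)
    return hits


sample = "Card 4111 1111 1111 1111 on file; ref 1234 5678 9012 3456."
found = find_card_numbers(sample)
print(f"{len(found)} likely card number(s) found")
```

The checksum step is what keeps an ordinary 16-digit reference number from being reported as a card number.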
Networking: VPC for Data Security
A VPC (Virtual Private Cloud) isolates your AWS resources in a private network. For data analytics: