🦝

Rackoon Tycoon

Build your cloud empire. Tame the traffic.

AWS Certified Solutions Architect – Associate (SAA-C03) study guide. Gap-focused prep for distributed systems veterans — skips the basics, targets 2023–2026 additions and exam traps.

Exam: SAA-C03 · 65 questions · 130 min · 720/1000 passing · Scenario-based

🎮 Play the companion game → (serve the repo, then open via localhost)

🔴 Priority Gaps — From Your Interview

Weak spots found during the gap interview (2026-06-16). Highest ROI before exam day. Pattern across all of them: you diagnose the problem but don't name the exact AWS fix — the exam rewards the mechanism, not the concept. Click each to self-quiz.

1 Gateway VPC Endpoint cost net

Named NAT Gateway as the cost, didn't name the fix.

Gateway VPC Endpoint for S3/DynamoDB — FREE, routes over private AWS backbone, kills NAT Gateway processing ($0.045/GB).

Gateway endpoints = S3 + DynamoDB ONLY, always free. Interface endpoints (PrivateLink) = everything else, cost money. Reflexive answer for "EC2 in private subnet + S3 cost".

Q: EC2 in private subnet reads 500GB/day from same-region S3 through a NAT Gateway. Cheapest fix?
A: Add a Gateway VPC Endpoint for S3. Free, removes NAT data-processing charge. S3 same-region transfer itself is already free.
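
The arithmetic behind that answer, as a rough sketch. It uses the $0.045/GB processing rate quoted in this card and assumes a 30-day month; the function name is illustrative, and NAT hourly charges are left out.

```python
NAT_PROCESSING_USD_PER_GB = 0.045  # rate quoted in the card above


def nat_processing_cost(gb_per_day: float, days: int = 30) -> float:
    """Data-processing charge for traffic pushed through a NAT Gateway."""
    return gb_per_day * days * NAT_PROCESSING_USD_PER_GB


via_nat = nat_processing_cost(500)
via_gateway_endpoint = 0.0  # Gateway endpoints carry no processing charge

print(f"Through NAT:      ${via_nat:,.2f}/month")
print(f"Through endpoint: ${via_gateway_endpoint:,.2f}/month")
```

At 500GB/day the processing charge alone is about $675 a month, which is why the endpoint is the reflexive answer.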

2 VPC Lattice vs PrivateLink net

No VPC Lattice experience.

Lattice = internal service mesh across YOUR accounts/VPCs (no peering/TGW). PrivateLink = expose a service to an EXTERNAL org/consumer.

Exam trigger: "microservices across accounts, same org" → Lattice. "endpoint service + interface endpoint" / external partner → PrivateLink.

Q: ECS service A (account 1) must call ECS service B (account 2), same org, with built-in auth + weighted routing, no peering. What?
A: VPC Lattice. (PrivateLink is for cross-org exposure, not internal mesh.)

3 Kinesis Streams vs Firehose data

Said Firehose can replay; said Streams retention = "minutes/hours".

Firehose CANNOT replay (one-way delivery, failed records → error S3 prefix). Streams retention = 24h default, up to 365 days.

"Minutes" = Firehose's delivery BUFFER (60s–15min / 1–128MB), not stream retention. Need replay/multiple consumers → Streams. Load to S3/Redshift/OpenSearch, no code → Firehose.

Q: Real-time anomaly detection (sub-second, replayable) + separate S3 data-lake load (no processing). Which Kinesis for each?
A: Data Streams for the real-time/replay consumer; Firehose for the S3 load. Firehose has no replay.
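
A toy model of the delivery-buffer idea, to make "buffer, not retention" concrete. This is not Firehose's implementation: the class and its fields are hypothetical, and the sketch only checks thresholds when a record arrives, whereas the real service also flushes on a timer.

```python
from typing import List, Optional


class DeliveryBuffer:
    """Flushes when EITHER the size or the time threshold is reached."""

    def __init__(self, max_bytes: int, max_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records: List[bytes] = []
        self.size = 0
        self.opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer a record; return the batch if a threshold was hit."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            batch = self.records
            # Once delivered, the buffer keeps nothing: there is no replay.
            self.records, self.size, self.opened_at = [], 0, None
            return batch
        return None


buf = DeliveryBuffer(max_bytes=10, max_seconds=60)
first = buf.add(b"abcd", now=0)       # below both thresholds
by_size = buf.add(b"efghij", now=1)   # 10 bytes buffered, flush on size
third = buf.add(b"x", now=100)        # new buffer opens at t=100
by_time = buf.add(b"y", now=161)      # 61s elapsed, flush on time
```

After a flush the buffer is empty, which is the point of the card: a consumer that needs to re-read old records needs Data Streams.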

4 Aurora Limitless vs Serverless v2 db

Picked Limitless for a modest-write spiky workload.

Serverless v2 = vertical auto-scale (ACUs) for spikes + zero capacity planning — right for spiky/unpredictable. Limitless = horizontal write SHARDING, only when a single writer is maxed.

Reach for Limitless only when you can prove single-node writer is the bottleneck (single writer ≈ up to 256 ACUs). It does NOT solve spike elasticity — that's v2's job.

Q: 800K writes/day, 10x growth, flash-sale spikes, zero capacity planning. v2 or Limitless?
A: Serverless v2 — fits one writer easily, auto-scales for spikes. Limitless is overkill until writes exceed a single node.
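
A quick conversion of the quiz numbers into average writes per second. It shows averages only; flash-sale peaks will be higher, and the helper name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def avg_writes_per_second(writes_per_day: float) -> float:
    """Average rate only; real traffic is burstier than this."""
    return writes_per_day / SECONDS_PER_DAY


today = avg_writes_per_second(800_000)
after_10x = avg_writes_per_second(800_000 * 10)

print(f"Today:     {today:.1f} writes/s")
print(f"After 10x: {after_10x:.1f} writes/s")
```

Even after 10x growth the average is under 100 writes per second, which is why the card treats this as a spike-elasticity problem for Serverless v2 rather than a write-sharding problem.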

5 Lambda-in-VPC inherits EC2 networking compute

Vague on whether moving S3 access to Lambda changes the cost answer.

Lambda with NO VPC config → S3 via public AWS endpoints, free path, no NAT. VPC-attached Lambda → same as EC2: needs Gateway VPC Endpoint or it pays NAT.

"Serverless" ≠ "no networking constraints". Attach Lambda to a VPC and it inherits subnet/route-table/NAT behavior.

Q: VPC-attached Lambda pulling from S3 racks up NAT charges. Fix?
A: Gateway VPC Endpoint for S3 — same fix as EC2. (Or detach from VPC if it doesn't need VPC resources.)

6 RCP = hard resource ceiling sec

Right outcome ("can't delete") but reasoning was just "most restrictive wins".

An RCP (Resource Control Policy) is a guardrail on the RESOURCE. Explicit Deny there beats any identity policy, SCP allow, or even account root.

SCP restricts what principals CAN DO. RCP restricts what CAN BE DONE TO a resource. Both are org-level Deny ceilings; explicit Deny always wins.

Q: Identity policy Allows s3:DeleteBucket, no SCP mentions S3, an org RCP Denies s3:DeleteBucket. Can they delete?
A: No. RCP explicit Deny is a hard ceiling on the resource, independent of identity-side evaluation.
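
A deliberately simplified model of why the answer is "No". It covers only three layers (identity policy, SCP, RCP) and ignores resource-based policies, permission boundaries and session policies. The function and the 'allow' / 'deny' / 'silent' labels are made up for illustration. "No SCP mentions S3" is treated as 'allow' on the assumption that the default FullAWSAccess SCP is attached.

```python
def can_perform(identity: str, scp: str, rcp: str) -> bool:
    """Each argument is 'allow', 'deny', or 'silent' for the requested action."""
    effects = (identity, scp, rcp)
    # An explicit Deny in ANY layer ends evaluation.
    if "deny" in effects:
        return False
    # Otherwise every layer must allow; silence is an implicit deny.
    return all(effect == "allow" for effect in effects)


# The quiz scenario: identity Allows, SCP does not restrict S3, RCP Denies.
quiz = can_perform(identity="allow", scp="allow", rcp="deny")
print("Can delete bucket:", quiz)
```

Note that the RCP deny wins no matter what the identity side says, which is the mechanism the exam wants named.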

7 Cognito User Pools vs Identity Pools sec

Strong on Identity Center vs Cognito, but didn't split Cognito's two halves.

User Pools = authentication + JWT (who are you). Identity Pools (federated identities) = vend temporary AWS credentials (STS) so users hit AWS resources directly.

Workforce SSO across 15 accounts → IAM Identity Center. 2M app users via Google → Cognito (User Pool to auth, Identity Pool if they need direct S3/DynamoDB access).

Q: Mobile users sign in with Google, then must upload directly to S3 with scoped creds. Which Cognito piece vends the AWS creds?
A: Identity Pool (federated identities). User Pool handles the login/JWT.

8 RDS cross-AZ cost fixes cost db

Math right ($0.01/GB each way) but no fix options.

Options: (a) co-locate EC2+RDS same AZ (loses HA), (b) read replica in same AZ — reads local/free, writes still cross-AZ, (c) accept it as the price of HA. No free Gateway endpoint exists for RDS.

When the exam says "reduce cost WITHOUT sacrificing availability" → the same-AZ read replica is usually the answer.

Q: Heavy cross-AZ RDS read traffic costs are high; must keep HA. Best fix?
A: Add a read replica in the EC2's AZ; route reads locally. Writes still cross to primary (accepted), HA preserved.
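
A back-of-envelope sketch of the saving. It uses the $0.01/GB rate from this card and assumes each GB is billed once on the sending side and once on the receiving side. The 10,000GB monthly read volume is an invented example figure.

```python
CROSS_AZ_USD_PER_GB_EACH_WAY = 0.01  # rate quoted in the card above


def cross_az_cost(gb: float) -> float:
    """Assumes each GB is billed leaving one AZ and entering the other."""
    return gb * CROSS_AZ_USD_PER_GB_EACH_WAY * 2


def monthly_read_cost(read_gb: float, local_replica: bool) -> float:
    # With a replica in the application's AZ, read traffic never crosses AZs.
    return 0.0 if local_replica else cross_az_cost(read_gb)


before = monthly_read_cost(10_000, local_replica=False)
after = monthly_read_cost(10_000, local_replica=True)
print(f"Reads via cross-AZ primary: ${before:,.2f}/month")
print(f"Reads via same-AZ replica:  ${after:,.2f}/month")
```

Writes still cross to the primary, so the write portion of the bill does not change.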

Exam Domains

Weights

Domain 1

30%

Design Secure Architectures

IAM, encryption, network security, data protection

Domain 2

26%

Design Resilient Architectures

HA, DR, fault tolerance, decoupling

Domain 3

24%

Design High-Performing Architectures

Scaling, caching, optimized storage/compute

Domain 4

20%

Design Cost-Optimized Architectures

Pricing models, rightsizing, storage tiers

Gap Analysis — 17yr Veteran

Review these

AI/ML Services (Likely Blind Spot)

Bedrock, SageMaker Canvas, Comprehend, Kendra, Q Business, Rekognition positioning on the exam. Not about using them — about when to choose which.

Bedrock = managed FM API, no infra, pay per token
SageMaker = full MLOps pipeline, bring your own model
Comprehend = NLP (sentiment, entities, PII detection)
Kendra = intelligent enterprise search over docs
Rekognition = image/video analysis (faces, objects, content moderation)

New Networking: VPC Lattice & Verified Access (2023)

Both likely on exam. VPC Lattice replaces many PrivateLink patterns for service-to-service. Verified Access is zero-trust app access without VPN.

Resource Control Policies (RCPs) — 2024

New SCP-like control at Organizations level but applies to resources not identities. Different from SCPs. Controls what external principals can do to your resources even if their IAM allows it.

S3 Express One Zone (2023)

10x faster than S3 Standard. Single AZ only. Used for ML training, HPC, latency-sensitive analytics. Different pricing model — charged per request + storage per GB.

Aurora Limitless Database (2024)

Horizontal write scaling beyond single Aurora instance. Distributed sharding, managed by Aurora. Exam may contrast with Aurora Global vs Aurora Serverless v2 vs Limitless.

IAM Identity Center (was SSO) — naming/features evolved

Now integrates with external IdPs, trusted identity propagation to analytics services. Replaces per-account IAM role federation for most multi-account patterns.

New Services 2023–2026

High exam weight

Amazon Bedrock Generative AI Platform

AI/ML 2023

Managed access to foundation models (Claude, Llama, Titan, Mistral, Stable Diffusion)
No infra to manage, pay per token
Bedrock Agents — multi-step orchestration with tool use
Bedrock Knowledge Bases — RAG with S3/managed vector store
Bedrock Guardrails — content filtering, PII redaction
Bedrock Model Evaluation — compare model outputs

Use Bedrock when question says "managed", "no ML expertise", "foundation model". Use SageMaker when "custom training", "MLOps pipeline", "bring own model".

VPC Lattice Service Networking

Networking 2023

Logical application-layer network across VPCs and accounts
Service-to-service communication without VPC peering complexity
Built-in auth, observability, traffic management
Works with EC2, ECS, EKS, Lambda targets
Usually simpler than PrivateLink for internal services (cost depends on traffic, so compare per workload)

PrivateLink = expose to external consumers (SaaS). VPC Lattice = internal service mesh across your own accounts.

AWS Verified Access Zero-Trust App Access

Security Networking 2023

Provides VPN-less secure access to corporate apps
Evaluates each request against identity + device posture
Integrates with IAM Identity Center, Okta, JumpCloud
Integrates with CrowdStrike/Jamf for device trust signals
Access logs to S3/CloudWatch/Firehose

Question: "eliminate VPN, verify user identity + device health per request" → Verified Access, not Client VPN.

S3 Express One Zone High-Speed Object Storage

Storage 2023

10x lower latency than S3 Standard (single-digit ms)
Single AZ — not cross-AZ replicated
Ideal: ML training datasets, financial analytics, HPC
Uses "directory buckets" (different API path from S3)
Charged per request + GB, no free tier

Single AZ = not durable to AZ failure. Exam will tempt you to pick this for HA scenarios — wrong. Also: NOT a replacement for EFS for shared file access.

Aurora Limitless Database

Database 2024

Horizontal write scaling beyond single Aurora writer limit
AWS manages distributed sharding transparently
Still Aurora-compatible SQL interface
Compare: Aurora Serverless v2 = scale down to zero, Aurora Global = multi-region reads, Limitless = multi-shard writes

Don't confuse with Aurora Serverless. Limitless = massive write scale. Serverless = variable, spiky workloads that need scale-to-zero.

ElastiCache Serverless

Database 2023

No cluster sizing decisions, auto-scales instantly
Supports both Redis and Memcached engine
Sub-millisecond latency maintained at scale
Pay per ECPU (ElastiCache Processing Unit) consumed + GB stored

Exam may frame as "unpredictable cache demand" → Serverless. Predictable steady-state → provisioned (cheaper).

Resource Control Policies (RCPs)

Security 2024

Organization-level guardrails on resources (not identities)
Blocks access from outside the org even if the resource policy or the caller's IAM policy allows it
Complements SCPs: SCPs control principals, RCPs control resources
Example: "deny S3 bucket access from outside org" in one policy

SCP denies what org members CAN DO. RCP denies what CAN BE DONE TO your resources regardless of who grants permission.

AWS Graviton4 ARM Compute

Compute 2024

Up to 40% better price-performance vs comparable x86 instances for general workloads
R8g instances: memory-optimized, database workloads
M8g, C8g instances available
Fully compatible with containers, Java, Python, Go, Rust
Graviton3: M7g, C7g, R7g — still heavily used/tested

Exam: "reduce EC2 cost with minimal application changes" → Graviton (Linux workloads; interpreted and JVM code usually runs as-is, native binaries and container images need an arm64 build). Wrong answer: Reserved Instances alone without instance type change.

AWS Trainium2 / Inferentia2

AI/ML Compute 2024

Trainium2 (Trn2): ML training chip, cheaper than GPU for large models
Inferentia2 (Inf2): ML inference, optimized throughput/latency/cost
Capacity Blocks for ML: reserved GPU/accelerator capacity for time-boxed training runs
P5 instances: H100 GPUs for extreme training (P5e/P5en use H200)

Training = Trn2 (cheapest). Inference at scale = Inf2. Need NVIDIA compatibility = P4/P5. Exam may distinguish these.

Amazon Q Business Enterprise AI

AI/ML 2024

AI assistant grounded in your enterprise data
Connects to S3, Confluence, SharePoint, Salesforce, Jira, etc.
Respects the signed-in user's existing document permissions from the source systems when answering
Different from Kendra: Q Business = conversational AI, Kendra = document search API

"Build internal ChatGPT over company docs" → Q Business. "Search API over document corpus" → Kendra. "Custom AI product" → Bedrock.

AWS Clean Rooms

Security 2023

Collaborate on data analysis without sharing raw data
Each party brings data, run joint queries, no raw export
Built-in controls: aggregation-only, noise injection, column restrictions
For: ad measurement, financial analytics, healthcare research

"Analyze joint data with partner without exposing PII to each other" → Clean Rooms. Not Lake Formation, not just S3 permissions.

IAM Identity Center (evolved from SSO)

Security Updated 2023+

Central SSO for all AWS accounts + business apps
Trusted identity propagation: pass user identity to EMR, Redshift, S3 Access Grants
Integrates with external IdPs via SAML 2.0 / OIDC
Permission sets = IAM roles deployed across accounts
Replaces per-account federation for multi-account setups

Multi-account + SSO = Identity Center, not Cognito. Cognito = app user pools (B2C). Identity Center = workforce/employee access.

AWS Private 5G

Networking 2023

Deploy private 5G/LTE network in your facility
AWS provides hardware + SIM cards + management
Low latency industrial/manufacturing connectivity
Integrates with AWS Outposts

Amazon DataZone Data Governance

Database 2023

Data catalog + marketplace + governance platform
Publish, discover, and subscribe to data assets
Works with Redshift, Glue, S3, Athena
Lake Formation = row/column access control. DataZone = catalog + discovery + business glossary

Lake Formation for fine-grained access. DataZone for data discovery and cross-team sharing workflows.

Amazon OpenSearch Serverless

Database 2023

No cluster sizing for OpenSearch
Auto-scales index + search capacity independently
Collections: time-series or search (different optimizations)
Pay per OCU (OpenSearch Compute Unit) consumed

AWS Supply Chain

AI/ML 2023

ML-powered supply chain visibility and risk management
Connects to ERP, WMS, TMS systems
Demand planning, inventory optimization recommendations

Compute — Exam Focus

EC2 · Lambda · ECS · EKS · Batch

EC2 Instance Types — When to Pick What

M-family: general purpose, balanced CPU/RAM (web servers, app servers)
C-family: compute optimized (HPC, batch, gaming servers, media encoding)
R-family: memory optimized (in-memory DBs, Redis, real-time analytics)
I-family: storage optimized NVMe (NoSQL DBs, data warehousing, Elasticsearch)
P/G-family: GPU (ML training, graphics rendering)
Trn/Inf: ML-specific training/inference (cheaper than P-family for compatible workloads)
Graviton (g suffix: m7g, c7g): ARM, 20-40% cost reduction
Mac instances: Xcode CI/CD on actual macOS

EC2 Pricing Models

On-Demand: no commitment, highest rate
Reserved (1yr/3yr): up to 72% off, Standard RI or Convertible RI
Savings Plans (Compute): flexible, applies to EC2+Lambda+Fargate, hourly commitment
Savings Plans (EC2): region+family locked, higher discount
Spot: up to 90% off, interruptible 2min notice
Dedicated Host: physical server, BYOL (Oracle, Windows Server)
Dedicated Instance: single-tenant, no visibility into host

Compute Savings Plans > EC2 Savings Plans in flexibility. RI Standard = can't exchange for another instance family (size changes within a family are allowed). RI Convertible = can exchange across families, lower discount. Dedicated Host ≠ Dedicated Instance.

Auto Scaling Policies

Target tracking: maintain metric at target (e.g., CPU 60%) — simplest
Step scaling: add N instances based on alarm breach magnitude
Simple scaling: one action per alarm, has cooldown — legacy
Scheduled: predictable load patterns
Predictive: ML forecast of future load, pre-scales
Warm pools: pre-initialized stopped instances, fast scale-out

Target tracking is usually the right answer for "minimize costs while maintaining performance". Predictive = known traffic pattern. Warm pools = slow boot time problem.
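
A rough approximation of how target tracking sizes a group: scale capacity in proportion to how far the metric is from the target. The real service works through CloudWatch alarms and its own algorithm, so treat this as intuition only. The function name and the min/max defaults are illustrative.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int = 1, max_size: int = 100) -> int:
    """Proportional estimate: capacity needed to bring metric back to target."""
    estimate = math.ceil(current * metric / target)
    # Clamp to the group's configured bounds.
    return max(min_size, min(max_size, estimate))


scale_out = desired_capacity(current=4, metric=90, target=60)
scale_in = desired_capacity(current=10, metric=30, target=60)
print("4 instances at 90% CPU, target 60% ->", scale_out)
print("10 instances at 30% CPU, target 60% ->", scale_in)
```

Four instances running at 90% CPU against a 60% target come out at six instances.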

Lambda Limits & Patterns

Max execution: 15 minutes
Memory: 128MB – 10GB (CPU scales with memory)
Ephemeral storage (/tmp): up to 10GB
Concurrency: 1000 default per region (can increase)
Cold start mitigations: Provisioned Concurrency, SnapStart (Java first; Python and .NET added in late 2024)
Lambda SnapStart: up to 10x faster cold starts, restores from pre-initialized snapshots
Layers: share code/deps up to 250MB total
Container images: up to 10GB (no layer limit constraint)

Lambda timeout = 15min max. Need longer? Use Fargate or ECS. SnapStart launched as Java-only, so older questions assume Java; Python and .NET support came later. Provisioned Concurrency (pre-warmed environments, billed) ≠ reserved concurrency (a concurrency cap/guarantee, no extra charge).

ECS vs EKS vs Fargate

ECS (Fargate): AWS-managed, no node management, simpler, AWS-native
ECS (EC2): control over host, GPU support, spot, custom AMIs
EKS (Fargate): Kubernetes API, serverless pods, no node mgmt
EKS (EC2/managed nodes): K8s, full control, daemonsets, stateful
EKS Anywhere: run K8s on-prem with EKS tooling
ECS Anywhere: run ECS tasks on-prem servers

"We use Kubernetes" → EKS. "Migrate Docker containers with minimal overhead" → ECS Fargate. "On-prem container orchestration" → ECS Anywhere or EKS Anywhere.

AWS Batch

Fully managed batch computing on EC2/Spot/Fargate
Job queues, compute environments, job definitions
Array jobs: parallel processing of independent work items
Fair-share scheduling: multi-tenant priority allocation
AWS Batch Multi-Node: distributed ML training

Batch vs Lambda: Lambda = event-driven, short. Batch = long-running, high-compute, parallel. Batch vs Step Functions: Step Functions = workflow orchestration with state. Batch = embarrassingly parallel compute.

Storage

S3 · EBS · EFS · FSx · Storage Gateway

S3 Storage Classes — Decision Tree

Standard: frequent access, ms latency
Standard-IA: infrequent access, per-GB retrieval fee, min 30 days, 128KB min
One Zone-IA: like IA, single AZ, 20% cheaper, not HA
Glacier Instant Retrieval: archive with ms retrieval, min 90 days
Glacier Flexible Retrieval (the renamed original Glacier): minutes-to-hours retrieval, min 90 days
Glacier Deep Archive: retrieval within 12 hours (48 hours bulk), lowest cost storage in AWS, min 180 days
Intelligent-Tiering: auto-moves between tiers, no retrieval fees, monitoring fee per object
Express One Zone: 10x faster than Standard, single AZ, premium price

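Intelligent-Tiering has NO retrieval fees but charges a per-object monitoring fee. Poor fit for millions of tiny files: objects under 128KB are never auto-tiered and stay at the Frequent Access rate. Standard-IA has a 30-day minimum billing period: delete before 30 days and you are still charged for 30.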
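
The Standard-IA minimums from the list above (30 days, 128KB) can be sketched as a billing floor. The helper name is illustrative, and actual per-GB rates are left out on purpose.

```python
from typing import Tuple

MIN_BILLABLE_DAYS = 30
MIN_BILLABLE_BYTES = 128 * 1024


def standard_ia_billable(size_bytes: int, days_stored: int) -> Tuple[int, int]:
    """Return (billable_bytes, billable_days) after applying both minimums."""
    return (max(size_bytes, MIN_BILLABLE_BYTES),
            max(days_stored, MIN_BILLABLE_DAYS))


# A 10KB object deleted after 10 days is billed as 128KB for 30 days.
small_short = standard_ia_billable(10 * 1024, 10)
print(small_short)
```

This is why small or short-lived objects can cost more in Standard-IA than in Standard.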

S3 Key Features

Object Lock: WORM — Compliance mode (nobody can delete) vs Governance mode (privileged can override)
Replication: CRR (cross-region) / SRR (same-region). Requires versioning on both buckets.
Transfer Acceleration: CloudFront edge → private AWS backbone → S3. For large uploads from far clients.
Multipart Upload: required >5GB, recommended >100MB
S3 Access Points: named network endpoints with separate policies per app/team
S3 Access Grants: delegate fine-grained access with identity propagation from IAM Identity Center
Event Notifications: → Lambda, SQS, SNS, EventBridge
Lifecycle rules: transition/expire objects based on age or prefix

Replication does NOT replicate existing objects — only new ones after rule created. Cross-account replication needs bucket policy on destination. Object Lock requires versioning.
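
A small planning sketch for Multipart Upload. The constants are S3's documented limits (at most 10,000 parts, 5 MiB minimum part size except for the last part). The function name and the 100 MiB preferred part size are illustrative choices, not S3 defaults.

```python
import math
from typing import Tuple

MIB = 1024 ** 2
MIN_PART_BYTES = 5 * MIB
MAX_PARTS = 10_000


def plan_parts(object_bytes: int,
               preferred_part_bytes: int = 100 * MIB) -> Tuple[int, int]:
    """Return (part_size, part_count) that stays within the part limits."""
    part_size = max(preferred_part_bytes,
                    MIN_PART_BYTES,
                    math.ceil(object_bytes / MAX_PARTS))
    part_count = math.ceil(object_bytes / part_size)
    return part_size, part_count


five_gib = plan_parts(5 * 1024 ** 3)
two_tib = plan_parts(2 * 1024 ** 4)
print("5 GiB ->", five_gib)
print("2 TiB ->", two_tib)
```

For very large objects the part size has to grow past the preferred value so the count stays at or below 10,000.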

EBS Volume Types

gp3: general SSD, 3000 IOPS baseline, up to 16K IOPS, 1GB-16TB. Default.
gp2: older gen, IOPS scales with size (3 IOPS/GB), max 16K
io2/io2 Block Express: provisioned IOPS SSD, up to 256K IOPS, multi-attach capable, 99.999% durability
io1: older provisioned IOPS, max 64K IOPS
st1: throughput HDD, sequential large block (data warehouses, ETL). Can't be boot volume.
sc1: cold HDD, lowest cost, infrequent access. Can't be boot volume.

gp3 vs gp2: gp3 you configure IOPS independently (cheaper for high-IOPS low-storage). Multi-attach = io1/io2 only, same AZ, use with cluster-aware apps. Boot volume can only be SSD (gp2/gp3/io1/io2).
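
The gp2 vs gp3 difference in numbers, as a sketch. The 3 IOPS/GB rule, the 16K cap and the 3000 gp3 baseline come from the list above. The 100 IOPS floor for small gp2 volumes is an added detail from the EBS documentation, and gp2 burst behaviour is ignored.

```python
GP3_BASELINE_IOPS = 3_000
GP2_MAX_IOPS = 16_000
GP2_MIN_IOPS = 100  # floor for small gp2 volumes (not stated in the list above)


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline scales with size: 3 IOPS per GiB, floored and capped."""
    return min(GP2_MAX_IOPS, max(GP2_MIN_IOPS, 3 * size_gib))


for size in (10, 100, 1_000, 6_000):
    print(f"{size:>5} GiB  gp2={gp2_baseline_iops(size):>6}  gp3={GP3_BASELINE_IOPS}")
```

A 100 GiB gp2 volume has a 300 IOPS baseline and needs 1,000 GiB to match what gp3 gives at any size. That is the "high-IOPS, low-storage" case where gp3 is cheaper.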

EFS vs FSx

EFS: NFS, Linux-only, multi-AZ, scales automatically, POSIX. Bursting vs Provisioned throughput.
EFS One Zone: single AZ, 47% cheaper, same API
FSx for Windows: SMB, Active Directory integrated, Windows ACLs, DFS
FSx for Lustre: parallel file system for HPC/ML. Native S3 integration. Sub-ms latency.
FSx for NetApp ONTAP: NFS+SMB+iSCSI, snapshots, replication, dedup/compression, multi-protocol
FSx for OpenZFS: NFS-compatible, ZFS snapshots, data compression, 1M IOPS

Linux shared storage = EFS. Windows file server = FSx for Windows. HPC/ML parallelism = FSx for Lustre. Mixed-OS or NetApp migration = FSx for ONTAP. ZFS snapshots = FSx OpenZFS.

Storage Gateway

File Gateway: NFS/SMB to S3. Local cache. Files stored as S3 objects.
Volume Gateway (Cached): iSCSI block, primary data in S3, frequent data cached locally
Volume Gateway (Stored): entire volume on-prem, async backup to S3
Tape Gateway: VTL for backup software (Veeam, Veritas). Archives to Glacier.

File Gateway = NFS on-prem to S3 (file share). Volume = block/iSCSI. Tape = backup software VTL. "Replace tape backup" → Tape Gateway. "On-prem NFS to cloud" → File Gateway.

Database

RDS · Aurora · DynamoDB · ElastiCache · Redshift

RDS — Exam Points

Read replicas: async replication, can be promoted, cross-region
Multi-AZ: sync replication, automatic failover, NOT for read scaling
Multi-AZ DB Cluster: 2 readable standby instances (new in 2022+)
RDS Proxy: connection pooling, reduces DB connections for Lambda
Performance Insights: DB load monitoring, wait events
Automated backups: 1-35 days retention, PITR
Storage auto-scaling: enabled separately from compute

Multi-AZ = HA failover, NOT read scale. Read replicas = read scale, NOT automatic failover (must promote manually). Lambda + RDS = must use RDS Proxy (connection exhaustion).

Aurora — Specifics

Shared storage layer, 6 copies across 3 AZs, quorum writes
Up to 15 read replicas, replica lag typically in the tens of milliseconds
Aurora Global: 1 primary + up to 5 secondary regions, <1s replication
Aurora Serverless v2: scales in 0.5 ACU increments, scales to 0
Aurora Limitless: horizontal write sharding (2024)
Aurora I/O-Optimized: predictable pricing for I/O-heavy workloads
Backtrack: rewind DB without restore (MySQL only)

Aurora Global RTO <1min for promoted secondary. Aurora Multi-AZ != RDS Multi-AZ (different architecture). Serverless v2 does NOT scale to 0 during active connections. Backtrack = not a backup, limited window.

DynamoDB — Exam Deep Cuts

On-demand vs Provisioned (with Auto Scaling)
Global tables: multi-active multi-region, eventual consistency across regions
DynamoDB Streams: change data capture, triggers Lambda
TTL: auto-delete expired items (no cost for deletes)
DAX: in-memory cache, microsecond reads, write-through
Transactions: ACID across multiple items (TransactWriteItems)
GSI: different partition key, eventual consistency, no uniqueness
LSI: same partition key, different sort key, must be at creation, strongly consistent reads
PartiQL: SQL-compatible query language for DynamoDB

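LSI only at table creation. GSI can be added later. Hot partition = key-design problem (adding capacity rarely fixes it). DAX = write-through cache: writes pass through DAX to DynamoDB, but it only accelerates eventually consistent reads (strongly consistent reads go straight to the table). Global Tables require on-demand capacity or provisioned capacity with write auto scaling kept consistent across replicas.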

ElastiCache

Redis: data structures, pub/sub, sorted sets, persistence, replication, cluster mode, Lua scripting
Memcached: simple K/V, no persistence, multi-threaded, horizontal scale (no replication)
Redis Cluster Mode: sharding across up to 500 node groups
Redis Serverless: auto-scale (2023)
Global Datastore: Redis cross-region replication

Need replication/failover = Redis. Need multi-threaded pure caching = Memcached. Need persistence = Redis. Need leaderboard/sorted set = Redis. Memcached = no persistence, no failover, no pub/sub.

Redshift

Columnar, MPP data warehouse (PostgreSQL-compatible)
RA3 nodes: managed storage (S3), scale compute/storage independently
Redshift Serverless: auto-scales, pay per RPU-second
Redshift Spectrum: query S3 directly from Redshift without loading
Data Sharing: live access to Redshift data across accounts/clusters without copy
Redshift ML: CREATE MODEL trains SageMaker model via SQL
Auto-copy from S3: continuously loads new S3 files into tables

Redshift = analytical/OLAP, not transactional OLTP. "Query S3 without loading" = Spectrum or Athena (Athena if no Redshift cluster needed). Redshift Serverless ≠ Aurora Serverless (different engines/purposes).

Database Selection Guide

Relational, known schema → RDS (MySQL/Postgres/SQL Server/Oracle)
Relational, massive scale, managed → Aurora
Key-value, millisecond, massive scale → DynamoDB
In-memory cache → ElastiCache (Redis or Memcached)
In-memory data store with durability → MemoryDB for Redis
Analytics/OLAP → Redshift
Time series → Timestream
Graph → Neptune (fraud, social, knowledge graphs)
Document store → DocumentDB (MongoDB-compatible)
Ledger/immutable audit → QLDB
Search → OpenSearch

MemoryDB for Redis

Durable Redis-compatible in-memory DB (not just cache)
Multi-AZ transaction log ensures durability
Microsecond reads, single-digit ms writes
Primary data store (not just cache layer)
Use when: Redis as primary DB, need durability, need fast reads

ElastiCache Redis = cache. MemoryDB = primary DB with Redis API. MemoryDB is durable. ElastiCache = optional persistence (less durable).

Networking

VPC · CloudFront · Route 53 · Direct Connect · Transit Gateway

VPC — Advanced Concepts

CIDR non-overlapping required for peering/TGW attachments
VPC Peering: non-transitive (A↔B, B↔C ≠ A↔C)
Transit Gateway: hub-and-spoke, transitive routing, up to 5000 VPCs
AWS Network Firewall: stateful/stateless inspection, IDS/IPS in VPC
Security Groups: stateful (return traffic auto-allowed), instance-level
NACLs: stateless (need explicit inbound + outbound), subnet-level, numbered rules
Gateway endpoints: S3 and DynamoDB — free, no NAT needed
Interface endpoints (PrivateLink): ENI in subnet, charges apply, most other services

NACLs are stateless — you MUST allow ephemeral ports (1024-65535) for return traffic. SGs stateful. VPC peering = non-transitive. Need transitive = Transit Gateway or VPN.
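
A quick way to check the non-overlapping CIDR requirement before planning peering or TGW attachments, using only the standard library. The VPC names and ranges are made-up examples.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_pairs(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of VPC names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [(a, b) for a, b in combinations(nets, 2)
            if nets[a].overlaps(nets[b])]


vpcs = {
    "vpc-a": "10.0.0.0/16",
    "vpc-b": "10.0.128.0/17",  # sits inside vpc-a's range
    "vpc-c": "10.1.0.0/16",
}
conflicts = overlapping_pairs(vpcs)
print(conflicts)
```

Any pair in the result cannot be peered as-is.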

VPC Lattice 2023

App-layer service network across VPCs/accounts
Service directory: register services, auto-discover
Auth policies: who can call which service
Traffic management: weighted routing, path-based
Observability: CloudWatch metrics, access logs
Works with EC2, Lambda, ECS, EKS, IP targets

Direct Connect

Dedicated 1/10/100 Gbps private connection to AWS
DX Gateway: connect to multiple VPCs/regions from one DX connection
Virtual Interfaces: Private VIF (to VPC), Public VIF (to AWS public services), Transit VIF (to TGW)
Hosted Connection: via partner, 50Mbps–10Gbps
Resilient: dual DX connections across locations for HA
DX + VPN: encrypted DX traffic (DX is not encrypted by default)

DX NOT encrypted. Add VPN over DX for encryption. DX Gateway doesn't enable VPC-to-VPC routing. SLA requires 2 DX connections from different locations.

Route 53 Routing Policies

Simple: single resource, no health check routing
Failover: active-passive, health check required
Weighted: split traffic by %, blue/green deploys
Latency: route to lowest-latency region
Geolocation: route based on user country/continent
Geoproximity: route based on geography + bias adjustment (requires Traffic Flow)
Multi-value: return multiple healthy IPs, client-side random
IP-based: route based on client CIDR (2023)

Geolocation vs Geoproximity: Geo = exact country match. Geoproximity = distance with bias shifting. Multi-value ≠ load balancer (no stickiness, no connection draining).
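
How Weighted routing splits traffic, sketched as arithmetic: each record's share is its weight divided by the sum of all weights. The record names and weights are illustrative, and health checks are ignored.

```python
from typing import Dict


def traffic_share(weights: Dict[str, int]) -> Dict[str, float]:
    """Share of DNS responses per record = weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("this sketch does not model the all-zero case")
    return {name: weight / total for name, weight in weights.items()}


shares = traffic_share({"blue": 90, "green": 10})
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Shifting a blue/green deploy is then just a matter of changing the two weights.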

CloudFront

Origins: S3, ALB, EC2, custom HTTP, MediaStore
OAC (Origin Access Control): S3 auth, replaces OAI (Origin Access Identity)
Cache behaviors: per path pattern, TTL, query strings, headers
Edge Functions: CloudFront Functions (lightweight JS at edge, sub-ms) vs Lambda@Edge (full runtime, regional)
Field-level encryption: encrypt specific form fields at edge
Signed URLs / Signed Cookies: time-limited access
Price classes: limit edge locations to reduce cost
Real-time logs: to Kinesis Data Streams

OAI is legacy, use OAC. CloudFront Functions = header manipulation, URL rewrites, light auth. Lambda@Edge = full Node/Python, response body manipulation. CF Functions cost roughly 1/6 of Lambda@Edge per request.

Load Balancers

ALB: HTTP/HTTPS/WebSocket, path/host/header routing, Lambda targets, user auth via Cognito/OIDC
NLB: TCP/UDP/TLS, ultra-low latency, static IPs/Elastic IPs, PrivateLink
GWLB: transparent inspection at L3, sends traffic to appliances (firewalls, IDS)
Target groups: EC2, ECS, Lambda, IPs
Connection draining / deregistration delay: graceful shutdown
Sticky sessions: ALB (app cookie or LB cookie)

Need static IP for whitelist = NLB. Need path routing = ALB. Need transparent network appliance = GWLB. ALB can invoke Lambda targets directly (no API Gateway needed), but the Lambda invocations are still billed. NLB for UDP (gaming, VoIP).

Global Accelerator vs CloudFront

Global Accelerator: TCP/UDP, static Anycast IPs, routes via AWS backbone, all traffic types, DDoS protection
CloudFront: HTTP(S) only, caching at edge, content delivery
GA: good for non-HTTP (gaming, VoIP, IoT), or HTTP that can't be cached
CF: good for static assets, cacheable HTTP APIs

"Static IPs for global app" → Global Accelerator. "Cache images globally" → CloudFront. GA ≠ caching. Both use AWS global network.

Security

IAM · KMS · WAF · Shield · GuardDuty

IAM — Exam Nuances

Identity-based policies: attached to user/group/role
Resource-based policies: attached to resource (S3, SQS, KMS)
Permission boundaries: max permissions, doesn't grant itself
Session policies: passed during AssumeRole, further restricts
SCPs: org-level max, affects all member accounts including their root user (never applies to the management account)
RCPs: org-level resource control (2024) — see Gap Analysis
Evaluation: explicit deny → RCPs → SCPs → resource-based policy → identity policy → permission boundary → session policy

Explicit deny always wins. SCP doesn't grant permissions, it only restricts. Permission boundary restricts but doesn't grant. Cross-account: need BOTH a resource-based policy on the resource and an identity policy on the caller (or assume a role in the resource's account). Same-account: a resource-based policy alone can grant access.

KMS

AWS managed keys (aws/service): free, automatic rotation, limited control
Customer managed keys (CMK): full control, key policies, $1/month/key
Key policies: resource-based, primary access control for KMS
Grants: temporary delegated access to keys
Envelope encryption: KMS encrypts data key, data key encrypts data
Multi-region keys: replicate key material, same key ID prefix
External key material (BYOK): import your own key material
KMS CloudHSM-backed: FIPS 140-2 Level 3

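KMS max encrypt payload = 4KB. For larger data: envelope encryption. CloudHSM = FIPS 140-2 Level 3 on single-tenant HSMs you control (KMS's own HSMs were also validated at Level 3 in 2023; older material says Level 2). "Audit key usage" = CloudTrail logs of KMS API calls. Multi-region keys share key material and key ID but are managed as separate keys in each region.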
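
The shape of envelope encryption, sketched with a TOY cipher so it runs on the standard library alone. The XOR keystream below is NOT secure and is not what KMS does internally. Real code would use KMS GenerateDataKey plus a proper AES library, and the master key would never leave KMS. All function names here are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple


def _toy_cipher(key: bytes, data: bytes) -> bytes:
    """TOY ONLY: XOR with a SHA-256 counter keystream (same call decrypts)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> Tuple[bytes, bytes]:
    data_key = secrets.token_bytes(32)              # fresh key per object
    ciphertext = _toy_cipher(data_key, plaintext)   # data key encrypts the data
    wrapped_key = _toy_cipher(master_key, data_key) # master key encrypts only the key
    return wrapped_key, ciphertext                  # store both side by side


def envelope_decrypt(master_key: bytes, wrapped_key: bytes,
                     ciphertext: bytes) -> bytes:
    data_key = _toy_cipher(master_key, wrapped_key)
    return _toy_cipher(data_key, ciphertext)


master = secrets.token_bytes(32)    # stand-in for a key held inside KMS
payload = b"x" * (1024 * 1024)      # 1MB, far above the 4KB direct limit
wrapped, blob = envelope_encrypt(master, payload)
restored = envelope_decrypt(master, wrapped, blob)
print(len(wrapped), "byte wrapped key;", len(blob), "byte ciphertext")
```

Only the 32-byte data key is encrypted under the master key, which is how a 4KB limit still protects arbitrarily large objects.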

WAF, Shield, Firewall Manager

WAF: L7 rules (SQL injection, XSS, geo-block, rate limit, IP sets, managed rule groups). Attaches to ALB, CloudFront, API GW, AppSync.
Shield Standard: free, always on, DDoS protection L3/L4
Shield Advanced: $3K/month, L7 DDoS, cost protection, 24/7 Shield Response Team (SRT, formerly DRT), near real-time visibility
Firewall Manager: centrally deploy WAF rules, Shield, SGs, Network Firewall across org
Network Firewall: stateful/stateless inspection in VPC, IDS/IPS

WAF on ALB protects only that ALB. Firewall Manager manages WAF across hundreds of accounts centrally. Shield Advanced required for WAF cost protection during attacks.

Threat Detection & Monitoring

GuardDuty: threat detection from CloudTrail, VPC Flow Logs, DNS, S3 data events. ML-based anomaly detection.
Macie: discover and protect sensitive data in S3 (PII, credentials, financial)
Inspector: vulnerability scanning for EC2 (CVEs), Lambda (function code), container images in ECR
Security Hub: aggregates findings from GuardDuty, Inspector, Macie, Firewall Manager, etc. CSPM. CIS benchmarks.
Detective: investigate security incidents — graph analysis of GuardDuty/CloudTrail findings

GuardDuty = detection. Inspector = vulnerabilities. Macie = data classification. Detective = investigation. Security Hub = aggregation/CSPM. These are different things — exam will mix them.

Secrets & Certificate Management

Secrets Manager: store/rotate secrets, auto-rotation via Lambda, cross-account, $0.40/secret/month
SSM Parameter Store: free (standard), hierarchical, no auto-rotation, SecureString uses KMS
ACM (Certificate Manager): free TLS certs for ALB/CloudFront/API GW. Auto-renews.
ACM Private CA: issue private certs for internal services. Not free.

DB passwords needing rotation = Secrets Manager. Config values/flags = SSM Parameter Store. "Auto-rotate RDS password" → Secrets Manager. SSM Standard = free, max 10K params, 4KB value.

Encryption In Transit/At Rest

S3: SSE-S3 (AWS managed), SSE-KMS (CMK), SSE-C (customer key, client provides), client-side
EBS: KMS encryption, encrypted AMI = encrypted snapshots
RDS/Aurora: at-rest KMS, in-transit SSL/TLS
DynamoDB: always encrypted at rest (can specify KMS CMK)
"Require HTTPS only" on S3: bucket policy condition aws:SecureTransport

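S3 SSE-KMS: you control key + audit via CloudTrail. SSE-S3: AWS manages everything. "Customer manages encryption keys fully" = SSE-C or client-side. Encrypted EBS snapshot can be shared cross-account only if it uses a customer managed KMS key that is also shared with the target account; a snapshot under the default AWS managed key must first be copied and re-encrypted with a shareable key.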
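The "require HTTPS only" bucket policy mentioned above can be sketched as a plain policy document. The bucket name and the `Sid` are placeholders; the condition key `aws:SecureTransport` is the one named in the list.

```python
import json


def https_only_bucket_policy(bucket_name: str) -> dict:
    """Deny every S3 action on the bucket when the request did not use TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # placeholder statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy_json = json.dumps(https_only_bucket_policy("example-bucket"), indent=2)
```

It is written as an explicit Deny rather than an Allow with `"true"` because an explicit Deny wins over any Allow granted elsewhere.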

AI / ML Services

Bedrock · SageMaker · Comprehend · Rekognition · Kendra

Service Selection — AI/ML Decision Tree

No ML experience + use foundation model → Bedrock
Build/train custom model, MLOps → SageMaker
Analyze text (entities, sentiment, language, PII) → Comprehend
Search documents/internal knowledge base → Kendra (API) or Q Business (conversational)
Analyze images/video (faces, objects, moderation) → Rekognition
Text-to-speech → Polly
Speech-to-text → Transcribe
Translation → Translate
Fraud detection → Fraud Detector
Personalization/recommendations → Personalize
Forecasting → Forecast
OCR / document extraction → Textract

Amazon Bedrock — Deep Dive

Foundation models: Anthropic Claude, Meta Llama, Mistral, AI21, Amazon Titan, Stability AI
Bedrock Agents: multi-step orchestration, tool use (call Lambda/APIs)
Knowledge Bases: RAG with vector embeddings, S3 data source, managed vector store
Guardrails: content filtering, grounding checks, PII redaction, topic denial
Model Evaluation: compare models on custom datasets
Custom model fine-tuning: fine-tune Titan or Cohere models on your data
Bedrock Studio: no-code prototyping for Bedrock apps

SageMaker Key Components

Studio: web IDE for ML development
Training jobs: managed distributed training on EC2
Endpoints: deploy models for real-time inference
Batch Transform: offline inference on large datasets
Pipelines: CI/CD for ML workflows
Feature Store: centralized ML feature repository
Model Registry: versioning, approval workflow
JumpStart: pre-built models, fine-tuning templates
Canvas: no-code ML for business analysts
Ground Truth: data labeling with human workforce

Serverless & Integration

Lambda · API GW · SQS · SNS · EventBridge · Step Functions

API Gateway

REST API: full features, stages, caching, usage plans, API keys
HTTP API: cheaper, faster, OIDC/JWT auth, no usage plans, no caching, no WebSocket
WebSocket API: persistent connections, bidirectional
Authorizers: Lambda (custom), Cognito, JWT (HTTP API only)
Caching: per stage/method, TTL 0-3600s, up to 237GB
Throttling: account default 10K RPS, burst 5K, per-stage/method overrides
VPC Link: private integration to NLB in VPC

HTTP API = cheaper for simple cases. REST API = usage plans, caching, X-Ray tracing, WAF. "Rate limit per customer" = REST API usage plans. WebSocket = WebSocket API, a separate API type (neither REST API nor HTTP API).

SQS

Standard: at-least-once, best-effort ordering, unlimited throughput
FIFO: exactly-once, strict ordering, 300 msg/s (3K with batching)
Visibility timeout: lock message during processing (default 30s, max 12hr)
DLQ: after maxReceiveCount failed attempts
Long polling: ReceiveMessageWaitTimeSeconds (up to 20s, reduces API calls)
Message retention: 1 min to 14 days
Max message size: 256KB (Extended Client Library → S3 for larger)
Delay queues: postpone delivery 0-15min

FIFO = 300 API calls/s baseline, i.e. 3,000 msg/s with batches of 10; high-throughput mode raises the ceiling well beyond that. Need to decouple + exactly-once = FIFO. Standard = higher scale. Visibility timeout must be > processing time or the message reappears and is processed again.
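The interaction between visibility timeout and the DLQ is easier to remember after stepping through it once. The sketch below is a toy in-memory model, not the SQS API: `ToyQueue` and its method names are hypothetical, and time is a plain integer passed in by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class ToyQueue:
    """Toy model of visibility timeout + redrive to a DLQ (not the real SQS API)."""

    visibility_timeout: int = 30
    max_receive_count: int = 3
    messages: dict = field(default_factory=dict)  # id -> [invisible_until, receive_count]
    dlq: list = field(default_factory=list)

    def send(self, msg_id: str) -> None:
        self.messages[msg_id] = [0, 0]

    def receive(self, now: int):
        for msg_id, state in list(self.messages.items()):
            invisible_until, count = state
            if now < invisible_until:
                continue  # still locked by an earlier receive
            if count >= self.max_receive_count:
                # Received too many times without a delete: move to the DLQ.
                del self.messages[msg_id]
                self.dlq.append(msg_id)
                continue
            state[0] = now + self.visibility_timeout
            state[1] = count + 1
            return msg_id
        return None

    def delete(self, msg_id: str) -> None:
        # A consumer deletes the message only after processing succeeds.
        self.messages.pop(msg_id, None)
```

With `visibility_timeout=30` and `max_receive_count=2`, a consumer that never calls `delete` sees the message at t=0, gets nothing at t=10, sees it again at t=30, and finds it in `dlq` after the next receive at t=60.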

SNS

Pub/sub fan-out to SQS, Lambda, HTTP, email, SMS, mobile push
FIFO topics: ordered delivery, dedup, SQS subscribers only (FIFO queue to keep ordering; standard queues are also accepted since 2023)
Message filtering: subscribers receive only messages matching filter policy
SNS + SQS fan-out: one topic → multiple SQS queues for parallel processing
Message archival: SNS → Kinesis Data Firehose → S3

SNS fan-out pattern: one publish → multiple consumers. Filter policies live on the subscription (not the publisher). FIFO SNS = SQS subscribers only; pair it with SQS FIFO when ordering must survive end to end.
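Subscriber-side filtering can be sketched with a deliberately simplified matcher. It covers only exact string matching on message attributes (AND across attribute names, OR within each value list); real filter policies support more operators. The queue names and attribute names are made up for illustration.

```python
def matches_filter_policy(filter_policy: dict, message_attributes: dict) -> bool:
    """Simplified SNS-style match: every policy key must match one of its allowed values."""
    return all(
        message_attributes.get(name) in allowed_values
        for name, allowed_values in filter_policy.items()
    )


# Hypothetical subscriptions on one topic, each with its own filter policy.
SUBSCRIPTIONS = {
    "billing-queue": {"event_type": ["order_placed", "order_refunded"]},
    "eu-shipping-queue": {"event_type": ["order_placed"], "region": ["eu"]},
    "audit-queue": {},  # empty policy: receives everything
}


def fan_out(message_attributes: dict) -> list:
    """Return the subscriptions that would receive a message with these attributes."""
    return [
        name
        for name, policy in SUBSCRIPTIONS.items()
        if matches_filter_policy(policy, message_attributes)
    ]
```

The publisher sends one message with attributes and knows nothing about who filters on what.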

EventBridge

Event bus: default (AWS services), custom (your events), partner (SaaS)
Rules: match events → route to targets (Lambda, SQS, Step Functions, etc.)
Scheduler: cron/rate-based and one-time scheduled invocations (successor to scheduled rules; EventBridge itself replaced CloudWatch Events)
Pipes: point-to-point with filtering/enrichment (SQS/DynamoDB/Kinesis → target)
Schema Registry: discover and store event schemas
Cross-account/cross-region event buses

EventBridge Scheduler supersedes CloudWatch Events scheduled rules (which still work): same cron syntax, more targets, better scaling, plus one-time schedules. EventBridge Pipes = filter + enrich between a source and a target. Use Pipes to avoid Lambda glue code.

Step Functions

Standard: durable, max 1 year, exactly-once, audit history, $0.025/1K transitions
Express: high-volume, max 5 min, at-least-once, cheaper ($1/M executions)
Integrations: 200+ AWS services via optimized or SDK integrations
Map state: parallel processing of array items
Wait for task token: pause until external system calls back
Distributed map: process millions of S3 objects in parallel

Standard = long workflows, audit needed. Express = IoT/streaming/high-volume, short. "Human approval step" = Step Functions + SNS + wait for task token. Express is NOT exactly-once.
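The "human approval step" pattern can be sketched as a small Amazon States Language definition built in Python. The state names and the topic ARN are placeholders, and passing the bare token as the SNS message is a simplification; a real workflow would usually send a formatted message containing the token. The approver's tooling is assumed to call `SendTaskSuccess` or `SendTaskFailure` with that token to resume the execution.

```python
import json


def approval_state_machine(topic_arn: str, timeout_seconds: int = 86400) -> dict:
    """ASL definition that pauses at RequestApproval until a callback arrives."""
    return {
        "Comment": "Sketch: pause for human approval via task token",
        "StartAt": "RequestApproval",
        "States": {
            "RequestApproval": {
                "Type": "Task",
                # The .waitForTaskToken suffix pauses the workflow until a callback.
                "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
                "Parameters": {
                    "TopicArn": topic_arn,
                    # $$ reads from the context object; this is the callback token.
                    "Message.$": "$$.Task.Token",
                },
                "TimeoutSeconds": timeout_seconds,  # fail instead of waiting forever
                "Next": "Approved",
            },
            "Approved": {"Type": "Succeed"},
        },
    }


definition_json = json.dumps(
    approval_state_machine("arn:aws:sns:us-east-1:111111111111:approvals"), indent=2
)
```

This pattern belongs in a Standard workflow: a wait that can last hours or days does not fit Express's 5-minute limit.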

Kinesis

Data Streams: real-time, shards (1MB/s in, 2MB/s out per shard), 24hr–365day retention
Data Firehose: load to S3/Redshift/OpenSearch/Splunk/HTTP, micro-batching, no replay
Managed Service for Apache Flink (formerly Kinesis Data Analytics): Flink applications on streaming data
Video Streams: ingest/store/process video at scale
Enhanced fan-out: 2MB/s per consumer per shard (push model)
On-demand mode: auto-scales capacity

Firehose = no replay capability (delivers, then gone). Data Streams = replay within retention window. "Real-time analytics" = Data Streams → Flink/Lambda. "Load to S3 for batch" = Firehose. Firehose buffers by size and time before delivering, so treat it as near real-time, not real-time (classic exam figures: 60s buffer interval, 5MB buffer size).
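Shard sizing follows directly from the per-shard limits above (1MB/s in, 2MB/s out shared by standard consumers, 2MB/s per consumer with enhanced fan-out). The helper below is a back-of-the-envelope sketch; the function name is made up, and it ignores the separate per-shard records-per-second write limit.

```python
import math

WRITE_MB_PER_SHARD = 1.0  # ingest limit per shard
READ_MB_PER_SHARD = 2.0   # shared by all standard consumers of a shard


def required_shards(ingest_mb_per_s: float, consumers: int,
                    enhanced_fan_out: bool = False) -> int:
    """Estimate provisioned shards from throughput alone."""
    for_writes = math.ceil(ingest_mb_per_s / WRITE_MB_PER_SHARD)
    if enhanced_fan_out:
        # Each consumer gets its own 2MB/s per shard, so reads never add shards.
        for_reads = 0
    else:
        # Every consumer reads the full stream out of the shared 2MB/s budget.
        total_read = ingest_mb_per_s * consumers
        for_reads = math.ceil(total_read / READ_MB_PER_SHARD)
    return max(1, for_writes, for_reads)
```

For 5MB/s of ingest and three consumers, shared reads need 8 shards while enhanced fan-out needs only the 5 that ingest requires.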

Cost Optimization

20% of exam

Compute Cost Strategy

Spot for fault-tolerant: HPC, batch, CI, stateless web (up to 90% savings)
Spot Fleet/EC2 Fleet: mixed on-demand + spot, capacity-optimized allocation
Savings Plans > Reserved Instances for flexibility
Graviton: roughly 20% lower instance price, up to 40% better price-performance than comparable x86
Lambda: ephemeral, pay-per-invocation (vs always-on EC2)
Fargate Spot: spot pricing for Fargate tasks
Right-sizing: Compute Optimizer recommendations

Storage Cost Strategy

S3 Lifecycle → IA → Glacier → Deep Archive
Intelligent-Tiering for unknown access patterns
S3 storage lens: visibility into storage usage/costs
EBS: gp3 over gp2 (cheaper, separate IOPS config)
EBS snapshots: incremental, only changed blocks
Delete unattached EBS volumes (common waste)
EFS: use Infrequent Access lifecycle + One Zone where AZ redundancy not needed
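The lifecycle chain in the first bullet can be sketched as the configuration document that S3's lifecycle API accepts. The rule ID, prefix, and day counts are illustrative assumptions, not recommendations; the day counts are chosen so each class's minimum storage duration has passed before the next transition.

```python
def tiering_lifecycle(prefix: str = "logs/") -> dict:
    """Lifecycle configuration: Standard -> Standard-IA -> Glacier Flexible -> Deep Archive."""
    return {
        "Rules": [
            {
                "ID": "tier-down-old-objects",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }
```

A dict of this shape is what boto3's `put_bucket_lifecycle_configuration` takes as `LifecycleConfiguration`; the call itself is left out so the sketch runs without credentials.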

Cost Monitoring Tools

Cost Explorer: visualize, filter, forecast spending
Budgets: alerts at thresholds (actual or forecast), Actions to stop spending
Cost Anomaly Detection: ML-based spending anomaly alerts
Compute Optimizer: rightsizing recommendations (EC2, Lambda, EBS, ECS Fargate, RDS)
Trusted Advisor: cost, security, performance, fault tolerance checks
Cost Allocation Tags: tag resources → report by tag in Cost Explorer
Savings Plans recommendations: built into Cost Explorer

Data Transfer Costs (Common Gotchas)

Inbound to AWS: FREE
Within the same AZ over private IPs: FREE
Cross-AZ (EC2-to-EC2): $0.01/GB each direction
Cross-region: varies, ~$0.02-0.09/GB
EC2 to internet: $0.09/GB (first 10TB)
S3 to CloudFront: FREE (then CF to internet: about $0.085/GB for the first 10TB in US/EU)
VPC Endpoints (Gateway): FREE for S3/DynamoDB vs NAT costs

"Reduce data transfer costs for S3 access from EC2" → Gateway VPC Endpoint (free). Architecture diagram shows cross-AZ calls → identify hidden transfer costs. CF in front of S3 = origin fetches from S3 are free and cached objects never hit S3 again; you pay CloudFront's egress rate instead of S3's.
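A quick calculation shows why these gotchas matter. The rates are the ones quoted in this guide ($0.045/GB NAT Gateway processing, $0.01/GB per direction cross-AZ); the 30-day month and the function names are assumptions, and NAT hourly charges are ignored.

```python
NAT_PROCESSING_PER_GB = 0.045       # NAT Gateway data-processing rate
CROSS_AZ_PER_GB_PER_DIRECTION = 0.01


def nat_processing_cost(gb_per_day: float, days: int = 30) -> float:
    """Monthly NAT data-processing charge for traffic routed through a NAT Gateway."""
    return round(gb_per_day * days * NAT_PROCESSING_PER_GB, 2)


def cross_az_cost(gb_transferred: float) -> float:
    """Cross-AZ transfer is billed on both sides: once out, once in."""
    return round(gb_transferred * CROSS_AZ_PER_GB_PER_DIRECTION * 2, 2)


# 500GB/day of S3 reads through NAT: this is what a Gateway endpoint removes.
monthly_nat_bill = nat_processing_cost(500)
```

At 500GB/day the NAT processing charge comes to $675 per month, and the Gateway endpoint brings that line to zero.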

Migration & Hybrid

DMS · Snowball · MGN · Outposts

Database Migration

DMS (Database Migration Service): homogeneous or heterogeneous DB migration, CDC, minimal downtime
SCT (Schema Conversion Tool): convert schema from one engine to another (Oracle → PostgreSQL)
DMS + SCT: together for heterogeneous migrations
DMS supports: Oracle, SQL Server, MySQL, PostgreSQL, MongoDB, S3 as source/target
CDC: continuous replication for near-zero downtime cutover

Data Transfer to AWS

Snowball Edge: 80TB, compute + storage, data center to S3
Snowball Edge (Compute Optimized): GPU, edge ML inference
Snowcone: 8-14TB, smallest, field rugged, portable
Snowmobile: 100PB, exabyte migrations, truck
Rule of thumb: if data transfer takes >1 week on existing bandwidth → physical transfer device
DataSync: online transfer, NFS/SMB/S3/EFS/FSx, scheduled/automated, up to 10Gbps

DataSync = online, ongoing sync. Snowball = offline, one-time large migration. DataSync can also transfer between AWS storage services (EFS to S3, etc.).

Server Migration: MGN

MGN (Application Migration Service): lift-and-shift, continuous block replication
Agent-based, continuous replication, minimal downtime cutover
Replaced SMS (Server Migration Service)
VMware Cloud on AWS: run VMware workloads on AWS hardware
EC2 VM Import/Export: simple import of VMs (no continuous replication)

Outposts & Edge

Outposts: AWS rack in your data center, run ECS/EKS/RDS/S3 locally
Connected to parent AWS region over a service link (Direct Connect or internet)
Outposts Servers: 1U/2U for branch offices (smaller form factor)
Local Zones: AWS infrastructure closer to metro areas, low latency
Wavelength: AWS compute at telecom 5G edge (ultra-low latency mobile apps)

Outposts = on-prem data sovereignty + AWS APIs. Local Zones = nearby AWS infrastructure (not on-prem). Wavelength = 5G/mobile edge (ms latency for mobile games, AR/VR). These are three different things.

Comparison Tables

High value for exam

SQS vs SNS vs EventBridge vs Kinesis

Feature	SQS	SNS	EventBridge	Kinesis Streams
Pattern	Queue (pull)	Pub/Sub (push)	Event routing	Stream (replay)
Consumers	One per message (competing)	Many (fan-out)	Many rules	Many (shards)
Replay	No	No	Archive + replay (opt-in)	Yes (retention)
Ordering	FIFO only	FIFO topic	No	Per-shard
Latency	Near-real-time	Near-real-time	~0.5s	Real-time
Filter	Partial	Yes (per subscriber)	Yes (rules)	Must code
Best for	Decouple work	Fan-out notify	SaaS + AWS events	High-volume stream

ECS vs EKS vs Fargate

Feature	ECS (EC2)	ECS (Fargate)	EKS (EC2)	EKS (Fargate)
Node management	You	AWS	You	AWS
Kubernetes API	No	No	Yes	Yes
Spot support	Yes	Yes (Fargate Spot)	Yes	No
GPU	Yes	No	Yes	No
DaemonSets	No	No	Yes	No
Best for	AWS-native, GPU	Serverless containers	K8s expertise	K8s serverless

DR Strategy Comparison

Strategy	RTO	RPO	Cost	How
Backup & Restore	Hours	Hours	Lowest	Backup to S3, restore on disaster
Pilot Light	10-30 min	Minutes	Low	Core DB running, scale compute on fail
Warm Standby	Minutes	Seconds	Medium	Scaled-down full env, promote on fail
Multi-Site Active-Active	Near-zero	Near-zero	Highest	Full prod in 2+ regions, live traffic split

S3 Storage Classes

Class	Durability	AZs	Min Duration	Retrieval	Best for
Standard	11 9s	≥3	None	Instant (free)	Frequent access
Standard-IA	11 9s	≥3	30 days	Instant (fee)	Backups, DR
One Zone-IA	11 9s	1	30 days	Instant (fee)	Reproducible infrequent
Glacier Instant	11 9s	≥3	90 days	Instant (fee)	Archive quarterly access
Glacier Flexible	11 9s	≥3	90 days	Minutes to 12 hrs	Archive annual access
Glacier Deep Archive	11 9s	≥3	180 days	12-48 hrs	Compliance long-term
Intelligent-Tiering	11 9s	≥3	None	Instant (no fee)	Unknown access pattern
Express One Zone	11 9s (single AZ; 99.95% availability)	1	None	Single-digit ms	ML training, analytics

IAM Policy Evaluation Order

Step	Policy Type	Effect
1	Explicit Deny (any policy)	Stop — denied
2	Organization SCPs	Stop if not allowed
3	Resource-based policies	Grant if allows (same account)
4	Identity-based policies	Grant if allows
5	Permission boundaries	Restrict max permissions
6	Session policies	Further restrict
Default	Implicit deny	Denied if nothing allowed
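The table can be read as a short decision function. The sketch below mirrors the table's order for a same-account request and nothing more: the class and field names are invented, each policy layer is reduced to a single boolean, and real evaluation has extra cases (cross-account access, principal types) that this ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PolicyInputs:
    explicit_deny: bool            # any policy contains a matching Deny
    scp_allows: bool               # organization SCPs permit the action
    resource_policy_allows: bool
    identity_policy_allows: bool
    boundary_allows: Optional[bool] = None  # None = no permissions boundary attached
    session_allows: Optional[bool] = None   # None = no session policy in play


def is_allowed(p: PolicyInputs) -> bool:
    """Simplified same-account evaluation, following the table top to bottom."""
    if p.explicit_deny:
        return False  # step 1: explicit deny always wins
    if not p.scp_allows:
        return False  # step 2: SCPs cap what the account can do
    if p.resource_policy_allows:
        return True   # step 3: same-account resource policy can grant alone
    if not p.identity_policy_allows:
        return False  # step 4, else implicit deny
    if p.boundary_allows is False:
        return False  # step 5: boundary restricts, never grants
    if p.session_allows is False:
        return False  # step 6: session policy further restricts
    return True
```

The exam traps live in steps 2 and 5: an Allow in an identity policy does nothing if an SCP or a permissions boundary does not also allow the action.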

Common Scenario Patterns

Read these

Scenario: High Availability Web App

ALB → Auto Scaling Group → EC2 across multiple AZs
RDS Multi-AZ for database (standby sync, auto failover)
Read replicas for read-heavy queries
ElastiCache (Redis) for session state
Route 53 health checks for regional failover
CloudFront for static assets / cache

Scenario: Event-Driven Processing

S3 upload → S3 Event → SQS → Lambda (decoupled, retry-able)
SQS DLQ for failed processing
SNS → multiple SQS queues for fan-out parallel processing
EventBridge for cross-service event routing
Kinesis for ordered high-throughput stream processing
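The "retry-able" part of the S3 → SQS → Lambda chain is worth seeing in code. This is a sketch of a Lambda handler for an SQS event source that reports partial batch failures, so only the failed messages return to the queue (and eventually the DLQ). It assumes the event source mapping has `ReportBatchItemFailures` enabled; `process` and the `object_key` field are hypothetical business logic.

```python
import json


def process(body: dict) -> str:
    """Hypothetical business logic: fails on messages without an object key."""
    if "object_key" not in body:
        raise ValueError("missing object_key")
    return body["object_key"]


def handler(event: dict, context) -> dict:
    """SQS-triggered Lambda: report failed message IDs instead of failing the whole batch."""
    failures = []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Only this message becomes visible again; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without the partial response, one bad message makes the whole batch retry, which is how a single poison message ends up reprocessing good ones.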

Scenario: Secure Cross-Account Access

IAM role in account B with trust policy for account A
Account A principal calls AssumeRole → gets temp credentials
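Permissions: the role's trust policy must trust the caller AND the caller's identity policy must allow sts:AssumeRole on that role; the role's permissions policy then defines what the caller can do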
For S3: resource policy (bucket policy) can grant cross-account alone
SCPs must allow in both accounts
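The two halves of the AssumeRole handshake can be sketched as the two policy documents involved. The account IDs and role name are placeholders, and the optional external ID condition is shown as an assumption about a third-party scenario rather than a requirement.

```python
from typing import Optional


def role_trust_policy(trusted_account_id: str, external_id: Optional[str] = None) -> dict:
    """Trust policy on the role in account B: who may assume it."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # Optional guard, typically used when the caller is a third party.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


def caller_identity_policy(role_arn: str) -> dict:
    """Identity policy on the principal in account A: permission to call AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": role_arn}
        ],
    }


# Placeholder IDs: account A = 111111111111 (caller), account B = 222222222222 (role owner).
trust = role_trust_policy("111111111111")
caller = caller_identity_policy("arn:aws:iam::222222222222:role/ExampleCrossAccountRole")
```

Both documents must exist; either one alone gets an AccessDenied from STS.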

Scenario: Cost Reduction

Spiky → On-demand + Spot (with Spot Fleet)
Steady-state → Compute Savings Plans or EC2 Reserved
Dev/test shutdown nights/weekends → EventBridge Scheduler
S3 old objects → Lifecycle to Glacier
EC2 over-provisioned → Compute Optimizer → Graviton
NAT Gateway costs high → VPC Endpoints for S3/DynamoDB
CloudFront → reduce S3 egress costs

Scenario: Disaster Recovery

RTO/RPO requirements drive strategy (Backup → Pilot Light → Warm → Active-Active)
Aurora Global = <1s RPO, <1min RTO between regions
DynamoDB Global Tables = multi-region active-active
Route 53 failover routing + health checks = DNS-level failover
CloudFormation / Infrastructure as Code = fast env recreation

Scenario: Data Lake Architecture

S3 (raw) → Glue ETL → S3 (curated) → Athena (query)
Lake Formation for column/row-level security + data catalog
Redshift Spectrum for SQL over S3 from Redshift
DataZone for cross-team data discovery and governance
Kinesis Firehose → S3 for streaming ingest

Scenario: Serverless API

API Gateway (HTTP API for low cost) → Lambda → DynamoDB
Cognito for auth (user pool → JWT → API GW authorizer)
Lambda Layers for shared dependencies
DAX for DynamoDB read caching
CloudFront in front of API GW for geographic distribution
SQS to offload async work that would otherwise hit API Gateway's 29s integration timeout or Lambda's 15 min limit

Scenario: Security & Compliance

GuardDuty → EventBridge → Lambda → auto-remediation
Config Rules → track compliance drift
Security Hub → aggregate findings, CSPM score
Macie → scan S3 for PII before sharing
Inspector → scan EC2/Lambda/containers for CVEs
CloudTrail → all API calls, delivered to S3 (log file integrity validation + Object Lock for tamper evidence)
WAF → rate limit, geo-block, OWASP rules

Sources: AWS Exam Guide, AWS Documentation, Tutorials Dojo, AWS re:Invent sessions 2023-2025