# Local Testing Strategy for AWS Infrastructure

---

## 1) Executive Summary

* **Goal**: Establish a **progressive, hybrid testing strategy** that moves most infrastructure dev/test cycles, which are slow (~10 min/run) and expensive (~$1,000/month) on AWS today, **off AWS** (Snapshot + LocalStack), while keeping a **final AWS sandbox** for production-parity checks.
    
* **Solution**: A 3-tier progressive testing strategy (**Snapshot** → **LocalStack** → **AWS Sandbox**) that catches 80% of bugs locally, with $0 AWS spend and in under 3 minutes.
    
* **Result**: ~97% cost reduction ($1,840 → $60/month), 50% faster development cycles, and 80% of bugs caught locally, without relaxing quality bars.
    
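As a worked check on the headline figure, the monthly costs quoted above imply the following reduction. This assumes $1,840 is the current monthly spend and $60 is the remaining sandbox spend:

```math
\frac{\$1{,}840 - \$60}{\$1{,}840} = \frac{\$1{,}780}{\$1{,}840} \approx 0.967 \quad (\approx 97\%\ \text{reduction})
```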

---

## 2) **Pragmatic LocalStack + Hybrid Testing Strategy** for AWS Infra

The approach balances **speed**, **cost discipline**, and **parity**. It validates early in the **snapshot** and **LocalStack** tiers and reserves the **AWS sandbox** for production-parity verification and change control. The strategy is **tool-agnostic** (CDK or Terraform) and designed for **AI-assisted workflows with a single human-in-the-loop (HITL)**.

* **Tier 1 - Snapshot** `$0`: Infrastructure syntax & structural checks (templates, policies, exports).
    
* **Tier 2 - LocalStack** `$0`: AWS service functional tests against emulated AWS APIs in Docker.
    
* **Tier 3 - AWS Sandbox** `$60/mo`: Production parity checks for services and behaviors not reliably emulated; change-managed with approvals.
    
* **When to run**: Tier 1 (Snapshot) on every code change; Tier 2 (LocalStack) once the PR is labeled `ready-to-merge`; Tier 3 (AWS Sandbox) post-merge or for critical-path changes.
    
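The trigger rules above can be captured in one small function that a CI job calls to decide which tiers to run. This is a sketch under stated assumptions. The trigger names and the `criticalPath` flag are hypothetical. The function also reads the tiers as cumulative, meaning Tier 3 only runs after Tiers 1 and 2, as in the gate flow. The text implies this but does not spell it out.

```typescript
// Hypothetical trigger names; map them to the events your CI system actually emits.
type Trigger = "push" | "ready-to-merge-label" | "post-merge";
type Tier = "T1-snapshot" | "T2-localstack" | "T3-sandbox";

interface CiEvent {
  trigger: Trigger;
  criticalPath: boolean; // hypothetical flag, e.g. derived from the paths a change touches
}

// Returns the tiers to run in gate order; the CI job should stop at the first failing tier.
function selectTiers(event: CiEvent): Tier[] {
  // Tier 1 (snapshot) runs on every code change.
  const tiers: Tier[] = ["T1-snapshot"];
  const readyToMerge = event.trigger === "ready-to-merge-label";
  const postMerge = event.trigger === "post-merge";
  // Tier 2 (LocalStack) runs once the PR is labeled ready-to-merge, and ahead of any Tier 3 run.
  if (readyToMerge || postMerge) {
    tiers.push("T2-localstack");
  }
  // Tier 3 (AWS sandbox) runs post-merge, or pre-merge for critical-path changes.
  if (postMerge || (readyToMerge && event.criticalPath)) {
    tiers.push("T3-sandbox");
  }
  return tiers;
}
```

Keeping the trigger policy in one function makes it reviewable alongside the pipeline and easy to unit-test.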

---

## 3) FAQ (for CxO & Architecture Review)

**Q1. Why hybrid instead of AWS-only or emulator-only?**  
**A.** Hybrid gives **fast feedback** early and **real parity** late. The final sandbox protects against emulator gaps without paying the full cost and latency of AWS for every inner loop.

**Q2. Where does LocalStack fit—and where not?**  
**A.** Use LocalStack for **service-level integration tests** (e.g., object CRUD, table operations, API invocation) where API semantics are stable. Keep **organization, identity, and cross-account** behavior, along with **realistic observability**, in the AWS sandbox.

**Q3. How do we keep risk managed?**  
**A.** CI gates **block promotion** until snapshot + LocalStack pass; AWS sandbox runs with **approvals**, **audit** and **cleanup**. Evidence (logs, traces, diffs) is linked to the release.

**Q4. Will this work with CDK and Terraform?**  
**A.** Yes—snapshot tests (template assertions/plan inspection) + functional tests (SDK/CLI) + final AWS checks (deploy & verify) are supported for both stacks.

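For Terraform, a Tier 1 plan inspection can start with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`. The test then asserts on the `resource_changes` array of that JSON. The sketch below checks two illustrative policies: no resource is deleted or replaced, and every new `aws_s3_bucket` carries a required tag key. Both policies and the tag key are assumptions. On the CDK side, the equivalent assertions would typically use `Template.fromStack(...)` and `hasResourceProperties(...)` from `aws-cdk-lib/assertions`.

```typescript
// Minimal subset of the JSON printed by `terraform show -json <planfile>`.
interface PlanResourceChange {
  address: string;
  type: string;
  change: {
    actions: string[]; // e.g. ["create"], ["update"], ["delete", "create"] for a replacement
    after: { [attribute: string]: unknown } | null;
  };
}

interface PlanJson {
  resource_changes?: PlanResourceChange[];
}

// Returns human-readable policy violations; an empty array means the plan passes.
function inspectPlan(plan: PlanJson, requiredTagKey: string): string[] {
  const violations: string[] = [];
  for (const rc of plan.resource_changes || []) {
    const actions = rc.change.actions;
    // Illustrative policy 1: no deletions or replacements in a routine change.
    if (actions.indexOf("delete") !== -1) {
      violations.push(`${rc.address}: plan deletes or replaces this resource`);
    }
    // Illustrative policy 2: new S3 buckets must carry the required tag key.
    if (rc.type === "aws_s3_bucket" && actions.indexOf("create") !== -1) {
      const after = rc.change.after || {};
      const tags = after["tags"] as { [key: string]: string } | null | undefined;
      if (!tags || !(requiredTagKey in tags)) {
        violations.push(`${rc.address}: missing required tag "${requiredTagKey}"`);
      }
    }
  }
  return violations;
}
```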
**Q5. How does this support AI-assisted development with one HITL?**  
**A.** Agents iterate autonomously in T1/T2; the HITL reviews **one** AWS sandbox change with full evidence instead of multiple partial attempts.

---

## 4) Customer Experience (Leadership view)

| Persona | Before | After |
| --- | --- | --- |
| **Engineer** | Slow inner loops tied to AWS deploys; noisy failures surface late. | Most failures found locally; AWS used sparingly for parity and approvals. |
| **Architect** | Hard to compare intent vs. deployed reality. | Tiered evidence (snapshots, functional traces, parity checks) aligns to design intent. |
| **HITL** | Multiple approvals per feature with incomplete context. | Single, higher-quality approval with consolidated evidence package. |
| **FinOps** | Testing spend is opaque and hard to segment. | Dev/test AWS usage isolated to sandbox; local tiers outside cloud billing. |

**Consolidated evidence package** (attached to each change):

* Snapshot diffs (template/plan)
    
* Local functional logs (SDK test outputs)
    
* Sandbox deploy logs + parity checks
    
* Cleanup receipts and change records
    
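To make the evidence package checkable rather than just a convention, each tier could append an entry to a per-change manifest. The final gate could then refuse to mark a change ready while any item is missing. The manifest shape and item names below are hypothetical; they mirror the four evidence items listed above.

```typescript
// Hypothetical evidence kinds, one per item in the evidence package.
type EvidenceKind = "snapshot-diff" | "local-functional-log" | "sandbox-parity-log" | "cleanup-receipt";

interface EvidenceItem {
  kind: EvidenceKind;
  uri: string; // where the artifact is stored (CI artifact, bucket path, ...)
}

interface EvidenceManifest {
  changeId: string; // PR or change-record identifier
  items: EvidenceItem[];
}

const REQUIRED_EVIDENCE: EvidenceKind[] = [
  "snapshot-diff",
  "local-functional-log",
  "sandbox-parity-log",
  "cleanup-receipt",
];

// Returns required evidence kinds that are absent or have no artifact location.
function missingEvidence(manifest: EvidenceManifest): EvidenceKind[] {
  const present = manifest.items.filter((item) => item.uri.length > 0).map((item) => item.kind);
  return REQUIRED_EVIDENCE.filter((kind) => present.indexOf(kind) === -1);
}
```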

---

## 5) Success Metrics (business-first, team-owned)

> Track trends; do not hardcode targets in policy. Each team publishes a baseline and a quarterly goal.

* **Lead time (infra change → verified)** — median & p90
    
* **% Test cycles executed locally** — share of total cycles in T1/T2
    
* **Change approval latency** — time from “ready for sandbox” → approved result
    
* **Sandbox hygiene** — time-to-cleanup, orphan resource count
    
* **Escaped-defect rate** — defects found after sandbox vs. before
    
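Most of these metrics reduce to simple arithmetic over CI records. The sketch below uses hypothetical record shapes and the nearest-rank method for median and p90. It reads the escaped-defect rate as defects found after the sandbox divided by all defects found; teams may define these differently.

```typescript
// Hypothetical record shape; adapt to whatever your CI system exports.
interface TestCycle {
  tier: "T1" | "T2" | "T3";
}

// Nearest-rank percentile (p in (0, 100]) over a non-empty list, e.g. lead times in minutes.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Share of test cycles executed locally (Tier 1 or Tier 2).
function localCycleShare(cycles: TestCycle[]): number {
  if (cycles.length === 0) return 0;
  return cycles.filter((c) => c.tier !== "T3").length / cycles.length;
}

// Escaped-defect rate: defects found after the sandbox tier over all defects found.
function escapedDefectRate(foundBeforeSandbox: number, foundAfterSandbox: number): number {
  const total = foundBeforeSandbox + foundAfterSandbox;
  return total === 0 ? 0 : foundAfterSandbox / total;
}
```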

> All metrics and evidence are attached to release artifacts and reviewed by the Change Advisory Board or an equivalent forum.

---

## 6) Technical Architecture (at a glance)

### 6.1 **Tier 2: LocalStack Tests** — *CDK + Terraform on a developer machine*

**Technology:** LocalStack (Docker), AWS SDK clients (e.g., `S3Client`, `DynamoDBClient`, `LambdaClient`), **CDK CLI (cdklocal)**, **Terraform CLI** (configured for LocalStack endpoints)

```mermaid
%%{init: {
  "theme": "base",
  "themeVariables": {
    "background":"#0b1220",
    "primaryColor":"lightgray",
    "primaryTextColor":"#e6f3ff",
    "primaryBorderColor":"#86efac",
    "lineColor":"#a7f3d0",
    "textColor":"#e5e7eb"
  }
}}%%
flowchart LR
  %% ---------- Classes / Styles ----------
  classDef host fill:#0b1220,stroke:#34d399,color:#e6f3ff,stroke-dasharray:4 3,rx:8,ry:8
  classDef tool fill:#0f172a,stroke:#86efac,color:#e6f3ff,stroke-width:1.5px,rx:8,ry:8
  classDef test fill:#0f172a,stroke:#60a5fa,color:#e6f3ff,stroke-width:1.5px,rx:8,ry:8
  %% AWS service palettes (approx. enterprise-friendly)
  classDef svcS3  fill:#14532d,stroke:#22c55e,color:#e6f3ff,rx:8,ry:8
  classDef svcDDB fill:#0b2e53,stroke:#38bdf8,color:#e6f3ff,rx:8,ry:8
  classDef svcLAM fill:#4a1d06,stroke:#f59e0b,color:#fff7ed,rx:8,ry:8
  classDef svcAPIG fill:#3b0a3f,stroke:#e879f9,color:#fdf4ff,rx:8,ry:8
  classDef svcCFN fill:#3f1d2e,stroke:#fb7185,color:#ffe4e6,rx:8,ry:8

  %% ---------- Developer Host ----------
  subgraph DEV["👩‍💻 Developer Machine"]
    direction TB
    CDK["🟩 CDK CLI  (cdklocal)"]:::tool
    TF["🟪 Terraform CLI  (LocalStack endpoints)"]:::tool
    TEST["🧪 Jest / Integration Tests  (AWS SDK clients)"]:::test
  end
  class DEV host

  %% ---------- LocalStack Host ----------
  subgraph LST["🧱 LocalStack Container  (Port 4566)"]
    direction TB
    S3["🪣 S3"]:::svcS3
    DDB["🧊 DynamoDB"]:::svcDDB
    LMB["λ Lambda"]:::svcLAM
    APIG["🛣 API Gateway"]:::svcAPIG
    CFM["🏗 CloudFormation"]:::svcCFN
  end
  class LST host

  %% ---------- Connectivity ----------
  DEV -. Docker network .- LST

  %% Tool/Test → Emulated endpoints
  CDK -->|synth / deploy - emulated| S3
  CDK --> LMB
  CDK --> APIG
  CDK --> CFM

  TF  -->|plan / apply - emulated| S3
  TF  --> DDB
  TF  --> LMB
  TF  --> APIG
  TF  --> CFM

  TEST -->|CRUD / invoke / query| S3
  TEST --> DDB
  TEST --> LMB
  TEST --> APIG
```

> * **Why:** This keeps most functional checks local, for faster feedback and lower cloud usage, while reserving the AWS sandbox for production-parity behaviors that emulators don’t cover.
>     
> * **How:** Point CDK (`cdklocal`) and Terraform providers to the LocalStack endpoint; run integration tests against the emulated services; promote only after Tier-2 tests pass.
>     

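On the test side, pointing at LocalStack mostly means overriding the endpoint and supplying dummy credentials. The sketch below builds a plain config object in the shape accepted by AWS SDK for JavaScript v3 clients. In a Jest setup file, for example, it could be used as `new S3Client(localStackS3Config(process.env))`. It assumes the standard `AWS_ENDPOINT_URL` override, LocalStack's default edge port 4566, and the conventional `test`/`test` placeholder credentials. For the CLIs, the `cdklocal` and `tflocal` wrappers apply equivalent endpoint overrides.

```typescript
// Shape compatible with AWS SDK for JavaScript v3 client configs; no SDK import needed here.
interface LocalEndpointConfig {
  endpoint: string;
  region: string;
  credentials: { accessKeyId: string; secretAccessKey: string };
}

type EnvVars = { [name: string]: string | undefined };

const LOCALSTACK_DEFAULT_ENDPOINT = "http://localhost:4566"; // LocalStack edge port

function localStackConfig(env: EnvVars): LocalEndpointConfig {
  return {
    // AWS_ENDPOINT_URL is the SDK-wide endpoint override; fall back to the default edge port.
    endpoint: env["AWS_ENDPOINT_URL"] || LOCALSTACK_DEFAULT_ENDPOINT,
    region: env["AWS_REGION"] || "us-east-1",
    // LocalStack accepts any credentials; "test"/"test" is a common placeholder.
    credentials: { accessKeyId: "test", secretAccessKey: "test" },
  };
}

// S3 also gets path-style addressing, so tests don't depend on "<bucket>.localhost" DNS.
function localStackS3Config(env: EnvVars): LocalEndpointConfig & { forcePathStyle: boolean } {
  return { ...localStackConfig(env), forcePathStyle: true };
}
```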
---

### 6.2 **CI/CD Monitoring Checklist — CDK & Terraform (at-a-glance)**

A concise weekly and monthly monitoring loop keeps the pipeline efficient and compliant without embedding static thresholds in this 2-pager. The **authoritative targets, tasks, and evidence paths** live in the “CI/CD Monitoring Checklist – CDK Infrastructure” reference.

```mermaid
%%{init:{
  "theme":"base",
  "themeVariables":{
    "background":"#0b1220",
    "primaryColor":"#60a5fa",
    "primaryTextColor":"#e6f3ff",
    "primaryBorderColor":"#93c5fd",
    "lineColor":"#bfdbfe",
    "textColor":"#e5e7eb"
  }
}}%%
flowchart TB
  %% ----------- Classes -----------
  classDef lane     fill:#0e1726,stroke:#93c5fd,stroke-width:2px,color:#e6f3ff,rx:8,ry:8
  classDef item     fill:#0b1220,stroke:#60a5fa,stroke-width:1.4px,color:#e6f3ff,rx:8,ry:8
  classDef util     fill:#0b1220,stroke:#22c55e,stroke-width:1.4px,color:#e6f3ff,rx:8,ry:8
  classDef qual     fill:#0b1220,stroke:#38bdf8,stroke-width:1.4px,color:#e6f3ff,rx:8,ry:8
  classDef rel      fill:#0b1220,stroke:#f59e0b,stroke-width:1.4px,color:#fff7ed,rx:8,ry:8
  classDef gov      fill:#0b1220,stroke:#fb7185,stroke-width:1.4px,color:#ffe4e6,rx:8,ry:8
  classDef legend   fill:#0f172a,stroke:#93c5fd,stroke-dasharray:4 3,color:#e6f3ff,rx:8,ry:8

  %% ========== LAYER 1: WEEKLY (four stacked lanes) ==========
  subgraph W["🗓 Weekly Review — CDK & Terraform"]
    direction LR

    %% Utilization & Spend
    subgraph WUTIL["Utilization & Spend"]
      direction TB
      U1["💸 Actions usage & cost footprint"]:::util
    end
    class WUTIL lane

    %% Quality & Coverage
    subgraph WQUAL["Quality & Coverage"]
      direction TB
      U2["⏱ Test durations by tier (T1 / T2 / T3)"]:::qual
      U3["✅ Pass rates & flaky analysis"]:::qual
      U4["🧭 Coverage trends & diffs"]:::qual
    end
    class WQUAL lane

    %% Reliability & Throughput
    subgraph WREL["Reliability & Throughput"]
      direction TB
      U5["🧯 Workflow failures — root causes"]:::rel
      U6["🗂 Artifacts & retention checks"]:::rel
    end
    class WREL lane

    %% Governance & Policy
    subgraph WGOV["Governance & Policy"]
      direction TB
      U7["🛡 Quality gates / constitutional checks"]:::gov
    end
    class WGOV lane
  end

  %% Handoff arrow (kept minimal and explicit)
  H[/"Roll-ups → insights → decisions"/]:::item

  %% ========== LAYER 2: MONTHLY (single stacked lane) ==========
  subgraph M["📅 Monthly Review — Executive Summary"]
    direction TB
    M1["📈 Cost & trend summary"]:::item
    M2["🧱 Stability & optimization opportunities"]:::item
    M3["⚖️ Policy / threshold revalidation"]:::item
  end
  class M lane

  %% Flow (Weekly lanes converge to handoff → Monthly)
  WUTIL --> H
  WQUAL --> H
  WREL  --> H
  WGOV  --> H
  H --> M

  %% ========== Legend (compact, pinned to the right) ==========
  subgraph LEG["Legend"]
    direction TB
    L1["🟩 Utilization"]:::util
    L2["🟦 Quality"]:::qual
    L3["🟧 Reliability"]:::rel
    L4["🟥 Governance"]:::gov
  end
  class LEG legend

  %% Position legend visually to the right (soft hint via invisible connectors)
  M -. reference .- LEG
```

* **Why:** Keeps leadership and teams aligned on throughput, stability, and cost—without coupling the 2-pager to specific numeric targets.
    
* **How:** Follow the referenced checklist for the exact checks, thresholds, evidence logging format, artifact retention, escalation paths, and review cadence. Store logs and summaries exactly where specified in the checklist doc.
    
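The weekly pass-rate and flaky analysis can be derived from raw CI results. This sketch assumes a common working definition: a test is flaky if it both passed and failed on the same commit, for example across retries. The record shape is hypothetical.

```typescript
// Hypothetical CI result record.
interface TestRun {
  testId: string;
  commitSha: string;
  passed: boolean;
}

interface TestStats {
  testId: string;
  runs: number;
  passRate: number;
  flaky: boolean; // passed and failed at least once on the same commit
}

function analyzeRuns(runs: TestRun[]): TestStats[] {
  // Group runs by test id.
  const byTest: { [testId: string]: TestRun[] } = {};
  for (const run of runs) {
    if (!byTest[run.testId]) byTest[run.testId] = [];
    byTest[run.testId].push(run);
  }
  return Object.keys(byTest)
    .sort()
    .map((testId): TestStats => {
      const testRuns = byTest[testId];
      const passes = testRuns.filter((r) => r.passed).length;
      // Record which outcomes were seen on each commit.
      const seen: { [sha: string]: { pass: boolean; fail: boolean } } = {};
      for (const r of testRuns) {
        if (!seen[r.commitSha]) seen[r.commitSha] = { pass: false, fail: false };
        if (r.passed) seen[r.commitSha].pass = true;
        else seen[r.commitSha].fail = true;
      }
      const flaky = Object.keys(seen).some((sha) => seen[sha].pass && seen[sha].fail);
      return { testId, runs: testRuns.length, passRate: passes / testRuns.length, flaky };
    });
}
```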

---

### 6.3 CI/CD Gate Flow (tool-agnostic)

```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0b1220","primaryColor":"#22c55e","lineColor":"#86efac","textColor":"#e5e7eb"}}}%%
sequenceDiagram
  autonumber
  participant Dev as Dev/Agent
  participant CI as CI Pipeline
  participant LCL as LocalStack
  participant AWS as AWS Sandbox
  Dev->>CI: Push change / open PR
  CI->>CI: Tier 1 Snapshot (assert/plan)
  CI-->>Dev: Fail? → fix & retry
  CI->>LCL: Tier 2 Functional tests (SDK/CLI)
  CI-->>Dev: Fail? → fix & retry
  CI->>AWS: Tier 3 Sandbox deploy & parity checks (with approvals)
  AWS-->>CI: Evidence (logs, diffs, cleanup)
  CI-->>Dev: Gate “Ready to Merge”
```

> Patterns and responsibilities for snapshot assertions, LocalStack orchestration, sandbox deploy/cleanup, and evidence capture are documented in the team guidance (`local-testing.md`).

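For LocalStack orchestration, a CI step typically waits until the container reports the required services as ready before Tier 2 starts. LocalStack exposes a health endpoint at `/_localstack/health` on the edge port. Its JSON includes a `services` map from service name to status. The sketch below only evaluates an already-fetched response. A CI step would poll the endpoint, with a bounded number of retries, until the result is empty. Treating both `available` and `running` as ready, and the required service list, are assumptions to adjust per LocalStack version.

```typescript
// Subset of the JSON returned by GET http://localhost:4566/_localstack/health
interface LocalStackHealth {
  services?: { [service: string]: string };
}

const READY_STATUSES = ["available", "running"]; // assumption: both count as ready

// Returns the required services that are absent or not yet ready.
function servicesNotReady(health: LocalStackHealth, required: string[]): string[] {
  const services = health.services || {};
  return required.filter((name) => READY_STATUSES.indexOf(services[name]) === -1);
}
```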
---

## 7) Risks & Mitigations (executive view)

| Risk | Why it matters | Mitigation |
| --- | --- | --- |
| **Emulator gaps vs. AWS behavior** | False green in T2 leads to late discovery. | **Mandatory T3** parity checks; keep a living list of unsupported features; add focused tests in sandbox. |
| **Sandbox sprawl/cost** | Orphaned resources and noisy accounts. | Automated teardown on CI completion; lifecycle rules; budget alerts; periodic hygiene jobs. |
| **Approval bottlenecks** | HITL delay negates inner-loop gains. | Consolidate evidence; one approval per change; rotate approvers; pre-approved patterns for low-risk changes. |
| **Signal quality** | Incomplete evidence weakens decisions. | Standard evidence kit (snapshots, logs, parity diffs, cleanups) attached to every PR. |

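Automated teardown is easier to audit when every sandbox resource carries a CI run identifier and an expiry tag. The sketch below flags orphans from a resource inventory, such as one assembled from the Resource Groups Tagging API `GetResources` call. The tag keys `ci-run-id` and `expires-at` are hypothetical conventions. So is the rule that a resource is orphaned if it is untagged, expired, or owned by a run that is no longer active.

```typescript
// Shape of one entry in the Resource Groups Tagging API GetResources response.
interface TaggedResource {
  ResourceARN: string;
  Tags?: { Key: string; Value: string }[];
}

// Hypothetical tagging convention applied by the sandbox deploy job.
const RUN_ID_TAG = "ci-run-id";
const EXPIRES_AT_TAG = "expires-at"; // ISO-8601 timestamp

function tagValue(resource: TaggedResource, key: string): string | undefined {
  const tag = (resource.Tags || []).filter((t) => t.Key === key)[0];
  return tag ? tag.Value : undefined;
}

// Returns ARNs of resources that are untagged, expired, or owned by an inactive CI run.
function findOrphans(resources: TaggedResource[], activeRunIds: string[], nowMs: number): string[] {
  return resources
    .filter((r) => {
      const runId = tagValue(r, RUN_ID_TAG);
      const expiresAt = tagValue(r, EXPIRES_AT_TAG);
      if (!runId || !expiresAt) return true;
      const expiryMs = Date.parse(expiresAt);
      if (isNaN(expiryMs) || expiryMs <= nowMs) return true;
      return activeRunIds.indexOf(runId) === -1;
    })
    .map((r) => r.ResourceARN);
}
```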
---

## Appendix (CxO-friendly quick reference)

### 🎯 What leaders approve

* The **process** (three tiers + gates + evidence), not a single tool.
    
* The **guardrails** (no direct prod, sandbox only with cleanup & audit).
    

### ✅ What teams do next

* Add **snapshot tests**, **LocalStack functional tests**, and a **sandbox parity job** to CI.
    
* Publish **baseline metrics** and review them monthly in the same forum that reviews changes.
    

### **References:**

* [https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/test-aws-infra-localstack-terraform.html](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/test-aws-infra-localstack-terraform.html)
    
* [https://aws.amazon.com/blogs/aws/accelerate-serverless-testing-with-localstack-integration-in-vs-code-ide/](https://aws.amazon.com/blogs/aws/accelerate-serverless-testing-with-localstack-integration-in-vs-code-ide/)
    
* [https://blog.localstack.cloud/aws-toolkit-vscode-localstack/](https://blog.localstack.cloud/aws-toolkit-vscode-localstack/)
    

---

**Prepared for:** VP Engineering, Director of Platform Engineering, Principal Architects  
**Document type:** Internal 2-pager (Working Backwards format) • **Source:** `local-testing.md` (team guidance & patterns)