The test pyramid does not fit infrastructure
The classic pyramid is 70% unit, 20% integration, 10% e2e. That works for an application. For Terraform it gets awkward.
- A unit test of a Terraform module is cheap (.tftest.hcl + mock_provider), but it checks something narrow: "the module declares what the HCL says it should declare." It will not protect you from a bug in the provider or in the AWS API.
- Integration is an order of magnitude more expensive (you bring up LocalStack or AWS, which takes minutes), but it actually checks that the result is working infrastructure.
- E2E is the production deploy itself. The deployment already lives in the pipeline, so a separate e2e stage is usually duplication.
A realistic profile for a Terraform repo is closer to 40% unit, 40% integration on LocalStack, 20% policy/compliance, with e2e being the production pipeline.
What to test
1. The module contract
A module takes var.name and declares aws_s3_bucket.this. The test: you
pass "foo", and in the plan aws_s3_bucket.this.bucket == "foo". This is
insurance against an accidental rename of the variable.
# tests/contract.tftest.hcl
run "var_name_propagates" {
  command = plan

  variables {
    name = "foo"
  }

  assert {
    condition     = aws_s3_bucket.this.bucket == "foo"
    error_message = "var.name does not reach bucket"
  }
}
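The same contract can be checked without AWS credentials by swapping in a mock provider. The sketch below is a minimal example, assuming Terraform 1.7 or later (where mock_provider is available) and a module that uses only the aws provider; the file name and run name are hypothetical.

```hcl
# tests/contract_mocked.tftest.hcl (hypothetical file name)

# Replace the real AWS provider with a mock: no credentials, no API calls.
mock_provider "aws" {}

run "var_name_propagates_mocked" {
  command = plan

  variables {
    name = "foo"
  }

  # bucket is set directly from var.name, so its value is known at plan time.
  assert {
    condition     = aws_s3_bucket.this.bucket == "foo"
    error_message = "var.name does not reach bucket"
  }
}
```

Only attributes set in the configuration carry real values here. Anything the provider would compute is invented by the mock, so asserting on those values proves nothing about AWS.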
2. Business rules (policy)
Encryption is mandatory. The CostCenter tag is mandatory. Nothing is public by default. This is a company-level concern, not a module-level one. The better home for it is policy-as-code (tf-policy-as-code / terraform-compliance) running against the plan file in CI.
3. Complex expressions
A plain name = var.name is not worth a test. But this is:
locals {
  bucket_name = "${var.team}-${var.purpose}-${random_id.suffix.hex}"
  tags = merge(
    var.default_tags,
    { Team = var.team, ManagedBy = "terraform" },
  )
}
There is logic here. The test: "when team=ai and purpose=logs, the name starts with ai-logs-".
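One way to write that test is sketched below. It assumes the module assigns local.bucket_name to aws_s3_bucket.this.bucket and local.tags to its tags argument, which the snippet above does not show, and that team, purpose, and default_tags are the only required variables. The file name and the tag values are hypothetical.

```hcl
# tests/naming.tftest.hcl (hypothetical file name)

# Mock only AWS; the random provider runs for real and needs no credentials.
mock_provider "aws" {}

run "name_and_tags_are_composed" {
  # apply, not plan: random_id.suffix.hex is unknown until apply,
  # so the full bucket name cannot be asserted on during plan.
  command = apply

  variables {
    team    = "ai"
    purpose = "logs"
    default_tags = {
      Team       = "overridden"
      CostCenter = "1234"
    }
  }

  assert {
    condition     = startswith(aws_s3_bucket.this.bucket, "ai-logs-")
    error_message = "bucket name must start with team-purpose-"
  }

  # merge() lets later maps win, so the module's Team tag overrides the default.
  assert {
    condition     = aws_s3_bucket.this.tags["Team"] == "ai"
    error_message = "module-set Team tag must win over default_tags"
  }

  assert {
    condition     = aws_s3_bucket.this.tags["CostCenter"] == "1234"
    error_message = "tags from default_tags must be preserved"
  }
}
```

The tag assertions pin down the merge order, which is the part of this expression most likely to be flipped by accident.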
4. Refactors (moved blocks)
You moved aws_s3_bucket.logs into a module. Write a test for a clean plan:
run "no_diff_after_refactor" {
  command = plan
  # assert that the plan does nothing
}
Terraform may say "no changes" on its own, but without an assert in the test that fact is recorded nowhere.
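For reference, the refactor described above is usually recorded with a moved block, so that Terraform treats the bucket as the same object at a new address instead of planning a destroy and a create. A minimal sketch, where the module name logs and the inner address aws_s3_bucket.this are hypothetical:

```hcl
# Root module, after extracting the bucket into a child module.
# "module.logs" and "aws_s3_bucket.this" are hypothetical names.
moved {
  from = aws_s3_bucket.logs
  to   = module.logs.aws_s3_bucket.this
}
```

With this in place, a plan against the existing state should report the move and nothing to add or destroy for the bucket. That outcome is what the refactor test needs to pin down.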
5. Preconditions and postconditions
variable "env" {
  type = string

  validation {
    condition     = contains(["dev", "stage", "prod"], var.env)
    error_message = "env must be dev/stage/prod"
  }
}
A test with expect_failures = [var.env] and env = "xyz" guarantees the
validation fires. See tf-test-framework.
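A minimal sketch of such a negative test, assuming env is the only variable the module requires; the file name and run name are hypothetical.

```hcl
# tests/validation.tftest.hcl (hypothetical file name)

run "rejects_unknown_env" {
  command = plan

  variables {
    env = "xyz" # not in ["dev", "stage", "prod"]
  }

  # The run passes only if the validation on var.env fails.
  expect_failures = [
    var.env,
  ]
}
```

expect_failures also accepts resources, outputs, and check blocks, so the same pattern covers precondition and postcondition blocks attached to them.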
What not to test
1. That HCL describes the AWS API correctly
resource "aws_s3_bucket" "this" {
  bucket = "foo"
}
Testing that "on apply, a bucket named foo appears in AWS" is pointless. That is the job of HashiCorp and the AWS SDK. You are not testing the compiler, you are testing your own code.
2. Trivial pass-through
output "arn" {
  value = aws_s3_bucket.this.arn
}
A test that "output arn equals the ARN" is meaningless. There is nothing here that can break.
3. That the cloud behaves like the cloud
"After apply, the bucket really does return 200 on a HEAD request" is an operations-level smoke test, not a module test. You do it in production through monitoring, not in a test suite.
4. Performance
A test like "apply 100 resources in under 60 seconds" is always flaky. Terraform performance depends on provider latency and the network. If you want it, run a separate benchmark once a week, not on every PR.
Levels and tools
| Level | What | With |
|---|---|---|
| Static analysis | Syntax, formatting, common mistakes | terraform fmt -check, terraform validate, tf-checkov |
| Lint | Style, deprecated args, provider best practices | tflint with a rule set |
| Unit (module in isolation) | Module contract, naming, business rules | .tftest.hcl + mock_provider |
| Integration | Resources are really created, cross-resource interactions | .tftest.hcl with command = apply on LocalStack, or Terratest |
| Policy | Corporate rules (tags, security) | OPA+Rego, terraform-compliance, Checkov |
| E2E | Deploying the prod environment | The production pipeline itself |
The golden plan
One light but strong test: "the current code produces a plan that matches a saved reference text byte for byte." Any change (to HCL, the provider, or a module) shows up as a diff, and the reviewer sees exactly what changed.
Implementation:
terraform plan -out=plan.tfplan
terraform show -no-color plan.tfplan > plan.golden
You commit plan.golden. In CI:
terraform plan -out=plan.tfplan
terraform show -no-color plan.tfplan > plan.current
diff plan.golden plan.current || exit 1
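When you change HCL, you regenerate the golden file and commit it with the change. The PR then shows a diff in the HCL and a diff in the golden, so both sides are visible. This is most useful on root modules where you expect zero diff, because any unexpected line in the golden diff stands out.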
How many tests
No more than you can justify. Signs you went too far:
- The tests take longer than the apply itself.
- They break more often from a provider upgrade than from your own code.
- There is more copy-paste in the tests than in the production code.
- Nobody can explain what this particular test is meant to catch.
Signs you tested too little:
- Boilerplate mistakes reach prod.
- Refactors break something that was not visible in the plan.
- Business rules get violated (a tag is forgotten, encryption is turned off).
Find the balance between the two.
Pitfalls
- Tests are a liability, not an asset. Every test has to be maintained. An old test that nobody understands but everyone is afraid to delete is toxic debt.
- Mocks do not catch integration bugs. A unit test with mock_provider can pass while the real apply fails, for example because the AWS API requires a specific argument order or name format.
- A cheap test can be expensive to maintain. A scenario like "when var.foo=true, the plan shows 5 resources" is simple, but it breaks the moment you add a sixth. Test invariants ("every account has a KMS key") rather than exact counts.
- Tests do not replace code review. Well-written HCL gets reviewed faster than bad code with 100% test coverage. Tests are an addition, not a substitute.
- Production debugging is written down as tests. Every time something breaks in production, add a test that would have caught it. That is the one reliable way to grow a test suite that actually catches real bugs.
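To make the invariant-versus-count pitfall concrete, here is a minimal sketch of an invariant test. It assumes a hypothetical module that takes var.accounts as a map keyed by account name and declares aws_kms_key.per_account with for_each over that map. None of these names come from the text above.

```hcl
# tests/invariants.tftest.hcl (hypothetical file name)

mock_provider "aws" {}

run "every_account_has_kms_key" {
  command = plan

  variables {
    # Hypothetical input: account name => account ID.
    accounts = {
      dev  = "111111111111"
      prod = "222222222222"
    }
  }

  # Invariant: each account has a matching KMS key instance.
  # Adding a third account or other unrelated resources does not break this.
  assert {
    condition = alltrue([
      for name in keys(var.accounts) :
      contains(keys(aws_kms_key.per_account), name)
    ])
    error_message = "Every account in var.accounts needs a KMS key."
  }
}
```

Unlike a resource count, this assertion only fails when the rule itself is violated.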