Safe refactoring rules
Any refactoring involves close work with state. The rules:
- Back up state before you start: terraform state pull > backup.json.
- Run plan after every step. If plan shows destroy/create, stop. That step is not a refactoring, it is a resource replacement.
- One atomic refactoring per PR. Do not mix a rename with a new feature. Reviews lose meaning when you do.
- Apply to the test environment first. Run the refactoring in dev, verify the result, then stage, then prod.
Below are the concrete patterns.
Pattern 1: count to for_each
The problem
variable "envs" {
  type    = list(string)
  default = ["dev", "stage", "prod"]
}

resource "aws_s3_bucket" "logs" {
  count  = length(var.envs)
  bucket = "logs-${var.envs[count.index]}"
}
Addresses: aws_s3_bucket.logs[0] (dev), [1] (stage), [2] (prod).
Remove dev from the list and stage shifts to index 0, prod to index 1.
The bucket name at [0] and [1] changes, so plan shows a replacement of
both plus a destroy of [2]. That is mass recreation, not the deletion of
one bucket.
Solution: for_each
variable "envs" {
  type    = set(string)
  default = ["dev", "stage", "prod"]
}

resource "aws_s3_bucket" "logs" {
  for_each = var.envs
  bucket   = "logs-${each.key}"
}
moved {
  from = aws_s3_bucket.logs[0]
  to   = aws_s3_bucket.logs["dev"]
}

moved {
  from = aws_s3_bucket.logs[1]
  to   = aws_s3_bucket.logs["stage"]
}

moved {
  from = aws_s3_bucket.logs[2]
  to   = aws_s3_bucket.logs["prod"]
}

Addresses: aws_s3_bucket.logs["dev"], ["stage"], ["prod"]. Removing
dev now produces a targeted destroy of ["dev"] only. The other buckets
are untouched.
The pull request shows:
- HCL diff: count to for_each, list to set, count.index to each.key.
- moved blocks that explicitly document the migration.
Once the change has been applied to every state that uses this configuration, you can delete the moved blocks.
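If callers already pass envs as a list and changing the variable type is inconvenient, the same migration can be done with a conversion inside the resource. This is a minimal sketch, assuming the variable stays list(string); the instance keys are identical, so the moved blocks above stay as they are.

```hcl
variable "envs" {
  type    = list(string)
  default = ["dev", "stage", "prod"]
}

resource "aws_s3_bucket" "logs" {
  # toset() turns the list into a set so each bucket is keyed by name, not by position
  for_each = toset(var.envs)
  bucket   = "logs-${each.key}"
}
```

One thing to keep in mind with this variant: toset() silently collapses duplicate entries, so a list with a repeated name yields one bucket, not an error.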
Pattern 2: split files by domain
The problem
main.tf at 800 lines. Networking, compute, storage, IAM, all in one file.
A diff in a PR touches 200 lines and the reviewer is lost.
Solution
network.tf # vpc, subnets, route tables, NAT, IGW
compute.tf # ec2, launch templates, autoscaling
storage.tf # s3, ebs, efs
iam.tf # roles, policies, instance profiles
variables.tf # all variable blocks
outputs.tf # all output blocks
versions.tf # terraform + providers + required_providers
locals.tf # all locals
This is a file reorganization, not an HCL change. Terraform reads every
.tf file in the directory and merges them into one graph. It does not care
which resource lives in which file.
Apply it like this:
# cut and paste blocks into the new files
# plan
terraform plan
# should show No changes
No moved blocks. No migration. Only the git diff shows what happened.
Anti-pattern: one file per resource
s3.tf, iam.tf, s3_logs.tf, s3_data.tf is too granular. A useful
rule: one file per large domain (network/compute). Split a file when
it exceeds 500 lines. Merge when it falls below 50.
Pattern 3: extract module
The problem
The same combination appears three times in your HCL: a bucket, a bucket policy, versioning, and logging. Copy-paste means a change to one copy requires updating all three.
Solution
- Create modules/audited-bucket/ with three .tf files:

  # modules/audited-bucket/main.tf
  resource "aws_s3_bucket" "this" {
    bucket = var.name
  }

  resource "aws_s3_bucket_versioning" "this" {
    bucket = aws_s3_bucket.this.id
    versioning_configuration {
      status = "Enabled"
    }
  }
  # ...
- In the root, replace the blocks with a module call:

  module "logs" {
    source = "./modules/audited-bucket"
    name   = "linuxlab-logs"
  }
- Add moved blocks for every moved resource:

  moved {
    from = aws_s3_bucket.logs
    to   = module.logs.aws_s3_bucket.this
  }

  moved {
    from = aws_s3_bucket_versioning.logs
    to   = module.logs.aws_s3_bucket_versioning.this
  }
  # ...
- Plan: you want 0 to add, 0 to change, 0 to destroy. If anything shows "to add", it is a new resource in the module. Check the diff.
- Apply. Afterward, delete the moved blocks.
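The module's main.tf in the first step reads var.name, so the module needs a matching variable declaration. The sketch below shows what the remaining module files might contain. The file names variables.tf and outputs.tf and the output names bucket_id and bucket_arn are assumptions for illustration, not taken from the original module.

```hcl
# modules/audited-bucket/variables.tf (assumed file name)
variable "name" {
  type        = string
  description = "Name of the bucket to create."
}

# modules/audited-bucket/outputs.tf (assumed file name, hypothetical outputs)
output "bucket_id" {
  value       = aws_s3_bucket.this.id
  description = "ID of the bucket, for callers that attach further resources."
}

output "bucket_arn" {
  value       = aws_s3_bucket.this.arn
  description = "ARN of the bucket, for use in IAM policies."
}
```

Adding a variable or an output does not change any resource, so the plan should still report 0 to add, 0 to change, 0 to destroy.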
When not to extract
If the resources are similar but not identical, you will end up parametrizing the module with ten variables. By the third parameter you usually realize the copy-paste is simpler, because the differences are real.
Rule of thumb: with 3 or more repetitions and minimal differences, extract a module. With fewer than 3, leave it as is.
Pattern 4: merging small resources
The problem
You have five aws_security_group_rule resources in a security group.
Early 4.x releases of the AWS provider offered only that type. Newer
releases (late 4.x and 5.x) ship aws_vpc_security_group_ingress_rule as
the recommended replacement.
Migration means changing the resource type. As a rule, moved does not
work across types, so a naive migration is a destroy and create.
Solution: import block
- Find the IDs of the existing rules (aws ec2 describe-security-group-rules).
- Declare the new resources in HCL:

  resource "aws_vpc_security_group_ingress_rule" "web_from_alb" {
    security_group_id            = aws_security_group.web.id
    ip_protocol                  = "tcp"
    from_port                    = 80
    to_port                      = 80
    referenced_security_group_id = aws_security_group.alb.id
  }

  import {
    to = aws_vpc_security_group_ingress_rule.web_from_alb
    id = "sgr-0abc123..."
  }
- Remove the old aws_security_group_rule block.
- Add a removed block for it:

  removed {
    from = aws_security_group_rule.web_from_alb

    lifecycle {
      destroy = false
    }
  }

- Plan: you see import and removed with no destroy. Apply.
The cloud resource is not touched. State has moved from one type to another.
Pattern 5: isolating an environment into its own root
The problem
One root manages dev, stage, and prod through count/for_each and
var.env. Any apply touches all three. The state lock blocks parallel
work. When dev breaks, stage and prod cannot be updated either.
Solution
environments/
├── dev/
│   ├── main.tf      # module "app" with env=dev
│   └── backend.tf
├── stage/
│   ├── main.tf
│   └── backend.tf
└── prod/
    ├── main.tf
    └── backend.tf
modules/
└── app/             # shared module with the real resources
Migration:
- Create the new root environments/dev/ and import the existing resources.
- Add removed blocks with lifecycle { destroy = false } in the old root.
- Apply the new root to claim the resources. Apply the old root to clean up its state.
- Repeat for stage and prod.
This is a large refactoring. Work one environment at a time, not all at once.
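The migration steps for one environment could look roughly like the sketch below. Several names are assumptions for illustration: the module variable env, the resource aws_s3_bucket.logs inside the module, the bucket ID logs-dev, and an old-root module call named app_dev. Import blocks require Terraform 1.5 or later and removed blocks require 1.7 or later.

```hcl
# environments/dev/main.tf (new root)
module "app" {
  source = "../../modules/app"
  env    = "dev" # assumes the module exposes an "env" variable
}

# Adopt an existing bucket into the new root's state.
# The resource address and the ID are hypothetical.
import {
  to = module.app.aws_s3_bucket.logs
  id = "logs-dev"
}

# Old root: forget the dev resources without destroying them.
# Assumes dev lives in its own module call named "app_dev", which you
# delete from the old root's HCL when adding this block.
removed {
  from = module.app_dev

  lifecycle {
    destroy = false
  }
}
```

A removed block's from address cannot include instance keys. If the old root creates the environments through count or for_each on a single block, a per-environment address such as module.app["dev"] is not accepted there, and terraform state rm on that instance is the usual fallback.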
Pattern 6: dead code cleanup
The problem
State contains resources that no longer exist in the cloud (deleted through the console, forgotten in HCL). Plan shows a destroy for a resource that does not exist. AWS returns 404.
Solution
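terraform apply -refresh-only
This reconciles state with the cloud and drops phantom entries once you approve the proposed state changes. It only reads from the cloud and never modifies real resources, but it does rewrite state, so review what it proposes. It replaces the deprecated terraform refresh, which performs the same update without asking for confirmation.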
For a specific resource:
terraform state rm aws_s3_bucket.deleted_already
Also delete the block from HCL. Otherwise the next apply will try to create it.
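If you prefer a change that shows up in the pull request over a one-off CLI command, a removed block is a declarative alternative to terraform state rm. This is a sketch assuming Terraform 1.7 or later; it goes in place of the deleted resource block.

```hcl
# Replaces the deleted resource block for aws_s3_bucket.deleted_already.
removed {
  from = aws_s3_bucket.deleted_already

  lifecycle {
    # Only drop the entry from state; do not call the cloud API to delete anything.
    destroy = false
  }
}
```

After the apply has run everywhere, the removed block can be deleted like any moved block.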
Checklist before a refactoring PR
- State backup done (terraform state pull > backup.json)
- plan is clean (0 to add, 0 to change, 0 to destroy)
- All moved/removed/import blocks are explained in the PR description
- Applied in dev and verified
- Commit message uses "refactor: ...", distinct from "feat:" or "fix:"
- PR does not mix refactoring with a new feature
- The reason is stated: "consolidate buckets into audited-bucket module"
Pitfalls
- Refactoring is not "improving the code". It means preserving behavior while changing structure. If plan shows destroy/create, you are not refactoring, you are rewriting. Stop and investigate.
- A state backup does not replace S3 versioning. The backup is a file you created yourself. S3 versioning is automatic. Both are needed for refactoring.
- moved does not work across different resource types. If the "refactoring" involves a type change, use import and removed, not moved.
- CI should catch a destroy in a refactoring PR. In the pipeline, add terraform plan -detailed-exitcode. Exit 2 (changes detected) on a PR tagged "refactor" is a red flag for the reviewer.
- Apply in dev, then stage, then prod. Not all at once. Wait for confirmation between environments. There are too many stories that start with "but we did check".