What Terraform Does Well
- Creates cloud-API resources.
- Manages their lifecycle.
- Stores desired state.
- Shows the diff against reality.
In this scope it has no real open-source competition.
What It Does Poorly
1. Kubernetes manifests
resource "kubernetes_deployment" "app" {
  metadata {
    name = "app"
  }
  spec {
    replicas = 3
    # ... 200 lines of HCL instead of 30 lines of YAML
  }
}
Problems:
- The kubernetes provider is weaker than native tools. It does not understand CRDs without extra steps, and reconciliation is not native.
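- On update, Terraform pushes its own view of the object instead of merging with changes made by in-cluster controllers, so fields that other actors manage (for example, replicas set by an autoscaler) show up as drift to be reverted. That behavior fits poorly with the k8s ecosystem.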
- Kubernetes has its own configuration system (Helm charts, Kustomize) and its own ecosystem. Terraform sits outside it.
Use instead: kubectl + Kustomize, or Helm + ArgoCD/Flux. Reserve Terraform for the EKS cluster itself (control plane), not for the workloads running inside it.
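As a rough sketch of that split, Terraform can create the control plane and export just enough for a GitOps tool or kubectl to connect. The variable names (cluster_role_arn, subnet_ids) and the cluster name are assumptions made for illustration, not something the text above prescribes.

```hcl
# Hypothetical inputs; in a real module these would come from your IAM and VPC code.
variable "cluster_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

# Terraform owns the control plane only.
resource "aws_eks_cluster" "main" {
  name     = "main" # assumed name
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids = var.subnet_ids
  }
}

# Hand-off point: ArgoCD, Flux, or kubectl consume these values.
# No kubernetes_* resources appear in this configuration.
output "cluster_endpoint" {
  value = aws_eks_cluster.main.endpoint
}

output "cluster_ca_certificate" {
  value = aws_eks_cluster.main.certificate_authority[0].data
}
```

Everything that runs inside the cluster then lives in a separate repository or directory that the CD tool watches.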
2. Container images
resource "docker_image" "app" {
  name = "myapp:latest"
  build {
    context = "./app"
  }
}
A container image is a build-time artifact, not infrastructure. Builds belong in CI (for example GitHub Actions running Buildah or Kaniko), and results belong in a registry (ECR, Docker Hub). Terraform should reference an image tag, not build one.
Use instead: docker buildx, Packer (for AMIs), and a CI pipeline for builds.
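One way to reference a tag without building anything might look like the sketch below. The image_tag variable, the registry path, and the choice of an ECS task definition as the consumer are all assumptions for illustration.

```hcl
# Hypothetical variable: CI passes the tag it has already built and pushed,
# for example with -var image_tag=1.4.2.
variable "image_tag" {
  type = string
}

locals {
  # Assumed registry path, for illustration only.
  image = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp:${var.image_tag}"
}

resource "aws_ecs_task_definition" "app" {
  family = "app"

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = local.image # reference only; nothing is built here
      memory    = 512
      essential = true
    }
  ])
}
```

Pinning an explicit tag rather than latest also keeps the plan meaningful, because a new image shows up as a visible change.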
3. Application deployment
resource "aws_codedeploy_deployment" "release" {
  deployment_group_name = aws_codedeploy_deployment_group.app.deployment_group_name
  revision { ... }
}
Blue-green, rolling, and canary deployments are the domain of tools like Spinnaker, ArgoCD, and Flagger. They handle:
- Health checks between batches.
- Auto-rollback.
- Flexible deployment strategies.
- Multi-environment promotion.
Terraform can trigger a deployment via aws_codedeploy_deployment, but that
is a one-shot create, not ongoing deployment management.
Use instead: a CD tool (Spinnaker, ArgoCD, Flux, Octopus). Use Terraform for the CodeDeploy application and group, not for the deployments themselves.
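A minimal sketch of the part that does belong in Terraform, assuming an EC2-based application selected by a Role tag and a pre-existing service role (both hypothetical):

```hcl
# Hypothetical input: an IAM role that CodeDeploy is allowed to assume.
variable "codedeploy_role_arn" {
  type = string
}

resource "aws_codedeploy_app" "app" {
  name             = "app"
  compute_platform = "Server"
}

resource "aws_codedeploy_deployment_group" "app" {
  app_name              = aws_codedeploy_app.app.name
  deployment_group_name = "app-production" # assumed name
  service_role_arn      = var.codedeploy_role_arn

  # Target instances by tag; the tag key and value are assumptions.
  ec2_tag_set {
    ec2_tag_filter {
      key   = "Role"
      type  = "KEY_AND_VALUE"
      value = "app"
    }
  }

  # Rollback policy is static configuration, so it fits in Terraform.
  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }
}
```

The deployments themselves are then created by the CD pipeline against this group, outside Terraform state.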
4. Config management inside instances
resource "null_resource" "configure_app" {
  provisioner "remote-exec" {
    inline = [
      "sed -i 's/foo/bar/' /etc/app/config",
      "systemctl restart app",
    ]
  }
}
This is the job of Ansible, Chef, Puppet, or Salt. Terraform provisioners have no idempotency, no dry-run, no inventory management, and no retries.
Use instead: Ansible, or an immutable pattern where you rebuild the AMI with Packer and let Terraform launch fresh instances.
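The immutable variant could be sketched roughly as follows. It assumes Packer publishes AMIs into the same account with names starting with app-; that naming scheme and the instance type are illustrative choices.

```hcl
# Look up the newest AMI that Packer built (naming scheme is an assumption).
data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["app-*"]
  }
}

# Terraform only launches from the image; no provisioners, no in-place edits.
resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = data.aws_ami.app.id
  instance_type = "t3.small"
}
```

A configuration change then means a new AMI and fresh instances, rather than sed and systemctl over SSH.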
5. Database migrations
resource "null_resource" "db_migrate" {
  provisioner "local-exec" {
    command = "psql -f migrations/001.sql"
  }
}
Schema migrations need to be:
- Versioned (Flyway, Liquibase, Alembic).
- Reversible (down migrations).
- Tracked separately from infrastructure.
Terraform has no concept of "what has already been applied." Its state records "this database exists with this configuration," nothing more.
Use instead: Flyway, Liquibase, or Alembic in CI. Terraform handles the RDS instance only.
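A sketch of the Terraform side of that boundary: the instance and an output that the migration job in CI can consume. Identifiers and sizes are placeholders, and manage_master_user_password assumes a reasonably recent AWS provider.

```hcl
resource "aws_db_instance" "app" {
  identifier        = "app-db" # assumed name
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app"

  # Let RDS generate the password and keep it in Secrets Manager,
  # so it never appears in HCL.
  manage_master_user_password = true

  skip_final_snapshot = true # sketch only; reconsider for production
}

# The migration tool (Flyway, Liquibase, Alembic) reads this from CI.
output "db_address" {
  value = aws_db_instance.app.address
}
```

Which migrations have run stays in the migration tool's own history table, not in Terraform state.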
6. Secret management
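resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id # required argument; the secret resource is assumed to exist
  secret_string = "supersecret"                   # plaintext in HCL, and therefore in git
}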
A secret in HCL is a secret in git. Terraform should not own secrets. It can read them via a data source, though the value then lands in state; creation and rotation belong in a separate flow.
See tf-secrets-in-state.
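One possible shape for "the container, not the contents": Terraform creates the secret and the permission to read it, and a separate flow writes the value. The secret name is an assumption.

```hcl
# The container only. No aws_secretsmanager_secret_version here,
# so no secret value appears in HCL or in state.
resource "aws_secretsmanager_secret" "db" {
  name = "app/db-password" # assumed name
}

# Policy document that an application role can attach to read the value at runtime.
data "aws_iam_policy_document" "read_db_secret" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.db.arn]
  }
}

# Consumers get the ARN, never the value.
output "db_secret_arn" {
  value = aws_secretsmanager_secret.db.arn
}
```

The value itself would be set and rotated outside Terraform, for example by a rotation function or an operator with the AWS CLI.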
7. Long-running operations
If apply takes more than 30 minutes, either too much is being done in one pass or some of it does not belong in Terraform.
- Creating an EKS cluster: 15-20 min, acceptable.
- Creating EKS + 50 nodes + 200 manifests: move the manifests to Helm/ArgoCD.
The plan-review-apply cycle loses its value when apply runs for an hour. Nobody waits that long.
Smell Test
Signs that you are using Terraform in the wrong place:
- null_resource or terraform_data with local-exec makes up more than 10 percent of your state. You are using Terraform as a bash orchestrator.
- Apply takes more than 30 minutes. Too much work in one cycle.
- You frequently use -replace by hand. You want idempotency; a significant redesign is needed.
- HCL contains app config (application YAML files, dotenv files). That is not infrastructure.
- Someone says "Terraform deploys our app." Apply is not deploy.
What Belongs to Terraform
A clean scope:
- VPC, subnets, routes, NAT, IGW.
- EC2 instances launched from a base AMI (no in-instance configuration; build the AMI with Packer).
- RDS, EKS cluster, ECS cluster, Lambda function.
- S3, DynamoDB, CloudFront.
- IAM (roles, policies).
- Route53 records (top-level, TXT, MX, NS; weighted aliases for traffic management).
- CloudWatch alarms, log groups.
- Secrets Manager / SSM parameters (the container, not the contents).
That is an enormous surface area. Do not try to pull k8s workloads, app deployment, and config management into it. You will lose focus.
Pitfalls
- "Terraform can do everything" is a trap. It can, but it should not. Architects who put everything in Terraform end up with a monolithic state, slow apply runs, and a tedious pipeline.
- The null_resource temptation. "Just run this one command" sounds easy. A year later you have 50 null_resources, each a special case, and nobody knows what any of them do.
- The existence of a provider does not mean you should use it. The kubernetes provider exists. The helm provider exists. Using them is not automatically a good idea.
- HashiCorp pushes Stacks and more workflow features. That does not mean Terraform should do everything. It means Terraform works better in a multi-stack setup, but it is still about infra-API resources.
- "We do everything in TF" is a religion. A good tool stack means choosing the right tool for each job. Insisting on full coverage with a single tool in a large team is cargo cult.
See Also in LinuxLab
- kubernetes-pod-lifecycle explains why K8s manifests are better deployed via ArgoCD/Flux/Helm, with Terraform left to the surrounding infrastructure (VPC, IRSA, RDS).
- helm-charts: the Helm provider in Terraform exists, but drift detection there is weak. Native Helm in CI is more flexible.
- image-signing-cosign: image builds and signing are a separate pipeline concern (Packer, docker build, cosign), not a Terraform provider task.