lesson ── terraform-production ── ~14 min ── 6 steps
The end of the production track. Drift is the gap between your HCL and the
cloud (someone edits by hand, default tags, a stray apply by another team).
You catch it with a scheduled job in CI: terraform plan -detailed-exitcode,
exit 2 means drift. In this lesson you build a baseline, break it through
aws-cli, and watch plan detect the drift.
interactive sandbox
A pair of containers comes up: terraform 1.9 and localstack 3.8 on one network. A terminal opens in the browser, and you can run terraform init right away. Each step is checked automatically. TTL 45 minutes, no registration.
stack ── terraform · localstack · 1 GB RAM · self-destructs after 45 min idle
cd /home/student/tf-drift
cat > main.tf <<'EOF'
resource "aws_s3_bucket" "drift_demo" {
  bucket = "linuxlab-drift-demo"
  tags = {
    ManagedBy = "terraform"
    Owner     = "student"
  }
}
EOF
terraform init -no-color > /dev/null
terraform apply -auto-approve -no-color
The bucket is created, state matches reality. That is the baseline.
✓ Baseline is in place. State == cloud.
set +e
terraform plan -detailed-exitcode -no-color > /dev/null 2>&1
code=$?
set -e
echo "exit: $code"
It should be 0: clean, no drift. This is what we expect in production when nothing is broken.
If you add -refresh=false now, the exit code stays the same, but the
check is weaker: plan then compares HCL with state only and does not query
the cloud (see the pitfalls).
✓ Plan is clean. No drift, a clean cron pass.
"Someone went into the Console and edited the tags":
aws --endpoint-url=http://localstack:4566 \
  s3api put-bucket-tagging \
  --bucket linuxlab-drift-demo \
  --tagging 'TagSet=[{Key=ManagedBy,Value=manual},{Key=Hacker,Value=was-here}]'
aws --endpoint-url=http://localstack:4566 \
s3api get-bucket-tagging --bucket linuxlab-drift-demo
The real bucket now has:
ManagedBy: manual (was terraform)
Hacker: was-here (new)
Owner: gone (put-bucket-tagging replaces the whole tag set)
Terraform state knows none of this.
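To make the divergence concrete, here is a minimal sketch that compares the tags declared in HCL with the tags now on the bucket. The helper name diff_tags and the newline-separated Key=Value input format are assumptions made for this illustration; Terraform computes its own diff from state and provider data, not like this.

```shell
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays).
# diff_tags DESIRED ACTUAL: both arguments are newline-separated Key=Value
# lists (an assumed format, not something Terraform or aws-cli emits as-is).
diff_tags() {
  local desired=$1 actual=$2 line key
  local -A want=() have=()
  while IFS= read -r line; do
    if [ -n "$line" ]; then want["${line%%=*}"]=${line#*=}; fi
  done <<< "$desired"
  while IFS= read -r line; do
    if [ -n "$line" ]; then have["${line%%=*}"]=${line#*=}; fi
  done <<< "$actual"
  {
    # Keys declared in HCL: either missing in the cloud or changed there.
    for key in "${!want[@]}"; do
      if [ -z "${have[$key]+x}" ]; then
        echo "removed: $key"
      elif [ "${want[$key]}" != "${have[$key]}" ]; then
        echo "changed: $key ${want[$key]} -> ${have[$key]}"
      fi
    done
    # Keys present in the cloud that HCL never declared.
    for key in "${!have[@]}"; do
      if [ -z "${want[$key]+x}" ]; then
        echo "added: $key=${have[$key]}"
      fi
    done
  } | LC_ALL=C sort
}

desired=$'ManagedBy=terraform\nOwner=student'   # from main.tf
actual=$'ManagedBy=manual\nHacker=was-here'     # from the put-bucket-tagging call
diff_tags "$desired" "$actual"
```

With the lesson's values it reports one added tag (Hacker), one changed tag (ManagedBy), and one removed tag (Owner), which is the same shape of difference the next plan should surface.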
✓ Drift introduced. Terraform will detect it next.
set +e
terraform plan -detailed-exitcode -no-color 2>&1 | tail -20
code=${PIPESTATUS[0]}   # exit code of terraform, not of tail
set -e
echo "exit: $code"
It should show a diff (the tags differ) and exit 2. That is the drift signal.
In CI:
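A minimal sketch of how a CI step might turn that exit code into a pass or fail. fake_plan is a hypothetical stand-in for terraform plan -detailed-exitcode -no-color so the snippet runs without Terraform, and ci_status_for is an illustrative helper name, not part of any CI system.

```shell
#!/usr/bin/env bash
set +e   # keep going on non-zero exit codes, as in the lesson's snippets

# Hypothetical stand-in for `terraform plan -detailed-exitcode -no-color`:
# prints one diff line and returns 2, the "changes present" exit code.
fake_plan() { echo "~ tags changed"; return 2; }

# Map a plan exit code to the status the CI step should finish with.
ci_status_for() {
  case $1 in
    0) echo 0 ;;     # clean: the job passes
    2) echo 1 ;;     # drift: fail the job so the alert fires
    *) echo "$1" ;;  # plan itself failed: pass the error code through
  esac
}

# After a pipe, $? is the status of tail; PIPESTATUS[0] is the plan's own.
fake_plan | tail -20
plan_code=${PIPESTATUS[0]}
echo "plan exit: $plan_code, CI step exit: $(ci_status_for "$plan_code")"
```

Reading PIPESTATUS[0] matters whenever the plan output is piped through tail or grep: without it the check would see the filter's exit code and report a clean run.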
You can read the diff in detail:
terraform plan -no-color -out=drift.tfplan 2>&1 | grep -A20 "drift_demo" | head -40
You can see exactly what diverged: Terraform wants to return the tags to the HCL description.
✓ Drift caught. exit 2, the signal for a cron alert.
A production shell script for the scheduled job:
cat > drift-check.sh <<'EOF'
#!/usr/bin/env bash
set -uo pipefail
cd /home/student/tf-drift
terraform init -input=false -no-color > /dev/null
set +e
terraform plan \
-detailed-exitcode \
-input=false \
-no-color \
-lock-timeout=2m \
-out=drift.tfplan
code=$?
set -e
case $code in
0)
echo "drift-check: clean, no changes"
exit 0
;;
2)
echo "drift-check: DRIFT DETECTED"
terraform show -no-color drift.tfplan > drift.txt
# Here a webhook to Slack would go:
# curl -X POST -H 'Content-Type: application/json' \
#   --data "{\"text\": \"Drift detected:\n$(head -50 drift.txt)\"}" \
#   "$SLACK_WEBHOOK_URL"
echo "--- begin drift ---"
head -30 drift.txt
echo "--- end drift ---"
exit 1 # CI treats drift as a failure
;;
*)
echo "drift-check: ERROR (exit $code)"
exit $code
;;
esac
EOF
chmod +x drift-check.sh
./drift-check.sh 2>&1 | tail -30
echo "script exit: ${PIPESTATUS[0]}"   # the script's exit code, not tail's
It should show DRIFT DETECTED and exit 1 (because there is drift).
✓ The cron script is ready. In GitHub Actions this runs through a schedule cron.
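The commented-out Slack call in the script pastes plan output straight into a JSON string, and raw newlines or double quotes in that output would make the payload invalid. Below is a minimal sketch of one way to escape the text in pure bash. json_escape and slack_payload are hypothetical helper names, and only the most common characters are handled; a tool such as jq would be more thorough if it is available.

```shell
#!/usr/bin/env bash
# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return, and tab;
# other control characters are left as-is (a limitation of this sketch).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}       # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}       # double quote
  s=${s//$'\n'/\\n}     # newline
  s=${s//$'\r'/\\r}     # carriage return
  s=${s//$'\t'/\\t}     # tab
  printf '%s' "$s"
}

# Build the {"text": "..."} body that the webhook call in the script sends.
slack_payload() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

# In drift-check.sh this could replace the inline --data string, e.g.:
#   curl -X POST -H 'Content-Type: application/json' \
#     --data "$(slack_payload "Drift detected: $(head -50 drift.txt)")" \
#     "$SLACK_WEBHOOK_URL"
slack_payload $'Drift detected:\n~ tags = "changed"'
echo
```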
OpenTofu keeps the CLI and state compatible with Terraform for the
commands in this step. Migration usually goes through
mv .terraform .terraform.bak; tofu init -upgrade. On a first switch, though,
back up the state and do a run on a feature branch: the differences cluster
in the newer features (variables in backend, state encryption, OCI
registry-backed modules). See tf-opentofu-parity for the full matrix.
There are three strategies:
1. Reconcile: apply returns the cloud to the HCL.
terraform apply -auto-approve drift.tfplan
This destroys the Hacker:was-here tag and restores ManagedBy:terraform
and Owner:student. Fits when the drift is unwanted.
2. Update HCL: the cloud is right.
# do nothing to the cloud, and in HCL add:
# tags = { ManagedBy = "manual", Owner = "student" }
This is for when "someone edited the Console" but the change is wanted; you
legalize it in HCL. Then run terraform apply -refresh-only to sync state.
3. Ignore: there is drift, but it does not matter.
lifecycle {
  ignore_changes = [tags["Hacker"]]
}
Terraform stops reporting the Hacker tag. Use this when that tag is set
by another system (k8s-operator, AWS Config) and has nothing to do with
Terraform.
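The strategies above map onto commands roughly as follows. This is a sketch, not a recommended wrapper: resolve_drift and the TF override variable are hypothetical names introduced here, and the HCL edits for the second and third strategies still have to be made by hand before the command runs.

```shell
#!/usr/bin/env bash
# TF is an assumed override hook: set TF=echo to print the command
# instead of running Terraform.
TF=${TF:-terraform}

resolve_drift() {
  case ${1:-} in
    # 1. Reconcile: push the cloud back to what HCL describes.
    reconcile) "$TF" apply -auto-approve -no-color ;;
    # 2. Update HCL: after editing the .tf files, sync state with the cloud.
    accept)    "$TF" apply -refresh-only -auto-approve -no-color ;;
    # 3. Ignore: after adding ignore_changes, confirm the plan is clean again.
    ignore)    "$TF" plan -detailed-exitcode -no-color ;;
    *) echo "usage: resolve_drift reconcile|accept|ignore" >&2; return 64 ;;
  esac
}
```

Setting TF=echo before calling resolve_drift accept prints the command line, which is a cheap way to review what would run before pointing it at real infrastructure.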
Reconcile it:
terraform apply -auto-approve -no-color > /dev/null
aws --endpoint-url=http://localstack:4566 \
s3api get-bucket-tagging --bucket linuxlab-drift-demo
The tags are back to the HCL version.
✓ Drift reconciled. The production track is done.
terraform plan sees drift only for resources that are in state.
If someone created a bucket by hand, it exists in the cloud, not in
state, and plan will not see it.
For that, driftctl:
# install
curl -L https://github.com/snyk/driftctl/releases/latest/download/driftctl_linux_amd64 \
-o /usr/local/bin/driftctl
chmod +x /usr/local/bin/driftctl
# scan
driftctl scan \
--from tfstate+file://terraform.tfstate \
--output console
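It reports resources that exist in the cloud but are missing from state (unmanaged), such as that hand-made bucket.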
This is broader than terraform plan. In production the stack is: a cron
with terraform plan -detailed-exitcode (often) + driftctl scan (less
often, for example once a week).
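The "in the cloud but not in state" gap can be illustrated with two plain name lists. This sketch only shows the idea behind an unmanaged-resource scan; it is not how driftctl works internally. The helper name unmanaged, the list format, and the bucket name manual-bucket are all made up for the example.

```shell
#!/usr/bin/env bash
# Print names that appear in the cloud listing but not in state.
# Both arguments are newline-separated name lists (an assumed input format).
unmanaged() {
  LC_ALL=C comm -13 \
    <(LC_ALL=C sort -u <<< "$1") \
    <(LC_ALL=C sort -u <<< "$2")
}

in_state='linuxlab-drift-demo'                   # what Terraform manages
in_cloud=$'linuxlab-drift-demo\nmanual-bucket'   # manual-bucket is hypothetical
unmanaged "$in_state" "$in_cloud"
```

terraform plan only ever walks the first list, which is why a bucket created by hand never shows up in its diff.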
The AWS alternative, AWS Config: a cloud-native service that logs every configuration change. Use it when you want cross-account / cross-team visibility, and an audit trail matters more than a terraform-specific signal.
See tf-drift-detection.
terraform plan -detailed-exitcode exits with 0 (clean), 1 (error), or
2 (changes pending, which in a scheduled run against unchanged HCL means
drift). A cron job in CI runs it once a day or once an hour, and on 2 it
alerts in Slack/PD/an issue. The plan job reads state but does not write it,
and the IAM role is read-only.
commands
terraform plan -detailed-exitcode -no-color: the canonical drift check. The exit code is the verdict.
terraform plan -refresh-only: refresh only, no comparison with HCL; shows what changed in the cloud relative to state.
aws s3api put-bucket-tagging --bucket X --tagging '...': an example of what a stray change does: it creates drift.
concepts