linuxlab.io
Tutorials▾
  • Linux & networking
    File system, processes, TCP/IP, BGP and OSPF
    →
  • Terraform & IaC
    HCL, state, plan/apply on a LocalStack sandbox
    →
  • Git & GitHub
    Object model, plumbing, branching, GitHub Actions
    →
All tutorials →
PricingAboutSign inCreate account
/
Intro
Lessons
Footer
linuxlab-TutorialsPricingAboutPrivacy & cookies
Copyright © 2026 LinuxLab. All rights reserved.

lesson ── terraform-production ── ~14 min ── 6 steps

Drift detection, scheduled plan, and alerting

The end of the production track. Drift is the gap between your HCL and what actually exists in the cloud. It appears when someone edits a resource by hand, when default tags get applied outside your code, or when another team runs a stray apply. You catch it with a scheduled CI job that runs terraform plan -detailed-exitcode: exit code 2 means the plan is not empty, which on an unchanged codebase means drift. In this lesson you build a baseline, break it through aws-cli, and watch plan detect the drift.

▶ interactive sandbox

A pair of containers starts up: terraform 1.9 and localstack 3.8 on the same network. A terminal opens in the browser, and you can run terraform init right away. Each step is checked automatically. TTL is 45 minutes, no registration required.

launch sandbox →

stack ── terraform · localstack · 1 GB RAM · self-destructs after 45 min of inactivity

Steps

  1. 01

    Build the baseline infra

    bash
    cd /home/student/tf-drift
    cat > main.tf <<'EOF'
    resource "aws_s3_bucket" "drift_demo" {
      bucket = "linuxlab-drift-demo"
      tags = {
        ManagedBy = "terraform"
        Owner     = "student"
      }
    }
    EOF
    terraform init -no-color > /dev/null
    terraform apply -auto-approve -no-color

    The bucket is created and state matches reality. That is the baseline.

    ✓ Baseline is in place. State == cloud.

  2. 02

    detailed-exitcode with no drift

    bash
    set +e
    terraform plan -detailed-exitcode -no-color > /dev/null 2>&1
    code=$?
    set -e
    echo "exit: $code"

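    It should be 0: clean, no drift. This is what we expect in production when nothing is broken.

    If you add -refresh=false now, the exit code stays the same but the check is weaker: plan compares HCL only against the stored state and does not query the cloud, so out-of-band changes go unnoticed (see the pitfalls).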
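The set +e / set -e bracket suits a script. In an interactive terminal, leaving set -e on ends the session at the next failing command. A minimal sketch of an alternative that records the exit code without touching shell options (capture_code is a hypothetical helper name, not part of Terraform):

```bash
#!/usr/bin/env bash
# capture_code CMD [ARGS...]: run CMD quietly and print its exit code.
# Hypothetical helper; "|| code=$?" keeps a non-zero status from
# aborting the caller even when errexit (set -e) is active.
capture_code() {
  local code=0
  "$@" > /dev/null 2>&1 || code=$?
  echo "$code"
}

# Usage in the sandbox (assumes terraform is on PATH):
#   code=$(capture_code terraform plan -detailed-exitcode -no-color)
#   echo "exit: $code"
```

Because the helper prints the code instead of returning it, the caller can store it in a variable and branch on it later.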

    ✓ Plan is clean. No drift, a clean cron pass.

  3. 03

    Break reality through aws-cli

    "Someone went into the Console and edited the tags":

    bash
    aws --endpoint-url=http://localstack:4566 \
      s3api put-bucket-tagging \
      --bucket linuxlab-drift-demo \
      --tagging 'TagSet=[
        {Key=ManagedBy,Value=manual},
        {Key=Hacker,Value=was-here}
      ]'
    aws --endpoint-url=http://localstack:4566 \
      s3api get-bucket-tagging --bucket linuxlab-drift-demo

    The real bucket now has:

    • ManagedBy: manual (was terraform)
    • Hacker: was-here (new)
    • Owner gone

    Terraform state knows none of this.
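The three changes listed above amount to a set comparison between the tags in HCL and the tags on the bucket. A sketch of that comparison, assuming each side has been written to a file with one Key=Value pair per line (tag_diff and the file layout are illustrative; values containing = and an empty first file are not handled):

```bash
#!/usr/bin/env bash
# tag_diff WANT_FILE HAVE_FILE
# WANT_FILE: tags declared in HCL, HAVE_FILE: tags found on the bucket.
tag_diff() {
  awk -F= '
    NR == FNR { want[$1] = $2; next }   # first file: desired tags
    { have[$1] = $2 }                   # second file: actual tags
    END {
      for (k in want) {
        if (!(k in have))            print "removed " k
        else if (have[k] != want[k]) print "changed " k ": " want[k] " -> " have[k]
      }
      for (k in have) if (!(k in want)) print "added " k
    }' "$1" "$2" | sort
}
```

Fed the baseline tags and the tags set in this step, it reports Hacker as added, ManagedBy as changed from terraform to manual, and Owner as removed.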

    ✓ Drift introduced. Terraform will detect it next.

  4. 04

    plan -detailed-exitcode == 2

    bash
    set +e
    terraform plan -detailed-exitcode -no-color 2>&1 | tail -20
    code=${PIPESTATUS[0]}  # exit status of terraform, not of tail
    set -e
    echo "exit: $code"

    It should show a diff (the tags differ) and exit 2. That is the drift signal.

    In CI:

    • exit 0 → clean, no action.
    • exit 1 → an error (state corrupt, provider failing, etc.).
    • exit 2 → drift; alert Slack/PD/issue.
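These codes reach CI only if the pipeline preserves them: when plan output is piped through tail or grep, $? reports the last command in the pipe, not terraform. A sketch of a wrapper that trims the output but returns the first command's status through bash's PIPESTATUS array (run_piped is a hypothetical name):

```bash
#!/usr/bin/env bash
# run_piped CMD [ARGS...]: show the last 20 lines of CMD's combined output
# and return CMD's own exit status instead of tail's.
run_piped() {
  "$@" 2>&1 | tail -20
  # PIPESTATUS holds one status per pipeline stage; index 0 is CMD.
  # It must be read immediately, before any other command runs.
  return "${PIPESTATUS[0]}"
}

# Usage in the sandbox (assumes terraform is on PATH):
#   run_piped terraform plan -detailed-exitcode -no-color
#   echo "exit: $?"
```

set -o pipefail is another option, but it reports the last failing stage of the pipe, so a failing filter such as grep finding no match can mask terraform's own exit code.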

    You can read the diff in detail:

    bash
    terraform plan -no-color -out=drift.tfplan 2>&1 | grep -A20 "drift_demo" | head -40

    You can see exactly what diverged: Terraform wants to return the tags to what the HCL describes.

    ✓ Drift caught. exit 2, the signal for a cron alert.

  5. 05

    A scheduled-drift script

    A production shell script for the scheduled job:

    bash
    cat > drift-check.sh <<'EOF'
    #!/usr/bin/env bash
    set -uo pipefail
    cd /home/student/tf-drift
    terraform init -input=false -no-color > /dev/null
    set +e
    terraform plan \
      -detailed-exitcode \
      -input=false \
      -no-color \
      -lock-timeout=2m \
      -out=drift.tfplan
    code=$?
    set -e
    case $code in
      0)
        echo "drift-check: clean, no changes"
        exit 0
        ;;
      2)
        echo "drift-check: DRIFT DETECTED"
        terraform show -no-color drift.tfplan > drift.txt
        # Here a webhook to Slack would go:
        # curl -X POST -H 'Content-Type: application/json' \
        #   --data "{\"text\": \"Drift detected:\n$(cat drift.txt | head -50)\"}" \
        #   "$SLACK_WEBHOOK_URL"
        echo "--- begin drift ---"
        head -30 drift.txt
        echo "--- end drift ---"
        exit 1  # CI treats drift as a failure
        ;;
      *)
        echo "drift-check: ERROR (exit $code)"
        exit $code
        ;;
    esac
    EOF
    chmod +x drift-check.sh
    ./drift-check.sh 2>&1 | tail -30
    echo "script exit: ${PIPESTATUS[0]}"  # status of the script, not of tail

    It should show DRIFT DETECTED and exit 1 (because there is drift).

    ✓ The cron script is ready. In GitHub Actions this runs through a schedule cron.
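The commented-out webhook call in the script pastes raw plan output into a JSON string. That output contains quotes and newlines, which JSON does not allow unescaped. A sketch of the escaping in plain bash (json_escape and slack_payload are hypothetical helpers; they cover backslash, double quote, newline, carriage return, and tab only, so a tool such as jq is the safer choice where it is available):

```bash
#!/usr/bin/env bash
# json_escape TEXT: print TEXT with the characters escaped that would
# otherwise break a JSON string literal.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# slack_payload TEXT: wrap the escaped text in a {"text": ...} object.
slack_payload() {
  printf '{"text": "%s"}' "$(json_escape "$1")"
}

# Usage (SLACK_WEBHOOK_URL is assumed to be set by the CI environment):
#   curl -X POST -H 'Content-Type: application/json' \
#     --data "$(slack_payload "Drift detected:"$'\n'"$(head -50 drift.txt)")" \
#     "$SLACK_WEBHOOK_URL"
```

Running the plan with -no-color, as the script already does, keeps ANSI escape sequences out of the text, which this sketch would not escape.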

    The same thing on OpenTofu

    OpenTofu keeps its CLI and state format compatible with Terraform for the commands in this step; migration usually goes through mv .terraform .terraform.bak; tofu init -upgrade. On a first switch, though, back up the state and do a trial run on a feature branch. The differences cluster in the newer features (variables in backend, state encryption, OCI registry-backed modules). See tf-opentofu-parity for the full matrix.

    • → OpenTofu parity
  6. 06

    Reconcile vs ignore: what to do with drift

    There are three strategies:

    1. Reconcile: apply returns the cloud to what the HCL describes:

    bash
    terraform apply -auto-approve drift.tfplan

    This removes the Hacker:was-here tag and restores ManagedBy:terraform and Owner:student. It fits when the drift is unwanted.

    2. Update HCL, because the cloud is right:

    bash
    # do nothing to the cloud; in HCL, change the tags to match what is there now:
    # tags = { ManagedBy = "manual", Hacker = "was-here" }

    This is for when "someone edited the Console" but the change is wanted; you legalize it in HCL. Then run terraform apply -refresh-only to sync state.

    3. Ignore: the drift exists, but it does not matter:

    hcl
    lifecycle {
      ignore_changes = [tags["Hacker"]]
    }

    Terraform stops reporting the Hacker tag. Use this when that tag is set by another system (k8s-operator, AWS Config) and has nothing to do with Terraform.

    Reconcile it:

    bash
    terraform apply -auto-approve -no-color > /dev/null
    aws --endpoint-url=http://localstack:4566 \
      s3api get-bucket-tagging --bucket linuxlab-drift-demo

    The tags are back to the HCL version.

    ✓ Drift reconciled. The production track is done.

    When HCL does not cover everything that exists in the cloud

    terraform plan sees drift only for resources that are in state. If someone created a bucket by hand, it exists in the cloud, not in state, and plan will not see it.

    For that, driftctl:

    bash
    # install
    curl -L https://github.com/snyk/driftctl/releases/latest/download/driftctl_linux_amd64 \
      -o /usr/local/bin/driftctl
    chmod +x /usr/local/bin/driftctl
    # scan
    driftctl scan \
      --from tfstate+file://terraform.tfstate \
      --output console

    It shows:

    • Managed: resources in state that have drifted.
    • Unmanaged: in the cloud but not in state.
    • Deleted: in state but not in the cloud.

    This is broader than terraform plan. In production the stack is: a cron with terraform plan -detailed-exitcode (often) + driftctl scan (less often, for example once a week).
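The unmanaged and deleted categories are set differences between what state tracks and what the cloud contains. A sketch for a single resource type, assuming two plain-text lists of bucket names, one per line, exported however suits you (the function names and the input format are illustrative; this is not how driftctl is implemented):

```bash
#!/usr/bin/env bash
# Both helpers take STATE_LIST CLOUD_LIST: files with one name per line.
# comm needs sorted input, so each list is sorted on the fly.

# In the cloud but not in state.
unmanaged() {
  comm -13 <(sort -u "$1") <(sort -u "$2")
}

# In state but no longer in the cloud.
deleted() {
  comm -23 <(sort -u "$1") <(sort -u "$2")
}
```

A bucket created by hand would show up under unmanaged, which is exactly the case terraform plan cannot see.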

    The AWS-native alternative is AWS Config, a service that records configuration changes to the resources it tracks. Use it when you want cross-account or cross-team visibility and an audit trail matters more than a Terraform-specific signal.

    See tf-drift-detection.

    • → Drift detection theory
    • → Plan-as-artifact

What you learned

terraform plan -detailed-exitcode returns 0 (clean), 1 (error), or 2 (drift). A scheduled CI job runs it once a day or once an hour, and on 2 it alerts via Slack, PagerDuty, or an issue. The plan job only reads state and never writes it, so its IAM role should be read-only.

commands

  • terraform plan -detailed-exitcode -no-color: the canonical drift check. The exit code is the verdict.
  • terraform plan -refresh-only: refresh only, with no comparison against HCL; shows what changed in the cloud relative to state.
  • aws s3api put-bucket-tagging --bucket X --tagging '...': an example of an out-of-band change, the kind that creates drift.

concepts

  • detailed-exitcode 2 means drift; 1 means an error, not drift
  • A read-only role matters: the drift job must never apply by accident
  • False positives wear the team down; clear them with ignore_changes

