Provisioners, local-exec, remote-exec, and why you are better off without them

`local-exec` and `remote-exec` are last-resort ways to run a command during apply. HashiCorp recommends avoiding them: they are not idempotent, `remote-exec` requires direct network access to the target, and exit 0 is a weak success contract. Prefer cloud-init/user_data for bootstrap, Ansible after apply, native provider resources, or `terraform_data` with `triggers_replace` when you need a command that reacts to an input change.

aka: terraform-provisioner, local-exec, remote-exec, terraform-data

What a provisioner is

A provisioner is a block inside a resource that runs a command at create or destroy time:

hcl
resource "aws_instance" "web" {
  ami           = "ami-..."
  instance_type = "t3.micro"
  provisioner "local-exec" {
    command = "echo ${self.public_ip} >> hosts.txt"
  }
  provisioner "remote-exec" {
    inline = [
      "sudo apt update",
      "sudo apt install -y nginx",
    ]
    connection {
      type        = "ssh"
      host        = self.public_ip
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
    }
  }
}

There are three types:

  • local-exec: runs on the machine where Terraform is executing.
  • remote-exec: runs on the target resource over SSH or WinRM.
  • file: copies a file to the target resource.
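
The example above shows only the first two types. A minimal sketch of the third, `file`, assuming the same Ubuntu instance and SSH key; the local path `conf/nginx.conf` and the remote destination are hypothetical names chosen for illustration:

```hcl
resource "aws_instance" "web" {
  ami           = "ami-..." # elided, as in the example above
  instance_type = "t3.micro"

  # Copies one local file to the instance, once, at create time.
  provisioner "file" {
    source      = "conf/nginx.conf" # hypothetical local path
    destination = "/tmp/nginx.conf" # hypothetical remote path

    connection {
      type        = "ssh"
      host        = self.public_ip
      user        = "ubuntu"
      private_key = file("~/.ssh/id_rsa")
    }
  }
}
```

Like `remote-exec`, `file` needs a working `connection` block, so it inherits the same network requirements.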

Why HashiCorp recommends avoiding them

From the documentation (paraphrased): "provisioners are a last resort; they have no update model and they mix immutable infrastructure with imperative actions." The specific problems follow.

Not idempotent

hcl
provisioner "remote-exec" {
  inline = ["sudo apt install -y nginx"]
}

This runs once, at create time. If nginx is later removed by a direct session on the machine, Terraform does not know and will not run the provisioner again. To re-run it you must terraform apply -replace=aws_instance.web, which means destroy plus create, and that means downtime.

Network dependency

Terraform needs SSH access to the instance during apply. That requires:

  • A network path to the instance: if it sits in a private subnet, the Terraform runner must be inside the VPC or connected over VPN.
  • The security group allows SSH from the runner's IP.
  • The SSH key is available.

Any one of these can break independently of the Terraform config.
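
To make those dependencies concrete, here is a sketch of a `connection` block that reaches a private instance through a bastion host. It assumes a hypothetical `aws_instance.bastion` resource that is not part of the original example; `bastion_host`, `bastion_user`, and `timeout` are standard `connection` arguments.

```hcl
provisioner "remote-exec" {
  inline = ["sudo apt install -y nginx"]

  connection {
    type        = "ssh"
    host        = self.private_ip # not reachable from the internet
    user        = "ubuntu"
    private_key = file("~/.ssh/id_rsa")
    timeout     = "2m" # fail sooner than the 5m default

    # Hypothetical jump host; it needs SSH access to the target,
    # and the runner needs SSH access to it.
    bastion_host = aws_instance.bastion.public_ip
    bastion_user = "ubuntu"
  }
}
```

Every argument here is one more thing that must hold at apply time: the key file, the bastion, and two sets of security-group rules.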

"Success" means exit 0

A provisioner is considered to have passed when its exit code is 0. That is a weak contract. The script may have failed to complete its work yet exited 0, or it may have produced side effects (such as creating an IAM role via the AWS CLI) that the state knows nothing about.
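
If a script has to run anyway, strict shell settings narrow the first failure mode (a script that fails partway yet exits 0). The sketch below assumes bash is available on the runner; `./scripts/build.sh` is a hypothetical script. It does nothing about side effects that the state cannot see.

```hcl
resource "terraform_data" "strict_script" {
  provisioner "local-exec" {
    # Run under bash explicitly so the strict-mode flags are available.
    interpreter = ["/bin/bash", "-c"]
    command     = <<-EOT
      # -e: stop at the first failing command
      # -u: treat unset variables as errors
      # pipefail: a pipeline fails if any stage fails
      set -euo pipefail
      ./scripts/build.sh | tee build.log
      echo "build finished"
    EOT
  }
}
```

Without `pipefail`, a failing `build.sh` would be masked by the exit status of `tee`.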

Destroy provisioners are fragile

hcl
provisioner "local-exec" {
  when    = destroy
  command = "echo cleaning up"
}

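Destroy-time provisioners are not formally deprecated, but they are fragile. One runs only if its block is still in the configuration when the resource is destroyed: delete the whole resource block and the provisioner is skipped. It may reference only self, count.index, and each.key. If the command fails, the destroy fails, and Terraform retries the provisioner on the next apply.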
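
A sketch of the reference restriction, as the Terraform documentation describes it: a destroy-time provisioner can use values from its own resource through `self`, but not from other resources or variables. The command itself is a placeholder.

```hcl
resource "aws_instance" "web" {
  ami           = "ami-..." # elided, as in the examples above
  instance_type = "t3.micro"

  provisioner "local-exec" {
    when = destroy
    # Only self, count.index, and each.key may be referenced here.
    # Something like var.region or aws_eip.web.id is rejected at validation.
    command = "echo deregistering ${self.id}"
  }
}
```

Any value needed at destroy time therefore has to be an attribute of the resource itself.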

Complicates state

A provisioner is part of its resource block. When you refactor a resource into a module and record the move with a moved block, the provisioner has to travel with it, even when the side effect logically belongs somewhere else.

Alternatives

1. cloud-init / user_data

Bootstrap EC2 through user_data:

hcl
resource "aws_instance" "web" {
  ami           = "ami-..."
  instance_type = "t3.micro"

  # Runs once, on first boot, via cloud-init (as root).
  user_data = <<-EOF
    #!/bin/bash
    apt update
    apt install -y nginx
  EOF
}

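This is declarative at the instance level: cloud-init runs the user_data script once, on first boot. Note that in recent versions of the AWS provider (v4 and later), changing user_data updates the instance in place (with a stop and start) by default, and cloud-init does not rerun the script on an existing instance. Set user_data_replace_on_change = true to have Terraform replace the instance whenever user_data changes.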

The cloud-init structured format (see tf-cloudinit-provider) gives more control: packages, write_files, runcmd, and multi-part sections.
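
A minimal sketch of the structured format, assuming the `hashicorp/cloudinit` provider and its `cloudinit_config` data source. The package list, file path, and file content are illustrative, not taken from the original.

```hcl
terraform {
  required_providers {
    cloudinit = { source = "hashicorp/cloudinit" }
  }
}

data "cloudinit_config" "web" {
  gzip          = false
  base64_encode = false

  part {
    content_type = "text/cloud-config"
    # The "#cloud-config" header is a YAML comment, so prepending it is harmless.
    content = join("\n", ["#cloud-config", yamlencode({
      packages = ["nginx"]
      write_files = [{
        path    = "/var/www/html/index.html" # illustrative path
        content = "hello from cloud-init\n"
      }]
      runcmd = ["systemctl enable --now nginx"]
    })])
  }
}

resource "aws_instance" "web" {
  ami           = "ami-..." # elided, as in the examples above
  instance_type = "t3.micro"
  user_data     = data.cloudinit_config.web.rendered
}
```

Additional `part` blocks can carry shell scripts or further cloud-config documents; the data source assembles them into one multi-part payload.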

2. Ansible (or another config-management tool)

Terraform provisions the bare infrastructure; Ansible configures it.

bash
terraform apply
terraform output -json instance_ips | jq -r '.[]' > inventory
ansible-playbook -i inventory site.yml

The split is clean: Terraform owns infrastructure, Ansible owns configuration. Each tool does what it is built for.
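
The shell pipeline above assumes the configuration exposes an output named `instance_ips` that is a list of strings. A sketch of what that output could look like for the single `aws_instance.web` used in the earlier examples:

```hcl
output "instance_ips" {
  description = "Public IPs consumed by the Ansible inventory step"
  # Wrapped in a list so that `jq -r '.[]'` prints one address per line.
  value = [aws_instance.web.public_ip]
}
```

With `count` or `for_each` on the instance, the value would instead be a splat or a `for` expression over all instances.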

3. A dedicated resource from a provider

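If you need to create something in the cloud, find the resource for it. Instead of a local-exec that runs "aws s3api create-bucket", use aws_s3_bucket. In the vast majority of cases a resource already exists, and unlike a CLI call it is tracked in state, shown in the plan, and removed on destroy.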
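
For comparison, the native equivalent of that CLI call is a few lines. The bucket name is a placeholder:

```hcl
resource "aws_s3_bucket" "assets" {
  # Placeholder name; S3 bucket names must be globally unique.
  bucket = "example-assets-bucket"
}
```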

4. terraform_data + triggers

When you genuinely need a command that reacts to a change in an input, the idiomatic approach is:

hcl
resource "terraform_data" "rebuild" {
  triggers_replace = [
    aws_lambda_function.demo.source_code_hash,
  ]
  provisioner "local-exec" {
    command = "echo 'lambda code changed, rebuilding cache'"
  }
}

terraform_data is a built-in resource (Terraform 1.4+) with no cloud object behind it. When any value in triggers_replace changes, Terraform replaces the resource, and the create-time provisioner runs again.

Use this when:

  • No native provider resource exists.
  • You genuinely need a side effect when a value changes.
  • The side effect is itself idempotent (repeating it does not break anything).

Examples: busting a CDN cache (terraform_data plus curl commands), triggering CodeBuild (there are aws_codebuild_* resources, but sometimes you need an inline command).
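
A sketch of the cache-busting case, under loud assumptions: `var.cdn_purge_url` stands in for whatever purge endpoint your CDN offers, and `aws_s3_object.site_bundle` is a hypothetical object whose `etag` changes when the site content changes. Neither comes from the original.

```hcl
variable "cdn_purge_url" {
  type        = string
  description = "Hypothetical purge endpoint of the CDN"
}

resource "terraform_data" "purge_cdn" {
  # Re-run the purge whenever the uploaded bundle changes.
  triggers_replace = [
    aws_s3_object.site_bundle.etag, # hypothetical resource
  ]

  provisioner "local-exec" {
    # -f makes curl exit non-zero on HTTP errors, so a failed purge fails the apply.
    command = "curl -fsS -X POST \"$PURGE_URL\""

    # Pass values through the environment instead of interpolating them
    # into the command string.
    environment = {
      PURGE_URL = var.cdn_purge_url
    }
  }
}
```

A purge is safe to repeat, which satisfies the idempotency condition in the list above.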

5. Lambda + EventBridge

If the trigger belongs at runtime rather than at apply time, do not use Terraform for it. Lambda and EventBridge handle the orchestration; Terraform only describes the resources.
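
A sketch of that division of labor, assuming the `aws_lambda_function.demo` from the earlier example and an hourly schedule chosen purely for illustration. Terraform declares the rule, the target, and the permission; EventBridge does the triggering at runtime.

```hcl
resource "aws_cloudwatch_event_rule" "hourly" {
  name                = "demo-hourly" # illustrative name
  schedule_expression = "rate(1 hour)"
}

resource "aws_cloudwatch_event_target" "invoke_demo" {
  rule = aws_cloudwatch_event_rule.hourly.name
  arn  = aws_lambda_function.demo.arn
}

# Without this permission EventBridge cannot invoke the function.
resource "aws_lambda_permission" "allow_events" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.demo.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.hourly.arn
}
```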

When a provisioner is still justified

A few scenarios where provisioners make sense:

  1. Vault init. Running vault operator init after a Vault cluster comes up. No provider does this, and init is a one-time operation.
  2. Debugging. In development, a local-exec with echo $instance_ip gives quick access to the value.
  3. Bootstrap-only secrets. Copying an SSH key when creating a bastion host (though the file provisioner is still a provisioner).

In these cases, document the reason in a comment.

Pitfalls

  • A provisioner runs ONCE at create time. It is not idempotent by design. Do not use it for ongoing configuration, only for initial bootstrap.

  • Destroy provisioners are fragile. They are skipped when the resource block is removed from the configuration, and they can reference only self, count.index, and each.key. Avoid writing new ones.

  • on_failure = continue hides real problems. Apply will succeed with a green status while the infrastructure is broken. The default is fail, and that default is correct.

  • A connection block without an explicit user behaves differently across AMIs. Ubuntu uses ubuntu, Amazon Linux uses ec2-user, and so on. Do not guess; set the user explicitly.

  • remote-exec against a private IP does not work from the internet. The Terraform runner must have network access to the instance. AWS Session Manager can bypass SSH, but Terraform has no native support for it; you need custom scripts.

  • null_resource + local-exec is an outdated pattern. Since TF 1.4, terraform_data is built in and needs no separate null provider. The semantics are essentially the same (triggers becomes triggers_replace); the newer form is the current idiom.

  • Provisioner commands are NOT visible in the plan diff. Changing only a command produces no diff at all, and when a trigger changes the plan shows -/+ terraform_data.X (must be replaced) but not the commands themselves. CI cannot see what will execute. That is a real drawback for review.
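
One way to soften the last pitfall, sketched here as a suggestion rather than an established convention: keep the command in a local value and include it in `triggers_replace`. The command text then shows up in the plan as part of the trigger, and editing it forces a re-run. The local name `rebuild_command` is hypothetical.

```hcl
locals {
  rebuild_command = "echo 'lambda code changed, rebuilding cache'"
}

resource "terraform_data" "rebuild" {
  triggers_replace = [
    aws_lambda_function.demo.source_code_hash,
    local.rebuild_command, # a changed command is now a visible diff
  ]

  provisioner "local-exec" {
    command = local.rebuild_command
  }
}
```

Do not do this for commands that embed secrets, since trigger values are stored in state and printed in the plan.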

See also in LinuxLab

  • ssh: remote-exec connects over SSH. Knowing how key pairs, agent forwarding, and timeouts work is essential when debugging a provisioner that hangs for five minutes.
  • cmd-systemctl: most tasks people reach for remote-exec to handle are better placed in a systemd unit or a cloud-init drop-in. That is the right tool, not Terraform.
  • signals: local-exec has no timeout of its own, so a hung command blocks the apply until the run is interrupted. The command should handle termination signals cleanly, or an interrupted apply can leave half-finished work behind.

§ commands

bash
terraform apply -replace=aws_instance.web

Force-replace the instance. This is the only way to re-run its provisioner.

bash
terraform apply -replace=terraform_data.rebuild

Reset the trigger resource so the associated command runs again.

bash
grep -rn -A5 'provisioner "' --include='*.tf' .

Provisioners do not appear in the plan output, so search the configuration to see where they are present.

§ see also