linuxlab.io
Tutorials▾
  • Linux & networking
    File system, processes, TCP/IP, BGP and OSPF
    →
  • Terraform & IaC
    HCL, state, plan/apply on a LocalStack sandbox
    →
  • Git & GitHub
    Object model, plumbing, branching, GitHub Actions
    →
All tutorials →
PricingAboutSign inCreate account
/
Intro
Lessons
Footer
linuxlab-TutorialsPricingAboutPrivacy & cookies
Copyright © 2026 LinuxLab. All rights reserved.

kb/providers ── Providers ── intermediate

cloudinit provider: user_data for EC2 and more

The `cloudinit` provider builds a multi-part MIME blob for EC2 `user_data`. `data "cloudinit_config"` takes several `part` blocks (cloud-config YAML, shell-script, jinja, and so on) and packs them into one blob. It replaces hand base64-encoding of a single string and lets you assemble the config from pieces.

aka: terraform-cloudinit, terraform-user-data

Why cloudinit

On first boot, an EC2 instance runs cloud-init, which reads user_data (any text up to 16 KiB). These formats are supported:

  • cloud-config YAML: declarative commands (create a user, write a file, install a package).
  • shell script: #!/bin/bash and off you go.
  • boothook: runs before the cloud-init main run.
  • mime-multipart: several formats in one.

When the configuration is complex (you need both YAML and a script), people write the multi-part MIME by hand: boundaries, headers, base64-encode. That is tedious. The cloudinit provider assembles the MIME for you.
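For comparison, a hand-assembled payload might look roughly like the sketch below. The local name `manual_user_data`, the boundary string, and the two part bodies are illustrative assumptions, not taken from a real project.

```hcl
locals {
  # Hypothetical hand-written multi-part MIME: every header, blank line,
  # and boundary marker has to be kept correct by hand.
  manual_user_data = <<-EOT
    Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
    MIME-Version: 1.0

    --MIMEBOUNDARY
    Content-Type: text/cloud-config; charset="us-ascii"

    #cloud-config
    packages:
      - nginx

    --MIMEBOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    systemctl enable --now nginx

    --MIMEBOUNDARY--
  EOT
}
```

With the provider, the same two parts become two `part` blocks and the headers and boundaries are generated for you.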

Install

hcl
terraform {
  required_providers {
    cloudinit = {
      source  = "hashicorp/cloudinit"
      version = "~> 2.3"
    }
  }
}

The provider needs no configuration block: it runs entirely locally and calls no API.

cloudinit_config, the simple case

hcl
data "cloudinit_config" "bootstrap" {
  gzip          = false
  base64_encode = true
  part {
    content_type = "text/cloud-config"
    content      = yamlencode({
      package_update  = true
      package_upgrade = false
      packages = [
        "nginx",
        "jq",
      ]
      write_files = [
        {
          path        = "/etc/nginx/conf.d/app.conf"
          content     = file("${path.module}/templates/nginx.conf")
          owner       = "root:root"
          permissions = "0644"
        },
      ]
      runcmd = [
        "systemctl enable nginx",
        "systemctl start nginx",
      ]
    })
  }
}
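resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  # rendered is already base64, so it goes in user_data_base64;
  # plain user_data is meant for unencoded text.
  user_data_base64 = data.cloudinit_config.bootstrap.rendered
}

What happened:

  • cloudinit_config assembled a MIME blob with one part (YAML).
  • base64_encode = true, so the result is base64. EC2 stores user data as base64; the AWS provider encodes plain text given to user_data itself, and takes an already-encoded value through user_data_base64.
  • data.cloudinit_config.bootstrap.rendered is the ready blob.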

Without the provider you would have to write MIME boundaries and base64-encode by hand.

Multi-part: YAML + bash

Sometimes you need both cloud-config (declarative) and a shell script (for what cloud-config cannot do).

hcl
data "cloudinit_config" "app" {
  gzip          = true
  base64_encode = true
  part {
    filename     = "init.cfg"
    content_type = "text/cloud-config"
    content      = yamlencode({
      users = [
        {
          name                = "appuser"
          sudo                = "ALL=(ALL) NOPASSWD:ALL"
          shell               = "/bin/bash"
          ssh_authorized_keys = [tls_private_key.deploy.public_key_openssh]
        },
      ]
      package_update = true
      packages       = ["docker.io", "awscli"]
    })
  }
  part {
    filename     = "register.sh"
    content_type = "text/x-shellscript"
    content      = templatefile("${path.module}/templates/register.sh.tpl", {
      env     = var.env
      cluster = aws_ecs_cluster.app.name
    })
  }
}

Two parts: cloud-config plus shell. Cloud-init runs the YAML first (create the user, install packages), then the shell (register in the ECS cluster through the AWS CLI).

gzip = true

EC2 limits user_data to 16 KiB, measured on the raw bytes before base64 encoding. Gzip is effective on text: a 30 KB config typically compresses to 5-7 KB. Cloud-init detects the compression and unpacks it on its own.
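The limit can be watched from Terraform itself. The sketch below is one possible guard, assuming the `app` config above with base64_encode = true and Terraform 1.5 or newer for `check` blocks; the local and check names are hypothetical.

```hcl
locals {
  # Base64 turns every 3 raw bytes into 4 characters, so this is an upper
  # bound (off by at most 2 padding bytes) on the size EC2 measures.
  app_user_data_bytes = floor(length(data.cloudinit_config.app.rendered) * 3 / 4)
}

# A failed assert in a check block is reported as a warning; it does not block apply.
check "user_data_size" {
  assert {
    condition     = local.app_user_data_bytes <= 16384
    error_message = "user_data is about ${local.app_user_data_bytes} bytes; the EC2 limit is 16384."
  }
}
```

The size is derived from the base64 length because the gzip output is binary and cannot be decoded into a Terraform string.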

templatefile for dynamic content

cloudinit and templatefile are an ideal pair. A file template with variables:

templates/register.sh.tpl:

bash
#!/usr/bin/env bash
set -euo pipefail
# env and cluster are filled in by Terraform's templatefile() before boot
ENV="${env}"
CLUSTER="${cluster}"
echo "Registering in $CLUSTER ($ENV)" >> /var/log/register.log
# The identity document comes from the instance metadata service
aws ecs register-container-instance \
  --cluster "$CLUSTER" \
  --instance-identity-document "$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document)"

In HCL:

hcl
part {
  content_type = "text/x-shellscript"
  content      = templatefile("${path.module}/templates/register.sh.tpl", {
    env     = var.env
    cluster = aws_ecs_cluster.app.name
  })
}

templatefile substitutes ${env} and ${cluster} when Terraform evaluates the expression, which is at plan time when both values are already known. The result is bash that is ready to run.
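One thing to watch: bash's own brace syntax collides with the template syntax, and a doubled dollar sign is the escape. The sketch below shows the rule with an inline heredoc, which follows the same interpolation rules as a template file; `env_name`, `register_script`, and `LOG_DIR` are hypothetical names used only for illustration.

```hcl
locals {
  env_name = "staging" # hypothetical value for illustration

  # Inside the heredoc:
  #   ${local.env_name}  is replaced by Terraform while rendering
  #   $${LOG_DIR:-...}   is an escape; bash receives ${LOG_DIR:-...} untouched
  register_script = <<-EOT
    #!/usr/bin/env bash
    set -euo pipefail
    ENV="${local.env_name}"
    LOG_DIR="$${LOG_DIR:-/var/log}"
    echo "Registering ($ENV)" >> "$LOG_DIR/register.log"
  EOT
}
```

Plain `$ENV` and `$LOG_DIR` need no escaping, because only the dollar-brace form is treated as an interpolation.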

Full example: EC2 with a stack

hcl
resource "tls_private_key" "deploy" {
  algorithm = "ED25519"
}
resource "aws_key_pair" "deploy" {
  key_name   = "deploy"
  public_key = tls_private_key.deploy.public_key_openssh
}
data "cloudinit_config" "web" {
  gzip          = true
  base64_encode = true
  part {
    content_type = "text/cloud-config"
    content = yamlencode({
      users = [{
        name                = "deploy"
        sudo                = "ALL=(ALL) NOPASSWD:ALL"
        ssh_authorized_keys = [tls_private_key.deploy.public_key_openssh]
      }]
      package_update = true
      packages       = ["nginx"]
    })
  }
  part {
    content_type = "text/x-shellscript"
    content = templatefile("${path.module}/templates/configure.sh.tpl", {
      site_name = var.site_name
    })
  }
}
resource "aws_instance" "web" {
  ami                    = data.aws_ami.ubuntu.id
  instance_type          = "t3.micro"
  key_name               = aws_key_pair.deploy.key_name
  vpc_security_group_ids = [aws_security_group.web.id]
  # gzip + base64 output goes through user_data_base64, not user_data
  user_data_base64 = data.cloudinit_config.web.rendered
  tags = {
    Name = "${var.site_name}-web"
  }
  lifecycle {
    ignore_changes = [user_data_base64] # see Gotchas
  }
}

Gotchas

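  • user_data runs ONCE, on the first boot only. When it changes, Terraform plans a change to the instance: a replacement or a stop/start, depending on the AWS provider version and the user_data_replace_on_change argument. If you do not want that (the new config is only meant for future instances), add the argument you set (user_data or user_data_base64) to lifecycle.ignore_changes. Otherwise any edit to the scripts means a planned disruption across the fleet.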

  • The 16 KiB limit. It applies to the raw payload, before base64 encoding, so with gzip = true it is the compressed size that counts. Much more fits that way, but it is still not a file store. Pull large artifacts from S3 from within the user_data script.

  • Errors in cloud-config are invisible to Terraform. The apply succeeds, the instance starts, the user_data "ran." To see what actually happened, check /var/log/cloud-init.log and /var/log/cloud-init-output.log on the instance itself, or run cloud-init status --long. Terraform does not know that one of your packages failed to install.

  • YAML indentation. yamlencode() solves most of it, but if you write inline YAML the indentation is critical. Cloud-init YAML is strict.

  • runcmd runs AFTER packages. Commands in runcmd can rely on anything listed in packages being installed. bootcmd runs much earlier, before packages are installed, and on every boot, so it cannot rely on them.

  • gzip is not for everyone. Old cloud-init (CentOS 6, Amazon Linux 1) may fail to unpack it. A current one (Ubuntu 20+, AL2/AL2023) is OK.

  • Sensitive data in user_data. If a part holds a password or a token, mark the output sensitive and protect the state. cloudinit_config will not mark it for you.

  • Test locally with cloud-init schema --config-file .... The cloud-init utility can validate the YAML before it goes to the instance. Useful in pre-commit.
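The sensitive-data item above can be sketched as follows, assuming the `web` config from the full example; the local and output names are hypothetical.

```hcl
locals {
  # sensitive() hides the value in plan and apply output.
  # It is still written to the state file, so the state needs protecting too.
  web_user_data = sensitive(data.cloudinit_config.web.rendered)
}

output "web_user_data" {
  value     = local.web_user_data
  sensitive = true # Terraform rejects a sensitive value in an output without this flag
}
```

Referencing `local.web_user_data` in the instance instead of the raw `rendered` attribute keeps the value redacted in the plan for that resource as well.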

§ commands

bash
terraform -chdir=. console

You can pull data.cloudinit_config.x.rendered and base64 -d it to see the final MIME.

bash
echo "$rendered" | base64 -d | gunzip | head -50

See the assembled multi-part with your own eyes. Drop gunzip when the config has gzip = false. Useful when debugging 'why did cloud-init not run'.
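An alternative that avoids decoding altogether is a throwaway copy of the config with compression and encoding switched off. This is only a sketch: the `app_debug` data source, its single shortened part, and the output name are assumptions made for illustration.

```hcl
# Hypothetical debug twin of the "app" config: no compression and no
# encoding, so rendered is plain MIME text.
data "cloudinit_config" "app_debug" {
  gzip          = false
  base64_encode = false

  part {
    filename     = "init.cfg"
    content_type = "text/cloud-config"
    content      = yamlencode({ packages = ["docker.io"] })
  }
}

output "app_debug_mime" {
  value = data.cloudinit_config.app_debug.rendered
}
```

After an apply, `terraform output -raw app_debug_mime` prints the MIME headers, boundaries, and part bodies as readable text.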

bash
cloud-init schema --config-file cloud-config.yaml

Local validation of cloud-config YAML before user_data flies to the instance.

§ see also

  • Utility providers: random, time, null, terraform_data (tf-utility-providers)
    Providers that do not manage a cloud, they help HCL itself. `random` generates IDs and passwords. `time` handles delays and timestamp marks. `null` is the deprecated "non-resource" for triggers. `terraform_data` is the modern replacement for `null_resource`, built into Terraform. Each one removes a specific limitation of the declarative approach.
  • archive, external, http: outside data in HCL (tf-archive-external-http)
    Three providers for pulling data into Terraform from the outside. `archive` packs files into a zip (lambda code, layers). `external` runs any script with JSON I/O. `http` does a GET request to a URL and parses the response. All three are data sources: they read, they do not write. They are useful where declarative HCL falls short.
  • AWS provider: configuration and where Terraform finds your keys (aws-provider)
    The AWS provider looks for credentials in several places in order: env variables, ~/.aws/credentials, the instance IAM role. Usually `aws configure` locally or a role on EC2 is enough, and you configure nothing else.
  • Resource block: the main building block of Terraform (tf-resource-block)
    A resource block tells Terraform "create this thing in the cloud." It has three parts: the resource type (what it is), the name (how you refer to it internally), and the arguments (how to configure it). Writing these blocks is what you spend 90% of your time doing in Terraform.