cloudinit provider: user_data for EC2 and more

Why cloudinit

On first boot, an EC2 instance runs cloud-init, which reads user_data (any text, up to 16 KiB). The formats you will meet most often:

cloud-config YAML: declarative directives (create a user, write a file, install a package).
shell script: starts with #!/bin/bash and runs as-is.
boothook: runs early in boot, before the main cloud-init stages.
mime-multipart: several of the above in one payload.

When the configuration is complex (you need both YAML and a script), people write the multi-part MIME by hand: boundaries, headers, base64 encoding. That is tedious and easy to get wrong. The cloudinit provider assembles the MIME for you.
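For comparison, here is a rough sketch of what the by-hand approach looks like when written as a Terraform heredoc. The boundary string and the local names are made up for illustration, and the payload is deliberately tiny; real hand-written user_data tends to be much longer and more fragile.

```hcl
locals {
  # Hypothetical boundary marker; any string that never occurs in the parts works.
  mime_boundary = "CLOUDINIT-PART"

  # Hand-assembled multi-part user_data: one cloud-config part, one shell part.
  manual_user_data = <<-EOT
    Content-Type: multipart/mixed; boundary="${local.mime_boundary}"
    MIME-Version: 1.0

    --${local.mime_boundary}
    Content-Type: text/cloud-config; charset="us-ascii"

    #cloud-config
    packages:
      - nginx

    --${local.mime_boundary}
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    echo "hello from user_data" > /tmp/hello

    --${local.mime_boundary}--
  EOT
}
```

Every header, blank line, and closing boundary here is something you have to maintain yourself, which is exactly the part cloudinit_config takes over.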

Install

hcl

terraform {
  required_providers {
    cloudinit = {
      source  = "hashicorp/cloudinit"
      version = "~> 2.3"
    }
  }
}

The provider needs no configuration block: everything is rendered locally.

cloudinit_config, the simple case

hcl

data "cloudinit_config" "bootstrap" {
  gzip          = false
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content = yamlencode({
      package_update  = true
      package_upgrade = false
      packages = [
        "nginx",
        "jq",
      ]
      write_files = [
        {
          path        = "/etc/nginx/conf.d/app.conf"
          content     = file("${path.module}/templates/nginx.conf")
          owner       = "root:root"
          permissions = "0644"
        },
      ]
      runcmd = [
        "systemctl enable nginx",
        "systemctl start nginx",
      ]
    })
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  user_data     = data.cloudinit_config.bootstrap.rendered
}

What happened:

cloudinit_config assembled a MIME blob with one part (YAML).
base64_encode = true, so the result is base64 (the EC2 API takes user data base64-encoded).
data.cloudinit_config.bootstrap.rendered is the ready blob.

Without the provider you would have to write the MIME boundaries and do the base64 encoding by hand.
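A variation worth knowing: aws_instance also has a user_data_base64 argument, which the AWS provider documents as the one to use for data that is already base64-encoded (gzipped data in particular). The sketch below assumes the data.cloudinit_config.bootstrap and data.aws_ami.ubuntu objects from the example above; the resource name web_b64 is hypothetical.

```hcl
resource "aws_instance" "web_b64" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"

  # rendered is already base64 (base64_encode = true), so pass it through the
  # argument intended for pre-encoded data instead of user_data.
  user_data_base64 = data.cloudinit_config.bootstrap.rendered
}
```

Set only one of user_data and user_data_base64 on a given instance.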

Multi-part: YAML + bash

Sometimes you need both cloud-config (declarative) and a shell script (for what cloud-config cannot do).

hcl

data "cloudinit_config" "app" {
  gzip          = true
  base64_encode = true

  part {
    filename     = "init.cfg"
    content_type = "text/cloud-config"
    content = yamlencode({
      users = [
        {
          name                = "appuser"
          sudo                = "ALL=(ALL) NOPASSWD:ALL"
          shell               = "/bin/bash"
          ssh_authorized_keys = [tls_private_key.deploy.public_key_openssh]
        },
      ]
      package_update = true
      packages       = ["docker.io", "awscli"]
    })
  }

  part {
    filename     = "register.sh"
    content_type = "text/x-shellscript"
    content = templatefile("${path.module}/templates/register.sh.tpl", {
      env     = var.env
      cluster = aws_ecs_cluster.app.name
    })
  }
}

Two parts: cloud-config plus shell. Cloud-init runs the YAML first (create the user, install packages), then the shell (register in the ECS cluster through the AWS CLI).

`gzip = true`

user_data on EC2 has a 16 KiB limit, measured before base64 encoding. Gzip helps a lot with text: a 30 KB config can compress to roughly 5-7 KB. Cloud-init decompresses it on its own.
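One way to catch an oversized payload at plan time rather than at launch is a postcondition on the data source. This is a sketch under a few assumptions: Terraform 1.2 or newer (for postcondition), a hypothetical data source name "sized", and a size estimate derived from the base64 text, which can overshoot the true raw size by a couple of bytes because of padding.

```hcl
data "cloudinit_config" "sized" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content      = yamlencode({ packages = ["nginx"] })
  }

  lifecycle {
    postcondition {
      # rendered is base64 text: every 4 characters carry at most 3 raw bytes,
      # so length * 3 / 4 is an upper bound on the gzipped size in bytes.
      condition     = length(self.rendered) * 3 / 4 <= 16384
      error_message = "Rendered user_data exceeds the 16 KiB EC2 limit."
    }
  }
}
```

If the check fails, the usual way out is to move large files to S3 and fetch them from the script.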

templatefile for dynamic content

cloudinit_config pairs well with templatefile. A template file with variables:

templates/register.sh.tpl:

bash

#!/usr/bin/env bash
set -euo pipefail

# env and cluster are template variables, filled in by Terraform's templatefile().
ENV="${env}"
CLUSTER="${cluster}"

echo "Registering in $CLUSTER ($ENV)" >> /var/log/register.log

aws ecs register-container-instance \
  --cluster "$CLUSTER" \
  --instance-identity-document "$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document)"

In HCL:

hcl

part {
  content_type = "text/x-shellscript"
  content = templatefile("${path.module}/templates/register.sh.tpl", {
    env     = var.env
    cluster = aws_ecs_cluster.app.name
  })
}

templatefile substitutes ${env} and ${cluster} when Terraform evaluates the configuration (at plan time, if the values are already known). The result is bash that is ready to run.
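One thing to watch when a template contains bash: Terraform treats every ${...} as its own interpolation, so a bash-style ${VAR} has to be written as $${VAR} to survive rendering. Plain $VAR and $(command) pass through untouched. The sketch below shows the rule with an inline heredoc, which uses the same template syntax as templatefile; the local names are hypothetical.

```hcl
locals {
  demo_env = "dev"

  # Inline stand-in for a small template file, to show the escaping rule.
  demo_script = <<-EOT
    #!/usr/bin/env bash
    ENV="${local.demo_env}"
    # $${ENV} below is emitted as a literal bash expansion, not interpolated.
    echo "log file: /var/log/$${ENV}-register.log"
  EOT
}
```

In the rendered script the first assignment carries the Terraform value, while the echo line keeps a bash variable expansion for the shell to resolve at run time.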

Full example: EC2 with a stack

hcl

resource "tls_private_key" "deploy" {
  algorithm = "ED25519"
}

resource "aws_key_pair" "deploy" {
  key_name   = "deploy"
  public_key = tls_private_key.deploy.public_key_openssh
}

data "cloudinit_config" "web" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content = yamlencode({
      users = [{
        name                = "deploy"
        sudo                = "ALL=(ALL) NOPASSWD:ALL"
        ssh_authorized_keys = [tls_private_key.deploy.public_key_openssh]
      }]
      package_update = true
      packages       = ["nginx"]
    })
  }

  part {
    content_type = "text/x-shellscript"
    content = templatefile("${path.module}/templates/configure.sh.tpl", {
      site_name = var.site_name
    })
  }
}

resource "aws_instance" "web" {
  ami                    = data.aws_ami.ubuntu.id
  instance_type          = "t3.micro"
  key_name               = aws_key_pair.deploy.key_name
  vpc_security_group_ids = [aws_security_group.web.id]
  user_data              = data.cloudinit_config.web.rendered

  tags = {
    Name = "${var.site_name}-web"
  }

  lifecycle {
    ignore_changes = [user_data] # see Gotchas
  }
}

Gotchas

user_data runs ONCE, on the first boot only. Changing user_data on an existing instance does not re-run it: depending on the AWS provider version and the user_data_replace_on_change setting, Terraform plans either an in-place update (stop and start) or a full replacement. If the new config is only meant for future instances, use lifecycle.ignore_changes = [user_data]. Otherwise any edit to the scripts can mean a planned disruption or teardown of the fleet.
The 16 KiB limit. It applies before base64 encoding. With gzip = true much more fits, but user_data is still not a file store. Pull large artifacts from S3 from inside the user_data script.
Errors in cloud-config are invisible to Terraform. The apply succeeds, the instance starts, the user_data "ran." To see what actually happened, run tail -f /var/log/cloud-init.log on the instance itself (script output lands in /var/log/cloud-init-output.log). Terraform does not know that one of your packages failed to install.
YAML indentation. yamlencode() solves most of it, but if you write inline YAML the indentation is critical. Cloud-init YAML is strict.
runcmd runs AFTER packages. If a runcmd entry depends on an installed package, you are fine. bootcmd runs much earlier, before packages are installed, so a bootcmd entry that needs a package fails.
gzip is not for everyone. Old cloud-init (CentOS 6, Amazon Linux 1) may fail to unpack it. A current one (Ubuntu 20+, AL2/AL2023) is OK.
Sensitive data in user_data. If a part holds a password or a token, mark the output sensitive and protect the state. cloudinit_config will not mark it for you.
Test locally with cloud-init schema --config-file .... The cloud-init utility can validate the YAML before it goes to the instance. Useful in pre-commit.
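Two of the gotchas above translate directly into configuration. The sketch below is illustrative, not a recommendation: it assumes the data.cloudinit_config.web and data.aws_ami.ubuntu objects from the full example, an AWS provider version that supports user_data_replace_on_change, and hypothetical names web_replace and web_user_data.

```hcl
resource "aws_instance" "web_replace" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
  user_data     = data.cloudinit_config.web.rendered

  # Opposite policy to ignore_changes: any change to user_data destroys and
  # recreates the instance, so the new config really runs on a first boot.
  user_data_replace_on_change = true
}

# Hypothetical output. sensitive = true hides the blob in plan/apply output;
# the value is still stored unmasked in the state, so protect the state too.
output "web_user_data" {
  value     = data.cloudinit_config.web.rendered
  sensitive = true
}
```

Pick one policy per instance: ignore_changes when user_data changes should only affect future instances, user_data_replace_on_change when every change must be rolled out by replacement.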

