linuxlab.io
Copyright © 2026 LinuxLab. All rights reserved.

kb/providers ── Providers ── intermediate

archive, external, http: outside data in HCL

Three providers for pulling data into Terraform from the outside. `archive` packs files into a zip (lambda code, layers). `external` runs any script with JSON I/O. `http` sends a request (GET by default) to a URL and returns the response body as a string; parsing it is up to you. All three are data sources: they read inputs and create no infrastructure. They are useful where declarative HCL falls short.

aka: terraform-archive, terraform-external, terraform-http-data

Why

HCL is declarative. Sometimes you need to build something: pack a directory into a zip, read JSON from a CI secret store, ask ifconfig.io for your current IP. These three data providers close that gap while you stay in the "everything is described in HCL" paradigm.

archive_file

Packs local files into a zip. The main case is aws_lambda_function.filename.

hcl
terraform {
  required_providers {
    archive = {
      source  = "hashicorp/archive"
      version = "~> 2.6"
    }
  }
}
data "archive_file" "lambda" {
  type        = "zip"
  source_file = "${path.module}/lambda/handler.py"
  output_path = "${path.module}/lambda.zip"
}
resource "aws_lambda_function" "demo" {
  function_name    = "demo"
  filename         = data.archive_file.lambda.output_path
  source_code_hash = data.archive_file.lambda.output_base64sha256
  handler          = "handler.main"
  runtime          = "python3.12"
  role             = aws_iam_role.lambda.arn
}

The key point:

  • output_base64sha256 is the base64-encoded SHA-256 hash of the zip. You pass it to Lambda as source_code_hash. When the code changes, the hash changes, Terraform sees the diff on the next plan, and apply uploads the new package. Without it, filename stays the same, so Terraform never sees the change in your sources and never updates the function.

A whole directory

hcl
data "archive_file" "lambda" {
  type        = "zip"
  source_dir  = "${path.module}/lambda/"
  output_path = "${path.module}/lambda.zip"
  excludes = [
    "__pycache__",
    "*.pyc",
    "tests/**",
  ]
}

Handy for functions that ship with their dependencies (vendored Python requirements, node_modules).
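
The same pattern works for a Lambda layer. A minimal sketch, assuming a hypothetical layer/ directory with the vendored packages and an assumed layer name; neither comes from the original example:

```hcl
# Assumption: ./layer/python/... holds the vendored packages.
data "archive_file" "deps_layer" {
  type        = "zip"
  source_dir  = "${path.module}/layer"
  output_path = "${path.module}/layer.zip"
}

resource "aws_lambda_layer_version" "deps" {
  layer_name          = "demo-deps" # hypothetical name
  filename            = data.archive_file.deps_layer.output_path
  source_code_hash    = data.archive_file.deps_layer.output_base64sha256
  compatible_runtimes = ["python3.12"]
}
```

As with the function, source_code_hash is what lets Terraform notice that the packed dependencies changed.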

Inline source

hcl
data "archive_file" "config" {
  type        = "zip"
  output_path = "${path.module}/config.zip"
  source {
    content  = jsonencode({ feature_flags = { ... } })
    filename = "config.json"
  }
}

The file is generated from an HCL value. Useful for config payloads that are computed dynamically.
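
A sketch of the same idea driven by a variable, with a second generated file in the same archive. The variable feature_flags, its default, and the README.txt entry are hypothetical and only illustrate that source blocks can be repeated:

```hcl
# Hypothetical input; replace with whatever value you compute.
variable "feature_flags" {
  type    = map(bool)
  default = { new_checkout = false }
}

data "archive_file" "config_from_var" {
  type        = "zip"
  output_path = "${path.module}/config-from-var.zip"

  # Each source block becomes one file inside the zip.
  source {
    content  = jsonencode({ feature_flags = var.feature_flags })
    filename = "config.json"
  }

  source {
    content  = "generated by terraform\n"
    filename = "README.txt"
  }
}
```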

external

Runs an arbitrary script. The script reads JSON from stdin and writes JSON to stdout.

hcl
terraform {
  required_providers {
    external = {
      source  = "hashicorp/external"
      version = "~> 2.3"
    }
  }
}
data "external" "git_info" {
  program = ["bash", "${path.module}/scripts/git-info.sh"]
  query = {
    repo = path.cwd
  }
}
output "git_commit" {
  value = data.external.git_info.result.commit
}

scripts/git-info.sh:

bash
#!/usr/bin/env bash
set -euo pipefail
eval "$(jq -r '@sh "REPO=\(.repo)"')"
COMMIT=$(git -C "$REPO" rev-parse HEAD)
BRANCH=$(git -C "$REPO" rev-parse --abbrev-ref HEAD)
jq -n --arg c "$COMMIT" --arg b "$BRANCH" '{commit:$c, branch:$b}'

What matters:

  • The script must return a JSON object where every value is a string. No numbers, no nested objects. That is an API limit.
  • The script must be free of side effects. Terraform may call it on every plan, so it should only read and report. If the script changes the state of the world, every plan changes it again.
  • The script runs with the permissions of whoever ran terraform.
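
One way to live with the strings-only rule is to have the script encode numbers and lists as strings and decode them in HCL. A minimal sketch, assuming a hypothetical scripts/inventory.sh that prints {"count":"42","zones":"[\"a\",\"b\"]"}:

```hcl
data "external" "inventory" {
  # Hypothetical script; it must print a flat JSON object of strings.
  program = ["bash", "${path.module}/scripts/inventory.sh"]
}

locals {
  # "42" (string) becomes 42 (number).
  node_count = tonumber(data.external.inventory.result.count)

  # A JSON document passed as one string becomes a list.
  zones = jsondecode(data.external.inventory.result.zones)
}
```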

Use it when:

  • You need to read a value from a vault or CI secret manager that has no Terraform provider.
  • You need to compute a value through some involved logic (curl + jq + sed).
  • Build-time integration: "pull the docker tag from CI".

Anti-case: do not use external for side effects. It is a data source, not a resource.
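
If a side effect is really needed, a resource with a provisioner models it more honestly than a data source. A sketch using the built-in terraform_data resource (Terraform 1.4 or newer); the variable release_tag and the script scripts/notify.sh are hypothetical:

```hcl
variable "release_tag" {
  type = string
}

resource "terraform_data" "notify_release" {
  # The resource is replaced, and the provisioner re-run,
  # only when the tag changes.
  triggers_replace = [var.release_tag]

  provisioner "local-exec" {
    command = "${path.module}/scripts/notify.sh" # hypothetical script
    environment = {
      RELEASE_TAG = var.release_tag
    }
  }
}
```

The script here runs during apply, once per change of the trigger, instead of on every plan.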

http

A GET request to a URL, with the response landing in a data source.

hcl
terraform {
  required_providers {
    http = {
      source  = "hashicorp/http"
      version = "~> 3.4"
    }
  }
}
data "http" "my_ip" {
  url = "https://ifconfig.io"
}
resource "aws_security_group_rule" "allow_my_ip" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["${chomp(data.http.my_ip.response_body)}/32"]
  security_group_id = aws_security_group.bastion.id
}

Arguments:

  • url is the endpoint.
  • request_headers is a map, for auth or User-Agent.
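  • method defaults to GET; the provider also accepts HEAD and POST. POST is meant for read-only calls such as submitting a search, not for changing anything.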
  • request_body is the body for POST.

Attributes:

  • response_body is the response text.
  • status_code is the HTTP code.
  • response_headers is a map.
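
A short sketch that ties the arguments and attributes together. The endpoint is a placeholder, and the assumption is that it answers with a JSON object:

```hcl
data "http" "flags" {
  url = "https://config.example.com/feature-flags"

  request_headers = {
    Accept = "application/json"
  }
}

locals {
  # response_body is a string; decode it before reading fields.
  flags = jsondecode(data.http.flags.response_body)
}

output "flags_status" {
  value = data.http.flags.status_code
}
```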

Validating status_code:

hcl
data "http" "config" {
  url = "https://config.example.com/feature-flags"
  lifecycle {
    postcondition {
      condition     = self.status_code == 200
      error_message = "Config server returned ${self.status_code}, expected 200"
    }
  }
}

If the condition fails, plan or apply stops with that message. See the lesson on precondition/postcondition.
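
The same mechanism can check the body as well as the code. A sketch, assuming the endpoint is expected to return JSON; the resource name config_checked is made up for this example:

```hcl
data "http" "config_checked" {
  url = "https://config.example.com/feature-flags"

  lifecycle {
    postcondition {
      condition     = self.status_code == 200
      error_message = "Config server returned ${self.status_code}, expected 200"
    }

    # can() turns a jsondecode failure into false instead of an error.
    postcondition {
      condition     = can(jsondecode(self.response_body))
      error_message = "Config server did not return valid JSON."
    }
  }
}
```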

When to use what

| Task | Provider |
| --- | --- |
| Pack lambda code into a zip | archive_file |
| Read a value from an arbitrary source (vault, CLI) | external |
| Get JSON from a REST API | http (or external + curl) |
| Read something from git or the AWS CLI | external |
| Find the current public IP dynamically | http |
| Generate a file from a variable | archive_file with source.content |

Pitfalls

  • All three are data sources. They are evaluated on every plan. If external calls a slow script or http hits a slow endpoint, you pay that cost on every plan. Do the caching inside the script or on the API side.

  • An external script trips on any value that is not a string. Return 42 as a number and Terraform fails. Make it a string: {"count":"42"}. Parse the number back with tonumber(data.external.x.result.count).

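  • http follows redirects on its own. The data source is built on Go's HTTP client, which follows redirects by default, and the documented arguments of hashicorp/http 3.x do not include a switch for it. status_code and response_body describe the final response. To catch the first 301 itself, use external with curl.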

  • archive_file tracks changes by a hash of the archive contents, not by mtime. That is correct, and terraform plan stays stable. But if files that change on every build land in the zip (timestamps in byte-code), every plan shows a diff. Clean such files out with excludes.

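  • http has no built-in auth helpers; it is aimed at public endpoints. If the endpoint needs a bearer token, pass it through request_headers = { Authorization = "Bearer ${var.token}" }. The token ends up in state. Mark the variable as sensitive to keep it out of plan output, and protect the state itself, because sensitive does not encrypt it.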

  • external is a security risk. The script runs with the permissions of the terraform user. If the HCL comes from an untrusted source, this is remote code execution. Do not use external in reusable modules that pull other people's code.

  • http.response_body is always a string. A JSON response has to be parsed with jsondecode(data.http.x.response_body). Do not try to write data.http.x.response_body.field: a string has no attributes, so Terraform rejects the expression.

  • LocalStack is not needed for these providers. None of the three go to AWS. Tests with them are pure HCL.
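
The bearer-token pitfall, sketched in HCL. The variable name api_token and the endpoint are assumptions made for this example:

```hcl
variable "api_token" {
  type      = string
  sensitive = true # hides the value in plan output, not in state
}

data "http" "private_config" {
  url = "https://config.example.com/feature-flags"

  request_headers = {
    Authorization = "Bearer ${var.api_token}"
    Accept        = "application/json"
  }
}
```

The header value is still written to state in plain text, so the state backend needs its own access control.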

§ commands

bash
terraform plan

archive_file rebuilds the zip if the files changed. external/http run again. You see the reads at the start of the plan.

bash
terraform refresh

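Re-read all data sources. Useful when the external source has changed and you are not ready to apply yet. In current Terraform the command is deprecated and behaves like terraform apply -refresh-only -auto-approve; terraform apply -refresh-only does the same but asks for confirmation before writing state.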

bash
terraform console

You can pull data.archive_file.x.output_base64sha256 and compare it with the previous value. Good for debugging 'why is the function recreated'.

§ see also

  • tf-data-source ── data block: reading what already exists in the cloud ── data is a block that queries existing infrastructure and returns its attributes to HCL. Terraform creates nothing; it only reads. Use it to reference resources that were not created by Terraform or that live in a different project.
  • tf-utility-providers ── Utility providers: random, time, null, terraform_data ── Providers that do not manage a cloud; they help HCL itself. `random` generates IDs and passwords. `time` handles delays and timestamp marks. `null` is the deprecated "non-resource" for triggers. `terraform_data` is the modern replacement for `null_resource`, built into Terraform. Each one removes a specific limitation of the declarative approach.
  • tf-cloudinit-provider ── cloudinit provider: user_data for EC2 and more ── The `cloudinit` provider builds a multi-part MIME blob for EC2 `user_data`. `data "cloudinit_config"` takes several `part` blocks (cloud-config YAML, shell-script, jinja, and so on) and packs them into one blob. It replaces hand base64-encoding of a single string and lets you assemble the config from pieces.
  • tf-resource-block ── Resource block: the main building block of Terraform ── A resource block tells Terraform "create this thing in the cloud." It has three parts: the resource type (what it is), the name (how you refer to it internally), and the arguments (how to configure it). Writing these blocks is what you spend 90% of your time doing in Terraform.