Why
HCL is declarative. Sometimes you need to build something: pack a
directory into a zip, read JSON from a CI secret store, ask ifconfig.io
for your current IP. These three data providers close that gap while you
stay in the "everything is described in HCL" paradigm.
archive_file
Packs local files into a zip. The main case is aws_lambda_function.filename.
terraform {
  required_providers {
    archive = {
      source  = "hashicorp/archive"
      version = "~> 2.6"
    }
  }
}

data "archive_file" "lambda" {
  type        = "zip"
  source_file = "${path.module}/lambda/handler.py"
  output_path = "${path.module}/lambda.zip"
}

resource "aws_lambda_function" "demo" {
  function_name    = "demo"
  filename         = data.archive_file.lambda.output_path
  source_code_hash = data.archive_file.lambda.output_base64sha256
  handler          = "handler.main"
  runtime          = "python3.12"
  role             = aws_iam_role.lambda.arn
}
The key point: output_base64sha256 is the base64-encoded SHA-256 hash of the zip file. You pass it to Lambda as source_code_hash. When the code changes, the hash changes, Terraform sees the difference, and it updates the function. Without this, Terraform never sees the change in your sources and never updates the function.
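One way to watch this mechanism is to expose the hash as an output and compare it between runs. The following is a minimal sketch, not part of the lesson's project: the names probe and probe_hash and the probe.zip path are made up for illustration.

```hcl
# Hypothetical probe archive; it packs the same handler file as the example above.
data "archive_file" "probe" {
  type        = "zip"
  source_file = "${path.module}/lambda/handler.py"
  output_path = "${path.module}/probe.zip"
}

# The value changes only when the archived contents change,
# which is exactly the signal source_code_hash relies on.
output "probe_hash" {
  value = data.archive_file.probe.output_base64sha256
}
```

Edit handler.py, run terraform plan again, and the output value should differ from the previous run.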
A whole directory
data "archive_file" "lambda" {
  type        = "zip"
  source_dir  = "${path.module}/lambda/"
  output_path = "${path.module}/lambda.zip"

  excludes = [
    "__pycache__",
    "*.pyc",
    "tests/**",
  ]
}
Handy for functions with requirements/node_modules.
Inline source
data "archive_file" "config" {
  type        = "zip"
  output_path = "${path.module}/config.zip"

  source {
    content  = jsonencode({ feature_flags = { ... } })
    filename = "config.json"
  }
}
The file is generated from an HCL value. Useful for config payloads that are computed dynamically.
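As a concrete illustration of a dynamically computed payload, the sketch below feeds a variable into the archive and adds a second generated file. Everything named here (the feature_flags variable, its keys, config_bundle, and the file names) is an assumption for the example, not something taken from the lesson.

```hcl
# Hypothetical input; the flag names are placeholders.
variable "feature_flags" {
  type = map(bool)
  default = {
    dark_mode = true
    beta_api  = false
  }
}

data "archive_file" "config_bundle" {
  type        = "zip"
  output_path = "${path.module}/config-bundle.zip"

  # Each source block becomes one file inside the zip.
  source {
    content  = jsonencode({ feature_flags = var.feature_flags })
    filename = "config.json"
  }

  source {
    content  = "Generated by Terraform. Do not edit by hand.\n"
    filename = "README.txt"
  }
}
```

Changing a flag in the variable changes the zip contents, so anything that consumes output_base64sha256 picks up the change on the next plan.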
external
Runs an arbitrary script. The script reads JSON from stdin and writes JSON to stdout.
terraform {
  required_providers {
    external = {
      source  = "hashicorp/external"
      version = "~> 2.3"
    }
  }
}

data "external" "git_info" {
  program = ["bash", "${path.module}/scripts/git-info.sh"]

  query = {
    repo = path.cwd
  }
}

output "git_commit" {
  value = data.external.git_info.result.commit
}
scripts/git-info.sh:
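#!/usr/bin/env bash
set -euo pipefail
# Read the JSON query from stdin; jq's @sh shell-quotes the value before eval.
eval "$(jq -r '@sh "REPO=\(.repo)"')"
COMMIT=$(git -C "$REPO" rev-parse HEAD)
BRANCH=$(git -C "$REPO" rev-parse --abbrev-ref HEAD)
# Emit a flat JSON object whose values are all strings.
jq -n --arg c "$COMMIT" --arg b "$BRANCH" '{commit:$c, branch:$b}'

What matters: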
- The script must return a JSON object where every value is a string. No numbers, no nested objects. That is an API limit.
- The script must be idempotent. Terraform may call it on every plan. If the script changes the state of the world, that is bad.
- The script runs with the permissions of whoever ran terraform.
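The strings-only rule in the list above can be worked around on the HCL side by decoding values after the fact. The sketch below assumes a hypothetical script, scripts/build-info.sh, that prints an object such as {"count":"42","tags_json":"[\"a\",\"b\"]"}; the script, its keys, and the local names are all invented for illustration.

```hcl
# Hypothetical script: every value it returns is a string,
# including the JSON-encoded list under tags_json.
data "external" "build_info" {
  program = ["bash", "${path.module}/scripts/build-info.sh"]
}

locals {
  # Turn the string "42" back into a number.
  build_count = tonumber(data.external.build_info.result.count)

  # Nested data travels as a JSON string and is decoded here.
  build_tags = jsondecode(data.external.build_info.result.tags_json)
}
```

Packing structured data into a JSON string keeps the script within the API limit while still letting the configuration work with real lists and numbers.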
Use it when:
- You need to read a value from a vault or CI secret manager that has no Terraform provider.
- You need to compute a value through some involved logic (curl + jq + sed).
- Build-time integration: "pull the docker tag from CI".
Anti-case: do not use external for side effects. It is a data source, not
a resource.
http
A GET request to a URL, with the response landing in a data source.
terraform {
  required_providers {
    http = {
      source  = "hashicorp/http"
      version = "~> 3.4"
    }
  }
}

data "http" "my_ip" {
  url = "https://ifconfig.io"
}

resource "aws_security_group_rule" "allow_my_ip" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["${chomp(data.http.my_ip.response_body)}/32"]
  security_group_id = aws_security_group.bastion.id
}
Arguments:
- url is the endpoint.
- request_headers is a map, for auth or User-Agent.
- method defaults to GET; HEAD and POST are also allowed.
- request_body is the body for POST.
Attributes:
- response_body is the response text.
- status_code is the HTTP code.
- response_headers is a map.
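To see the arguments and attributes side by side, here is a small sketch that sends a request header and exposes what came back. The endpoint reuses the lesson's example host; the names flags_probe, flags_status, and flags_headers are hypothetical.

```hcl
data "http" "flags_probe" {
  url = "https://config.example.com/feature-flags"

  # Ask the server for JSON; any other header goes here the same way.
  request_headers = {
    Accept = "application/json"
  }
}

# HTTP status code of the response.
output "flags_status" {
  value = data.http.flags_probe.status_code
}

# All response headers as a map of strings.
output "flags_headers" {
  value = data.http.flags_probe.response_headers
}
```

Printing the whole header map first is a cheap way to learn the exact key names the server returns before indexing into the map.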
Validating status_code:
data "http" "config" {
  url = "https://config.example.com/feature-flags"

  lifecycle {
    postcondition {
      condition     = self.status_code == 200
      error_message = "Config server returned ${self.status_code}, expected 200"
    }
  }
}
If the check fails, plan or apply stops with a clear message. See the lesson on precondition/postcondition.
When to use what
| Task | Provider |
|---|---|
| Pack lambda code into a zip | archive_file |
| Read a value from an arbitrary source (vault, CLI) | external |
| Get JSON from a REST API | http (or external + curl) |
| Read something from git or the AWS CLI | external |
| Find the current public IP dynamically | http |
| Generate a file from a variable | archive_file with source.content |
Pitfalls
- All three are data sources. They are evaluated on every plan. If external calls a slow script or http hits a slow endpoint, you pay that cost on every plan. Cache it on the side of the script or API itself.
- An external script trips on anything that is not a string. Return 42 as a number and Terraform fails. Make it a string: {"count":"42"}. Parse the number back with tonumber(data.external.x.result.count).
- http follows redirects on its own, as Go's HTTP client does by default; the data source has no follow_redirects argument. To catch the first 301 itself, make the request from an external script (for example with curl).
- archive_file uses a hash of the file contents, not mtime, for idempotency, so terraform plan stays stable. But if files that change on every build land in the zip (timestamps in bytecode), every plan shows a diff. Clean such files out with excludes.
- http has no dedicated auth arguments; it is aimed at public endpoints. If the endpoint needs a bearer token, pass it through request_headers = { Authorization = "Bearer ${var.token}" }. The token ends up in state, so mark it as sensitive.
- external is a security risk. The script runs with the permissions of the terraform user. If the HCL comes from an untrusted source, this is remote code execution. Do not use external in reusable modules that pull other people's code.
- http.response_body is always a string. A JSON response has to be parsed with jsondecode(data.http.x.response_body). Do not try to write data.http.x.response_body.field: the value is a string, not an object.
- LocalStack is not needed for these providers. None of the three go to AWS. Tests with them are pure HCL.
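The bearer-token and jsondecode pitfalls above combine naturally in one configuration. This is a sketch under assumptions: the api_token variable, the secured_flags name, and the dark_mode field are hypothetical, and the endpoint is the lesson's example host.

```hcl
# Hypothetical token input; sensitive keeps it out of plan output,
# but the value is still written to state.
variable "api_token" {
  type      = string
  sensitive = true
}

data "http" "secured_flags" {
  url = "https://config.example.com/feature-flags"

  request_headers = {
    Authorization = "Bearer ${var.api_token}"
    Accept        = "application/json"
  }
}

locals {
  # response_body is a plain string, so decode it before reading fields.
  flags = jsondecode(data.http.secured_flags.response_body)

  # Assumes the server returns an object with a dark_mode key.
  dark_mode_enabled = local.flags.dark_mode
}
```

Decoding once into a local keeps the rest of the configuration free of repeated jsondecode calls.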