archive, external, http: outside data in HCL

Why

HCL is declarative, but sometimes a configuration has to do something: pack a directory into a zip, read JSON from a CI secret store, ask ifconfig.io for your current IP. The archive, external, and http providers close that gap with data sources, so you stay in the "everything is described in HCL" paradigm.

archive_file

Packs local files into a zip. The main case is aws_lambda_function.filename.

hcl

terraform {
  required_providers {
    archive = {
      source  = "hashicorp/archive"
      version = "~> 2.6"
    }
  }
}

# Build lambda.zip from a single source file.
data "archive_file" "lambda" {
  type        = "zip"
  source_file = "${path.module}/lambda/handler.py"
  output_path = "${path.module}/lambda.zip"
}

resource "aws_lambda_function" "demo" {
  function_name    = "demo"
  filename         = data.archive_file.lambda.output_path
  # Changes whenever the zip changes, which is what triggers a redeploy.
  source_code_hash = data.archive_file.lambda.output_base64sha256
  handler          = "handler.main"
  runtime          = "python3.12"
  role             = aws_iam_role.lambda.arn
}

The key point:

output_base64sha256 is the base64-encoded SHA-256 of the generated zip. You pass it to the Lambda resource as source_code_hash. When the code changes, the hash changes, Terraform sees a diff on source_code_hash, and it uploads the new zip. Without this, the filename stays the same, so Terraform never sees the change in your sources and never updates the function.
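For comparison, a minimal sketch of the same wiring when the zip is built outside Terraform (say, by a CI step) instead of by archive_file. The path build/lambda.zip and the names prebuilt and demo-prebuilt are assumptions for illustration. filebase64sha256 is a built-in Terraform function that returns the same kind of value as output_base64sha256.

```hcl
# Assumption: a CI step has already produced build/lambda.zip.
locals {
  lambda_zip = "${path.module}/build/lambda.zip"
}

resource "aws_lambda_function" "prebuilt" {
  function_name = "demo-prebuilt"
  filename      = local.lambda_zip

  # Same role as output_base64sha256: the value changes whenever the
  # zip bytes change, which gives Terraform a diff to act on.
  source_code_hash = filebase64sha256(local.lambda_zip)

  handler = "handler.main"
  runtime = "python3.12"
  role    = aws_iam_role.lambda.arn # defined elsewhere, as in the example above
}
```

The zip has to exist before terraform plan runs, because the function reads the file at plan time.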

A whole directory

hcl

data "archive_file" "lambda" {
  type        = "zip"
  source_dir  = "${path.module}/lambda/"
  output_path = "${path.module}/lambda.zip"

  # Keep build debris and tests out of the archive.
  excludes = [
    "__pycache__",
    "*.pyc",
    "tests/**",
  ]
}

Handy for functions that ship their dependencies (Python requirements, node_modules) next to the code.

Inline source

hcl

data "archive_file" "config" {
  type        = "zip"
  output_path = "${path.module}/config.zip"

  # The file content comes from an HCL expression, not from disk.
  source {
    content  = jsonencode({ feature_flags = { ... } })
    filename = "config.json"
  }
}

The file is generated from an HCL value. Useful for config payloads that are computed dynamically.
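A slightly fuller sketch of the inline form, with concrete values in place of the elided ones. The flag names, the bundle name, and the second file are all hypothetical; the point is that each source block becomes one file in the archive.

```hcl
# Hypothetical flags; in practice these would come from variables or locals.
locals {
  feature_flags = {
    new_checkout = true
    beta_search  = false
  }
}

data "archive_file" "bundle" {
  type        = "zip"
  output_path = "${path.module}/bundle.zip"

  # Each source block becomes one file inside the archive.
  source {
    content  = jsonencode({ feature_flags = local.feature_flags })
    filename = "config.json"
  }

  source {
    content  = "generated by terraform\n"
    filename = "README.txt"
  }
}
```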

external

Runs an arbitrary script. The script reads JSON from stdin and writes JSON to stdout.

hcl

terraform {
  required_providers {
    external = {
      source  = "hashicorp/external"
      version = "~> 2.3"
    }
  }
}

data "external" "git_info" {
  program = ["bash", "${path.module}/scripts/git-info.sh"]

  # Sent to the script as a JSON object on stdin; values must be strings.
  query = {
    repo = path.cwd
  }
}

output "git_commit" {
  value = data.external.git_info.result.commit
}

scripts/git-info.sh:

bash

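#!/usr/bin/env bash
set -euo pipefail

# Read the query object from stdin; @sh shell-quotes the value before eval.
eval "$(jq -r '@sh "REPO=\(.repo)"')"

COMMIT=$(git -C "$REPO" rev-parse HEAD)
BRANCH=$(git -C "$REPO" rev-parse --abbrev-ref HEAD)

# Emit a flat JSON object of strings on stdout.
jq -n --arg c "$COMMIT" --arg b "$BRANCH" '{commit:$c, branch:$b}'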

What matters:

The script must return a JSON object where every value is a string. No numbers, no nested objects. That is an API limit.
The script must be idempotent and free of side effects. Terraform may call it on every plan, so a script that changes the state of the world will do so repeatedly, even when nothing is applied.
The script runs with the permissions of whoever ran terraform.
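The strings-only limit can be worked around by encoding on the script side and decoding in HCL. A minimal sketch, assuming a hypothetical script scripts/list-subnets.sh that prints the flat object shown in the comment:

```hcl
# Assumption: scripts/list-subnets.sh (hypothetical) prints a flat object such as
#   {"count": "2", "subnets_json": "[\"10.0.1.0/24\",\"10.0.2.0/24\"]"}
# Every value is a string; the list travels as a JSON-encoded string.
data "external" "subnets" {
  program = ["bash", "${path.module}/scripts/list-subnets.sh"]
}

locals {
  # Turn the strings back into typed values.
  subnet_count = tonumber(data.external.subnets.result.count)
  subnet_cidrs = jsondecode(data.external.subnets.result.subnets_json)
}
```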

Use it when:

You need to read a value from a vault or CI secret manager that has no Terraform provider.
You need to compute a value through some involved logic (curl + jq + sed).
Build-time integration: "pull the docker tag from CI".
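The build-time case can be sketched without a separate script file. Assumptions here: the CI system exports an IMAGE_TAG environment variable, jq is installed on the runner, and registry.example.com/app is a placeholder image name. Note the doubled $$, which keeps HCL from treating the shell expansion as its own interpolation.

```hcl
data "external" "image_tag" {
  # Shell command after HCL unescaping:
  #   jq -n --arg tag "${IMAGE_TAG:-latest}" '{tag: $tag}'
  program = ["bash", "-c", "jq -n --arg tag \"$${IMAGE_TAG:-latest}\" '{tag: $tag}'"]
}

locals {
  # Hypothetical registry and image name.
  image = "registry.example.com/app:${data.external.image_tag.result.tag}"
}
```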

Anti-pattern: do not use external for side effects. It is a data source, not a resource.

http

A GET request to a URL, with the response landing in a data source.

hcl

terraform {
  required_providers {
    http = {
      source  = "hashicorp/http"
      version = "~> 3.4"
    }
  }
}

data "http" "my_ip" {
  url = "https://ifconfig.io"
}

resource "aws_security_group_rule" "allow_my_ip" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  # chomp strips the trailing newline before the /32 mask is appended.
  cidr_blocks       = ["${chomp(data.http.my_ip.response_body)}/32"]
  security_group_id = aws_security_group.bastion.id
}

Arguments:

url is the endpoint.
request_headers is a map, for auth or User-Agent.
method defaults to GET; the provider also accepts HEAD and POST, but not PUT or DELETE.
request_body is the body for POST.

Attributes:

response_body is the response text.
status_code is the HTTP code.
response_headers is a map.
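A short sketch that puts the arguments and attributes together. The URL is the example endpoint used elsewhere in this lesson; the names flags and flags_content_type, and the "Content-Type" key casing, are assumptions.

```hcl
data "http" "flags" {
  url = "https://config.example.com/feature-flags"

  request_headers = {
    Accept = "application/json"
  }
}

locals {
  # response_body is a string; decode it before reaching into it.
  flags = jsondecode(data.http.flags.response_body)
}

output "flags_content_type" {
  # Assumption: the server sends the header with this exact casing.
  value = lookup(data.http.flags.response_headers, "Content-Type", "unknown")
}
```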

Validating status_code:

hcl

data "http" "config" {
  url = "https://config.example.com/feature-flags"

  lifecycle {
    # Checked right after the request; self refers to this data source.
    postcondition {
      condition     = self.status_code == 200
      error_message = "Config server returned ${self.status_code}, expected 200"
    }
  }
}

If the condition fails, the run stops with that message, usually already at plan, since data sources are read then. See the lesson on precondition/postcondition.
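The same mechanism can guard the body as well as the status. A minimal sketch, assuming the endpoint is expected to return JSON; the name config_checked is hypothetical, and can() is a built-in function that returns false when its argument fails to evaluate.

```hcl
data "http" "config_checked" {
  url = "https://config.example.com/feature-flags"

  lifecycle {
    postcondition {
      condition     = self.status_code == 200
      error_message = "Config server returned ${self.status_code}, expected 200"
    }

    # Second guard: fail early if the body cannot be parsed as JSON.
    postcondition {
      condition     = can(jsondecode(self.response_body))
      error_message = "Config server response is not valid JSON"
    }
  }
}
```

Custom conditions need Terraform 1.2 or later.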

When to use what

| Task | Data source |
| --- | --- |
| Pack Lambda code into a zip | `archive_file` |
| Read a value from an arbitrary source (vault, CLI) | `external` |
| Get JSON from a REST API | `http` (or `external` + curl) |
| Read something from git or the AWS CLI | `external` |
| Find the current public IP dynamically | `http` |
| Generate a file from a variable | `archive_file` with `source.content` |

Pitfalls

All three are data sources, so they are evaluated on every plan. If external calls a slow script or http hits a slow endpoint, you pay that cost each time. Cache inside the script or on the API side.
external fails on any value that is not a string. Return 42 as a JSON number and Terraform fails. Make it a string: {"count":"42"}. Parse the number back with tonumber(data.external.x.result.count).
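http follows redirects on its own, as the Go HTTP client underneath does by default. The data source arguments documented for 3.x do not include a follow_redirects switch, so status_code reflects the final response, not the first 301. If you need to see the redirect itself, fall back to external with curl.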
archive_file relies on a hash of the archived contents, not on mtime, for idempotency. That is the right behavior, and terraform plan stays stable. But if files that change on every build land in the zip (for example byte-code with embedded timestamps), every plan shows a diff. Clean such files out with excludes.
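http has no built-in auth mechanism; it is aimed at public endpoints. If the endpoint needs a bearer token, pass it through request_headers = { Authorization = "Bearer ${var.token}" }. The token ends up in state. Marking the variable as sensitive hides it from CLI output but not from the state file, so protect the state as well.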
external is a security risk. The script runs with the permissions of the terraform user. If the HCL comes from an untrusted source, this is remote code execution. Do not use external in reusable modules that pull other people's code.
http.response_body is always a string. A JSON response has to be parsed with jsondecode(data.http.x.response_body). Do not try to write data.http.x.response_body.field: response_body is a string, so attribute access on it is an error.
LocalStack is not needed for these providers. None of the three go to AWS. Tests with them are pure HCL.
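The bearer-token pitfall, sketched as configuration. The variable name token matches the snippet above; the data source name private_config is hypothetical and the URL is the lesson's example endpoint.

```hcl
variable "token" {
  type      = string
  sensitive = true # redacted in plan/apply output, still stored in state
}

data "http" "private_config" {
  url = "https://config.example.com/feature-flags"

  request_headers = {
    Accept        = "application/json"
    Authorization = "Bearer ${var.token}"
  }
}
```

Supplying the value through the TF_VAR_token environment variable keeps it out of .tfvars files, though it does not keep it out of state.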
