data block: reading what already exists in the cloud

Why data exists

resource creates something new. data reads something that already exists. The difference is fundamental:

resource "aws_s3_bucket" "demo" { ... }: Terraform owns this bucket. It creates, modifies, and deletes it.
data "aws_caller_identity" "current" {}: Terraform asks AWS "who am I right now?" and exposes the answer as attributes you can reference elsewhere. The result is recorded in state, but Terraform never creates, changes, or deletes anything.
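A minimal sketch of the contrast, assuming the AWS provider is already configured. Both bucket names are hypothetical placeholders. The first is a bucket this configuration would create. The second stands in for a bucket that already exists and that someone else manages.

```hcl
# Managed object: Terraform creates, updates and deletes this bucket.
# "my-demo-bucket-example" is a hypothetical, globally unique name.
resource "aws_s3_bucket" "demo" {
  bucket = "my-demo-bucket-example"
}

# Read-only lookup: Terraform only reads a bucket that already exists.
# "shared-artifacts-example" is a hypothetical bucket owned by another team;
# the plan fails if no bucket with this name exists.
data "aws_s3_bucket" "shared" {
  bucket = "shared-artifacts-example"
}

output "shared_bucket_arn" {
  value = data.aws_s3_bucket.shared.arn
}
```

Running terraform destroy on this configuration would delete the demo bucket and leave the shared one untouched, because Terraform never owned it.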

Data blocks are useful when:

Your code needs the current AWS account ID and you do not want to hardcode it.
Another team created the VPC and you need its ID for a new resource.
You want to pick the latest Ubuntu AMI without hardcoding it.

Syntax

It looks like a resource block but uses the data keyword:

```hcl
data "aws_caller_identity" "current" {}

data "aws_region" "current" {}

output "my_account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "my_region" {
  value = data.aws_region.current.name
}
```

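The reference format is data.<TYPE>.<NAME>.<ATTRIBUTE>, for example data.aws_caller_identity.current.account_id. The data. prefix is required: without it, Terraform looks for a managed resource with that type and name and fails with a reference to an undeclared resource.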
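Data source attributes can be used anywhere an expression is allowed, including string templates. The sketch below assumes the two data blocks from the Syntax example are declared in the same configuration. The logs bucket and its naming scheme are hypothetical.

```hcl
# Assumes data "aws_caller_identity" "current" {} and
# data "aws_region" "current" {} are declared, as in the Syntax example.

# Hypothetical bucket whose name embeds the account ID and region,
# e.g. "logs-123456789012-eu-central-1", so it stays unique per account and region.
resource "aws_s3_bucket" "logs" {
  bucket = "logs-${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}"
}
```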

Common AWS data sources

These appear most often in real projects:

```hcl
# Who am I right now (account ID, ARN, user ID)
data "aws_caller_identity" "current" {}

# Which region we are working in
data "aws_region" "current" {}

# List of available AZs in this region
data "aws_availability_zones" "available" {
  state = "available"
}

# Find a specific AMI
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Use the AMI
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}
```
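The names attribute of aws_availability_zones is a list, which makes it handy for spreading resources across zones. This is a sketch. It assumes the aws_availability_zones block above is in the same configuration and that the region has at least two available zones. The VPC and its CIDR range are hypothetical.

```hcl
# Hypothetical VPC so the subnets have something to attach to.
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# One subnet in each of the first two available AZs:
# cidrsubnet() carves 10.0.0.0/24 and 10.0.1.0/24 out of the /16.
resource "aws_subnet" "public" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}
```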

When to use data vs resource

The rule: if this configuration manages the object (creates and destroys it), use resource. If the object is created and managed somewhere else, use data.

Signs that you need data:

The object already exists and you did not create it.
Another team or a separate Terraform project manages it.
It is auxiliary information (account ID, region, latest AMI).
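Here is a sketch for the case where another team owns the VPC. The Name tag value shared-vpc is a hypothetical placeholder; match whatever tag the owning team actually sets. A singular lookup like this fails if no VPC matches, or if more than one does.

```hcl
# Look up a VPC that this configuration does not manage.
# "shared-vpc" is a hypothetical tag value.
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-vpc"
  }
}

# A resource we do manage, placed inside the shared VPC.
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = data.aws_vpc.shared.id
}
```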

data in LocalStack

Not all AWS data sources behave the same way in LocalStack:

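Work well: aws_caller_identity, aws_region, aws_availability_zones (they return fixed test values, such as a dummy account ID).
Work partially: aws_ami. LocalStack does not host real AMIs, so a lookup for real images such as Canonical's Ubuntu builds may find no match. An aws_ami lookup with no match fails with an error instead of returning an empty result.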
Do not work: anything tied to real AWS services that the LocalStack community edition does not emulate.

In tutorial exercises, use only the data sources that work reliably.
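The following is a minimal LocalStack-friendly sketch that sticks to the reliable data sources. It assumes LocalStack is listening on its default edge port 4566 and accepts dummy test credentials. The provider arguments shown exist in recent AWS provider versions, but check them against the version your tutorial pins.

```hcl
provider "aws" {
  region                      = "us-east-1"
  access_key                  = "test" # dummy credentials for LocalStack
  secret_key                  = "test"
  skip_credentials_validation = true
  skip_metadata_api_check     = true
  skip_requesting_account_id  = true

  # Send the API calls these data sources make to LocalStack.
  endpoints {
    sts = "http://localhost:4566"
    ec2 = "http://localhost:4566"
  }
}

data "aws_caller_identity" "current" {}

data "aws_region" "current" {}

data "aws_availability_zones" "available" {
  state = "available"
}

output "account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "region" {
  value = data.aws_region.current.name
}

output "az_names" {
  value = data.aws_availability_zones.available.names
}
```

With LocalStack's defaults, account_id is typically the dummy value 000000000000.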

Pitfalls

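A data block does not protect the object from changes. If you read data "aws_s3_bucket" "shared" and someone deletes that bucket outside Terraform, the next plan fails with an error. The state, however, still holds the values read during the last successful run, so state and reality have drifted apart until the lookup succeeds again.
Data sources are read on every plan, including the plan that terraform apply runs before applying. Most reads make one or more API calls, so 50 data blocks with slow queries make every plan slow.
(known after apply) applies to data too. If a data block depends on a resource that has not been created yet, Terraform defers the read until apply, and the plan shows its attributes as (known after apply). This creates a chain: create the resource, read the data source, use its attribute, create the next resource.
Do not use data to check whether something exists. A singular data source such as aws_ami or aws_vpc fails with an error when it finds no match; it is not a way to write if exists then ... else .... The try() and can() functions cannot catch that failure, because they only handle errors in expressions. For conditional logic, use a plural data source that returns a list, such as aws_ami_ids, and branch on the list's length. Use try() or can() for expressions like indexing into a list that may be empty.
A data block has no lifecycle to manage. Settings that control creation and destruction, such as prevent_destroy, create_before_destroy and ignore_changes, are not accepted inside data. Since Terraform 1.2, its lifecycle block can hold only precondition and postcondition checks.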
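Here is a sketch of the existence-check pattern using a plural data source. It assumes aws_ami_ids returns an empty ids list when nothing matches rather than failing the way aws_ami does; confirm this for your provider version. The filter reuses the Ubuntu one from the common data sources example.

```hcl
# Plural data source: returns a list of matching AMI IDs, possibly empty.
data "aws_ami_ids" "ubuntu" {
  owners = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

locals {
  # Plain existence check on the list length.
  ubuntu_ami_found = length(data.aws_ami_ids.ubuntu.ids) > 0

  # try() catches the expression error from indexing an empty list
  # and falls back to null; it could not catch a failed data source read.
  ubuntu_ami_id = try(data.aws_ami_ids.ubuntu.ids[0], null)
}

output "ubuntu_ami_found" {
  value = local.ubuntu_ami_found
}

output "ubuntu_ami_id" {
  value = local.ubuntu_ami_id
}
```

In LocalStack, where such an image may be missing, this variant should let the plan continue and report false instead of failing.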
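Since Terraform 1.2, a data block accepts a lifecycle block, but only for precondition and postcondition checks; there is still nothing to create or destroy. The sketch below is a copy of the Ubuntu AMI lookup with a postcondition added. The block name ubuntu_checked and the architecture check are illustrative choices, not part of the original example.

```hcl
data "aws_ami" "ubuntu_checked" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  lifecycle {
    # Evaluated right after the data source is read; if the condition is
    # false, Terraform stops with the error message below.
    postcondition {
      condition     = self.architecture == "x86_64"
      error_message = "The selected AMI must be an x86_64 image."
    }
  }
}
```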

