Why data exists
resource creates something new. data reads something that already exists. The difference is fundamental:
- resource "aws_s3_bucket" "demo" { ... }: Terraform owns this bucket. It creates, modifies, and deletes it.
- data "aws_caller_identity" "current" {}: Terraform asks AWS "who am I right now?" and exposes the answer as attributes you can reference. The result is recorded in state, but Terraform never creates, changes, or deletes anything.
Data blocks are useful when:
- Your code needs the current AWS account ID and you do not want to hardcode it.
- Another team created the VPC and you need its ID for a new resource.
- You want to pick the latest Ubuntu AMI without hardcoding it.
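A minimal sketch of the second case, assuming the other team tagged its VPC with Name = "shared-vpc" (a hypothetical tag value). The data block looks the VPC up, and a resource this project owns uses its ID:

```hcl
# Read a VPC that another team manages (the tag value is a hypothetical example)
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-vpc"
  }
}

# A resource this project owns, placed inside the existing VPC
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = data.aws_vpc.shared.id
}
```

aws_vpc must match exactly one VPC: if no VPC or several VPCs carry that tag, the lookup fails during plan.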
Syntax
It looks like a resource block but uses the data keyword:
data "aws_caller_identity" "current" {}

data "aws_region" "current" {}

output "my_account_id" {
  value = data.aws_caller_identity.current.account_id
}

output "my_region" {
  value = data.aws_region.current.name
}
The address format is data.type.name.attribute. The data. prefix is required; without it Terraform treats the reference as a resource.
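Data source attributes work anywhere an expression is allowed. A minimal standalone sketch; the locals names, the bucket name pattern, and the aws_s3_bucket resource named logs are hypothetical:

```hcl
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

locals {
  # Address format: data.<TYPE>.<NAME>.<ATTRIBUTE>
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.name
}

# Hypothetical bucket whose name embeds the account ID and region
resource "aws_s3_bucket" "logs" {
  bucket = "logs-${local.account_id}-${local.region}"
}
```

Writing aws_caller_identity.current.account_id without the data. prefix here would make Terraform look for a resource of that name and report an undeclared reference.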
Common AWS data sources
These appear most often in real projects:
# Who am I right now (account ID, ARN, user ID)
data "aws_caller_identity" "current" {}

# Which region we are working in
data "aws_region" "current" {}

# List of AZs in this region
data "aws_availability_zones" "available" {
  state = "available"
}

# Find a specific AMI
data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# Use the AMI
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro"
}
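The names attribute of aws_availability_zones is a list, so it pairs naturally with count. A sketch that spreads subnets across the first two zones; the VPC resource aws_vpc.main and its CIDR range are hypothetical choices for illustration:

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

# Hypothetical VPC owned by this configuration
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# One subnet in each of the first two available AZs
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
}
```

With count = 2 the subnets get 10.0.0.0/24 and 10.0.1.0/24. The region needs at least two available zones, otherwise the index lookup fails.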
When to use data vs resource
The rule: if you manage the object (create and destroy it), use resource. If it comes from outside, use data.
Signs that you need data:
- The object already exists and you did not create it.
- Another team or a separate Terraform project manages it.
- It is auxiliary information (account ID, region, latest AMI).
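A sketch of the rule applied to two buckets, assuming a bucket named "company-shared-artifacts" already exists in another project (both bucket names are hypothetical):

```hcl
# Managed elsewhere: this project only reads it
data "aws_s3_bucket" "shared" {
  bucket = "company-shared-artifacts"
}

# Managed here: Terraform creates, updates, and destroys it
resource "aws_s3_bucket" "own" {
  bucket = "my-project-artifacts"
}

output "shared_bucket_arn" {
  value = data.aws_s3_bucket.shared.arn
}
```

Running terraform destroy in this project removes only my-project-artifacts; the shared bucket is never touched.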
data in LocalStack
Not all AWS data sources behave the same way in LocalStack:
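- Work well: aws_caller_identity, aws_region, aws_availability_zones (they return fixed test values, for example account ID 000000000000).
- Work partially: aws_ami (LocalStack has no real AMI catalog, so a filter written for real AWS, like the Ubuntu one above, usually matches nothing, and aws_ami fails with an error when nothing matches).
- Do not work: anything tied to AWS services that the LocalStack Community edition does not emulate.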
In tutorial exercises, use only the data sources that work reliably.
Pitfalls
- A data block does not lock the object against changes. If you read data "aws_s3_bucket" "shared" and someone deletes that bucket outside Terraform, the next plan fails with an error, even though the values from the last apply may still be in state. State and reality are now out of sync.
- data is read on every plan and apply, and each read is an API call. With 50 data blocks running slow queries, plan will be slow.
- (known after apply) applies to data too. If a data block depends on a resource that has not been created yet, Terraform postpones the read until apply, and the plan shows its attributes as "known after apply". This creates a chain: create the resource, read the data, use the attribute, create the next resource.
- Do not use data to check whether something exists. If the data block finds no object, it fails with an error; it is not a way to write if exists then ... else .... The try() and can() functions do not help here: they catch errors while evaluating expressions, not a failed lookup. Make the decision explicit instead, for example with a boolean variable and count on the data block.
- data has no lifecycle to manage. Settings such as prevent_destroy, ignore_changes, and create_before_destroy are not allowed inside data. Since Terraform 1.2, a data block's lifecycle { ... } block may contain only precondition and postcondition checks.
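The explicit alternative to an existence check, together with a postcondition, can be sketched as follows. The variable use_shared_vpc and the VPC tag are hypothetical; postconditions in data blocks require Terraform 1.2 or later:

```hcl
# The caller decides whether the shared VPC exists; Terraform does not guess
variable "use_shared_vpc" {
  type    = bool
  default = false
}

data "aws_vpc" "shared" {
  # count = 0 skips the lookup entirely, so a missing VPC cannot fail the plan
  count = var.use_shared_vpc ? 1 : 0

  tags = {
    Name = "shared-vpc"
  }

  lifecycle {
    # Only checks are allowed here, not prevent_destroy or ignore_changes
    postcondition {
      condition     = self.enable_dns_support
      error_message = "The shared VPC must have DNS support enabled."
    }
  }
}

locals {
  # one() returns null for an empty list, so this is null when the lookup is skipped
  shared_vpc_id = one(data.aws_vpc.shared[*].id)
}
```

The postcondition turns an unexpected lookup result into a clear plan-time error instead of a failure further down the dependency chain.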