lesson ── terraform-advanced ── ~16 min ── 6 steps

Large-scale state, breaking up the monolith

One big state with 1000 resources means lock contention, slow refresh, and the risk of a catastrophic failure. The fix is to split it into a hierarchy: network, apps, and so on. They talk to each other through terraform_remote_state. In this lesson you build the monolith, split it into network and apps, and watch them communicate.

interactive sandbox

Two containers start on a shared network: terraform 1.9 and localstack 3.8. A terminal opens in the browser, and you can run terraform init right away. Each step is checked automatically. TTL 45 minutes, no sign-up required.

stack ── terraform · localstack · 1 GB RAM · self-destructs after 45 min of inactivity

Steps

01
The baseline monolith

bash
cd /home/student/scale/monolith
cat > main.tf <<'EOF'
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = { Name = "scale-vpc" }
}

# two /24 subnets carved out of the VPC range, one per availability zone
resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = "us-east-1${["a", "b"][count.index]}"
  tags = { Name = "private-${count.index}" }
}

resource "aws_s3_bucket" "app_logs" {
  bucket = "scale-app-logs"
}

resource "aws_s3_bucket" "app_data" {
  bucket = "scale-app-data"
}
EOF
terraform init -no-color > /dev/null
terraform apply -auto-approve -no-color > /dev/null
terraform state list

You see the VPC, 2 subnets, 2 buckets, all in one state.

✓ The monolith is created. Now you split it.
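
With five resources you can read the `terraform state list` output by eye, but on a state that really is large a per-type count is easier to scan. A minimal sketch, assuming addresses without a module prefix; `summarize_state` is a hypothetical helper name, not a Terraform command:

```shell
# summarize_state: read resource addresses (one per line, as printed by
# `terraform state list`) on stdin and print "<type> <count>" per type.
# Assumption: addresses carry no module prefix (module.x.aws_vpc.main
# would be counted under "module").
summarize_state() {
  awk -F. '
    $1 == "data" { counts["data." $2]++; next }  # data sources: data.<type>.<name>
    NF > 0       { counts[$1]++ }                # managed resources: <type>.<name>
    END          { for (t in counts) print t, counts[t] }
  ' | LC_ALL=C sort
}

# usage, from inside a stack directory:
#   terraform state list | summarize_state
```

For the monolith above this would print `aws_s3_bucket 2`, `aws_subnet 2` and `aws_vpc 1`, one per line.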

02
Move the VPC and subnets into a network state

The strategy is to create a new state and move the resources into it:

bash
cd /home/student/scale/network
cat > main.tf <<'EOF'
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
  tags = { Name = "scale-vpc" }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = "us-east-1${["a", "b"][count.index]}"
  tags = { Name = "private-${count.index}" }
}

# outputs are the only part of this state that other stacks can read
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
EOF
terraform init -no-color > /dev/null

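Now move the resources. Back up the monolith with state pull, move the network resources into a separate state file with state mv -state-out, then load that file into this stack with state push: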

bash
# back up the full monolith state before touching it
cd ../monolith
terraform state pull > /tmp/full.tfstate
# move only the network resources out of the monolith into /tmp/network.tfstate
terraform state mv -state-out=/tmp/network.tfstate aws_vpc.main aws_vpc.main
terraform state mv -state-out=/tmp/network.tfstate 'aws_subnet.private[0]' 'aws_subnet.private[0]'
terraform state mv -state-out=/tmp/network.tfstate 'aws_subnet.private[1]' 'aws_subnet.private[1]'
terraform state list
# push the extracted state file into the network stack
cd ../network
terraform state push -force /tmp/network.tfstate
terraform state list

The network state now holds the VPC and 2 subnets. The monolith has only the buckets.

✓ Network is extracted. There are two states now.
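
Three addresses are easy to type by hand, but a real split moves dozens. The per-address `state mv` calls from this step can be wrapped in a loop. A sketch only: `move_to_state` is a hypothetical helper, and it assumes every resource keeps its address in the target state, as in this lesson:

```shell
# move_to_state: move each given address out of the current stack's state
# into the state file OUT, keeping the address unchanged.
# Stops at the first failed move so a partial split is easy to spot.
move_to_state() {
  local out="$1" addr
  shift
  for addr in "$@"; do
    terraform state mv -state-out="$out" "$addr" "$addr" || return 1
  done
}

# usage, from the monolith directory:
#   move_to_state /tmp/network.tfstate aws_vpc.main 'aws_subnet.private[0]' 'aws_subnet.private[1]'
```

Quote indexed addresses such as `'aws_subnet.private[0]'`, otherwise the shell may treat the brackets as a glob pattern.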

03
An apps state with terraform_remote_state

bash
cd /home/student/scale/apps
cat > main.tf <<'EOF'
# read the outputs of the network stack from its local state file
data "terraform_remote_state" "network" {
  backend = "local"
  config = {
    path = "../network/terraform.tfstate"
  }
}

resource "aws_s3_bucket" "app_logs" {
  bucket = "scale-app-logs"
  tags = {
    VPC = data.terraform_remote_state.network.outputs.vpc_id
  }
}

resource "aws_s3_bucket" "app_data" {
  bucket = "scale-app-data"
  tags = {
    SubnetCount = length(data.terraform_remote_state.network.outputs.private_subnet_ids)
  }
}
EOF
terraform init -no-color > /dev/null

Move the buckets out of the monolith:

bash
# move both buckets out of the monolith into /tmp/apps.tfstate
cd ../monolith
terraform state mv -state-out=/tmp/apps.tfstate aws_s3_bucket.app_logs aws_s3_bucket.app_logs
terraform state mv -state-out=/tmp/apps.tfstate aws_s3_bucket.app_data aws_s3_bucket.app_data
terraform state list
# push the extracted state file into the apps stack
cd ../apps
terraform state push -force /tmp/apps.tfstate
terraform state list

The apps state has 2 buckets. The monolith is empty.

✓ Apps reads network through remote_state. The monolith is taken apart.
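
After each move it helps to check that a state holds exactly the addresses you expect, no more and no fewer. A minimal sketch; `expect_addresses` is a hypothetical helper that compares whatever arrives on stdin with its arguments:

```shell
# expect_addresses: compare the addresses on stdin with the expected
# ones given as arguments; order does not matter.
expect_addresses() {
  local actual expected
  actual=$(LC_ALL=C sort)
  expected=$(printf '%s\n' "$@" | LC_ALL=C sort)
  if [ "$actual" = "$expected" ]; then
    echo "state matches"
  else
    echo "state differs" >&2
    return 1
  fi
}

# usage, from the apps directory:
#   terraform state list | expect_addresses aws_s3_bucket.app_logs aws_s3_bucket.app_data
```

Called with no arguments, it passes only when stdin is empty.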

04
Check the cross-state reference
bash
cd /home/student/scale/apps
terraform plan -no-color 2>&1 | tail -10
The apps plan shows in-place updates to both buckets: app_logs gains a VPC = <vpc-id> tag and app_data gains a SubnetCount tag, with both values read from the network state.

Apply it:
bash
terraform apply -auto-approve -no-color > /dev/null
terraform state show aws_s3_bucket.app_logs | grep -E "VPC|tags"
The tag is there and the vpc-id is real: apps actually read the output from the network state.
✓ The cross-state reference works. The network outputs are available in apps.
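
To cross-check the tag, you can read the same value straight from the network stack. `terraform output -raw` prints a single string output without quotes, and `-chdir` runs the command in another directory. A sketch under assumptions: `network_output` and `tag_matches_output` are hypothetical helpers, and the default path is this sandbox's layout. `-raw` only works for string, number, and bool outputs, so it suits `vpc_id` but not the `private_subnet_ids` list.

```shell
# network_output: print one output of the network stack as a raw string.
# Assumption: NETWORK_DIR points at the network stack; the default is
# the sandbox layout used in this lesson.
network_output() {
  terraform -chdir="${NETWORK_DIR:-/home/student/scale/network}" output -raw "$1"
}

# tag_matches_output: succeed if the text on stdin (for example the
# output of `terraform state show aws_s3_bucket.app_logs`) mentions the
# current value of the given network output.
tag_matches_output() {
  local value
  value=$(network_output "$1") || return 1
  [ -n "$value" ] && grep -qF -- "$value"
}

# usage, from the apps directory:
#   terraform state show aws_s3_bucket.app_logs | tag_matches_output vpc_id && echo "tag matches"
```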
05
Apps and network now change independently
Scenario: you add one more bucket to apps. Network is left alone.
bash
cd /home/student/scale/apps
cat >> main.tf <<'EOF'
resource "aws_s3_bucket" "metrics" {
  bucket = "scale-metrics"
}
EOF
terraform plan -no-color 2>&1 | tail -10
terraform apply -auto-approve -no-color > /dev/null
# the network state was left untouched
cd ../network
terraform plan -no-color 2>&1 | tail -5
The apps plan shows +1 resource. The network plan says No changes. The isolation works.

On a large project this means a PR to apps does not block network PRs, the locks on the two states are separate, and the blast radius is contained.
✓ Isolation proven. Apps changes without touching network.
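
Reading the tail of a plan works in a terminal, but a script needs a machine-readable answer to the question "did this stack change?". `terraform plan -detailed-exitcode` provides one: exit code 0 means an empty plan, 1 an error, and 2 pending changes. A minimal sketch; `plan_status` is a hypothetical wrapper name:

```shell
# plan_status: run a plan in the given stack directory and print
# "no-changes", "changes" or "error".
# -detailed-exitcode makes terraform plan exit 0 when the plan is empty,
# 1 on error and 2 when there are changes to apply.
plan_status() {
  local code=0
  terraform -chdir="$1" plan -detailed-exitcode -no-color > /dev/null 2>&1 || code=$?
  case $code in
    0) echo "no-changes" ;;
    2) echo "changes" ;;
    *) echo "error" ;;
  esac
}

# usage:
#   plan_status /home/student/scale/network
```

Run against network right after the apps change, it is expected to report no-changes, which is the isolation this step demonstrates.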
The same thing on OpenTofu
OpenTofu keeps its CLI and state format compatible with Terraform for the commands in this step. Migration usually comes down to mv .terraform .terraform.bak; tofu init -upgrade. On your first switch, though, back up the state and try it on a feature branch first: the differences cluster in the newer features (variables in the backend, state encryption, OCI registry-backed modules). See tf-opentofu-parity for the full matrix.
- → OpenTofu parity
06
"Blast radius", break apps, network survives
Simulate the destruction of the apps state (do not do this in prod without a backup):
bash
cd /home/student/scale/apps
cp terraform.tfstate /tmp/apps-backup.tfstate
echo '{"corrupt": true}' > terraform.tfstate
set +e
terraform plan -no-color 2>&1 | tail -5
# $? would report tail's status; PIPESTATUS[0] is terraform's own exit code
code=${PIPESTATUS[0]}
set -e
echo "apps plan exit: $code"
# restore it
cp /tmp/apps-backup.tfstate terraform.tfstate
The apps state is corrupt, so the plan fails. But the network state is intact:
bash
cd ../network
terraform plan -no-color 2>&1 | tail -5
echo "network plan ok"
The VPC and subnets keep working. This is what an isolated blast radius means: one stack can burn down while the rest lives on.
✓ Network is protected from breakage in apps. That is blast-radius isolation.
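
The restore in this step trusts the backup blindly. Before copying a backup over a live state file, a cheap sanity check catches the obvious cases. The sketch below is a heuristic, not a JSON parse: it relies on Terraform state snapshots containing top-level "serial" and "lineage" keys, and the helper names `looks_like_state` and `safe_restore` are hypothetical:

```shell
# looks_like_state: heuristic check that FILE resembles a Terraform state
# snapshot: it must be non-empty and mention the "serial" and "lineage"
# keys. This is not a JSON parse and not a guarantee the state is usable.
looks_like_state() {
  [ -s "$1" ] || return 1
  grep -q '"serial"' "$1" && grep -q '"lineage"' "$1"
}

# safe_restore: copy BACKUP over TARGET only if BACKUP passes the check.
safe_restore() {
  looks_like_state "$1" || { echo "refusing to restore from $1" >&2; return 1; }
  cp "$1" "$2"
}

# usage, from the apps directory:
#   safe_restore /tmp/apps-backup.tfstate terraform.tfstate
```

The `{"corrupt": true}` file from this step fails the check, so it could never be restored over a good state by accident.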
When not to split
Splitting is not always a win. The overhead:
1. Orchestration. You have to apply in the right order. You need Terragrunt or a shell script.
2. Cross-state coupling. Deleting an output breaks the readers.
3. Harder for newcomers. The answer to "Where does the VPC id live?" is longer now.
4. Replicated provider config. Every stack needs a provider. Across 10 stacks, that is 10 provider blocks.
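
The "shell script" option from the orchestration point can be very small. A sketch, assuming the stacks live side by side under one root directory as in this sandbox; `apply_in_order` is a hypothetical helper, and the order is whatever you pass in (dependencies first):

```shell
# apply_in_order: apply the given stacks one after another, relative to
# ROOT, and stop at the first failure so later stacks never run against
# a half-applied dependency.
apply_in_order() {
  local root="$1" stack
  shift
  for stack in "$@"; do
    echo "applying $stack"
    terraform -chdir="$root/$stack" apply -auto-approve -no-color > /dev/null || {
      echo "apply failed in $stack" >&2
      return 1
    }
  done
}

# usage:
#   apply_in_order /home/student/scale network apps
```

A script like this encodes the order by hand. Once the dependency graph grows past a few stacks, that is the point where a tool such as Terragrunt starts to pay off.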
Do not split if:
- You have fewer than 200 resources in the state.
- One env, one team.
- Plan under 30s, a reasonable apply.
Split on purpose, once the symptoms (slow plan, lock contention, fear of apply) show up. Not "for the future".
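
Two of the symptoms are measurable, so you can check them before deciding. A rough sketch using this lesson's rules of thumb (200 resources, 30 s plan); they are not limits enforced by Terraform, and `split_symptoms` is a hypothetical helper:

```shell
# split_symptoms: print which of the two measurable thresholds from the
# list above are exceeded. The numbers (200 resources, 30 s plan) are the
# lesson's rules of thumb, not limits enforced by Terraform.
split_symptoms() {
  local resources="$1" plan_seconds="$2" found=0
  if [ "$resources" -ge 200 ]; then
    echo "resources: $resources (>= 200)"; found=1
  fi
  if [ "$plan_seconds" -ge 30 ]; then
    echo "plan: ${plan_seconds}s (>= 30s)"; found=1
  fi
  [ "$found" -eq 1 ] || echo "no measurable symptoms"
}

# usage in bash, from inside a stack directory:
#   count=$(( $(terraform state list | wc -l) ))
#   start=$SECONDS; terraform plan -no-color > /dev/null; elapsed=$((SECONDS - start))
#   split_symptoms "$count" "$elapsed"
```

Lock contention and fear of apply do not show up in numbers like these, so treat the output as one input to the decision, not the decision itself.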

See tf-large-scale-state.
- → Large-scale state
- → state mv in detail

What you learned

terraform_remote_state is a data source that reads outputs from another state file. Only outputs are visible cross-state; internal resources and locals are not. Each stack has its own backend key and its own lock.

commands

data.terraform_remote_state.network.outputs.X: a reference to an output from another state.
terraform state list: what is in the current state. One state is not everything.
terraform state pull > backup.json: a backup before a split operation.

concepts

· One state per layer = blast-radius isolation
· Cross-state goes through outputs: version them, do not rename casually
· One DynamoDB lock table per org, with a separate LockID item per state
