
Large Terraform states are easy to create and hard to live with.

A lot of teams start with a small stack: a virtual network, a few subnets, a storage account, maybe a compute service. Then the platform grows. More services are added and more teams depend on the same foundations, until one Terraform root module slowly becomes responsible for too much.

When this happens, every change feels heavier than it should.

A small network change triggers a plan across unrelated platform resources. A firewall update sits next to storage, monitoring, identity, DNS and application infrastructure. A harmless refactor produces a plan that is too large to review properly. Somebody accidentally removes a security group (not me, honest). Nobody wants to touch the state because the blast radius is unclear and it all looks very complicated.

Splitting the state can help, but it should be treated carefully. Terraform state is the map between your configuration and the real objects in your cloud account. If that map is wrong, Terraform may try to create resources that already exist or destroy resources that should have been left alone.

This article walks through a practical pattern for splitting a monolithic Terraform state into smaller layered states without losing track of resources.

The Problem With One Large State

A single state file is not always bad. For small systems, it keeps things simple: one backend, one plan and one place to look. Easy. The problem starts when the state contains resources with different lifecycles:

- Foundation resources, such as resource groups or shared identity objects.
- Core networking, such as virtual networks, subnets and route tables.
- Connectivity resources, such as VPN gateways or private DNS resolvers.
- Security controls, such as firewall policies or key vaults.
- Data platform resources, such as warehouses, workspaces or storage accounts.
- Monitoring resources, such as diagnostics and log destinations.

These layers do not change at the same pace, and they do not carry the same risk. A route table update should not require reviewing every unrelated resource in the estate.

The goal of splitting state is to make each layer independently understandable and independently deployable, while keeping dependencies flowing in one direction.

Lower layers should expose outputs. Higher layers should consume those outputs through remote state, data sources or clearly defined input variables.

Start With Layer Boundaries

Do not begin by moving resources. Begin by deciding what the final shape should be, and spend some time thinking about your platform and how to organise it.

A typical split might look like this:

    00-foundation
    01-network-core
    02-network-connectivity
    03-private-dns
    04-security
    05-monitoring
    06-storage
    07-data-platform

The exact names do not matter. What matters is cohesion. Move resources as groups that make operational sense, and avoid splitting tightly coupled resources just because you want neat folders.

Migrating the State

The sections above cover why Terraform state might be split and how to define sensible layer boundaries. What follows is a practical process for moving resources from a monolithic state into separate layered state files. The exact commands and backend details will vary between environments, but the overall sequence stays the same: freeze changes, back up the existing state, move resource bindings, push the new states and verify the result before normal deployments resume.

Freeze Changes Before Starting

Before pulling or modifying state files, pause the normal deployment pipelines and make sure nobody else is running Terraform against the affected environment.
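Pausing pipelines depends on people following the process. A cheap technical backstop is to note the serial number of the remote state when the freeze starts, then check it again just before pushing anything back. Terraform increases the serial whenever it writes a changed state, so a different number means something wrote to the monolith state in the meantime. The sketch below assumes jq is installed and that it runs from the initialised monolith directory; remote_serial and assert_serial_unchanged are hypothetical helper names, and freeze-serial.txt is an illustrative file name.

```shell
#!/usr/bin/env bash
# Sketch: detect writes to the monolith state during the freeze by comparing
# its serial number before and after. Assumes jq is installed and that this
# runs from the initialised monolith directory. Helper names are hypothetical.

# Print the serial of the state currently stored in the configured backend.
remote_serial() {
  terraform state pull | jq -r '.serial'
}

# Return non-zero if the remote serial differs from the recorded value.
# Usage: assert_serial_unchanged <recorded-serial>
assert_serial_unchanged() {
  local expected="$1" current
  current="$(remote_serial)" || return 1
  if [[ "$current" != "$expected" ]]; then
    echo "Remote serial changed from $expected to $current; do not push." >&2
    return 1
  fi
}

# Example (file name is illustrative):
#   remote_serial > freeze-serial.txt                      # when the freeze starts
#   assert_serial_unchanged "$(cat freeze-serial.txt)"     # just before pushing
```

If the check fails, the safer response is usually to stop, take a fresh backup and redo the moves from it rather than forcing the push.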
Local state files used during the migration are not protected by the remote backend’s normal locking mechanism. A concurrent apply against the original backend could therefore invalidate the migration while it is in progress.

Record the start of the maintenance window, the state version being migrated and the person responsible for the cutover.

Inspect the Existing State

Before moving anything, take a full backup and inspect what Terraform currently tracks.

If your backend is remote, pull a local copy from the initialised monolith directory and make a working copy of it:

    terraform state pull > monolith-backup.tfstate
    cp monolith-backup.tfstate monolith-working.tfstate

Then list the resources:

    terraform state list -state=monolith-working.tfstate

I usually separate data sources from managed resources. Data sources are lookups, so they do not need to be moved between states in the same way managed resources do.

For a quick check, filter them out by address:

    terraform state list -state=monolith-working.tfstate \
      | grep -vE '(^data\.|\.data\.)'

This is only a convenience filter. Because it relies on address naming, it can produce false matches for modules named data: a managed resource such as module.data.azurerm_storage_account.main would be filtered out as well. For automated validation, use Terraform’s JSON state output (terraform show -json) and select resources whose mode is managed.

You can also inspect a specific module or naming pattern:

    terraform state list \
      -state=monolith-working.tfstate \
      module.network

At this point, build a simple mapping: each current state address goes to exactly one target layer.

Create the Target Layer First

Each new layer needs its own Terraform configuration and backend. For example:

    infra/
      terraform/
        monolith/
        01-network-core/
        02-network-connectivity/
        03-private-dns/

Inside the new layer, add the Terraform code for the resources you intend to move. Moving the state without moving the matching configuration simply creates a different problem: Terraform will find resources in the new state that have no matching configuration and plan to destroy them.
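Because the address-to-layer mapping decides every move that follows, it can help to keep it as a file and generate the terraform state mv commands (described in the next step) from it, so the list that was reviewed is exactly the list that runs. A minimal sketch, assuming a hypothetical tab-separated mapping format of source address, destination state file and an optional new address; state_moves is a hypothetical helper, not a Terraform feature.

```shell
#!/usr/bin/env bash
# Sketch: drive `terraform state mv` from a reviewed mapping file.
# Hypothetical mapping format (tab-separated, one move per line):
#   <source address> TAB <destination state file> [TAB <new address>]
# Blank lines and lines starting with '#' are skipped. Without a new address,
# the resource keeps its current address in the destination state.
#
# Usage: state_moves <source-state> <mapping-file> [--run]
# Without --run the commands are only printed, so they can be reviewed first.
state_moves() {
  local src_state="$1" mapping="$2" mode="${3:-}"
  local from dst_state to
  # `|| [[ -n $from ]]` keeps a final line that has no trailing newline.
  while IFS=$'\t' read -r from dst_state to || [[ -n "$from" ]]; do
    [[ -z "$from" || "$from" == \#* ]] && continue
    if [[ -z "$dst_state" ]]; then
      echo "No destination state for $from" >&2
      return 1
    fi
    if [[ "$mode" == "--run" ]]; then
      # Read stdin from /dev/null so Terraform cannot consume the mapping file.
      terraform state mv -state="$src_state" -state-out="$dst_state" \
        "$from" "${to:-$from}" </dev/null || return 1
    else
      # %q quotes each argument so the printed command is safe to paste.
      printf 'terraform state mv -state=%q -state-out=%q %q %q\n' \
        "$src_state" "$dst_state" "$from" "${to:-$from}"
    fi
  done < "$mapping"
}
```

Running it without --run first gives a dry run that can be attached to the change record. The same file then drives the real moves one command at a time, stopping at the first failure.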
From the target layer, initialise the backend:

    cd infra/terraform/01-network-core
    terraform init -backend-config="env/dev.backend.tfvars"

Then return to the folder where you are managing the local state files.

Move Resource Bindings, Not Cloud Resources

The command you need is terraform state mv.

The important mental model is this: you are moving Terraform’s binding to an existing remote object. You are not moving the cloud object itself.

When moving between two local state files, use -state for the source and -state-out for the destination:

    src="monolith-working.tfstate"
    dst="network-core.tfstate"

    terraform state mv \
      -state="$src" \
      -state-out="$dst" \
      azurerm_virtual_network.main \
      azurerm_virtual_network.main

Even when the address remains unchanged, Terraform expects both a source and a destination address.

For groups of resources, first capture the addresses and then move them one at a time:

    #!/usr/bin/env bash
    set -euo pipefail

    src="monolith-working.tfstate"
    dst="data-platform.tfstate"

    # Match module.data_platform itself, not other modules that merely
    # share the prefix (such as module.data_platform_extra).
    mapfile -t addresses < <(
      terraform state list -state="$src" \
        | grep -E '^module\.data_platform(\.|\[)'
    )

    if ((${#addresses[@]} == 0)); then
      echo "No matching resources found." >&2
      exit 1
    fi

    for addr in "${addresses[@]}"; do
      echo "Moving $addr"
      terraform state mv \
        -state="$src" \
        -state-out="$dst" \
        "$addr" \
        "$addr"
    done

Using set -euo pipefail ensures that the script stops if one of the state moves fails. It does not catch a failure inside the < <(...) process substitution, which is why the script also stops when no addresses were found.

This pattern is useful when a module already represents a clean boundary. If the target configuration changes resource addresses, provide the new address as the final argument:

    terraform state mv \
      -state="$src" \
      -state-out="$dst" \
      azurerm_route_table.main \
      module.network.azurerm_route_table.main

The destination address must match the configuration in the new layer.

Push the New State

Once the resources have been moved, both local state files have changed:

- The working monolith state no longer contains the migrated resources.
- The new layer state now contains those resources.

Both updated states must therefore be pushed to their respective backends, typically with terraform state push run from each initialised configuration directory.

A cross-state migration is not atomic. During the cutover, there will briefly be a period where the remote states do not fully reflect the intended ownership. This is why normal deployments must remain paused until both states have been pushed and verified.

If your organisation uses a controlled process for publishing state to backend storage, follow that process instead of pushing directly from a workstation.

Avoid using the -force option of terraform state push unless you have investigated the failed safety checks and fully understand the consequences.

Run a Plan in the New Layer

Now run a plan from the new layer:

    terraform plan -var-file="env/dev.tfvars"

This is the real test. Ideally, Terraform should show no infrastructure changes. In some cases, you may see output-only changes or harmless refresh differences. If outputs need to be recalculated, a refresh-only apply can be useful:

    terraform apply -refresh-only -var-file="env/dev.tfvars"

What you do not want to see is Terraform trying to destroy or recreate resources you intended to preserve. If that happens, do not apply. Usually it means one of three things:

- The resource was moved to state but not added to the target configuration.
- The resource address in state does not match the new configuration.
- A dependency or input value differs between the old monolith and the new layer.

Fix the configuration or state mapping, then plan again. Rinse and repeat.

Prove You Did Not Lose Anything

The most useful check is a parity comparison between the original monolith and the combined split states.

The exact script will vary by project, but the idea is simple:

1. List all managed resources from the original backup state.
2. List all managed resources from the new split states.
3. Sort both lists.
4. Compare them.
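For the first two steps, the JSON state output mentioned earlier gives a more reliable list than filtering addresses with grep. Keeping duplicates in the combined list also catches a case that a deduplicated comparison hides: the same resource moved into two split states. A sketch, assuming jq is installed and that it runs in an initialised directory (such as the monolith directory) whose providers cover every resource, since terraform show needs provider schemas to render state as JSON. managed_addresses and duplicate_owners are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: list managed resource addresses from a state file by reading the
# "mode" field in `terraform show -json` output rather than filtering names.
# Assumes jq is installed and an initialised working directory whose
# providers cover the resources in the state file. Helper names are hypothetical.

# Usage: managed_addresses <state-file>
managed_addresses() {
  local json
  json="$(terraform show -json "$1")" || return 1
  # Walk root_module and every nested child_modules entry recursively.
  jq -r '
    def resources_in: (.resources // [])[], ((.child_modules // [])[] | resources_in);
    (.values.root_module // {}) | resources_in
    | select(.mode == "managed")
    | .address
  ' <<<"$json" | sort
}

# Usage: duplicate_owners <split-state>...
# Prints every managed address that appears in more than one split state.
duplicate_owners() {
  local s
  for s in "$@"; do
    managed_addresses "$s"
  done | sort | uniq -d
}

# Example:
#   managed_addresses monolith-backup.tfstate > orig.txt
#   for s in foundation.tfstate network.tfstate; do
#     managed_addresses "$s"
#   done | sort > split.txt
#   comm -3 orig.txt split.txt     # addresses present on only one side
#   duplicate_owners foundation.tfstate network.tfstate
```

Unlike the grep filter, this keeps a managed resource inside a module named data, because the decision is based on mode rather than on the address text.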
Here is a generic Bash script to get you started:

    #!/usr/bin/env bash

    # Define your original backup state file
    monolith="monolith-backup.tfstate"

    # Define the array of your newly split target state files
    states=(
      "foundation.tfstate"
      "network.tfstate"
      "security.tfstate"
      "data-platform.tfstate"
    )

    # Create temporary files to hold the resource lists, removed on exit
    orig=$(mktemp)
    combined=$(mktemp)
    trap 'rm -f "$orig" "$combined"' EXIT

    # 1. Extract and sort managed resources from the original monolith (excluding data sources)
    terraform state list -state="$monolith" \
      | grep -vE '(^data\.|\.data\.)' \
      | sort > "$orig"

    # 2. Extract managed resources from all the new split states
    for s in "${states[@]}"; do
      if [ -f "$s" ]; then
        terraform state list -state="$s" \
          | grep -vE '(^data\.|\.data\.)' \
          >> "$combined"
      else
        echo "Warning: $s not found; its resources will be reported as missing" >&2
      fi
    done

    # 3. Sort the combined list, but do not deduplicate it: an address that
    #    appears in more than one split state means duplicate ownership
    sort "$combined" -o "$combined"

    # 4. Compare the two lists
    echo "========================================="
    echo " STATE PARITY REPORT"
    echo "========================================="
    echo "-- Missing from split (Exists in Monolith but NOT in Split) --"
    comm -23 "$orig" "$combined"
    echo ""
    echo "-- Extra in split (Exists in Split but NOT in Monolith) --"
    comm -13 "$orig" "$combined"
    echo ""
    echo "-- Owned by more than one split state --"
    uniq -d "$combined"
    echo ""
    echo "-- Resource Counts --"
    echo "Original Monolith: $(wc -l < "$orig")"
    echo "Combined Split: $(wc -l < "$combined")"
    echo "========================================="

Expected result: no missing resources, no unexpected extras and nothing owned by more than one split state. This does not replace terraform plan, but it gives you a quick sanity check that the split states collectively account for the same managed resources as the original state.

Keep the Old State Until the Split Is Proven

Keep the original pulled state file, the working state file and any migration notes until each new layer has produced a clean plan. You want a clear audit trail of what moved, where it moved and how you verified it.
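One way to make that audit trail concrete is to record checksums of the backup, working and layer state files with a timestamp at each stage, so you can show later that the backup was never modified. A minimal sketch, assuming GNU coreutils sha256sum is available; record_artifacts is a hypothetical helper and migration-notes.txt an illustrative file name.

```shell
#!/usr/bin/env bash
# Sketch: append a timestamped entry with SHA-256 checksums of migration
# files to a notes file. Assumes GNU coreutils sha256sum (on macOS,
# `shasum -a 256` is the usual equivalent). The helper name is hypothetical.
#
# Usage: record_artifacts <notes-file> <message> <file>...
record_artifacts() {
  local notes="$1" message="$2"
  shift 2
  {
    # Header line: UTC timestamp | operator | free-text message.
    printf '%s | %s | %s\n' \
      "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      "$(id -un 2>/dev/null || echo unknown)" \
      "$message"
    # One "<hash>  <file>" line per file, in sha256sum --check format.
    sha256sum -- "$@"
  } >> "$notes"
}

# Example (file names are illustrative):
#   record_artifacts migration-notes.txt "before moves" \
#     monolith-backup.tfstate monolith-working.tfstate
#   # Later: confirm the backup still matches every recorded checksum.
#   grep ' monolith-backup.tfstate$' migration-notes.txt | sha256sum --check
```

The working and layer state files are expected to change between entries. The backup is not, so its lines should always pass the check.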
Also make sure the old monolith is no longer used for normal deployments once resources have been moved out. If it still manages some resources, remove the configuration for the moved ones from it first; otherwise its next plan will try to create them again. Running plans against both the old and new structures is how teams end up with confusing drift and duplicate ownership.

Common Failure Modes

The first common issue is a destroy plan in the new layer. This usually means the configuration does not match the moved state.

The second issue is missing outputs. Once a lower layer becomes independent, higher layers may need remote state outputs or data sources that did not exist before.

The third issue is moving data sources. In most cases, do not. Data sources are lookups, so recreate them in each layer where they are needed.

The fourth issue is making the layers too small. Splitting state is meant to reduce operational risk, not to create a web of tiny stacks that nobody can reason about.

CLI Migration vs Configuration-Driven Migration

This article uses terraform state mv with local source and destination state files, but this is not the only way to move resources between states.

Since Terraform 1.7, cross-state migrations can also be performed using configuration-driven removed and import blocks: a removed block with destroy = false in its lifecycle block drops the binding from the source state without destroying the object, and an import block adopts the object into the destination state. This approach records the migration in Terraform configuration, allowing it to be reviewed through normal plans and pull requests.

HashiCorp recommends the configuration-driven approach. However, direct state transfer can still be practical when working with older Terraform versions, moving a large number of existing bindings, or avoiding the need to identify provider-specific import IDs for every resource.

The -state and -state-out options used in this guide are classified as legacy, but they remain supported. Because they modify state directly, they should be used as part of a controlled migration with secured backups and careful verification of both the source and destination states.
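The first failure mode listed above, a destroy plan in the new layer, is easy to miss by eye in a long plan. One option is to save the plan with -out and inspect it with terraform show -json, where each entry in resource_changes lists its planned actions and a replacement appears as a delete plus a create. A sketch, assuming jq is installed and that it runs from the layer's initialised directory; plan_without_deletes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: plan a layer, save the plan, and fail if any resource would be
# deleted, including delete-and-create replacements. Assumes jq is installed
# and that this runs from the layer's initialised directory.
# The helper name is hypothetical.
#
# Usage: plan_without_deletes [extra terraform plan arguments...]
#   e.g. plan_without_deletes -var-file="env/dev.tfvars"
plan_without_deletes() {
  local planfile deletes
  planfile="$(mktemp)"

  if ! terraform plan -input=false -out="$planfile" "$@"; then
    rm -f "$planfile"
    return 1
  fi

  # resource_changes[].change.actions is e.g. ["no-op"], ["update"] or
  # ["delete","create"]; keep only entries whose actions include "delete".
  if ! deletes="$(terraform show -json "$planfile" | jq -r '
      .resource_changes[]?
      | select(any(.change.actions[]; . == "delete"))
      | "\(.change.actions | join(",")) \(.address)"
    ')"; then
    rm -f "$planfile"
    return 1
  fi
  rm -f "$planfile"

  if [[ -n "$deletes" ]]; then
    echo "Plan deletes or replaces resources; do not apply:" >&2
    printf '%s\n' "$deletes" >&2
    return 1
  fi
}
```

This is a gate, not a substitute for reading the plan. It only answers whether anything would be deleted, which is the outcome a state migration should never produce.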
Final Thoughts

Splitting Terraform state is not something to do casually, but it is also not something to avoid forever. It is a conscious design decision.

If a monolithic state is making every change slow, risky and difficult to review, it may be time to break it into smaller layers. The safest approach is a boring, methodical one: define the target layers, work from a backup, move state bindings, plan every layer and prove parity against the original state.

The real value is clearer boundaries around what each deployment can change. If something goes wrong, the immediate blast radius is easier to understand and is usually limited to a smaller part of the platform. That gives engineers more confidence to make changes without worrying that an unrelated resource might change.

Smaller layers also make Terraform projects easier to understand. A new engineer can start with the network layer or the data platform layer without first having to understand the entire estate. Plans are shorter, ownership is clearer and reviews can focus on the part of the infrastructure that is actually changing.
Originally published on Hacker Noon.


