Performance

Easy Wins

Normally, it’s best practice to measure performance before making any changes. This allows you to understand the impact of your changes, and to identify areas for improvement.

However, given the nature of the problems that Terragrunt solves, there are some obvious wins that you can make without measuring performance, if you’re aware of the tradeoffs.

Running Fewer Units

The fastest run is the one that doesn’t happen. Before tuning how quickly units run, make sure you aren’t running units that don’t need to run at all.

The --filter flag selects the units a command operates on using a flexible query language. The simplest form is a path expression, which is also the easiest way to control which units are discovered and run:

terragrunt run --all --filter './prod/**' -- plan

The biggest win is usually in CI: instead of planning every unit in the repository on every change, use a Git expression to target only the units affected by a change:

terragrunt run --all --filter '[main...HEAD]' -- plan

The --filter-affected flag is a shorthand for exactly this, comparing your repository’s default branch to HEAD:

terragrunt run --all --filter-affected -- plan

Filters can also select by name, path, attribute, and graph relationship (a unit plus everything that depends on it, say), and they can be combined. The older --queue-include-dir, --queue-include-units-reading, and related --queue-* flags still work as aliases for equivalent filter expressions.

Running Fewer Units - Gotchas

Filtering doesn't always reduce discovery

Filtering reduces how many units run, but not always how many are discovered. Depending on the expression, Terragrunt may still need to parse configurations beyond the filtered set to evaluate the filter and build the dependency graph, so discovery-phase costs can remain. The Skipping Auth During Discovery and Reducing Parse Overhead sections below address those.

Filter expression costs vary

Expressions differ in how much discovery work they cause:

Expression	Example	Discovery work
Name, path, and attributes like `type=`	`./prod/**`	Walks the directory tree; no parsing
`reading=`	`reading=common.yaml`	Parses candidate units to determine which files each one reads
Git	`[main...HEAD]`	Generates a temporary Git worktree for each reference being compared
Graph, dependencies	`vpc...`	Follows the dependencies declared by the target
Graph, dependents	`...vpc`	Searches from the target’s directory up to the repository root, parsing configurations along the way

Dependent discovery is the most expensive of these, and it can parse directories well outside your working directory.

Uncommitted changes aren't included

The --filter-affected flag compares committed changes. Uncommitted local modifications may not be included, and Terragrunt logs a warning when it detects them.
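
Because uncommitted edits may be left out of the comparison, a CI or pre-push script can check for them before planning. The following is a minimal sketch rather than a recommended workflow: plan_affected is a hypothetical helper name, and the check relies on git status --porcelain printing nothing when the working tree is clean.

```shell
# Hypothetical helper (not part of Terragrunt): refuse to plan "affected"
# units while the working tree has uncommitted changes, since those changes
# may not be part of the comparison.
plan_affected() {
  # `git status --porcelain` prints one line per modified or untracked file
  # and nothing at all when the working tree is clean. Untracked files count
  # too, which makes this check deliberately conservative.
  if [ -n "$(git status --porcelain)" ]; then
    echo "uncommitted changes present; commit or stash them first" >&2
    return 1
  fi
  terragrunt run --all --filter-affected -- plan
}
```

Calling plan_affected from a clean checkout runs the plan as usual. With local modifications it stops early with a non-zero status instead of planning a possibly incomplete set of units.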

Provider Cache Dir

One of the most expensive things that OpenTofu/Terraform does, from a bandwidth and disk utilization perspective, is download and install providers. These are large binary files that are downloaded from the internet, and not cached across units by default.

If you’re using OpenTofu >= 1.10 and the latest version of Terragrunt, you’ll use the Automatic Provider Cache Dir feature by default.

This feature automatically configures OpenTofu to use its built-in provider caching mechanism by setting the TF_PLUGIN_CACHE_DIR environment variable to a central location on the filesystem, allowing reuse of downloaded providers across multiple Terragrunt runs.

For most users at sensible scales, this is an automatic performance win that you don’t need to do anything to enable.

Provider Cache Dir - Gotchas

Lock contention at very large scales

At very large scales, contention between OpenTofu processes for the filesystem lock that synchronizes access to the provider cache directory can become a bottleneck. The Provider Cache Server avoids this contention.

Not available in every setup

You can’t use the provider cache directory if you store your provider cache on a shared NFS mount, use Terraform, or use an older version of OpenTofu. In these scenarios, the Provider Cache Server can improve performance instead.

Provider Cache Server

You can significantly reduce the amount of time taken by Terragrunt runs by enabling the provider cache server, like this:

terragrunt run --all plan --provider-cache

Provider Cache Server - Gotchas

Built for many runs at once

The provider cache server is a single server that is used by all Terragrunt runs being performed in a given Terragrunt invocation. You will see the most benefit if you are using it in a command that will perform many OpenTofu/Terraform operations, like with the --all flag and the --graph flag.

Can be a net negative for single runs

When performing individual runs, like terragrunt plan, the provider cache server can be a net negative to performance, because starting and stopping the server might add more overhead than just downloading the providers (or using the Automatic Provider Cache Dir feature). Whether this is the case depends on many factors, including network speed, the number of providers being downloaded, and whether or not the providers are already cached in the Terragrunt provider cache.

When in doubt, measure the performance before and after enabling the provider cache server to see if it’s a net win for your use case.

Fetching Output From State

Under the hood, Terragrunt dependency blocks use the OpenTofu/Terraform output -json command to fetch outputs from one unit for use in another.

The OpenTofu/Terraform output -json command does more work than simply fetching output values from state. A significant portion of that extra work is loading providers, which it doesn’t need in most cases.

You can significantly improve the performance of dependency blocks by using the dependency-fetch-output-from-state experiment. When the experiment is active, Terragrunt resolves outputs by fetching the backend state file from S3 and parsing it directly, avoiding the overhead of calling the output -json command of OpenTofu/Terraform.

For example:

terragrunt run --all plan --experiment=dependency-fetch-output-from-state
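
To make concrete what parsing a state file directly involves: a version 4 state file is a JSON document whose top-level outputs object maps each output name to its value and type. The sketch below reads that object with jq from a local copy of a state file. It is an illustration, not Terragrunt’s implementation: state_outputs is a hypothetical helper, the local path stands in for the object that would be fetched from S3, and it assumes jq is installed and the state is not encrypted.

```shell
# Hypothetical helper: print the outputs recorded in a local state file.
# $1: path to a JSON state file (assumed unencrypted, schema version 4)
state_outputs() {
  # `.outputs` maps each output name to an object holding its value and type.
  jq -c '.outputs' "$1"
}
```

No provider plugins are loaded at any point here, which is where the time saving comes from.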

Fetching Output From State - Gotchas

S3 backends only

The dependency-fetch-output-from-state experiment only works for S3 backends. If you are using a different backend, this experiment won’t do anything.

State file schemas may change

There is no guarantee that OpenTofu/Terraform will maintain the existing schema of their state files, so there is also no guarantee that the experiment will keep working in future versions of OpenTofu/Terraform.

Incompatible with OpenTofu state encryption

When client-side state encryption is enabled, the state file in S3 is encrypted before upload and cannot be parsed directly by Terragrunt, resulting in a hard failure. If you encounter JSON parsing errors when using this experiment, check whether you have OpenTofu state encryption enabled and disable the experiment with --no-dependency-fetch-output-from-state if so.

Skipping Auth During Discovery

If you use the --auth-provider-cmd flag to fetch credentials at runtime, Terragrunt runs that command once per parsed component during the discovery phase, so that configuration parsing can reliably resolve HCL functions like get_aws_account_id and run_cmd.

On large repositories, this can dominate wall-clock time, because the auth command runs for every discovered unit, not just the subset that will actually run. This is especially noticeable when using reading-based filters like --queue-include-units-reading, where discovery has to parse the whole tree to select a small number of units.

The --no-discovery-auth-provider-cmd flag (env: TG_NO_DISCOVERY_AUTH_PROVIDER_CMD) skips those discovery-time invocations. The auth provider command still runs normally for the units that actually execute.

terragrunt run --all \
  --auth-provider-cmd /path/to/auth-script.sh \
  --no-discovery-auth-provider-cmd \
  --queue-include-units-reading=./changed-file.txt \
  plan

Skipping Auth During Discovery - Gotchas

Parsing that needs credentials will fail

Discovery-time authentication exists for a reason: any configuration that needs credentials just to parse will fail without it. If a unit’s source URL, dependency paths, or other discovery-relevant blocks depend on values produced by --auth-provider-cmd (an account ID fetched via get_aws_account_id, or a run_cmd call that hits an authenticated API), parsing that unit will fail with the flag set.

Use the flag when you know your configurations parse successfully without any prior authentication, and credentials are only needed at run time.

Reducing Parse Overhead

Commands that operate on multiple units, like run --all, parse the configuration of every discovered unit before anything runs. Included files don’t get parsed once and shared: they are evaluated again in the context of each unit that includes them. Anything expensive in a root configuration is paid once per unit, not once per invocation, so small per-parse costs add up quickly on large repositories.

External Commands During Parsing

The run_cmd HCL function executes during parsing, and its results are cached per directory. A run_cmd call in a root configuration included by a hundred units runs a hundred times, once per unit, even when it produces the same output every time.

When the output doesn’t depend on the directory it runs in (resolving the current git commit for tagging resources, say), pass --terragrunt-global-cache so the command runs once per invocation and every other unit reuses the result:

locals {
  commit = run_cmd("--terragrunt-global-cache", "git", "rev-parse", "--short", "HEAD")
}

Static Configuration as JSON or YAML

The read_terragrunt_config HCL function evaluates the target file as HCL each time it’s called, including any function calls the file makes. When that file is included by many units, the whole evaluation repeats for each one.

That cost buys you dynamic behavior, and shared configuration often doesn’t need any: region names, tag maps, account IDs, and similar values are just static data. Storing static data as JSON or YAML and decoding it is cheaper than evaluating it as HCL:

locals {
  common = yamldecode(file(find_in_parent_folders("common.yaml")))
}

This also makes the shared file easier to reason about, since a .yaml or .json file can’t run commands or depend on parsing context.

Tuning Parallelism

When you use commands with the --all or --graph flags, Terragrunt queues up a run for every unit involved. The --parallelism flag (or the TG_PARALLELISM environment variable) caps how many of those runs are allowed to execute at the same time.

How Runs Are Scheduled

Before running anything, Terragrunt builds a graph of the dependencies between units. A unit becomes eligible to run once every unit it depends on has finished successfully. Eligible units start immediately as long as the number of runs already in flight is below the parallelism limit. Otherwise, they wait for a running unit to finish and free up a slot.

Each run is a separate OpenTofu/Terraform process, scheduled by the operating system like any other, and the parallelism limit caps how many of them run at once.

By default, there is no limit. Every eligible unit starts right away, no matter how many cores the machine has. This is usually fine, because OpenTofu/Terraform runs spend most of their time waiting on network requests to provider APIs rather than using the CPU, so a machine can typically make progress on many more runs than it has cores. It also means you can see far more OpenTofu/Terraform processes than you might expect, and that an update to a file shared by every unit in a large tree (a root configuration, say) can start every one of those units at once.

Picking a Value

There is no universally correct setting for --parallelism. The right value depends on how much memory, CPU, and network I/O your units consume, so treat tuning as an exercise in measurement.

Two trends are worth knowing before you start measuring:

Peak memory usage grows roughly in proportion to parallelism. Every concurrent run is its own OpenTofu/Terraform process, loading its own provider plugins and its own copy of state.
Speed improvements show diminishing returns as parallelism increases. Once there are enough runs in flight to keep the machine and network busy, additional concurrent runs just queue up, bottlenecked by the same resources.

Once speed gains level off, higher parallelism costs you memory (and pressure on provider API rate limits) without making anything faster.

A reasonable procedure is to start with parallelism set to the number of CPU cores on your machine, then double or halve it while measuring wall clock time and peak memory usage until you stop seeing improvements.

If your runs happen on shared infrastructure, like CI runners used by other jobs, consider setting a value below the optimum you measure in isolation. Even when day-to-day runs never come close to the limit, a cap bounds the worst-case resource usage of a run that touches every unit at once. An out-of-memory kill from an unbounded full-tree run is a worse outcome than a slower, bounded one.

Parallelism Within a Single Run

OpenTofu/Terraform has its own -parallelism flag, which limits how many resource operations a single run performs concurrently (10 by default). Terragrunt’s --parallelism flag doesn’t touch this setting. The two multiply: total concurrent operations against your cloud provider scale with both the number of units in flight and the parallelism within each of them.
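
As a worked example of how the two settings multiply: with eight units in flight and the default within-run limit of 10, up to 80 resource operations can be running against provider APIs at once. The helper below only illustrates that arithmetic; max_concurrent_operations is a hypothetical name.

```shell
# Upper bound on concurrent resource operations across all units in flight.
# $1: Terragrunt --parallelism (units in flight)
# $2: OpenTofu/Terraform -parallelism (operations per run; defaults to 10)
max_concurrent_operations() {
  echo $(( $1 * ${2:-10} ))
}
```

Lowering the within-run limit to 5 for the same eight units brings the upper bound down to 40.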

If you need to adjust within-run parallelism across units, use an extra_arguments block with the get_terraform_commands_that_need_parallelism function:

terraform {
  extra_arguments "parallelism" {
    commands  = get_terraform_commands_that_need_parallelism()
    arguments = ["-parallelism=5"]
  }
}

Measuring Performance

Before diving into any particular performance optimization, measure performance first, and measure again after each change so that you understand its impact.

To measure performance, you can use multiple tools, depending on your role.

End User

As an end user, you’re advised to use the following tools to get a better understanding of the performance of Terragrunt.

OpenTelemetry

Use OpenTelemetry to collect traces from Terragrunt runs so that you can analyze the performance of individual operations when using Terragrunt.

This can be useful both to identify bottlenecks in Terragrunt, and to understand when performance changes can be attributed to integrations with other tools, like OpenTofu or Terraform.

Benchmark Usage

Use benchmarking tools like Hyperfine to run benchmarks of your Terragrunt usage to compare the performance of different versions of Terragrunt, or with different configurations.

You can use Hyperfine options like the --warmup flag to do some warmup runs before the actual benchmarking. This gives a more accurate measurement of how Terragrunt performs once caches are populated.

Here’s an example of how to use Hyperfine to benchmark the performance of Terragrunt with two different configurations:

hyperfine -w 3 -r 5 'terragrunt run --all plan' 'terragrunt run --all plan --experiment=dependency-fetch-output-from-state'

Terragrunt Developer

As a Terragrunt developer, you’re advised to use the following tools to measure and improve the performance of the codebase.

Benchmark Tests

Use benchmark tests to measure the performance of particular subroutines in Terragrunt.

These benchmarks give you a good indication of the performance of a particular part of Terragrunt, and can help you identify areas for improvement. You can run benchmark tests like this:

go test -bench=BenchmarkSomeFunction

You can also run benchmarks with different configurations, like the following for getting memory allocation information as well:

go test -bench=BenchmarkSomeFunction -benchmem

You can learn more about benchmarking in Go by reading the official documentation.

Profiling

Use profiling tools like pprof to get a more detailed view of the performance of Terragrunt.

For example, you could use the following command to profile a particular test:

go test -run 'SomeTest' -cpuprofile=cpu.prof -memprofile=mem.prof

You can then use the go tool pprof command to analyze the profile data:

go tool pprof cpu.prof

The web interface can also be helpful for viewing the profile data as flame graphs and other visualizations:

go tool pprof -http=:8080 cpu.prof

You can learn more about profiling in Go by reading the official documentation.