Normally, it’s best practice to start by measuring performance before making any changes. This allows you to understand the impact of your changes, and to identify areas for improvement.
However, given the nature of the problems that Terragrunt solves, there are some obvious wins that you can make without measuring performance, if you’re aware of the tradeoffs.
The fastest run is the one that doesn’t happen. Before tuning how quickly units run, make sure you aren’t running units that don’t need to run at all.
The --filter flag selects the units a command operates on using a flexible query language. The simplest form is a path expression, which is also the easiest way to control which units are discovered and run:
terragrunt run --all --filter './prod/**' -- plan
The biggest win is usually in CI: instead of planning every unit in the repository on every change, use a Git expression to target only the units affected by a change:
terragrunt run --all --filter '[main...HEAD]' -- plan
The --filter-affected flag is a shorthand for exactly this, comparing your repository’s default branch to HEAD:
terragrunt run --all --filter-affected -- plan
Filters can also select by name, path, attribute, and graph relationship (a unit plus everything that depends on it, say), and they can be combined. The older --queue-include-dir, --queue-include-units-reading, and related --queue-* flags still work as aliases for equivalent filter expressions.
Filtering reduces how many units run, but not always how many are discovered. Depending on the expression, Terragrunt may still need to parse configurations beyond the filtered set to evaluate the filter and build the dependency graph, so discovery-phase costs can remain. The Skipping Auth During Discovery and Reducing Parse Overhead sections below address those.
Filter expression costs vary
Expressions differ in how much discovery work they cause:
Searches from the target’s directory up to the repository root, parsing configurations along the way
Dependent discovery is the most expensive of these, and it can parse directories well outside your working directory.
Uncommitted changes aren't affected
The --filter-affected flag compares committed changes. Uncommitted local modifications may not be included, and Terragrunt logs a warning when it detects them.
One of the most expensive things that OpenTofu/Terraform does, from a bandwidth and disk utilization perspective, is download and install providers. These are large binary files that are downloaded from the internet, and not cached across units by default.
If you’re using OpenTofu >= 1.10 and the latest version of Terragrunt, you’ll use the Automatic Provider Cache Dir feature by default.
This feature automatically configures OpenTofu to use its built-in provider caching mechanism by setting the TF_PLUGIN_CACHE_DIR environment variable to a central location on the filesystem, allowing reuse of downloaded providers across multiple Terragrunt runs.
For most users at sensible scales, this is an automatic performance win that you don’t need to do anything to enable.
At very large scales, you might find that the filesystem lock contention between OpenTofu processes to synchronize access to the provider cache directory is a bottleneck. The Provider Cache Server avoids this contention.
Not available in every setup
You can’t use the provider cache directory if you store your provider cache in a shared NFS mount, use Terraform, or use an older version of OpenTofu. In these scenarios, the Provider Cache Server can improve performance instead.
The provider cache server is a single server shared by all runs performed in a given Terragrunt invocation. You will see the most benefit in commands that perform many OpenTofu/Terraform operations, such as those using the --all or --graph flags.
Can be a net negative for single runs
When performing individual runs, like terragrunt plan, the provider cache server can be a net negative to performance, because starting and stopping the server might add more overhead than just downloading the providers (or using the Automatic Provider Cache Dir feature). Whether this is the case depends on many factors, including network speed, the number of providers being downloaded, and whether or not the providers are already cached in the Terragrunt provider cache.
When in doubt, measure the performance before and after enabling the provider cache server to see if it’s a net win for your use case.
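One way to run that comparison is with a benchmarking tool such as Hyperfine. The sketch below assumes that `hyperfine` is installed and that the cache server is enabled with the `--provider-cache` flag; the function name `bench_provider_cache` is illustrative. Because the warmup run downloads providers first, this measures warm-cache behavior rather than a cold start.

```shell
# Compare a multi-unit plan with and without the provider cache server.
# Assumption: --provider-cache is the flag that enables the server.
bench_provider_cache() {
  hyperfine --warmup 1 --runs 5 \
    'terragrunt run --all -- plan' \
    'terragrunt run --all --provider-cache -- plan'
}
```

Call `bench_provider_cache` from the root of the tree you want to measure. Nothing runs until the function is invoked.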
Under the hood, Terragrunt dependency blocks use the OpenTofu/Terraform output -json command to fetch outputs from one unit and pass them to another.
The OpenTofu/Terraform output -json command does more work than simply fetching output values from state. A significant portion of the extra time goes to loading providers, which the command doesn’t need in most cases.
You can significantly improve the performance of dependency blocks by using the dependency-fetch-output-from-state experiment. When the experiment is active, Terragrunt resolves outputs by fetching the backend state file directly from S3 and parsing it, avoiding the overhead of calling the OpenTofu/Terraform output -json command.
The dependency-fetch-output-from-state experiment only works for S3 backends. If you are using a different backend, this experiment won’t do anything.
State file schemas may change
There is no guarantee that OpenTofu/Terraform will maintain the existing schema of its state files, so there is also no guarantee that the experiment will work as expected in future versions of OpenTofu/Terraform.
Incompatible with OpenTofu state encryption
When client-side state encryption is enabled, the state file in S3 is encrypted before upload and cannot be parsed directly by Terragrunt, resulting in a hard failure. If you encounter JSON parsing errors when using this experiment, check whether you have OpenTofu state encryption enabled and disable the experiment with --no-dependency-fetch-output-from-state if so.
If you use the --auth-provider-cmd flag to fetch credentials at runtime, Terragrunt runs that command once per parsed component during the discovery phase, so that configuration parsing can reliably resolve HCL functions like get_aws_account_id and run_cmd.
On large repositories, this can dominate wall-clock time, because the auth command runs for every discovered unit, not just the subset that will actually run. This is especially noticeable when using reading-based filters like --queue-include-units-reading, where discovery has to parse the whole tree to select a small number of units.
The --no-discovery-auth-provider-cmd flag (env: TG_NO_DISCOVERY_AUTH_PROVIDER_CMD) skips those discovery-time invocations. The auth provider command still runs normally for the units that actually execute.
Discovery-time authentication exists for a reason: any configuration that needs credentials just to parse will fail without it. If a unit’s source URL, dependency paths, or other discovery-relevant blocks depend on values produced by --auth-provider-cmd (an account ID fetched via get_aws_account_id, or a run_cmd call that hits an authenticated API), parsing that unit will fail with the flag set.
Use the flag when you know your configurations parse successfully without any prior authentication, and credentials are only needed at run time.
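As a minimal sketch of that setup in a CI shell, the snippet below sets the environment variable form of the flag and defines a helper for the plan step. The value `true`, the helper name `tg_plan_affected`, and the credentials script path `./scripts/assume-role.sh` are assumptions for illustration; substitute your own.

```shell
# Skip auth-provider-cmd during discovery for every Terragrunt command in this shell.
# Assumption: the variable accepts "true" as an enabling value.
export TG_NO_DISCOVERY_AUTH_PROVIDER_CMD=true

# Plan only the units affected by the current change.
# ./scripts/assume-role.sh is a hypothetical credentials script.
tg_plan_affected() {
  terragrunt run --all --filter-affected \
    --auth-provider-cmd ./scripts/assume-role.sh \
    -- plan
}
```

If parsing starts failing after this change, some configuration likely needs credentials at discovery time, and the variable should be unset again.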
Commands that operate on multiple units, like run --all, parse the configuration of every discovered unit before anything runs. Included files don’t get parsed once and shared: they are evaluated again in the context of each unit that includes them. Anything expensive in a root configuration is paid once per unit, not once per invocation, so small per-parse costs add up quickly on large repositories.
The run_cmd HCL function executes during parsing, and its results are cached per directory. A run_cmd call in a root configuration included by a hundred units runs a hundred times, once per unit, even when it produces the same output every time.
When the output doesn’t depend on the directory it runs in (resolving the current git commit for tagging resources, say), pass --terragrunt-global-cache as the first argument to run_cmd, as in run_cmd("--terragrunt-global-cache", "git", "rev-parse", "HEAD"), so the command runs once per invocation and every other unit reuses the result.
The read_terragrunt_config HCL function evaluates the target file as HCL each time it’s called, including any function calls the file makes. When that file is included by many units, the whole evaluation repeats for each one.
That cost buys you dynamic behavior, and shared configuration often doesn’t need any: region names, tag maps, account IDs, and similar values are just static data. Storing static data as JSON or YAML and decoding it is cheaper than evaluating it as HCL:
locals {
common = yamldecode(file(find_in_parent_folders("common.yaml")))
}
This also makes the shared file easier to reason about, since a .yaml or .json file can’t run commands or depend on parsing context.
When you use commands with the --all or --graph flags, Terragrunt queues up a run for every unit involved. The --parallelism flag (or the TG_PARALLELISM environment variable) caps how many of those runs are allowed to execute at the same time.
Before running anything, Terragrunt builds a graph of the dependencies between units. A unit becomes eligible to run once every unit it depends on has finished successfully. Eligible units start immediately as long as the number of runs already in flight is below the parallelism limit. Otherwise, they wait for a running unit to finish and free up a slot.
Each run is a separate OpenTofu/Terraform process, scheduled by the operating system like any other, and the parallelism limit caps how many of them run at once.
By default, there is no limit. Every eligible unit starts right away, no matter how many cores the machine has. This is usually fine, because OpenTofu/Terraform runs spend most of their time waiting on network requests to provider APIs rather than using the CPU, so a machine can typically make progress on many more runs than it has cores. It also means you can see far more OpenTofu/Terraform processes than you might expect, and that an update to a file shared by every unit in a large tree (a root configuration, say) can start every one of those units at once.
There is no universally correct setting for --parallelism. The right value depends on how much memory, CPU, and network I/O your units consume, so treat tuning as an exercise in measurement.
Two trends are worth knowing before you start measuring:
Peak memory usage grows roughly in proportion to parallelism. Every concurrent run is its own OpenTofu/Terraform process, loading its own provider plugins and its own copy of state.
Speed improvements show diminishing returns as parallelism increases. Once there are enough runs in flight to keep the machine and network busy, additional concurrent runs just queue up, bottlenecked by the same resources.
Once speed gains level off, higher parallelism costs you memory (and pressure on provider API rate limits) without making anything faster.
A reasonable procedure is to start with parallelism set to the number of CPU cores on your machine, then double or halve it while measuring wall clock time and peak memory usage until you stop seeing improvements.
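The procedure above could be sketched as follows, assuming `hyperfine` is installed. The function names are illustrative, and the sketch only covers wall clock time; peak memory needs a separate tool.

```shell
# Print a comma-separated list of parallelism values to try:
# half, equal to, double, and four times the given core count.
parallelism_candidates() {
  cores=$1
  if [ "$cores" -lt 2 ]; then
    echo "1,2,4"
  else
    echo "$(( cores / 2 )),${cores},$(( cores * 2 )),$(( cores * 4 ))"
  fi
}

# Benchmark a full plan once per candidate value.
# getconf _NPROCESSORS_ONLN reports the number of online CPU cores.
bench_parallelism() {
  hyperfine --warmup 1 --runs 3 \
    --parameter-list p "$(parallelism_candidates "$(getconf _NPROCESSORS_ONLN)")" \
    'terragrunt run --all --parallelism {p} -- plan'
}
```

Calling `bench_parallelism` runs the benchmark. If the fastest value sits at either end of the list, widen the range in that direction and measure again.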
If your runs happen on shared infrastructure, like CI runners used by other jobs, consider setting a value below the optimum you measure in isolation. Even when day-to-day runs never come close to the limit, a cap bounds the worst-case resource usage of a run that touches every unit at once. An out-of-memory kill from an unbounded full-tree run is a worse outcome than a slower, bounded one.
OpenTofu/Terraform has its own -parallelism flag, which limits how many resource operations a single run performs concurrently (10 by default). Terragrunt’s --parallelism flag doesn’t touch this setting. The two multiply: total concurrent operations against your cloud provider scale with both the number of units in flight and the parallelism within each of them.
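To make the multiplication concrete, here is a small worked calculation. The helper name `max_concurrent_ops` is illustrative; it computes an upper bound, since not every run keeps all of its operation slots busy at once.

```shell
# Upper bound on concurrent resource operations against the provider:
# (units in flight) x (per-run -parallelism, 10 by default).
max_concurrent_ops() {
  units_in_flight=$1
  per_run=${2:-10}
  echo $(( units_in_flight * per_run ))
}

max_concurrent_ops 8     # prints 80
max_concurrent_ops 8 5   # prints 40
```

If provider rate limits are the bottleneck, lowering either number reduces the bound. Assuming arguments after `--` are forwarded to OpenTofu/Terraform, the second case corresponds to something like `terragrunt run --all --parallelism 8 -- plan -parallelism=5`.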
Before applying any particular optimization, measure performance first, and measure again after each change so that you understand its impact.
To measure performance, you can use multiple tools, depending on your role.
Use OpenTelemetry to collect traces from Terragrunt runs so that you can analyze the performance of individual operations when using Terragrunt.
This can be useful both to identify bottlenecks in Terragrunt, and to understand when performance changes can be attributed to integrations with other tools, like OpenTofu or Terraform.
Use benchmarking tools like Hyperfine to run benchmarks of your Terragrunt usage to compare the performance of different versions of Terragrunt, or with different configurations.
Options like the --warmup flag run the command a few times before measurement begins. This gives a more accurate picture of how Terragrunt performs once caches are populated.
Here’s an example of how to use Hyperfine to benchmark the performance of Terragrunt with two different configurations:
hyperfine -w 3 -r 5 'terragrunt run --all plan' 'terragrunt run --all plan --experiment=dependency-fetch-output-from-state'
Use Benchmark tests to measure the performance of particular subroutines in Terragrunt.
These benchmarks give you a good indication of the performance of a particular part of Terragrunt, and can help you identify areas for improvement. You can run benchmark tests like this:
go test -bench=BenchmarkSomeFunction
You can also run benchmarks with different configurations, like the following for getting memory allocation information as well: