Finally understanding why a poor Design Philosophy doomed me to failure

4 min readMar 19, 2022

My cloud infrastructure went up in flames

Nearly two years ago I published a story about provisioning cloud infrastructure with Terraform and configuring servers with Ansible. While the process will indeed get you Nginx running on a cloud server, it has a glaring flaw: when I wrote the story I had no idea of the agony and suffering that using a local-exec provisioner would cause.

Why? Because I did not have a clear design philosophy to guide my architecture.

What was going on

Terraform is an open-source tool that provisions cloud infrastructure by codifying the cloud APIs into declarative configuration files.

Ansible is an open-source tool that install programs and makes configuration changes to servers over SSH.

It is possible — though I no longer recommend — to use a Terraform Provisioner to execute an Ansible playbook on an instance while the instance is still being deployed. That’s kind of neat because by the time you get the green “Apply Complete” from Terraform, you know the instance is both provisioned and configured. Boom! Automation!

Until the infrastructure starts to increase in complexity. Or until an instance crashes and needs to be recreated. Or until the app changes. Or really just until real life happens.

Here is the weird part. The Terraform docs say in huge bold letters “Provisioners are a Last Resort!” Yet there are plenty of resources out there about how to use them, and it really just didn’t sink in why. So I went to the last resort as my first resort.

The (lack of) design philosophy that I had

At first, I was looking for a one-click solution to whiz-bang up a web app. As time went on, I wanted a system that I could reason about and make reasonable changes to. This process diagram shows why that was really hard.

Model of Ansible executed by a Terraform Provisioner. It’s messy.

As the complexity of my architecture continued to increase, I found it impossible to know precisely what was deployed, difficult to make changes, and inconceivable to conduct automated tests.

The design philosophy I should have had

I’ve since discovered that most of the hard work of creating relevant design philosophies for this architecture has already been done. I just needed to pull it together and apply it.

HashiCorp, the company behind Terraform, uses The Tao of HashiCorp as the foundation of their vision and product design.
Terraform is written in Go, and Ardan Labs Ultimate Go is good
Finally, Ansible’s own best practices

Modularity

Components should be small blocks that are functional on their own, and can be combined in new and innovative ways.

Immutability and Versioning

Immutable infrastructure leads to more robust systems that are simpler to operate, debug, version and visualize. All processes should be written as code, stored, and versioned. Tools run against infrastructure should be idempotent, meaning that the result of performing an operation once is exactly the same as the result of performing it repeatedly.

Resilience and Error Handling

Errors must be expected and handled gracefully. The system must collect real-time information through functionally independent components. These components will provide the tooling to self-heal and auto-recover.

System Mental Model

Bias toward a system that is easy to understand and reason about. Focus time on structuring code that provides the best mental model possible.

Putting the philosophy into practice

After all that, the answer is as simple as “let the tools do what they were designed to do!”

First, have Terraform provision the infrastructure; leverage its built in error handling if need be. Then place the appropriate outputs into an Ansible host file. Finally, run Ansible on those hosts.

Using Terraform and Ansible with a sane Design Philosophy makes the system easy to reason about.

This modular approach is easier to reason about, more resilient, and produces a more dependable infrastructure.

Further improvement: true immutability

But wait! What about configuration drift as the needs of the app change? For example, older servers will have settings and artifacts that are different from fresh servers.

Well, if you aren’t striving for zero downtime and the number of servers is small, you can simply taint old servers in Terraform, which then forces them to be re-created on the next Apply. Ansible can then be run against the set of hosts.

As infrastructure gets more complex, you’ll probably need to look at truly immutable options, such as Docker or Packer.

Finally understanding why a poor Design Philosophy doomed me to failure

What was going on

The (lack of) design philosophy that I had

The design philosophy I should have had

Modularity

Immutability and Versioning

Resilience and Error Handling

System Mental Model

Putting the philosophy into practice

Further improvement: true immutability

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Brian Yarbrough

No responses yet