Finally understanding why a poor Design Philosophy doomed me to failure
Nearly two years ago I published a story about provisioning cloud infrastructure with Terraform and configuring servers with Ansible. While the process will indeed get you Nginx running on a cloud server, it has a glaring flaw: when I wrote the story I had no idea of the agony and suffering that using a local-exec provisioner would cause.
Why? Because I did not have a clear design philosophy to guide my architecture.
What was going on
Terraform is an open-source tool that provisions cloud infrastructure by codifying the cloud APIs into declarative configuration files.
Ansible is an open-source tool that install programs and makes configuration changes to servers over SSH.
It is possible — though I no longer recommend — to use a Terraform Provisioner to execute an Ansible playbook on an instance while the instance is still being deployed. That’s kind of neat because by the time you get the green “Apply Complete” from Terraform, you know the instance is both provisioned and configured. Boom! Automation!
Until the infrastructure starts to increase in complexity. Or until an instance crashes and needs to be recreated. Or until the app changes. Or really just until real life happens.
Here is the weird part. The Terraform docs say in huge bold letters “Provisioners are a Last Resort!” Yet there are plenty of resources out there about how to use them, and it really just didn’t sink in why. So I went to the last resort as my first resort.
The (lack of) design philosophy that I had
At first, I was looking for a one-click solution to whiz-bang up a web app. As time went on, I wanted a system that I could reason about and make reasonable changes to. This process diagram shows why that was really hard.
As the complexity of my architecture continued to increase, I found it impossible to know precisely what was deployed, difficult to make changes, and inconceivable to conduct automated tests.
The design philosophy I should have had
I’ve since discovered that most of the hard work of creating relevant design philosophies for this architecture has already been done. I just needed to pull it together and apply it.
- HashiCorp, the company behind Terraform, uses The Tao of HashiCorp as the foundation of their vision and product design.
- Terraform is written in Go, and Ardan Labs Ultimate Go is good
- Finally, Ansible’s own best practices
Modularity
Components should be small blocks that are functional on their own, and can be combined in new and innovative ways.
Immutability and Versioning
Immutable infrastructure leads to more robust systems that are simpler to operate, debug, version and visualize. All processes should be written as code, stored, and versioned. Tools run against infrastructure should be idempotent, meaning that the result of performing an operation once is exactly the same as the result of performing it repeatedly.
Resilience and Error Handling
Errors must be expected and handled gracefully. The system must collect real-time information through functionally independent components. These components will provide the tooling to self-heal and auto-recover.
System Mental Model
Bias toward a system that is easy to understand and reason about. Focus time on structuring code that provides the best mental model possible.
Putting the philosophy into practice
After all that, the answer is as simple as “let the tools do what they were designed to do!”
First, have Terraform provision the infrastructure; leverage its built in error handling if need be. Then place the appropriate outputs into an Ansible host file. Finally, run Ansible on those hosts.
This modular approach is easier to reason about, more resilient, and produces a more dependable infrastructure.
Further improvement: true immutability
But wait! What about configuration drift as the needs of the app change? For example, older servers will have settings and artifacts that are different from fresh servers.
Well, if you aren’t striving for zero downtime and the number of servers is small, you can simply taint old servers in Terraform, which then forces them to be re-created on the next Apply. Ansible can then be run against the set of hosts.
As infrastructure gets more complex, you’ll probably need to look at truly immutable options, such as Docker or Packer.