In this post, I will explore some development patterns that I’ve seen
(and done!) with Chef cookbooks, and then explain how we can evolve to
a new level of cookbook development. The examples here come from a
refactored version of an old splunk42 cookbook. While there is a public
splunk cookbook on the Chef community site, it shares some of the
issues that I saw with our old one, and those issues are part of the
subject matter of this post.
Anyway, on to the evolution!
These are the general patterns I’m going to address.
- Composing URLs from multiple local variables or attributes
- Large conditional logic branches like case statements in recipes
- Not using definitions when it is best to do so
- Knowledge of how node run lists are composed for search, or searching for a role name
- Repeated resources across multiple orthogonal recipes
- Plaintext secrets in attributes or data bag items
Cookbook development is a wide and varied topic, so there are many other patterns to consider, but these are the ones most relevant to the refactored cookbook.
It may seem like a good idea to compose URL strings as attributes or
local variables in a recipe based on other attributes and local
variables. For example, in our splunk42 cookbook we have this:
These get used in the following remote_file resource:
We reused the filename variable and composed the URL of the file to
download. Then, to upgrade, we can simply modify splunk_build, as
Splunk uses a consistent naming scheme for their package URLs
(thanks, Splunk!). The filename itself is built from a case statement
(more on that in the next section). We could go further and make the
version and build into attributes, so users can update to newer
versions by simply changing the attribute.
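For readers who haven’t seen the old recipe, here is a minimal plain-Ruby sketch of the composition pattern described above. The root URL, version, build, and filename suffix are hypothetical placeholders, not the values from splunk42.

```ruby
# Hypothetical placeholders; the old recipe's real values differ.
SPLUNK_ROOT = "https://downloads.example.com/releases"

# Assembles the package URL from several separate pieces, as the old recipe did.
def composed_package_url(version, build, suffix)
  # The filename is itself built from pieces...
  filename = "splunkforwarder-#{version}-#{build}#{suffix}"
  # ...and then glued onto the root and version to form the download URL.
  "#{SPLUNK_ROOT}/#{version}/#{filename}"
end

url = composed_package_url("1.2.3", "45678", "-linux-2.6-amd64.deb")
puts url
# prints https://downloads.example.com/releases/1.2.3/splunkforwarder-1.2.3-45678-linux-2.6-amd64.deb
```

Bumping the build number does change the URL, but every piece has to stay consistent with Splunk’s naming, and the whole block must be copied into each recipe that downloads a package.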
So what is bad about this? Two things.
- This is in the splunk42::client recipe, and repeated again in the splunk42::server recipe with only minor differences (the package name, splunk vs. splunkforwarder).
- Ruby has excellent libraries for manipulating URIs and paths as strings, and it is easier to break up a string than compose a new one.
How can this be improved? First, we can set attributes for the full URL. The actual code for that is below, but suffice it to say, it will look like this (note the version is different because the new cookbook installs a newer Splunk version).
Second, we have helper libraries distributed with the cookbook that break up the URI so we can return just the package filename.
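A minimal sketch of such a helper, assuming it simply takes the last path segment of the URL. The method name package_filename and the example URLs are hypothetical, not the cookbook’s actual library code; in a cookbook, a helper like this would typically live in the libraries directory.

```ruby
require "uri"
require "pathname"

# Hypothetical helper: returns the package filename at the end of a download URL.
# URI#path excludes any query string, so "?mirror=1" style suffixes are ignored.
def package_filename(url)
  Pathname.new(URI.parse(url).path).basename.to_s
end

url = "https://downloads.example.com/releases/1.2.3/splunkforwarder-1.2.3-45678-linux-2.6-amd64.deb"
puts package_filename(url) # prints splunkforwarder-1.2.3-45678-linux-2.6-amd64.deb
```

Breaking the string apart this way keeps the attribute holding the full URL as the single source of truth; the local filename is always derived from it rather than assembled in parallel.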
The remote_file resource is rewritten like this:
As a bonus, the helper methods are available in other places, like other cookbooks and recipes, rather than being confined to a recipe’s local variables.
Conditional Logic Branches
One of the wonderful things about Chef is that simple Ruby conditionals can be used in recipes to selectively set values for resource attributes, define resources that should be used, and other decisions. One of the horrible things about Chef is that simple Ruby conditionals can be used in recipes and often end up being far more complicated than originally intended, especially when handling multiple platforms and versions.
In the earlier example, we had a
splunk_file local variable set in a
recipe. I mentioned it was built from a case statement, which looks
like this, in full:
Splunk itself supports many platforms, and not all of them are covered
by this conditional, so it’s easy to imagine how this can get further
out of control and make the recipe even harder to follow. Also
consider that this is just the client portion for the
splunkforwarder package; this same block is repeated in the server
recipe for the splunk package.
So why is this bad? There are three reasons.
- We have a large block of conditionals that sit in front of a user reading a recipe.
- This logic isn’t reusable elsewhere, so it has to be duplicated in the other recipe.
- This is only the logic for the package filename, but we care about the entire URL. I’ve also covered that composing URLs isn’t delightful.
What is a better approach? Use the full URL as I mentioned before, and
set it as an attribute. We will still have the gnarly case statement,
but it will be tucked away in the
attributes/default.rb file, and
hidden from anyone reading the recipe (which is the thing they
probably care most about reading).
The complete case block can be viewed in the repository. Also, since this is an attribute, consumers of this cookbook can set the URL to whatever they want, including a local HTTP server.
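To make the shape of this concrete, here is a plain-Ruby sketch of a case block that picks a full URL per platform family and architecture. The base URL and filename patterns are invented for illustration. In the real attributes file the result would be assigned to a default-level attribute rather than returned from a method, which is what lets consumers override it.

```ruby
# Hypothetical base URL and filenames; real Splunk package names differ by release.
BASE_URL = "https://downloads.example.com/releases/1.2.3/universalforwarder/linux"

# Chooses a full package URL from the platform family and machine architecture,
# the kind of values Ohai reports as node['platform_family'] and node['kernel']['machine'].
def forwarder_url(platform_family, machine)
  case platform_family
  when "rhel", "fedora"
    arch = machine == "x86_64" ? "x86_64" : "i386"
    "#{BASE_URL}/splunkforwarder-1.2.3-45678-linux-2.6-#{arch}.rpm"
  when "debian"
    arch = machine == "x86_64" ? "amd64" : "intel"
    "#{BASE_URL}/splunkforwarder-1.2.3-45678-linux-2.6-#{arch}.deb"
  else
    # Unsupported platforms fail loudly instead of producing a bogus URL.
    raise ArgumentError, "no package URL defined for #{platform_family.inspect}"
  end
end

puts forwarder_url("debian", "x86_64")
```

However many branches this grows, the recipe itself only ever reads one value.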
Another example of gnarly conditional logic looks like this, also from the old splunk42 cookbook:
Why is this bad? After all, we’re selecting the proper package
resource to install from a local file on disk. The main issue is that
the conditional creates a different resource on each platform, so there
is no single, known resource to look up in the resource collection. Our
recipe doesn’t do this, but perhaps a wrapper cookbook would, and then
the consumer wrapping the cookbook has to duplicate this logic in their
own. Instead, it is better to select the provider for a single package
resource:
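A plain-Ruby sketch of the idea: one package resource whose name stays the same on every platform, with only the provider chosen by platform family. The mapping table and helper names are hypothetical; the provider strings are the names of Chef’s RPM and dpkg package provider classes, used here only as labels. Because the lookup key is always package[splunkforwarder], a wrapper cookbook can find the resource without repeating the conditional.

```ruby
# Hypothetical mapping from platform family to the provider that installs a local file.
LOCAL_PACKAGE_PROVIDERS = {
  "rhel"   => "Chef::Provider::Package::Rpm",
  "fedora" => "Chef::Provider::Package::Rpm",
  "debian" => "Chef::Provider::Package::Dpkg",
}.freeze

def local_package_provider(platform_family)
  LOCAL_PACKAGE_PROVIDERS.fetch(platform_family) do
    raise ArgumentError, "no local package provider for #{platform_family.inspect}"
  end
end

# Models a single package resource: the lookup key never depends on the platform.
def package_resource(name, source, platform_family)
  {
    key: "package[#{name}]",
    source: source,
    provider: local_package_provider(platform_family),
  }
end

deb = package_resource("splunkforwarder", "/tmp/splunkforwarder.deb", "debian")
rpm = package_resource("splunkforwarder", "/tmp/splunkforwarder.rpm", "rhel")
puts deb[:key] == rpm[:key] # prints true
```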
Definitions Aren’t Bad
Definitions are simply recipe “macros.” They are not actually Chef resources themselves; they just look like them, and contain their own Chef resources. This has some disadvantages, such as the lack of metaparameters (like action), which has led people to prefer using the “Lightweight Resource/Provider” (LWRP) DSL instead. In fact, some feel that definitions are bad, and that one should feel bad for using them. I argue that they have their place. One advantage is their relative simplicity.
In the splunk42 cookbook, the client and server recipes duplicate a
lot of logic. As mentioned, a lot of this is case statements for the
Splunk package file. They also repeat the same logic for choosing the
provider to install the package. I snipped the content from the
"omnios" block, but it looks like this:
(Note: the logic for setting the provider is required since we’re not using the default over-the-network package providers, and installing from a local file on the system.)
This isn’t too bad on its own, but it needs to be repeated again in the
server recipe if one wanted to run a Splunk server on OmniOS. The
actual differences between the client and server package installation
are the package name (splunkforwarder vs. splunk) and the package URL,
which the earlier URL attribute example already established as an
attribute.
Using a definition, named splunk_installer, allows us to simplify
the package installation used by the client and server recipes to look
like this:
How is this better than an LWRP? Simply, there was less ceremony in creating it, so there is less cognitive load for a cookbook developer. Because definitions contain only resources, they are already idempotent and convergent with no additional effort. They also automatically support why-run mode, whereas in an LWRP that must be handled by the developer. Finally, notifications can be sent between resources in the definition and the rest of the Chef run.
Contrast this with an LWRP: we need files in the resources and providers
directories, and the attributes of the resource need to be defined in
the resource. Then the action methods need to be written in the
provider. If we’re using inline resources (which we are), we need to
declare that so any notifications work. Finally, we should ensure
that why-run works properly.
The actual definition is ~40 lines, and can be viewed in the cookbook repository. I don’t have a comparable LWRP for this, but suffice it to say that it would be longer and more complicated than the definition.
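Since a definition is essentially a macro, a toy plain-Ruby model may help: calling it just expands into ordinary resources. This splunk_installer method is a hypothetical stand-in, not the cookbook’s actual definition; the cache directory and the two-resource expansion (download, then install) are assumptions for illustration.

```ruby
require "uri"

# A toy resource record; real Chef resources carry far more state.
Resource = Struct.new(:type, :name, :attributes)

# Hypothetical stand-in for a definition: a macro that expands into resources.
# Only the package name and URL vary between the client and server recipes.
def splunk_installer(package_name, url, cache_dir: "/var/cache/chef")
  local_path = File.join(cache_dir, File.basename(URI.parse(url).path))
  [
    Resource.new(:remote_file, local_path, { source: url }),
    Resource.new(:package, package_name, { source: local_path }),
  ]
end

client = splunk_installer("splunkforwarder", "https://downloads.example.com/splunkforwarder-1.2.3.deb")
server = splunk_installer("splunk", "https://downloads.example.com/splunk-1.2.3.deb")
client.each { |r| puts "#{r.type}[#{r.name}]" }
```

Because each expansion contains only plain resources, idempotence, why-run support, and notifications come from those resources, which is the simplicity argument made above.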
Reasonability About Search
Search is one of the killer features of running a Chef Server. Dynamically generating load balancer configuration, or finding the master database server, is simple with a search. Because we often think about the functionality a service provides based on the role it serves, we end up doing searches that look like this:
Then we do something with
splunk_servers, like send it to a
template. What if someone doesn’t like the role name?
Then we have to do something like this:
Then consumers of the cookbook can use whatever server role name they want, and just update the attribute for it. But the internet has said that roles are bad, so we shouldn’t use them (even though they aren’t ;)). So instead, we need something like one of these queries:
The problem with the first is similar to the problem with the original
role search (role:splunk-server): we need knowledge about the run list
in order to search properly. The problem with the second is that we now
have to worry about constructing the query string properly before it is
passed to search.
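One concrete example of the string-construction burden: in Chef search queries, the colons in a cookbook::recipe name must be escaped with backslashes when searching the recipes field. The recipe_query helper below is hypothetical; it only shows how easy this is to get wrong when building queries by hand.

```ruby
# Hypothetical helper: builds a search query on the recipes field.
# Colons in "cookbook::recipe" are escaped so the query parser doesn't
# treat them as field/value separators.
def recipe_query(recipe)
  # The block form returns the replacement literally, avoiding gsub's backslash handling.
  "recipes:" + recipe.gsub(":") { "\\:" }
end

puts recipe_query("splunk42::server") # prints recipes:splunk42\:\:server
```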
How can we improve this? I think it is more “Chef-like” to use an
attribute on the server’s node object itself that tells queries the
node is in fact a Splunk server. In our chef-splunk cookbook, we use
an is_server attribute for this, so the search query looks like this:
This reads clearly, and the
is_server attribute can be set in one of
15 places (for good or bad, but that’s a different post).
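Attribute queries work because Chef’s search index flattens nested node attributes into underscore-joined keys. The toy model below assumes, for illustration only, that the attribute lives at node['splunk']['is_server'], which would make the query key splunk_is_server; flatten_attributes and matches? are hypothetical helpers, not Chef APIs.

```ruby
# Toy model of how nested node attributes are flattened for the search index.
def flatten_attributes(attrs, prefix = nil)
  attrs.each_with_object({}) do |(key, value), flat|
    name = prefix ? "#{prefix}_#{key}" : key.to_s
    if value.is_a?(Hash)
      flat.merge!(flatten_attributes(value, name))
    else
      flat[name] = value
    end
  end
end

# Very small "field:value" matcher; real Chef search supports far richer syntax.
def matches?(attrs, query)
  field, expected = query.split(":", 2)
  flatten_attributes(attrs)[field].to_s == expected
end

server = { "splunk" => { "is_server" => true } }
client = { "splunk" => { "is_server" => false } }
puts matches?(server, "splunk_is_server:true") # prints true
puts matches?(client, "splunk_is_server:true") # prints false
```

The query names what the node is, not how its run list happens to be built.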
Repeating Resources, Composable Recipes
In the past, it was deemed okay to repeat resources across recipes when those recipes were not included on the same node. For example, client and server recipes that have similar resource requirements, but may pass in separate data. Another example is the haproxy cookbook I wrote, where one recipe statically manages the configuration files, and the other uses a Chef search to populate the configuration.
As I have mentioned above, a lot of code was duplicated between the
client and server recipes in our splunk42 cookbook: user and group,
the case statements, package resources, execute statements (that
haven’t been shared here), and the service resource. It is definitely
important to ensure that all the resources needed to converge a recipe
are defined, particularly when using notifications. That is why
sometimes a recipe will have a service resource whose only action is
:nothing. However, Chef 11 will generate a warning about cloned
resources when they are repeated in the same Chef run.
Why is this bad? Well, CHEF-3694 explains that particular issue, cloned resources, in more detail. The other reason is that recipes are harder to reuse when they have a larger scope than absolutely necessary. How can we make this better? One solution is to write small, composable recipes that contain resources that may be optional for certain use cases. For example, we can put the service resource in its own recipe and include that:
Then, when we need to make sure we have the service resource
available (e.g., for notifications):
Note that the service recipe is included after the resource that notifies it. This works because the notified resource can appear anywhere in the resource collection. It also brings up another excellent practice: declare service resources after the other resources that affect their configuration. This prevents a race condition where, if a bad config is deployed, the service would attempt to start, fail, and cause the Chef run to exit before the corrected config file could be written.
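A toy model of why this ordering is safe: with delayed notifications (Chef’s default timing), a notification only names its target, and the target is resolved after the whole collection has been built. The ToyRun class and the resource names are hypothetical; this is not how Chef implements the resource collection.

```ruby
# Toy resource collection with delayed notifications; not Chef's implementation.
class ToyRun
  Entry = Struct.new(:key, :notifies)

  attr_reader :log

  def initialize
    @collection = []
    @log = []
  end

  # Declares a resource; notifies names another resource by its key.
  def add(key, notifies: nil)
    @collection << Entry.new(key, notifies)
  end

  def converge
    queued = []
    @collection.each do |entry|
      @log << "converge #{entry.key}"
      queued << entry.notifies if entry.notifies
    end
    # Delayed notifications fire once, at the end, after every resource is known.
    queued.uniq.each do |target|
      unless @collection.any? { |entry| entry.key == target }
        raise KeyError, "notification target #{target} not found"
      end
      @log << "restart #{target}"
    end
  end
end

run = ToyRun.new
run.add("template[outputs.conf]", notifies: "service[splunk]")
run.add("service[splunk]") # declared after the template that notifies it
run.converge
puts run.log
```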
Making recipes composable in this way means that users can pick and
choose the ones they want. Our chef-splunk cookbook has a
prescriptive default recipe, but the client and server recipes mainly
include the others they need. If someone doesn’t share our opinion on
this for their use case, they can include only the recipes they need.
Perhaps they have the splunk user and group created on systems
through some other means. They won’t need that recipe, and can write
their own wrapper to handle it. Overall this is good, though it does
mean there are multiple places where a user must look to follow a
recipe.
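A toy model of why including a small service recipe avoids cloned resources: include_recipe evaluates each recipe at most once per run, so several recipes can include it safely. ToyRunContext and the recipe names are hypothetical, and the warning text only imitates the spirit of Chef 11’s CHEF-3694 message.

```ruby
require "set"

# Toy run context; not Chef's implementation.
class ToyRunContext
  attr_reader :resources, :warnings

  def initialize(recipes)
    @recipes = recipes # recipe name => proc that declares resources
    @loaded = Set.new
    @resources = []
    @warnings = []
  end

  # Like include_recipe: a recipe that was already loaded is skipped.
  def include_recipe(name)
    return unless @loaded.add?(name)
    instance_exec(&@recipes.fetch(name))
  end

  # Declaring the same resource twice is what triggers cloning warnings.
  def resource(key)
    @warnings << "cloned resource #{key}" if @resources.include?(key)
    @resources << key
  end
end

recipes = {
  "splunk::service"    => proc { resource "service[splunk]" },
  "splunk::client"     => proc { include_recipe "splunk::service"; resource "package[splunkforwarder]" },
  "splunk::setup_auth" => proc { include_recipe "splunk::service"; resource "execute[change admin password]" },
}

ctx = ToyRunContext.new(recipes)
ctx.include_recipe("splunk::client")
ctx.include_recipe("splunk::setup_auth")
p ctx.resources # the service appears once
p ctx.warnings  # no cloning warnings
```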
Managing secrets is one of the hardest problems to solve in system
administration and configuration management. In Chef, it is very easy
to simply set attributes, or use data bag items for authentication
credentials. Our old
splunk42 cookbook had this:
The node[:splunk][:auth] attribute was set in a role as
username:password. This isn’t particularly bad, since our Chef
Server runs on a private network and is secured with HTTPS and RSA
keys, but a defense-in-depth security posture puts more controls in
place for secrets.
How can this be improved? At Chef, we started using
Chef Vault to manage
secrets. I wrote a
post about chef-vault
a few months ago, so I won’t dig too deep into the details here. The
chef-splunk cookbook loads the authentication information like this:
The first line loads the authentication information from the data bag item encrypted with chef-vault. Then we make a couple of convenient local variables and change the password from Splunk’s built-in default. Finally, we control convergence of the execute resource by writing a file that indicates the password has been set.
The advantage of this over attributes or plain data bag items is that the content is encrypted. The advantage over regular encrypted data bags is that we don’t need to distribute the secret key to every system; instead, we can update the list of nodes that have access with a knife command.
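The two non-Chef pieces of that recipe, splitting the credentials and guarding a one-time command with a marker file, can be sketched in plain Ruby. The helper names and the marker filename are hypothetical, and the vault lookup itself is replaced by a literal string here.

```ruby
require "tmpdir"

# Splits "username:password"; the limit of 2 keeps any colons inside the password.
def split_credentials(auth)
  user, password = auth.split(":", 2)
  if user.to_s.empty? || password.to_s.empty?
    raise ArgumentError, "expected credentials in username:password form"
  end
  [user, password]
end

# Runs the block only if the marker file is absent, then writes the marker so
# later runs skip it: the same guard-file idea used to make the password change converge.
def run_once(marker_path)
  return false if File.exist?(marker_path)
  yield
  File.write(marker_path, "true\n")
  true
end

user, password = split_credentials("admin:s3cr:et") # stands in for the vault item
changes = 0
Dir.mktmpdir do |dir|
  marker = File.join(dir, ".setup_#{user}_password")
  2.times { run_once(marker) { changes += 1 } } # second call is skipped
end
puts changes # prints 1
```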
Neither Chef (the company) nor I are here to tell anyone how to
write cookbooks. One of the benefits of Chef (the product) is its
flexibility, allowing users to write blocks of Ruby code in recipes
that quickly solve an immediate problem. That’s how we got to where we
were with splunk42, and we certainly have other cookbooks that can
be refactored similarly. When it comes to sharing cookbooks with the
community, code that is well-factored and easy to follow, understand,
and use is important.
Many of the ideas here came from community members like Miah Johnson, Noah Kantrowitz, Jamie Winsor, and Mike Fiedler. I owe them thanks for challenging me over the years on a lot of the older patterns that I held onto. Together we can build better automation through cookbooks, and a strong collaborative community. I hope this information is helpful to those goals.