jtimberman's Code Blog

Chef, Ops, Ruby, Linux/Unix. Opinions are mine, not my employer's (Chef).

Chef 12: Fix Untrusted Self Sign Certs

Scenario: You’ve started up a brand new Chef Server using version 12, and you have installed Chef 12 on your local system. You log into the Management Console to create a user and organization (or do this with the command-line chef-server-ctl commands), and you’re ready to rock with this knife.rb:

node_name                'jtimberman'
client_key               'jtimberman.pem'
validation_client_name   'tester-validator'
validation_key           'tester-validator.pem'
chef_server_url          'https://chef-server.example.com/organizations/tester'

However, when you try to check things out with knife:

% knife client list
ERROR: SSL Validation failure connecting to host: chef-server.example.com - SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed
ERROR: OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed

This is because Chef client 12 has SSL verification enabled by default for all requests. Since the certificate generated by the Chef Server 12 installation is self-signed, there isn’t a signing CA that can be verified, and this fails. Never fear, intrepid user, for you can get the SSL certificate from the server and store it as a “trusted” certificate. To find out how, use knife ssl check.

Connecting to host chef-server.example.com:443
ERROR: The SSL certificate of chef-server.example.com could not be verified
Certificate issuer data: /C=US/ST=WA/L=Seattle/O=YouCorp/OU=Operations/CN=chef-server.example.com/emailAddress=you@example.com

Configuration Info:

OpenSSL Configuration:
* Version: OpenSSL 1.0.1j 15 Oct 2014
* Certificate file: /opt/chefdk/embedded/ssl/cert.pem
* Certificate directory: /opt/chefdk/embedded/ssl/certs
Chef SSL Configuration:
* ssl_ca_path: nil
* ssl_ca_file: nil
* trusted_certs_dir: "/Users/jtimberman/Downloads/chef-repo/.chef/trusted_certs"

TO FIX THIS ERROR:

If the server you are connecting to uses a self-signed certificate, you must
configure chef to trust that server's certificate.

By default, the certificate is stored in the following location on the host
where your chef-server runs:

  /var/opt/chef-server/nginx/ca/SERVER_HOSTNAME.crt

Copy that file to your trusted_certs_dir (currently: /Users/jtimberman/Downloads/chef-repo/.chef/trusted_certs)
using SSH/SCP or some other secure method, then re-run this command to confirm
that the server's certificate is now trusted.

(Note: the chef-server location in this message is incorrect; it’s actually under /var/opt/opscode.)

There is a fetch subcommand for knife, too. Let’s download the certificate to the preconfigured trusted certificate location mentioned in the output above.

% knife ssl fetch
WARNING: Certificates from chef-server.example.com will be fetched and placed in your trusted_cert
directory (/Users/jtimberman/Downloads/chef-repo/.chef/trusted_certs).

Knife has no means to verify these are the correct certificates. You should
verify the authenticity of these certificates after downloading.

Adding certificate for chef-server.example.com in /Users/jtimberman/Downloads/chef-repo/.chef/trusted_certs/chef-server.example.com.crt

You should verify that the downloaded certificate is in fact the same as the certificate on the Chef Server. For example, I compared SHA256 checksums:

% ssh ubuntu@chef-server.example.com sudo sha256sum /var/opt/opscode/nginx/ca/chef-server.example.com.crt
043728b55144861ed43a426c67addca357a5889158886aee50685cf1422b5ebf  /var/opt/opscode/nginx/ca/chef-server.example.com.crt
% gsha256sum .chef/trusted_certs/chef-server.example.com.crt
043728b55144861ed43a426c67addca357a5889158886aee50685cf1422b5ebf  .chef/trusted_certs/chef-server.example.com.crt

Now check knife client list again.

% knife client list
tester-validator

Victory!

Now we need to get the certificate out to every node in the infrastructure, into each node’s trusted_certs_dir – by default this is /etc/chef/trusted_certs. The simplest way to do this is to use knife ssh to run knife on the target nodes.

% knife ssh 'name:*' 'sudo knife ssl fetch -c /etc/chef/client.rb'
node-output.example.com WARNING: Certificates from chef-server-example.com will be fetched and placed in your trusted_cert
node-output.example.com directory (/etc/chef/trusted_certs).
node-output.example.com
node-output.example.com Knife has no means to verify these are the correct certificates. You should
node-output.example.com verify the authenticity of these certificates after downloading.
node-output.example.com
node-output.example.com Adding certificate for chef-server.example.com in /etc/chef/trusted_certs/chef-server.example.com.crt

The output will be interleaved for all the nodes returned by knife ssh. Of course, we should verify the SHA256 checksums like before, which can be done again with knife ssh.
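
The checksum comparison can also be scripted. Here is a minimal Ruby sketch using the standard library’s Digest::SHA256; the file paths are hypothetical stand-ins for a locally fetched certificate and a copy retrieved securely from the server:

```ruby
require 'digest'
require 'tempfile'

# Compute the SHA256 checksum of a certificate file, as sha256sum would.
def cert_checksum(path)
  Digest::SHA256.file(path).hexdigest
end

# Hypothetical usage: compare the cert that knife ssl fetch downloaded
# against a copy pulled from the server (e.g. via scp) before trusting it.
local  = Tempfile.new('local.crt')
remote = Tempfile.new('remote.crt')
[local, remote].each { |f| f.write("-----BEGIN CERTIFICATE-----\n...\n"); f.flush }

if cert_checksum(local.path) == cert_checksum(remote.path)
  puts 'certificates match'
else
  puts 'MISMATCH: do not trust this certificate'
end
```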

Reflecting on Six Years With Chef

It actually started a bit over seven years ago. I saw the writing on the wall at IBM; my job was soon to be outsourced. I found an open position with the SANS institute, accepted an offer there, and was due to start work in a couple of weeks. Around the same time, my friends Adam Jacob and Nathan Haneysmith had started HJK Solutions. They invited me to join them then, but it wasn’t the right time for me. Adam told me that at SANS I should at least use the automation tools and general infrastructure management model they planned to use. It turned out this was sage advice, for a number of reasons.

Around April, 2008, Adam told me he was working on “Chef,” a Ruby-based configuration management and system integration framework. I was excited about its potential, and a few months later, on July 2, 2008, I started with HJK Solutions as a Linux system administration consultant. I got familiar with HJK’s Puppet-based stack, and ancillary Ruby tools like iClassify, while working on their customer infrastructures over the coming months. After Opscode was founded and we released Chef 0.5, my primary focus was porting HJK’s Puppet modules to Chef cookbooks.

opscode/cookbooks

Adam had started the repository to give new users a place to begin using Chef with full working examples. I continued their development, and had the opportunity to solve hard problems of integrating web application stacks with them. There were three important reasons for the repository to exist:

  1. We have a body of knowledge as a tribe, and that can be codified.
  2. Infrastructure as code is real, and it can be reusable.
  3. The best way to learn Chef is to use Chef, and I had a goal to know Chef well enough to teach it to new users and companies.

The development of general purpose cookbooks ends up being harder than any of us really imagined, I think. Every platform is different, so not only did I have to learn Chef, I had to learn how different platforms behave for common (and uncommon) pieces of software in web operations stacks. Over the years of managing these cookbooks, I learned a lot about how the community was developing workflows for using Chef, and how they differed from our opinions. I also learned how to manage and contribute to open source projects at a rather large scale, and how to have compassion and empathy for new or frustrated users.

Training and Services

In my time at CHEF, née Opscode, I’ve had several job role changes. After several months of working on cookbooks, I added package and release management (RIP, apt.opscode.com) to my repertoire. I then switched to technical evangelism and training. With mentorship from John Willis, I drafted the initial version of Chef Fundamentals, and delivered our inaugural training class in Seattle.

I worked with the team John built to deliver training, speak at conferences, and work directly with customers to help make them successful with Chef. Eventually, John left the company to build an awesome team at Enstratius. I took on the role of Director of the team, but eventually I discovered that the management track was not the future of my career.

Open Source and Community

I came back to working on the cookbooks, which I had previously split into separate repositories. I was also working more directly in the community, doing public training classes only (our consulting team did private/onsite classes), participating in our IRC channels and mailing lists. We had some organization churn, and I was moved around between four different managers, eventually reporting to the inimitable Nathen Harvey.

During one of our 1-1 discussions, he said, “You know, Joshua. You write a lot of cookbooks to automate infrastructure. But you haven’t actually worked on any infrastructure in years. You should do something about that.”

Around that time, there was a “senior system administrator” job posting on our very own careers site. I talked to our VP of Operations, and after a brief transition period, moved completely over to the ops team. I was able to bring with me the great practices from the community for developing cookbooks: testing with chefspec and serverspec, code consistency with rubocop and foodcritic, and wrapping it all up with test kitchen.

The Future

I’ve had the privilege to do work that I love, which is automating hard problems using Chef. I’ve also had the privilege of being part of the web operations, infrastructure as code, devops, and Chef communities over the past six years. I’ve been to all four Chef summits, and all three ChefConfs. A thing I’ve noticed over the years is that many conversations keep coming up at the summits and ChefConf. Fresh on my mind because the last summit was so recent is the topic of cookbook reusability. See, during the time that I managed opscode/cookbooks, I eventually saw the point people in the community were making about these being real software repositories that need to be managed like other complex software projects. We split up the repository into individual repositories per cookbook. We started adding test coverage, and conforming to consistency via syntax and style lint checking. That didn’t make cookbooks more reusable, but it lowered the barrier to contribution, which in turn made them more reusable as more use cases could be covered. I got to be a part of that evolution, and it’s been awesome.

While using Chef is one of my favorite technical things to do, I have come to the conclusion that based on my experience the best thing I can do is be a facilitator of stronger technical discipline with regard to using Chef. Primarily, this means improving how CHEF uses Chef to build Chef for our community and customers. We’re already really good at using Chef to build Chef (the product), and run Hosted Chef (the service). However, awesome tools from the community such as Test Kitchen, Berkshelf, ChefSpec, and Foodcritic did not exist when we started out. Between new, awesome tools, and growing our organization with new awesome people, we need to improve on getting our team members up to speed on the process and workflow that helps us deliver higher quality products.

That is why I’m moving into a new role at CHEF. The sixth year marks as good a time as any to make a change, and I’m no stranger to that. I’m joining a team of quality advocacy led by Joseph Smith, as part of Jez Humble’s “Office of Continuous Improvement and Velocity.” In this new role, I will focus on improving our overall technical excellence so we can deliver better products to our community and customers, and so we can have awesome use cases and examples for managing Chef Server and its add-ons at scale.

My first short term goal in this new role is a workstation automation cookbook that can be used and extended by our internal teams for ensuring everyone has a consistent set of tools to work on the product. This will be made an open source project that the community can use and extend as well. We’ll have more information about this as it becomes “a thing.”

Next, I want to improve how we work on incidents. We’ve had sporadic blog posts about issues in Hosted Chef and Supermarket, and I’d like to see this get better.

I’m also interested in managing Chef Server 12 clusters, including all the add-ons. Recently I worked on the chef-server-cluster cookbook, which will become the way CHEF deploys and manages Hosted Chef using the version 12 packages. In the earliest days of opscode/cookbooks, I maintained cookbooks to set up the open source Chef Server. Long-time users may remember the “chef solo bootstrap” stack. Since then, CHEF has continued to iterate on that idea, and the “ctl” management commands largely use chef-solo under the hood. The new cookbook combines and wraps up manual processes and the “ctl” commands to enable us, our community, and our customers to build scalable Chef Server clusters using the omnibus packages. The cookbook uses chef-provisioning to do much of the heavy lifting.

It should be easy for organizations to be successful with Chef. That includes CHEF! My goal in my new position is to fuel the love of Chef internally and externally, whip up awesome, and stir up more delight. I also look forward to seeing what our community and customers do with Chef in their organizations.

Thank you

I’d like to thank the friends and mentors I’ve had along this journey. You’re all important, and we’ve shared some good times and code, and sometimes hugs. It’s been amazing to see so many people become successful with Chef.

Above all, I’d like to thank Adam Jacob: for the opportunity to join in this ride, for inspiration to be a better system administrator and operations professional, for mentorship along the way, and for writing Chef in the first place. Cheers, my friend.

Here’s to many more years of whipping up awesome!

Chef Reporting API and Resource Updates

Have you ever wanted to find a list of nodes that updated a specific resource in a period of time? Such as “show me all the nodes in production that had an application service restart in the last hour”? Or, “which nodes have updated their apt cache recently?” For example,

% knife report resource 'role:supermarket-app AND chef_environment:supermarket-prod' execute 'apt-get update'
execute[apt-get update] changed in the following runs:
app-supermarket1.example.com 2230cf30-6d95-4e43-be18-211137eaf802 @ 2014-10-07T14:07:03Z
app-supermarket2.example.com c5e4d7bf-95a6-4385-9d8e-c6f5617ed79b @ 2014-10-07T14:14:04Z
app-supermarket3.example.com c4c4b4bb-91b6-4f73-9876-b24b093c7f1e @ 2014-10-07T14:09:54Z
app-supermarket4.example.com 3eb09034-7539-4a3c-af6d-5b01d35bc63f @ 2014-10-07T13:31:56Z
app-supermarket5.example.com aa48c1d3-da91-4031-a43d-582a577cbf2d @ 2014-10-07T13:35:15Z
Use `knife runs show` with the run UUID to get more info

I have released a new knife plugin to do that, but first some background.

At CHEF, we run the community’s cookbook site, Supermarket. We monitor the systems that run the site with Sensu. The current infrastructure runs instances on Amazon Web Services EC2, with an Elastic Load Balancer (ELB) in front of them. As a corrective action for a Supermarket outage, CHEF’s operations team added a new check for elevated HTTP 500 responses from the application servers behind the ELB. One thing we found was that when Supermarket was deployed, and the unicorn server restarted, we would see elevated 500’s, but the site often wouldn’t actually be impacted.

The Sensu check is run from a “relay” node. That is, it isn’t run on the application servers or the Sensu server – it’s run out of band since it’s for the ELB. One might imagine we could have similar checks for other services that aren’t run on “managed nodes,” but that’s neither here nor there. The issue is that we get an alert message that looks like this:

Sensu Alerts  ALERT - [i-d1dfd5d9/check-elb-backend-500] - CheckELBMetrics CRITICAL: supermarket-elb; Sum of HTTPCode_Backend_5XX is 2538.0. (expected lower than 30.0); (HTTPCode_Backend_5XX '2538.0' within 300 seconds between 2014-08-19 13:33:36 +0000 to 2014-08-19 13:38:36 +0000) [Playbook].

The first part, [i-d1dfd5d9/check-elb-backend-500] is the node name and the check that alerted. The node name here is the monitoring relay that runs the check, not the actual node or nodes where Supermarket was deployed and restarted. This is where Chef Reporting comes into play. In Chef Reporting, we can view information about recent Chef client runs, which gives us a graph like this.

If we go look at the reports in the Chef Manage console, we can drill down to something like this.

This shows that unicorn was restarted in this run. That’s great, but if I’m getting this alert at a time when I’m not particularly coherent (e.g., 2 a.m.), I want a command in a playbook that I can run to get more information quickly, without having to log into the web UI and click around imprecisely. CHEF publishes a knife-reporting gem that has a couple of handy sub-commands to retrieve this run data. For example, we can list runs.

% knife runs list
node_name:  i-3022aa3b
run_id:     9eccd8f6-876b-4a57-87ac-0b3e7b7ef1e7
start_time: 2014-08-21T17:03:56Z
status:     started

node_name:  i-a09424a8
run_id:     f2b7871a-149b-4fd3-abdc-d74a838d719a
start_time: 2014-08-21T17:00:23Z
status:     success

Or, we can display a specific run.

% knife runs show eecb04fb-11df-438a-8e81-dd610eb66616
run_detail:
  data:
  end_time:          2014-08-20T17:50:12Z
  node_name:         i-9f22aa94
  run_id:            eecb04fb-11df-438a-8e81-dd610eb66616
  run_list:          ["role[base]","role[supermarket-app]"]
  start_time:        2014-08-20T17:45:37Z
  status:            success
  total_res_count:   261
  updated_res_count: 17
run_resources:
  cookbook_name:    supermarket
  cookbook_version: 2.7.2
  duration:         209
  final_state:
    enabled: false
    running: true
  id:               unicorn
  initial_state:
    enabled: false
    running: true
  name:             unicorn
  result:           restart
  type:             service
  uri:              https://api.opscode.com/organizations/supermarket/reports/org/runs/eecb04fb-11df-438a-8e81-dd610eb66616/15

This is handy, but a little limited. What if I want to display only the runs containing the service[unicorn] resource?

That’s where my knife-report-resource plugin helps. At first, it was very much specific to finding unicorn restarts on Supermarket app servers. However, I wanted to make it more general purpose as I think people would want to be able to find when arbitrary resources were updated. This is how it works:

  1. Query the Chef Server for a particular set of nodes. For example, 'role:supermarket-app AND chef_environment:supermarket-prod'.
  2. Get all the Chef client runs for a specified time period up until the current time. By default, it starts from one hour ago, but we can pass an ISO8601 timestamp.
  3. Iterate over all the runs looking for runs by the nodes that were returned by the search query, gathering the specified resource type and name.
  4. Display some nice output with the node’s FQDN, the run’s UUID, and a timestamp.
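
The core of that iteration can be sketched in plain Ruby. The hashes below mirror the shape of knife runs show output, but the field names and data are illustrative assumptions, not the plugin’s actual implementation:

```ruby
# Given node FQDNs from a Chef search and run data from the Reporting API,
# select the runs in which a given resource type/name was updated.
def runs_updating_resource(node_fqdns, runs, type, name)
  runs.select do |run|
    node_fqdns.include?(run[:fqdn]) &&
      run[:resources].any? { |r| r[:type] == type && r[:name] == name }
  end
end

# Hypothetical data shaped like the Reporting API's run records.
runs = [
  { fqdn: 'app-supermarket1.example.com',
    run_id: '2230cf30-6d95-4e43-be18-211137eaf802',
    end_time: '2014-10-07T14:07:03Z',
    resources: [{ type: 'execute', name: 'apt-get update' }] },
  { fqdn: 'db1.example.com',
    run_id: 'aaaaaaaa-0000-0000-0000-000000000000',
    end_time: '2014-10-07T14:08:00Z',
    resources: [{ type: 'service', name: 'postgresql' }] }
]

matches = runs_updating_resource(['app-supermarket1.example.com'], runs,
                                 'execute', 'apt-get update')
matches.each { |m| puts "#{m[:fqdn]} #{m[:run_id]} @ #{m[:end_time]}" }
```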

From the earlier example:

% knife report resource 'role:supermarket-app AND chef_environment:supermarket-prod' execute 'apt-get update'
execute[apt-get update] changed in the following runs:
app-supermarket1.example.com 2230cf30-6d95-4e43-be18-211137eaf802 @ 2014-10-07T14:07:03Z
app-supermarket2.example.com c5e4d7bf-95a6-4385-9d8e-c6f5617ed79b @ 2014-10-07T14:14:04Z
app-supermarket3.example.com c4c4b4bb-91b6-4f73-9876-b24b093c7f1e @ 2014-10-07T14:09:54Z
app-supermarket4.example.com 3eb09034-7539-4a3c-af6d-5b01d35bc63f @ 2014-10-07T13:31:56Z
app-supermarket5.example.com aa48c1d3-da91-4031-a43d-582a577cbf2d @ 2014-10-07T13:35:15Z
Use `knife runs show` with the run UUID to get more info

Then, we can drill down further into one of these runs with the knife-reporting plugin.

% knife runs show 2230cf30-6d95-4e43-be18-211137eaf802
run_detail:
  data:
  end_time:          2014-10-07T14:07:03Z
  node_name:         i-d7fed0df
  run_id:            2230cf30-6d95-4e43-be18-211137eaf802
  run_list:          ["role[base]","role[supermarket-app]"]
  start_time:        2014-10-07T14:03:59Z
  status:            success
  total_res_count:   271
  updated_res_count: 12
run_resources:
  cookbook_name:    chef-client
  cookbook_version: 3.6.0
  duration:         99
  final_state:
    enabled: true
    running: false
  id:               chef-client
  initial_state:
    enabled: true
    running: true
  name:             chef-client
  result:           enable
  type:             runit_service
  uri:              https://api.opscode.com/organizations/supermarket/reports/org/runs/2230cf30-6d95-4e43-be18-211137eaf802/0
...
  cookbook_name:    supermarket
  cookbook_version: 2.11.0
  duration:         8506
  final_state:
  id:               apt-get update
  initial_state:
  name:             apt-get update
  result:           run
  type:             execute
  uri:              https://api.opscode.com/organizations/supermarket/reports/org/runs/2230cf30-6d95-4e43-be18-211137eaf802/5

Hopefully you find this plugin useful! It is a RubyGem, and is available on RubyGems.org, and the source is available on GitHub.

Chef::Node.debug_value

Update: As Dan DeLeo pointed out, he discussed this feature in Chef 11 In-Depth: Attributes Changes last year when Chef 11 was released. I somehow never got a chance to use it, and thought this post would be a helpful example.

Earlier today I was reminded by Steven Danna about a newer feature of Chef called debug_value. This is a method on the node object (Chef::Node) which will show where in Chef’s attribute hierarchy a particular attribute or sub-attribute was set on the node.

Fire up a chef shell in client mode on the node you want to see:

chef-shell -z

For example, I’ll use my minecraft server, using the excellent minecraft cookbook.

chef > node.run_list
 => role[minecraft-server]

The cookbook itself sets a node['minecraft'] attribute hash.

chef > node['minecraft']
 => {"user"=>"mcserver", "group"=>"mcserver", "install_dir"=>"/srv/minecraft", "install_type"=>"vanilla" ... omg a huge hash of attributes}

Of note are the server properties attributes, which I customize in the role. Here is the node['minecraft']['properties'] attributes hash on my node:

chef > node['minecraft']['properties']
 => {"allow-flight"=>false, "allow-nether"=>true, "difficulty"=>"1", "enable-query"=>false, "enable-rcon"=>false, "enable-command-block"=>false, "force-gamemode"=>true, "gamemode"=>"0", "generate-structures"=>true, "hardcore"=>false, "level-name"=>"creative-survival", "level-seed"=>"", "level-type"=>"DEFAULT", "max-build-height"=>"256", "max-players"=>"20", "motd"=>"It's the will to survive", "online-mode"=>true, "op-permission-level"=>4, "player-idle-timeout"=>0, "pvp"=>"false", "query.port"=>"25565", "rcon.password"=>"", "rcon.port"=>"25575", "server-ip"=>"", "server-name"=>"Housepub", "server-port"=>"25565", "snooper-enabled"=>"false", "spawn-animals"=>true, "spawn-monsters"=>true, "spawn-npcs"=>true, "spawn-protection"=>1, "texture-pack"=>"", "view-distance"=>10, "white-list"=>false}

And I can see where these were set using the #debug_value method. Each sub-attribute should be passed as an argument.

chef > pp node.debug_value('minecraft', 'properties')
[["set_unless_enabled?", false],
 ["default",
  {"allow-flight"=>false,
   "allow-nether"=>true,
   "difficulty"=>1,
   "enable-query"=>false,
   "enable-rcon"=>false,
   "enable-command-block"=>false,
   "force-gamemode"=>false,
   "gamemode"=>0,
   "generate-structures"=>true,
   "hardcore"=>false,
   "level-name"=>"world",
   "level-seed"=>"",
   "level-type"=>"DEFAULT",
   "max-build-height"=>"256",
   "max-players"=>"20",
   "motd"=>"A Minecraft Server",
   "online-mode"=>true,
   "op-permission-level"=>4,
   "player-idle-timeout"=>0,
   "pvp"=>true,
   "query.port"=>"25565",
   "rcon.password"=>"",
   "rcon.port"=>"25575",
   "server-ip"=>"",
   "server-name"=>"Unknown Server",
   "server-port"=>"25565",
   "snooper-enabled"=>true,
   "spawn-animals"=>true,
   "spawn-monsters"=>true,
   "spawn-npcs"=>true,
   "spawn-protection"=>16,
   "texture-pack"=>"",
   "view-distance"=>10,
   "white-list"=>false}],
 ["env_default", :not_present],
 ["role_default",
  {"difficulty"=>"1",
   "gamemode"=>"0",
   "force-gamemode"=>true,
   "motd"=>"It's the will to survive",
   "pvp"=>"false",
   "server-name"=>"Housepub",
   "level-name"=>"creative-survival",
   "spawn-protection"=>1,
   "snooper-enabled"=>"false"}],
 ["force_default", :not_present],
 ["normal", :not_present],
 ["override", :not_present],
 ["role_override", :not_present],
 ["env_override", :not_present],
 ["force_override", :not_present],
 ["automatic", :not_present]]

From the role, we can see some properties attributes are set:

 ["role_default",
  {"difficulty"=>"1",
   "gamemode"=>"0",
   "force-gamemode"=>true,
   "motd"=>"It's the will to survive",
   "pvp"=>"false",
   "server-name"=>"Housepub",
   "level-name"=>"creative-survival",
   "spawn-protection"=>1,
   "snooper-enabled"=>"false"}],

Note that even though these are also set by default, we get them in the output here too.
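
The precedence behavior can be approximated with a hash merge. This is a deliberate simplification of Chef’s deep-merge, shown only to illustrate why role_default values win over cookbook defaults for the same keys:

```ruby
# Simplified view of Chef attribute precedence at a single level:
# higher-precedence hashes override lower ones, key by key.
def effective_attributes(default, role_default)
  default.merge(role_default)
end

# A tiny subset of the minecraft properties from the debug_value output.
default      = { 'difficulty' => 1, 'motd' => 'A Minecraft Server', 'pvp' => true }
role_default = { 'difficulty' => '1', 'motd' => "It's the will to survive" }

puts effective_attributes(default, role_default)
```

Keys set only at the default level (like 'pvp' here) survive the merge, which is why the role only needs to list the attributes it actually changes.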

Load_current_resource and Chef-shell

This post will illustrate load_current_resource and a basic use of chef-shell.

The chef-shell is an irb-based REPL (read-eval-print loop). Everything I do in it is Ruby code, just like in Chef recipes or other cookbook components. I’m going to use a package resource example, so I need privileged access (sudo).

% sudo chef-shell

The chef-shell program loads its configuration, determines the session type, and displays a banner. In this case, we’re taking all the defaults, which means no special configuration and a standalone session.

loading configuration: none (standalone session)
Session type: standalone
Loading...done.

This is the chef-shell.
 Chef Version: 11.14.0.rc.2
 http://www.opscode.com/chef
 http://docs.opscode.com/

run `help' for help, `exit' or ^D to quit.

Ohai2u jtimberman@jenkins.int.housepub.org!

To evaluate resources as we’d write them in a recipe, we need to switch to recipe mode.

chef > recipe_mode

I can do anything here that I can do in a recipe. I could paste in my own recipes. Here, I’m just going to add a package resource to manage the vim package. Note that this works like the “compile” phase of a chef-client run. The resource will be added to the Chef::ResourceCollection object. We’ll look at this in a little more detail shortly.

chef:recipe > package "vim"
 => <package[vim] @name: "vim" @noop: nil @before: nil @params: {} @provider: nil @allowed_actions: [:nothing, :install, :upgrade, :remove, :purge, :reconfig] @action: :install @updated: false @updated_by_last_action: false @supports: {} @ignore_failure: false @retries: 0 @retry_delay: 2 @source_line: "(irb#1):1:in `irb_binding'" @guard_interpreter: :default @elapsed_time: 0 @sensitive: false @candidate_version: nil @options: nil @package_name: "vim" @resource_name: :package @response_file: nil @response_file_variables: {} @source: nil @version: nil @timeout: 900 @cookbook_name: nil @recipe_name: nil>

I’m done adding resources/writing code to test, so I’ll initiate a Chef run with the run_chef method (this is a special method in chef-shell).

chef:recipe > run_chef
[2014-07-21T09:04:51-06:00] INFO: Processing package[vim] action install ((irb#1) line 1)
[2014-07-21T09:04:51-06:00] DEBUG: Chef::Version::Comparable does not know how to parse the platform version: jessie/sid
[2014-07-21T09:04:51-06:00] DEBUG: Chef::Version::Comparable does not know how to parse the platform version: jessie/sid
[2014-07-21T09:04:51-06:00] DEBUG: package[vim] checking package status for vim
vim:
  Installed: 2:7.4.335-1
  Candidate: 2:7.4.335-1
  Version table:
 *** 2:7.4.335-1 0
        500 http://ftp.us.debian.org/debian/ testing/main amd64 Packages
        100 /var/lib/dpkg/status
[2014-07-21T09:04:51-06:00] DEBUG: package[vim] current version is 2:7.4.335-1
[2014-07-21T09:04:51-06:00] DEBUG: package[vim] candidate version is 2:7.4.335-1
[2014-07-21T09:04:51-06:00] DEBUG: package[vim] is already installed - nothing to do

Let’s take a look at what’s happening. Note that we have INFO and DEBUG output. By default, chef-shell runs with Chef::Log#level set to :debug. In a normal Chef Client run with :info output, we see the first line, but not the others. I’ll show each line, and then explain what Chef did.

[2014-07-21T09:04:51-06:00] INFO: Processing package[vim] action install ((irb#1) line 1)

There is a timestamp, the resource (package[vim]), the install action Chef will take, and the location in the recipe where the resource was encountered. I didn’t specify an action in the resource; :install is the default action for package resources. The (irb#1) line 1 just means it was the first line entered in the irb recipe-mode session.

[2014-07-21T09:04:51-06:00] DEBUG: Chef::Version::Comparable does not know how to parse the platform version: jessie/sid
[2014-07-21T09:04:51-06:00] DEBUG: Chef::Version::Comparable does not know how to parse the platform version: jessie/sid

Chef chooses the default provider for each resource based on a mapping of platforms and their versions, using an internal class, Chef::Version::Comparable. The system I’m using is a Debian “testing” system, which has the codename jessie but no specific release number. Chef knows to use the apt package provider on all Debian platforms regardless of version, and that’ll do here.
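
A toy illustration of that platform-to-provider lookup follows; the map and symbols here are hypothetical simplifications, since the real mapping lives inside Chef and also accounts for platform versions:

```ruby
# Hypothetical, simplified platform -> package provider map.
PROVIDER_MAP = {
  'debian' => :apt_package,
  'ubuntu' => :apt_package,
  'centos' => :yum_package
}.freeze

# Fall back to a generic provider when the platform isn't mapped; for
# Debian, the version (even an unparseable "jessie/sid") doesn't matter.
def provider_for(platform)
  PROVIDER_MAP.fetch(platform, :package)
end

puts provider_for('debian')
```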

[2014-07-21T09:04:51-06:00] DEBUG: package[vim] checking package status for vim
vim:
  Installed: 2:7.4.335-1
  Candidate: 2:7.4.335-1
  Version table:
 *** 2:7.4.335-1 0
        500 http://ftp.us.debian.org/debian/ testing/main amd64 Packages
        100 /var/lib/dpkg/status
[2014-07-21T09:04:51-06:00] DEBUG: package[vim] current version is 2:7.4.335-1
[2014-07-21T09:04:51-06:00] DEBUG: package[vim] candidate version is 2:7.4.335-1

This output is the load_current_resource method implemented in the apt package provider.

The check_package_state method does all the heavy lifting. It runs apt-cache policy and parses the output looking for the version number. If we used the :upgrade action, and the installed version wasn’t the same as the candidate version, Chef would install the candidate version.
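
The version parsing can be sketched like this. It is a simplified stand-in for the provider’s actual parser, run here against output in the shape shown above:

```ruby
# Extract the installed and candidate versions from `apt-cache policy`
# output. An installed version of "(none)" is reported as nil.
def parse_policy(output)
  installed = output[/^\s*Installed:\s*(\S+)/, 1]
  candidate = output[/^\s*Candidate:\s*(\S+)/, 1]
  installed = nil if installed == '(none)'
  { installed: installed, candidate: candidate }
end

policy = <<~OUT
  vim:
    Installed: 2:7.4.335-1
    Candidate: 2:7.4.335-1
OUT

versions = parse_policy(policy)
puts "current: #{versions[:installed]}, candidate: #{versions[:candidate]}"
```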

Chef resources are convergent: they are only updated if they need to be. In this case, the vim package is already installed, which satisfies the implicitly specified :install action, so we see the following line:

[2014-07-21T09:04:51-06:00] DEBUG: package[vim] is already installed - nothing to do

Nothing to do, Chef finishes its run.

Modifying Existing Resources

We can manipulate the state of the resources in the resource collection. This isn’t common in most recipes, but it’s required for certain kinds of development patterns, like “wrapper” cookbooks. As an example, I’m going to modify the resource object so I don’t have to log into the system and run apt-get remove vim by hand before demonstrating the next section.

First, I’m going to create a local variable in the context of the recipe. This is just like any other variable in Ruby. For its value, I’m going to use the #resources() method to look up a resource in the resource collection.

chef:recipe > local_package_variable = resources("package[vim]")
 => <package[vim] @name: "vim" @noop: nil @before: nil @params: {} @provider: nil @allowed_actions: [:nothing, :install, :upgrade, :remove, :purge, :reconfig] @action: :install @updated: false @updated_by_last_action: false @supports: {} @ignore_failure: false @retries: 0 @retry_delay: 2 @source_line: "(irb#1):1:in `irb_binding'" @guard_interpreter: :default @elapsed_time: 0.029617095 @sensitive: false @candidate_version: nil @options: nil @package_name: "vim" @resource_name: :package @response_file: nil @response_file_variables: {} @source: nil @version: nil @timeout: 900 @cookbook_name: nil @recipe_name: nil>

The return value is the package resource object:

chef:recipe > local_package_variable.class
 => Chef::Resource::Package

(#class is a method on the Ruby Object class that returns the class of the object)

To remove the vim package, I use the #run_action method (available to all Chef::Resource subclasses), specifying the :remove action as a symbol:

chef:recipe > local_package_variable.run_action(:remove)
[2014-07-21T09:11:50-06:00] INFO: Processing package[vim] action remove ((irb#1) line 1)
[2014-07-21T09:11:52-06:00] INFO: package[vim] removed

There is no additional debug to display. Chef will run apt-get remove vim to converge the resource with this action.

Load Current Resource Redux

Now that the package has been removed from the system, what happens if we run Chef again? Well, Chef is convergent, and it takes idempotent actions on the system to ensure that the managed resources are in the desired state. That means it will install the vim package.

chef:recipe > run_chef
[2014-07-21T09:11:57-06:00] INFO: Processing package[vim] action install ((irb#1) line 1)

We’ll see some familiar messages here about the version, then:

[2014-07-21T09:11:57-06:00] DEBUG: package[vim] checking package status for vim
vim:
  Installed: (none)
  Candidate: 2:7.4.335-1
  Version table:
     2:7.4.335-1 0
        500 http://ftp.us.debian.org/debian/ testing/main amd64 Packages
[2014-07-21T09:11:57-06:00] DEBUG: package[vim] current version is nil
[2014-07-21T09:11:57-06:00] DEBUG: package[vim] candidate version is 2:7.4.335-1

This is load_current_resource working as expected. As we can see from the apt-cache policy output, the package is not installed, and as the action to take is :install, Chef will do what we think:

Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  g++-4.8 geoclue geoclue-hostip geoclue-localnet geoclue-manual
  geoclue-nominatim gstreamer0.10-plugins-ugly libass4 libblas3gf libcolord1
  libcolorhug1 libgeoclue0 libgnustep-base1.22 libgnutls28 libminiupnpc8
  libpoppler44 libqmi-glib0 libstdc++-4.8-dev python3-ply xulrunner-29
Use 'apt-get autoremove' to remove them.
Suggested packages:
  vim-doc vim-scripts
The following NEW packages will be installed:
  vim
0 upgraded, 1 newly installed, 0 to remove and 28 not upgraded.
Need to get 0 B/905 kB of archives.
After this operation, 2,088 kB of additional disk space will be used.
Selecting previously unselected package vim.
(Reading database ... 220338 files and directories currently installed.)
Preparing to unpack .../vim_2%3a7.4.335-1_amd64.deb ...
Unpacking vim (2:7.4.335-1) ...
Setting up vim (2:7.4.335-1) ...
update-alternatives: using /usr/bin/vim.basic to provide /usr/bin/vim (vim) in auto mode
update-alternatives: using /usr/bin/vim.basic to provide /usr/bin/vimdiff (vimdiff) in auto mode
update-alternatives: using /usr/bin/vim.basic to provide /usr/bin/rvim (rvim) in auto mode
update-alternatives: using /usr/bin/vim.basic to provide /usr/bin/rview (rview) in auto mode
update-alternatives: using /usr/bin/vim.basic to provide /usr/bin/vi (vi) in auto mode
update-alternatives: using /usr/bin/vim.basic to provide /usr/bin/view (view) in auto mode
update-alternatives: using /usr/bin/vim.basic to provide /usr/bin/ex (ex) in auto mode

This should be familiar to anyone who uses Debian/Ubuntu: it’s standard apt-get install output. Of course, this is a development system so I have some cruft, but we’ll ignore that ;).

If we run_chef again, we get the output we saw in the original example in this post:

[2014-07-21T09:50:06-06:00] DEBUG: package[vim] is already installed - nothing to do

ChefDK and Ruby

Recently, Chef released ChefDK, the “Chef Development Kit.” This is a self-contained package of everything required to run Chef and work with Chef cookbooks, and it includes best-of-breed community tools, test frameworks, and other utility programs that are commonly used when working with Chef for infrastructure as code. ChefDK version 0.1.0 was released last week. A new feature mentioned in the README.md is very important, in my opinion.

Using ChefDK as your primary development environment

What does that mean?

It means that if the only reason you have Ruby installed on your local system is to do Chef development or otherwise work with Chef, you no longer have to maintain a separate Ruby installation. That means you won’t need any of these:

  • rbenv
  • rvm
  • chruby (*note)
  • “system ruby” (e.g., OS X’s included /usr/bin/ruby, or the ruby package from your Linux distro)
  • poise ruby

(*note: You can optionally use chruby with ChefDK if it’s part of your workflow and you have other Rubies installed.)

Do not misunderstand me: These are all extremely good solutions for getting and using Ruby on your system. They definitely have their place if you do other Ruby development, such as web applications. This is especially true if you have to work with multiple versions of Ruby. However, if you’re like me and mainly use Ruby for Chef, then ChefDK has you covered.

In this post, I will describe how I have set up my system with ChefDK, and use its embedded Ruby by default.

Getting Started

Download ChefDK from the downloads page. At the time of this blog post, the available builds are limited to OS X and Linux (Debian/Ubuntu or RHEL), but Chef is working on Windows packages.

For example, here’s what I did on my Ubuntu 14.04 system:

wget -P /tmp https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/13.10/x86_64/chefdk_0.1.0-1_amd64.deb
sudo dpkg -i /tmp/chefdk_0.1.0-1_amd64.deb

OS X users will be happy to know that the download is a .DMG, which includes a standard OS X .pkg (complete with developer signing). Simply install it like many other products on OS X.

For either Linux or OS X, in omnibus fashion, the post-installation creates several symbolic links in /usr/bin for tools that are included in ChefDK:

% ls -l /usr/bin | grep chefdk
lrwxrwxrwx 1 root root 21 Apr 30 22:13 berks -> /opt/chefdk/bin/berks
lrwxrwxrwx 1 root root 20 Apr 30 22:13 chef -> /opt/chefdk/bin/chef
lrwxrwxrwx 1 root root 26 Apr 30 22:13 chef-apply -> /opt/chefdk/bin/chef-apply
lrwxrwxrwx 1 root root 27 Apr 30 22:13 chef-client -> /opt/chefdk/bin/chef-client
lrwxrwxrwx 1 root root 26 Apr 30 22:13 chef-shell -> /opt/chefdk/bin/chef-shell
lrwxrwxrwx 1 root root 25 Apr 30 22:13 chef-solo -> /opt/chefdk/bin/chef-solo
lrwxrwxrwx 1 root root 25 Apr 30 22:13 chef-zero -> /opt/chefdk/bin/chef-zero
lrwxrwxrwx 1 root root 23 Apr 30 22:13 fauxhai -> /opt/chefdk/bin/fauxhai
lrwxrwxrwx 1 root root 26 Apr 30 22:13 foodcritic -> /opt/chefdk/bin/foodcritic
lrwxrwxrwx 1 root root 23 Apr 30 22:13 kitchen -> /opt/chefdk/bin/kitchen
lrwxrwxrwx 1 root root 21 Apr 30 22:13 knife -> /opt/chefdk/bin/knife
lrwxrwxrwx 1 root root 20 Apr 30 22:13 ohai -> /opt/chefdk/bin/ohai
lrwxrwxrwx 1 root root 23 Apr 30 22:13 rubocop -> /opt/chefdk/bin/rubocop
lrwxrwxrwx 1 root root 20 Apr 30 22:13 shef -> /opt/chefdk/bin/shef
lrwxrwxrwx 1 root root 22 Apr 30 22:13 strain -> /opt/chefdk/bin/strain
lrwxrwxrwx 1 root root 24 Apr 30 22:13 strainer -> /opt/chefdk/bin/strainer

These should cover the 80% use case of ChefDK: using the various Chef and Chef Community tools so users can follow their favorite workflow, without shaving the yak of managing a Ruby environment.

But as I noted, the thesis of this post is that one can use the Ruby environment included in ChefDK as their own. So where is it?

ChefDK’s Ruby

Tucked away in every “omnibus” package is a directory of “embedded” software – the things that were required to meet the end goal. In the case of Chef or ChefDK, this is Ruby, openssl, zlib, libpng, and so on. This is a fully contained directory tree, complete with lib, share, and yes indeed, bin.

% ls /opt/chefdk/embedded/bin
(there's a bunch of commands here, trust me)

Of particular note are /opt/chefdk/embedded/bin/ruby and /opt/chefdk/embedded/bin/gem.

To use ChefDK’s Ruby as default, simply edit the $PATH.

export PATH="/opt/chefdk/embedded/bin:${HOME}/.chefdk/gem/ruby/2.1.0/bin:$PATH"

Add that, or its equivalent, to a login shell profile/dotrc file, and rejoice. Here’s what I have now:

$ which ruby
/opt/chefdk/embedded/bin/ruby
$ which gem
/opt/chefdk/embedded/bin/gem
$ ruby --version
ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-linux]
$ gem --version
2.2.1
$ gem env
RubyGems Environment:
  - RUBYGEMS VERSION: 2.2.1
  - RUBY VERSION: 2.1.1 (2014-02-24 patchlevel 76) [x86_64-linux]
  - INSTALLATION DIRECTORY: /opt/chefdk/embedded/lib/ruby/gems/2.1.0
  - RUBY EXECUTABLE: /opt/chefdk/embedded/bin/ruby
  - EXECUTABLE DIRECTORY: /opt/chefdk/embedded/bin
  - SPEC CACHE DIRECTORY: /home/ubuntu/.gem/specs
  - RUBYGEMS PLATFORMS:
    - ruby
    - x86_64-linux
  - GEM PATHS:
     - /opt/chefdk/embedded/lib/ruby/gems/2.1.0
     - /home/ubuntu/.chefdk/gem/ruby/2.1.0
  - GEM CONFIGURATION:
     - :update_sources => true
     - :verbose => true
     - :backtrace => false
     - :bulk_threshold => 1000
     - "install" => "--user"
     - "update" => "--user"
  - REMOTE SOURCES:
     - https://rubygems.org/
  - SHELL PATH:
     - /opt/chefdk/embedded/bin
     - /home/ubuntu/.chefdk/gem/ruby/2.1.0/bin
     - /usr/local/sbin
     - /usr/local/bin
     - /usr/sbin
     - /usr/bin
     - /sbin
     - /bin
     - /usr/games
     - /usr/local/games

Note that this is the current stable release of Ruby, version 2.1.1 patchlevel 76, and the (almost) latest version of RubyGems, version 2.2.1. Also note the Gem paths – the first is the embedded gems path, which is where gems installed by root with the chef gem command will go. The other is in my home directory – ChefDK is set up so that gems can be installed as a non-root user within the ~/.chefdk/gem directory.

Installing Gems

Let’s see this in action. Install a gem using the gem command.

$ gem install knife-solve
Fetching: knife-solve-1.0.1.gem (100%)
Successfully installed knife-solve-1.0.1
Parsing documentation for knife-solve-1.0.1
Installing ri documentation for knife-solve-1.0.1
Done installing documentation for knife-solve after 0 seconds
1 gem installed

And as I said, this will be installed in the home directory:

$ gem contents knife-solve
/home/ubuntu/.chefdk/gem/ruby/2.1.0/gems/knife-solve-1.0.1/LICENSE
/home/ubuntu/.chefdk/gem/ruby/2.1.0/gems/knife-solve-1.0.1/README.md
/home/ubuntu/.chefdk/gem/ruby/2.1.0/gems/knife-solve-1.0.1/Rakefile
/home/ubuntu/.chefdk/gem/ruby/2.1.0/gems/knife-solve-1.0.1/lib/chef/knife/solve.rb
/home/ubuntu/.chefdk/gem/ruby/2.1.0/gems/knife-solve-1.0.1/lib/knife-solve.rb
/home/ubuntu/.chefdk/gem/ruby/2.1.0/gems/knife-solve-1.0.1/lib/knife-solve/version.rb

Using Bundler

ChefDK also includes bundler. As a “non-Chef, Ruby use case”, I installed octopress for this blog.

% bundle install --path vendor --binstubs
Fetching gem metadata from https://rubygems.org/.......
Fetching additional metadata from https://rubygems.org/..
Installing rake (0.9.6)
Installing RedCloth (4.2.9)
Installing chunky_png (1.2.9)
Installing fast-stemmer (1.0.2)
Installing classifier (1.3.3)
Installing fssm (0.2.10)
Installing sass (3.2.12)
Installing compass (0.12.2)
Installing directory_watcher (1.4.1)
Installing haml (3.1.8)
Installing kramdown (0.14.2)
Installing liquid (2.3.0)
Installing maruku (0.7.0)
Installing posix-spawn (0.3.6)
Installing yajl-ruby (1.1.0)
Installing pygments.rb (0.3.7)
Installing jekyll (0.12.1)
Installing rack (1.5.2)
Installing rack-protection (1.5.0)
Installing rb-fsevent (0.9.3)
Installing rdiscount (2.0.7.3)
Installing rubypants (0.2.0)
Installing sass-globbing (1.0.0)
Installing tilt (1.4.1)
Installing sinatra (1.4.3)
Installing stringex (1.4.0)
Using bundler (1.5.2)
Updating files in vendor/cache
  * classifier-1.3.3.gem
  * fssm-0.2.10.gem
  * sass-3.2.12.gem
  * compass-0.12.2.gem
  * directory_watcher-1.4.1.gem
  * haml-3.1.8.gem
  * kramdown-0.14.2.gem
  * liquid-2.3.0.gem
  * maruku-0.7.0.gem
  * posix-spawn-0.3.6.gem
  * yajl-ruby-1.1.0.gem
  * pygments.rb-0.3.7.gem
  * jekyll-0.12.1.gem
  * rack-1.5.2.gem
  * rack-protection-1.5.0.gem
  * rb-fsevent-0.9.3.gem
  * rdiscount-2.0.7.3.gem
  * rubypants-0.2.0.gem
  * sass-globbing-1.0.0.gem
  * tilt-1.4.1.gem
  * sinatra-1.4.3.gem
  * stringex-1.4.0.gem
Your bundle is complete!
It was installed into ./vendor

Then I can use, for example, the rake task to preview things while writing this post.

$ ./bin/rake preview
Starting to watch source with Jekyll and Compass. Starting Rack on port 4000
directory source/stylesheets/
   create source/stylesheets/screen.css
[2014-05-07 21:46:35] INFO  WEBrick 1.3.1
[2014-05-07 21:46:35] INFO  ruby 2.1.1 (2014-02-24) [x86_64-linux]
[2014-05-07 21:46:35] INFO  WEBrick::HTTPServer#start: pid=10815 port=4000

Conclusion

I’ve used Chef since before it was even released. As the project has evolved, and as the Ruby community around it has established new best practices for installing and maintaining Ruby development environments, I’ve followed along. I’ve used all the version managers listed above. I’ve spent untold hours getting the right set of gems installed, only to have to upgrade everything again and debug my workstation. I’ve written blog posts, wiki pages, and helped countless users do this on their own systems.

Now, we have an all-in-one environment that provides a great solution. Give ChefDK a whirl on your workstation – I think you’ll like it!

Evolution of Cookbook Development

In this post, I will explore some development patterns that I’ve seen (and done!) with Chef cookbooks, and then explain how we can evolve to a new level of cookbook development. The examples here come from Chef’s new chef-splunk cookbook, which is a refactored version of an old splunk42 cookbook. While there is a public splunk cookbook on the Chef community site, it shares some of the issues that I saw with our old one, which are part of the subject matter of this post.

Anyway, on to the evolution!

Sub-optimal patterns

These are the general patterns I’m going to address.

  • Composing URLs from multiple local variables or attributes
  • Large conditional logic branches like case statements in recipes
  • Not using definitions when it is best to do so
  • Knowledge of how node run lists are composed for search, or searching for “role:some-server”
  • Repeated resources across multiple orthogonal recipes
  • Plaintext secrets in attributes or data bag items

Cookbook development is a wide and varied topic, so there are many other patterns to consider, but these are the ones most relevant to the refactored cookbook.

Composing URLs

It may seem like a good idea to compose URL strings as attributes or local variables in a recipe, based on other attributes and local variables. For example, in our splunk42 cookbook we have this:

splunk_root = "http://download.splunk.com/releases/"
splunk_version = "4.2.1"
splunk_build = "98164"
splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-linux-2.6-amd64.deb"
os = node['os'].gsub(/\d*/, '')

These get used in the following remote_file resource:

remote_file "/opt/#{splunk_file}" do
  source "#{splunk_root}/#{splunk_version}/universalforwarder/#{os}/#{splunk_file}"
  action :create_if_missing
end

We reused the filename variable, and composed the URL of the file to download. Then to upgrade, we can simply modify splunk_version and splunk_build, as Splunk uses a consistent naming scheme for their package URLs (thanks, Splunk!). The filename itself is built from a case statement (more on that in the next section). We could further make the version and build attributes, so users can update to newer versions by simply changing the attribute.

So what is bad about this? Two things.

  1. This is in the splunk42::client recipe, and repeated again in the splunk42::server recipe with only minor differences (the package name, splunk vs splunkforwarder).
  2. Ruby has excellent libraries for manipulating URIs and paths as strings, and it is easier to break up a string than compose a new one.

How can this be improved? First, we can set attributes for the full URL. The actual code for that is below, but suffice to say, it will look like this (note the version is different because the new cookbook installs a new Splunk version).

default['splunk']['forwarder']['url'] = 'http://download.splunk.com/releases/6.0.1/universalforwarder/linux/splunkforwarder-6.0.1-189883-linux-2.6-amd64.deb'

Second, we have helper libraries distributed with the cookbook that break up the URI so we can return just the package filename.

def splunk_file(uri)
  require 'pathname'
  require 'uri'
  Pathname.new(URI.parse(uri).path).basename.to_s
end
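For example, calling the helper on the forwarder URL attribute yields just the package filename (restated here as standalone Ruby):

```ruby
require 'pathname'
require 'uri'

# The cookbook's helper: extract the filename from a package URL.
def splunk_file(uri)
  Pathname.new(URI.parse(uri).path).basename.to_s
end

url = 'http://download.splunk.com/releases/6.0.1/universalforwarder/linux/splunkforwarder-6.0.1-189883-linux-2.6-amd64.deb'
puts splunk_file(url) # prints "splunkforwarder-6.0.1-189883-linux-2.6-amd64.deb"
```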

The previous remote_file resource is rewritten like this:

remote_file "/opt/#{splunk_file(node['splunk']['forwarder']['url'])}" do
  source node['splunk']['forwarder']['url']
  action :create_if_missing
end

As a bonus, the helper methods are available in other places like other cookbooks and recipes, rather than the local scope of local variables.

Conditional Logic Branches

One of the wonderful things about Chef is that simple Ruby conditionals can be used in recipes to selectively set values for resource attributes, define resources that should be used, and other decisions. One of the horrible things about Chef is that simple Ruby conditionals can be used in recipes and often end up being far more complicated than originally intended, especially when handling multiple platforms and versions.

In the earlier example, we had a splunk_file local variable set in a recipe. I mentioned it was built from a case statement, which looks like this, in full:

splunk_file = case node['platform_family']
  when "rhel"
    if node['kernel']['machine'] == "x86_64"
      splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-linux-2.6-x86_64.rpm"
    else
      splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}.i386.rpm"
    end
  when "debian"
    if node['kernel']['machine'] == "x86_64"
      splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-linux-2.6-amd64.deb"
    else
      splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-linux-2.6-intel.deb"
    end
  when "omnios"
    splunk_file = "splunkforwarder-#{splunk_version}-#{splunk_build}-solaris-10-intel.pkg.Z"
  end

Splunk itself supports many platforms, and not all of them are covered by this conditional, so it’s easy to imagine how this can get further out of control and make the recipe even harder to follow. Also consider that this is just the client portion for the splunkforwarder package, this same block is repeated in the server recipe, for the splunk package.

So why is this bad? There are three reasons.

  1. We have a large block of conditionals that sit in front of a user reading a recipe.
  2. This logic isn’t reusable elsewhere, so it has to be duplicated in the other recipe.
  3. This is only the logic for the package filename, but we care about the entire URL. I’ve also covered that composing URLs isn’t delightful.

What is a better approach? Use the full URL as I mentioned before, and set it as an attribute. We will still have the gnarly case statement, but it will be tucked away in the attributes/default.rb file, and hidden from anyone reading the recipe (which is the thing they probably care most about reading).

case node['platform_family']
when 'rhel'
  if node['kernel']['machine'] == 'x86_64'
    default['splunk']['forwarder']['url'] = 'http://download.splunk.com/releases/6.0.1/universalforwarder/linux/splunkforwarder-6.0.1-189883-linux-2.6-x86_64.rpm'
    default['splunk']['server']['url'] = 'http://download.splunk.com/releases/6.0.1/splunk/linux/splunk-6.0.1-189883-linux-2.6-x86_64.rpm'
  else
    default['splunk']['forwarder']['url'] = 'http://download.splunk.com/releases/6.0.1/universalforwarder/linux/splunkforwarder-6.0.1-189883.i386.rpm'
    default['splunk']['server']['url'] = 'http://download.splunk.com/releases/6.0.1/splunk/linux/splunk-6.0.1-189883.i386.rpm'
  end
when 'debian'
  # ...

The complete case block can be viewed in the repository. Also, since this is an attribute, consumers of this cookbook can set the URL to whatever they want, including one on a local HTTP server.

Another example of gnarly conditional logic looks like this, also from the splunk42::client recipe.

case node['platform_family']
when "rhel"
  rpm_package "/opt/#{splunk_file}" do
    source "/opt/#{splunk_file}"
  end
when "debian"
  dpkg_package "/opt/#{splunk_file}" do
    source "/opt/#{splunk_file}"
  end
when "omnios"
  # tl;dr, this was more lines than you want to read, and
  # will be covered in the next section.
end

Why is this bad? After all, we’re selecting the proper package resource to install from a local file on disk. The main issue is that the conditional creates different resources that can’t be looked up in the resource collection. Our recipe doesn’t do this, but perhaps a wrapper cookbook would, and the consumer wrapping the cookbook has to duplicate this logic in their own. Instead, it is better to select the provider for a single package resource.

package "/opt/#{splunk_file(node['splunk']['forwarder']['url'])}" do
  case node['platform_family']
  when 'rhel'
    provider Chef::Provider::Package::Rpm
  when 'debian'
    provider Chef::Provider::Package::Dpkg
  when 'omnios'
    provider Chef::Provider::Package::Solaris
  end
end

Definitions Aren’t Bad

Definitions are simply recipe “macros.” They are not actually Chef resources themselves; they just look like them, and contain their own Chef resources. This has some disadvantages, such as the lack of metaparameters (like action), which has led people to prefer the “Lightweight Resource/Provider” (LWRP) DSL instead. In fact, some feel that definitions are bad, and that one should feel bad for using them. I argue that they have their place. One advantage is their relative simplicity.

In our splunk42 cookbook, the client and server recipes duplicate a lot of logic. As mentioned, much of this is the case statements for the Splunk package filename. They also repeat the same logic for choosing the provider to install the package. I snipped the content from the when "omnios" block earlier; it looks like this:

cache_dir = Chef::Config[:file_cache_path]
splunk_pkg = splunk_file.gsub(/\.Z/, '')

execute "uncompress /opt/#{splunk_file}" do
  not_if { ::File.exists?(splunk_cmd) }
end

cookbook_file "#{cache_dir}/splunk-nocheck" do
  source "splunk-nocheck"
end

file "#{cache_dir}/splunkforwarder-response" do
  content "BASEDIR=/opt"
end

pkgopts = ["-a #{cache_dir}/splunk-nocheck",
           "-r #{cache_dir}/splunkforwarder-response"]

package "splunkforwarder" do
  source "/opt/#{splunk_pkg}"
  options pkgopts.join(' ')
  provider Chef::Provider::Package::Solaris
end

(Note: the logic for setting the provider is required since we’re not using the default over-the-network package providers, and installing from a local file on the system.)

This isn’t too bad on its own, but needs to be repeated again in the server recipe if one wanted to run a Splunk server on OmniOS. The actual differences between the client and server package installation are the package name, splunkforwarder vs splunk. The earlier URL attribute example established a forwarder and server attribute. Using a definition, named splunk_installer, allows us to simplify the package installation used by the client and server recipes to look like this:

splunk_installer 'splunkforwarder' do
  url node['splunk']['forwarder']['url']
end
splunk_installer 'splunk' do
  url node['splunk']['server']['url']
end

How is this better than an LWRP? Simply that there was less ceremony in creating it, and less cognitive load for a cookbook developer to worry about. Definitions, by their very nature of containing resources, are already idempotent and convergent with no additional effort. They also automatically support why-run mode, whereas in an LWRP that must be implemented by the developer. Finally, notifications may be sent between resources in the definition and the rest of the Chef run.

Contrast this to an LWRP, we need resources and providers directories, and the attributes of the resource need to be defined in the resource. Then the action methods need to be written in the provider. If we’re using inline resources (which we are) we need to declare those so any notifications work. Finally, we should ensure that why-run works properly.

The actual definition is ~40 lines, and can be viewed in the cookbook repository. I don’t have a comparable LWRP for this, but suffice to say that it would be longer and more complicated than the definition.

Reasonability About Search

Search is one of the killer features of running a Chef Server. Dynamically configuring load balancer configuration, or finding the master database server is simple with a search. Because we often think about the functionality a service provides based on the role it serves, we end up doing searches that look like this:

splunk_servers = search(:node, "role:splunk-server")

Then we do something with splunk_servers, like send it to a template. What if someone doesn’t like the role name? Then we have to do something like this:

splunk_servers = search(:node, "role:#{node['splunk']['server_role']}")

Then consumers of the cookbook can use whatever server role name they want, and just update the attribute for it. But, the internet has said that roles are bad, so we shouldn’t use them (even though they aren’t ;)). So instead, we need something like one of these queries:

splunk_servers = search(:node, "recipes:splunk42\\:\\:server")
#or
splunk_servers = search(:node, "#{node['splunk']['server_search_query']}")

The problem with the first is similar to the problem with the role-based query (role:splunk-server): we need knowledge about the run list in order to search properly. The problem with the second is that we now have to worry about constructing a query properly as a string that gets interpolated correctly.

How can we improve this? I think it is more “Chef-like” to use an attribute on the server’s node object itself that declares the intention that the node is in fact a Splunk server. In our chef-splunk cookbook, we use node['splunk']['is_server']. The query looks like this:

splunk_servers = search(:node, "splunk_is_server:true")

This reads clearly, and the is_server attribute can be set in one of 15 places (for good or bad, but that’s a different post).

Repeating Resources, Composable Recipes

In the past, it was deemed okay to repeat resources across recipes when those recipes were not included on the same node. For example, client and server recipes that have similar resource requirements, but may pass in separate data. Another example is the haproxy cookbook I wrote, where one recipe statically manages the configuration files, and the other uses a Chef search to populate the configuration.

As I have mentioned above, a lot of code was duplicated between the client and server recipes for our splunk42 cookbook: user and group, the case statements, package resources, execute statements (that haven’t been shared here), and the service resource. It is definitely important to ensure that all the resources needed to converge a recipe are defined, particularly when using notifications. That is why sometimes a recipe will have a service resource with no actions like this:

service 'mything'

However, Chef 11 will generate a warning about cloned resources when they are repeated in the same Chef run.

Why is this bad? Well, CHEF-3694 explains in more detail that particular issue, of cloned resources. The other reason is that it makes recipes harder to reuse when they have a larger scope than absolutely necessary. How can we make this better? A solution to this is to write small, composable recipes that contain resources that may be optional for certain use cases. For example, we can put the service resource in a recipe and include that:

service 'splunk' do
  supports :status => true, :restart => true
  provider Chef::Provider::Service::Init
  action :start
end

Then when we need to make sure we have the service resource available (e.g., for notifications):

template "#{splunk_dir}/etc/system/local/outputs.conf" do
  source 'outputs.conf.erb'
  mode 0644
  variables :splunk_servers => splunk_servers
  notifies :restart, 'service[splunk]'
end
include_recipe 'chef-splunk::service'

Note that the service is included after the resource that notifies it. This is a feature of the notification system, where the notified resource can appear anywhere in the resource collection, and brings up another excellent practice, which is to declare service resources after other resources which affect their configuration. This prevents a race condition where, if a bad config is deployed, the service would attempt to start, fail, and cause the Chef run to exit before the config file could correct the problem.

Making recipes composable in this way means that users can pick and choose the ones they want. Our chef-splunk cookbook has a prescriptive default recipe, but the client and server recipes mainly include the others they need. If someone doesn’t share our opinion for their use case, they can assemble their own combination. Perhaps they have the splunk user and group created on systems through some other means; they won’t need the chef-splunk::user recipe, and can write their own wrapper to handle that. Overall this is good, though it does mean there are multiple places a user must look to follow a recipe.

Plaintext Secrets

Managing secrets is one of the hardest problems to solve in system administration and configuration management. In Chef, it is very easy to simply set attributes, or use data bag items for authentication credentials. Our old splunk42 cookbook had this:

splunk_password = node[:splunk][:auth].split(':')[1]

Where node[:splunk][:auth] was set in a role with the username:password. This isn’t particularly bad since our Chef server runs on a private network and is secured with HTTPS and RSA keys, but a defense in depth security posture has more controls in place for secrets.

How can this be improved? At Chef, we started using Chef Vault to manage secrets. I wrote a post about chef-vault a few months ago, so I won’t dig too deep into the details here. The current chef-splunk cookbook loads the authentication information like this:

splunk_auth_info = chef_vault_item(:vault, "splunk_#{node.chef_environment}")['auth']
user, pw = splunk_auth_info.split(':')

execute "#{splunk_cmd} edit user #{user} -password '#{pw}' -role admin -auth admin:changeme" do
  not_if { ::File.exist?("#{splunk_dir}/etc/.setup_#{user}_password") }
end

file "#{splunk_dir}/etc/.setup_#{user}_password" do
  content "true\n"
  owner 'root'
  group 'root'
  mode '0600'
end

The first line loads the authentication information from the encrypted-with-chef-vault data bag item. Then we make a couple of convenient local variables, and change the password from Splunk’s built-in default. Then, we control convergence of the execute by writing a file that indicates that the password has been set.

The advantage of this over attributes or data bag items is that the content is encrypted. The advantage over regular encrypted data bags is that we don’t need to distribute the secret key out to every system, we can update the list of nodes that have access with a knife command.

Conclusion

Neither Chef (the company), nor I are here to tell anyone how to write cookbooks. One of the benefits of Chef (the product) is its flexibility, allowing users to write blocks of Ruby code in recipes that quickly solve an immediate problem. That’s how we got to where we were with splunk42, and we certainly have other cookbooks that can be refactored similarly. When it comes to sharing cookbooks with the community, well-factored, easy to follow, understand, and use code is preferred.

Many of the ideas here came from community members like Miah Johnson, Noah Kantrowitz, Jamie Winsor, and Mike Fiedler. I owe them thanks for challenging me over the years on a lot of the older patterns that I held onto. Together we can build better automation through cookbooks, and a strong collaborative community. I hope this information is helpful to those goals.

Managing Multiple AWS Account Credentials

UPDATE: All non-default profiles must have their profile name start with “profile.” Below, this is “profile nondefault.” The ruby code is updated to reflect this.

In this post, I will describe my local setup for using the AWS CLI, the AWS Ruby SDK, and of course the Knife EC2 plugin.

The general practice I’ve used is to set the appropriate shell environment variables that are used by default by these tools (and the “legacy” ec2-api-tools, the java-based CLI). Over time and between tools, there have been several environment variables set:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION
AWS_SSH_KEY
AMAZON_ACCESS_KEY_ID
AMAZON_SECRET_ACCESS_KEY
AWS_ACCESS_KEY
AWS_SECRET_KEY

There is now a config file (ini-flavored) that can be used to set credentials, ~/.aws/config. Each ini section in this file is a different account’s credentials. For example:

[default]
aws_access_key_id=MY_DEFAULT_KEY
aws_secret_access_key=MY_DEFAULT_SECRET
region=us-east-1
[profile nondefault]
aws_access_key_id=NOT_MY_DEFAULT_KEY
aws_secret_access_key=NOT_MY_DEFAULT_SECRET
region=us-east-1

I have two accounts listed here. Obviously, the actual keys are not listed :). I source a shell script that sets the environment variables with these values. Before, I maintained a separate script for each account. Now, I install the inifile RubyGem and use a one-liner for each of the keys.

export AWS_ACCESS_KEY_ID=`ruby -rinifile -e "puts IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['default']['aws_access_key_id']"`
export AWS_SECRET_ACCESS_KEY=`ruby -rinifile -e "puts IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['default']['aws_secret_access_key']"`
export AWS_DEFAULT_REGION="us-east-1"
export AWS_SSH_KEY='jtimberman'

This will load the specified file, ~/.aws/config with the IniFile.load method, retrieving the default section’s aws_access_key_id value. Then repeat the same for the aws_secret_access_key.
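For readers who want to see what that one-liner is doing without installing the gem, here is a rough stdlib-only sketch of the parsing. The real inifile gem handles quoting, comments, and many more edge cases; this just covers the section/key=value shape of ~/.aws/config:

```ruby
require 'tempfile'

# Minimal sketch of IniFile.load for this use case: parse an
# ini-style file into a hash of sections, each a hash of key/value pairs.
def load_ini(path)
  sections = {}
  current = nil
  File.readlines(path).each do |line|
    line = line.strip
    next if line.empty? || line.start_with?(';', '#')
    if line =~ /\A\[(.+)\]\z/
      current = sections[$1] = {}
    elsif current && line =~ /\A([^=]+?)\s*=\s*(.*)\z/
      current[$1] = $2
    end
  end
  sections
end

# Usage with a sample config shaped like the one above:
Tempfile.create('awsconfig') do |f|
  f.write("[default]\naws_access_key_id=MY_DEFAULT_KEY\n" \
          "[profile nondefault]\naws_access_key_id=NOT_MY_DEFAULT_KEY\n")
  f.flush
  config = load_ini(f.path)
  puts config['default']['aws_access_key_id']            # MY_DEFAULT_KEY
  puts config['profile nondefault']['aws_access_key_id'] # NOT_MY_DEFAULT_KEY
end
```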

To use the nondefault profile:

export AWS_ACCESS_KEY_ID=`ruby -rinifile -e "puts IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['profile nondefault']['aws_access_key_id']"`
export AWS_SECRET_ACCESS_KEY=`ruby -rinifile -e "puts IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))['profile nondefault']['aws_secret_access_key']"`

Note that this uses ['profile nondefault'].

Since different tools historically have used slightly different environment variables, I export those too:

export AMAZON_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AMAZON_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
export AWS_ACCESS_KEY=$AWS_ACCESS_KEY_ID
export AWS_SECRET_KEY=$AWS_SECRET_ACCESS_KEY

I create a separate config script for each account.

The AWS CLI tool automatically uses ~/.aws/config, and can load different profiles with the --profile option. The aws-sdk Ruby library, however, uses the environment variables, so authentication in a Ruby script is set up automatically.

require 'aws-sdk'
iam = AWS::IAM.new

Without this, it would be:

require 'aws-sdk'
iam = AWS::IAM.new(:access_key_id => 'YOUR_ACCESS_KEY_ID',
                   :secret_access_key => 'YOUR_SECRET_ACCESS_KEY')

Which is a little onerous.

To use this with knife-ec2, I have the following in my .chef/knife.rb:

knife[:aws_access_key_id]      = ENV['AWS_ACCESS_KEY_ID']
knife[:aws_secret_access_key]  = ENV['AWS_SECRET_ACCESS_KEY']

Naturally, since knife.rb is Ruby, I could use IniFile.load there, but I only started using that library recently, and I already have my knife configuration set up.
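If I were to do that, a sketch of the knife.rb change might look like this (untested, and it assumes the inifile gem is installed in the Ruby that knife runs under):

```ruby
# Hypothetical knife.rb fragment: read the keys straight from
# ~/.aws/config instead of going through environment variables.
require 'inifile'

aws_config = IniFile.load(File.join(File.expand_path('~'), '.aws', 'config'))
knife[:aws_access_key_id]     = aws_config['default']['aws_access_key_id']
knife[:aws_secret_access_key] = aws_config['default']['aws_secret_access_key']
```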

Preview Chef Client Local Mode

Opscode Developer John Keiser mentioned that a feature for Chef Zero he’s been working on, “local mode,” is now in Chef’s master branch. This means it should be in the next release (11.8). I took the liberty to check this unreleased feature out.

Let’s just say, it’s super awesome and John has done some amazing work here.

PREVIEW

This is a preview of an unreleased feature in Chef. All standard disclaimers apply :).

Install

This is in the master branch of Chef, not released as a gem yet. You’ll need to get the source and build a gem locally. This totally assumes you’ve installed a sane ruby and bundler on your system.

git clone git://github.com/opscode/chef.git
cd chef
bundle install
bundle exec rake gem
gem install pkg/chef-11.8.0.alpha.0.gem

Note Alpha!

Setup

Next, point it at a local repository. I’ll use a simple example.

git clone git://github.com/opscode/chef-repo.git
cd chef-repo
knife cookbook create zero -o ./cookbooks
vi cookbooks/zero/recipes/default.rb

I created a fairly trivial example recipe to show that this will support search and data bag items:

a = search(:node, "*:*")
b = data_bag_item("zero", "fluff")

file "/tmp/zerofiles" do
  content a[0].to_s
end

file "/tmp/fluff" do
  content b.to_s
end

This simply searches for all nodes, and uses the content of the first node (presumably the one we’re running on) for a file in /tmp. It also loads a data bag item (which I created) and uses it for the content of another file in /tmp.

mkdir -p data_bags/zero
vi data_bags/zero/fluff.json

The data bag item:

{
  "id": "fluff",
  "clouds": "Are fluffy"
}

Converge!

Now, converge the node:

chef-client -z -o zero

The -z, or --local-mode, argument is the magic that sets up Chef Zero and loads all the contents of the repository. The -o zero tells Chef to use a one-time run list of the “zero” recipe.

[2013-10-10T23:53:32-06:00] WARN: No config file found or specified on command line, not loading.
Starting Chef Client, version 11.8.0.alpha.0
[2013-10-10T23:53:36-06:00] WARN: Run List override has been provided.
[2013-10-10T23:53:36-06:00] WARN: Original Run List: [recipe[zero]]
[2013-10-10T23:53:36-06:00] WARN: Overridden Run List: [recipe[zero]]
resolving cookbooks for run list: ["zero"]
Synchronizing Cookbooks:
  - zero
Compiling Cookbooks...
Converging 2 resources
Recipe: zero::default
  * file[/tmp/zerofiles] action create
    - create new file /tmp/zerofiles
    - update content in file /tmp/zerofiles from none to 0a038a
        --- /tmp/zerofiles      2013-10-10 23:53:36.368059768 -0600
        +++ /tmp/.zerofiles20131010-6903-10cvytu        2013-10-10 23:53:36.368059768 -0600
        @@ -1 +1,2 @@
        +node[jenkins.int.housepub.org]
  * file[/tmp/fluff] action create
    - create new file /tmp/fluff
    - update content in file /tmp/fluff from none to d46bab
        --- /tmp/fluff  2013-10-10 23:53:36.372059683 -0600
        +++ /tmp/.fluff20131010-6903-1l3i1h     2013-10-10 23:53:36.372059683 -0600
        @@ -1 +1,2 @@
        +data_bag_item[fluff]
Chef Client finished, 2 resources updated

The diff output from each of the file resources shows that the content does in fact come from the search (a node object was returned) and a data bag item (a data bag item object was returned).

What’s Next?

Since this is a feature of Chef, it will be documented and released, so look for that in the next version of Chef.

I can see this used for testing purposes, especially for recipes that make use of combinations of data bags and search, such as Opscode’s nagios cookbook.

Questions

  • Does it work with Berkshelf?

I don’t know. Probably not (yet).

  • Does it work with Test Kitchen?

I don’t know. Probably not (yet). Provisioners in test-kitchen would need to be (re)written.

  • Should I use this in production?

This is an unreleased feature in the master branch. What do you think? :)

  • When will this be released?

I don’t know the schedule for 11.8.0. Soon?

  • Where do I find out more, or get involved?

Join #chef-hacking in irc.freenode.net, the chef-dev mailing list, or attend the Chef Community Summit (November 12-13, 2013 in Seattle).

Switching MyOpenID to Google OpenID

You may be aware that MyOpenID is shutting down in February 2014.

The next best thing to use, IMO, is Google’s OpenID, since they have 2-factor authentication. Google doesn’t really expose the OpenID URL in a way that makes it as easy to use as “username.myopenid.com.” Fortunately, it’s relatively simple to add to a custom domain hosted by, for example, GitHub Pages. My coworker, Stephen Delano, pointed me to this pro-tip.

The requirement is to put a <link> tag in the HTML header of the site. It should look like this:

<link rel="openid2.provider" href="https://www.google.com/accounts/o8/ud?source=profiles" />
<link rel="openid2.local_id" href="http://www.google.com/profiles/A_UNIQUE_GOOGLE_PROFILE_ID" />

Obviously you need a Google Profile, but anyone interested in doing this probably has a Google+ account for Google Hangouts anyway :).

If you’re like me and have your custom domain hosted as an Octopress blog, this goes in source/_includes/custom/head.html. Then deploy the site and in a few moments you’ll be able to start using your site as an OpenID.