Ansible Best Practices

Best Practices and Style Guide for Ansible Projects

Best Practices

This document aims to gather good and best practices from Ansible practitioners and experience from multiple Ansible projects. It strives to give all Ansible users a guideline from which to start their automation journey in good conditions.

Ansible is simple, flexible, and powerful. Like any powerful tool, there are many ways to use it, some better than others.

Those are opinionated guidelines based on the experience of many projects. They are not meant to be followed blindly if they don’t fit the reader’s specific use case or needs. Take them as an inspiration and adjust them to your needs, still let us know your good and best practices, we all can learn.

Mindset
Ansible
Ansible Development
Ansible Automation Platform

Searching for something specific? Use the Search at the top!

Versioning

This guide is updated constantly, last update on May 12, 2025.

Mindset

The Zen of Ansible

Your Ansible automation content doesn’t necessarily have to follow this guidance, but they’re good ideas to keep in mind. These aphorisms are opinions that can be debated and sometimes can be contradictory. What matters is that they communicate a mindset for getting the most from Ansible and your automation.

20 aphorisms for Ansible

Ansible is not Python.
YAML sucks for coding.
Playbooks are not for programming.
Ansible users are (most likely) not programmers.
Clear is better than cluttered.
Concise is better than verbose.
Simple is better than complex.
Readability counts.
Helping users get things done matters most.
User experience beats ideological purity.
“Magic” conquers the manual.
When giving users options, use convention over configuration.
Declarative is better than imperative – most of the time.
Focus avoids complexity.
Complexity kills productivity.
If the implementation is hard to explain, it's a bad idea.
Every shell command and UI interaction is an opportunity to automate.
Just because something works, doesn’t mean it can’t be improved.
Friction should be eliminated whenever possible.
Automation is a journey that never ends.

Let me take you deeper into each of the aphorisms and explain what they mean to your automation practice.

Ansible is not Python

YAML sucks for coding. Playbooks are not for programming. Ansible users are (most probably) not programmers.

These aphorisms are at the heart of why applying guidelines for a programming language to good Ansible automation content didn’t seem right to me. As I said, it would give the wrong impression and would reinforce a mindset we don't recommend – that Ansible is a programming language for coding your playbooks.

These aphorisms are all saying the same thing in different ways – certainly the first 3. If you're trying to "write code" in your plays and roles, you're setting yourself up for failure. Ansible’s YAML-based playbooks were never meant to be for programming.

So it bothers me when I see Python-isms bleeding into what Ansible users see and do. It may be natural and make sense if you write code in Python, but most Ansible users are not Pythonistas. So, it can be challenging and confusing when these isms are incorporated, thereby introducing friction that degrades their user experience and the value that Ansible provides.

By Ansible not being a programming language, all parts of your organization can contribute to automating your entire IT stack rather than relying on skill programmers to understand your operations to write and maintain code for it.

If you are a programmer creating Ansible modules and plugins, assume you are not the target audience for what you are developing and your target audience won’t have the same skills and resources you possess.

Clear, Concise, Simple

Clear is better than cluttered. Concise is better than verbose. Simple is better than complex. Readability counts.

These are really just interpretations of aphorisms in “The Zen of Python”. The last one is taken directly from it because you can’t improve on perfection.

In the original Ansible best practices talk, we recommended users optimize for readability. This holds true even more so today. If done properly, your content can be the documentation of your workflow automation. Take the time to make your automation as clear and concise as possible. Iterate over what you create and always look for opportunities to simplify and clarify.

These aphorisms don’t just apply to those writing playbooks and creating roles. If you are a module developer, think about how your work can assist users, be clear and concise, do things simply and just get things done.

Helping users

Helping users get things done matters most. User experience beats ideological purity.

Whether you are creating modules, plugins and collections or writing playbooks or designing a cross domain hybrid automation workflow – Ansible is for helping you get things done. Always consider and look to maximize the user experience. Don’t get caught up and beholden to some strict interpretation of standards or ideological purity that shifts the burden on the end user.

It's a kind of Magic

“Magic” conquers the manual Arthur C. Clarke wrote, “Any sufficiently advanced technology is indistinguishable from magic.”

The “magic” in Ansible is its playbook engine and module system. It is how Ansible provides powerful and flexible capabilities in a straightforward and accessible way by abstracting users from all of the complex implementation details that lie beneath. This frees users from doing time consuming and error prone manual operations or writing brittle one-off scripts and code, enabling them the time to put their valuable expertise to use where it is needed.

Design automation that amazes users can make difficult or tedious tasks easy and almost effortless. Look to provide powerful time saving capabilities that are quick to deploy and utilize them to get things done.

Convention over configuration

When giving users options, use convention over configuration.

I am a big proponent of convention over configuration and don’t think it gets enough consideration in the Ansible community. Convention over configuration is a design paradigm that attempts to decrease the number of decisions that a developer is required to make without necessarily losing flexibility so they don't have to repeat themselves. It was popularized by Ruby on Rails.

A playbook developer utilizing your work should only need to specify unique and unconventional aspects of their automation tasks and workflows and no more. Look to reduce the number of decisions and implementation details a user needs to make. Take the time to handle the most common use cases for them. Look to provide as many sensible defaults with modules, plugins and roles as possible. Optimize for users to get things done quickly.

Declarative

Declarative is better than imperative – most of the time.

This aphorism is particularly for Ansible Content Collection developers. Ansible is a desired state engine by design. Think declaratively first. If there truly is no way to design something declaratively, then use imperative (procedural) means.

Declarative means that configuration is guaranteed by a set of facts instead of by a set of instructions, for example, “there should be 10 RHEL servers”, rather than “depending on how many RHEL servers are running, start/stop servers until you have 10, and tell me if it worked or not”.

This aphorism is an example of the “user experience beats ideological purity” aphorism in practice. Rather than strictly adhering to a declarative approach to automation, Ansible incorporates declarative and imperative means. This mix offers you the flexibility to focus on what you need to do, rather than strictly adhere to one paradigm.

Avoid complexity

Focus avoids complexity. Complexity kills productivity.

Remember that complexity kills productivity. The Ansible team at Red Hat really means it and believes that. That's not just a marketing slogan. Automation can crush complexity and give you the one thing you can’t get enough of ⎯ time.

Follow Linux principles of doing one thing, and one thing well. Keep roles and playbooks focused on a specific purpose. Multiple simple ones are better than having a huge single playbook full of conditionals and “programming” that Ansible is not well suited for.

We strive to reduce complexity in how we've designed Ansible and encourage you to do the same. Strive for simplification in what you automate.

Hard to explain !?

If the implementation is hard to explain, it's a bad idea.

This aphorism, like “readability counts”, is also taken directly from “The Zen of Python” because you cannot improve upon perfection.

In his essay on Literate Programming, Charles Knuth wrote, “Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.” So it goes that if you cannot explain or document your implementation easily, then it’s a bad idea that needs to be rethought or scrapped. If it is hard to explain, what chance do others have of understanding it, using it and debugging it? Kernighan’s Law says “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.”

Ansible is designed for how real people think and work. Recall earlier when I said Ansible Playbooks are human readable automation with no special coding skills needed. Take advantage of that. Then, if you are having trouble explaining what you are trying to do, pause and re-consider your implementation and the process you are trying to automate. How can I make it easier to explain? Can my process be improved or streamlined? How can I simplify and clarify? Can I break it down into smaller more focused parts and iterate over this?

This will help you identify a bad idea sooner and avoid the types of friction that will slow down you and your organization over time.

Opportunity to automate!

Every shell command and UI interaction is an opportunity to automate.

This aphorism comes from my personal experience talking about Ansible and automation for many years. Sometimes I am asked what they should automate. Other times, I am challenged that an automation tool like Ansible is unnecessary or does not apply to what they are doing. No matter if we were talking about RHEL, Windows, networking infrastructure, security, edge devices, or cloud services, my response has essentially been the same over the years. I have repeated it so often, that I have jokingly formulated the point into my own theorem on automation. So call it “Appnel's Theorem on Automation” if you will.

If you are wondering what should be automated, look for anything anyone is typing into a Linux shell and clicking through in a user interface. Then ask yourself “is this something that can be automated?” Then ask “what is the value of automating this?” Most Ansible modules wrap command line tools or use the same APIs behind UIs.

Given a sufficient number of things to automate is identified, start with those that cause the most pain and those that you can get done quickly. Remember you want to create a virtuous cycle of releasing reliability, feedback and building trust across your organization. Showing progress and business value quickly will help do that.

Can't be improved?

Just because something works, doesn’t mean it can’t be improved. Friction should be eliminated whenever possible.

This first aphorism just so happens to be a quote from the movie Black Panther, and it elegantly expresses some important wisdom when it comes to Ansible automation.

Always iterate and adapt to real world feedback from your operations. Optimize readability. Continue to find ways to simplify and reduce friction in your organization and its processes. As changes are introduced into your environments and IT policies over time, they will create new friction and pain points. They will also create new opportunities to apply your automation practices to eliminate them.

Never ending story...

Automation is a journey that never ends.

Heraclitus, a Greek philosopher, said "change is the only constant in life. Nothing endures but change."

Anyone who has been around the IT industry for any length of time knows there is constant change. This is why it is so vital to be agile and prepared to respond to ongoing change, innovation and business demands quickly and reliably.

Automation is not a destination. It is a practice. It is a culture, a mindset and an attitude. Automation is a continuous process of feedback and learning and adapting to change and improving upon what you did before.

Automation creates opportunities and we at Red Hat see opportunities for automation everywhere.

So the question I pose to you is: Where will your automation journey lead you?

Ansible

This topic is split into seven main sections, each section covers a different aspect of automation using Ansible.

Installation

How to install Ansible and run it, from present to future.
Project

Your Ansible project, versioning control, dependencies, syntax
Inventory

How to define your inventory and target hosts
Playbook

Structure your automation, how to separate playbooks and plays
Roles

A best practice in itself, including how to create and fill the role folder
Tasks

Everything about tasks, module usage, tags, loops and filters
Variables

All about variables, where to store them, naming conventions and encryption

Installation

Standard install method

The latest version can only be obtained via the Python package manager, the ansible-core package contains the binaries and 71 standard modules.

pip3 install ansible-core

The included modules can be listed with ansible-doc --list ansible.builtin.
If more special modules are needed, the complete ansible package can be installed, this corresponds to the "old" installation method (batteries included).

pip3 install ansible

Tip

It makes sense to install only the ansible-core package. Afterwards, install the few collections necessary for your project via ansible-galaxy. This way you have an up-to-date, lean installation without unnecessary modules and plugins.
Take a look at the following section for the recommended installation.

Most OS package managers like apt or yum also provide the ansible-core or ansible packages, these versions are not latest but a couple of minor versions behind.

Installing Ansible with OS package manager

Even in fairly recent distributions the Ansible versions are not up to date:

$ pip3 show ansible-core
Name: ansible-core
Version: 2.14.3
...

$ dnf info ansible-core
Available Packages
Name         : ansible-core
Version      : 2.13.3
Release      : 2.el8_7
Architecture : x86_64
Size         : 2.8 M
Source       : ansible-core-2.13.3-2.el8_7.src.rpm
Repository   : appstream
...

$ apt info ansible-core
Package: ansible-core
Version: 2.12.0-1ubuntu0.1
Priority: optional
Section: universe/admin
Origin: Ubuntu
...

Install Collections

Necessary modules and plugins not included in the ansible-core binary are installed through collections.
Additional collections (the included collection is called ansible.builtin) are installed with the ansible-galaxy command-line utility:

ansible-galaxy collection install community.general

Multiple collections can be installed at once with a requirements.yml file.
Thereby the chapter Project > Collections is to be considered. If a container runtime is available, the complete installation can also be bundled in a container image (so-called Execution Environment).

To install collections from (Private) Automation Hub adjust the galaxy section in your ansible.cfg. Take a look at the chapter Project > Ansible configuration > Configure Ansible Galaxy

List installed collections

Show the name and version of each collection installed in the collections_path:

ansible-galaxy collection list

Upgrade installed collections

To upgrade installed collections use the --upgrade (-U) argument:

ansible-galaxy collection install community.general --upgrade

Store collections with your project

By default, collections are installed into a (hidden) folder in the home directory (~/.ansible/collections/ansible_collections/). This is defined by the collections_path configuration setting.

If you want to store collections alongside you project, create a folder collections in your project directory and install collections by providing the --collections-path (-p) argument:

ansible-galaxy collection install community.general --collections-path ./collections/

The collections folder is a default folder, collections stored there are automatically picked up by Ansible.

Install collections offline

Download the collection tarball from Galaxy for offline use:

Navigate to the collection page.
Click on Download tarball.
Copy the archive to the remote server.
Install the collection with the ansible-galaxy CLI utility, use the --offline argument:
```
ansible-galaxy collection install ~/community-general-6.4.0.tar.gz --offline
```

Execution environments

Execution Environments are container images that serve as Ansible control nodes.
EEs provide you with:

Software dependency isolation
Portability across teams and environments
Separation from other automation content and tooling

Ansible Builder

Ansible Builder is a tool that aids in the creation of Ansible Execution Environments. It does this by using the dependency information defined in various Ansible Content Collections, as well as by the user. Ansible Builder will produce a directory that acts as the build context for the container image build, which will contain the Containerfile (Dockerfile), along with any other files that need to be added to the image. There is no need to write a single line of Dockerfile, which makes it easy to build and use Execution Environments.

To build an EE, install ansible-builder from the Python Package Manager:

pip3 install ansible-builder

Define at least the definition file for the Execution Environment and other files, depending on your use-case.

EE definition fileCollection DependenciesPython DependenciesCross-Platform requirements

execution-environment.yml

---
version: 3

images:
  base_image: # (1)!
    name: ghcr.io/ansible-community/community-ee-base:latest

dependencies: # (2)!
  galaxy: requirements.yml # (3)!
  python: requirements.txt # (4)!
  system: bindep.txt

Some more useful base images are (take a look if a more recent tag is available):
- quay.io/rockylinux/rockylinux:9
- ghcr.io/ansible-community/community-ee-minimal:latest
- registry.redhat.io/ansible-automation-platform-24/ee-supported-rhel9:1.0.0-456
- registry.redhat.io/ansible-automation-platform/ee-minimal-rhel9::2.15.5-4
If you want to install a specific Ansible version add this configuration under the dependencies key:
```
dependencies:
  ansible_core:
    package_pip: ansible-core==2.14.3
```

Instead of using a separate file, you can provide collections (and roles) as a list:

dependencies:
  galaxy:
    collections:
      - kubernetes.core
    roles:
      - timgrt.terraform

Instead of using a separate file, you can provide the Python packages as a list:

dependencies:
  python:
    - awxkit
    - boto
    - botocore
    - boto3
    - openshift
    - requests-oauthlib

requirements.yml

---
collections:
  - redhat.openshift

requirements.txt

awxkit>=13.0.0
boto>=2.49.0
botocore>=1.12.249
boto3>=1.9.249
openshift>=0.6.2
requests-oauthlib

bindep.txt

If there are RPMS necessary, put them here.

subversion [platform:rpm]
subversion [platform:dpkg]

Package manager not found?

In case you see an error like this: unable to execute /usr/bin/dnf: No such file or directory. This can happen when using RHEL minimal images, you need to adjust the package manager path. Add the following setting to your execution-environment.yml:

options:
  package_manager_path: /usr/bin/microdnf

For more information, go to the Ansible Builder Documentation.

To build the EE, run this command (assuming you have Docker installed, by default Podman is used):

ansible-builder build --tag=demo/openshift-ee --container-runtime=docker -v=3

The resulting container images can be viewed with the docker images command:

$ docker images
REPOSITORY                        TAG       IMAGE ID       CREATED              SIZE
demo/openshift-ee                 latest    2ea9d5d7b185   10 seconds ago       1.14GB

You can also build Execution Environments with ansible-navigator, the Builder is installed alongside Navigator.

ansible-navigator builder build --tag=demo/openshift-ee --container-runtime=docker

Ansible Runner

Using the EE requires a binary which can make use of the Container images, it is not possible to run them with the ansible-playbook binary. You have to use (and install) either the ansible-navigator or the ansible-runner binary.

Tip

The Ansible Navigator is easier to use than the ansible-runner, use this one for creating, reviewing, running and troubleshooting Ansible content, including inventories, playbooks, collections, documentation and execution environments.

Ansible Runner is a tool and python library to provide a stable and consistent interface abstraction to Ansible, it represents the modularization of the part of Ansible AWX that is responsible for running ansible and ansible-playbook tasks and gathers the output from it.

If you want to use it standalone, install the ansible-runner binary:

pip3 install ansible-runner

To use the Ansible from the container image, e.g. run this command which executes an ad hoc command (setup module) against localhost:

ansible-runner run --container-image demo/openshift-ee /tmp -m setup --hosts localhost

Most parameters should be self-explanatory:

run - Run ansible-runner in the foreground
--container-image demo/openshift - Container image to use when running an ansible task
/tmp - base directory containing the ansible-runner metadata (project, inventory, env, etc)
-m setup - Module to execute
--hosts localhost - set of hosts to execute against (here only localhost)

The output looks like expected:

$ ansible-runner run --container-image demo/openshift-ee /tmp -m setup --hosts localhost
[WARNING]: No inventory was parsed, only implicit localhost is available
localhost | SUCCESS => {
    "ansible_facts": {
        "ansible_all_ipv4_addresses": [
            "192.168.178.114",
            "172.17.0.1"
        ],
        "ansible_all_ipv6_addresses": [
            "2001:9e8:4a14:2401:a00:27ff:febf:4207",
            "fe80::a00:27ff:febf:4207",
            "fe80::42:9eff:fef9:df59"
        ],
        "ansible_apparmor": {
            "status": "enabled"
        },
        "ansible_architecture": "x86_64",
        "ansible_bios_date": "12/01/2006",
        "ansible_bios_vendor": "innotek GmbH",
        "ansible_bios_version": "VirtualBox",
        "ansible_board_asset_tag": "NA",
        "ansible_board_name": "VirtualBox",
        "ansible_board_serial": "NA",
        "ansible_board_vendor": "Oracle Corporation",
        ...

Ansible Navigator

The ansible-navigator is text-based user interface (TUI) for the Red Hat Ansible Automation Platform. The Navigator also makes use of the Execution Environments and provides an easier to use interface to interact with EEs (than ansible-runner).
Install the ansible-navigator binary and its dependencies with the Python package manager:

pip3 install ansible-navigator

If you want to use the Navigator with EEs, you'll need a container runtime, install Docker or Podman an your system.

With the Navigator you, for example, can inspect *all locally available Execution Environments

Take a look at the Playbooks section on how to run playbooks in Execution Environments with the Navigator.

Some ansible-navigator commands map to ansible commands (prefix every Navigator command with ansible-navigator):

Navigator command	Description
`exec -- ansible ...`	Runs Ansible ad-hoc commands.
`builder`	Builds new execution environments, the `ansible-builder` utility is installed with `ansible-navigator`.
`config`	Explore the current ansible configuration as with `ansible-config`.
`doc`	Explore the documentation for modules and plugins as with `ansible-doc`.
`inventory`	Inspect the inventory and browser groups and hosts.
`lint`	Runs best-practice checker, `ansible-lint` needs to be installed locally or in the selected execution-environment.
`run`	Runs Playbooks.
`exec -- ansible-test ...`	Executes sanity, unit and integration tests for Collections.
`exec -- ansible-vault ...`	Runs utility to encrypt or decrypt Ansible content.

Project

Version Control

Keep your playbooks and inventory file in git (or another version control system), and commit when you make changes to them. This way you have an audit trail describing when and why you changed the rules that are automating your infrastructure.

Tip

Always use version control!

Take a look at the Development section for additional information.

Ansible configuration

Always use a project-specific ansible.cfg in the parent directory of your project. The following configuration can be used as a starting point:

[defaults]
# Define inventory, no need to provide '-i' anymore.
inventory = inventory/production.ini

# Playbook-Output in YAML instead of JSON
callback_result_format = yaml

Show check mode

The following parameter enables displaying markers when running in check mode.

[defaults]
check_mode_markers = true

The markers are DRY RUN at the beginning and ending of playbook execution (when calling ansible-playbook --check) and CHECK MODE as a suffix at every play and task that is run in check mode.

Example

$ ansible-playbook -i inventory.ini playbook.yml -C

DRY RUN ******************************************************************

PLAY [Install and configure Worker Nodes] [CHECK MODE] *******************

TASK [Gathering Facts] [CHECK MODE] **************************************
ok: [k8s-worker1]
ok: [k8s-worker2]
ok: [k8s-worker2]

...

Show task path when failed

For easier development when handling with very big playbooks, it may be useful to know which file holds the failed task. To display the path to the file containing the failed task and the line number, add this parameter:

[defaults]
show_task_path_on_failure = true

Example

When set to true:

...

TASK [Set motd message for k8s worker node] **************************************************
task path: /home/timgrt/kubernetes_installation/roles/kube_worker/tasks/configure.yml:39
fatal: [k8s-worker1]: FAILED! =>
...

When set to false:

...

TASK [Set motd message for k8s worker node] ****************************************************
fatal: [k8s-worker1]: FAILED! =>
...

Even if you don't set this, the path is displayed automatically for every task when running with -vv or greater verbosity, but you'll need to run the playbook again.

Configure ansible-galaxy

By default, ansible-galaxy uses https://galaxy.ansible.com as the Galaxy server. To configure additional servers like Automation Hub or Private Automation Hub, you'll need to configure these in the galaxy section.

Create a new section for each server name, set the url option for each server name and set the API token for each server name, if necessary.

[galaxy]
server_list = release_galaxy,automation_hub # (1)!

[galaxy_server.release_galaxy]
url = https://galaxy.ansible.com/ # (2)!

[galaxy_server.automation_hub]
url = https://console.redhat.com/api/automation-hub/content/published/ # (3)!
auth_url = https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token # (4)!
token = <secret-token>

Comma-separated list of server identifiers in a prioritized order (in the example galaxy.ansible.com is searched first, then Automation Hub).
The name of the server does not matter, but needs to match the corresponding galaxy_server.* section.
galaxy.ansible.com doesn't need authentication to download collections (or roles).
If you want to publish stuff, you'll need an API Token (or username and password).
Gets content from the published repository, to get stuff from the validated repository, adjust or add a new section and add it to the server_list.
The URL of a Keycloak server token_endpoint if using SSO authentication. Requires token.

Danger

Take extra care when using tokens (or passwords) in your ansible.cfg, as they might get added to Git by accident.
Add the file to your .gitignore.

Take a look at the Ansible documentation for additional information.

Dependencies

Your project will have certain dependencies, make sure to provide a requirements.yml for necessary Ansible collections and a requirements.txt for necessary Python packages.
Consider using Execution Environments where all dependencies are combined in a Container Image.

Collections

Always provide a requirements.yml with all collections used within your project.
This makes sure that required collections can be installed, if only the ansible-core binary is installed.

---
collections:
  - community.general
  - ansible.posix

  - name: cisco.ios
    version: '>=3.1.0'

Install all collections from the requirements-file:

ansible-galaxy collection install -r requirements.yml

Python packages

Always provide a requirements.txt with all Python packages need by modules used within your project.

boto
openshift>=0.6
PyYAML>=3.11

Install all dependencies from the requirements-file:

pip3 install -r requirements.txt

Directory structure

.
├── ansible.cfg
├── hosts
├── k8s_install.yml
├── README.md
├── requirements.txt
├── requirements.yml
└── roles
    ├── k8s_bootstrap
    │   ├── files
    │   │   ├── daemon.json
    │   │   └── k8s.conf
    │   ├── tasks
    │   │   ├── install_kubeadm.yml
    │   │   ├── main.yml
    │   │   └── prerequisites.yml
    │   └── templates
    │       └── kubernetes.repo.j2
    ├── k8s_control_plane
    │   ├── files
    │   │   └── kubeconfig.sh
    │   └── tasks
    │       └── main.yml
    └── k8s_worker_nodes
        └── tasks
            └── main.yml

Filenames

Folder- and file-names consisting of multiple words are separated with underscores (e.g. roles/grafana_deployment/tasks/grafana_installation.yml).
YAML files are saved with the extension .yml.

GoodBad

.
├── ansible.cfg
├── hosts
├── k8s_install.yml
├── README.md
├── requirements.yml
└── roles
    ├── k8s_bootstrap
    │   ├── files
    │   │   ├── daemon.json
    │   │   └── k8s.conf
    │   ├── tasks
    │   │   ├── install_kubeadm.yml
    │   │   ├── main.yml
    │   │   └── prerequisites.yml
    │   └── templates
    │       └── kubernetes.repo.j2
    ├── k8s_control_plane
    │   ├── files
    │   │   └── kubeconfig.sh
    │   └── tasks
    │       └── main.yml
    └── k8s_worker_nodes
        └── tasks
            └── main.yml

Playbook-name without underscores and wrong file extension, role folders or task files inconsistent, with underscores

├── ├── ├── ├── └──              │         │

and wrong extension. ble-project-__codelineno-13-1">. ansible.cfg hosts k8s-install.yaml README.md roles ├── k8s-bootstrap │ ├── files │ │ ├── daemon.json │ │ └── k8s.conf │ ├── tasks │ │ ├── installKubeadm.yaml │ │ ├── main.yml │ │ └── prerequisites.yaml │ └── templates └── kubernetes.repo.j2 ├── k8sControlPlane │ ├── files │ │ └── kubeconfig.sh │ └── tasks └── main.yaml └── k8s_worker-nodes └── tasks └── main.yaml

YAML Syntax

Following a basic YAML coding style across the whole team improves readability and reusability.

Indentation

Two spaces are used to indent everything, e.g. list items or dictionary keys.

GoodBad

Playbook:

---
- name: Initialize Control-Plane Nodes
  hosts: kubemaster
  become: true
  roles:
    - k8s_control_plane

- name: Install and configure Worker Nodes
  hosts: kubeworker
  become: true
  roles:
    - k8s_worker_nodes

Variable-file:

ntp_server_list:
  - 0.de.pool.ntp.org
  - 1.de.pool.ntp.org
  - 2.de.pool.ntp.org
  - 3.de.pool.ntp.org

Playbook with roles not indented by two whitespaces.

- name: Demo play
  hosts: database_servers
  roles:
  - common
  - postgres

List in variable-file indented with four whitespaces:

ntp_server_list:
    - 0.de.pool.ntp.org
    - 1.de.pool.ntp.org
    - 2.de.pool.ntp.org
    - 3.de.pool.ntp.org

The so-called YAML "one-line" syntax is not used, neither for passing parameters in tasks, nor for lists or dictionaries.

GoodBad

---
- name: Install Apache from the testing repo
  ansible.builtin.package:
    name: httpd
    enablerepo: testing
    state: present

- name: Install a list of packages
  ansible.builtin.package:
    name:
      - nginx
      - postgresql
      - postgresql_server
    state: present

Task with One-line syntax:

- name: Install the latest version of Apache from the testing repo
  package: name=httpd enablerepo=testing state=present

List in task with One-line syntax:

- name: Install a list of packages
  package:
    name: ['nginx', 'postgresql', 'postgresql-server']
    state: present

Booleans

Use true and false for boolean values in playbooks.
Do not use the Ansible-specific yes and no as boolean values in YAML as these are completely custom extensions used by Ansible and are not part of the YAML spec. Also, avoid the use of the Python-style True and False for boolean values.

GoodBad

- name: Start and enable service httpd
  ansible.builtin.service:
    name: httpd
    enabled: true
    state: started

- name: Start and enable service httpd
  ansible.builtin.service:
    name: httpd
    enabled: yes
    state: started

YAML 1.1 allows all variants whereas YAML 1.2 allows only true/false, you can avoid a massive migration effort for when it becomes the default.

Use the | bool filter when using bare variables (expressions consisting of just one variable reference without any operator) in when conditions.

GoodBad

Using a variable upgrade_allowed with the default value false, task is executed when overwritten with true value.

- name: Upgrade all packages, excluding kernel & foo related packages # noqa package-latest
  ansible.builtin.package:
    name: "*"
    state: latest
    exclude: kernel*,foo*
  when: upgrade_allowed | bool

- name: Upgrade all packages, excluding kernel & foo related packages
  ansible.builtin.package:
    name: "*"
    state: latest
    exclude: kernel*,foo*
  when: upgrade_allowed

Quoting

Do not use quotes unless you have to, especially for short module-keyword-like strings like present, absent, etc.
When using quotes, use the same type of quotes throughout your playbooks. Always use double quotes ("), whenever possible.

Comments

Use loads of comments!
Well, the name parameter should describe your task in detail, but if your task uses multiple filters or regex's, comments should be used for further explanation.
Commented code is generally to be avoided. Playbooks or task files are not committed, if they contain commented out code.

Bad

Why is the second task commented? Is it not necessary anymore? Does it not work as expected?

- name: Change port to {{ grafana_port }}
  community.general.ini_file:
    path: /etc/grafana/grafana.ini
    section: server
    option: http_port
    value: "{{ grafana_port }}"
  become: true
  notify: restart grafana

# - name: Change theme to {{ grafana_theme }}
#   ansible.builtin.lineinfile:
#     path: /etc/grafana/grafana.ini
#     regexp: '.*default_theme ='
#     line: "default_theme = {{ grafana_theme }}"
#   become: yes
#   notify: restart grafana

Comment commented tasks

If you really have to comment the whole task, add a description why, when and by whom it was commented.

Inventory

An inventory is a list of managed nodes, or hosts, that Ansible deploys and configures. The inventory can either be static or dynamic.

Convert INI to YAML

The most common format for the Ansible Inventory is the .ini format, but sometimes you might need the inventory file in the YAML format.
A .ini inventory file for example might look like this:

inventory.ini

[control]
controller ansible_host=localhost ansible_connection=local

[target]
rocky8 ansible_connection=docker

You can convert your existing inventory to the YAML format with the ansible-inventory utility.

ansible-inventory -i inventory.ini -y --list > inventory.yml

The resulting file is your inventory in YAML format:

inventory.yml

all:
  children:
    control:
      hosts:
        controller:
          ansible_connection: local
          ansible_host: localhost
    target:
      hosts:
        rocky8:
          ansible_connection: docker

Static inventory

Warning

Work in Progress - More description necessary.

Dynamic inventory

Warning

Work in Progress - More description necessary.

Custom dynamic inventory

In case no suitable inventory plugin exists, you can easily write your own. Take a look at the Ansible Development - Extending section for additional information.

In-Memory Inventory

Normally Ansible requires an inventory file, to know which machines it is meant to operate on.

This is typically a manual process but can be greatly improved by using a dynamic inventory to pull inventory information from other systems.

Suppose, however, you needed to create X number of instances, which are transient in nature and had no existing details available to populate an inventory file for Ansible to utilise. If X is a small number, you could easily hand-craft the inventory file while the playbook already runs.

Use the add_host module, which makes use of Ansible's ability to populate an in-memory inventory with information it generates while creating new instances.

Take a look at the following example, the first play creates a couple of Containers and adds them to a new group. The seconds plays targets this new group and connects to the newly created Containers.

---
- name: Add hosts to additional groups
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    container_list:
      - node1
      - node2
      - node3
  tasks:
    - name: Start managed node containers
      containers.podman.podman_container:
        name: "{{ item }}"
        image: docker.io/timgrt/rockylinux8-ansible:latest
        hostname: "{{ item }}.example.com"
        stop_signal: 15
        state: started
      loop: "{{ container_list }}"

    - name: Add container to new group
      ansible.builtin.add_host:
        name: "{{ item }}" # (1)!
        groups: managed_node_containers # (2)!
        ansible_connection: podman # (3)!
        ansible_python_interpreter: /usr/libexec/platform-python # (4)!
        stage: test # (5)!
      loop: "{{ container_list }}"

- name: Run tasks on containers created in previous play
  hosts: managed_node_containers
  tasks:
    - name: Output stage variable
      ansible.builtin.debug:
        msg: "{{ stage }}"

Every container instance is added by looping the variable container_list. As the name parameter must be a string a loop is necessary.
This is the name of the new group! It is targeted in the second play. The groups parameter can be a list of multiple group names.
These are variables needed to connect to the new instances. As they are Podman containers the podman connection plugin is used.
The Python interpreter which is used in the new instances. Not always necessary, as normally Ansible discovers the interpreter pretty reliable.
This is a custom variable for all new instances. You can add more variables here if necessary.

Playbook output

$ ansible-playbook in-memory-inventory.yml

PLAY [Add hosts to additional groups] *******************************************************************************************************************************

TASK [Start managed node containers] ********************************************************************************************************************************
ok: [localhost] => (item=node1)
ok: [localhost] => (item=node2)
ok: [localhost] => (item=node3)

TASK [Add container to new group] ***********************************************************************************************************************************
changed: [localhost] => (item=node1)
changed: [localhost] => (item=node2)
changed: [localhost] => (item=node3)

PLAY [Run tasks on containers created in previous play] *************************************************************************************************************

TASK [Gathering Facts] **********************************************************************************************************************************************
ok: [node2]
ok: [node1]
ok: [node3]

TASK [Output stage variable] ****************************************************************************************************************************************
ok: [node1] =>
    msg: test
ok: [node2] =>
    msg: test
ok: [node3] =>
    msg: test

Playbooks

Playbooks are first thing you think of when using Ansible. This section describes some good practices.

Directory structure

The main playbook should have a recognizable name, e.g. referencing the projects name or scope. If you have multiple playbooks, create a new folder playbooks and store all playbooks there, except the main playbook (here called site.yml).

.
├── ansible.cfg
├── site.yml
└── playbooks
    ├── database.yml
    ├── loadbalancer.yml
    └── webserver.yml

The site.yml file contains references to the other playbooks:

---
# Main playbook including all other playbooks

- ansible.builtin.import_playbook: playbooks/database.yml # noqa name[play]
- ansible.builtin.import_playbook: playbooks/webserver.yml # noqa name[play]
- ansible.builtin.import_playbook: playbooks/loadbalancer.yml # noqa name[play]

noqa statement

The file site.yml only references other playbooks, still, the ansible-lint utility would trigger, as every play should have the name parameter.
While this is correct (and you should always name your actual plays), the name parameter on import statements is not shown anyway, as they are pre-processed at the time playbooks are parsed. Take a look at import vs. include in the tasks section

Success

Therefore, silencing the linter in this particular case with the noqa statement is acceptable.

In contrast, include statements like ansible.builtin.include_tasks should have the name parameter, as these statements are processed when they are encountered during the execution of the playbook.

The lower-level playbooks contains actual plays:

playbooks/database.yml

---
- name: Install and configure PostgreSQL database
  hosts: postgres_servers
  roles:
    - postgres

To be able to run the overall playbook, as well as the imported playbooks, add this parameter to your ansible.cfg, otherwise roles are not found:

[defaults]
roles_path = .roles

Playbook definition

Don't put too much logic in your playbook, put it in your roles (or even in custom modules).
A playbook could contain pre_tasks, roles, tasks and post_tasks sections, try to limit your playbooks to a list of a roles.

Warning

Avoid using both roles and tasks sections, the latter possibly containing import_role or include_role tasks. The order of execution between roles and tasks isn’t obvious, and hence mixing them should be avoided.

Either you need only static importing of roles and you can use the roles section, or you need dynamic inclusion and you should use only the tasks section. Of course, for very simple cases, you can just use tasks without roles (but playbooks/projects grow quickly, refactor to roles early).

Plays

Avoid putting multiple plays in a playbook, if not really necessary. As every play most likely targets a different host group, create a separate playbook file for it. This way you achieve to most flexibility.

k8s_installation.yml

---
- name: Initialize Control-Plane Nodes
  hosts: kubemaster
  become: true
  roles:
    - k8s_control_plane

- name: Install and configure Worker Nodes
  hosts: kubeworker
  become: true
  roles:
    - k8s_worker_nodes

Separate the two plays into their respective playbooks files and reference them in an overall playbook file:

k8s_control_plane_playbook.yml

---
- name: Initialize Control-Plane Nodes
  hosts: kubemaster
  become: true
  roles:
    - k8s_control_plane

k8s_worker_node_playbook.yml

---
- name: Install and configure Worker Nodes
  hosts: kubeworker
  become: true
  roles:
    - k8s_worker_nodes

k8s_installation.yml

---
- ansible.builtin.import_playbook: k8s_control_plane_playbook.yml # noqa name[play]
- ansible.builtin.import_playbook: k8s_worker_node_playbook.yml # noqa name[play]

Module defaults

If your playbook uses modules which need the be called with the same set of parameters or arguments, you can define these as module_defaults.
The defaults can be set at play, block or task level.

Module defaults are defined by grouping together modules that share common sets of parameters, especially for modules making heavy use of API-interaction such as cloud modules.

Since ansible-core 2.12, collections can define their own groups in the meta/runtime.yml file. module_defaults does not take the collections keyword into account, so the fully qualified group name must be used for new groups in module_defaults.

GoodBad

---
- name: Demo play with modules which need to call the same arguments
  hosts: aci
  module_defaults:
    group/cisco.aci.all:
      host: "{{ apic_api }}"
      username: "{{ apic_user }}"
      password: "{{ apic_password }}"
      validate_certs: false
  tasks:
    - name: Get system info
      cisco.aci.aci_system:
        state: query

    - name: Create a new demo tenant
      cisco.aci.aci_tenant:
        name: demo-tenant
        description: Tenant for demo purposes
        state: present

Authentication parameters are repeated in every task.

- name: Demo play with modules which need to call the same arguments
  hosts: aci
  tasks:
    - name: Get system info
      cisco.aci.aci_system:
        host: "{{ apic_api }}"
        username: "{{ apic_user }}"
        password: "{{ apic_password }}"
        validate_certs: false
        state: query

    - name: Create a new demo tenant
      cisco.aci.aci_tenant:
        host: "{{ apic_api }}"
        username: "{{ apic_user }}"
        password: "{{ apic_password }}"
        validate_certs: false
        name: demo-tenant
        description: Tenant for demo purposes
        state: present

To identify the correct group (remember, these are not inventory groups), take a look at the meta/runtime.yml of the desired collection. It needs to define the action_groups list, for example:

~/.ansible/collections/ansible_collections/cisco/aci/meta/runtime.yml

---
requires_ansible: '>=2.9.10'
action_groups:
  all:
    - aci_aaa_custom_privilege
    - aci_aaa_domain
    - aci_aaa_role
    - aci_aaa_ssh_auth
    - aci_aaa_user
    - aci_aaa_user_certificate
    - aci_aaa_user_domain
    - aci_aaa_user_role
    - aci_access_port_block_to_access_port
    ...

The group is called all, therefore the module defaults groups needs to be group/cisco.aci.all.

Note

Any module defaults set at the play level (and block/task level when using include_role or import_role) will apply to any roles used, which may cause unexpected behavior in the role.

Collections in playbooks

In a playbook, you can control the collections Ansible searches for modules and action plugins to execute.

tl;dr

This is not recommended, try to avoid this.

- name: Initialize Control-Plane Nodes
  hosts: kubemaster
  collections:
    - kubernetes.core
    - computacenter.utils
  become: true
  roles:
    - k8s_control_plane

With that you could omit the provider.collection part when using modules, by default you would reference a module with the FQCN:

- name: Check if Weave is already installed
  kubernetes.core.k8s_info:
    api_version: v1
    kind: DaemonSet
    name: weave-net
    namespace: kube-system
  register: weave_daemonset

With the collections list defined as part of the play definition, you could write your tasks like this:

- name: Check if Weave is already installed
  k8s_info:
    api_version: v1
    kind: DaemonSet
    name: weave-net
    namespace: kube-system
  register: weave_daemonset

Warning

If your playbook uses both the collections keyword and one or more roles, the roles do not inherit the collections set by the playbook!
The collections keyword merely creates an ordered search path for non-namespaced plugin and role references. It does not install content or otherwise change Ansible’s behavior around the loading of plugins or roles. Note that an FQCN is still required for non-action or module plugins (for example, lookups, filters, tests).

Tip

It is preferable to use a module or plugin’s FQCN over the collections keyword!

Executing playbooks

To run your playbook, use the ansible-playbook command.

ansible-playbook playbook.yml

Some useful command-line parameters when executing your playbook are the following

-C or --check runs the playbook without making any modifications
-D or --diff shows the differences when changing (small) files and templates
--step runs one-step-at-a-time, you need to confirm each task before running
--list-tags lists all available tags
--list-tasks lists all tasks that would be executed

With Ansible Navigator

To ensure that your Ansible Content works when running it locally during development and when running it in AAP or AWX later, it is advisable to execute it with the same Execution Environment. The ansible-playbook command can't run these, this is where the Navigator comes in.

The Ansible (Content) Navigator is a command-line tool and a text-based user interface (TUI) for creating, reviewing, running and troubleshooting Ansible content, including inventories, playbooks, collections, documentation and container images (execution environments). Take a look at the Installation section on how to install the utility and dependencies.

Use the following minimal configuration for the Navigator and store it in your project root directory:

ansible-navigator.yml

---
ansible-navigator:
  execution-environment:
    image: ghcr.io/ansible-community/community-ee-base:latest # (1)!
    pull:
      policy: missing
  logging:
    level: warning
    file: logs/ansible-navigator.log
  mode: stdout # (2)!
  playbook-artifact:
    enable: true
    save-as: "logs/{playbook_status}-{playbook_name}-{time_stamp}.json" # (3)!

Specifies the name of the execution environment image to use, change this, if you want to use your own. The pull policy will download the image if it is not already present (this also means no updated images will be downloaded!).
To build and use your own Execution Environment take a look at the section Installation > Execution Environments.
Specifies the user-interface mode, with stdout it will output to standard-out as with the usual ansible-playbook command. Use interactive to use the TUI. You can provide the CLI-parameter -m or --mode to overwrite the configuration.
Specifies the name for artifacts created from completed playbooks. For example, for a successful run of the site.yml playbook a log file like logs/successful-site-2023-11-01T12:20:20.907856+00:00.json. For failed runs it would be logs/failed-site-2023-11-01T12:29:17.020432+00:00.json. With the replay command, you now can observe output of previous playbook runs, e.g. ansible-navigator replay logs/failed-site-2023-11-01T12\:29\:33.129179+00\:00.json.

You can also use the Navigator configuration for all your projects, save it as a hidden file in your home directory (e.g. ~/.ansible-navigator.yml).

Take a look at the official Ansible Navigator Documentation for all other configuration options.

Warning

With the configuration above, playbook artifacts (logs), as well as the Navigator Log-file, will be stored in a logs folder in your playbook directory. Consider ignoring the folder from Git tracking.

.gitignore

logs/

Executing a playbook with the Navigator is as easy as before, just run it like this:

ansible-navigator run site.yml

Append any CLI-parameters (e.g. -i inventory.ini) that you are used to as when executing it with ansible-playbook.

Tip

Using the Interactive mode (the TUI) is encouraged, try around!

Roles

New playbook functionality is always added in a role. Roles should only serve a defined purpose that is unambiguous by the role name. The role name should be short and unique. It is separated with hyphens, if it consists of several words.

Readme

Every role must have a role-specific README.md describing scope and focus of the role. Use the following example:

# Role name/title

Brief description of the role, what it does and what not.

## Requirements

Technical requirements, e.g. necessary packages/rpms, own modules or plugins.

## Role Variables

The role uses the following variables:

| Variable Name | Type    | Default Value | Description            |
| ------------- | ------- | ------------- | ---------------------- |
| example       | Boolean | false         | Brief description      |

## Dependencies

This role expects to run **after** the following roles:
* repository
* networking
* common
* software

## Tags

The role can be executed with the following tags:
* install
* configure
* service

## Example Playbook

Use the role in a playbook like this (after running plays/roles from dependencies section):
```yaml
- name: Execute role
  hosts: example_servers
  become: true
  roles:
    - example_role
```

## Authors

Tim Grützmacher - <tim.gruetzmacher@computacenter.com>

Role structure

Role skeleton

The ansible-galaxy utility can be used to create the role skeleton with the following command:

ansible-galaxy role init roles/demo

This would create the following directory:

roles/demo/
├── defaults
│   └── main.yml
├── files
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── README.md
├── tasks
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
├── .travis.yml
└── vars
    └── main.yml

At least the folders (and content) tests (a sample inventory and playbook for testing, we will use a different testing method) and vars (variable definitions, not used according to this Best Practice Guide, because we use only group_vars, host_vars and defaults) are not necessary. Also the .travis.yml (a CI/CD solution) definition is not useful.

Tip

Use a custom role skeleton which is used by ansible-galaxy!

Consider the following role skeleton, note the missing vars and test folder and the newly added Molecule folder.

roles/role_skeleton/
├── defaults
│   └── main.yml
├── files
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── molecule
│   └── default
│       ├── converge.yml
│       └── molecule.yml
├── README.md
├── tasks
│   └── main.yml
└── templates

You need to define the following parameter in your custom ansible.cfg:

[galaxy]
role_skeleton = roles/role_skeleton

Success

Afterwards, initializing a new role with ansible-galaxy role init creates a role structure with exactly the content you need!

Tasks

Tasks should always be inside of a role. Do not use tasks in a play directly.
Logically related tasks are to be separated into individual files, the main.yml of a role only imports other task files.

.
└── roles
    └── k8s_bootstrap
        └── tasks
            ├── install_kubeadm.yml
            ├── main.yml
            └── prerequisites.yml

The file name of a task file should describe the content.

roles/k8s_bootstrap/tasks/main.yml

---
- ansible.builtin.import_tasks: prerequisites.yml # noqa name[missing]
- ansible.builtin.import_tasks: install_kubeadm.yml # noqa name[missing]

noqa statement

The file main.yml only references other task-files, still, the ansible-lint utility would trigger, as every task should have the name parameter.
While this is correct (and you should always name your actual tasks), the name parameter on import statements is not shown anyway, as they are pre-processed at the time playbooks are parsed. Take a look at the following section regarding import vs. include.

Success

Therefore, silencing the linter in this particular case with the noqa statement is acceptable.

In contrast, include statements like ansible.builtin.include_tasks should have the name parameter, as these statements are processed when they are encountered during the execution of the playbook.

import vs. include

Ansible offers two ways to reuse tasks: statically with ansible.builtin.import_tasks and dynamically with ansible.builtin.include_tasks.
Each approach to reuse distributed Ansible artifacts has advantages and limitations, take a look at the Ansible documentation for an in-depth comparison of the two statements.

Tip

In most cases, use the static ansible.builtin.import_tasks statement, it has more advantages than disadvantages.

One of the biggest disadvantages of the dynamic include_tasks statement, syntax errors are not found easily with --syntax-check or by using ansible-lint. You may end up with a failed playbook, although all your testing looked fine. Take a look at the following example, the recommended ansible.builtin.import_tasks statement on the left, the ansible.builtin.include_tasks statement on the right.

Syntax or linting errors found

Using static ansible.builtin.import_tasks:

roles/prerequisites/tasks/main.yml

---
- ansible.builtin.import_tasks: prerequisites.yml
- ansible.builtin.import_tasks: install_kubeadm.yml

Task-file with syntax error (module-parameters are not indented correctly):

install_kubeadm.yml

- name: Install Kubernetes Repository
  ansible.builtin.template:
  src: kubernetes.repo.j2
  dest: /etc/yum.repos.d/kubernetes.repo

Running playbook with --syntax-check or running ansible-lint:

$ ansible-playbook k8s_install.yml --syntax-check
ERROR! conflicting action statements: ansible.builtin.template, src

The error appears to be in '/home/timgrt/kubernetes_installation/roles/k8s-bootstrap/tasks/install_kubeadm.yml': line 3, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: Install Kubernetes Repository
  ^ here
$ ansible-lint k8s_install.yml
WARNING  Listing 1 violation(s) that are fatal
syntax-check[specific]: conflicting action statements: ansible.builtin.template, src
roles/k8s_bootstrap/tasks/install_kubeadm.yml:3:3


                  Rule Violation Summary  
count tag                    profile rule associated tags
    1 syntax-check[specific] min     core, unskippable  

Failed: 1 failure(s), 0 warning(s) on 12 files.

Syntax or linting errors NOT found!

Using dynamic ansible.builtin.include_tasks:

roles/prerequisites/tasks/main.yml

---
- ansible.builtin.include_tasks: prerequisites.yml
- ansible.builtin.include_tasks: install_kubeadm.yml

Task-file with syntax error (module-parameters are not indented correctly):

install_kubeadm.yml

- name: Install Kubernetes Repository
  ansible.builtin.template:
  src: kubernetes.repo.j2
  dest: /etc/yum.repos.d/kubernetes.repo

Running playbook with --syntax-check or running ansible-lint:

$ ansible-playbook k8s_install.yml --syntax-check

playbook: k8s_install.yml
$ ansible-lint k8s_install.yml

Passed: 0 failure(s), 0 warning(s) on 12 files. Last profile that met the validation criteria was 'production'.

Danger

As the --syntax-check or ansible-lint are doing a static code analysis and the task-files are not included statically, possible syntax errors are not recognized!

Your playbook will fail when running it live, revealing the syntax error.

Info

There are also big differences in resource consumption and performance, imports are quite lean and fast, while includes require a lot of management and accounting.

Naming tasks

It is possible to leave off the name for a given task, though it is recommended to provide a description about why something is being done instead. This description is shown when the playbook is run.
Write task names in the imperative (e.g. "Ensure service is running"), this communicates the action of the task. Start with a capital letter.

GoodBad

- name: Install webserver package
  ansible.builtin.package:
    name: httpd
    state: present

- package:
    name: httpd
    state: present

Using name parameter, but not starting with capital letter, nor describing the task properly.

- name: install package
  package:
    name: httpd
    state: present

Idempotence

Each task must be idempotent, if non-idempotent modules are used (command, shell, raw) these tasks must be developed via appropriate parameters or conditions to an idempotent mode of operation.

Tip

In general, the use of non-idempotent modules should be reduced to a necessary minimum.

command vs. shell module

In most of the use cases, both shell and command modules perform the same job. However, there are few main differences between these two modules. The command module uses the Python interpreter on the target node (as all other modules), the shell module runs a real shell on the target (pipes and redirects are available, as well as access to environment variables).

Tip

Always try to use the command module over the shell module, if you do not explicitly need shell functionality.

Parsing shell meta-characters can lead to unexpected commands being executed if quoting is not done correctly so it is more secure to use the command module when possible. To sanitize any variables passed to the shell module, you should use {{ var | quote }} instead of
just {{ var }} to make sure they do not include evil things like semicolons.

creates and removes

Check mode is supported for non-idempotent modules when passing creates or removes. If running in check mode and either of these are specified, the module will check for the existence of the file and report the correct changed status. If these are not supplied, the task will be skipped.

Warning

Work in Progress - More description necessary.

failed_when and changed_when

Warning

Work in Progress - More description necessary.

GoodBad

- name: Install webserver package
  ansible.builtin.package:
    name: httpd
    state: present

This task never reports a changed state or fails when an error occurs.

- name: Install webserver package
  shell: sudo yum install http
  changed_when: false
  failed_when: false

Modules (and Collections)

Use the full qualified collection names (FQCN) for modules, they are supported since Version 2.9 and ensures your tasks are set for the future.

GoodBad

- name: Install webserver package
  ansible.builtin.package:
    name: httpd
    state: present

- package:
    name: httpd
    state: present

In Ansible 2.10, many plugins and modules have migrated to Collections on Ansible Galaxy. Your playbooks should continue to work without any changes. Using the FQCN in your playbooks ensures the explicit and authoritative indicator of which collection to use as some collections may contain duplicate module names.

Module parameters

Module defaults

The module_defaults keyword can be used at the play, block, and task level. Any module arguments explicitly specified in a task will override any established default for that module argument.
It makes the most sense to define the module defaults at play level, take a look in that section for an example and things to consider.

Permissions

When using modules like copy or template you can (and should) set permissions for the files/templates deployed with the mode parameter.

For those used to /usr/bin/chmod, remember that modes are actually octal numbers.
Add a leading zero (or 1 for setting sticky bit), showing Ansible’s YAML parser it is an octal number and quote it (like "0644" or "1777"), this way Ansible receives a string and can do its own conversion from string into number.

Warning

Giving Ansible a number without following one of these rules will end up with a decimal number which can have unexpected results.

GoodBad

- name: Copy index.html template
  ansible.builtin.template:
    src: welcome.html
    dest: /var/www/html/index.html
    mode: "0644"
    owner: apache
    group: apache
  become: true

Missing leading zero:

- name: copy index
  template:
    src: welcome.html
    dest: /var/www/html/index.html
    mode: 644
    owner: apache
    group: apache
  become: true

This leads to these permissions!

[root@demo /]# ll /var/www/html/
total 68
--w----r-T 1 apache apache 67691 Nov 18 14:30 index.html

State definition

The state parameter is optional to a lot of modules. Whether state: present or state: absent, it’s always best to leave that parameter in your playbooks to make it clear, especially as some modules support additional states.

Files vs. Templates

Ansible differentiates between files for static content (deployed with copy module) and templates for content, which should be rendered dynamically with Jinja2 (deployed with template module).

Tip

In almost every case, use templates, deployed via template module.

Even if there currently is nothing in the file that is being templated, if there is the possibility in the future that it might be added, having the file handled by the template module makes adding that functionality much simpler than if the file is initially handled by the copy module( and then needs to be moved before it can be edited).

Additionally, you now can add a marker, indicating that manual changes to the file will be lost:

TemplateRendered output

{{ ansible_managed | ansible.builtin.comment }}

#
# Ansible Managed
#

ansible.builtin.comment filter

By default, {{ ansible_managed }} is replaced by the string Ansible Managed as is (can be adjusted in the ansible.cfg).
In most cases, the appropriate comment symbol must be prefixed, this should be done with the ansible.builtin.comment filter.
For example, .xml files need to be commented differently, which can be configured:

TemplateRendered output

{{ ansible_managed | ansible.builtin.comment('xml') }}

<!--
-
- Ansible managed
-
-->

You can also use the decorate parameter to choose the symbol yourself.
Take a look at the Ansible documentation for additional information.

When using the template module, append .j2 to the template file name. Keep filenames and templates as close to the name on the destination system as possible.

Conditionals

If the when: condition results in a line that is very long, and is an and expression, then break it into a list of conditions.

GoodBad

- name: Set motd message for k8s worker node
  ansible.builtin.copy:
    content: "This host is used as k8s worker.\n"
    dest: /etc/motd
    mode: "0644"
  when:
    - inventory_hostname in groups['kubeworker']
    - kubeadm_join_result.rc == 0

- name: Set motd message for k8s worker node
  copy:
    content: "This host is used as k8s worker.\n"
    dest: /etc/motd
  when: inventory_hostname in groups['kubeworker'] and kubeadm_join_result.rc == 0

When using conditions on blocks, move the when statement to the top, below the name parameter, to improve readability.

GoodBad

- name: Install, configure, and start Apache
  when: ansible_facts['distribution'] == 'CentOS'
  block:
    - name: Install httpd and memcached
      ansible.builtin.package:
        name:
          - httpd
          - memcached
        state: present

    - name: Apply the foo config template
      ansible.builtin.template:
        src: templates/src.j2
        dest: /etc/foo.conf
        mode: "0644"

    - name: Start service bar and enable it
      ansible.builtin.service:
        name: bar
        state: started
        enabled: true

- name: Install, configure, and start Apache
  block:
    - name: Install httpd and memcached
      ansible.builtin.package:
        name:
        - httpd
        - memcached
        state: present

    - name: Apply the foo config template
      ansible.builtin.template:
        src: templates/src.j2
        dest: /etc/foo.conf

    - name: Start service bar and enable it
      ansible.builtin.service:
        name: bar
        state: started
        enabled: True
  when: ansible_facts['distribution'] == 'CentOS'

Avoid the use of when: foo_result is changed whenever possible. Use handlers, and, if necessary, handler chains to achieve this same result.

Loops

Warning

Work in Progress - More description necessary.

Converting from with_<lookup> to loop is described with a Migration Guide in the Ansible documentation

Limit loop output

When looping over complex data structures, the console output of your task can be enormous. To limit the displayed output, use the label directive with loop_control. For example, this tasks creates users with multiple parameters in a loop:

- name: Create local users
  ansible.builtin.user:
    name: "{{ item.name }}"
    groups: "{{ item.groups }}"
    append: "{{ item.append }}"
    comment: "{{ item.comment }}"
    generate_ssh_key: true
    password_expire_max: "{{ item.password_expire_max }}"
  loop: "{{ user_list }}"
  loop_control:
    label: "{{ item.name }}" # (1)!

Content of variable user_list:

user_list:
  - name: tgruetz
    groups: admins,docker
    append: false
    comment: Tim Grützmacher
    shell: /bin/bash
    password_expire_max: 180
  - name: joschmi
    groups: developers,docker
    append: true
    comment: Jonathan Schmidt
    shell: /bin/zsh
    password_expire_max: 90
  - name: mfrink
    groups: developers
    append: true
    comment: Mathias Frink
    shell: /bin/bash
    password_expire_max: 90

Running the playbook results in the following task output, only the content of the name parameter is shown instead of all key-value pairs in the list item.

GoodBad

TASK [common : Create local users] *********************************************
Friday 18 November 2022  12:18:01 +0100 (0:00:01.955)       0:00:03.933 *******
changed: [demo] => (item=tgruetz)
changed: [demo] => (item=joschmi)
changed: [demo] => (item=mfrink)

Not using the label in the loop_control dictionary results in a very long output:

TASK [common : Create local users] *********************************************
Friday 18 November 2022  12:22:40 +0100 (0:00:01.512)       0:00:03.609 *******
changed: [demo] => (item={'name': 'tgruetz', 'groups': 'admins,docker', 'append': False, 'comment': 'Tim Grützmacher', 'shell': '/bin/bash', 'password_expire_max': 90})
changed: [demo] => (item={'name': 'joschmi', 'groups': 'developers,docker', 'append': True, 'comment': 'Jonathan Schmidt', 'shell': '/bin/zsh', 'password_expire_max': 90})
changed: [demo] => (item={'name': 'mfrink', 'groups': 'developers', 'append': True, 'comment': 'Mathias Frink', 'shell': '/bin/bash', 'password_expire_max': 90})

Filter

Warning

Work in Progress - More description necessary.

Variables

Where to put variables

I always store all my variables at the following three locations:

group_vars folder
host_vars folder
defaults folder in roles

The defaults-folder contains only default values for all variables used by the role.

Naming Variables

The variable name should be self-explanatory (as brief as possible, as detailed as necessary), use multiple words and don't shorten things.

Multiple words are separated with underscores (_)
List-Variables are suffixed with _list
Dictionary-Variables are suffixed with _dict
Boolean values are provided with lowercase true or false

GoodBad

download_directory: ~/.local/bin
create_key: true
needs_agent: false

dir: ~/.local/bin
create_key: yes
needsAgent: no
knows_oop: True

Referencing variables

After a variable is defined, use Jinja2 syntax to reference it. Jinja2 variables use double curly braces ({{ and }}).
Use spaces after and before the double curly braces and the variable name.
When referencing list or dictionary variables, try to use the bracket notation instead of the dot notation. Bracket notation always works and you can use variables inside the brackets. Dot notation can cause problems because some keys collide with attributes and methods of python dictionaries.

GoodBad

Simple variable reference:

- name: Deploy configuration file
  ansible.builtin.template:
    src: foo.cfg.j2
    dest: "{{ remote_install_path }}/foo.cfg"
    mode: "0644"

Bracket-notation and using variable (interface_name) inside:

- name: Output IPv4 address of interface {{ interface_name }}
  ansible.builtin.debug:
    msg: "{{ ansible_facts[interface_name]['ipv4']['address'] }}"

Not using whitespaces around variable name.

- name: Deploy configuration file
  ansible.builtin.template:
    src: foo.cfg.j2
    dest: "{{remote_install_path}}/foo.cfg"

Not using whitespaces and using dot-notation.

- name: Output IPv4 address of eth0 interface
  ansible.builtin.debug:
    msg: "{{ansible_facts.eth0.ipv4.address}}"

Encrypted variables

Tip

All variables with sensitive content should be vault-encrypted.

Although encrypting just the value of a single variable is possible (with ansible-vault encrypt_string), you should avoid this. Store all sensitive variables in a single file and encrypt the whole file.
For example, to store sensitive variables in group_vars, create the subdirectory for the group and within create two files named vars.yml and vault.yml.
Inside of the vars.yml file, define all of the variables needed, including any sensitive ones. Next, copy all of the sensitive variables over to the vault.yml file and prefix these variables with vault_. Adjust the variables in the vars file to point to the matching vault_ variables using Jinja2 syntax, and ensure that the vault file is vault encrypted.

GoodBad

---
# file: group_vars/database_servers/vars.yml
username: "{{ vault_username }}"
password: "{{ vault_password }}"

---
# file: group_vars/database_servers/vault.yml
# NOTE: THIS FILE MUST ALWAYS BE VAULT-ENCRYPTED
vault_username: admin
vault_password: ex4mple

I can still read the credentials...?

Obviously, you wouldn't be able to read the content of the file group_vars/database_servers/vault.yml, as the file would be encrypted.
This only demonstrates how the variables are referencing each other.
The encrypted vault.yml file looks something like this:

$ANSIBLE_VAULT;1.1;AES256
30653164396132376333316665656131666165613863343330616666376264353830323234623631
6361303062336532303665643765336464656164363662370a663834313837303437323332336631
65656335643031393065333366366639653330353634303664653135653230656461666266356530
3935346533343834650a323934346666383032636562613966633136663631636435333834393261
36363833373439333735653262306331333062383630623432633134386138656636343137333439
61633965323066633433373137383330366466366332626334633234376231393330363335353436
62383866616232323132376366326161386561666238623731323835633237373036636561666165
36363838313737656232376365346136633934373861326130636531616438643036656137373762
39616234353135613063393536306536303065653231306166306432623232356465613063336439
34636232346334386464313935356537323832666436393336366536626463326631653137313639
36353532623161653266666436646135396632656133623762643131323439613534643430636333
31386635613238613233

# file: group_vars/database_servers.yml
username: admin
password: ex4mple

Defining variables this way makes sure that you can still find them with grep.
Encrypting files can be done with this command:

ansible-vault encrypt group_vars/database_servers/vault.yml

Once a variable file is encrypted, it should not be decrypted again (because it may get committed unencrypted). View or edit the file like this:

ansible-vault view group_vars/database_servers/vault.yml

ansible-vault edit group_vars/database_servers/vault.yml

Warning

There are modules which will print the values of encrypted variables into STDOUT while using them or with higher verbosity. Be sure to check the parameters and return values of all modules which use encrypted variables!

A good example is the ansible.builtin.user module, it automatically obfuscates the value for the password parameter, replacing it with the string NOT_LOGGING_PASSWORD.
The ansible.builtin.debug module on the other hand is a bad example, it will output the password in clear-text (well, by design, but this is not what you would expect)!

Success

Always add the no_log: true key-value-pair for tasks that run the risk of leaking vault-encrypted content!

GoodBad

---
- name: Using no_log parameter
  hosts: database_servers
  tasks:
    - name: Add user
      ansible.builtin.user:
        name: "{{ username }}"
        password: "{{ password }}"

    - name: Debugging a vaulted variable with no_log
      ansible.builtin.debug:
        msg: "{{ password }}"
      no_log: true

Output of playbook run

Using the stdout_callback: community.general.yaml for better readability, see Ansible configuration for more info.

$ ansible-playbook nolog.yml -v

[...]

TASK [Add user] *********************************************
[WARNING]: The input password appears not to have been hashed. The 'password'
argument must be encrypted for this module to work properly.
ok: [db_server1] => changed=false
  append: false
  comment: ''
  group: 1002
  home: /home/admin
  move_home: false
  name: admin
  password: NOT_LOGGING_PASSWORD
  shell: /bin/bash
  state: present
  uid: 1002

ASK [Debugging a vaulted Variable with no_log] *************
ok: [db_server1] =>
  censored: 'the output has been hidden due to the fact that ''no_log: true'' was specified for this result'

[...]

The debug task does not print the value of the password, the output is censored.

Hint

Observing the output from the "Add user" task, you can see that the value of the password parameter is not shown. The warning from the "Add user" task stating an unencrypted password is related to not having hashed the password. You can achieve this by using the password_hash filter:

password: "{{ vault_password | password_hash('sha512', 'mysecretsalt') }}"

This example uses the string mysecretsalt for salting, in cryptography, a salt is random data that is used as an additional input to a one-way function. Consider using a variable for the salt and treat it the same as the password itself!

password: "{{ vault_password | password_hash('sha512', vault_salt) }}"

In this example, the salt is stored in a variable, the same way as the password itself. If you hashed the password, the warning will disappear.

- name: Not using no_log parameter
  hosts: database_servers
  become: true
  tasks:
    - name: Add user
      ansible.builtin.user:
        name: "{{ username }}"
        password: "{{ password }}"

    - name: Debugging a vaulted Variable
      ansible.builtin.debug:
        msg: "{{ password }}"

Output of playbook run

$ ansible-playbook nolog.yml -v

[...]

TASK [Add user] *********************************************
[WARNING]: The input password appears not to have been hashed. The 'password'
argument must be encrypted for this module to work properly.
ok: [db_server1] => changed=false
  append: false
  comment: ''
  group: 1002
  home: /home/admin
  move_home: false
  name: admin
  password: NOT_LOGGING_PASSWORD
  shell: /bin/bash
  state: present
  uid: 1002

ASK [Debugging a vaulted Variable with no_log] *************
ok: [db_server1] =>
  msg: ex4mple

[...]

Prevent unintentional commits

Use a pre-commit hook to prevent accidentally committing unencrypted sensitive content. The easiest way would be to use the pre-commit framework/tool with the following configuration:

.pre-commit-config.yaml

repos:
  - repo: https://github.com/timgrt/pre-commit-hooks
      rev: v0.2.1
      hooks:
        - id: check-vault-files

Take a look at the development section for additional information.

Variable validation

Playbooks often need user input, this may lead to errors like

required variables not provided
wrong variable type (e.g. integer instead of string, string instead of list, ...)
typos in variable values
...

It is useful to validate the user input early and provide a meaningful error message, if necessary.

Assert module

For simple variable validations, use the ansible.builtin.assert module, it checks if a given expressions evaluates to true.

- name: Ensure AAP credentials are provided
  ansible.builtin.assert:
    that:
      - lookup('env', 'CONTROLLER_HOST') | length > 0
      - lookup('env', 'CONTROLLER_HOST') | length > 0
      - lookup('env', 'CONTROLLER_HOST') | length > 0
    quiet: true
    fail_msg: |
      AAP login credentials are missing!
      Export environment variables locally ir add the correct credential to the Job template.

The task above for example checks if three environment variables are set or rather contain input (the lookup plugin produces an empty string if the environment variable is not found). If the environment variable is found and is longer than zero, the expressions evaluates to true, otherwise the error message in the fail_msg parameter is shown and the playbook fails (for this host).

Validate module

Input validation for complex (deeply nested) variables can be challenging with the ansible.builtin.assert module, therefore use the ansible.utils.validate module.
By default, the JSON Schema engine is used by the module to validate the data with the provided criteria, other engines can be used as well.

JSON Schema is extremely widely used and nearly equally widely implemented. There are implementations of JSON Schema validation for many programming languages or environments (e.g. you can use it in VScode where it will validate your variable files, while you are writing it, before you even run the playbook.) and it is well documented. It provides

Structured Data Description
Rule Definition and Enforcement
Produce clear documentation
Extensibility
Data Validation

Take a look at the following example.

Variable FileJSON Schema

---
server_list:
  - fqdn: server1.example.com
    ipv4_address_list:
      - 10.0.5.36
      - 192.168.2.67
    cores: 4
    memory: 16GB
    disk_space: 100GB
    business_owner: john.doe@example.com # (1)!
  - fqdn: server2 # (2)!
    ipv4_address_list:
      - 10.0.5.55
      - 192.168.2.89
    cores: 2
    memory: 8 # (3)!
    disk_space: 100GB

This is an optional field, as you can in the JSON Schema definition in line 47
That is not a FQDN! The JSON Schema definition validates that it is (line 17), as well as checking if the required domain is used.
The memory value is expected to be a string, prefixed with GB. The JSON Schema definition (line 34) ensures the correct type and prefix.

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Server List variable file validation",
    "description": "A schema to validate the variable file containing the server list",
    "type": "object",
    "additionalProperties": false,
    "properties": {
        "server_list": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": false,
                "properties": {
                    "fqdn": {
                        "type": "string",
                        "description": "Server name with example.com domain",
                        "pattern": "^([a-z0-9]+)(\\.example\\.com)$"
                    },
                    "ipv4_address_list": {
                        "type": "array",
                        "items": {
                            "type": "string",
                            "format": "ipv4"
                        }
                    },
                    "cores": {
                        "type": "number",
                        "minimum": 1,
                        "maximum": 64
                    },
                    "memory": {
                        "type": "string",
                        "description": "Memory size in GB, expects a number followed by GB",
                        "pattern": "^\\d+GB$"
                    },
                    "disk_space": {
                        "type": "string",
                        "description": "Disk size in GB, expects a number followed by GB",
                        "pattern": "^\\d+GB$"
                    },
                    "business_owner": {
                        "type": "string",
                        "description": "Email address of the responsible person for this server",
                        "format": "email"
                    }
                },
                "required": [
                    "fqdn",
                    "ipv4_address_list",
                    "cores",
                    "memory",
                    "disk_space"
                ]
            }
        }
    },
    "required": [
        "server_list"
    ]
}

To validate the variable file with the provided JSON Schema file, use the following task:

- name: Variable file validation
  ansible.utils.validate:
    data: "{{ lookup('file', 'variables.yml') | from_yaml | to_json }}"
    criteria: "{{ lookup('file', 'json_schemas/server_list_validation.json') }}"
    engine: ansible.utils.jsonschema

The files are read in with a file lookup, the variables file is converted to JSON.
To be able to use the module (or the filter-, test- or lookup-Plugin with the same name), you'll need the ansible.utils collection and an additional Python package:

ansible-galaxy collection install ansible.utils

pip3 install jsonschema

Playbook output showing validation errors

TASK [Variable file validation] ****************************************************************************************
fatal: [localhost]: FAILED! =>
    changed: false
    errors:
    -   data_path: server_list.1.fqdn
        expected: ^([a-z0-9]+)(\.example\.com)$
        found: server2
        json_path: $.server_list[1].fqdn
        message: '''server2'' does not match ''^([a-z0-9]+)(\\.example\\.com)$'''
        relative_schema:
            description: Server name with example.com domain
            pattern: ^([a-z0-9]+)(\.example\.com)$
            type: string
        schema_path: properties.server_list.items.properties.fqdn.pattern
        validator: pattern
    -   data_path: server_list.1.memory
        expected: string
        found: 8
        json_path: $.server_list[1].memory
        message: 8 is not of type 'string'
        relative_schema:
            description: Memory size in GB, expects a number followed by GB
            pattern: ^\d+GB$
            type: string
        schema_path: properties.server_list.items.properties.memory.type
        validator: type
    msg: |-
        Validation errors were found.
        At 'properties.server_list.items.properties.fqdn.pattern' 'server2' does not match '^([a-z0-9]+)(\\.example\\.com)$'.
        At 'properties.server_list.items.properties.memory.type' 8 is not of type 'string'.

PLAY RECAP *************************************************************************************************************
localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

To create the initial JSON schema for your variable input, multiple tools are available online, for example to convert from YAML to JSON and afterwards from JSON to JSON-Schema.

Usage with ansible.utils.validate filter plugin, also for single variable

The previous examples validated complete variable files (with potentially multiple variables), the following example shows how to validate a single variable (the same as above) with a provided JSON schema file.
It will also make use of the ansible.utils.validate plugin with additional tasks for a more dense output.

TasksJSON Schema

- name: Run variable validation
  ansible.builtin.set_fact:
    server_list_variable_validation: "{{ server_list | ansible.utils.validate(validation_criteria, engine='ansible.utils.jsonschema') }}"
  vars:
    validation_criteria: "{{ lookup('ansible.builtin.file', 'json_schemas/server_list.json') }}"

- name: Output validation errors for server_list variable
  ansible.builtin.debug:
    msg: "Error in {{ item.data_path }}"
  loop: "{{ server_list_variable_validation }}"
  loop_control:
    label: "{{ item.message }}"
  when: server_list_variable_validation | length > 0

- name: Assert variable validation
  ansible.builtin.assert:
    that:
      - server_list_variable_validation | length == 0
    quiet: true
    fail_msg: "Validation failed, fix the errors shown above!"

The validation plugin produces a list of validations. The second task is shown if the list contains entries, the last task fails the playbook by using the assert module if the validation list is not empty.

Pretty much the same validation as before, but this time the uppermost type (line) is array (a list).

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Server List variable validation",
    "description": "A schema to validate the variable server_list",
    "type": "array",
    "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
            "fqdn": {
                "type": "string",
                "description": "Server name with example.com domain",
                "pattern": "^([a-z0-9]+)(\\.example\\.com)$"
            },
            "ipv4_address_list": {
                "type": "array",
                "items": {
                    "type": "string",
                    "format": "ipv4"
                }
            },
            "cores": {
                "type": "number",
                "minimum": 1,
                "maximum": 64
            },
            "memory": {
                "type": "string",
                "description": "Memory size in GB, expects a number followed by GB",
                "pattern": "^\\d+GB$"
            },
            "disk_space": {
                "type": "string",
                "description": "Disk size in GB, expects a number followed by GB",
                "pattern": "^\\d+GB$"
            },
            "business_owner": {
                "type": "string",
                "description": "Email address of the responsible person for this server",
                "format": "email"
            }
        },
        "required": [
            "fqdn",
            "ipv4_address_list",
            "cores",
            "memory",
            "disk_space"
        ]
    }
}

The tasks produce the following output, only showing the data path and the violation message as the list item label.

TASK [Run variable validation] **************************************************************************************
ok: [localhost]

TASK [Output validation errors for server_list variable] ************************************************************
ok: [localhost] => (item='server2' does not match '^([a-z0-9]+)(\\.example\\.com)$') =>
    msg: Error in 1.fqdn
ok: [localhost] => (item=8 is not of type 'string') =>
    msg: Error in 1.memory

TASK [Assert variable validation] ***********************************************************************************
fatal: [localhost]: FAILED! =>
    assertion: server_list_variable_validation | length == 0
    changed: false
    evaluated_to: false
    msg: Validation failed, fix the errors shown above!

PLAY RECAP **********************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0

Disable variable templating

Sometimes, it is necessary to provide special characters like curly braces. The most common use cases include passwords that allow special characters like { or %, and JSON arguments that look like templates but should not be templated.

---
examplepassword: !unsafe 234%234{435lkj{{lkjsdf

Abstract

When handling values returned by lookup plugins, Ansible uses a data type called unsafe to block templating. Marking data as unsafe prevents malicious users from abusing Jinja2 templates to execute arbitrary code on target machines. The Ansible implementation !unsafe ensures that these values are never templated. You can use the same unsafe data type in variables you define, to prevent templating errors and information disclosure.

For complex variables such as hashes or arrays, use !unsafe on the individual elements, take a look at this example for AWX/AAP automation.

For Jinja2 templates this behavior can be achieved with the {% raw %} and {% endraw %} tags.
Consider the following template where name_of_receiver_group should be replaced with a variable you set elsewhere, but details contains stuff which should stay as it is:

receivers:
- name: "{{ name_of_receiver_group }}"
  opsgenie_configs:
  - api_key: 123-123-123-123-123
    send_resolved: false
    {% raw %}
    # protecting the go templates inside the raw section.
    details: { details: "{{ .CommonAnnotations.SortedPairs.Values | join \" \" }}" }
    {% endraw %}

Ansible Development

Development

This topic is split into four main sections, each section covers a different additional tool to consider when developing your Ansible content.

Version Control

Small guide for version controlling playbooks.
Linting

Installation and usage of the community backed Ansible Best Practice checker.
Testing

How to test your Ansible content during development.
Extending

How to create your own custom modules and plugins.
Monitoring & Troubleshooting

How to monitor your playbook for resource consumption or time taken.

Tools

Each section above make use of an additional tool to support you during your Ansible content development. In most cases the standalone installation, as well as a custom container-based installation and usage method is described.

The Ansible community provides a Container image bundling all the tools described in the sections above.

docker pull quay.io/ansible/creator-ee

For example you could output the version of the installed tools like this:

docker run --rm quay.io/ansible/creator-ee ansible-lint --version

docker run --rm quay.io/ansible/creator-ee molecule --version

Take a look into the respective sections for more information and additional usage instructions.

Version Control

Ansible content should be treated as any project containing source code, therefore using version control is always recommended. This guide focuses on Git as it is the most widespread tool.

Installation

Most Linux distributions already have Git installed, otherwise install the package with the package manager of the system, for example:

sudo yum install git

Configuration

Git needs some minimal configuration, most important you need to tell Git who you are.

git config --global user.name "Your Name"

git config --global user.email "your.mail@computacenter.com"

Every commit you make can now be traced back to you, this enables collaborating work on Ansible projects.

Workflow

Git has multiple states that your files can reside in:

untracked
modified
staged
committed

The files flow through different sections of your Git project:

Working Directory - also called Working tree, this is basically your filesystem where you are developing
Staging Area - also called Index, the files that will go into your next commit
Local Repository - the .git folder where metadata and objects are stored for your project.
Remote Repository - the (optional, but recommended) upstream repository

Success

Although this seems complicated, don't worry, in most cases Git is fairly easy.

The basic Git workflow goes something like this:

You modify files in your working tree.
You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.
You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

The commands you will be using the most and how the files in different states flow through the stages is shown below:

sequenceDiagram
  box Remote
  participant UR as Upstream Repository
  end
  box Local
  participant LR as Local Repository
  participant SG as Staging Area
  participant WS as Working Directory
  participant SH as Stash
  end
  UR->>WS: git clone
  UR->>WS: git pull
  UR->>LR: git fetch
  LR->>WS: git checkout -b <branch-name>
  WS->>SG: git add <file>
  WS->>SG: git add -A
  SG->>LR: git commit -m "Commit message"
  LR->>UR: git push
  WS->>SH: git stash
  SH->>WS: git stash pop

Branching concept

Branches are a part of your everyday development process, they are effectively a pointer to a snapshot of your changes. When you want to add a new feature or fix a bug, you spawn a new branch to encapsulate your changes. This makes it harder for unstable code to get merged into the main code base, and it gives you the chance to clean up your future's history before merging it into the main branch.
We are using the following branches:

main (protected, only merge commits are allowed)
dev (protected, force-pushes are allowed)
feature/branch-name
bugfix/branch-name
hotfix/branch-name

The main branch is the production-code, forking (a feature or bugfix branch) is always done from the dev branch. Forking a hotfix branch is done from the main branch, as it should fix something not working with the production code.

Feature request

Creating a new feature should be done with a fork of the latest stage of the dev branch, prefix your branch-name with feature/ and provide a short, but meaningful description of the new feature.

gitGraph
   commit
   commit
   branch dev
   checkout dev
   commit
   branch feature
   checkout feature
   commit
   commit
   checkout dev
   commit
   checkout feature
   merge dev
   checkout dev
   merge feature
   commit
   checkout main
   merge dev
   checkout dev
   commit
   checkout main
   commit type:HIGHLIGHT

The complete workflow with git commands looks something like this:

$ git checkout dev
Switched to branch 'dev'
Your branch is behind 'origin/dev' by 3 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ git pull
Updating b666be1..e1fc998
Fast-forward
...
$ git checkout -b feature/postgres-ha
Switched to a new branch 'feature/postgres-ha'

The single steps in order:

git checkout dev - Switching to dev branch.
git pull - Getting latest changes from upstream dev branch to local dev branch
git checkout -b feature/postgres-ha - Creating and switching to hotfix branch.

Start developing, save your work in a commit (or multiple commits).

$ git status
...
$ git add -A
...
$ git commit -m "Added tasks to configure Postgres High-Availability."

As the last step, before pushing your changes to the UR and opening a merge request, ensure that the latest changes from the dev branch (which were made by others during your feature development) are also in your branch and no merge conflicts arise.
Do the following steps:

$ git checkout dev
Switched to branch 'dev'
Your branch is behind 'origin/dev' by 2 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ git pull
Updating e546ag7..klr732i
Fast-forward
...
$ git checkout -b feature/postgres-ha
...
Switched to branch 'feature/postgres-ha'
$ git merge dev
...
$ git push -u origin

Bugfix request

In case you need to fix a bug in a role or playbook, fork a new branch from dev and prefix your branch-name with bugfix/ and provide a short, but meaningful description of the unwanted behavior.

Info

The steps are the same as for a feature branch, only the branch-name should indicate that a bug is to be fixed.

gitGraph
   commit
   commit
   branch dev
   checkout dev
   commit
   branch bugfix
   checkout bugfix
   commit
   commit
   checkout dev
   commit
   checkout bugfix
   merge dev
   checkout dev
   merge bugfix
   commit
   checkout main
   merge dev
   checkout dev
   commit
   checkout main
   commit type:HIGHLIGHT

Take a look at the section above for an explanation of the single steps.

Hotfix request

gitGraph
   commit
   commit
   branch dev
   checkout dev
   commit
   checkout main
   commit
   branch hotfix
   checkout hotfix
   commit
   checkout main
   checkout hotfix
   commit
   checkout main
   merge hotfix
   checkout dev
   merge main
   commit
   commit
   checkout main
   commit type:HIGHLIGHT

The complete workflow with git commands looks something like this:

$ git checkout main
Switched to branch 'main'
Your branch is behind 'origin/main' by 11 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
$ git pull
Updating b666be1..e1fc998
Fast-forward
...
$ git checkout -b hotfix/mitigate-prod-outage
Switched to a new branch 'hotfix/mitigate-prod-outage'

The single steps in order:

git checkout main - Switching to main branch.
git pull - Getting latest changes from upstream main branch to local main branch
git checkout -b hotfix/mitigate-prod-outage - Creating and switching to hotfix branch.

After creating (and testing!) the fixes, save your work in a commit (or multiple commits).

$ git status
...
$ git add -A
...
$ git commit -m "Fixes Issue #31, will restore prod environment."

Now, push your changes to the UR.

$ git push -u origin
...

In the UR, open a merge request from your hotfix branch to the main branch.

Note

After rolling out the changes to the production environment and ensuring the hotfix works as expected, open a new merge request against the dev branch to ensure the fixes are also available in the development stage.

Git hooks

Git Hooks are scripts that Git can execute automatically when certain events occur, such as before or after a commit, push, or merge. There are several types of Git Hooks, each with a specific purpose.

Pre-Commit

Pre-commit hooks can be used to enforce code formatting or run tests before a commit is made.

The most convenient way is the use of the pre-commit framework, install the pre-commit utility:

pip3 install pre-commit

Use the following configuration as a starting point, create the file in your project folder.

.pre-commit-config.yaml

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-yaml
      - id: check-merge-conflict
      - id: trailing-whitespace
        args: [--markdown-linebreak-ext=md]
      - id: no-commit-to-branch
      - id: requirements-txt-fixer
  - repo: https://github.com/timgrt/pre-commit-hooks
    rev: v0.2.0
    hooks:
      - id: check-file-names
      - id: check-vault-files
  - repo: https://github.com/ansible-community/ansible-lint
    rev: v6.15.0
    hooks:
      - id: ansible-lint

Take a look at https://pre-commit.com/hooks.html for additional hooks for your use-case.

Install all hooks of the .pre-commit-config.yaml file:

pre-commit install

Run the autoupdate command to update all revisions to the latest state:

pre-commit autoupdate

Success

pre-commit will now run on every commit.

You can run all hooks at any time with the following command, without committing:

pre-commit run -a

Example output

$ pre-commit run -a
check yaml...............................................................Passed
check for merge conflicts................................................Passed
trim trailing whitespace.................................................Passed
don't commit to branch...................................................Passed
fix requirements.txt.................................(no files to check)Skipped
markdownlint-docker......................................................Passed
Check files for non-compliant names......................................Passed
Ansible-lint.............................................................Failed
- hook id: ansible-lint
- exit code: 2

[...output cut for readability...]

Read documentation for instructions on how to ignore specific rule violations.

                      Rule Violation Summary  
count tag                           profile rule associated tags  
    3 role-name                     basic   deprecations, metadata
    1 name[missing]                 basic   idiom  
    2 yaml[comments]                basic   formatting, yaml  
    1 yaml[new-line-at-end-of-file] basic   formatting, yaml  

Failed after min profile: 7 failure(s), 0 warning(s) on 30 files.

Hint

The first time pre-commit runs on a file it will automatically download, install, and run the hook. Note that running a hook for the first time may be slow. but will be faster in subsequent iterations.

Offline

The pre-commit framework by default needs internet connection to setup the hooks, in disconnected environments you can build the pre-commit hook yourself.

The following script can be used as a starting point, it uses ansible-lint from inside a container (see Lint in Docker Image how to build it) and also checks for unencrypted files in your commit.

.git/hooks/pre-commit

#!/bin/bash
#
# File should be .git/hooks/pre-commit and executable
#

# Pre-commit hook that runs ansible-lint Container for best practice checking
# If lint has errors, commit will fail with an error message.
if [[ ! $(docker inspect ansible-lint) ]] ; then
  echo "# DOCKER IMAGE NOT FOUND"
  echo "# Build the Docker image from the Gitlab project 'ansible-lint Docker Image'."
  echo "# No linting is done!"
else
  echo "# Running 'ansible-lint' against commit, this takes some time ..."
  # Getting all files currently staged and storing them in variable
  FILES_TO_LINT=$(git diff --cached --name-only)
  # Running with shared profile, see https://ansible-lint.readthedocs.io/profiles/
  if [ -z "$FILES_TO_LINT" ] ; then
    echo "# No files linting found. Add files to SG area with 'git add <file>'."
  else
    docker run --rm -v $(pwd):/data ansible-lint $FILES_TO_LINT
    if [ ! $? = 0 ]; then
      echo "# COMMIT REJECTED"
      echo "# Please fix the shown linting errors"
      echo "#   (or force the commit with '--no-verify')."
      exit 1;
    fi
  fi
fi

# Pre-commit hook that verifies if all files containing 'vault' in the name
# are encrypted.
# If not, commit will fail with an error message.
# Finds all files in 'inventory' folder or 'files' folder in roles. Files in other
# locations are not recognized!
FILES_PATTERN='(inventory.*vault.*)|(files.*vault.*)'
REQUIRED='ANSIBLE_VAULT'

EXIT_STATUS=0
wipe="\033[1m\033[0m"
yellow='\033[1;33m'
# carriage return hack. Leave it on 2 lines.
cr='
'
echo "# Checking for unencrypted vault files in commit ..."
for f in $(git diff --cached --name-only | grep -E $FILES_PATTERN)
do
  # test for the presence of the required bit.
  MATCH=`head -n1 $f | grep --no-messages $REQUIRED`
  if [ ! $MATCH ] ; then
    # Build the list of unencrypted files if any
    UNENCRYPTED_FILES="$f$cr$UNENCRYPTED_FILES"
    EXIT_STATUS=1
  fi
done
if [ ! $EXIT_STATUS = 0 ] ; then
  echo '# COMMIT REJECTED'
  echo '# Looks like unencrypted ansible-vault files are part of the commit:'
  echo '#'
  while read -r line; do
    if [ -n "$line" ] ; then
      echo -e "#\t${yellow}unencrypted:   $line${wipe}"
    fi
  done <<< "$UNENCRYPTED_FILES"
  echo '#'
  echo "# Please encrypt them with 'ansible-vault encrypt <file>'"
  echo "#   (or force the commit with '--no-verify')."
  exit $EXIT_STATUS
fi
exit $EXIT_STATUS

Linting

Ansible Lint is a best-practice checker for Ansible, maintained by the Ansible community.

Installation

Ansible Lint is installed through the Python packet manager:

Note

Ansible Lint always needs Ansible itself, ansible-core is enough.

pip3 install ansible-lint

Configuration

Minimal configuration is necessary, use the following as a starting point in your project directory:

.ansible-lint

---
profile: shared

# Silence infos, warnings and don't show summary
quiet: true

skip_list:
  - role-name

# Enable some useful rules which are opt-in
enable_list:
  - args
  - empty-string-compare
  - no-log-password
  - no-same-owner

Profiles gradually increase the strictness of rules, from lowest to highest, every profile extends to previous:

Strictness	Profile name	Description
1	min	ensures that Ansible can load content, rules in this profile are mandatory
2	basic	prevents common coding issues and enforces standard styles and formatting
3	moderate	ensures that content adheres to best practices for making content easier to read and maintain
4	safety	avoids module calls that can have non-determinant outcomes or security concerns
5	shared	for packaging and publishing to galaxy.ansible.com, automation-hub, or a private instance
6	production	for inclusion in AAP as validated or certified content

Take a look at the official documentation for more information.

Usage

The usage is fairly simple, just run ansible-lint <your-playbook>.
The tool will check your playbook for best-practices, it traverses your playbook and will lint all included playbooks and roles.

Take a look at the ansible-lint documentation for additional information.

Lint in Docker Image

The following Dockerfile can be used to build a Docker Container image which bundles ansible-lint and its dependencies:

Dockerfile

FROM python:3.9-slim

# Enable colored output
ENV TERM xterm-256color

# Defining Ansible environment variable to not output deprecation warnings. This is not useful in the linting container.
# This overwrites the value in the ansible.cfg from volume mount
ENV ANSIBLE_DEPRECATION_WARNINGS=false

# Install requirements.
RUN apt-get update && apt-get install -y \
  git \
  && rm -rf /var/lib/apt/lists/*

# Update pip
RUN python3 -m pip install --no-cache-dir --no-compile --upgrade pip

# Install ansible-lint and dependencies
RUN pip3 install --no-cache-dir --no-compile ansible-lint ansible yamllint

WORKDIR /data
ENTRYPOINT ["ansible-lint"]
CMD ["--version"]

Build the container image, the command expects that the Dockerfile is present in the current directory:

docker build -t ansible-lint .

After building the image, the image can be used. Inside of the Ansible project directory, run this command (e.g. this lints the site.yml playbook).

docker run --rm -v $(pwd):/data ansible-lint site.yml

The output for example is something like this, ansible-lint reports a warning regarding unnecessary white-spaces in a line, as well as an error regarding unset file permissions (fix could be setting mode: 0644 in the task):

$ docker run --rm -v $(pwd):/data ansible-lint site.yml
WARNING  Overriding detected file kind 'yaml' with 'playbook' for given positional argument: site.yml
WARNING  Listing 2 violation(s) that are fatal
yaml: trailing spaces (trailing-spaces)
roles/network/tasks/cacheserve-loopback-interface.yml:19

risky-file-permissions: File permissions unset or incorrect
roles/network/tasks/cacheserve-loopback-interface.yml:43 Task/Handler: Deploy loopback interface config for Cacheserve

You can skip specific rules or tags by adding them to your configuration file:
# .ansible-lint
warn_list:  # or 'skip_list' to silence them completely
  - experimental  # all rules tagged as experimental
  - yaml  # Violations reported by yamllint

Finished with 1 failure(s), 1 warning(s) on 460 files.

To simplify the usage, consider adding an alias to your .bashrc, e.g.:

# .bashrc
# User specific aliases and functions
alias lint="docker run --rm -v $(pwd):/data ansible-lint"

After running source ~/.bashrc you can use the alias:

lint site.yml

Automated Linting

Lining can and should be done automatically, this way you can't forget to check your playbook for best practices. This can be done on multiple levels, either locally as part of your Git workflow, as well as with a pipeline in your remote repository.

Git pre-commit hook

A nice way to check for best practices during your Git workflow is the usage of a pre-commit hook. These hooks can be simple bash script, which are run whenever you are committing changes locally to the staging area or a framework/utility like pre-commit.

Take a look at the Version Control section for installing and configuring pre-commit hooks.

CI Pipeline

Running ansible-lint through a CI pipeline automatically when merging changes to the Git repository is highly advisable.

A possible pipeline in Gitlab may look like this, utilizing the container image above:

workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == 'merge_request_event'
    - if: $CI_PIPELINE_SOURCE == 'web'
    - if: $CI_PIPELINE_SOURCE == 'schedule'

variables:
  GIT_STRATEGY: clone

stages:
  - prepare
  - syntax
  - lint

prepare:
  stage: prepare
  script:
    - 'echo -e "### Prepare playbook execution. ###"'
    - 'cp ansible.cfg.sample-lab ansible.cfg'
    - 'echo -e "$VAULT_PASSWORD" > .vault-password'
  artifacts:
    paths:
      - ansible.cfg
      - .vault-password
  cache:
    paths:
      - ansible.cfg
      - .vault-password
  tags:
    - ansible-lint

syntax-check:
  stage: syntax
  script:
    - 'echo -e "Perform a syntax check on the playbook. ###"'
    - 'docker run --rm --entrypoint ansible-playbook -v $(pwd):/data ansible-lint site.yml --syntax-check'
  cache:
    paths:
      - ansible.cfg
      - .vault-password
  dependencies:
    - prepare
  tags:
    - ansible-lint

ansible-lint:
  stage: lint
  script:
    - 'echo -e "### Check for best practices with ansible-lint. ###"'
    - 'echo -e "### Using ansible-lint version: ###"'
    - 'docker run --rm -v $(pwd):/data ansible-lint'
    - 'docker run --rm -v $(pwd):/data ansible-lint site.yml'
  cache:
    paths:
      - ansible.cfg
      - .vault-password
  dependencies:
    - prepare
  tags:
    - ansible-lint

If you want to utilize the installed ansible and ansible-lint utilities on the host running the Gitlab Runner change the commands in the syntax stage to ansible-playbook site.yml --syntax-check and in the lint stage to ansible-lint --version and ansible-lint site.yml.

Testing

With many people contributing to the automation, it is crucial to test the automation content in-depth. So when you’re developing new Ansible Content like playbooks, roles and collections, it’s a good idea to test the content in a test environment before using it to automate production infrastructure. Testing ensures the automation works as designed and avoids unpleasant surprises down the road.
Testing automation content is often a challenge, since it requires the deployment of specific testing infrastructure as well as setting up the testing conditions to ensure the tests are relevant.

Consider the following list for testing your Ansible content, with increasing complexity:

yamllint
ansible-playbook --syntax-check
ansible-lint
molecule test
ansible-playbook --check (against production)
Parallel infrastructure

Syntax check

The whole playbook (and all roles and tasks) need to, minimally, pass a basic ansible-playbook syntax check run.

ansible-playbook main.yml --syntax-check

Running this as a step in a CI Pipeline is advisable.

Linting

Take a look at the Linting section for further information.

Molecule

The Molecule project is designed to aid in the development and testing of Ansible roles, provides support for testing with multiple instances, operating systems and distributions, virtualization providers, test frameworks and testing scenarios.
Molecule is mostly used to test roles in isolation (although it is possible to test multiple roles or playbooks at once). To test against a fresh system, molecule uses a container runtime to provision virtualized/containerized test hosts, runs commands on them and asserts the success.
By default, Containers don't allow services to be installed, started and stopped as in a virtual machine. We will be using custom systemd-enabled images, which are designed to run an init system as PID 1 for running multi-services inside the container. Also, some additional configuration is needed in the Molecule configuration file as shown below.

Take a look at the Molecule documentation for a full overview.

Installation

The described configuration below expects the Podman container runtime on the Ansible Controller (other drivers like Docker are available). You can install Podman with the following command:

sudo apt install podman

The Molecule binary and dependencies are installed through the Python package manager, you'll need a fairly new Python version (Python >= 3.10 with ansible-core >= 2.12).
Use a Python Virtual environment (requires the python3-venv package) to encapsulate the installation from the rest of your Controller.

python3 -m venv molecule-venv

Activate the VE:

source molecule-venv/bin/activate

Install dependencies, after upgrading pip:

pip3 install --upgrade pip setuptools

pip3 install ansible-core molecule molecule-plugins[podman]

Molecule plugins contains the following provider:

azure
containers
docker
ec2
gce
podman
vagrant

Note

The Molecule Podman provider requires the modules of the containers.podman collection (as it provisions the containers with Ansible itself).
If you only installed ansible-core, you'll need to install the collection separately:

ansible-galaxy collection install containers.podman

If you are done with Molecule testing, use deactivate to leave your VE.

Configuration

The molecule configuration files are kept in the role folder you want to test. Create the directory molecule/default and at least the molecule.yml and converge.yml:

roles/
└── webserver_demo
    ├── defaults
    │   └── main.yml
    ├── molecule
    │   └── default
    │       ├── converge.yml
    │       └── molecule.yml
    ├── tasks
    │   └── main.yml
    └── templates
        └── index.html

You may use these example configurations as a starting point. It expects that the Container image is already present (use podman pull docker.io/timgrt/rockylinux9-ansible:latest).

Central Molecule configurationPlaybook filePreparation stageVerification

molecule.yml

---
driver:
  name: podman
platforms: # (1)!
  - name: instance1 # (2)!
    groups: # (3)!
      - molecule
      - rocky
    image: docker.io/timgrt/rockylinux9-ansible:latest # (4)!
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    command: "/usr/sbin/init"
    pre_build_image: true # (5)!
    exposed_ports:
      - 80/tcp
    published_ports: # (6)!
      - 8080:80/tcp
provisioner:
  name: ansible
  options:
    D: true # (7)!
  connection_options:
    ansible_user: ansible # (8)!
  config_options:
    defaults:
      interpreter_python: auto_silent
      callback_whitelist: profile_tasks, timer, yaml # (9)!
  inventory:
    links:
      group_vars: ../../../../inventory/group_vars/ # (10)!
scenario: # (11)!
  create_sequence:
    - create
    - prepare
  converge_sequence:
    - create
    - prepare
    - converge
  test_sequence:
    - destroy
    - create
    - converge
    - idempotence
    - destroy
  destroy_sequence:
    - destroy

List of hosts to provision by molecule, copy the list item and use a unique name if you want to deploy multiple containers. In the following example one Container with Rocky Linux 8 and one Ubuntu 20.04 container are provisioned.

  - name: rocky8-instance1
    image: docker.io/timgrt/rockylinux9-ansible:latest
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    tmpfs:
      - /run
      - /tmp
    command: "/usr/sbin/init"
    pre_build_image: true
    groups:
      - molecule
      - rocky
  - name: ubuntu2004
    image: docker.io/timgrt/ubuntu2004-ansible:latest
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
    command: "/lib/systemd/systemd"
    pre_build_image: true
    groups:
      - molecule
      - ubuntu

The name of your container, for better identification you could use e.g. demo.${USER}.molecule which uses your username from environment variable substitution, showing who deployed the container for what purpose.
Additional groups the host should be part of, using a custom molecule group for referencing in converge.yml.
If you want your container to inherit variables from group_vars (see inventory.links.group_vars in the provisioner section), add the group(s) to this list.
For more information regarding the used container image, see https://hub.docker.com/r/timgrt/rockylinux9-ansible. The image provides a systemd-enabled environment, this ensures you can install and start services with systemctl as in any normal VM.
Some more useful images are:
Container image must be present before running Molecule, pull it with podman pull docker.io/timgrt/rockylinux9-ansible:latest
When running a webserver inside the container (on port 80), this will publish the container port 80 to the host port 8080. Now, you can check the webserver content by using http://localhost:8080 (or use the IP of your host).
Enables diff mode, set to false if you don't want that.
Uses the ansible user to connect to the container (defined in the container image), this way you can test with become. Otherwise you would connect with the root user, most likely this is not what you would do in production.
Adds a timer to every task and the overall playbook run, as well as formatting the Ansible output to YAML for better readability.
Install necessary collections with ansible-galaxy collection install ansible.posix community.general.
If you want your container to inherit variables from group_vars, reference the location of your group_vars (here they are stored in the subfolder inventory of the project, searching begins in the scenario folder defaults). Delete the inventory key and all content if you don't need this.
A scenario allows Molecule to test a role in a particular way, these are the stages when executing Molecule.
For example, running molecule converge would create a container (if not already created), prepare it (if not already prepared) and run the converge stage/playbook.

converge.yml

The role to test must be defined here, change role-name to the actual name.

---
- name: Converge
  hosts: molecule
  become: true
  roles:
    - role-name

prepare.yml

Adds an optional preparation stage (referenced by prepare in the scenario definition).
For example, if you want to test SSH Key-Pair creation in your container (this is also used by the user module to create SSH keys), install the necessary packages before running the role itself.

---
- name: Prepare
  hosts: molecule
  become: true
  tasks:
    - name: Install OpenSSH for ssh-keygen
      ansible.builtin.package:
        name: openssh
        state: present

Remember, you are using a Container image, not every package from the distribution is installed by default to minimize the image size.

verify.yml

Adds an optional verification stage (referenced by verify in the scenario definition). Not used in the example above.

Add this block to your molecule.yml as a top-level key:

verifier:
  name: ansible

The verify.yml contains your tests for your role.

---
- name: Verify
  hosts: molecule
  become: true
  tasks:
    - name: Get service facts
      ansible.builtin.service_facts:

    # Service may have started, returning 'OK' in the service module, but may have failed later.
    - name: Ensure that MariaDB is in running state
      assert:
        that:
          - ansible_facts['services']['mariadb.service']['state'] == 'running'

Other verifiers like testinfra can be used.

Usage

Molecule is executed from within the role you want to test, change directory:

cd roles/webserver_demo

From here, run the molecule scenario, after activating your Python VE with molecule:

source molecule-venv/bin/activate

To only create the defined containers, but not run the Ansible tasks:

molecule create

To run the Ansible tasks of the role (if the container does not exist, it will be created):

molecule converge

To execute a full test circle (existing containers are deleted, re-created and Ansible tasks are executed, containers are deleted(!) afterwards):

molecule test

If you want to login to a running container instance:

molecule login

Minimal testing environment

Tip

This is meant as a quick and dirty testing or demo environment only, for anything more sophisticated, use Molecule (as you most likely will be moving your content into one or more roles anyway).

You'll miss out on the convenient and frankly easy to use possibilities of Molecule, but, if you just need a small environment for testing your Ansible content without impacting your Ansible Control Node, the following setup spins up a small one in (Podman) containers. You will need Podman and Ansible (naturally), but nothing else.

Installation

You can install Podman with the following command:

sudo apt install podman

The playbook to create the testing instances uses the containers.podman collection, if you only installed ansible-core, you'll need to install the collection separately:

ansible-galaxy collection install containers.podman

Configuration

Copy the three files in the separate tabs, a playbook for creating the testing environment, an inventory file defining the testing instances and a small demo playbook which can be used to test your Ansible content.

Create test environmentTesting inventoryTesting Playbook

testing_environment.yml

---
- name: Create or delete demo environment for local testing
  hosts: localhost
  connection: local
  vars:
    testing_image: docker.io/timgrt/rockylinux9-ansible:latest
  tasks:
    - name: "{{ (delete | default(false)) | ternary('Delete', 'Create') }} demo instance"
      containers.podman.podman_container:
        name: "{{ item }}"
        hostname: "{{ item }}"
        image: "{{ testing_image }}"
        volumes:
          - /sys/fs/cgroup:/sys/fs/cgroup:ro
        command: "/usr/sbin/init"
        state: "{{ (delete | default(false)) | ternary('absent', 'started') }}"
      loop: "{{ groups['test'] }}"

testing_inventory.yml

Add additional instances in the test group, if necessary.

[test]
instance1

[test:vars]
ansible_user=ansible
ansible_connection=podman

testing_inventory.yml

Add your tasks to this playbook and start testing. If you want to use your own playbook, target the test group as well.

---
- name: Testing playbook
  hosts: test
  tasks:
    - name: Output distribution
      ansible.builtin.debug:
        msg: "{{ ansible_distribution }}"

Usage

First, create the testing instances by executing the testing_environment.yml playbook:

ansible-playbook -i testing_inventory.ini testing_environment.yml

Add your tasks to the testing_playbook.yml (or use your existing playbook, target the test group) and execute:

ansible-playbook -i testing_inventory.ini testing_playbook.yml

After finishing your tests remove the instances by running the testing_environment.yml playbook and provide the extra-var delete:

ansible-playbook -i testing_inventory.ini testing_environment.yml -e delete=true

Extending Ansible

Ansible is easily customizable, you can extend Ansible by adding custom modules or plugins.
You might wonder whether you need a module or a plugin. Ansible modules are units of code that can control system resources or execute system commands. Ansible provides a module library that you can execute directly on remote hosts or through playbooks.
Similar to modules are plugins, which are pieces of code that extend core Ansible functionality. Ansible uses a plugin architecture to enable a rich, flexible, and expandable feature set. It ships with several plugins and lets you easily use your own plugins.

Store custom content

Custom modules can be stored in the library folder in your project root directory, plugins need to be stored in folders called <plugin type>_plugins, e.g. filter_plugins. These locations are still valid, but it is recommended to store custom content in a collection, this way you have all your custom content in a single location (folder).

You can store custom collections with your Ansible project, create it with the ansible-galaxy utility and provide the --init-path parameter. The folder collections/ansible_collections will automatically be picked up by Ansible (although your custom collection is not shown by the ansible-galaxy collection list command, adjust the ansible.cfg for that, take a look into the next subsection).

ansible-galaxy collection init computacenter.utils --init-path collections/ansible_collections

This creates the following structure:

collections/
└── ansible_collections
    └── computacenter
        └── utils
            ├── README.md
            ├── docs
            ├── galaxy.yml
            ├── plugins
            │   └── README.md
            └── roles

Create subfolder beneath the plugins folder, modules for modules and e.g. filter for filter plugins. Take a look into the included README.md in the plugins folder. Store your custom content in python files in the respective folders.

Tip

Only underscores (_) are allowed for filenames inside collections!
Naming a file cc-filter-plugins.py will result in an error!

Listing (custom) collections

When storing custom collections alongside your project and you want to list all collections, you need to adjust your Ansible configuration. You will be able to use your custom collection nevertheless, this is more a quality of life change.

Adjust the collections_paths parameter in the defaults section of your ansible.cfg:

[defaults]
collections_paths = ~/.ansible/collections:/usr/share/ansible/collections:./collections

The first two paths are the default locations for collections, paths are separated with colons.

Listing collections

Using a custom collection in the project folder test with adjusted configuration file.

$ ansible-galaxy collection list

# /home/tgruetz/.ansible/collections/ansible_collections
Collection        Version
----------------- -------
ansible.netcommon 4.1.0  
ansible.posix     1.4.0  
ansible.utils     2.8.0  
cisco.aci         2.3.0  
cisco.ios         4.2.0  
community.docker  3.3.2  
community.general 6.1.0  

# /home/tgruetz/test/collections/ansible_collections
Collection          Version
------------------- -------
computacenter.utils 1.0.0

Custom facts

The setup module in Ansible automatically discovers a standard set of facts about each host. If you want to add custom values to your facts, you can provide permanent custom facts using the facts.d directory or even write a custom facts module.

Static facts

The easiest method is to add an .ini file to /etc/ansible/facts.d on the remote host, e.g.

/etc/ansible/facts.d/general.fact

[owner]
name=Computacenter AG
community=Ansible Community

[environment]
stage=production

Warning

Ensure the file has the .fact extension and is not executable, this will break the ansible.builtin.setup module!

For example, running an ad-hoc command against an example host with the custom fact:

$ ansible -i inventory test -m ansible.builtin.setup -a filter=ansible_local
ubuntu | SUCCESS => {
     "ansible_facts": {
        "ansible_local": {
            "general": {
                "environment": {
                    "stage": "production"
                },
                "owner": {
                    "community": "Ansible Community",
                    "name": "Computacenter AG"
                }
            }
        },
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false
}

The parent key for the custom fact is the name of the file, the lower keys are the section names of the ini file.

Hint

The key in ansible_facts for custom content is always ansible_local, this has nothing to do with running locally.

Dynamic facts

You can also use facts.d to execute a script on the remote host, generating dynamic custom facts to the ansible_local namespace. Consider the following points when creating dynamic custom facts:

must return JSON data
must have the .fact extension (add the correct Shebang!)
is executable by the Ansible connection user
dependencies must be installed on the remote host

For example, a custom fact returning information about running or exited Docker containers on the remote host can look like this:

/etc/ansible/facts.d/docker-containers.fact

#!/usr/bin/env python3

# DEPENDENCY: requires Python module 'docker', install e.g. with 'pip3 install docker' or install 'python3-docker' rpm with package manager

import json

try:
    import docker
except ModuleNotFoundError:
    print(json.dumps({"error": "Python docker module not found! Install requirements!"}))
    raise SystemExit()

try:
    client = docker.from_env()
except docker.errors.DockerException:
    print(json.dumps({"error": "Docker Client not instantiated! Is Docker running?"}))
    raise SystemExit()

def exited_containers():
    exited_containers = []

    for container in client.containers.list(all=True,filters={"status": "exited"}):
        exited_containers.append({"id": container.short_id, "name": container.name, "image": container.image.tags[0]})

    return exited_containers

def running_containers():
    running_containers = []

    for container in client.containers.list():
        running_containers.append({"id": container.short_id, "name": container.name, "image": container.image.tags[0]})

    return running_containers


def main():

    container_facts = {"running": running_containers(), "exited": exited_containers()}
    print(json.dumps(container_facts))

if __name__ == '__main__':
   main()

The custom fact returns a JSON dictionary with two lists, running and exited. Every list item has the Container ID, name and image.

Warning

Using the fact requires the Python docker module (mind the import docker statement) and the Docker service running on the target node.
Otherwise, an error message is returned, e.g.:

"ansible_local": {
        "docker-containers": {
            "error": "Python docker module not found! Install requirements!"
        }
    }

"ansible_local": {
        "docker-containers": {
            "error": "Docker Client not instantiated! Is Docker running?"
        }
    }

Executing fact gathering for example returns this:

$ ansible -i inventory test -m setup -a filter=ansible_local
ubuntu | SUCCESS => {
    "ansible_facts": {
        "ansible_local": {
            "docker-containers": {
                "exited": [
                    {
                        "id": "a6bfc512b842",
                        "image": "timgrt/rockylinux8-ansible:latest",
                        "name": "rocky-linux"
                    }
                ],
                "running": [
                    {
                        "id": "f3731d560625",
                        "image": "local/timgrt/ansible-best-practices:latest",
                        "name": "ansible-best-practices"
                    }
                ]
            }
        },
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false
}

In the example, we have one running container and one stopped container.

Additional info

Running docker ps on the target host

$ docker ps -a
CONTAINER ID   IMAGE                                 COMMAND                  CREATED             STATUS                           PORTS                  NAMES
a6bfc512b842   timgrt/rockylinux8-ansible:latest     "/usr/lib/systemd/sy…"   About an hour ago   Exited (137) About an hour ago                          rocky-linux
f3731d560625   local/timgrt/ansible-best-practices   "/bin/sh -c 'python …"   4 hours ago         Up 4 hours                       0.0.0.0:8080->80/tcp   ansible-best-practices

Executing the script standalone (using a JSON module for better readability):

$ /etc/ansible/facts.d/docker-containers.fact | python3 -m json.tool
{
    "running": [
        {
            "id": "f3731d560625",
            "name": "ansible-best-practices",
            "image": "local/timgrt/ansible-best-practices:latest"
        }
    ],
    "exited": [
        {
            "id": "a6bfc512b842",
            "name": "rocky-linux",
            "image": "timgrt/rockylinux8-ansible:latest"
        }
    ]
}

Developing modules

Modules are reusable, standalone scripts that can be used by the Ansible API, the ansible command, or the ansible-playbook command. Modules provide a defined interface. Each module accepts arguments and returns information to Ansible by printing a JSON string to stdout before exiting. Modules execute on the target system (usually that means on a remote system) in separate processes. Modules are technically plugins, but for historical reasons we do not usually talk about “module plugins”.

Warning

Work in Progress - More description necessary.

Developing plugins

Plugins extend Ansible’s core functionality and execute on the control node within the /usr/bin/ansible process. Plugins offer options and extensions for the core features of Ansible e.g. transforming data, logging output, connecting to inventory, and more. Take a look into the Ansible Developer Documentation for an overview of the different plugin types.

All plugins must

be written in Python (in a compatible version of Python)
raise errors (when things go wrong)
return strings in unicode (to run through Jinja2)
conform to Ansible’s configuration and documentation standards (how to use your plugin)

Depending on the type of plugin you want to create, different considerations need to be taken, the next subsections give a brief overview with a small example. Always use the latest Ansible documentation for additional information.

Tip

The usage of the FQCN for your Plugin is mandatory!

Filter plugins

Filter plugins manipulate data. They are a feature of Jinja2 and are also available in Jinja2 templates used by the template module. As with all plugins, they can be easily extended, but instead of having a file for each one you can have several per file.

This file may be used as a minimal starting point, it includes a small example:

cc_filter_plugins.py

from __future__ import absolute_import, division, print_function
__metaclass__ = type

from ansible.errors import AnsibleError# (1)!
from ansible.module_utils.common.text.converters import to_native# (2)!

try:
    import netaddr# (3)!
except ImportError as imp_exc:
    NETADDR_IMPORT_ERROR = imp_exc
else:
    NETADDR_IMPORT_ERROR = None


def sort_ip(unsorted_ip_list):# (4)!
    # Function sorts a given list of IP addresses

    if NETADDR_IMPORT_ERROR:
        raise AnsibleError('netaddr library must be installed to use this plugin') from NETADDR_IMPORT_ERROR

    if not isinstance(unsorted_ip_list, list):# (5)!
        raise AnsibleError("Filter needs list input, got '%s'" % type(unsorted_ip_list))
    else:
        try:
            sorted_ip_list = sorted(unsorted_ip_list, key=netaddr.IPAddress)# (6)!
        except netaddr.core.AddrFormatError as e:
            raise AnsibleError('Error from netaddr library, %s' % to_native(e))

    return sorted_ip_list# (7)!


class FilterModule(object): # (8)!

    def filters(self):
        return {
            # Sorting list of IP Addresses
            'sort_ip': sort_ip # (9)!
        }

This is the most generic AnsibleError object, depending on the specific plugin type you’re developing you may want to use different ones.
Use this to convert plugin output to convert output into Python’s unicode type (to_text) or for wrapping other exceptions into error messages (to_native).
This is a non-standard dependency, the user needs to install this beforehand (e.g. pip3 install netaddr --user), therefore surrounding it with try-except. Document necessary requirements!
Example plugin definition, this sorts a given list of IP addresses ( Jinja2 sort filter does not work correctly with IPs), it expects a list.
Testing if input is a list, otherwise return an error message. Maybe another error type (e.g. AnsibleFilterTypeError) is more appropriate? What other exceptions need to be caught?
This line sorts the list with the built-in Python sorted() library, the key specifies the comparison key for each list element, it uses the netaddr library.
The function returns a sorted list of IPs.
Main class, this is called by Ansible's PluginLoader.
Mapping of filter name and definition, you may call your filter like this: "{{ ip_list | sort_ip }}" (this only works when stored in the project root in the folder filter_plugins, otherwise you need to use the FQCN!). Filter name and definition do not need to have the same name. Add more filter definitions by comma-separation.

The Python file needs to be stored in a collection, e.g.:

collections/
└── ansible_collections
    └── computacenter
        └── utils
            ├── README.md
            ├── docs
            ├── galaxy.yml
            ├── plugins
            │   ├── README.md
            │   └── filter
            │       └── cc_filter_plugins.py
            └── roles

Now, the filter can be used:

sorted_ip_list: "{{ ip_list | computacenter.utils.sort_ip }}"

Inventory plugins

Ansible can pull information from different sources, like ServiceNow, Cisco etc. If your source is not covered with the integrated inventory plugins, you can create your own.

For more information take a look at Ansible docs - Developing inventory plugin.

Key things to note

The DOCUMENTATION section is required and used by the plugin. Note how the options here reflect exactly the options we specified in the csv_inventory.yaml file in the previous step.
The NAME should exactly match the name of the plugin everywhere else.
For details on the imports and base classes/helpers take a look at the python code in Github

This file may be used as a minimal starting point, it includes a small example:

cc_cisco_prime.py

from __future__ import absolute_import, division, print_function

__metaclass__ = type

# (1)!
DOCUMENTATION = r'''
    name: cc_cisco_prime
    author:
    - Kevin Blase (@FlachDerPlatte)
    - Jonathan Schmidt (@SchmidtJonathan1)
    short_description: Inventory source for Cisco Prime API.
    description:
    - Builds inventory from Cisco Prime API.
    - Requires a configuration file ending in C(prime.yml) or C(prime.yaml).
        See the example section for more details.
    version_added: 1.0.0
    extends_documentation_fragment:
    - ansible.builtin.constructed
    notes:
    - Nothing
    options:
    plugin:
        description:
        - The name of the Cisco Prime API Inventory Plugin.
        - This should always be C(computacenter.utils.cc_cisco_prime).
        required: true
        type: str
        choices: [ computacenter.utils.cc_cisco_prime ]
'''

# (2)!
EXAMPLES = r'''
    # Inventory File in YAML format
    plugin: computacenter.utils.cc_cisco_prime
    api_user: user123
    api_pass: password123
    api_host_url: host.domain.tld
'''

import requests
from ansible.errors import AnsibleParserError
from ansible.inventory.group import to_safe_group_name
from ansible.plugins.inventory import (
    BaseInventoryPlugin,
    Constructable,
    to_safe_group_name,
)

class InventoryModule(BaseInventoryPlugin, Constructable):

    NAME = 'computacenter.utils.cc_cisco_prime'  # used internally by Ansible, it should match the file name but not required

    def verify_file(self, path): # (3)!
        valid = False
        if super(InventoryModule, self).verify_file(path):
            if path.endswith(('prime.yaml', 'prime.yml')):
                valid = True
            else:
                self.display.vvv(
                    'Skipping due to inventory source not ending in "prime.yaml" nor "prime.yml"')
        return valid

    def add_host(self, hostname, host_vars):
        self.inventory.add_host(hostname, group='all')

        for var_name, var_value in host_vars.items():
            self.inventory.set_variable(hostname, var_name, var_value)

        strict = self.get_option('strict')

        # Add variables created by the user's Jinja2 expressions to the host
        self._set_composite_vars(self.get_option('compose'), host_vars, hostname, strict=True)

        # Create user-defined groups using variables and Jinja2 conditionals
        self._add_host_to_composed_groups(self.get_option('groups'), host_vars, hostname, strict=strict)
        self._add_host_to_keyed_groups(self.get_option('keyed_groups'), host_vars, hostname, strict=strict)
...

Declare option that are needed in the plugin. More about documentation
Example with parameter for a inventory file to run the script.
Different methods like verify_file, parse and more. Additional information about class and function here

The Python file needs to be stored in a collection, e.g.:

collections/
└── ansible_collections
    └── computacenter
        └── utils
            ├── README.md
            ├── plugins
            │   ├── README.md
            │   └── inventory
            │       └── cc_cisco_prime.py
            └── roles

To run this script, create a inventory file with the correct entries, as in the examples section of the inventory script.

# inventory.yml
plugin: computacenter.utils.cc_cisco_prime
api_user: "user123"
api_pass: "password123"
api_host_url: "host.domain.tld"

Run your playbook, referencing the custom inventory plugin file:

ansible-playbook -i inventory.yml main.yml

Monitoring & Troubleshooting

This section describes different methods to monitor or troubleshoot your Ansible playbook runs.
When you need metrics about playbook execution and machine resource consumption, callback plugins can help you drill down into the data and troubleshoot issues.

How long does it take?

To measure the time spent for tasks and the overall playbook run, multiple callback plugins are available. Install the necessary collections which include the desired callback plugins:

ansible-galaxy collection install ansible.posix

The following plugins are available and useful for different purposes.

ansible.posix.timer - Adds total play duration to the play stats.
ansible.posix.profile_tasks - For timing individual tasks and overall execution time.
ansible.posix.profile_roles - Adds timing information to roles.

Tip

To use the callback plugins, they need to be enabled.

For example, to show the start-time and duration for every task, you can use the timer and profile_tasks callback plugin. Add the following block to your ansible.cfg:

[defaults]
callbacks_enabled = ansible.posix.timer, ansible.posix.profile_tasks

Example output

$ ansible-playbook -i inventory.ini create_workshop_environment.yml

PLAY [Create Workshop environment] ****************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************
Saturday 07 September 2024  16:05:19 +0200 (0:00:00.004)       0:00:00.004 ****
ok: [localhost]

TASK [Get package facts] **************************************************************************************************************
Saturday 07 September 2024  16:05:20 +0200 (0:00:00.836)       0:00:00.840 ****
ok: [localhost]

[...cut for readability...]

PLAY RECAP ****************************************************************************************************************************
localhost                  : ok=10   changed=6    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0  
node1                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  
node2                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  
node3                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  

Playbook run took 0 days, 0 hours, 0 minutes, 43 seconds
Saturday 07 September 2024  16:06:03 +0200 (0:00:02.318)       0:00:43.633 ****
===============================================================================
Install SSH daemon ------------------------------------------------------------------------------------------------------------ 25.25s
Start managed node containers, publish 3 ports for each container -------------------------------------------------------------- 3.67s
Gathering Facts ---------------------------------------------------------------------------------------------------------------- 2.92s
Start SSH daemon --------------------------------------------------------------------------------------------------------------- 2.64s
Add public key of workshop SSH keypair to authorized_keys of ansible user ------------------------------------------------------ 2.32s
Remove /run/nologin to be able to login as unprivileged user ------------------------------------------------------------------- 2.20s
Create OpenSSH keypair for accessing managed nodes ----------------------------------------------------------------------------- 1.38s
Get package facts -------------------------------------------------------------------------------------------------------------- 0.84s
Gathering Facts ---------------------------------------------------------------------------------------------------------------- 0.84s
Pull image for managed node containers ----------------------------------------------------------------------------------------- 0.52s
Create workshop inventory file ------------------------------------------------------------------------------------------------- 0.28s
Deploy ansible.cfg to home directory ------------------------------------------------------------------------------------------- 0.19s
Create folder for workshop inventory ------------------------------------------------------------------------------------------- 0.18s
Add block to ssh_config for easy SSH access to managed nodes ------------------------------------------------------------------- 0.17s
Check for existing SSH keypair ------------------------------------------------------------------------------------------------- 0.14s
Install Podman ----------------------------------------------------------------------------------------------------------------- 0.03s
Backup file of .ansible.cfg created -------------------------------------------------------------------------------------------- 0.02s
Check if OpenSSH keypair does not match target configuration ------------------------------------------------------------------- 0.02s
Abort playbook if keypair was found and does not match target configuration ---------------------------------------------------- 0.02s

How much resources are consumed?

To measure system resources used by Ansible, you can use the following callback plugins, both are utilizing cgroups.

community.general.cgroup_memory_recap - profiles maximum memory usage of individual tasks and displays a recap at the end
ansible.posix.cgroup_perf_recap - profiles system activity of Ansible and individual tasks and displays a recap at the end of the playbook execution.

cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, etc) of a collection of processes. You can use the cgroup-tools (for Fedora-based systems the package is called libcgroup-tools) utilities to create a cgroup profile and interact with cgroups.

Warning

Installing cgroup-tools and creating the cgroup-profile requires sudo permissions.

Install the cgroup-tools which contains command-line programs, services and a daemon for manipulating control groups using the libcgroup library.

sudo apt install cgroup-tools

Create a cgroup which includes the CPU Accounting, the memory (RAM) and the PIDs subsystem:

sudo cgcreate -a ${USER}:${USER} -t ${USER}:${USER} -g cpuacct,memory,pids:ansible_profile

Install the necessary collections which include the desired callback plugins:

ansible-galaxy collection install ansible.posix community.general

Tip

To use the callback plugins, they need to be enabled and configured.

Show RAM usage

To show the memory usage for every task, you can use the cgroup_memory_recap callback plugin. Add the following block to your ansible.cfg:

[defaults]
callbacks_enabled = community.general.cgroup_memory_recap

[callback_cgroupmemrecap]
cur_mem_file = /sys/fs/cgroup/memory/ansible_profile/memory.usage_in_bytes
max_mem_file = /sys/fs/cgroup/memory/ansible_profile/memory.max_usage_in_bytes

The cgexec program executes a task command (in our case a playbook run) with arguments in given control groups (in our case the memory group only).

cgexec -g memory:ansible_profile ansible-playbook playbook.yml

Example output

$ cgexec -g memory:ansible_profile ansible-playbook -i inventory.ini create_workshop_environment.yml

PLAY [Create Workshop environment] ******************************************************

TASK [Gathering Facts] ******************************************************************
ok: [localhost]

TASK [Get package facts] ****************************************************************
ok: [localhost]

[...cut for readability...]

PLAY RECAP ******************************************************************************
localhost                  : ok=10   changed=6    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0  
node1                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  
node2                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  
node3                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  


CGROUP MEMORY RECAP *********************************************************************
Execution Maximum: 281.57MB

Gathering Facts (299e2579-3d81-65cc-ccd9-00000000001f): 148.23MB
Get package facts (299e2579-3d81-65cc-ccd9-000000000006): 220.73MB
Install Podman (299e2579-3d81-65cc-ccd9-000000000007): 166.30MB
Pull image for managed node containers (299e2579-3d81-65cc-ccd9-000000000008): 220.42MB
Start managed node containers, publish 3 ports for each container (299e2579-3d81-65cc-ccd9-000000000009): 227.33MB
Create folder for workshop inventory (299e2579-3d81-65cc-ccd9-00000000000a): 190.53MB
Create workshop inventory file (299e2579-3d81-65cc-ccd9-00000000000b): 203.59MB
Add block to ssh_config for easy SSH access to managed nodes (299e2579-3d81-65cc-ccd9-00000000000c): 192.20MB
Deploy ansible.cfg to home directory (299e2579-3d81-65cc-ccd9-00000000000d): 185.89MB
Backup file of .ansible.cfg created (299e2579-3d81-65cc-ccd9-00000000000e): 168.18MB
Check for existing SSH keypair (299e2579-3d81-65cc-ccd9-00000000000f): 191.01MB
Check if OpenSSH keypair does not match target configuration (299e2579-3d81-65cc-ccd9-000000000011): 168.10MB
Abort playbook if keypair was found and does not match target configuration (299e2579-3d81-65cc-ccd9-000000000012): 168.20MB
Create OpenSSH keypair for accessing managed nodes (299e2579-3d81-65cc-ccd9-000000000014): 210.39MB
Gathering Facts (299e2579-3d81-65cc-ccd9-000000000060): 251.42MB
Install SSH daemon (299e2579-3d81-65cc-ccd9-000000000017): 275.68MB
Start SSH daemon (299e2579-3d81-65cc-ccd9-000000000018): 281.44MB
Remove /run/nologin to be able to login as unprivileged user (299e2579-3d81-65cc-ccd9-000000000019): 250.57MB
Add public key of workshop SSH keypair to authorized_keys of ansible user (299e2579-3d81-65cc-ccd9-00000000001a): 273.89MB

Tip

Create an alias for the cgexec... part:

~/.bash_aliases

alias ansible-playbook-profile='cgexec -g memory:ansible_profile ansible-playbook'

First time usage requires source ~/.bash_aliases, now you can run:

ansible-playbook-profile -i inventory playbook.yml

Show RAM, CPU & PIDs usage

To show the memory and CPU usage, as well as forked processes for every task, you can use the cgroup_perf_recap callback plugin. Add the following block to your ansible.cfg:

[defaults]
callbacks_enabled = ansible.posix.cgroup_perf_recap

[callback_cgroup_perf_recap]
control_group = ansible_profile

The cgexec program executes a task command (in our case a playbook run) with arguments in given control groups.

cgexec -g cpuacct,memory,pids:ansible_profile ansible-playbook playbook.yml

Example output

$ cgexec -g cpuacct,memory,pids:ansible_profile ansible-playbook -i inventory.ini create_workshop_environment.yml

PLAY [Create Workshop environment] *****************************************************************************

TASK [Gathering Facts] *****************************************************************************************
ok: [localhost]

TASK [Get package facts] ***************************************************************************************
ok: [localhost]

[...cut for readability...]

PLAY RECAP *****************************************************************************************************
localhost                  : ok=10   changed=6    unreachable=0    failed=0    skipped=4    rescued=0    ignored=0  
node1                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  
node2                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  
node3                      : ok=5    changed=3    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0  


CGROUP PERF RECAP **********************************************************************************************
Memory Execution Maximum: 286.29MB
cpu Execution Maximum: 302.46%
pids Execution Maximum: 43.00

memory:
Gathering Facts (299e2579-3d81-800b-f0f1-00000000001f): 109.20MB
Get package facts (299e2579-3d81-800b-f0f1-000000000006): 182.14MB
Install Podman (299e2579-3d81-800b-f0f1-000000000007): 120.23MB
Pull image for managed node containers (299e2579-3d81-800b-f0f1-000000000008): 216.32MB
Start managed node containers, publish 3 ports for each container (299e2579-3d81-800b-f0f1-000000000009): 224.69MB
Create folder for workshop inventory (299e2579-3d81-800b-f0f1-00000000000a): 159.62MB
Create workshop inventory file (299e2579-3d81-800b-f0f1-00000000000b): 206.01MB
Add block to ssh_config for easy SSH access to managed nodes (299e2579-3d81-800b-f0f1-00000000000c): 162.30MB
Deploy ansible.cfg to home directory (299e2579-3d81-800b-f0f1-00000000000d): 162.27MB
Backup file of .ansible.cfg created (299e2579-3d81-800b-f0f1-00000000000e): 162.33MB
Check for existing SSH keypair (299e2579-3d81-800b-f0f1-00000000000f): 162.94MB
Check if OpenSSH keypair does not match target configuration (299e2579-3d81-800b-f0f1-000000000011): 163.47MB
Abort playbook if keypair was found and does not match target configuration (299e2579-3d81-800b-f0f1-000000000012): 166.45MB
Create OpenSSH keypair for accessing managed nodes (299e2579-3d81-800b-f0f1-000000000014): 216.06MB
Gathering Facts (299e2579-3d81-800b-f0f1-000000000060): 250.53MB
Install SSH daemon (299e2579-3d81-800b-f0f1-000000000017): 271.96MB
Start SSH daemon (299e2579-3d81-800b-f0f1-000000000018): 268.99MB
Remove /run/nologin to be able to login as unprivileged user (299e2579-3d81-800b-f0f1-000000000019): 246.32MB
Add public key of workshop SSH keypair to authorized_keys of ansible user (299e2579-3d81-800b-f0f1-00000000001a): 273.55MB

cpu:
Gathering Facts (299e2579-3d81-800b-f0f1-00000000001f): 92.82%
Get package facts (299e2579-3d81-800b-f0f1-000000000006): 101.37%
Install Podman (299e2579-3d81-800b-f0f1-000000000007): 0.00%
Pull image for managed node containers (299e2579-3d81-800b-f0f1-000000000008): 77.08%
Start managed node containers, publish 3 ports for each container (299e2579-3d81-800b-f0f1-000000000009): 82.08%
Create folder for workshop inventory (299e2579-3d81-800b-f0f1-00000000000a): 0.00%
Create workshop inventory file (299e2579-3d81-800b-f0f1-00000000000b): 101.61%
Add block to ssh_config for easy SSH access to managed nodes (299e2579-3d81-800b-f0f1-00000000000c): 0.00%
Deploy ansible.cfg to home directory (299e2579-3d81-800b-f0f1-00000000000d): 0.00%
Backup file of .ansible.cfg created (299e2579-3d81-800b-f0f1-00000000000e): 0.00%
Check for existing SSH keypair (299e2579-3d81-800b-f0f1-00000000000f): 0.00%
Check if OpenSSH keypair does not match target configuration (299e2579-3d81-800b-f0f1-000000000011): 0.00%
Abort playbook if keypair was found and does not match target configuration (299e2579-3d81-800b-f0f1-000000000012): 0.00%
Create OpenSSH keypair for accessing managed nodes (299e2579-3d81-800b-f0f1-000000000014): 101.40%
Gathering Facts (299e2579-3d81-800b-f0f1-000000000060): 144.79%
Install SSH daemon (299e2579-3d81-800b-f0f1-000000000017): 302.46%
Start SSH daemon (299e2579-3d81-800b-f0f1-000000000018): 245.07%
Remove /run/nologin to be able to login as unprivileged user (299e2579-3d81-800b-f0f1-000000000019): 151.99%
Add public key of workshop SSH keypair to authorized_keys of ansible user (299e2579-3d81-800b-f0f1-00000000001a): 175.70%

pids:
Gathering Facts (299e2579-3d81-800b-f0f1-00000000001f): 9.00
Get package facts (299e2579-3d81-800b-f0f1-000000000006): 9.00
Install Podman (299e2579-3d81-800b-f0f1-000000000007): 8.00
Pull image for managed node containers (299e2579-3d81-800b-f0f1-000000000008): 21.00
Start managed node containers, publish 3 ports for each container (299e2579-3d81-800b-f0f1-000000000009): 22.00
Create folder for workshop inventory (299e2579-3d81-800b-f0f1-00000000000a): 9.00
Create workshop inventory file (299e2579-3d81-800b-f0f1-00000000000b): 11.00
Add block to ssh_config for easy SSH access to managed nodes (299e2579-3d81-800b-f0f1-00000000000c): 8.00
Deploy ansible.cfg to home directory (299e2579-3d81-800b-f0f1-00000000000d): 12.00
Backup file of .ansible.cfg created (299e2579-3d81-800b-f0f1-00000000000e): 9.00
Check for existing SSH keypair (299e2579-3d81-800b-f0f1-00000000000f): 11.00
Check if OpenSSH keypair does not match target configuration (299e2579-3d81-800b-f0f1-000000000011): 11.00
Abort playbook if keypair was found and does not match target configuration (299e2579-3d81-800b-f0f1-000000000012): 14.00
Create OpenSSH keypair for accessing managed nodes (299e2579-3d81-800b-f0f1-000000000014): 17.00
Gathering Facts (299e2579-3d81-800b-f0f1-000000000060): 41.00
Install SSH daemon (299e2579-3d81-800b-f0f1-000000000017): 43.00
Start SSH daemon (299e2579-3d81-800b-f0f1-000000000018): 33.00
Remove /run/nologin to be able to login as unprivileged user (299e2579-3d81-800b-f0f1-000000000019): 29.00
Add public key of workshop SSH keypair to authorized_keys of ansible user (299e2579-3d81-800b-f0f1-00000000001a): 37.00

Tip

Create an alias for the cgexec... part:

~/.bash_aliases

alias ansible-playbook-profile='cgexec -g cpuacct,memory,pids:ansible_profile ansible-playbook'

First time usage requires source ~/.bash_aliases, now you can run:

ansible-playbook-profile -i inventory playbook.yml

Ansible Automation Platform

This topic is split into multiple sections, each section covers a different aspect of using the Ansible Automation Platform.

Credentials

Secret handling in AAP
Workflows

Everything regarding Workflow Job templates

Credentials

Credentials are utilized for authentication when launching Jobs against machines, synchronizing with inventory sources, and importing project content from a version control system.

You can grant users and teams the ability to use these credentials, without actually exposing the credential to the user.

Custom Credentials

Although a growing number of credential types are already available, it is possible to define additional custom credential types that works in ways similar to existing ones.
For example, you could create a custom credential type that injects an API token for a third-party web service into an environment variable, which your playbook or custom inventory script could consume.

For example, to provide login credentials for plugins and modules of the Dell EMC OpenManage Enterprise Collection you need to create a custom credential, as no existing credentials type is available.
You can set the environment variables OME_USERNAME and OME_PASSWORD by creating a new AAP credentials type.

In the left navigation bar, choose Credential Types and click Add, besides the name you need to fill two fields:

Configuration	Description
Input	Which input fields you will make available when creating a credential of this type.
Injector	What your credential type will provide to the playbook

Input Configuration

fields:
  - type: string
    id: username
    label: Username
  - type: string
    id: password
    label: Password
    secret: true
required:
  - username
  - password

Injector Configuration

env:
  OME_USERNAME: "{{ username }}"
  OME_PASSWORD: "{{ password }}"

Warning

You are responsible for avoiding collisions in the extra_vars, env, and file namespaces. Also, avoid environment variable or extra variable names that start with ANSIBLE_ because they are reserved.

Save your credential type, create a new credential of this type and attach it to the Job template with the playbook targeting the OpenManage Enterprise API.

An example task may look like this:

- name: Retrieve basic inventory of all devices
  dellemc.openmanage.ome_device_info:
    hostname: "{{ ansible_host }}"
    username: "{{ lookup('env', 'OME_USERNAME') }}"
    password: "{{ lookup('env', 'OME_PASSWORD') }}"

Tip

Depending on the module used, you may leave out the username and password key, environment variables are evaluated first. Take a look at the module documentation if this is possible, otherwise use the lookup plugin as shown above.

Additional information can be found in the Ansible documentation.

Automation and templating

Creating a custom credential with a playbook can be tricky as you need to provide the special, reserved curly braces character as part of the Injector Configuration.
During the playbook run, Ansible will try to template the values which will fail as they are undefined (and you want the literal string representation anyway). Therefore, prefix the values with !unsafe to prevent templating the values.

- name: Create custom Credential type for DELL OME
  awx.awx.credential_type:
    name: Dell EMC OpenManage Enterprise
    description: Sets environment variables for logging in to OpenManage Enterprise
    inputs:
      fields:
        - id: username
          type: string
          label: Username
        - id: password
          type: string
          label: Password
          secret: true
      required:
        - username
        - password
    injectors:
      env:
        OME_PASSWORD: !unsafe "{{ password }}"
        OME_USERNAME: !unsafe "{{ username }}"

Take a look at Disable variable templating for additional information.

Workflows

Workflows allow you to configure a sequence of disparate job templates (or workflow templates) that may or may not share inventory, playbooks, or permissions.

Variables across workflow steps

Transferring information across workflow steps can't be done by the set_fact module, these facts are only available during a normal playbook run. Workflow job template run separate Jobs targeting separate playbooks.

Possible Use-case

Think of a first workflow step searching for an available IP address in an IPAM tool. The second workflow step can't know this IP before the workflow itself starts, therefore this information needs to be transferred from the first workflow step to the second one.

In addition to the workflow extra_vars, jobs ran as part of a workflow can inherit variables in the artifacts dictionary of a parent job in the workflow. These artifacts can be defined by the set_stats module.

Info

The point of set_stats in workflows is to have a vehicle to pass data via --extra-vars to the next job template.

Setting stats

The first playbook (Job Template) in the workflow run defines a variable in the data dictionary.

Task of playbook in Workflow node 1

- name: Setting stat of free IP address for subsequent workflow step
  ansible.builtin.set_stats:
    data:
      available_ip: "{{ ipam_returned_ip }}"

Bug

Do not use the per_host parameter, it breaks the artifacts gathering!
You can't provide distinct stats per host (without workarounds).

Retrieving stats

The second playbook (Job Template) in the workflow run references the variable of the data dictionary.

Task of playbook in Workflow node 2

- name: Output available IP address from previous workflow step
  ansible.builtin.debug:
    msg: "{{ available_ip }}"

Display custom stats

Custom stats can be displayed at the playbook recap, you must set show_custom_stats in the [defaults] section of your Ansible configuration file:

ansible.cfg

[defaults]
show_custom_stats = true

Defining the environment variable ANSIBLE_SHOW_CUSTOM_STATS and setting to true achieves the same behavior.

Play recap with custom stats

PLAY RECAP *********************************************************************
localhost                  : ok=13    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

CUSTOM STATS: ******************************************************************
    RUN: { "available_ip": 10.28.13.5"}

Ansible Best Practices

Best Practices

Mindset

The Zen of Ansible

Ansible is not Python

Clear, Concise, Simple

Helping users

It's a kind of Magic

Convention over configuration

Declarative

Avoid complexity

Hard to explain !?

Opportunity to automate!

Can't be improved?

Never ending story...

Further Reading

Source

Ansible

Ansible

Installation

Standard install method

Install Collections

List installed collections

Upgrade installed collections

Store collections with your project

Install collections offline

Execution environments

Ansible Builder

Ansible Runner

Ansible Navigator

Project

Version Control

Ansible configuration

Show check mode

Show task path when failed

Configure ansible-galaxy

Dependencies

Collections

Python packages

Directory structure

Filenames

YAML Syntax

Indentation

Booleans

Quoting

Comments

Inventory

Convert INI to YAML

Static inventory

Dynamic inventory

Custom dynamic inventory

In-Memory Inventory

Playbooks

Directory structure

Playbook definition

Plays

Module defaults

Collections in playbooks

Executing playbooks

With Ansible Navigator

Roles

Readme

Role structure

Role skeleton

Tasks

import vs. include

Naming tasks

Tags

Idempotence

command vs. shell module

creates and removes

failed_when and changed_when

Modules (and Collections)

Module parameters

Module defaults

Permissions

State definition

Files vs. Templates

Conditionals

Loops