YAML is a human-readable data serialization standard that works with most programming languages and is often used to write configuration files.
Overview
The recursive YAML acronym stands for “YAML Ain’t Markup Language,” which signals that the format is data-oriented rather than a document markup language. It can be used with nearly any application that needs to store or transmit data. Its flexibility is partly due to the fact that YAML borrows ideas from other languages. A few examples include:
- Scalars, lists, and associative arrays are based on Perl.
- The document separator “---” is based on MIME.
- Escape sequences are based on C.
- Whitespace wrapping is based on HTML.
Features of YAML
Delimiter collision resistance
YAML relies on indentation for structure, making it resistant to delimiter collision. Some languages require escape characters or sequences, padded quotation marks, and other workarounds for handling special characters. In YAML's block styles, quotation marks and braces inside a value rarely need escaping, which makes special characters easier to write, particularly in strings.
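As a small illustration of the point above, the sketch below stores text containing quotation marks, braces, and backslashes without any escaping. The key names and values are invented for this example. Note that a plain (unquoted) value still cannot begin with a quotation mark or brace, and cannot contain a colon followed by a space, so the literal block is the safer choice for arbitrary text.

```yaml
# Plain scalar: quotes and braces in the middle of the value need no escaping.
greeting: She said "hello" and left {smiling}

# Literal block scalar: every character is taken as-is, including ": " and "#".
pattern: |
  ^\{"id": "\d+"\}$  # this is part of the string, not a comment
```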
Security
In and of itself, YAML has no executable commands. It is simply a data-representation language. However, its integration with other languages means that some parsers can do more than read data: Perl parsers, for example, can execute Perl code. PyYAML, a parser and emitter for Python, includes documentation specifically warning against this security vulnerability and provides a built-in function, yaml.safe_load, that protects against dangerous Python objects.
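To make the risk concrete, the document below is a sketch of the kind of input the PyYAML warning is about. It uses PyYAML's Python-specific !!python/object/apply tag, which asks the loader to call a Python function while parsing. The command shown is a harmless placeholder chosen for this example. A loader that constructs arbitrary Python objects would run the call, whereas yaml.safe_load rejects the tag with an error.

```yaml
# Untrusted input: the tag below asks a Python loader to call os.system.
# yaml.safe_load refuses to construct it; an unsafe loader would execute it.
!!python/object/apply:os.system ["echo this ran during parsing"]
```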
How YAML Works
Full documentation for YAML can be found on its official site, but outlined below are some simple concepts that are important to understand when starting to use YAML.
- Scalars, or simple values, are defined using a key followed by a colon and a space.
integer: 25
string: "25"
float: 25.0
boolean: true  # YAML 1.1 parsers also read Yes as true; the YAML 1.2 core schema does not
- Associative arrays and lists can be defined using a conventional block format or an inline format that is similar to JSON.
--- # Shopping List in Block Format
- milk
- eggs
- juice
--- # Shopping List in Inline Format
[milk, eggs, juice]
- Multi-line strings can be written with a | character, which preserves newlines, or a > character, which folds newlines into spaces.
--- # Literal style: newlines are preserved
data: |
  Each of these
  Newlines
  Will be broken up
--- # Folded style: newlines become spaces
data: >
  This text is
  wrapped and will
  be formed into
  a single paragraph
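The shopping list above shows only lists. Associative arrays (mappings) follow the same two styles. The sketch below uses invented keys and values to show a mapping in block format, the same mapping in inline format, and a list nested under a mapping key.

```yaml
--- # Mapping in block format
name: milk
quantity: 2
--- # The same mapping in inline format
{name: milk, quantity: 2}
--- # A list nested under a mapping key
shopping:
  - milk
  - eggs
```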
YAML vs. JSON
YAML 1.2 is a superset of JavaScript Object Notation (JSON) but has some built-in advantages. For example, YAML can self-reference, support complex datatypes, embed block literals, and support comments. Overall, YAML also tends to be more readable than JSON. Below you can see the same data shown in JSON and YAML.
JSON version
{
  "json": [
    "rigid",
    "better for data interchange"
  ],
  "yaml": [
    "slim and flexible",
    "better for configuration"
  ],
  "object": {
    "key": "value",
    "array": [
      {
        "null_value": null
      },
      {
        "boolean": true
      },
      {
        "integer": 1
      }
    ]
  },
  "paragraph": "Blank lines denote\nparagraph breaks\n",
  "content": "Or we\ncan auto\nconvert line breaks\nto save space"
}
YAML version
---
# <- yaml supports comments, json does not
# did you know you can embed json in yaml?
# try uncommenting the next line
# { foo: 'bar' }

json:
  - rigid
  - better for data interchange
yaml:
  - slim and flexible
  - better for configuration
object:
  key: value
  array:
    - null_value:
    - boolean: true
    - integer: 1
paragraph: >
  Blank lines denote

  paragraph breaks
content: |-
  Or we
  can auto
  convert line breaks
  to save space
Most of the time, JSON can be converted to YAML and vice versa. Earlier versions of YAML are not entirely compatible with JSON, but most JSON documents can still be parsed using Syck or XS.
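The self-referencing mentioned above is done with anchors (&) and aliases (*), which have no JSON equivalent. The following is a minimal sketch with hypothetical key names: the defaults mapping is defined once and then reused by reference.

```yaml
# "&base" attaches an anchor to the mapping under "defaults".
defaults: &base
  adapter: postgres
  host: localhost

# "*base" is an alias: each key below refers to the same anchored mapping.
development: *base
test: *base
```

When such a document is converted to JSON, the aliased content is typically written out in full at each place it is used.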
Examples of YAML
Red Hat’s Ansible, an open source software provisioning, configuration management, and application deployment tool, is built around YAML. Ansible temporarily connects to servers via Secure Shell (SSH) to perform management tasks using playbooks, which are YAML files that automate manual tasks.
In the example below, the playbook verify-apache.yml has been defined.
---
- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
    - name: ensure apache is at the latest version
      yum:
        name: httpd
        state: latest
    - name: write the apache config file
      template:
        src: /srv/httpd.j2
        dest: /etc/httpd.conf
      notify:
        - restart apache
    - name: ensure apache is running
      service:
        name: httpd
        state: started
  handlers:
    - name: restart apache
      service:
        name: httpd
        state: restarted
This playbook specifies that it should run only on the hosts in the webservers group and that it should run as the remote user, root. There are three tasks in this playbook:
- The first task updates Apache to the latest version using the yum module, which drives Red Hat’s yum package manager.
- The second task uses template to copy over the Apache configuration file. If the file changes, the task notifies the restart apache handler, which restarts the Apache service once the tasks have finished.
- The third task ensures that the Apache service is started.
Now that the playbook has been written, it has to be run from the command line. Although the paths will vary based on the environment, the playbook can be run using this command:
ansible-playbook -i hosts/groups verify-apache.yml
The -i option indicates which inventory file contains the list of servers in the webservers group, which limits the servers the playbook executes on.
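For reference, an inventory can itself be written in YAML. The sketch below assumes a hypothetical file named inventory.yml, which would be passed to the inventory option in place of hosts/groups, and two placeholder hostnames. It defines the webservers group that the playbook targets. Real inventories are often written in INI format instead, and the exact layout depends on the environment.

```yaml
# Hypothetical inventory.yml: the hostnames are placeholders.
all:
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
```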
Key Takeaways
- YAML is a data-oriented language that has features derived from Perl, C, HTML, and other languages.
- YAML is a superset of JSON that comes with built-in advantages such as comments, self-referencing, and support for complex datatypes.
- Multiple software packages have implemented YAML to create powerful configuration management tools such as Red Hat’s Ansible.