JSON vs YAML: Syntax, Trade-Offs, and the Norway Problem
A new developer joins your team and opens a Kubernetes deployment file for the first time. It is 200 lines of YAML with nested maps, multi-line strings, and anchor references. They ask: "Why don't we just use JSON?" The answer involves trade-offs between human readability, machine parseability, and the sharp edges hiding in both formats.
JSON and YAML: Same Data, Different Syntax
JSON (JavaScript Object Notation, RFC 8259) and YAML (YAML Ain't Markup Language, spec 1.2.2) both represent the same data structures: mappings (objects), sequences (arrays), and scalar values (strings, numbers, booleans, null). In fact, every valid JSON document is also valid YAML — the YAML 1.2 spec was deliberately designed as a superset of JSON.
The difference is syntax. JSON uses braces {}, brackets [], and mandatory double-quoted keys. YAML uses indentation and colons, with optional quoting. A simple configuration in JSON:
{
  "server": {
    "port": 8080,
    "host": "0.0.0.0",
    "ssl": true
  }
}
The same in YAML:
server:
  port: 8080
  host: 0.0.0.0
  ssl: true
Four lines instead of seven. No braces, no quotes, no commas. For configuration files that humans read and edit daily, YAML is significantly more scannable.
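The equivalence claim can be checked with a short Python sketch that loads both versions of the config and compares the results. It assumes PyYAML (the third-party yaml package) is installed; the constant and variable names are illustrative only.

```python
import json

import yaml  # PyYAML, a third-party package (assumed to be installed)

JSON_TEXT = """
{
  "server": {
    "port": 8080,
    "host": "0.0.0.0",
    "ssl": true
  }
}
"""

YAML_TEXT = """
server:
  port: 8080
  host: 0.0.0.0
  ssl: true
"""

from_json = json.loads(JSON_TEXT)          # parsed by the JSON parser
from_yaml = yaml.safe_load(YAML_TEXT)      # parsed by the YAML parser
json_via_yaml = yaml.safe_load(JSON_TEXT)  # JSON text fed to the YAML parser

# All three results are the same nested dictionary.
print(from_json == from_yaml == json_via_yaml)  # True
```

Note that the unquoted host stays a string, because 0.0.0.0 does not match any numeric pattern.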
Where YAML Wins: Configuration and DevOps
YAML dominates the DevOps ecosystem. Kubernetes, Docker Compose, Ansible, GitHub Actions, CircleCI, Helm, and Prometheus all use YAML as their primary configuration format. The reasons are practical:
- Comments: YAML supports # comments. JSON does not (the grammar in RFC 8259 has no comment syntax). For configuration files that need inline documentation, this alone is often decisive.
- Multi-line strings: YAML's | (literal block) and > (folded block) scalars let you embed scripts, certificates, or SQL queries without escape-character gymnastics.
- Anchors and aliases: YAML's & and * syntax lets you define a value once and reference it elsewhere, reducing duplication in large configs.
- Less visual noise: no trailing commas to debug, no bracket matching, fewer characters per line of meaningful data.
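The features in this list can be seen together in one small document. The sketch below is illustrative rather than a recommended layout: it assumes PyYAML is installed, and the keys (defaults, service_a, script, summary) and the helper json_accepts are made up for the example.

```python
import json

import yaml  # PyYAML, a third-party package (assumed to be installed)

CONFIG = """\
# Comments are ignored by the parser.
# '&defaults' anchors the mapping below; '*defaults' reuses it.
defaults: &defaults
  retries: 3
  timeout: 30
service_a: *defaults
# '|' keeps line breaks, '>' folds them into spaces.
script: |
  echo "step one"
  echo "step two"
summary: >
  folded lines are
  joined with spaces
"""

config = yaml.safe_load(CONFIG)
print(config["service_a"])      # {'retries': 3, 'timeout': 30}
print(repr(config["script"]))   # 'echo "step one"\necho "step two"\n'
print(repr(config["summary"]))  # 'folded lines are joined with spaces\n'


def json_accepts(text: str) -> bool:
    """Hypothetical helper: report whether the JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(json_accepts('{"port": 8080}'))                # True
print(json_accepts('{"port": 8080} # listen port'))  # False: no comments
print(json_accepts('{"port": 8080,}'))               # False: trailing comma
```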
Where JSON Wins: APIs and Data Exchange
JSON is the default serialization format for REST APIs, with over 83% of public APIs using it as their primary response format (Postman State of APIs report, 2023). JSON's strengths:
- Unambiguous parsing: JSON has one way to represent each value. YAML 1.1 has many: yes, Yes, YES, true, and on all parse as boolean true. This implicit type coercion has caused real bugs, including the infamous "Norway problem", where the country code NO is parsed as boolean false.
- Speed: JSON's grammar is much smaller, so its parsers are simpler and faster. Python's json.loads() is typically many times faster than PyYAML's safe_load() on equivalent documents; the exact ratio depends on the document and on whether PyYAML's C-accelerated loader is in use.
- Universal support: most modern languages ship a JSON parser in their standard library, and mature libraries exist for the rest. YAML requires a third-party library in most languages.
- No indentation sensitivity: a misplaced space in YAML can silently change the data structure. JSON's braces make nesting explicit.
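Two of these points, implicit booleans and indentation sensitivity, are easy to reproduce. The sketch below assumes PyYAML is installed; its default resolver follows YAML 1.1 typing rules, so a parser that defaults to YAML 1.2 may behave differently.

```python
import json

import yaml  # PyYAML (third-party); its default resolver follows YAML 1.1

# Unquoted YAML 1.1 boolean spellings all collapse to True or False.
yaml_values = yaml.safe_load("[yes, Yes, YES, true, on, no, NO, off]")
print(yaml_values)
# [True, True, True, True, True, False, False, False]

# JSON has exactly one spelling per boolean; anything else must be a string.
json_values = json.loads('["yes", "NO", true, false]')
print(json_values)
# ['yes', 'NO', True, False]

# Two spaces of indentation decide whether "ssl" belongs to "server".
nested = yaml.safe_load("server:\n  port: 8080\n  ssl: true\n")
flat = yaml.safe_load("server:\n  port: 8080\nssl: true\n")
print(nested)  # {'server': {'port': 8080, 'ssl': True}}
print(flat)    # {'server': {'port': 8080}, 'ssl': True}
```

Both YAML documents in the second half are valid, which is why the parser cannot warn about the difference.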
YAML Gotchas That Bite in Production
YAML's flexibility creates a category of bugs that do not exist in JSON:
- Implicit typing: version: 3.10 parses as the float 3.1, not the string "3.10". Python 3.10 users discovered this in CI configs. Always quote version strings: version: "3.10".
- The Norway problem: under YAML 1.1 rules, the country code NO is parsed as boolean false, and time-like strings such as 22:22 are parsed as sexagesimal (base-60) integers, so 22:22 becomes 1342. YAML 1.2 tightened the rules, but many parsers still default to 1.1 behavior.
- Indentation errors: tabs are not allowed for indentation in YAML (spec section 6.1). Mixed tabs and spaces cause parser errors whose cause is invisible in some editors.
- Security (YAML deserialization): YAML supports custom type tags that can trigger code execution. Python's yaml.load() without Loader=SafeLoader is a known remote code execution vector when fed untrusted input (CVE-2017-18342). Always use safe loading functions such as yaml.safe_load().
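The implicit-typing gotchas, and the quoting fix, can be sketched as follows. This assumes PyYAML, which applies YAML 1.1 rules by default; the keys version, country, and start are hypothetical.

```python
import yaml  # PyYAML (third-party); applies YAML 1.1 implicit typing

UNQUOTED = """
version: 3.10
country: NO
start: 22:22
"""

QUOTED = """
version: "3.10"
country: "NO"
start: "22:22"
"""

unquoted = yaml.safe_load(UNQUOTED)
quoted = yaml.safe_load(QUOTED)

# Float, boolean, and base-60 integer (22 * 60 + 22) instead of strings.
print(unquoted)  # {'version': 3.1, 'country': False, 'start': 1342}

# Quoting forces every value to stay a string.
print(quoted)    # {'version': '3.10', 'country': 'NO', 'start': '22:22'}
```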
Choosing the Right Format
The choice is context-dependent:
- API request/response bodies → JSON. It is the universal standard and faster to parse.
- Configuration files edited by humans → YAML. Comments and readability justify the complexity.
- Data interchange between services → JSON. Unambiguous parsing prevents silent type coercion bugs.
- Complex templating (Helm, Ansible) → YAML with anchors, but consider a JSON schema for validation.
Many teams use both: YAML for source configuration files and JSON as the wire format. Kubernetes kubectl accepts both and can convert between them with -o json / -o yaml.
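Outside kubectl, the same conversion takes only a few lines of Python. This is a minimal sketch, assuming PyYAML is installed; yaml_to_json and json_to_yaml are hypothetical helper names, not part of any library.

```python
import json

import yaml  # PyYAML, a third-party package (assumed to be installed)


def yaml_to_json(yaml_text: str) -> str:
    """Parse YAML source config and emit it as JSON for the wire."""
    data = yaml.safe_load(yaml_text)
    return json.dumps(data, indent=2)


def json_to_yaml(json_text: str) -> str:
    """Parse JSON and emit block-style YAML for human editing."""
    data = json.loads(json_text)
    return yaml.safe_dump(data, default_flow_style=False)


if __name__ == "__main__":
    source = "server:\n  port: 8080\n  ssl: true\n"
    print(yaml_to_json(source))
    print(json_to_yaml('{"server": {"port": 8080, "ssl": true}}'))
```

The conversion is lossy in one direction: comments and anchors disappear once the YAML is parsed, and YAML values with no JSON counterpart (for example, unquoted dates that PyYAML turns into date objects) make json.dumps raise a TypeError.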
Key Takeaways
- YAML is a superset of JSON — every JSON file is valid YAML, but not vice versa.
- Use JSON for APIs and machine-to-machine data; use YAML for human-edited configuration files.
- Always quote strings in YAML that could be misinterpreted as numbers, booleans, or null ("3.10", "NO", "null").
- Never use yaml.load() without a safe loader; it is a remote code execution risk.
- Validate YAML configs with JSON Schema to catch structural errors before deployment.
Frequently Asked Questions
- Is every JSON file valid YAML?
- Yes. YAML 1.2 was designed as a strict superset of JSON. Any valid JSON document can be parsed by a YAML 1.2 compliant parser. The reverse is not true — YAML features like comments, anchors, and unquoted strings have no JSON equivalent.
- What is the Norway problem in YAML?
- In YAML 1.1, the string 'NO' (the ISO country code for Norway) is parsed as boolean false because YAML treats 'no', 'NO', and 'No' as false values. This caused real bugs in internationalization configs. YAML 1.2 restricts booleans to only 'true' and 'false', but many parsers still default to 1.1 rules.
- Why is YAML considered less safe than JSON?
- YAML supports custom type tags that can instantiate arbitrary objects during deserialization. In Python, calling yaml.load() without SafeLoader on untrusted input can execute arbitrary code (CVE-2017-18342). JSON parsing is inherently safer because the format can only express plain data: objects, arrays, strings, numbers, booleans, and null.
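As a rough illustration of safe loading, the sketch below shows PyYAML's safe_load() refusing a document that carries a Python-specific tag. It assumes PyYAML is installed; load_untrusted is a hypothetical wrapper, and the payload names a harmless function (os.getcwd) that is never executed here.

```python
import yaml  # PyYAML, a third-party package (assumed to be installed)

# A document that asks the loader to call a Python function.
# Only an unsafe loader would honor this tag.
UNTRUSTED = "!!python/object/apply:os.getcwd []"


def load_untrusted(text: str):
    """Hypothetical wrapper: parse YAML from an untrusted source."""
    try:
        # safe_load builds only plain types (dict, list, str, int, ...).
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        raise ValueError(f"rejected YAML document: {exc}") from exc


if __name__ == "__main__":
    print(load_untrusted("port: 8080"))  # {'port': 8080}
    try:
        load_untrusted(UNTRUSTED)
    except ValueError as err:
        print("refused:", err)
```

The safe loader has no constructor registered for python/ tags, so the tagged document fails with a ConstructorError instead of running code.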