Toolverse

JSON vs YAML vs XML vs TOML: Choosing the Right Data Format

9 min read

Every software project eventually faces the same question: what format should I use to store or exchange this data? The answer shapes how readable your configuration is, how safe your deserialization logic is, how much tooling you can rely on, and how much friction your team experiences every time they open the file. JSON, YAML, XML, and TOML each solve the problem differently, with different trade-offs that are worth understanding explicitly rather than deciding by habit.

The Four Contenders

Before diving into the details, a quick orientation:

  • JSON (JavaScript Object Notation) — born in the early 2000s as a leaner alternative to XML for browser-to-server communication. Now the default wire format for REST APIs worldwide.
  • YAML (YAML Ain't Markup Language) — a recursive acronym from 2001, designed for human-readable configuration. Indentation-based, no brackets or quotes required.
  • XML (Extensible Markup Language) — the W3C standard from 1998, designed for structured document interchange. Verbose but powerful, with a rich ecosystem of schemas and transformation tools.
  • TOML (Tom's Obvious, Minimal Language) — created in 2013 by Tom Preston-Werner (GitHub co-founder) as an explicit-typing alternative to YAML for configuration files.

Each has a different design center. JSON optimizes for simplicity and machine parsing. YAML optimizes for human readability. XML optimizes for structure and schema validation. TOML optimizes for configuration clarity with minimal ambiguity.

JSON: The Universal Data Exchange Format

JSON is defined by RFC 8259, published by the IETF in 2017. The specification is short, and that brevity is a deliberate design choice. JSON supports exactly six types: strings, numbers, booleans, null, arrays, and objects. That constraint is what makes it universally interoperable.

Most major languages ship a JSON parser in their standard library (JavaScript's JSON object, Python's json module, Go's encoding/json, Ruby's json). The rest have a widely used package, such as serde_json in Rust or Jackson in Java. Browsers implement JSON.parse() natively in C++. The Node.js, Python, Ruby, Go, Rust, and Java ecosystems all treat JSON as a first-class citizen. This ubiquity means that choosing JSON for an API response or a data file creates near-zero friction for consumers.

JSON's limitations are well-known and largely by design:

  • No comments — Douglas Crockford deliberately excluded them to prevent abuse as parsing directives
  • No trailing commas — appending an item also changes the previous line, which adds noise to version-control diffs
  • No native date type — dates must be strings (typically RFC 3339 format)
  • Number precision limited in practice to IEEE 754 doubles — JavaScript's JSON.parse() and many other parsers store every number as a double, so integers above 2^53 lose precision
  • Verbose for deeply nested structures compared to YAML
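Python's standard json module makes these limitations easy to see. This minimal sketch feeds the parser a comment and a trailing comma, which are both rejected. It then serializes a timestamp as an RFC 3339 string by hand. Finally, it uses the parse_int parameter of json.loads to simulate a parser that stores every number as an IEEE 754 double. Python itself keeps JSON integers exact, so the precision loss shown here is what a JavaScript consumer would see, not Python's default behavior.

```python
import json
from datetime import datetime, timezone

# 1. Comments and trailing commas are syntax errors in strict JSON.
for text in ['{"port": 8080 // default port\n}', '[1, 2, 3,]']:
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        print(f"rejected: {err.msg}")

# 2. There is no date type: serialize timestamps as RFC 3339 strings yourself.
created = datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc)
payload = json.dumps({"created": created.isoformat()})
print(payload)  # {"created": "2024-05-01T12:30:00+00:00"}

# 3. Python keeps integers exact, but a parser that stores every number as an
#    IEEE 754 double cannot tell 2**53 + 1 apart from 2**53.
#    parse_int=float simulates that double-only behavior.
big = 2**53 + 1
as_double = json.loads(json.dumps(big), parse_int=float)
print(big, int(as_double))  # 9007199254740993 9007199254740992
```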

For machine-generated configuration or any data that crosses a network boundary between systems, JSON remains the safest default. When you need to validate or format JSON during development, the JSON Formatter handles both pretty-printing and syntax validation instantly.
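The standard library also covers this check if you want it in a script or CI job. The helper below is a hypothetical sketch, not how the JSON Formatter works internally. It validates with json.loads, reports the line and column that json.JSONDecodeError exposes, and pretty-prints with json.dumps.

```python
import json


def format_json(text: str, indent: int = 2) -> str:
    """Validate `text` as JSON and return a pretty-printed copy.

    Raises ValueError naming the line and column of the first syntax error.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    return json.dumps(data, indent=indent, ensure_ascii=False)


print(format_json('{"name": "example", "tags": ["json", "yaml"]}'))
```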

YAML: Configuration Made Readable

YAML's dominance in configuration files is not accidental. Kubernetes manifests, Docker Compose files, GitHub Actions workflows, Ansible playbooks, and Travis CI configs all use YAML. The appeal is immediate: no brackets, no quotes required for simple strings, support for comments, and indentation that maps naturally to hierarchy.

But YAML carries serious risks that bite developers regularly.

Implicit typing. YAML 1.1 (still the default in many parsers, including PyYAML and Ruby's Psych) infers types from unquoted values. This produces the famous "Norway problem". The string NO is the ISO 3166-1 alpha-2 country code for Norway, but it is parsed as boolean false. YAML 1.1 treats "no", "No", "NO", "off", "Off", "OFF", "false", and "FALSE" (among others) as false values. The on key in GitHub Actions workflows has the same problem. A YAML 1.1 parser such as PyYAML loads it as the boolean true, so scripts that read workflow files see a True key instead of "on". Quoting it as "on": avoids the ambiguity.

YAML 1.2's core schema drops yes/no/on/off and recognizes only true and false (in lowercase, capitalized, or all-caps form) as booleans. However, PyYAML still implements YAML 1.1. Always check which spec version your parser implements.
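The difference between the two spec versions comes down to which unquoted words the resolver treats as booleans. This toy function models only that one step; it is not a YAML parser. The 1.1 pattern follows PyYAML's resolver, which omits the single-letter y/n forms listed in the YAML 1.1 type repository. The 1.2 pattern follows the 1.2 core schema. The name resolve_plain_bool is made up for illustration.

```python
import re
from typing import Optional

# Boolean spellings matched by PyYAML's YAML 1.1 resolver.
YAML_11_BOOL = re.compile(
    r"yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
# YAML 1.2 core schema: only the true/false family.
YAML_12_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")

TRUTHY = {"yes", "true", "on"}


def resolve_plain_bool(scalar: str, version: str = "1.1") -> Optional[bool]:
    """Return the boolean an unquoted scalar resolves to, or None if it stays a string.

    Illustrative only: ignores numbers, nulls, dates, and quoting rules.
    Any version other than "1.1" is treated as the 1.2 core schema.
    """
    pattern = YAML_11_BOOL if version == "1.1" else YAML_12_BOOL
    if pattern.fullmatch(scalar) is None:
        return None
    return scalar.lower() in TRUTHY


for value in ["NO", "no", "on", "False", "Norway"]:
    print(value, resolve_plain_bool(value, "1.1"), resolve_plain_bool(value, "1.2"))
# Prints:
# NO False None
# no False None
# on True None
# False False False
# Norway None None
```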

Deserialization security. YAML supports custom type tags that can instantiate arbitrary objects. In PyYAML versions before 5.1, yaml.load(untrusted_input) without Loader=yaml.SafeLoader could execute arbitrary code. This is not a theoretical risk; it is a documented CVE (CVE-2017-18342). Newer versions still allow it when you pass yaml.Loader or yaml.UnsafeLoader. Always use yaml.safe_load() in Python, or the equivalent safe loader in other languages, when parsing YAML from untrusted sources.

Whitespace sensitivity. A single tab character where a space is expected causes a parse error. Mixing indentation levels silently produces different structure than intended. YAML files are fragile in ways that are difficult to spot in a code review.

Convert between JSON and YAML during development with the JSON ↔ YAML Converter, which handles both directions and displays the output side-by-side.

XML: The Enterprise Veteran

XML was the dominant data format for a decade before JSON arrived. It has been replaced in most new development, but "replaced" is not the same as "gone." In 2026, XML remains the mandatory format in several important domains:

  • Enterprise SOAP services — Banking, insurance, healthcare, and government systems built in the 2000s and 2010s use SOAP/WSDL. These systems are not being rewritten anytime soon.
  • Document formats — Microsoft Office OOXML, OpenDocument (ODF), and EPUB 3 are XML-based ZIP archives. Parsing a .docx file means reading XML.
  • Feed syndication — RSS (Really Simple Syndication) and Atom are XML. Podcast feeds, news aggregators, and many blog platforms still publish XML feeds.
  • SVG graphics — Scalable Vector Graphics is XML. Every SVG file, whether drawn in Inkscape or exported from Figma, is XML under the hood.
  • Android development — Android layout files, manifests, and resource files are XML. Jetpack Compose is moving UI layouts out of XML in newer apps, but manifests and resource files remain XML, and XML remains central to the native Android ecosystem.

XML's genuine strengths are its schema ecosystem and transformation tooling. XML Schema (XSD) provides stronger validation than JSON Schema for complex document structures. XSLT enables structural transformations without custom code. XPath provides powerful document querying. For document-centric workflows where strong schema validation matters, XML's tooling is still unmatched.
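As a small taste of XPath-style querying, this sketch filters a feed with Python's standard xml.etree.ElementTree. Note that ElementTree implements only a limited subset of XPath. Full XPath 1.0, XSLT, and XSD validation need a library such as lxml. The feed is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical, trimmed-down RSS 2.0 feed used only for illustration.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item><title>First post</title><category>json</category></item>
    <item><title>Second post</title><category>yaml</category></item>
    <item><title>Third post</title><category>json</category></item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)

# ElementTree's XPath subset supports paths and simple predicates such as
# [tag='text'], which keeps items whose <category> child equals "json".
json_posts = [
    item.findtext("title")
    for item in root.findall("./channel/item[category='json']")
]
print(json_posts)  # ['First post', 'Third post']
```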

When working with XML documents, the XML Formatter provides indentation, node collapsing, and syntax validation in the browser.

TOML: The Rising Alternative

TOML was created specifically to solve the problems with YAML in configuration files. The TOML v1.0.0 specification states its design goals explicitly: be minimal, have obvious semantics, and map unambiguously to a hash table.

TOML uses explicit typing. A string is always quoted. A number is always unquoted. An array uses square brackets. A boolean is exactly true or false. There is no implicit type inference, so the Norway problem is structurally impossible.

The format has seen rapid adoption in the Rust ecosystem (Cargo.toml is the manifest format for every Rust package) and in Python, where PEP 518 standardized pyproject.toml as the build system configuration format. uv, ruff, black, and hatch all read their configuration from pyproject.toml. Hugo (the static site generator) supports TOML for both its site configuration and front matter.

TOML's limitations are real: it has no anchor/alias mechanism (unlike YAML, which allows repeated blocks to be defined once and referenced multiple times), and deeply nested structures become verbose. It is also younger, with less library support than JSON or YAML in some ecosystems.

Feature Comparison Table

Feature | JSON | YAML | XML | TOML
Comments | No | Yes (#) | Yes (<!-- -->) | Yes (#)
Schemas | JSON Schema | JSON Schema / custom | XSD / DTD / RELAX NG | No standard
Implicit typing | No | Yes (dangerous) | No (all text unless a schema assigns types) | No
Multiline strings | \n escape only | Yes (block scalars "|" and ">") | Yes (text can span lines; CDATA avoids escaping) | Yes (triple quotes)
Anchors/aliases | No | Yes (& and *) | No (IDREF is close) | No
Browser-native parsing | Yes (JSON.parse) | No (library required) | Yes (DOMParser) | No (library required)
Safe to deserialize untrusted input | Yes | Only with a safe loader | Only with XXE protection | Yes
Human readability | Moderate | High | Low (verbose) | High
Verbosity | Low | Low | High | Low to medium

Making the Right Choice

No single format wins in all contexts. The decision framework should start with the use case:

APIs and data interchange between systems: JSON is the clear choice. It is universally supported, has no implicit typing risks, parses natively in browsers, and its simplicity prevents ambiguity. The only exception is if you are integrating with an existing SOAP or XML-based system, where you have no choice.

Configuration files humans edit regularly: TOML if you want explicit typing and clarity, YAML if you need anchors/aliases or your tooling ecosystem has standardized on it (Kubernetes, GitHub Actions). Avoid JSON for configuration files humans touch — the lack of comments is a meaningful ergonomic penalty.

Machine-generated configuration: JSON. It is the simplest to generate programmatically without accidentally triggering YAML's implicit type rules or producing malformed structure.
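Here is a short illustration with Python's json module. The serializer chooses quoting from each value's type, so a country code like "NO" is always emitted as a quoted string and the round trip is exact. The settings dictionary is hypothetical.

```python
import json

# Hypothetical generated settings; "NO" is a country code, not a boolean.
settings = {"country": "NO", "debug": False, "replicas": 3}

# sort_keys and indent give stable, diff-friendly output.
text = json.dumps(settings, indent=2, sort_keys=True)
print(text)

# Reading it back yields exactly the same types and values.
print(json.loads(text) == settings)  # True
```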

Document exchange requiring schema validation: XML, especially for enterprise or regulated domains where XSD schemas are required by the counterparty.

Rust projects: TOML for configuration and manifests (it's the language's native ecosystem format), JSON for API responses.

Python projects: TOML for project configuration (pyproject.toml), YAML for infrastructure and CI, JSON for APIs.

Security posture should also influence the choice. Any application that deserializes user-provided YAML must use a safe loader. Any application that parses user-provided XML must disable external entity expansion (XXE). JSON has neither of these vulnerabilities by design — a meaningful advantage when parsing untrusted input.
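For the XML half of that advice, one strict policy is to refuse any document that declares a DTD. XXE and "billion laughs" attacks both rely on entity declarations inside a DTD. The sketch below enforces that policy with the standard library's expat bindings before handing the bytes to ElementTree. The names parse_untrusted_xml and ForbiddenDTD are hypothetical. Rejecting every DOCTYPE is also stricter than some applications can tolerate.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class ForbiddenDTD(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise ForbiddenDTD(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted_xml(data: bytes) -> ET.Element:
    """Parse XML only after confirming it declares no DTD.

    Without a DTD there are no entity declarations, so external-entity
    and entity-expansion tricks have nothing to work with.
    """
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(data, True)  # raises ForbiddenDTD or expat.ExpatError
    return ET.fromstring(data)


safe = parse_untrusted_xml(b"<user><name>Ada</name></user>")
print(safe.find("name").text)  # Ada

attack = b"""<?xml version="1.0"?>
<!DOCTYPE user [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<user>&xxe;</user>"""
try:
    parse_untrusted_xml(attack)
except ForbiddenDTD as err:
    print("rejected:", err)
```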

The tools for working with these formats are readily available: use the JSON Formatter for validating and pretty-printing JSON, JSON ↔ YAML Converter for migrating between formats, and the XML Formatter for inspecting XML documents — all running client-side in your browser.

Frequently Asked Questions

Which data format should I use for configuration files?
YAML is the most popular choice for configuration (Kubernetes, Docker Compose, GitHub Actions) because it supports comments and is easy to read. TOML is gaining ground for simpler configs (Rust's Cargo.toml, Python's pyproject.toml) because it avoids YAML's implicit typing gotchas. JSON works when the config is machine-generated.
Is XML still relevant in 2026?
Yes, but in specific domains. XML remains dominant in enterprise SOAP services, document formats (OOXML, EPUB), feed syndication (RSS, Atom), vector graphics (SVG), and Android development. New projects generally choose JSON or YAML unless they need XML's schema validation or namespace features.
Why is YAML considered less safe than JSON?
YAML supports custom type tags that can instantiate arbitrary objects during deserialization. In Python, calling yaml.load() with an unsafe loader (the default in PyYAML before 5.1) can execute arbitrary code (CVE-2017-18342). JSON parsing is inherently safer because the format supports only plain data (strings, numbers, booleans, null, arrays, and objects) and has no code execution mechanism.