JSON Formatting and Validation: A Developer's Complete Guide
JSON has become the lingua franca of data exchange on the web. Every REST API, every configuration file, every NoSQL document store speaks it. Yet malformed JSON remains one of the most common causes of silent data corruption, API failures, and hours of debugging. Understanding how to format and validate JSON correctly — at the specification level — saves time and prevents bugs that linters cannot catch.
JSON Syntax Rules per RFC 8259
The JSON specification is defined in RFC 8259, published by the IETF in December 2017 as a replacement for the earlier RFC 7159. The format allows exactly six structural types: objects, arrays, strings, numbers, booleans (true/false), and null.
Several rules trip up even experienced developers:
- Strings must use double quotes. Single quotes (
'key') are invalid JSON. JavaScript accepts them, JSON does not. - No trailing commas.
{"a": 1, "b": 2,}is a syntax error per the spec. The last element in an object or array must not be followed by a comma. - No comments. Neither
//nor/* */are valid. Douglas Crockford deliberately removed comments from the spec to prevent abuse as parsing directives. - Keys must be strings.
{1: "value"}is valid JavaScript but invalid JSON. - No leading zeros in numbers.
007is invalid;7or0.07are valid. - No NaN or Infinity. These JavaScript number values have no JSON representation. Use
nullor a string like"Infinity"as a convention.
Common Validation Errors and How to Fix Them
A 2020 study of GitHub repositories found that JSON parse errors accounted for a significant share of runtime exceptions in JavaScript projects. The most frequent causes:
- Unexpected token at position 0 — The response is not JSON at all. Common when an API returns HTML error pages or when a BOM (Byte Order Mark, U+FEFF) precedes the opening brace.
- Unexpected end of input — Truncated JSON, usually from network timeouts or buffer limits. Always check
Content-Lengthheaders against received bytes. - Duplicate keys — RFC 8259 Section 4 states that object names SHOULD be unique but does not forbid duplicates. Parsers handle them inconsistently: Python's
jsonmodule keeps the last value, while some Java parsers throw exceptions. - Unescaped control characters — Characters U+0000 through U+001F must be escaped in strings. A raw tab character (
\tliteral) inside a JSON string is invalid; it must be written as the escape sequence\\t.
Pretty-Printing vs Minification
Pretty-printing adds whitespace (indentation and line breaks) to make JSON human-readable. Minification strips all non-essential whitespace. Both produce semantically identical data — no information is lost during either transformation.
The practical impact is size. A typical API response minified at 10KB might expand to 14-18KB when pretty-printed with 2-space indentation. For production APIs transferring millions of responses per day, this overhead adds up. Combine minification with Content-Encoding: gzip for optimal transfer sizes — gzip compresses pretty-printed and minified JSON to nearly the same size, making the distinction mostly about readability during development.
Standard indentation conventions: 2 spaces (JavaScript/TypeScript ecosystem, Google style guide), 4 spaces (Python json.dumps(data, indent=4)), or tabs. Most formatters default to 2 spaces.
JSON vs JSONL vs JSON5
JSON has spawned several related formats, each addressing a specific limitation:
- JSONL (JSON Lines) — One valid JSON value per line, separated by
\n. Defined at jsonlines.org. Designed for streaming and log processing where parsing the entire file into memory is impractical. Tools likejq, BigQuery, and Amazon Athena support JSONL natively. Each line is independently parseable, so a corrupt line does not invalidate the entire file. - JSON5 — A superset of JSON defined at json5.org that allows single-line and multiline comments, trailing commas, single-quoted strings, unquoted keys, and hexadecimal numbers. Commonly used in configuration files (Babel, ESLint legacy config). Not suitable for data interchange — most APIs and databases do not accept JSON5.
- JSONC (JSON with Comments) — Used by VS Code for
settings.jsonandtsconfig.json. Allows//and/* */comments but otherwise follows standard JSON syntax. Trailing commas are also permitted in the VS Code dialect.
The Trailing Comma Debate
Trailing commas are the most requested addition to JSON. They reduce diff noise in version control — adding an item to the end of a list changes one line instead of two. JavaScript, Python, Rust, Go, and most modern languages allow trailing commas in their syntax.
Despite this, RFC 8259 explicitly forbids them, and there is no active IETF proposal to change this. The reason is interoperability: JSON is consumed by parsers in hundreds of languages, and changing the grammar would break existing strict parsers. The practical solution is to use JSONC or JSON5 for configuration files where humans edit the data, and standard JSON for data interchange between systems.
Security: JSON Injection and Safe Parsing
JSON injection occurs when untrusted data is concatenated into a JSON string without proper escaping. Consider building JSON via string concatenation:
// DANGEROUS — never do this
const json = '{"name": "' + userInput + '"}';If userInput contains ", "admin": true, "x": ", the resulting string becomes a valid JSON object with an injected admin field. The OWASP Injection Prevention Cheat Sheet recommends always using a proper serializer (JSON.stringify(), json.dumps()) instead of string concatenation.
Additional security considerations:
- Prototype pollution — In JavaScript, parsing JSON with keys like
__proto__orconstructorcan modify object prototypes. Libraries like secure-json-parse filter these keys. - Denial of service — Deeply nested JSON (thousands of levels) can crash recursive parsers via stack overflow. Set a maximum depth limit when parsing untrusted input. Python's
jsonmodule defaults to a recursion limit of 1000. - Number precision — JSON numbers have no defined precision. JavaScript loses precision for integers beyond 253 (Number.MAX_SAFE_INTEGER = 9,007,199,254,740,991). APIs returning large IDs (like Twitter/X snowflake IDs) should send them as strings.
Validating JSON in Practice
Validation goes beyond syntax checking. Schema validation using JSON Schema (current specification: Draft 2020-12) verifies that values conform to expected types, ranges, and patterns. A schema can enforce that an email field matches an email pattern, that an age field is a positive integer, or that required fields are present.
For quick syntax validation and formatting during development, a formatter that highlights error positions, shows line numbers, and supports both minification and pretty-printing eliminates the most common friction points.
Need to validate or format a JSON payload right now? The JSON Formatter on Toolverse parses, validates, pretty-prints, and minifies JSON with instant error highlighting — entirely in the browser, with no data sent to any server.
Frequently Asked Questions
- What is the difference between JSON and JSON5?
- JSON5 extends JSON with features from ECMAScript 5.1: trailing commas, single-quoted strings, comments, unquoted keys, and multiline strings. Standard JSON (RFC 8259) forbids all of these. Most APIs and databases expect strict JSON.
- Does minifying JSON improve performance?
- Minification removes whitespace, reducing file size by 10-30%. However, if the server uses gzip or brotli compression, the difference shrinks to 1-3% because compression algorithms already eliminate repetitive whitespace patterns.
- Can JSON have duplicate keys?
- RFC 8259 recommends unique keys but does not forbid duplicates. Behavior with duplicate keys is implementation-defined — most parsers silently use the last value, which can mask bugs or create security vulnerabilities.