URL Encoding Explained: Percent-Encoding, RFC 3986, and Common Mistakes
URLs can contain only a limited subset of ASCII characters. Spaces, accented letters, emoji, and even common punctuation like & and = (when used as data rather than as delimiters) must be transformed before they can appear in a web address. This transformation, called percent-encoding, is defined by a precise set of rules in RFC 3986. Understanding how URL encoding works prevents a class of bugs that cause broken links, failed API calls, and security vulnerabilities.
Percent-Encoding: The Core Mechanism
Percent-encoding replaces each unsafe byte with a percent sign followed by two hexadecimal digits representing the byte's value. A space (ASCII 0x20) becomes %20. A forward slash (ASCII 0x2F) becomes %2F. The percent sign itself (ASCII 0x25) becomes %25.
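The rule above can be sketched in a few lines of JavaScript. The helper name percentEncodeByte is hypothetical, introduced only for illustration; it shows the byte-to-%XX mapping and nothing more.

```javascript
// Percent-encode a single byte value (0-255) as "%" plus two uppercase hex digits.
// percentEncodeByte is a hypothetical helper name, not a built-in.
function percentEncodeByte(byte) {
  if (!Number.isInteger(byte) || byte < 0 || byte > 255) {
    throw new RangeError("expected an integer in the range 0-255");
  }
  return "%" + byte.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeByte(0x20)); // "%20" (space)
console.log(percentEncodeByte(0x2f)); // "%2F" (forward slash)
console.log(percentEncodeByte(0x25)); // "%25" (percent sign)
```

The padStart call matters for byte values below 0x10, which would otherwise produce a single hex digit.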
RFC 3986 Section 2.3 defines the unreserved characters that never require encoding:
ALPHA = A-Z / a-z
DIGIT = 0-9
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
Everything else, including characters that look harmless like !, ', (, ), and *, is technically subject to encoding depending on context. RFC 3986 Section 2.2 lists the reserved characters that serve as URI delimiters:
gen-delims = : / ? # [ ] @
sub-delims = ! $ & ' ( ) * + , ; =
Reserved characters must be encoded when they appear in a URI component where they do not serve their reserved purpose. For example, = is a delimiter in query strings (key=value) but must be encoded as %3D if it appears inside a key or value.
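As a small illustration of a reserved character appearing as data, the sketch below builds one key=value pair whose value itself contains = and +. The key and value are made-up examples, not taken from any real API.

```javascript
// Hypothetical key and value, chosen only for illustration.
const key = "expr";
const value = "1+1=2";

// Encode key and value separately, then join them with the structural "=".
const pair = encodeURIComponent(key) + "=" + encodeURIComponent(value);
console.log(pair); // "expr=1%2B1%3D2"

// Only one literal "=" remains, so splitting on it is unambiguous
// and the value round-trips intact.
const [rawKey, rawValue] = pair.split("=");
console.log(decodeURIComponent(rawKey));   // "expr"
console.log(decodeURIComponent(rawValue)); // "1+1=2"
```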
encodeURI vs encodeURIComponent
JavaScript provides two encoding functions with different scopes, and confusing them is one of the most common URL-related bugs:
- encodeURI(): Encodes a complete URI. It preserves reserved characters that have structural meaning in a full URL: : / ? # @ ! $ & ' ( ) * + , ; =. Square brackets are the exception: encodeURI() escapes [ and ] as %5B and %5D. Use this only when encoding an entire URL where the structure (scheme, host, path, query) is already valid.
- encodeURIComponent(): Encodes a single URI component (a path segment, a query parameter key, or a query parameter value). It encodes the reserved characters that act as delimiters, including /, ?, &, and =, and leaves only ! ' ( ) * unescaped alongside the unreserved set. This is the correct choice for embedding user input into a URL.
A concrete example demonstrates the difference. Given a search query of cats & dogs:
encodeURI("https://example.com/search?q=cats & dogs")
// "https://example.com/search?q=cats%20&%20dogs"
// WRONG — the & is preserved, creating a second parameter
encodeURIComponent("cats & dogs")
// "cats%20%26%20dogs"
// CORRECT — the & is encoded, safe to use as a parameter value
// Proper usage:
"https://example.com/search?q=" + encodeURIComponent("cats & dogs")
// "https://example.com/search?q=cats%20%26%20dogs"
Other languages have their own equivalents: urllib.parse.quote() and urllib.parse.quote_plus() in Python, url.QueryEscape() in Go, and URLEncoder.encode() in Java. Of these, quote_plus(), url.QueryEscape(), and URLEncoder.encode() use + for spaces, following the application/x-www-form-urlencoded convention that originated with HTML forms and is now specified in the WHATWG URL Standard.
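One detail worth knowing: encodeURIComponent() leaves the characters ! ' ( ) * unescaped, even though RFC 3986 lists them as reserved sub-delims. When a stricter result is wanted, a thin wrapper can escape those five as well. The following is a minimal sketch; the function name encodeRfc3986Component is hypothetical.

```javascript
// Stricter variant of encodeURIComponent that also escapes ! ' ( ) *.
// encodeRfc3986Component is a hypothetical name, not a built-in.
function encodeRfc3986Component(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

const sample = "it's (fine)!*";
console.log(encodeURIComponent(sample));     // "it's%20(fine)!*"
console.log(encodeRfc3986Component(sample)); // "it%27s%20%28fine%29%21%2A"
```

Both outputs decode back to the same string with decodeURIComponent(), so the stricter form is safe wherever the looser one is accepted.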
UTF-8 Multibyte Characters in URLs
When a URL contains non-ASCII characters (accented letters, CJK characters, emoji), each character is first encoded as UTF-8 bytes, and then each byte is percent-encoded individually. RFC 3986 Section 2.5 recommends UTF-8 for textual data in new URI schemes.
The German word straße (street) contains ß (U+00DF), which encodes as two UTF-8 bytes: 0xC3 0x9F. The percent-encoded form is stra%C3%9Fe. A Japanese character like 日 (U+65E5) requires three UTF-8 bytes: 0xE6 0x97 0xA5, becoming %E6%97%A5. An emoji like the face with tears of joy (U+1F602) takes four bytes: %F0%9F%98%82.
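The two-step process (UTF-8 bytes first, then one %XX per byte) can be reproduced with the standard TextEncoder API. This is a sketch for inspection only: the hypothetical helper percentEncodeUtf8 encodes every byte, including ASCII letters, which a real encoder would leave alone.

```javascript
// Percent-encode every UTF-8 byte of a string, exempting nothing.
// percentEncodeUtf8 is a hypothetical helper name used for illustration.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeUtf8("\u00df"));    // "%C3%9F"       (ß, two bytes)
console.log(percentEncodeUtf8("\u65e5"));    // "%E6%97%A5"    (日, three bytes)
console.log(percentEncodeUtf8("\u{1F602}")); // "%F0%9F%98%82" (four bytes)

// The built-in encoder applies the same steps but leaves unreserved ASCII alone.
console.log(encodeURIComponent("stra\u00dfe")); // "stra%C3%9Fe"
```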
Modern browsers display Internationalized Resource Identifiers (IRIs) in their decoded Unicode form in the address bar while transmitting the percent-encoded version over HTTP. This means https://example.com/caf%C3%A9 displays as https://example.com/café in the browser but uses the encoded form in the actual HTTP request.
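The same split between the displayed form and the transmitted form is visible through the WHATWG URL API, available in browsers and in Node.js. A short sketch, using the example URL from the paragraph above:

```javascript
// Parsing a URL written with a raw non-ASCII character ("\u00e9" is é).
const url = new URL("https://example.com/caf\u00e9");

// The parser stores and serializes the percent-encoded form.
console.log(url.pathname); // "/caf%C3%A9"
console.log(url.href);     // "https://example.com/caf%C3%A9"

// Decoding recovers the readable form a browser would display.
console.log(decodeURIComponent(url.pathname)); // "/café"
```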
Common Mistakes: Double Encoding and Beyond
Double encoding happens when an already-encoded string is encoded again. The space character becomes %20 after the first encoding. If encoded a second time, the % in %20 becomes %25, producing %2520. The server receives %2520, decodes it once to %20, and treats the literal text %20 as the value instead of a space.
This typically occurs when:
- A framework or library automatically encodes URL parameters, and the developer also encodes them manually before passing them in
- A URL is stored in encoded form in a database, then encoded again when constructing a redirect
- Middleware processes a URL and re-encodes it without checking whether encoding has already been applied
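Double encoding is easy to reproduce in isolation. The sketch below encodes the same value twice and shows that a single decode no longer recovers the original text:

```javascript
const raw = "cats & dogs";

const once = encodeURIComponent(raw);   // correct: encoded a single time
const twice = encodeURIComponent(once); // bug: each "%" became "%25"

console.log(once);  // "cats%20%26%20dogs"
console.log(twice); // "cats%2520%2526%2520dogs"

// A server that decodes once sees literal "%20" text instead of spaces.
console.log(decodeURIComponent(twice)); // "cats%20%26%20dogs"
console.log(decodeURIComponent(once));  // "cats & dogs"
```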
Other frequent mistakes:
- Using + for spaces in paths: The +-for-space convention applies only to application/x-www-form-urlencoded data (HTML form submissions). In URL paths, + is a literal plus sign. Use %20 for spaces in paths.
- Not encoding # in query values: An unencoded # terminates the query string and begins a fragment identifier. If a search query contains C#, the # must be encoded as %23 or the server receives only C.
- Encoding the entire URL when only a component needs it: Using encodeURIComponent() on a full URL encodes the :// and path separators, breaking the URL structure.
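Each of these three mistakes can be observed with built-in functions. The sketch below uses made-up example URLs and relies only on the standard URL, URLSearchParams, encodeURIComponent, and decodeURIComponent APIs.

```javascript
// 1. "+" is a literal plus in a path, but means a space in form-encoded query data.
console.log(decodeURIComponent("/a+b"));            // "/a+b"
console.log(new URLSearchParams("q=a+b").get("q")); // "a b"

// 2. A raw "#" ends the query and starts the fragment.
const broken = new URL("https://example.com/search?q=C#");
console.log(broken.search); // "?q=C"

const fixed = new URL(
  "https://example.com/search?q=" + encodeURIComponent("C#")
);
console.log(fixed.search);                // "?q=C%23"
console.log(fixed.searchParams.get("q")); // "C#"

// 3. encodeURIComponent on a whole URL destroys its structure.
console.log(encodeURIComponent("https://example.com/a b"));
// "https%3A%2F%2Fexample.com%2Fa%20b"
```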
Query String Best Practices
Query strings follow the ?key=value&key2=value2 format. While RFC 3986 defines the query component syntax, the key-value pair convention comes from HTML form submission (WHATWG HTML Standard).
- Encode both keys and values. Keys can contain special characters too. A parameter like user[name]=John requires encoding the brackets: user%5Bname%5D=John.
- Use URLSearchParams in JavaScript. It handles encoding automatically: new URLSearchParams({q: "cats & dogs", page: "2"}).toString() produces q=cats+%26+dogs&page=2. Note that URLSearchParams uses + for spaces (the form-encoding convention), which is correct for query strings.
- Maximum URL length. While RFC 3986 imposes no limit, browsers and servers enforce practical limits. Chrome supports URLs up to approximately 2MB, but many servers (Apache, Nginx) default to limits of about 8KB. For large payloads, use POST requests instead of encoding data in the URL.
- Consistent ordering for caching. The query strings ?a=1&b=2 and ?b=2&a=1 are usually treated as equivalent by the application but may be cached separately by CDNs and proxies. Sort parameters alphabetically when constructing URLs for cache-sensitive endpoints.
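The key-encoding and ordering advice above can be combined in a short sketch built on URLSearchParams. The helper withSortedQuery and the example.com URL are hypothetical, shown only to illustrate one way of producing a stable parameter order.

```javascript
// URLSearchParams encodes keys as well as values.
const params = new URLSearchParams({
  "user[name]": "John",
  q: "cats & dogs",
  page: "2",
});
console.log(params.toString());
// "user%5Bname%5D=John&q=cats+%26+dogs&page=2"

// sort() orders parameters by name, giving cache-friendly, stable output.
params.sort();
console.log(params.toString());
// "page=2&q=cats+%26+dogs&user%5Bname%5D=John"

// Hypothetical helper: return a URL with its query parameters sorted by name.
function withSortedQuery(urlString) {
  const url = new URL(urlString);
  url.searchParams.sort();
  return url.href;
}

console.log(withSortedQuery("https://example.com/items?b=2&a=1"));
// "https://example.com/items?a=1&b=2"
```

Sorting in one shared helper, rather than at each call site, keeps the ordering rule in a single place.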
Need to encode or decode a URL right now? The URL Encoder/Decoder handles percent-encoding, component-level encoding, and full URL parsing — entirely in the browser with no data leaving the machine.
Frequently Asked Questions
- What is the difference between encodeURI and encodeURIComponent?
- encodeURI encodes a full URL but preserves characters that have meaning in URLs (: / ? # @ etc.). encodeURIComponent encodes everything except the unreserved characters (A-Z a-z 0-9 - _ . ~) and the five marks ! ' ( ) *. Use encodeURIComponent for query parameter values and encodeURI for complete URLs.
- Why does my URL break when it contains a # character?
- The # character marks the start of a URL fragment. Everything after it is not sent to the server. If # appears in a query parameter value, it must be encoded as %23 to be transmitted correctly.
- What causes double-encoding bugs?
- Double encoding happens when an already-encoded string is encoded again, turning %20 into %2520. This typically occurs when URL-encoding a value before passing it to a library that encodes it again. Always encode at the boundary, once.