URL encoding is one of those topics that feels obvious until the moment it breaks something in production. A form submission that works perfectly in English silently corrupts Arabic names. An API call that passes tests fails when a password contains an ampersand. A redirect that functions in development breaks on the live server because an environment variable wasn't encoded. These failures are entirely preventable — but only if you understand the rules that govern how characters are represented in URLs, and why those rules exist in the first place.
Why URL Encoding Exists
URLs are defined by RFC 3986, which specifies that they may only contain a restricted set of ASCII characters. The characters that are always safe in a URL — called unreserved characters — are the 26 uppercase letters, 26 lowercase letters, 10 digits, and four symbols: hyphen, underscore, period, and tilde. Everything else must be encoded if it needs to appear in a URL.
Percent-encoding works by replacing an unsafe character with a percent sign followed by two hexadecimal digits representing the character's UTF-8 byte value. A space becomes %20. An at-sign becomes %40. A forward slash in a query value becomes %2F. The encoding is called "percent-encoding" rather than "URL encoding" in the RFC, though both terms are used interchangeably in practice.
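These substitutions are easy to verify in any JavaScript runtime, since encodeURIComponent() applies exactly this byte-wise scheme:

```javascript
// Each unsafe character becomes % followed by the hex value
// of its UTF-8 byte:
console.log(encodeURIComponent(' '));  // "%20"
console.log(encodeURIComponent('@'));  // "%40"
console.log(encodeURIComponent('/'));  // "%2F"
```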
The reason for the restriction is historical. Early internet protocols — email, FTP, early HTTP — were designed to transport ASCII text through systems that could not reliably handle arbitrary binary data or control characters. URL syntax was constrained to the same safe ASCII subset to ensure it could be reliably copied, parsed, and transmitted across all these systems without corruption.
Modern systems can handle Unicode natively, but URLs must still conform to the ASCII-only specification for compatibility. The solution is to encode Unicode characters using their UTF-8 byte sequences, each byte percent-encoded. The Chinese character 中 has the UTF-8 byte sequence E4 B8 AD, so it encodes to %E4%B8%AD in a URL. This is why international domain names (IDN) require Punycode encoding for the hostname, while path and query components use percent-encoding.
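The byte-by-byte correspondence can be seen directly: TextEncoder produces the UTF-8 bytes that percent-encoding then spells out in hex.

```javascript
// 中 is U+4E2D; its UTF-8 encoding is the three bytes E4 B8 AD.
console.log(encodeURIComponent('中')); // "%E4%B8%AD"

// The same bytes, shown explicitly:
const bytes = new TextEncoder().encode('中');
console.log([...bytes].map(b => b.toString(16).toUpperCase()));
// ["E4", "B8", "AD"]
```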
Reserved vs Unreserved Characters
RFC 3986 divides ASCII characters into three categories for URL purposes. Unreserved characters are always safe and should never be encoded: A–Z, a–z, 0–9, hyphen (-), underscore (_), period (.), tilde (~). Reserved characters have structural meaning in URLs and must be encoded when they appear as data rather than as URL delimiters. Everything else is simply not allowed and must always be encoded.
The reserved characters are further divided into two groups. The "gen-delims" are the major structural delimiters of a URL: colon (:), forward slash (/), question mark (?), hash (#), left bracket ([), right bracket (]), and at sign (@). The "sub-delims" are secondary delimiters within URL components: exclamation (!), dollar ($), ampersand (&), single quote ('), left paren ((), right paren ()), asterisk (*), plus (+), comma (,), semicolon (;), and equals (=).
Whether a reserved character needs to be encoded depends on where in the URL it appears. A forward slash is a valid path delimiter and must not be encoded in the path structure, but must be encoded as %2F if it appears as a value within a path segment. An ampersand is a query string delimiter and must be encoded as %26 if it appears in a query parameter value. Understanding this distinction is the single most important rule for correctly encoding URLs.
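This context rule can be sketched in a few lines; the project name and search query below are hypothetical data values, not anything from a real API:

```javascript
// A slash as a path *delimiter* stays literal; a slash as *data*
// inside a path segment must become %2F.
const repo = 'group/subgroup'; // hypothetical name containing a slash
const path = '/projects/' + encodeURIComponent(repo);
console.log(path); // "/projects/group%2Fsubgroup"

// An ampersand as a query *delimiter* stays literal; as data it
// must become %26.
const q = 'tom & jerry';
console.log('?q=' + encodeURIComponent(q) + '&page=1');
// "?q=tom%20%26%20jerry&page=1"
```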
Query String Encoding Specifics
Query strings have their own encoding rules that are slightly different from the rest of the URL. The query string begins after the ? character and contains key-value pairs separated by &, with keys and values separated by =. Within a query string, & and = are structural delimiters and must be encoded (%26, %3D) when they appear as data. A literal ? is technically permitted inside the query by RFC 3986 once the query has begun, but encoding it as %3F is common practice and avoids confusing naive parsers.
The space character has a special case in query strings. HTML forms traditionally encode spaces as plus signs (+) in the application/x-www-form-urlencoded format, which is the default encoding for form submissions. This is a legacy convention that predates RFC 3986. Modern APIs typically use percent-encoding (%20) for spaces, which is unambiguous. When receiving query strings, treat a bare plus sign as ambiguous: under form encoding it means a space, while under strict RFC 3986 it is a literal plus. A literal plus that must survive form decoding has to be sent as %2B. The correct interpretation depends on the encoding convention the sender used.
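The two conventions are visible in JavaScript itself: URLSearchParams implements the form-encoding rules, while decodeURIComponent implements RFC 3986.

```javascript
// URLSearchParams follows form encoding: "+" decodes to a space.
const form = new URLSearchParams('name=Ada+Lovelace');
console.log(form.get('name')); // "Ada Lovelace"

// decodeURIComponent follows RFC 3986: "+" is just a plus sign.
console.log(decodeURIComponent('Ada+Lovelace')); // "Ada+Lovelace"

// A literal plus survives form decoding only if sent as %2B.
console.log(new URLSearchParams('expr=1%2B1').get('expr')); // "1+1"
```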
Nested or complex values (arrays, objects, JSON) require careful handling. There is no universal standard for encoding an array as a query parameter. Different frameworks use different conventions: ids[]=1&ids[]=2 (PHP/Laravel), ids=1&ids=2 (repeated keys), or ids=1,2 (comma-separated). If you're building an API that accepts complex query parameters, document your convention explicitly and ensure your encoder and decoder use the same one.
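As a sketch of two of these conventions (the `ids` parameter name is just the illustrative one from above), URLSearchParams handles the repeated-key style natively:

```javascript
// Repeated-key convention:
const params = new URLSearchParams();
params.append('ids', '1');
params.append('ids', '2');
console.log(params.toString());    // "ids=1&ids=2"
console.log(params.getAll('ids')); // ["1", "2"]

// Comma-separated convention: the comma arrives percent-encoded,
// so decode first, then split.
const csv = new URLSearchParams('ids=1%2C2');
console.log(csv.get('ids').split(',')); // ["1", "2"]
```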
JavaScript's URL Encoding Functions
JavaScript provides four URL-related encoding/decoding functions, and choosing the wrong one is a common source of bugs. Understanding when each applies is essential.
encodeURI() encodes a complete URL while preserving the characters that delimit URL structure. It leaves unreserved characters and most reserved characters untouched (; / ? : @ & = + $ , #), although it does encode the percent sign itself and square brackets, which can mangle already-encoded input and IPv6 host literals. Use it when you have a complete URL string and want to ensure it's safely encoded without breaking its structure. Do not use it for encoding individual parameter values: it won't encode ampersands, equals signs, or plus signs that have structural meaning in query strings.
encodeURIComponent() encodes everything except unreserved characters. It encodes all reserved characters, making it safe for encoding values that will be inserted into a URL component. This is the correct function for encoding query parameter keys and values, path segments that may contain slashes, and any user-provided data going into a URL. It encodes spaces as %20, not as plus signs.
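The difference is easiest to see side by side on a value containing a space and an ampersand:

```javascript
const url = 'https://example.com/search?q=cats & dogs';

// encodeURI preserves URL structure, but that means it also leaves
// the *data* ampersand alone, silently corrupting the query string:
console.log(encodeURI(url));
// "https://example.com/search?q=cats%20&%20dogs"

// encodeURIComponent encodes everything except unreserved characters,
// which is correct for the value but wrong for a whole URL:
console.log(encodeURIComponent('cats & dogs'));
// "cats%20%26%20dogs"
```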
decodeURI() reverses encodeURI(), leaving percent-encoded sequences for characters that have structural URL meaning. decodeURIComponent() decodes all percent-encoded sequences. The matching decode function should always be used with the corresponding encode function to avoid double-decoding or incorrect interpretation of structural characters.
For more complex scenarios — building URL objects programmatically, handling base URLs and relative paths — the URL and URLSearchParams APIs provide a higher-level, object-oriented interface that handles encoding automatically and correctly. new URL('https://example.com') creates a URL object whose searchParams property is a URLSearchParams instance. Calling searchParams.set('key', 'value with spaces & symbols') and then reading url.toString() produces a correctly encoded URL without any manual encoding work.
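The example from the paragraph above, written out; note that URLSearchParams serializes spaces as + (the form-encoding convention discussed earlier), which is an equally valid encoding inside a query string:

```javascript
const url = new URL('https://example.com/search');
url.searchParams.set('q', 'value with spaces & symbols');
url.searchParams.set('page', '2');

// Encoding is handled automatically on serialization:
console.log(url.toString());
// "https://example.com/search?q=value+with+spaces+%26+symbols&page=2"
```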
Path Segment Encoding
Path segments — the slash-delimited parts of the URL path — have their own encoding requirements. A path segment may contain any unreserved characters and most sub-delimiters without encoding. It must encode: forward slashes (when they appear as data, not delimiters), question marks (to prevent premature query string parsing), hash signs (to prevent premature fragment parsing), and any character outside the allowed set, including spaces, percent signs, and all non-ASCII characters.
A common misconception is that path segments don't need encoding if they only contain "normal" characters. The problem arises with user-generated content used as path segments — usernames, blog slugs, product names. A username containing a slash will break routing. A product name containing a hash will truncate the URL at the browser level. A blog title containing Unicode characters will work in modern browsers but may fail in older tools or systems that don't handle Unicode paths correctly.
The safest approach for dynamic path segments is to generate clean slugs — lowercase ASCII, hyphens instead of spaces, no special characters — rather than relying on encoding to make arbitrary strings safe as paths. Our URL Slug Generator tool automates this. For path segments that must include arbitrary user data, encode them with encodeURIComponent() and document this in your URL design.
Security Implications of Incorrect Encoding
Improper URL encoding creates several classes of security vulnerability beyond simple breakage. Double encoding — encoding a string that is already encoded — can be used to bypass security filters. A filter that rejects ../ (a path traversal sequence) may not reject %2E%2E%2F or %252E%252E%252F if it decodes inconsistently. Normalise and decode URLs exactly once, on input, and validate after decoding.
Open redirect vulnerabilities often exploit malformed URL encoding. An endpoint that redirects to a URL from a query parameter, without strict validation, can be tricked into redirecting to an attacker-controlled domain by encoding the colon and slashes of a full URL in a way the validator misses. Always validate redirect destinations against an allowlist of known-safe origins, regardless of encoding.
URL injection in log files is a lower-severity but real concern. If URLs containing newline characters (%0A, %0D) reach your access logs unvalidated, an attacker can inject fake log entries, potentially obscuring real attacks. Strip or encode control characters from all URLs before logging.
Common Mistakes and Their Fixes
Double-encoding is the most frequent mistake. It happens when a value is already percent-encoded and then passed through an encoding function again, turning %20 into %2520. The fix is to decode before encoding if you're unsure of the input's current state, or to use a URL object API that manages encoding state automatically.
Forgetting to encode special characters in OAuth and authentication flows causes silent failures that are difficult to diagnose. Redirect URIs, client secrets, and state parameters often contain characters like +, =, and / that must be encoded in query strings. Always use encodeURIComponent() for any value being inserted into a query string, regardless of its apparent simplicity.
Not encoding user input used in URL construction is both a correctness issue and a security concern. Any string received from user input, a database, or an external API should be treated as potentially containing characters that need encoding. Our free URL Encoder tool makes it trivial to inspect and encode any string, providing both the encoded output and a character-by-character breakdown of what was changed and why.