URL encoding, formally known as percent encoding, is the process of converting characters into a format that can be safely transmitted within a Uniform Resource Locator (URL). URLs can only contain a limited set of characters from the US-ASCII character set, and certain characters within that set have special reserved meanings. Any character that falls outside the permitted set -- or a reserved character used outside its special purpose -- must be encoded before it can appear in a URL.
The encoding mechanism is simple: each unsafe character is replaced by a percent sign (%) followed by exactly two hexadecimal digits that represent the character's byte value. For example, a space character (ASCII value 32, hexadecimal 20) is encoded as %20. An ampersand (&, ASCII value 38, hexadecimal 26) is encoded as %26.
URL encoding exists because of a fundamental design constraint. When Tim Berners-Lee designed the URL syntax in the early 1990s, URLs needed to be compact, universally transmittable, and parseable by software. Characters like spaces, angle brackets, curly braces, and non-ASCII characters could cause ambiguity or break parsers. Reserved characters like ?, &, =, and # serve as delimiters within the URL structure -- using them as literal data without encoding would confuse parsers about where one component ends and another begins.
Consider a search query like price < $50 & color = blue. If you placed this directly in a URL query string, the & would be misinterpreted as a parameter separator, the = as a key-value delimiter, the < as potentially dangerous input, the $ as a special character, and the spaces would truncate the URL in many contexts. URL encoding transforms this into price%20%3C%20%2450%20%26%20color%20%3D%20blue, making every character unambiguous.
Today, URL encoding is a fundamental building block of the web. Every browser, web server, HTTP client library, and API framework implements URL encoding. Every time you submit a form, click a link with query parameters, or call a REST API, URL encoding is at work behind the scenes ensuring that your data arrives intact and unambiguous.
Percent encoding is conceptually straightforward, but the details matter. Understanding the algorithm helps you debug encoding issues and choose the right encoding function for your use case.
For ASCII characters, the encoding process works as follows:
1. Take the character's ASCII value (a byte from 0 to 127).
2. Convert that value to two hexadecimal digits.
3. Prefix the digits with a percent sign, producing %HH.
Here are some common examples:
Character ASCII Value Hex Encoded
Space 32 20 %20
! 33 21 %21
# 35 23 %23
$ 36 24 %24
& 38 26 %26
+ 43 2B %2B
/ 47 2F %2F
: 58 3A %3A
= 61 3D %3D
? 63 3F %3F
@ 64 40 %40
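The steps above can be sketched as a short Python function (an illustration only -- in real code, prefer urllib.parse.quote):

```python
def percent_encode(text: str, safe: str = "") -> str:
    """Percent-encode every byte that is not unreserved or listed in `safe`."""
    unreserved = set(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789-._~"
    )
    out = []
    for byte in text.encode("utf-8"):      # UTF-8 first, then encode each byte
        ch = chr(byte)
        if ch in unreserved or ch in safe:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")     # uppercase hex, per RFC 3986
    return "".join(out)

print(percent_encode("hello world & goodbye"))  # hello%20world%20%26%20goodbye
```

Because the function works byte by byte over the UTF-8 encoding, it also handles the non-ASCII cases covered in the next section.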
For characters outside the ASCII range (code points above 127), the character is first encoded into its UTF-8 byte sequence, and then each byte is individually percent-encoded. This is the approach mandated by modern standards and is sometimes called IRI-to-URI conversion.
For example, the euro sign (€, U+20AC) has the UTF-8 byte sequence E2 82 AC, so it is encoded as %E2%82%AC.
Character Code Point UTF-8 Bytes Encoded
ü (u-umlaut) U+00FC C3 BC %C3%BC
€ (euro) U+20AC E2 82 AC %E2%82%AC
ß (sharp s) U+00DF C3 9F %C3%9F
你 (Chinese) U+4F60 E4 BD A0 %E4%BD%A0
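These byte sequences fall directly out of Python's str.encode; a quick sketch reproduces the table:

```python
# Reproduce the table: UTF-8 bytes of each character, then percent-encode them.
for ch in ("ü", "€", "ß", "你"):
    utf8 = ch.encode("utf-8")
    encoded = "".join(f"%{b:02X}" for b in utf8)
    print(f"{ch}  U+{ord(ch):04X}  {utf8.hex(' ').upper()}  {encoded}")
```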
URL decoding (also called percent decoding) is the reverse process. A decoder scans the string for percent signs. When it finds one, it reads the next two hexadecimal characters, converts them to a byte value, and replaces the three-character sequence (%HH) with the corresponding byte. After all percent sequences are decoded, the resulting byte sequence is interpreted as UTF-8 text.
Decoders must also handle the + sign as a space character when processing application/x-www-form-urlencoded data, though this is specific to form data and not part of the general percent-encoding specification.
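The decoding loop described above can be sketched in a few lines of Python (a simplified illustration -- it does not validate hex digits; prefer urllib.parse.unquote in real code):

```python
def percent_decode(s: str, plus_as_space: bool = False) -> str:
    """Decode %HH triplets, then interpret the resulting bytes as UTF-8."""
    if plus_as_space:                      # + means space only in form data
        s = s.replace("+", " ")
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] == "%" and i + 2 < len(s):
            out.append(int(s[i + 1:i + 3], 16))  # two hex digits -> one byte
            i += 3
        else:
            out.extend(s[i].encode("utf-8"))
            i += 1
    return out.decode("utf-8")

print(percent_decode("hello%20world%20%26%20goodbye"))  # hello world & goodbye
```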
RFC 3986, published in January 2005, is the current definitive standard for Uniform Resource Identifier (URI) syntax. It supersedes RFC 2396 and is the specification that governs how URLs are constructed, parsed, and resolved. Understanding RFC 3986 is essential for anyone who works with URLs programmatically.
RFC 3986 defines the generic URI syntax with the following components:
scheme://authority/path?query#fragment
https://user:pass@example.com:8080/search?q=hello&lang=en#results
\___/ \________________________/\_____/ \_____________/ \_____/
| | | | |
scheme authority path query fragment
Each component has its own rules about which characters are allowed literally and which must be percent-encoded. A character that is valid in one component may need encoding in another. For instance, @ is a delimiter in the authority component but can appear unencoded in a query string.
RFC 3986 defines the allowed characters using an ABNF (Augmented Backus-Naur Form) grammar. The key productions are:
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
hier-part = "//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
authority = [ userinfo "@" ] host [ ":" port ]
query = *( pchar / "/" / "?" )
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
This grammar is precise: it tells you exactly which characters are allowed in each position without encoding.
RFC 3986 also defines URI normalization -- the process of transforming a URI into a canonical form for comparison. Key normalization rules include:
- Case normalization: the scheme and host are lowercased, and hexadecimal digits in percent-encoded triplets are uppercased (%2F, not %2f).
- Percent-encoding normalization: unnecessarily encoded unreserved characters are decoded. For example, %41 (letter A) should be normalized to A.
- Path segment normalization: dot segments (. and ..) should be resolved. For example, /a/b/../c becomes /a/c.
RFC 3986 divides characters into three categories: unreserved, reserved, and all other characters. Understanding these categories is critical for knowing when to encode and when not to.
Unreserved characters can appear in any part of a URI without being percent-encoded. In fact, RFC 3986 states that unreserved characters should not be encoded, and if they are encoded, they must be decoded during normalization.
Unreserved = A-Z a-z 0-9 - . _ ~
Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
Digits: 0 1 2 3 4 5 6 7 8 9
Others: - (hyphen) . (period) _ (underscore) ~ (tilde)
These 66 characters are safe everywhere in a URL. You never need to worry about encoding them.
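You can confirm this with any standard encoder; for example, Python's quote leaves unreserved characters untouched even when nothing is marked safe:

```python
from urllib.parse import quote

# Unreserved characters pass through even with safe="".
print(quote("AZaz09-._~", safe=""))  # AZaz09-._~
```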
Reserved characters have special meaning within the URI syntax. They act as delimiters that separate the scheme, authority, path, query, and fragment components. Whether a reserved character needs encoding depends on the context.
General delimiters (gen-delims): : / ? # [ ] @
Sub-delimiters (sub-delims): ! $ & ' ( ) * + , ; =
When serving their delimiter purpose, reserved characters must NOT be encoded. For example, the ? that separates the path from the query string must remain as a literal ?.
When used as data within a component, reserved characters MUST be encoded. For example, if a query parameter value contains an &, it must be encoded as %26 to avoid being interpreted as a parameter separator.
// Correct: & as delimiter between parameters
https://example.com/search?q=hello&lang=en
// Correct: & as literal data within a parameter value
https://example.com/search?company=AT%26T
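Building the AT%26T example above in Python (quote with safe="" encodes every reserved character in the value, so the literal & becomes %26):

```python
from urllib.parse import quote

company = "AT&T"
url = "https://example.com/search?company=" + quote(company, safe="")
print(url)  # https://example.com/search?company=AT%26T
```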
Any character that is neither unreserved nor reserved must always be percent-encoded when it appears in a URI. This includes:
- The space character (encoded as %20)
- Control characters
- The characters { } | \ ^ ` < > "
- All non-ASCII characters
Query parameters are the most common context where developers need to perform URL encoding. Understanding how query strings work -- and the difference between the URI specification and form encoding -- is essential for building correct URLs.
A query string begins with a ? and consists of key-value pairs separated by &. Each key is separated from its value by =:
https://example.com/search?q=url+encoding&page=2&sort=date
Key-Value Pairs:
q = url encoding
page = 2
sort = date
There are two encoding standards for query parameters, and they differ in how they handle spaces:
RFC 3986 (URI encoding): Spaces are encoded as %20. This is the general-purpose URI encoding defined by the URI specification. It applies to all parts of the URI.
application/x-www-form-urlencoded (form encoding): Spaces are encoded as +. This format is defined by the HTML specification (originally from the CGI specification) and is used specifically when browsers submit HTML forms with method="GET" or method="POST" with the default content type.
Original: hello world & goodbye
RFC 3986: hello%20world%20%26%20goodbye
Form-encoded: hello+world+%26+goodbye
Most web servers and frameworks accept both %20 and + as spaces when parsing query strings. However, outside of query strings (such as in the path component), + is a literal plus sign, not a space.
A critical mistake is to apply URL encoding to an entire URL rather than to individual parameter values. If you encode an entire URL, you will encode the structural characters (://, /, ?, =, &) that are needed for the URL to function:
// WRONG: Encoding the entire URL
https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dhello%26page%3D1
// This is now a broken, unusable URL
// CORRECT: Encode only the parameter values
https://example.com/search?q=hello%20world&page=1
Always build your URL by encoding individual values and then assembling them with the proper delimiters.
A common scenario is passing a URL as a query parameter value -- for example, a redirect URL or a callback URL. The entire nested URL must be encoded:
// Original callback URL:
https://myapp.com/callback?status=ok
// Nested as a query parameter value:
https://auth.example.com/login?redirect=https%3A%2F%2Fmyapp.com%2Fcallback%3Fstatus%3Dok
The nested URL's ://, /, ?, and = must all be encoded so they are not interpreted as part of the outer URL's structure.
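A sketch of nesting a callback URL in Python, using the hypothetical URLs from the example above:

```python
from urllib.parse import quote

callback = "https://myapp.com/callback?status=ok"
# safe="" ensures ':' '/' '?' '=' inside the nested URL are all encoded
login_url = "https://auth.example.com/login?redirect=" + quote(callback, safe="")
print(login_url)
# https://auth.example.com/login?redirect=https%3A%2F%2Fmyapp.com%2Fcallback%3Fstatus%3Dok
```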
The web is global, and URLs frequently contain non-ASCII characters -- accented letters in European languages, ideographs in Chinese, Japanese, and Korean, Arabic and Cyrillic scripts, and even emoji. URL encoding handles all of these through UTF-8.
RFC 3987 defines Internationalized Resource Identifiers (IRIs), which extend URIs to allow Unicode characters directly. An IRI is converted to a URI by percent-encoding all non-ASCII characters using their UTF-8 byte sequences. Modern browsers display IRIs in the address bar for readability but transmit the percent-encoded URI form over HTTP.
When a non-ASCII character needs to appear in a URL, the standard process is:
Character: cafe with accent (café)
é = U+00E9
UTF-8: C3 A9
URL: caf%C3%A9
Character: Tokyo in Japanese (東京)
東 = U+6771 -> UTF-8: E6 9D B1 -> %E6%9D%B1
京 = U+4EAC -> UTF-8: E4 BA AC -> %E4%BA%AC
URL: %E6%9D%B1%E4%BA%AC
Character: Smiley emoji (😀)
😀 = U+1F600 -> UTF-8: F0 9F 98 80 -> %F0%9F%98%80
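These conversions can be reproduced with Python's urllib, which applies the UTF-8-then-percent-encode rule automatically:

```python
from urllib.parse import quote, unquote

print(quote("café"))   # caf%C3%A9
print(quote("東京"))    # %E6%9D%B1%E4%BA%AC
print(quote("😀"))     # %F0%9F%98%80

# Decoding reverses both steps: percent-decode, then interpret as UTF-8.
print(unquote("%E6%9D%B1%E4%BA%AC"))  # 東京
```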
Domain names use a different encoding system called Punycode (defined in RFC 3492) rather than percent encoding. An internationalized domain name is converted to an ASCII-compatible encoding (ACE) with an xn-- prefix:
Unicode domain: münchen.de
Punycode (ACE): xn--mnchen-3ya.de
Unicode domain: 例え.jp
Punycode (ACE): xn--r8jz45g.jp
Modern browsers display the Unicode form in the address bar but resolve the Punycode form through DNS. This distinction is important: domain names use Punycode, while path and query components use percent-encoded UTF-8.
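Python ships a built-in "idna" codec (implementing IDNA 2003) that performs this Punycode conversion, which is enough to reproduce the example above:

```python
# ToASCII: Unicode labels become xn-- ACE labels; ToUnicode reverses it.
print("münchen.de".encode("idna"))          # b'xn--mnchen-3ya.de'
print(b"xn--mnchen-3ya.de".decode("idna"))  # münchen.de
```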
Before UTF-8 became the standard, web pages used various character encodings (ISO-8859-1, Windows-1252, Shift_JIS, EUC-KR, etc.), and form data was encoded using the page's character encoding. This led to ambiguity -- the same percent-encoded sequence could represent different characters depending on the assumed encoding. Today, UTF-8 is the universal standard for URL encoding, and modern specifications explicitly require UTF-8. If you encounter legacy systems that use other encodings, convert to UTF-8 at the boundary.
Every major programming language provides built-in functions for URL encoding and decoding. However, the functions differ in subtle but important ways. Choosing the right function is critical for correct behavior.
// encodeURIComponent -- use for encoding parameter values
encodeURIComponent("hello world & goodbye")
// "hello%20world%20%26%20goodbye"
// decodeURIComponent -- decode a single component
decodeURIComponent("hello%20world%20%26%20goodbye")
// "hello world & goodbye"
// encodeURI -- use for encoding a full URI (preserves : / ? # & =)
encodeURI("https://example.com/path?q=hello world")
// "https://example.com/path?q=hello%20world"
// decodeURI -- decode a full URI
decodeURI("https://example.com/path?q=hello%20world")
// "https://example.com/path?q=hello world"
// URLSearchParams -- handles form encoding automatically
const params = new URLSearchParams();
params.set("q", "hello world & goodbye");
params.toString();
// "q=hello+world+%26+goodbye" (uses + for spaces)
Key distinction: Use encodeURIComponent() for individual query parameter keys and values. Use encodeURI() only when you have a full URL that just needs non-ASCII characters or spaces encoded. Never use escape() -- it is deprecated and does not handle UTF-8 correctly.
from urllib.parse import quote, unquote, urlencode, quote_plus
# quote -- RFC 3986 encoding (spaces become %20)
quote("hello world & goodbye")
# "hello%20world%20%26%20goodbye"
# quote with safe parameter -- preserve certain characters
quote("hello/world", safe="/")
# "hello/world"
# quote_plus -- form encoding (spaces become +)
quote_plus("hello world & goodbye")
# "hello+world+%26+goodbye"
# unquote -- decode percent-encoded strings
unquote("hello%20world%20%26%20goodbye")
# "hello world & goodbye"
# urlencode -- encode a dictionary of parameters
urlencode({"q": "hello world", "page": "1"})
# "q=hello+world&page=1"
import "net/url"
// url.QueryEscape -- form encoding for query parameters
url.QueryEscape("hello world & goodbye")
// "hello+world+%26+goodbye"
// url.PathEscape -- RFC 3986 encoding for path segments
url.PathEscape("hello world & goodbye")
// "hello%20world%20&%20goodbye"
// url.QueryUnescape -- decode query-encoded strings
url.QueryUnescape("hello+world+%26+goodbye")
// "hello world & goodbye"
// url.Values -- build query strings safely
v := url.Values{}
v.Set("q", "hello world")
v.Set("page", "1")
v.Encode()
// "page=1&q=hello+world"
import java.net.URLEncoder;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
// URLEncoder.encode -- form encoding (spaces become +)
URLEncoder.encode("hello world & goodbye", StandardCharsets.UTF_8);
// "hello+world+%26+goodbye"
// URLDecoder.decode -- decode form-encoded strings
URLDecoder.decode("hello+world+%26+goodbye", StandardCharsets.UTF_8);
// "hello world & goodbye"
// For RFC 3986 encoding, use java.net.URI
new URI("https", "example.com", "/path", "q=hello world", null).toASCIIString();
// "https://example.com/path?q=hello%20world"
// urlencode -- form encoding (spaces become +)
urlencode("hello world & goodbye");
// "hello+world+%26+goodbye"
// rawurlencode -- RFC 3986 encoding (spaces become %20)
rawurlencode("hello world & goodbye");
// "hello%20world+%26%20goodbye"
// urldecode / rawurldecode -- corresponding decoders
urldecode("hello+world+%26+goodbye");
// "hello world & goodbye"
// http_build_query -- build query strings from arrays
http_build_query(["q" => "hello world", "page" => "1"]);
// "q=hello+world&page=1"
URL encoding seems simple, but subtle mistakes can cause bugs that are difficult to diagnose. Here are the most common pitfalls developers encounter.
Double encoding occurs when data that has already been percent-encoded is encoded again. The percent sign (%) itself gets encoded as %25, turning the already-encoded sequences into garbled output:
Original: hello world
First encoding: hello%20world (correct)
Double encoding: hello%2520world (broken! %25 = %, so this decodes to hello%20world)
This typically happens when:
- Application code encodes a value, and a framework or HTTP library then encodes it again automatically
- Already-encoded data is read from storage or an incoming request and re-encoded on output
Prevention: Encode data exactly once, at the point where you construct the URL. Never encode data "just in case" -- know whether your framework or library handles encoding automatically.
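The failure mode is easy to reproduce with urllib (a sketch):

```python
from urllib.parse import quote, unquote

once = quote("hello world")   # hello%20world  (correct)
twice = quote(once)           # hello%2520world (the % itself was re-encoded)
print(once, twice)

# One round of decoding no longer recovers the original string:
print(unquote(twice))         # hello%20world
```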
As discussed in the query parameters section, encoding an entire URL destroys its structure. Always build URLs by encoding individual components and assembling them:
// WRONG
const url = encodeURIComponent(`https://example.com/search?q=${query}`);
// CORRECT
const url = `https://example.com/search?q=${encodeURIComponent(query)}`;
In JavaScript, using encodeURI() to encode a query parameter value will fail to encode critical characters like &, =, and +, because encodeURI() assumes these are structural delimiters:
const value = "a=1&b=2";
// WRONG: encodeURI doesn't encode & and =
encodeURI(value); // "a=1&b=2" (unchanged! breaks the query string)
// CORRECT: encodeURIComponent encodes everything
encodeURIComponent(value); // "a%3D1%26b%3D2" (safe as a parameter value)
The + character means a space in application/x-www-form-urlencoded context (query strings from form submissions), but it is a literal + in other URL components. This causes problems when:
- A + appears in a path segment and is expected to decode as a space (it will be interpreted literally, not as a space)
- A value contains a literal + in a query string (you must encode it as %2B)
- A strict RFC 3986 decoder processes form-encoded data (the RFC uses %20 for spaces, not +)
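Python exposes the two interpretations as separate functions, which makes the difference easy to see:

```python
from urllib.parse import unquote, unquote_plus

# unquote follows RFC 3986: + is a literal plus sign.
print(unquote("a+b%20c"))       # a+b c
# unquote_plus follows form encoding: + is a space.
print(unquote_plus("a+b%20c"))  # a b c
```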
It is easy to remember to encode query parameter values but forget that path segments also need encoding. If a path segment contains a /, it will be misinterpreted as a path delimiter:
// File path as a URL segment: "reports/2026/Q1"
// WRONG: this creates three path segments
/files/reports/2026/Q1
// CORRECT: encode the value as a single segment
/files/reports%2F2026%2FQ1
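In Python, quote with safe="" produces a single encoded path segment (the /files/ prefix is from the hypothetical example above):

```python
from urllib.parse import quote

segment = "reports/2026/Q1"
# safe="" overrides the default safe="/" so the slashes are encoded too.
print("/files/" + quote(segment, safe=""))  # /files/reports%2F2026%2FQ1
```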
Older encoding functions in some languages use Latin-1 or the system's default encoding instead of UTF-8. Always specify UTF-8 explicitly when available, and verify that your encoding functions produce UTF-8 percent-encoded output. A telltale sign of encoding mismatch is that accented characters or ideographs decode as garbage characters (mojibake).
UTF-8 is the universal standard for URL encoding. All modern specifications require it, and all modern browsers and servers expect it. Never use legacy encodings like Latin-1 or Shift_JIS for URL encoding. If you interface with legacy systems, convert to UTF-8 at the boundary.
Encode values at the moment you build the URL, not before and not after. This avoids double encoding and ensures that every value is encoded exactly once. Use URL builder APIs (like JavaScript's URL and URLSearchParams, Python's urllib.parse.urlencode, or Go's url.Values) that handle encoding automatically.
Instead of manually concatenating URL strings, use your language's URL builder:
// JavaScript -- URL and URLSearchParams
const url = new URL("https://example.com/search");
url.searchParams.set("q", "hello world & goodbye");
url.searchParams.set("page", "1");
url.toString();
// "https://example.com/search?q=hello+world+%26+goodbye&page=1"
# Python -- urllib.parse
from urllib.parse import urlencode, urljoin
base = "https://example.com/search"
params = urlencode({"q": "hello world & goodbye", "page": "1"})
full_url = f"{base}?{params}"
These APIs handle the details correctly: they encode parameter values, insert the proper delimiters, and avoid double encoding.
When you receive URL-encoded data (from query parameters, form submissions, or API responses), decode it immediately into its natural form for processing. When you need to include data in a URL, encode it at the last moment before constructing the URL. This "decode early, encode late" principle keeps your application logic clean and prevents encoding errors from propagating.
Always validate user input after URL decoding, not before. A malicious input like %3Cscript%3E will pass validation if you check the encoded form (it looks harmless), but after decoding it becomes <script>. Security validation (XSS prevention, SQL injection prevention, path traversal prevention) must always operate on the decoded data.
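A toy sketch of the pitfall (looks_safe is a deliberately naive hypothetical check, not a real sanitizer):

```python
from urllib.parse import unquote

def looks_safe(value: str) -> bool:
    """Naive illustration: reject raw angle brackets."""
    return "<" not in value and ">" not in value

raw = "%3Cscript%3E"
print(looks_safe(raw))           # True  -- the encoded form slips past the check
print(looks_safe(unquote(raw)))  # False -- validate after decoding instead
```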
RFC 3986 recommends using uppercase hexadecimal digits in percent-encoded triplets (%2F rather than %2f). While most decoders accept either, using uppercase is the normalized form and ensures maximum interoperability.
Encoding unreserved characters (like letters, digits, hyphens, and underscores) is technically valid but unnecessary. It makes URLs harder to read and violates the normalization rules of RFC 3986. For example, %41 should be A, and %2D should be -.
When building URL handling code, test with these edge cases:
- Strings containing every reserved character: :/?#[]@!$&'()*+,;=
- Non-ASCII text, including multi-byte characters and emoji
- Input that is already percent-encoded
Our free URL Encoder/Decoder tool makes it easy to encode and decode URLs and URL components directly in your browser. No data is sent to any server -- all processing happens locally on your machine.
Paste any text and instantly get the URL-encoded output. The tool supports both RFC 3986 percent encoding (spaces as %20) and form encoding (spaces as +), so you can choose the format that matches your use case.
Paste a URL-encoded string and see the decoded output immediately. The tool automatically handles both %20 and + as spaces. Multi-byte UTF-8 sequences are decoded correctly, and invalid sequences are flagged with clear error messages.
Stop guessing which characters need encoding. Use our free tool to encode and decode URLs, query parameters, and international text right in your browser -- with zero data sent to any server.
Try the URL Encoder/Decoder Now
URL encoding, also called percent encoding, is a mechanism for converting characters that are not allowed in a URL into a safe representation. Each unsafe character is replaced with a percent sign (%) followed by two hexadecimal digits representing the character's byte value. For example, a space becomes %20 and an ampersand becomes %26.
encodeURI() encodes a full URI and preserves characters that have special meaning in URLs, such as :, /, ?, #, &, and =. encodeURIComponent() encodes a single URI component (like a query parameter value) and does encode those special characters. Use encodeURIComponent() for encoding individual values and encodeURI() for encoding a complete URL.
The %20 encoding comes from RFC 3986 (generic URI syntax) and is universally valid in any part of a URL. The + encoding for spaces comes from the application/x-www-form-urlencoded format defined in the HTML specification, which is used specifically for form submissions and query strings. Both are correct in their respective contexts, but %20 is the safer choice when you are unsure.
Characters that must be URL encoded include: spaces, non-ASCII characters (accented letters, CJK characters, emoji), and any reserved characters when used outside their special purpose. Reserved characters include : / ? # [ ] @ ! $ & ' ( ) * + , ; =. Unreserved characters that never need encoding are A-Z, a-z, 0-9, hyphen (-), underscore (_), period (.), and tilde (~).
International characters are first converted to their UTF-8 byte representation, and then each byte is percent-encoded. For example, the German u-umlaut (U+00FC) is encoded as %C3%BC because its UTF-8 representation is the two bytes C3 and BC. A Chinese character may require three percent-encoded bytes. Modern browsers display the decoded characters in the address bar for readability but send the encoded form in HTTP requests.
RFC 3986 is the current standard for Uniform Resource Identifier (URI) syntax, published in 2005. It defines which characters are allowed in each part of a URI, which characters are reserved, and how percent-encoding must be performed. It matters because it is the authoritative specification that browsers, servers, and libraries follow when constructing and parsing URLs. Following RFC 3986 ensures your URLs are interoperable across all platforms.