URL encoding, also known as percent-encoding, is a mechanism for representing arbitrary data in a Uniform Resource Identifier (URI). It replaces a character with a "%" followed by two hexadecimal digits giving the value of the corresponding byte. For an ASCII character this is simply its ASCII code; a non-ASCII character is normally converted to its UTF-8 bytes first, and each byte is then encoded separately. The mechanism is primarily used to represent characters that are not allowed in URLs, or that carry a special meaning there, and to safely transmit non-ASCII data over HTTP.
History and Purpose
The practice of URL encoding has its roots in the early days of the internet, where the need for a standardized way to encode non-alphanumeric characters in URLs became apparent:
- In 1991, the first draft of the URI specification was published, which included a basic form of percent-encoding.
- In June 1994, RFC 1630 formalized the concept, specifying that non-ASCII and unsafe characters should be encoded.
- In December 1994, RFC 1738 expanded on this, defining the rules for encoding data in URLs for various protocols.
- Later revisions, notably RFC 2396 (1998) and RFC 3986 (2005), refined the encoding rules to improve clarity and interoperability; RFC 3986 defines the generic URI syntax in current use.
Encoding Rules
The key rules for URL encoding include:
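- Alphanumeric characters (A-Z, a-z, 0-9) are not encoded.
- The punctuation characters "-", "_", ".", and "~" are not encoded either. Together with the alphanumerics, RFC 3986 calls these the "unreserved" characters.
- Reserved characters like "/", "?", "&", and "=" are left as-is when they act as delimiters and are encoded when they appear as literal data. For example, an "&" inside a query parameter value is written as "%26".
- All other characters, including spaces, are encoded. A space is written as "%20" under percent-encoding, or as "+" in the application/x-www-form-urlencoded format used by HTML forms.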
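The rules above can be sketched as a small Python function. This is an illustrative simplification, not a reference implementation: the name `percent_encode` and its `form` flag are hypothetical, it assumes UTF-8 for non-ASCII input, and it encodes every reserved character because it treats the whole input as literal data.

```python
import string

# Unreserved characters per RFC 3986: letters, digits, "-", "_", ".", "~".
UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")


def percent_encode(text: str, form: bool = False) -> str:
    """Percent-encode `text`, treating all of it as literal data.

    With form=True, a space becomes "+" (a simplified stand-in for
    application/x-www-form-urlencoded); otherwise it becomes "%20".
    """
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)  # never encoded
        elif form and ch == " ":
            out.append("+")  # form-style space
        else:
            # Encode each UTF-8 byte as "%" plus two uppercase hex digits.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode("a b&c=d"))
print(percent_encode("a b", form=True))
print(percent_encode("café"))
```

In practice a library routine is preferable, since real form encoding uses a slightly different set of unencoded characters than this sketch assumes.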
Examples of Encoded Characters
- Space ( ): %20, or + in application/x-www-form-urlencoded data
- Exclamation mark (!): %21
- Number sign (#): %23
- Percent sign (%): %25
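Each of the codes listed above is the character's ASCII code written as two hexadecimal digits. The following minimal Python sketch shows the derivation; the helper name `escape_ascii` is hypothetical and the function deliberately handles ASCII input only.

```python
from urllib.parse import unquote


def escape_ascii(ch: str) -> str:
    """Return "%XX" for one ASCII character, where XX is its code in hex."""
    code = ord(ch)
    if code > 127:
        raise ValueError("this sketch only handles ASCII characters")
    return f"%{code:02X}"


# Print each character, its decimal ASCII code, and its encoded form.
for ch in " !#%":
    print(repr(ch), ord(ch), escape_ascii(ch))

# Decoding with the standard library reverses the mapping.
decoded = unquote("%20%21%23%25")
print(repr(decoded))
```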
Applications
URL encoding is crucial in various contexts:
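- In HTML forms, whose fields are encoded as application/x-www-form-urlencoded by default when submitted via GET or POST.
- In query strings, where parameter names and values must be encoded so that characters such as "&" and "=" in the data are not mistaken for delimiters.
- In HTTP headers and cookies, where percent-encoding is commonly used to carry values containing characters that those fields do not otherwise permit.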
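As an illustration of the query-string case, the Python standard library can build and parse form-style query strings. This is a minimal sketch: the parameter names, values, and the `example.com` URL are made up for the example.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical parameters; the value of "q" contains a space, "&", and "=".
params = {"q": "a b&c=d", "lang": "en"}

# urlencode applies form-style encoding: spaces become "+",
# and "&" / "=" inside values are percent-encoded.
query = urlencode(params)
print(query)

url = "https://example.com/search?" + query
print(url)

# parse_qs reverses the encoding; each key maps to a list of values.
decoded = parse_qs(query)
print(decoded)
```

Because the "&" and "=" inside the value are encoded, the parser can still split the string unambiguously on the literal delimiters.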
Implementation
Most programming languages and web frameworks provide built-in functions or libraries to perform URL encoding:
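- JavaScript has the `encodeURIComponent()` function for individual components, and `encodeURI()` for whole URIs, which leaves reserved delimiters such as "/" and "?" intact.
- Python provides `urllib.parse.quote()` in the standard library, along with `urllib.parse.quote_plus()` for form-style encoding in which spaces become "+".
- PHP has `urlencode()`, which encodes spaces as "+", and `rawurlencode()`, which encodes them as "%20" in line with RFC 3986.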
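The choice of function affects the output. The following short sketch, using Python's standard library and a made-up input string, shows how `quote()` treats "/" as safe by default, how passing `safe=""` changes that, and how `quote_plus()` handles spaces.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

path_like = "docs/my file.txt"  # hypothetical input

# By default quote() leaves "/" unencoded, which suits path segments joined by "/".
default_quoted = quote(path_like)

# With safe="" the slash is treated as data and encoded too.
strict_quoted = quote(path_like, safe="")

# quote_plus() encodes "/" and writes spaces as "+", as in form data.
form_quoted = quote_plus(path_like)

print(default_quoted)
print(strict_quoted)
print(form_quoted)

# Use the matching decoder: unquote() leaves "+" alone, unquote_plus() turns it into a space.
print(unquote(strict_quoted), unquote_plus(form_quoted))
```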