# Imports this excerpt relies on.
from urllib.parse import quote

from django.utils.functional import Promise


def iri_to_uri(iri):
    """
    Convert an Internationalized Resource Identifier (IRI) portion to a URI
    portion that is suitable for inclusion in a URL.

    This is the algorithm from RFC 3987 Section 3.1, slightly simplified since
    the input is assumed to be a string rather than an arbitrary byte stream.

    Take an IRI (string or UTF-8 bytes, e.g. '/I ♥ Django/' or
    b'/I \xe2\x99\xa5 Django/') and return a string containing the encoded
    result with ASCII chars only (e.g. '/I%20%E2%99%A5%20Django/').
    """
    # The list of safe characters here is constructed from the "reserved" and
    # "unreserved" characters specified in RFC 3986 Sections 2.2 and 2.3:
    #     reserved    = gen-delims / sub-delims
    #     gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    #     sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
    #                   / "*" / "+" / "," / ";" / "="
    #     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
    # Of the unreserved characters, urllib.parse.quote() already considers all
    # but the ~ safe.
    # The % character is also added to the list of safe characters here, as the
    # end of RFC 3987 Section 3.1 specifically mentions that % must not be
    # converted.
    if iri is None:
        return iri
    elif isinstance(iri, Promise):
        iri = str(iri)
    return quote(iri, safe="/#%[]=:;$&()+,!?*@'~")


# List of byte values that uri_to_iri() decodes from percent encoding.
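The behaviour described in the docstring and comments of iri_to_uri() can be tried without Django installed. The following is a minimal sketch, not the library's code: `iri_to_uri_sketch`, `GEN_DELIMS`, `SUB_DELIMS`, and `SAFE_CHARS` are hypothetical names introduced here, and the lazy-string (`Promise`) branch is left out on the assumption that only plain `str`, `bytes`, or `None` are passed in.

```python
from urllib.parse import quote

# RFC 3986 character classes, spelled out so the safe set can be checked
# against the comments in iri_to_uri().
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
# "%" is kept safe so existing percent-escapes are not encoded a second time.
# "~" is listed explicitly, mirroring the safe string used by iri_to_uri().
SAFE_CHARS = GEN_DELIMS + SUB_DELIMS + "%~"


def iri_to_uri_sketch(iri):
    """Hypothetical stand-in for iri_to_uri() without lazy-string support."""
    if iri is None:
        return iri
    # quote() accepts both str and bytes; str input is encoded as UTF-8.
    return quote(iri, safe=SAFE_CHARS)


print(iri_to_uri_sketch("/I ♥ Django/"))             # /I%20%E2%99%A5%20Django/
print(iri_to_uri_sketch(b"/I \xe2\x99\xa5 Django/"))  # /I%20%E2%99%A5%20Django/
print(iri_to_uri_sketch("/blog/for%20you"))          # /blog/for%20you
print(iri_to_uri_sketch("/path?q=é&x=1"))            # /path?q=%C3%A9&x=1
```

The third call shows why `%` is in the safe set: an already-encoded path passes through unchanged, so applying the function twice gives the same result as applying it once.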