class Charset:
    """Map character sets to their email properties.

    This class provides information about the requirements imposed on email
    for a specific character set. It also provides convenience routines for
    converting between character sets, given the availability of the
    applicable codecs. Given a character set, it will do its best to provide
    information on how to use that character set in an email in an
    RFC-compliant way.

    Certain character sets must be encoded with quoted-printable or base64
    when used in email headers or bodies. Certain character sets must be
    converted outright, and are not allowed in email. Instances of this
    class expose the following information about a character set:

    input_charset: The initial character set specified. Common aliases
                   are converted to their 'official' email names (e.g. latin_1
                   is converted to iso-8859-1). Defaults to 7-bit us-ascii.

    header_encoding: If the character set must be encoded before it can be
                     used in an email header, this attribute will be set to
                     charset.QP (for quoted-printable), charset.BASE64 (for
                     base64 encoding), or charset.SHORTEST for the shortest of
                     QP or BASE64 encoding. Otherwise, it will be None.

    body_encoding: Same as header_encoding, but describes the encoding for the
                   mail message's body, which may differ from the header
                   encoding. charset.SHORTEST is not allowed for
                   body_encoding.

    output_charset: Some character sets must be converted before they can be
                    used in email headers or bodies. If the input_charset is
                    one of them, this attribute will contain the name of the
                    charset output will be converted to. Otherwise, it will
                    be None.

    input_codec: The name of the Python codec used to convert the
                 input_charset to Unicode. If no conversion codec is
                 necessary, this attribute will be None.

    output_codec: The name of the Python codec used to convert Unicode
                  to the output_charset. If no conversion codec is necessary,
                  this attribute will have the same value as the input_codec.
    """
    def __init__(self, input_charset=DEFAULT_CHARSET):
        # RFC 2046, section 4.1.2 says charsets are not case sensitive. We
        # coerce to str because its .lower() is locale insensitive. If the
        # argument is already a str, we leave it at that, but ensure that the
        # charset is ASCII, as the standard (RFC XXX) requires.
        try:
            if isinstance(input_charset, str):
                # Validation only: raises UnicodeError on non-ASCII names.
                input_charset.encode('ascii')
            else:
                # Bytes (or other buffer) input is decoded as ASCII.
                input_charset = str(input_charset, 'ascii')
        except UnicodeError:
            raise errors.CharsetError(input_charset)
        input_charset = input_charset.lower()
        # Set the input charset after filtering through the aliases
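
The attributes described in the docstring can be inspected directly. The sketch below assumes that this class is the one exposed as `email.charset.Charset` in the Python standard library, and that `QP`, `BASE64`, and `SHORTEST` are importable from the same module. The variable names are illustrative only.

```python
from email import errors
from email.charset import BASE64, QP, SHORTEST, Charset

# A common alias is converted to its official email name.
latin = Charset("latin_1")
latin_name = latin.input_charset            # 'iso-8859-1'
latin_uses_qp = latin.header_encoding == QP and latin.body_encoding == QP

# UTF-8 headers use the shorter of QP or base64; bodies use base64.
utf8 = Charset("utf-8")
utf8_header_is_shortest = utf8.header_encoding == SHORTEST
utf8_body_is_base64 = utf8.body_encoding == BASE64

# Some charsets are converted outright before use in email.
euc_jp_output = Charset("euc-jp").output_charset   # 'iso-2022-jp'

# Names are case-insensitive, and bytes input is decoded as ASCII.
from_bytes = Charset(b"UTF-8").input_charset       # 'utf-8'

# The default is 7-bit us-ascii, which needs no header encoding.
default = Charset()

# A non-ASCII charset name is rejected during construction.
try:
    Charset("utf-8\u00e9")
    rejected = False
except errors.CharsetError:
    rejected = True

print(latin_name, euc_jp_output, from_bytes, default.input_charset, rejected)
```

The checks compare against the module-level constants rather than their numeric values, since only the names are part of the documented interface.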