github.com/python/cpython

Class Charset

Lib/email/charset.py:162–398

Map character sets to their email properties. This class provides information about the requirements imposed on email for a specific character set. It also provides convenience routines for converting between character sets, given the availability of the applicable codecs. Given a character set, it will do its best to provide information on how to use that character set in an email in an RFC-compliant way.
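As a quick illustration of these email properties, the sketch below constructs a few instances with the standard library's `email.charset` module and reads the attributes the class docstring describes. The helper `email_properties` and the `ENCODING_NAMES` table are hypothetical, written only for this example; the reported values come from the module's built-in charset table, which `add_charset()` can change at runtime.

```python
from email import charset


def email_properties(name):
    """Return the documented email attributes of charset.Charset(name).

    email_properties is a hypothetical helper for this illustration; it only
    reads attributes documented on email.charset.Charset.
    """
    cs = charset.Charset(name)
    return {
        "input_charset": cs.input_charset,
        "header_encoding": cs.header_encoding,
        "body_encoding": cs.body_encoding,
        "output_charset": cs.output_charset,
        "input_codec": cs.input_codec,
        "output_codec": cs.output_codec,
    }


# The encoding attributes hold small integer constants (or None);
# map them back to readable names for printing.
ENCODING_NAMES = {
    None: None,
    charset.QP: "QP",
    charset.BASE64: "BASE64",
    charset.SHORTEST: "SHORTEST",
}

for name in ("us-ascii", "latin_1", "utf-8", "euc-jp"):
    props = email_properties(name)
    print(name, "->", props["input_charset"],
          "header:", ENCODING_NAMES[props["header_encoding"]],
          "body:", ENCODING_NAMES[props["body_encoding"]],
          "output:", props["output_charset"])
```

With the default table, `latin_1` is reported under its official name `iso-8859-1` with quoted-printable for both headers and bodies, while `euc-jp` is one of the character sets that gets converted (to `iso-2022-jp`) before use.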


class Charset:
    """Map character sets to their email properties.

    This class provides information about the requirements imposed on email
    for a specific character set. It also provides convenience routines for
    converting between character sets, given the availability of the
    applicable codecs. Given a character set, it will do its best to provide
    information on how to use that character set in an email in an
    RFC-compliant way.

    Certain character sets must be encoded with quoted-printable or base64
    when used in email headers or bodies. Certain character sets must be
    converted outright, and are not allowed in email. Instances of this
    module expose the following information about a character set:

    input_charset: The initial character set specified. Common aliases
                   are converted to their 'official' email names (e.g. latin_1
                   is converted to iso-8859-1). Defaults to 7-bit us-ascii.

    header_encoding: If the character set must be encoded before it can be
                     used in an email header, this attribute will be set to
                     charset.QP (for quoted-printable), charset.BASE64 (for
                     base64 encoding), or charset.SHORTEST for the shortest of
                     QP or BASE64 encoding. Otherwise, it will be None.

    body_encoding: Same as header_encoding, but describes the encoding for the
                   mail message's body, which indeed may be different than the
                   header encoding. charset.SHORTEST is not allowed for
                   body_encoding.

    output_charset: Some character sets must be converted before they can be
                    used in email headers or bodies. If the input_charset is
                    one of them, this attribute will contain the name of the
                    charset output will be converted to. Otherwise, it will
                    be None.

    input_codec: The name of the Python codec used to convert the
                 input_charset to Unicode. If no conversion codec is
                 necessary, this attribute will be None.

    output_codec: The name of the Python codec used to convert Unicode
                  to the output_charset. If no conversion codec is necessary,
                  this attribute will have the same value as the input_codec.
    """
    def __init__(self, input_charset=DEFAULT_CHARSET):
        # RFC 2046, $4.1.2 says charsets are not case sensitive. We coerce to
        # unicode because its .lower() is locale insensitive. If the argument
        # is already a unicode, we leave it at that, but ensure that the
        # charset is ASCII, as the standard (RFC XXX) requires.
        try:
            if isinstance(input_charset, str):
                input_charset.encode('ascii')
            else:
                input_charset = str(input_charset, 'ascii')
        except UnicodeError:
            raise errors.CharsetError(input_charset)
        input_charset = input_charset.lower()
        # Set the input charset after filtering through the aliases
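The visible part of `__init__` validates and normalizes the name before any table lookup: a `str` name must be pure ASCII, a `bytes` name is decoded as ASCII, and the result is lowercased. A minimal sketch of that behaviour, driven through the public constructor, is shown below. The helper `normalized_name` is hypothetical; the alias mapping of `latin_1` to `iso-8859-1` is the one the class docstring documents, and the alias lookup itself happens in the part of `__init__` not shown in this excerpt.

```python
from email import errors
from email.charset import Charset


def normalized_name(raw):
    """Return Charset(raw).input_charset, or None if the name is rejected.

    normalized_name is a hypothetical helper used only for this illustration.
    """
    try:
        return Charset(raw).input_charset
    except errors.CharsetError:
        # Raised when a str name is not pure ASCII, or when a bytes
        # name cannot be decoded as ASCII.
        return None


print(normalized_name("UTF-8"))        # mixed case is lowered
print(normalized_name(b"ISO-8859-1"))  # bytes are decoded as ASCII first
print(normalized_name("Latin_1"))      # lowered, then mapped through the aliases
print(normalized_name("utf-8\u00e9"))  # non-ASCII str is rejected
print(normalized_name(b"\xff"))        # non-ASCII bytes are rejected
```

Because validation happens before lowercasing, the locale-insensitive `str.lower()` only ever sees ASCII input, which is the point the source comment makes about coercing to Unicode.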
