hub / github.com/python/cpython / detect_encoding

Function detect_encoding

Lib/tokenize.py:358–454

The detect_encoding() function is used to detect the encoding that should be used to decode a Python source file. It requires one argument, readline, in the same way as the tokenize() generator. It will call readline a maximum of twice, and return the encoding used (as a string) and a list of any lines (left as bytes) it has read in.

(readline)
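As a rough illustration of the contract described above, the sketch below feeds a few in-memory byte strings to detect_encoding through io.BytesIO. The helper name sniff and the sample sources are made up for this example; only detect_encoding itself comes from the tokenize module.

```python
import io
from tokenize import detect_encoding


def sniff(source: bytes):
    """Hypothetical helper: run detect_encoding over an in-memory buffer."""
    return detect_encoding(io.BytesIO(source).readline)


# PEP 263 cookie on the first line: the name is normalized, and only the
# first line is consumed.
cookie_enc, cookie_lines = sniff(b"# -*- coding: latin-1 -*-\nx = 1\n")
# cookie_enc == 'iso-8859-1'

# UTF-8 BOM and no cookie: the BOM is stripped from the returned line.
bom_enc, bom_lines = sniff(b"\xef\xbb\xbfx = 1\n")
# bom_enc == 'utf-8-sig', bom_lines == [b'x = 1\n']

# Neither BOM nor cookie: the default applies after a single readline call,
# because the first line is real code rather than a comment or blank line.
default_enc, default_lines = sniff(b"x = 1\ny = 2\n")
# default_enc == 'utf-8'

# Cookie on the second line (for example after a shebang): both lines are
# returned so the caller can replay them.
second_enc, second_lines = sniff(b"#!/usr/bin/env python\n# coding: cp1252\nx = 1\n")
# second_enc == 'cp1252', len(second_lines) == 2
```

The returned lines are raw bytes that have already been consumed from the stream, so a caller that wants to tokenize the whole file has to process them before reading further.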

Source from the content-addressed store, hash-verified

    return orig_enc

def detect_encoding(readline):
    """
    The detect_encoding() function is used to detect the encoding that should
    be used to decode a Python source file. It requires one argument, readline,
    in the same way as the tokenize() generator.

    It will call readline a maximum of twice, and return the encoding used
    (as a string) and a list of any lines (left as bytes) it has read in.

    It detects the encoding from the presence of a utf-8 bom or an encoding
    cookie as specified in pep-0263. If both a bom and a cookie are present,
    but disagree, a SyntaxError will be raised. If the encoding cookie is an
    invalid charset, raise a SyntaxError. Note that if a utf-8 bom is found,
    'utf-8-sig' is returned.

    If no encoding is specified, then the default of 'utf-8' will be returned.
    """
    try:
        filename = readline.__self__.name
    except AttributeError:
        filename = None
    bom_found = False
    encoding = None
    default = 'utf-8'
    def read_or_stop():
        try:
            return readline()
        except StopIteration:
            return b''

    def check(line, encoding):
        # Check if the line matches the encoding.
        if 0 in line:
            raise SyntaxError("source code cannot contain null bytes")
        try:
            line.decode(encoding)
        except UnicodeDecodeError:
            msg = "invalid or missing encoding declaration"
            if filename is not None:
                msg = '{} for {!r}'.format(msg, filename)
            raise SyntaxError(msg)

    def find_cookie(line):
        match = cookie_re.match(line)
        if not match:
            return None
        encoding = _get_normal_name(match.group(1).decode())
        try:
            lookup(encoding)
        except LookupError:
            # This behaviour mimics the Python interpreter
            if filename is None:
                msg = "unknown encoding: " + encoding
            else:
                msg = "unknown encoding for {!r}: {}".format(filename,
                        encoding)
            raise SyntaxError(msg)
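The error cases named in the docstring can be exercised with a small sketch like the one below. The helper try_detect and the sample inputs are hypothetical; the example only relies on detect_encoding raising SyntaxError for a BOM that conflicts with the cookie and for an unknown charset.

```python
import io
from tokenize import detect_encoding


def try_detect(source: bytes):
    """Hypothetical helper: return the result, or the SyntaxError raised."""
    try:
        return detect_encoding(io.BytesIO(source).readline)
    except SyntaxError as exc:
        return exc


# A UTF-8 BOM followed by a cookie naming a different encoding: the two
# disagree, so a SyntaxError is raised.
conflict = try_detect(b"\xef\xbb\xbf# coding: latin-1\n")

# A cookie naming a charset that the codec registry does not know.
unknown = try_detect(b"# coding: no-such-codec\n")

# A BOM together with a matching utf-8 cookie is accepted and reported
# as 'utf-8-sig'.
agree = try_detect(b"\xef\xbb\xbf# coding: utf-8\n")
```

Because io.BytesIO has no name attribute, the filename lookup at the top of the function falls back to None here, so the messages omit any file name.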

Callers (2)

open · Function · 0.70
tokenize · Function · 0.70
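To show how one of the listed callers relies on this function, here is a minimal sketch using tokenize.open, which opens a file in text mode with the encoding that detect_encoding reports. The file name legacy.py and its contents are invented for the example.

```python
import os
import tempfile
import tokenize

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source file stored as Latin-1 with a PEP 263 cookie.
    path = os.path.join(tmp, "legacy.py")
    with open(path, "wb") as f:
        f.write("# coding: latin-1\nname = 'caf\xe9'\n".encode("latin-1"))

    # tokenize.open detects the encoding and returns a text-mode file object.
    with tokenize.open(path) as src:
        detected = src.encoding
        text = src.read()
```

Reading the same bytes as UTF-8 would fail on the 0xE9 byte, which is why source tools usually detect the encoding first rather than assume the default.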

Calls (5)

read_or_stop · Function · 0.85
find_cookie · Function · 0.85
check · Function · 0.70
startswith · Method · 0.45
match · Method · 0.45

Tested by

no test coverage detected
