MCPcopy Index your code
hub / github.com/python/cpython / SequenceMatcher

Class SequenceMatcher

Lib/difflib.py:45–664  ·  view source on GitHub ↗

SequenceMatcher is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic

Source from the content-addressed store, hash-verified

43 return 1.0
44
45class SequenceMatcher:
46
47 """
48 SequenceMatcher is a flexible class for comparing pairs of sequences of
49 any type, so long as the sequence elements are hashable. The basic
50 algorithm predates, and is a little fancier than, an algorithm
51 published in the late 1980's by Ratcliff and Obershelp under the
52 hyperbolic name "gestalt pattern matching". The basic idea is to find
53 the longest contiguous matching subsequence that contains no "junk"
54 elements (R-O doesn't address junk). The same idea is then applied
55 recursively to the pieces of the sequences to the left and to the right
56 of the matching subsequence. This does not yield minimal edit
57 sequences, but does tend to yield matches that "look right" to people.
58
59 SequenceMatcher tries to compute a "human-friendly diff" between two
60 sequences. Unlike e.g. UNIX(tm) diff, the fundamental notion is the
61 longest *contiguous* & junk-free matching subsequence. That's what
62 catches peoples' eyes. The Windows(tm) windiff has another interesting
63 notion, pairing up elements that appear uniquely in each sequence.
64 That, and the method here, appear to yield more intuitive difference
65 reports than does diff. This method appears to be the least vulnerable
66 to syncing up on blocks of "junk lines", though (like blank lines in
67 ordinary text files, or maybe "<P>" lines in HTML files). That may be
68 because this is the only method of the 3 that has a *concept* of
69 "junk" <wink>.
70
71 Example, comparing two strings, and considering blanks to be "junk":
72
73 >>> s = SequenceMatcher(lambda x: x == " ",
74 ... "private Thread currentThread;",
75 ... "private volatile Thread currentThread;")
76 >>>
77
78 .ratio() returns a float in [0, 1], measuring the "similarity" of the
79 sequences. As a rule of thumb, a .ratio() value over 0.6 means the
80 sequences are close matches:
81
82 >>> print(round(s.ratio(), 2))
83 0.87
84 >>>
85
86 If you&#x27;re only interested in where the sequences match,
87 .get_matching_blocks() is handy:
88
89 >>> for block in s.get_matching_blocks():
90 ... print("a[%d] and b[%d] match for %d elements" % block)
91 a[0] and b[0] match for 8 elements
92 a[8] and b[17] match for 21 elements
93 a[29] and b[38] match for 0 elements
94
95 Note that the last tuple returned by .get_matching_blocks() is always a
96 dummy, (len(a), len(b), 0), and this is the only case in which the last
97 tuple element (number of elements matched) is 0.
98
99 If you want to know how to change the first sequence into the second,
100 use .get_opcodes():
101
102 >>> for opcode in s.get_opcodes():

Callers 5

get_close_matchesFunction · 0.85
compareMethod · 0.85
_fancy_replaceMethod · 0.85
unified_diffFunction · 0.85
context_diffFunction · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected

Used in the wild real call sites across dependent graphs

searching dependent graphs…