hub / github.com/Textualize/rich / split_graphemes

Function split_graphemes

rich/cells.py:161–232

Divide text into spans that each define a single grapheme, and additionally return the cell length of the whole string. The returned spans cover every index in the string, with no gaps. Some graphemes may have a cell length of zero. This can occur for nonsense strings like two zero width joiners, or for control codes that don't contribute to the grapheme size.

(
    text: str, unicode_version: str = "auto"
)
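
The span layout described above can be illustrated with a small, self-contained sketch. It does not import rich. The helper names `covers_without_gaps` and `slice_by_spans` are hypothetical, and the example spans are written by hand to show the (start, end, cell length) shape; they are not output captured from `split_graphemes`.

```python
# Hypothetical helpers for working with (start, end, cell_length) spans.
Span = tuple[int, int, int]


def covers_without_gaps(text: str, spans: list[Span]) -> bool:
    """Return True if the spans tile the whole string, in order, with no gaps."""
    position = 0
    for start, end, _cell_length in spans:
        if start != position or end <= start:
            return False
        position = end
    return position == len(text)


def slice_by_spans(text: str, spans: list[Span]) -> list[str]:
    """Cut the string into one substring per span."""
    return [text[start:end] for start, end, _cell_length in spans]


# Hand-written illustration: "a", a zero width joiner and "b" grouped into
# one span, followed by "!" in a span of its own.
text = "a\u200db!"
spans = [(0, 3, 1), (3, 4, 1)]

print(covers_without_gaps(text, spans))
print(slice_by_spans(text, spans))
```

Because START and END are ordinary string indices, each span can be used directly as a slice, which is what makes the "no gaps" guarantee useful to callers.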

Source from the content-addressed store, hash-verified

def split_graphemes(
    text: str, unicode_version: str = "auto"
) -> "tuple[list[CellSpan], int]":
    """Divide text into spans that define a single grapheme, and additionally return the cell length of the whole string.

    The returned spans will cover every index in the string, with no gaps. It is possible for some graphemes to have a cell length of zero.
    This can occur for nonsense strings like two zero width joiners, or for control codes that don't contribute to the grapheme size.

    Args:
        text: String to split.
        unicode_version: Unicode version, `"auto"` to auto detect, `"latest"` for the latest unicode version.

    Returns:
        A tuple of a list of *spans* and the cell length of the entire string. A span is a list of tuples
        of three values consisting of (<START>, <END>, <CELL LENGTH>), where START and END are string indices,
        and CELL LENGTH is the cell length of the single grapheme.
    """

    cell_table = load_cell_table(unicode_version)
    codepoint_count = len(text)
    index = 0
    last_measured_character: str | None = None

    total_width = 0
    spans: list[tuple[int, int, int]] = []
    SPECIAL = {"\u200d", "\ufe0f"}
    while index < codepoint_count:
        if (character := text[index]) in SPECIAL:
            if not spans:
                # ZWJ or variation selector at the beginning of the string doesn't really make sense.
                # But handle it, we must.
                spans.append((index, index := index + 1, 0))
                continue
            if character == "\u200d":
                # zero width joiner
                # The condition handles the case where a ZWJ is at the end of the string, and has nothing to join
                index += 2 if index < (codepoint_count - 1) else 1
                start, _end, cell_length = spans[-1]
                spans[-1] = (start, index, cell_length)
            else:
                # variation selector 16
                index += 1
                if last_measured_character:
                    start, _end, cell_length = spans[-1]
                    if last_measured_character in cell_table.narrow_to_wide:
                        last_measured_character = None
                        cell_length += 1
                        total_width += 1
                    spans[-1] = (start, index, cell_length)
                else:
                    # No previous character to change the size of.
                    # Shouldn't occur in practice.
                    # But handle it, we must.
                    start, _end, cell_length = spans[-1]
                    spans[-1] = (start, index, cell_length)
            continue

        if character_width := get_character_cell_size(character, unicode_version):
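
The zero width joiner and variation selector handling in the listing can be tried out in isolation with a simplified sketch. This is not the library's implementation. `toy_width` is a hypothetical stand-in for the real width lookup, built on the standard `unicodedata` module, and `narrow_to_wide` is passed in as a plain set instead of being read from a cell table. The handling of ordinary characters (one new span each) is an assumption of this sketch, since that part of the function is not shown in the listing.

```python
import unicodedata

ZWJ = "\u200d"   # zero width joiner
VS16 = "\ufe0f"  # variation selector 16


def toy_width(character: str) -> int:
    """Hypothetical width lookup: 2 for East Asian wide/fullwidth, else 1."""
    return 2 if unicodedata.east_asian_width(character) in ("W", "F") else 1


def toy_split(text, narrow_to_wide=frozenset()):
    """Simplified sketch of the ZWJ / VS16 branches; returns (spans, total_width)."""
    spans = []
    total_width = 0
    index = 0
    last_measured = None
    count = len(text)
    while index < count:
        character = text[index]
        if character in (ZWJ, VS16):
            if not spans:
                # Nothing to attach to: emit a zero-width span of its own.
                spans.append((index, index + 1, 0))
                index += 1
                continue
            start, _end, cell_length = spans[-1]
            if character == ZWJ:
                # Swallow the joiner and, if there is one, the character it joins.
                index += 2 if index < count - 1 else 1
            else:
                index += 1
                if last_measured is not None and last_measured in narrow_to_wide:
                    # VS16 widens the previous character by one cell, once.
                    last_measured = None
                    cell_length += 1
                    total_width += 1
            spans[-1] = (start, index, cell_length)
            continue
        # Assumption of this sketch: every other character starts a new span.
        width = toy_width(character)
        spans.append((index, index + 1, width))
        total_width += width
        last_measured = character
        index += 1
    return spans, total_width


print(toy_split("a\u200db"))
print(toy_split("\u2764\ufe0f", narrow_to_wide={"\u2764"}))
```

In both special branches the previous span is extended instead of a new one being appended, which is how the joiner, the joined character, and the variation selector end up inside the same grapheme span.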

Callers 3

test_split_graphemes · Function · 0.90
_split_text · Function · 0.85
chop_cells · Function · 0.85

Calls 2

get_character_cell_size · Function · 0.85
append · Method · 0.45

Tested by 1

test_split_graphemes · Function · 0.72