hub / github.com/Textualize/rich / split_graphemes

Function split_graphemes

rich/cells.py:161–232

Divide text into spans that each define a single grapheme, and additionally return the cell length of the whole string. The returned spans cover every index in the string, with no gaps. Some graphemes may have a cell length of zero. This can occur for nonsense strings like two zero width joiners, or for control codes that don't contribute to the grapheme size.

(
    text: str, unicode_version: str = "auto"
)
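The span contract described above (contiguous spans that cover every index) can be checked mechanically. The following is a minimal sketch, not part of rich: `spans_are_contiguous` is a hypothetical helper, and the example spans are written by hand for illustration rather than produced by `split_graphemes`.

```python
# Hypothetical helper (not part of rich): checks the span contract
# described in the split_graphemes docstring.
Span = tuple[int, int, int]  # (start, end, cell_length)


def spans_are_contiguous(text: str, spans: list[Span]) -> bool:
    """Return True if the spans cover every index of text with no gaps or overlaps."""
    position = 0
    for start, end, cell_length in spans:
        # Each span must begin where the previous one ended, be non-empty,
        # and carry a non-negative cell length (zero is allowed).
        if start != position or end <= start or cell_length < 0:
            return False
        position = end
    return position == len(text)


# Hand-written spans for illustration: "a", then "e" followed by a
# combining acute accent (U+0301) treated as one single-cell grapheme.
text = "ae\u0301"
spans = [(0, 1, 1), (1, 3, 1)]
print(spans_are_contiguous(text, spans))  # True
print([text[start:end] for start, end, _ in spans])
```

Because START and END are ordinary string indices, slicing `text[start:end]` recovers the code points that make up each grapheme.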

Source from the content-addressed store, hash-verified

def split_graphemes(
    text: str, unicode_version: str = "auto"
) -> "tuple[list[CellSpan], int]":
    """Divide text into spans that define a single grapheme, and additionally return the cell length of the whole string.

    The returned spans will cover every index in the string, with no gaps. It is possible for some graphemes to have a cell length of zero.
    This can occur for nonsense strings like two zero width joiners, or for control codes that don't contribute to the grapheme size.

    Args:
        text: String to split.
        unicode_version: Unicode version, `"auto"` to auto detect, `"latest"` for the latest unicode version.

    Returns:
        A tuple of a list of spans and the cell length of the entire string. A span is a list of tuples
        of three values consisting of (<START>, <END>, <CELL LENGTH>), where START and END are string indices,
        and CELL LENGTH is the cell length of the single grapheme.
    """

    cell_table = load_cell_table(unicode_version)
    codepoint_count = len(text)
    index = 0
    last_measured_character: str | None = None

    total_width = 0
    spans: list[tuple[int, int, int]] = []
    SPECIAL = {"\u200d", "\ufe0f"}
    while index < codepoint_count:
        if (character := text[index]) in SPECIAL:
            if not spans:
                # ZWJ or variation selector at the beginning of the string doesn't really make sense.
                # But handle it, we must.
                spans.append((index, index := index + 1, 0))
                continue
            if character == "\u200d":
                # zero width joiner
                # The condition handles the case where a ZWJ is at the end of the string, and has nothing to join
                index += 2 if index < (codepoint_count - 1) else 1
                start, _end, cell_length = spans[-1]
                spans[-1] = (start, index, cell_length)
            else:
                # variation selector 16
                index += 1
                if last_measured_character:
                    start, _end, cell_length = spans[-1]
                    if last_measured_character in cell_table.narrow_to_wide:
                        last_measured_character = None
                        cell_length += 1
                        total_width += 1
                    spans[-1] = (start, index, cell_length)
                else:
                    # No previous character to change the size of.
                    # Shouldn't occur in practice.
                    # But handle it, we must.
                    start, _end, cell_length = spans[-1]
                    spans[-1] = (start, index, cell_length)
            continue

        if character_width := get_character_cell_size(character, unicode_version):
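The zero width joiner branch in the listing above can be isolated into a toy model to see how spans are extended. This is a simplified sketch under stated assumptions, not the library's implementation: `merge_zwj` is a hypothetical name, cell widths and variation selector handling are left out, and every code point other than a ZWJ is assumed to start its own span (the part of the real function that handles those characters is not shown in the listing).

```python
ZWJ = "\u200d"  # zero width joiner


def merge_zwj(text: str) -> list[tuple[int, int]]:
    """Toy model of the ZWJ branch only: returns (start, end) spans, ignoring widths."""
    spans: list[tuple[int, int]] = []
    index = 0
    codepoint_count = len(text)
    while index < codepoint_count:
        if text[index] == ZWJ:
            if not spans:
                # A leading ZWJ has nothing to join, so it gets its own span.
                spans.append((index, index + 1))
                index += 1
                continue
            # Consume the ZWJ and, if one exists, the code point after it,
            # then stretch the previous span to cover both.
            index += 2 if index < (codepoint_count - 1) else 1
            start, _end = spans[-1]
            spans[-1] = (start, index)
            continue
        # Assumption: any other code point starts a new span.
        spans.append((index, index + 1))
        index += 1
    return spans


print(merge_zwj("a\u200db"))  # [(0, 3)]
print(merge_zwj("a\u200d"))   # [(0, 2)]
print(merge_zwj("\u200da"))   # [(0, 1), (1, 2)]
```

The three calls correspond to the three cases the comments in the listing mention: a joiner between two characters, a joiner at the end of the string with nothing to join, and a joiner at the start of the string.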

Callers 3

test_split_graphemes · Function · 0.90

_split_text · Function · 0.85

chop_cells · Function · 0.85

Calls 2

get_character_cell_size · Function · 0.85

append · Method · 0.45

Tested by 1

test_split_graphemes · Function · 0.72