hub / github.com/huggingface/transformers / add

Method add

src/transformers/tokenization_python.py:67–96

Iterates over every character (Unicode code point) of `word` and adds it to the internal `data` trie, creating one nested dict level per character. The special key `""`, stored in `self._termination_char`, marks the end of a word. An empty `word` is ignored. The function is idempotent: adding the same word twice leaves the trie unchanged.
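As a rough illustration of the behavior described above, the following standalone sketch re-implements the insertion logic in a hypothetical `MiniTrie` class and checks that adding the same word twice leaves the structure unchanged. `MiniTrie` is an assumption made for this example; it is not the library's `Trie` and omits the token bookkeeping the real method performs.

```python
import copy


class MiniTrie:
    """Hypothetical stand-in for illustration; not the transformers Trie class."""

    def __init__(self):
        self.data = {}
        self._termination_char = ""  # assumed to mirror the key described above

    def add(self, word: str) -> None:
        if not word:
            return  # empty strings are ignored
        ref = self.data
        for char in word:
            # Reuse the child dict if it already exists, otherwise create it.
            ref = ref.setdefault(char, {})
        # Mark the end of a complete word.
        ref[self._termination_char] = 1


trie = MiniTrie()
trie.add("Hello 友達")
trie.add("Hello")
snapshot = copy.deepcopy(trie.data)

trie.add("Hello")  # adding the same word again should change nothing
print(trie.data == snapshot)
```

The snapshot comparison is just one way to observe idempotence; because every step either reuses an existing child dict or rewrites the termination key with the same value, a repeated `add` has nothing new to create.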

(self, word: str)

Source from the content-addressed store, hash-verified

    def add(self, word: str):
        """
        Passes over every char (utf-8 char) on word and recursively adds it to the internal `data` trie representation.
        The special key `""` in `self._termination_char` is used to represent termination.

        This function is idempotent, adding twice the same word will leave the trie unchanged

        Example:

        ```python
        >>> trie = Trie()
        >>> trie.add("Hello 友達")
        >>> trie.data
        {"H": {"e": {"l": {"l": {"o": {" ": {"友": {"達": {"": 1}}}}}}}}}

        >>> trie.add("Hello")
        >>> trie.data
        {"H": {"e": {"l": {"l": {"o": {"": 1, " ": {"友": {"達": {"": 1}}}}}}}}}
        ```
        """
        if not word:
            # Prevent empty string
            return

        self._tokens.add(word)
        ref = self.data
        for char in word:
            ref[char] = ref.setdefault(char, {})
            ref = ref[char]
        ref[self._termination_char] = 1
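The termination key is what distinguishes a complete word from a mere prefix of a longer one. The sketch below walks the nested-dict layout shown in the docstring example; `contains_word` is a hypothetical helper written for this illustration and is not part of the library.

```python
def contains_word(data: dict, word: str, termination_char: str = "") -> bool:
    """Return True if `word` was added as a full word, not just as a prefix."""
    ref = data
    for char in word:
        if char not in ref:
            return False
        ref = ref[char]
    # The word counts as present only if a termination marker sits at this node.
    return termination_char in ref


# Trie contents after adding "Hello 友達" and "Hello", as in the docstring example.
data = {"H": {"e": {"l": {"l": {"o": {"": 1, " ": {"友": {"達": {"": 1}}}}}}}}}

print(contains_word(data, "Hello"))      # True
print(contains_word(data, "Hell"))       # False: only a prefix, no termination key
print(contains_word(data, "Hello 友達"))  # True
```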

Callers 15

update · Method · 0.95
test_trie · Method · 0.95
test_trie_split · Method · 0.95
test_trie_single · Method · 0.95
test_trie_final · Method · 0.95
test_trie_subtokens · Method · 0.95
test_trie_skip · Method · 0.95
parse_line · Function · 0.45
parse_summary_file · Function · 0.45

Calls 1

setdefault · Method · 0.80

Tested by 10

test_trie · Method · 0.76
test_trie_split · Method · 0.76
test_trie_single · Method · 0.76
test_trie_final · Method · 0.76
test_trie_subtokens · Method · 0.76
test_trie_skip · Method · 0.76
parse_summary_file · Function · 0.36
warn_once · Function · 0.36