Passes over every char (utf-8 char) on word and recursively adds it to the internal `data` trie representation. The special key `""` in `self._termination_char` is used to represent termination. This function is idempotent, adding twice the same word will leave the trie unc
(self, word: str)
| 65 | self.add(token) |
| 66 | |
| 67 | def add(self, word: str): |
| 68 | """ |
| 69 | Passes over every char (utf-8 char) on word and recursively adds it to the internal `data` trie representation. |
| 70 | The special key `""` in `self._termination_char` is used to represent termination. |
| 71 | |
| 72 | This function is idempotent, adding twice the same word will leave the trie unchanged |
| 73 | |
| 74 | Example: |
| 75 | |
| 76 | ```python |
| 77 | >>> trie = Trie() |
| 78 | >>> trie.add("Hello 友達") |
| 79 | >>> trie.data |
| 80 | {"H": {"e": {"l": {"l": {"o": {" ": {"友": {"達": {"": 1}}}}}}}}} |
| 81 | |
| 82 | >>> trie.add("Hello") |
| 83 | >>> trie.data |
| 84 | {"H": {"e": {"l": {"l": {"o": {"": 1, " ": {"友": {"達": {"": 1}}}}}}}}} |
| 85 | ``` |
| 86 | """ |
| 87 | if not word: |
| 88 | # Prevent empty string |
| 89 | return |
| 90 | |
| 91 | self._tokens.add(word) |
| 92 | ref = self.data |
| 93 | for char in word: |
| 94 | ref[char] = ref.setdefault(char, {}) |
| 95 | ref = ref[char] |
| 96 | ref[self._termination_char] = 1 |
| 97 | |
| 98 | def split(self, text: str) -> list[str]: |
| 99 | """ |