sudachipy.tokenizer package
Note
- Importing from
sudachipy.tokenizer is deprecated. Use
from sudachipy import Tokenizer instead. You can also import
SplitMode: from sudachipy import SplitMode.
Module contents
- class sudachipy.tokenizer.Tokenizer
A Sudachi tokenizer.
Create one using the Dictionary.create method.
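Because the split mode is fixed when a tokenizer is created, one way to work with several modes is to build one tokenizer per mode from a shared dictionary. The following is a minimal sketch, not part of the library: build_tokenizers is a hypothetical helper, and it assumes that Dictionary.create accepts the mode as a keyword argument.

```python
def build_tokenizers(dictionary, modes):
    """Return one tokenizer per split mode, all created from one dictionary.

    build_tokenizers is a hypothetical helper for illustration only.
    `dictionary` is expected to be a sudachipy.Dictionary and `modes` a
    mapping from a label to a SplitMode value.
    """
    # Assumption: Dictionary.create takes the split mode as `mode=`.
    return {label: dictionary.create(mode=mode) for label, mode in modes.items()}
```

With SudachiPy and a dictionary package installed, this could be called as build_tokenizers(Dictionary(), {"A": SplitMode.A, "C": SplitMode.C}), after importing Dictionary and SplitMode from sudachipy.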
- SplitMode = SplitMode.C
- mode
SplitMode of the tokenizer.
- tokenize(self, /, text: str, mode=None, logger=None, out=None) → MorphemeList
Break text into morphemes.
- Parameters:
text (str) – text to analyze.
mode (SplitMode | str | None) – analysis mode. This parameter is deprecated. Pass the analysis mode when creating the Tokenizer, and create separate tokenizers for different modes. If you need multi-level splitting, prefer using the
Morpheme.split() method instead.
logger – argument kept for v0.5.* compatibility. Ignored.
out (MorphemeList) – if given, tokenization results are written into this MorphemeList instead of a newly created one. See https://worksapplications.github.io/sudachi.rs/python/topics/out_param.html for details.
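The parameters above can be sketched in use as follows. This is an illustrative sketch under stated assumptions, not the library's recommended pattern: surfaces_per_line and split_surfaces are hypothetical helpers, and the code assumes that a MorphemeList can be iterated, that each morpheme has a surface() method, and that Morpheme.split() takes the target split mode as its first argument. The first helper reuses a single MorphemeList through out; the second uses Morpheme.split() for multi-level splitting instead of the deprecated mode parameter.

```python
def surfaces_per_line(tokenizer, lines):
    """Tokenize each line, reusing one MorphemeList through `out`.

    Hypothetical helper: `tokenizer` is expected to be a Tokenizer
    obtained from Dictionary.create.
    """
    results = []
    morphemes = None
    for line in lines:
        if morphemes is None:
            # First call: let tokenize() create the MorphemeList.
            morphemes = tokenizer.tokenize(line)
        else:
            # Later calls: write the results into the existing list.
            tokenizer.tokenize(line, out=morphemes)
        # Copy the surfaces out before the next call overwrites the list.
        results.append([m.surface() for m in morphemes])
    return results


def split_surfaces(morphemes, finer_mode):
    """For each morpheme, return the surfaces of its finer-grained split.

    Hypothetical helper: `finer_mode` is expected to be a SplitMode
    value such as SplitMode.A.
    """
    return [[sub.surface() for sub in m.split(finer_mode)] for m in morphemes]
```

Note that surfaces_per_line copies the surface strings on every iteration, because the contents of the reused list are replaced by the next tokenize() call.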