sudachipy.tokenizer package
Note
- Import from sudachipy.tokenizer is deprecated. Use from sudachipy import Tokenizer instead. You can also import SplitMode: from sudachipy import SplitMode.
Module contents
- class sudachipy.tokenizer.Tokenizer
A Sudachi tokenizer.
Create one using the Dictionary.create method.
- SplitMode = SplitMode.C
- mode
SplitMode of the tokenizer.
- tokenize(self, /, text: str, mode=None, logger=None, out=None) -> MorphemeList
Break text into morphemes.
- Parameters:
text (str) – text to analyze.
mode (SplitMode | str | None) – analysis mode. This parameter is deprecated. Pass the analysis mode when creating the Tokenizer, and create a separate tokenizer for each mode. If you need multi-level splitting, prefer the Morpheme.split() method instead.
logger – argument kept for v0.5.* compatibility. Ignored.
out (MorphemeList) – if given, tokenization results are written into this MorphemeList; otherwise a new one is created. See https://worksapplications.github.io/sudachi.rs/python/topics/out_param.html for details.
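
The recommendations above (fix the mode at creation time, reuse a MorphemeList through out, and use Morpheme.split() for multi-level splitting) might be combined as in the following minimal sketch. It assumes that SudachiPy and a dictionary package such as sudachidict_core are installed so that Dictionary() can load its default dictionary. The helper names tokenize_lines and split_into_short_units are hypothetical and are not part of the library.

```python
def tokenize_lines(lines, mode=None):
    """Return a list of surface-form lists, one per input line.

    Hypothetical helper, not part of SudachiPy.
    """
    # Imported lazily so that defining this helper does not itself require
    # SudachiPy and a dictionary package to be installed.
    from sudachipy import Dictionary, SplitMode

    if mode is None:
        mode = SplitMode.C

    # The split mode is fixed when the tokenizer is created, rather than
    # passed to every tokenize() call (that parameter is deprecated).
    tokenizer = Dictionary().create(mode=mode)

    results = []
    morphemes = None  # first call creates the list, later calls reuse it
    for line in lines:
        morphemes = tokenizer.tokenize(line, out=morphemes)
        # Copy the surfaces out now: the list is overwritten by the next call.
        results.append([m.surface() for m in morphemes])
    return results


def split_into_short_units(text):
    """Tokenize in mode C, then split every morpheme into mode A units.

    Hypothetical helper, not part of SudachiPy.
    """
    from sudachipy import Dictionary, SplitMode

    tokenizer = Dictionary().create(mode=SplitMode.C)
    pairs = []
    for morpheme in tokenizer.tokenize(text):
        # Morpheme.split() gives multi-level splitting without a second tokenizer.
        short_units = [m.surface() for m in morpheme.split(SplitMode.A)]
        pairs.append((morpheme.surface(), short_units))
    return pairs
```

Calling tokenize_lines(["東京都に行く"]) would return one list of surface strings for that line. The exact segmentation depends on the installed dictionary, so no output is shown here.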