sudachipy.tokenizer package
Note
- Importing from sudachipy.tokenizer is deprecated. Use from sudachipy import Tokenizer instead. You can also import SplitMode the same way: from sudachipy import SplitMode.
Module contents
- class sudachipy.tokenizer.Tokenizer
Sudachi Tokenizer, Python version
- SplitMode = SplitMode.C
- mode: the split mode this tokenizer was created with.
- tokenize(text: str, mode=None, logger=None, out=None) → sudachipy.MorphemeList
Break text into morphemes.
SudachiPy 0.5.* had a logger parameter; it is still accepted but ignored.
- Parameters:
text (str) – text to analyze
mode (sudachipy.SplitMode) – analysis mode. This parameter is deprecated: pass the analysis mode when creating the Tokenizer, and create a separate tokenizer for each mode you need. If you need multi-level splitting, use the Morpheme.split() method instead.
out (sudachipy.MorphemeList) – if given, tokenization results are written into this MorphemeList; otherwise a new one is created. See https://worksapplications.github.io/sudachi.rs/python/topics/out_param.html for details.