sudachipy.tokenizer package

Note

Importing from sudachipy.tokenizer is deprecated.
- Import Tokenizer from the top-level package instead: from sudachipy import Tokenizer.
- SplitMode can be imported the same way: from sudachipy import SplitMode.

class sudachipy.tokenizer.Tokenizer

A Sudachi tokenizer.

Create an instance with the Dictionary.create method.
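
A minimal sketch of creating tokenizers through Dictionary.create, one per split mode. It assumes SudachiPy and a dictionary package (for example sudachidict_core) are installed. The helper name make_tokenizer is hypothetical and not part of the library, and the try/except guards exist only so the sketch runs without failing where those packages are missing.

```python
def make_tokenizer(mode_name="C"):
    """Return a Tokenizer fixed to split mode A, B, or C.

    Hypothetical helper. Returns None when SudachiPy or a dictionary
    package is not installed, so the sketch can be run anywhere.
    """
    try:
        from sudachipy import Dictionary, SplitMode
    except ImportError:
        return None
    mode = getattr(SplitMode, mode_name)  # SplitMode.A, SplitMode.B, or SplitMode.C
    try:
        # Dictionary() with no arguments loads the default installed dictionary.
        return Dictionary().create(mode=mode)
    except Exception:
        # Most likely no dictionary package is installed.
        return None


tokenizer_a = make_tokenizer("A")  # shortest units
tokenizer_c = make_tokenizer("C")  # longest units
```

Fixing the mode at creation time, as here, is the approach the mode parameter's deprecation note recommends over passing a mode to each tokenize call.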

tokenize(self, /, text: str, mode=None, logger=None, out=None) → MorphemeList

Break text into morphemes.

Parameters:

text (str) – text to analyze.
mode (SplitMode | str | None) – analysis mode. This parameter is deprecated. Pass the analysis mode when creating the Tokenizer, and create a separate tokenizer for each mode. If you need multi-level splitting, prefer the Morpheme.split() method.
logger – Arg for v0.5.* compatibility. Ignored.
out (MorphemeList) – if provided, tokenization results are written into this MorphemeList; otherwise a new one is created. See https://worksapplications.github.io/sudachi.rs/python/topics/out_param.html for details.
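
The parameters above can be sketched in use as follows. The example tokenizes once in mode C, uses Morpheme.split() for the finer level instead of the deprecated mode argument, and then passes an existing MorphemeList as out. It assumes SudachiPy and a dictionary package are installed. The function name split_levels and the sample text are illustrative only, and the broad except clause is there just so the sketch does not fail where the packages are missing.

```python
def split_levels(text):
    """Tokenize text in mode C, then split each morpheme into mode A units.

    Hypothetical helper. Returns (coarse, fine) lists of surface strings,
    or None when SudachiPy or a dictionary package is not installed.
    """
    try:
        from sudachipy import Dictionary, SplitMode
        tokenizer = Dictionary().create(mode=SplitMode.C)
    except Exception:
        return None

    morphemes = tokenizer.tokenize(text)
    coarse = [m.surface() for m in morphemes]
    # Morpheme.split() re-splits one morpheme without a second tokenizer.
    fine = [sub.surface() for m in morphemes for sub in m.split(SplitMode.A)]

    # Passing an existing MorphemeList as out writes the new results into it
    # rather than allocating a fresh list.
    tokenizer.tokenize(text, out=morphemes)
    return coarse, fine


result = split_levels("国家公務員")
```

Reusing a list through out is mainly of interest when tokenizing many inputs in a loop; for a single call, omitting it is simpler.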