sudachipy.tokenizer package

Note

  • Import from sudachipy.tokenizer is deprecated.
    • Use from sudachipy import Tokenizer instead.

    • You can also import SplitMode: from sudachipy import SplitMode.

Module contents

class sudachipy.tokenizer.Tokenizer

A Sudachi tokenizer.

Create one with the Dictionary.create method.
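As a rough sketch of the creation step, the helper below builds one tokenizer per split mode from a single dictionary. The helper name tokenizers_by_mode and the label-to-mode mapping are illustrative assumptions, not part of SudachiPy; the only library call relied on is Dictionary.create with its mode argument.

```python
def tokenizers_by_mode(dictionary, modes):
    """Create one tokenizer per split mode from a single dictionary.

    `dictionary` is expected to be a sudachipy.Dictionary; `modes` maps a
    label of your choice to a SplitMode value. This helper is hypothetical
    and exists only to illustrate the call to Dictionary.create.
    """
    return {label: dictionary.create(mode=mode) for label, mode in modes.items()}


# Intended usage, assuming SudachiPy and a dictionary package such as
# sudachidict_core are installed:
#
#   from sudachipy import Dictionary, SplitMode
#   tokenizers = tokenizers_by_mode(
#       Dictionary(), {"short": SplitMode.A, "long": SplitMode.C}
#   )
```

Because each tokenizer has a fixed mode, nothing needs to be passed to tokenize() later to select it.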

SplitMode = SplitMode.C
mode

SplitMode of the tokenizer.

tokenize(self, /, text: str, mode=None, logger=None, out=None) → MorphemeList

Break text into morphemes.

Parameters:
  • text (str) – text to analyze.

  • mode (SplitMode | str | None) – analysis mode. This parameter is deprecated. Pass the analysis mode when creating the Tokenizer, and create a separate tokenizer for each mode. If you need multi-level splitting, prefer the Morpheme.split() method.

  • logger – argument kept for v0.5.* compatibility. Ignored.

  • out (MorphemeList) – if given, tokenization results are written into this MorphemeList; otherwise a new one is created. See https://worksapplications.github.io/sudachi.rs/python/topics/out_param.html for details.
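The parameters above can be illustrated with a minimal sketch. The two helper functions are hypothetical names introduced here for illustration; they rely only on tokenize(text, out=...), Morpheme.surface() and Morpheme.split(mode). The first reuses a single MorphemeList across calls through out, and the second re-splits existing morphemes instead of passing the deprecated mode argument to tokenize().

```python
def surfaces_per_line(tokenizer, lines):
    """Tokenize each line while reusing one MorphemeList via `out`."""
    morphemes = None
    result = []
    for line in lines:
        if morphemes is None:
            # First call: no `out` given, so a new MorphemeList is created.
            morphemes = tokenizer.tokenize(line)
        else:
            # Later calls: results are written into the existing list.
            tokenizer.tokenize(line, out=morphemes)
        # Copy the surfaces now, because the next call overwrites `morphemes`.
        result.append([m.surface() for m in morphemes])
    return result


def split_surfaces(morphemes, mode):
    """Re-split each morpheme with Morpheme.split() in the given mode."""
    return [[part.surface() for part in m.split(mode)] for m in morphemes]


# Intended usage, assuming SudachiPy and a dictionary package are installed:
#
#   from sudachipy import Dictionary, SplitMode
#   tokenizer = Dictionary().create(mode=SplitMode.C)
#   print(surfaces_per_line(tokenizer, ["first line", "second line"]))
#   print(split_surfaces(tokenizer.tokenize("some text"), SplitMode.A))
```

Reusing the list trades convenience for fewer allocations, so anything needed from one call has to be copied out before the next call.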