sudachipy.dictionary package

Note

  • Importing from sudachipy.dictionary is deprecated.
    • Use from sudachipy import Dictionary instead.

  • Dictionary does not provide access to the grammar and lexicon.

Module contents

class sudachipy.dictionary.Dictionary(config_path=None, resource_dir=None, dict=None, dict_type=None, *, config=None)

A sudachi dictionary

close()

Close this dictionary

create($self, mode: sudachipy.SplitMode = sudachipy.SplitMode.C) → sudachipy.Tokenizer

Creates a sudachi tokenizer.

Parameters:

mode – split mode for the created tokenizer; defaults to sudachipy.SplitMode.C

lookup($self, surface, out = None) → sudachipy.MorphemeList

Look up morphemes in the binary dictionary without performing analysis. All morphemes in the dictionary with the given surface string are returned. User dictionaries are searched first, starting with the last one, and the system dictionary is searched last. Within a single dictionary, morphemes are returned in the order in which they are stored in the binary dictionary. Morphemes that are not indexed are not returned.

Parameters:

surface – the surface string to look up

out (sudachipy.MorphemeList) – optional; defaults to None

pos_matcher(target)

Creates a POS matcher object

If target is a function, it must return whether a POS should match or not. If target is a list, it should contain partially specified POS tuples. Partially specified means that POS fields can be omitted, or None can be used as a sentinel value that matches any value in that field.

For example, ('名詞',) will match any noun and (None, None, None, None, None, '終止形‐一般') will match any word in the 終止形‐一般 conjugation form.

Parameters:

target – either a callable or a list of partial POS tuples
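The partial-match rule described above can be modelled in plain Python. This is a sketch of the semantics only, not the library's implementation: the helpers matches_partial and matches_any and the sample POS tuples are hypothetical names and values introduced for illustration.

```python
from typing import Optional, Sequence, Tuple

PartialPos = Tuple[Optional[str], ...]


def matches_partial(pos: Sequence[str], partial: PartialPos) -> bool:
    """True if every specified (non-None) field of `partial` equals the
    field at the same position in `pos`; omitted trailing fields match anything."""
    return all(want is None or want == have for want, have in zip(partial, pos))


def matches_any(pos: Sequence[str], partials: Sequence[PartialPos]) -> bool:
    """True if `pos` matches at least one partial POS tuple in the list."""
    return any(matches_partial(pos, p) for p in partials)


# Hypothetical full POS tuples, made up for this example.
noun = ("名詞", "普通名詞", "一般", "*", "*", "*")
verb = ("動詞", "一般", "*", "*", "五段-カ行", "終止形‐一般")

print(matches_any(noun, [("名詞",)]))
print(matches_any(verb, [(None, None, None, None, None, "終止形‐一般")]))
```

With the library itself, the same partial tuples would be passed as target, for example dictionary.pos_matcher([('名詞',)]), and the resulting matcher is then applied to morphemes.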

pos_of()

Get POS Tuple by its id

pre_tokenizer($self, mode, fields, handler) → tokenizers.PreTokenizer

Creates a HuggingFace Tokenizers-compatible PreTokenizer. Requires the tokenizers package to be installed.

Parameters: