WordInfo subsetting
It is possible to ask Suachi to return only a subset of fields in the WordInfo.
To do that, you can use the fields
parameter of the sudachipy.Dictionary.create()
method.
The parameter accepts a set of strings, each one representing a field to be returned.
By default, all fields are returned.
Allowed values:
surface
: in-dictionary surface word form.sudachipy.Morpheme.surface()
method returns the slice of the input text and is not affected by that flag.
pos
orpos_id
: part-of-speech tag.normalized_form
dictionary_form
reading_form
word_structure
synonym_group_id
splits_a
splits_b
Note
If you want only tokenization (e.g. use only sudachipy.Morpheme.surface()
,
passing empty set is allowed.
You need to load splits if you want to use sudachipy.Morpheme.split()
method.
If performing the tokenization with non-default mode, the required splits will be loaded automatically and can be omitted.:
dic.create(SplitMode.B, fields={}) # implicitly becomes fields={'splits_b'}
Warning
Using a field not included in the passed subset will produce an incorrect result without any warning. Use tests to ensure that the needed fields are loaded when using this parameter.