sudachi::input_text

Struct InputBuffer

pub struct InputBuffer { /* private fields */ }

InputBuffer prepares the input data for analysis.

Throughout this documentation, "char" means a Unicode code point. In the context of this struct the two terms are synonyms.

Implementations

impl InputBuffer

pub fn new() -> InputBuffer

Creates a new InputBuffer.

pub fn reset(&mut self) -> &mut String

Resets the input buffer so that it can be used to process new input. Write the new input to the returned mutable reference.
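
The reset-and-refill pattern can be sketched with a std-only stand-in. `ReusableBuffer` below is a hypothetical type written for illustration, not sudachi's `InputBuffer`, and its internals are assumptions; it only shows why returning `&mut String` lets the caller reuse the existing allocation for the next input.

```rust
/// Hypothetical stand-in for a reusable input buffer (not sudachi's type).
#[derive(Default)]
pub struct ReusableBuffer {
    original: String,
    // Derived data that must be discarded when new input arrives.
    chars: Vec<char>,
}

impl ReusableBuffer {
    /// Clears all state and hands back the string to be filled with new input.
    pub fn reset(&mut self) -> &mut String {
        self.original.clear(); // clear() keeps the allocated capacity
        self.chars.clear();
        &mut self.original
    }

    /// Recomputes the derived char array from the current contents.
    pub fn refresh_chars(&mut self) {
        self.chars.clear();
        self.chars.extend(self.original.chars());
    }

    /// Borrows the derived char array.
    pub fn chars(&self) -> &[char] {
        &self.chars
    }
}
```

A caller would write `buf.reset().push_str(line)` for each new line and then rebuild the derived data, so one buffer serves many inputs.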

pub fn from<T: AsRef<str>>(data: T) -> InputBuffer

Creates input from the passed string. Should be used mostly for tests.

Panics if the input string is too long.

pub fn start_build(&mut self) -> SudachiResult<()>

Moves the InputBuffer into the read-write (RW) state, making it possible to edit its contents.

pub fn build(&mut self, grammar: &Grammar<'_>) -> SudachiResult<()>

Finalizes the InputBuffer state, making it read-only (RO).

pub fn with_editor<'a, F>(&mut self, func: F) -> SudachiResult<()>

Executes a function that can modify the contents of the current buffer.

An edit can borrow &str values from the calling context, and the borrow checker handles these borrows correctly.

pub fn refresh_chars(&mut self)

Recomputes chars from the modified string (useful if subsequent processing will use chars).

impl InputBuffer

pub fn original(&self) -> &str

Borrow original data

pub fn current(&self) -> &str

Borrow modified data

pub fn current_chars(&self) -> &[char]

Borrow array of current characters

pub fn curr_byte_offsets(&self) -> &[usize]

Returns byte offsets of current chars

pub fn get_original_index(&self, index: usize) -> usize

Gets the index in the original sentence of the given byte of the current text. Bytes that are not on character boundaries are not supported.

pub fn to_orig_byte_idx(&self, index: usize) -> usize

Converts a char index in the modified text to a byte index in the original text (Mod Char Idx -> Orig Byte Idx).

pub fn to_orig_char_idx(&self, index: usize) -> usize

Converts a char index in the modified text to a char index in the original text (Mod Char Idx -> Orig Char Idx).

pub fn to_curr_byte_idx(&self, index: usize) -> usize

Converts a char index in the modified text to a byte index in the modified text (Mod Char Idx -> Mod Byte Idx).
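
The index conversions listed here are easier to picture with a small std-only model. The sketch below rewrites text one char at a time and records, for each char of the modified text, its byte offset in both the original and the modified string. `Mapped`, `rewrite`, and `fullwidth_to_ascii` are hypothetical names, and the one-char-to-one-char rewrite is a simplifying assumption; sudachi's own edits and bookkeeping may differ.

```rust
/// Result of a hypothetical char-by-char rewrite that remembers where
/// each modified char came from.
pub struct Mapped {
    pub modified: String,
    /// Per modified char (plus end sentinel): byte offset in the original.
    pub mod_char_to_orig_byte: Vec<usize>,
    /// Per modified char (plus end sentinel): byte offset in the modified text.
    pub mod_char_to_mod_byte: Vec<usize>,
}

/// Maps full-width ASCII variants (U+FF01..=U+FF5E) to plain ASCII.
pub fn fullwidth_to_ascii(c: char) -> char {
    if ('\u{FF01}'..='\u{FF5E}').contains(&c) {
        char::from_u32(c as u32 - 0xFEE0).unwrap_or(c)
    } else {
        c
    }
}

/// Applies `f` to every char and builds both offset tables.
pub fn rewrite(original: &str, f: impl Fn(char) -> char) -> Mapped {
    let mut modified = String::with_capacity(original.len());
    let mut to_orig = Vec::new();
    let mut to_mod = Vec::new();
    for (orig_byte, ch) in original.char_indices() {
        to_orig.push(orig_byte);
        to_mod.push(modified.len());
        modified.push(f(ch));
    }
    // End sentinels, so that ranges ending at the last char are valid.
    to_orig.push(original.len());
    to_mod.push(modified.len());
    Mapped {
        modified,
        mod_char_to_orig_byte: to_orig,
        mod_char_to_mod_byte: to_mod,
    }
}
```

With the original `"ＡＢc"` (two 3-byte chars and one 1-byte char) the modified text is `"ABc"`, the original-byte table is `[0, 3, 6, 7]`, and the modified-byte table is `[0, 1, 2, 3]`. The same char index therefore resolves to different byte positions depending on which text is addressed.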

pub fn curr_slice_c(&self, data: Range<usize>) -> &str

Returns a slice of the current (modified) text. Input: a range of char indices in the modified text (Mod Char Idx).

pub fn orig_slice_c(&self, data: Range<usize>) -> &str

Returns a slice of the original text. Input: a range of char indices in the modified text (Mod Char Idx).
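
Slicing a UTF-8 string by a char-index range requires translating the range to byte offsets first. A minimal std-only sketch follows; `char_byte_offsets` and `slice_by_chars` are hypothetical helpers, not sudachi functions, and the precomputed offset table is an assumption about how such a lookup could be made cheap.

```rust
use std::ops::Range;

/// Byte offset of every char in `text`, plus `text.len()` as an end sentinel.
pub fn char_byte_offsets(text: &str) -> Vec<usize> {
    text.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(text.len()))
        .collect()
}

/// Slices `text` by a range of char indices using the precomputed offsets.
/// Panics if the range is out of bounds, like ordinary slice indexing.
pub fn slice_by_chars<'a>(text: &'a str, offsets: &[usize], chars: Range<usize>) -> &'a str {
    &text[offsets[chars.start]..offsets[chars.end]]
}
```

For `"東京タワー"`, the char range `2..5` selects `"タワー"`, which is the byte range `6..15`.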

pub fn ch_idx(&self, idx: usize) -> usize

pub fn swap_original(&mut self, target: &mut String)

Swaps the original data with the contents of the passed String.

pub fn into_original(self) -> String

Returns the original data as an owned String, consuming the buffer.

pub fn can_bow(&self, offset: usize) -> bool

Whether the byte can start a new word. Supports bytes not on character boundaries.

pub fn get_word_candidate_length(&self, char_idx: usize) -> usize

Returns the length in chars to the next can_bow point.

Used by SimpleOOV plugin
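
The idea of measuring the distance to the next possible word start can be sketched as follows. This is a simplified model: `candidate_length` is a hypothetical function, and the per-char flag slice `can_start` is an assumption of the sketch (the real `can_bow` takes a byte offset). When no later word start exists, the sketch returns the distance to the end of input, which is also an assumption.

```rust
/// Number of chars from `char_idx` to the next position flagged as a
/// possible word start, or to the end of input if there is none.
/// `can_start[i]` says whether char `i` may begin a word.
/// Requires `char_idx < can_start.len()`.
pub fn candidate_length(can_start: &[bool], char_idx: usize) -> usize {
    can_start[char_idx + 1..]
        .iter()
        .position(|&flag| flag)
        .map(|pos| pos + 1)
        .unwrap_or(can_start.len() - char_idx)
}
```

With flags `[true, false, false, true, false]`, the candidate starting at char 0 spans 3 chars (up to the flag at index 3), and the candidate starting at char 3 spans the remaining 2 chars.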

Trait Implementations

impl Clone for InputBuffer

fn clone(&self) -> InputBuffer

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Default for InputBuffer

fn default() -> InputBuffer

Returns the “default value” for a type.

impl InputTextIndex for InputBuffer

fn cat_of_range(&self, range: Range<usize>) -> CategoryType

Common character category inside the range. Indexed by chars.

fn cat_at_char(&self, offset: usize) -> CategoryType

Character category at char offset

fn cat_continuous_len(&self, offset: usize) -> usize

Number of chars to the right of the offset with the same character category.
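
Counting a run of same-category chars can be sketched with a toy classifier. `ToyCategory`, `categorize`, and `continuous_len` are hypothetical and far simpler than sudachi's `CategoryType`. Whether the real method counts the char at `offset` itself is not stated here; this sketch does count it.

```rust
/// Toy character categories, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ToyCategory {
    Digit,
    Alpha,
    Other,
}

/// Assigns a toy category to a char.
pub fn categorize(c: char) -> ToyCategory {
    if c.is_ascii_digit() {
        ToyCategory::Digit
    } else if c.is_alphabetic() {
        ToyCategory::Alpha
    } else {
        ToyCategory::Other
    }
}

/// Number of consecutive chars, starting at `offset` and including it,
/// that share the category of `chars[offset]`.
pub fn continuous_len(chars: &[char], offset: usize) -> usize {
    let first = categorize(chars[offset]);
    chars[offset..]
        .iter()
        .take_while(|&&c| categorize(c) == first)
        .count()
}
```

For the chars of `"2024-abc"`, the run starting at offset 0 has length 4 (the digits), the run at offset 4 has length 1 (the hyphen), and the run at offset 5 has length 3 (the letters).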

fn char_distance(&self, cpt: usize, offset: usize) -> usize

Distance in chars between the char at the given index (the cpt argument) and the char positioned offset away from it. Java name: getCodePointsOffsetLength.

fn orig_slice(&self, range: Range<usize>) -> &str

Returns substring of original text by indices from the current text

fn curr_slice(&self, range: Range<usize>) -> &str

Returns substring of the current (modified) text by indices from the current text

fn to_orig(&self, range: Range<usize>) -> Range<usize>

Translate range from current state to original. Byte-indexed.
