Class ICUNormalizer2CharFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Readable

public final class ICUNormalizer2CharFilter extends BaseCharFilter
Normalizes the input character stream with ICU's Normalizer2 before it reaches the tokenizer.
  • Field Details

    • normalizer

      private final com.ibm.icu.text.Normalizer2 normalizer
    • inputBuffer

      private final StringBuilder inputBuffer
    • resultBuffer

      private final StringBuilder resultBuffer
    • inputFinished

      private boolean inputFinished
    • afterQuickCheckYes

      private boolean afterQuickCheckYes
    • checkedInputBoundary

      private int checkedInputBoundary
    • charCount

      private int charCount
    • tmpBuffer

      private final CharacterUtils.CharacterBuffer tmpBuffer
  • Constructor Details

    • ICUNormalizer2CharFilter

      public ICUNormalizer2CharFilter(Reader in)
      Creates a new ICUNormalizer2CharFilter that applies NFKC_Casefold: NFKC normalization, case folding, and removal of default ignorables.
    • ICUNormalizer2CharFilter

      public ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer)
      Creates a new ICUNormalizer2CharFilter with the specified Normalizer2.
      Parameters:
      in - reader supplying the text to normalize
      normalizer - normalizer to use
    • ICUNormalizer2CharFilter

      ICUNormalizer2CharFilter(Reader in, com.ibm.icu.text.Normalizer2 normalizer, int bufferSize)
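To give a feel for the kind of mapping the default constructor applies, the sketch below approximates NFKC_Casefold using only JDK classes. It is not the ICU implementation: `java.text.Normalizer` plus `toLowerCase(Locale.ROOT)` stands in for ICU's Normalizer2, it does not remove default ignorables, and lowercasing differs from full case folding for some characters. The class and method names (`NfkcCasefoldSketch`, `approximateNfkcCasefold`) are hypothetical and exist only for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

public class NfkcCasefoldSketch {

    // Approximation only: JDK NFKC normalization followed by
    // locale-independent lowercasing. ICU's NFKC_Casefold additionally
    // removes default ignorables and uses full case folding.
    public static String approximateNfkcCasefold(String text) {
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);
        return nfkc.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Fullwidth "ABC" (U+FF21..U+FF23) maps to ASCII, then lowercases.
        System.out.println(approximateNfkcCasefold("\uFF21\uFF22\uFF23")); // prints "abc"
        // The "fi" ligature (U+FB01) expands to two letters.
        System.out.println(approximateNfkcCasefold("\uFB01LE")); // prints "file"
    }
}
```

Because the filter performs this kind of mapping on the character stream, a tokenizer placed after it sees only normalized text.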
  • Method Details

    • read

      public int read(char[] cbuf, int off, int len) throws IOException
      Specified by:
      read in class Reader
      Throws:
      IOException
    • readInputToBuffer

      private void readInputToBuffer() throws IOException
      Throws:
      IOException
    • readAndNormalizeFromInput

      private int readAndNormalizeFromInput()
    • readFromInputWhileSpanQuickCheckYes

      private int readFromInputWhileSpanQuickCheckYes()
    • readFromIoNormalizeUptoBoundary

      private int readFromIoNormalizeUptoBoundary()
    • normalizeInputUpto

      private int normalizeInputUpto(int length)
    • recordOffsetDiff

      private void recordOffsetDiff(int inputLength, int outputLength)
    • outputFromResultBuffer

      private int outputFromResultBuffer(char[] cbuf, int begin, int len)
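Since the only public method is `read(char[] cbuf, int off, int len)`, inherited from the `Reader` contract, callers consume the filter like any other reader. The sketch below shows that read loop using only JDK classes; a `StringReader` stands in for the filter so the example runs without Lucene or ICU4J on the classpath. The class and method names (`ReaderDrainSketch`, `drain`) are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderDrainSketch {

    // Reads everything from the given reader through the
    // read(char[], int, int) method and returns it as a String.
    public static String drain(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] cbuf = new char[4]; // deliberately small to force several reads
        int n;
        // read returns the number of chars stored, or -1 at end of stream
        while ((n = reader.read(cbuf, 0, cbuf.length)) != -1) {
            out.append(cbuf, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input; a char filter would be passed here the same way.
        try (Reader in = new StringReader("normalize me")) {
            System.out.println(drain(in)); // prints "normalize me"
        }
    }
}
```

With Lucene's ICU module available, passing `new ICUNormalizer2CharFilter(reader)` to `drain` should work the same way, because the class is a `Reader` subclass; the returned text would then be the normalized form of the input.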