Class OrdinalMap

java.lang.Object
org.apache.lucene.index.OrdinalMap
All Implemented Interfaces:
Accountable

public class OrdinalMap extends Object implements Accountable
Maps per-segment ordinals to/from global ordinal space, using a compact packed-ints representation.

NOTE: building this map is a costly operation, as it must merge-sort all terms, and the result may require non-trivial RAM. When possible, it is better to operate in segment-private ordinal space instead.

  • Field Details

    • BASE_RAM_BYTES_USED

      private static final long BASE_RAM_BYTES_USED
    • owner

      public final IndexReader.CacheKey owner
      Cache key of whoever asked for this awful thing
    • valueCount

      final long valueCount
    • globalOrdDeltas

      final LongValues globalOrdDeltas
    • firstSegments

      final LongValues firstSegments
    • segmentToGlobalOrds

      final LongValues[] segmentToGlobalOrds
    • segmentMap

      final OrdinalMap.SegmentMap segmentMap
    • ramBytesUsed

      final long ramBytesUsed
  • Constructor Details

    • OrdinalMap

      OrdinalMap(IndexReader.CacheKey owner, TermsEnum[] subs, OrdinalMap.SegmentMap segmentMap, float acceptableOverheadRatio) throws IOException
      Here is how the OrdinalMap encodes the mapping from local segment ords to global ords. Assume we have the following global mapping for a doc values field:
      bar -> 0, cat -> 1, dog -> 2, foo -> 3
      And our index is split into 2 segments with the following local mappings for that same doc values field:
      Segment 0: bar -> 0, foo -> 1
      Segment 1: cat -> 0, dog -> 1
      We then encode the delta between the global and local ordinal (globalOrd - segmentOrd) in a packed 2D array keyed by (segmentIndex, segmentOrd). OrdinalMap therefore creates the following 2D array:
      [[0, 2], [1, 1]]
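
The example above can be reproduced with a few lines of plain Java. This is only an illustrative sketch: the class and method names (OrdDeltaExample, computeDeltas, toGlobalOrd) are hypothetical, and plain long[][] arrays stand in for the packed-ints structures that OrdinalMap actually uses.

```java
import java.util.Arrays;

class OrdDeltaExample {
    // Global ordinals for the example terms: bar -> 0, cat -> 1, dog -> 2, foo -> 3.
    static final String[] GLOBAL_TERMS = {"bar", "cat", "dog", "foo"};

    // Sorted terms of each segment; the position in each array is the segment ordinal.
    static final String[][] SEGMENT_TERMS = {{"bar", "foo"}, {"cat", "dog"}};

    /** Returns deltas[segmentIndex][segmentOrd] = globalOrd - segmentOrd. */
    static long[][] computeDeltas(String[] globalTerms, String[][] segmentTerms) {
        long[][] deltas = new long[segmentTerms.length][];
        for (int seg = 0; seg < segmentTerms.length; seg++) {
            deltas[seg] = new long[segmentTerms[seg].length];
            for (int segOrd = 0; segOrd < segmentTerms[seg].length; segOrd++) {
                // globalTerms is sorted, so the index found is the global ordinal.
                int globalOrd = Arrays.binarySearch(globalTerms, segmentTerms[seg][segOrd]);
                deltas[seg][segOrd] = globalOrd - segOrd;
            }
        }
        return deltas;
    }

    /** Recovers the global ordinal from a segment ordinal and its stored delta. */
    static long toGlobalOrd(long[][] deltas, int segmentIndex, int segmentOrd) {
        return segmentOrd + deltas[segmentIndex][segmentOrd];
    }

    public static void main(String[] args) {
        long[][] deltas = computeDeltas(GLOBAL_TERMS, SEGMENT_TERMS);
        System.out.println(Arrays.deepToString(deltas)); // [[0, 2], [1, 1]]
        System.out.println(toGlobalOrd(deltas, 0, 1));   // foo in segment 0 -> 3
    }
}
```

Storing deltas rather than absolute global ordinals keeps the stored values small, which is what makes a packed representation compact.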

      The general algorithm for creating an OrdinalMap (skipping over some implementation details and optimizations) is as follows:

      [1] Create and populate a PQ (priority queue) with (TermsEnum, index) tuples, where index is the position of the TermsEnum in an array of TermsEnums sorted by descending size. The PQ itself is ordered by TermsEnum.term().

      [2] We now iterate through every term in the index. To do so, we start with the first term at the top of the PQ. We keep track of a global ord, and track the difference between the global ord and TermsEnum.ord() in ordDeltas, which maps:
      (segmentIndex, TermsEnum.ord()) -> globalTermOrdinal - TermsEnum.ord()
      We then call BytesRefIterator.next() and update the PQ to continue iterating (remember that the PQ maintains an order based on TermsEnum.term(), which changes on each next() call). If the current term exists in some other segment, the top of the queue will contain that segment. If not, the top of the queue will contain a segment with the next term in the index, and the global ord is incremented.

      [3] We use some information gathered in the previous step to optimize memory usage and build time in the following steps. For more detail on those optimizations, look at the code.

      [4] We then populate segmentToGlobalOrds, which maps (segmentIndex, segmentOrd) -> globalOrd. Using the deltas tracked in ordDeltas, this mapping is relatively easy to construct.
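
Steps [1], [2] and [4] can be sketched in plain Java as follows. This is a simplified illustration, not the actual implementation: Cursor is a hypothetical stand-in for TermsEnum that iterates over a sorted String[], the ordering of segments by descending size and the optimizations of step [3] are skipped, and plain arrays replace the packed structures.

```java
import java.util.PriorityQueue;

class OrdinalMergeSketch {
    /** Hypothetical stand-in for a TermsEnum: a cursor over one segment's sorted terms. */
    static final class Cursor {
        final int segmentIndex;
        final String[] terms;
        int ord = 0; // current segment ordinal

        Cursor(int segmentIndex, String[] terms) {
            this.segmentIndex = segmentIndex;
            this.terms = terms;
        }

        String term() { return terms[ord]; }

        /** Advances to the next term; returns false when the segment is exhausted. */
        boolean next() { return ++ord < terms.length; }
    }

    /** Returns segmentToGlobal[segmentIndex][segmentOrd] = globalOrd. */
    static long[][] build(String[][] segments) {
        // Step [1]: a PQ of cursors ordered by their current term.
        PriorityQueue<Cursor> pq =
            new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));
        long[][] ordDeltas = new long[segments.length][];
        for (int i = 0; i < segments.length; i++) {
            ordDeltas[i] = new long[segments[i].length];
            if (segments[i].length > 0) {
                pq.add(new Cursor(i, segments[i]));
            }
        }

        // Step [2]: visit terms in sorted order, recording globalOrd - segmentOrd.
        long globalOrd = 0;
        while (!pq.isEmpty()) {
            String current = pq.peek().term();
            // Every cursor positioned on the current term shares one global ord.
            while (!pq.isEmpty() && pq.peek().term().equals(current)) {
                Cursor top = pq.poll();
                ordDeltas[top.segmentIndex][top.ord] = globalOrd - top.ord;
                if (top.next()) {
                    pq.add(top); // re-insert so the PQ order reflects the new term
                }
            }
            globalOrd++;
        }

        // Step [4]: turn the deltas into a (segmentIndex, segmentOrd) -> globalOrd map.
        long[][] segmentToGlobal = new long[segments.length][];
        for (int seg = 0; seg < segments.length; seg++) {
            segmentToGlobal[seg] = new long[ordDeltas[seg].length];
            for (int ord = 0; ord < ordDeltas[seg].length; ord++) {
                segmentToGlobal[seg][ord] = ord + ordDeltas[seg][ord];
            }
        }
        return segmentToGlobal;
    }

    public static void main(String[] args) {
        long[][] map = build(new String[][] {{"bar", "foo"}, {"cat", "dog"}});
        System.out.println(java.util.Arrays.deepToString(map)); // [[0, 3], [1, 2]]
    }
}
```

The cursor is removed from the queue before it is advanced and then re-inserted, because java.util.PriorityQueue does not reorder an element whose sort key changes while it is still in the queue.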

      Parameters:
      owner - For caching purposes
      subs - A TermsEnum[], where each index corresponds to a segment
      segmentMap - Provides two maps: newToOld, which lists segments in descending 'weight' order (see OrdinalMap.SegmentMap for more details), and oldToNew, which maps each original segment index to its position in newToOld
      acceptableOverheadRatio - Acceptable overhead memory usage for some packed data structures
      Throws:
      IOException - if an I/O error occurs while reading terms from subs
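
The relationship between the two maps described for segmentMap can be illustrated with a small sketch. The names SegmentMapSketch, newToOld and inverse are hypothetical helpers written for this illustration, and the weights are made-up values. See OrdinalMap.SegmentMap for how the real maps are built.

```java
import java.util.Arrays;

class SegmentMapSketch {
    /** Lists original segment indexes in descending weight order (ties keep original order). */
    static int[] newToOld(long[] weights) {
        Integer[] indexes = new Integer[weights.length];
        for (int i = 0; i < indexes.length; i++) {
            indexes[i] = i;
        }
        // Sorting an Object[] with Arrays.sort is stable, so equal weights keep their order.
        Arrays.sort(indexes, (a, b) -> Long.compare(weights[b], weights[a]));
        int[] result = new int[indexes.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = indexes[i];
        }
        return result;
    }

    /** Inverts a permutation: inverse(newToOld) maps each original index to its new position. */
    static int[] inverse(int[] map) {
        int[] inverse = new int[map.length];
        for (int i = 0; i < map.length; i++) {
            inverse[map[i]] = i;
        }
        return inverse;
    }

    public static void main(String[] args) {
        long[] weights = {10, 30, 20}; // made-up weights for segments 0, 1, 2
        int[] newToOld = newToOld(weights);
        int[] oldToNew = inverse(newToOld);
        System.out.println(Arrays.toString(newToOld)); // [1, 2, 0]
        System.out.println(Arrays.toString(oldToNew)); // [2, 0, 1]
    }
}
```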
  • Method Details