Class TextAnalyzerProperties

java.lang.Object
com.arangodb.entity.arangosearch.analyzer.TextAnalyzerProperties

public final class TextAnalyzerProperties extends Object
Author:
Michele Rastelli
  • Constructor Details

    • TextAnalyzerProperties

      public TextAnalyzerProperties()
  • Method Details

    • getLocale

      public String getLocale()
      Returns:
      a locale in the format `language[_COUNTRY][.encoding][@variant]` (square brackets denote optional parts), e.g. `de.utf-8` or `en_US.utf-8`. Only UTF-8 encoding is meaningful in ArangoDB.
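
      To make the locale format concrete, here is a minimal sketch that splits such a string into its parts with a regular expression. `LocaleParts`, its fields and `isUtf8()` are hypothetical illustrations, not part of the ArangoDB driver. The character sets accepted for each part are assumptions; the server performs its own validation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the ArangoDB driver: splits a locale string of the
// form language[_COUNTRY][.encoding][@variant]. The character sets are assumptions.
final class LocaleParts {
    private static final Pattern LOCALE = Pattern.compile(
            "([A-Za-z]+)"                 // language (mandatory)
            + "(?:_([A-Za-z]+))?"         // _COUNTRY (optional)
            + "(?:\\.([A-Za-z0-9-]+))?"   // .encoding (optional)
            + "(?:@([A-Za-z0-9]+))?");    // @variant (optional)

    final String language;
    final String country;  // null when absent
    final String encoding; // null when absent
    final String variant;  // null when absent

    private LocaleParts(String language, String country, String encoding, String variant) {
        this.language = language;
        this.country = country;
        this.encoding = encoding;
        this.variant = variant;
    }

    static LocaleParts parse(String locale) {
        Matcher m = LOCALE.matcher(locale);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized locale: " + locale);
        }
        return new LocaleParts(m.group(1), m.group(2), m.group(3), m.group(4));
    }

    // Only UTF-8 is meaningful in ArangoDB, so a caller may want to check for it explicitly
    boolean isUtf8() {
        return "utf-8".equalsIgnoreCase(encoding);
    }
}
```

      For example, parsing "en_US.utf-8" yields the language "en", the country "US" and the encoding "utf-8", with no variant.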
      See Also:
    • setLocale

      public void setLocale(String locale)
    • isAccent

      public boolean isAccent()
      Returns:
      true to preserve accented characters (default), false to convert accented characters to their base characters
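
      The effect of accent: false can be approximated in plain Java by Unicode canonical decomposition (NFD) followed by removal of the combining marks. This is only an illustrative sketch with a hypothetical class name; ArangoDB performs the conversion on the server with its own implementation, and results may differ. For instance, letters such as ø or ß have no canonical decomposition and are left unchanged by this sketch.

```java
import java.text.Normalizer;

// Hypothetical illustration, not the server's algorithm: approximates accent: false
// by decomposing characters (NFD) and dropping the combining marks.
final class AccentFolding {
    static String fold(String token, boolean accent) {
        if (accent) {
            return token; // accent: true keeps accented characters untouched
        }
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        // \p{M} matches combining marks such as U+0301 COMBINING ACUTE ACCENT
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```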
    • setAccent

      public void setAccent(boolean accent)
    • getAnalyzerCase

      public SearchAnalyzerCase getAnalyzerCase()
    • setAnalyzerCase

      public void setAnalyzerCase(SearchAnalyzerCase analyzerCase)
      Parameters:
      analyzerCase - defaults to SearchAnalyzerCase.lower
    • isStemming

      public boolean isStemming()
      Returns:
      true to apply stemming on returned words (default), false to leave the tokenized words as-is
    • setStemming

      public void setStemming(boolean stemming)
    • getEdgeNgram

      public EdgeNgram getEdgeNgram()
      Returns:
      If present, edge n-grams are generated for each token (word). That is, the start of each n-gram is anchored to the beginning of the token, whereas the ngram Analyzer produces all possible substrings of a single input token (within the defined length restrictions). Edge n-grams can be used to cover word-based auto-completion queries with an index. In that case, you should also set the following options:
        - accent: false
        - case: SearchAnalyzerCase.lower
        - stemming: false
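
      The difference between edge n-grams and the ngram Analyzer can be sketched with two small functions operating on a single token: edge n-grams are the prefixes of the token whose length lies between a minimum and a maximum, while n-grams are all substrings within those bounds. The names `NgramSketch`, `edgeNgrams`, `allNgrams`, `min`, `max` and `preserveOriginal` are illustrative. The handling of `preserveOriginal` (also emitting the whole token when its length falls outside the bounds) is an assumption about the server option, not something stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the server's implementation. Assumes 1 <= min <= max
// and operates on UTF-16 code units, so it is exact only for BMP text.
final class NgramSketch {
    // Edge n-grams: prefixes of the token with length in [min, max]
    static List<String> edgeNgrams(String token, int min, int max, boolean preserveOriginal) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(max, token.length());
        for (int len = min; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        // Assumed preserveOriginal semantics: also emit the whole token when its
        // length falls outside [min, max], since no prefix above covers it then
        if (preserveOriginal && (token.length() < min || token.length() > max)) {
            grams.add(token);
        }
        return grams;
    }

    // For contrast: every substring with length in [min, max], as an ngram Analyzer would produce
    static List<String> allNgrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = min; len <= max && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }
}
```

      With min = 2 and max = 4, edgeNgrams("search", 2, 4, false) returns [se, sea, sear], so a user who has typed "sea" matches an indexed prefix. allNgrams("search", 2, 4) additionally contains substrings such as "ear" and "rch" that do not start at the beginning of the word.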
    • setEdgeNgram

      public void setEdgeNgram(EdgeNgram edgeNgram)
    • getStopwords

      public List<String> getStopwords()
      Returns:
      a list of words to omit from the result. Default: load words from stopwordsPath. To disable stop-word filtering, provide an empty list (serialized as the empty array []). If both stopwords and stopwordsPath are provided, both word sources are combined.
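
      The rules for choosing the stop-word sources, together with the rule stated under getStopwordsPath that explicit stopwords suppress loading from files unless stopwordsPath is also set, can be expressed as a small resolution function. This is a hedged sketch: `StopwordResolution`, `effectiveStopwords`, `removeStopwords` and the `loadFromFiles` supplier are hypothetical stand-ins for what the server does, and null is assumed to mean "option not set".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical model of the stop-word rules, not the server's implementation.
// null means "option not set"; loadFromFiles stands in for reading the files under
// stopwordsPath (or its default location).
final class StopwordResolution {
    static Set<String> effectiveStopwords(List<String> stopwords, String stopwordsPath,
                                          Supplier<List<String>> loadFromFiles) {
        Set<String> result = new LinkedHashSet<>();
        if (stopwords == null) {
            // Default: words are loaded from stopwordsPath
            result.addAll(loadFromFiles.get());
            return result;
        }
        // Explicit list; an empty list without stopwordsPath yields no stop-words (filtering disabled)
        result.addAll(stopwords);
        if (stopwordsPath != null) {
            // Both sources provided: combine them
            result.addAll(loadFromFiles.get());
        }
        return result;
    }

    // Drops every token that is a stop-word
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```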
    • setStopwords

      public void setStopwords(List<String> stopwords)
    • getStopwordsPath

      public String getStopwordsPath()
      Returns:
      a path with a language sub-directory (e.g. en for a locale en_US.utf-8) containing files with words to omit. Each word has to be on a separate line. Everything after the first whitespace character on a line is ignored and can be used for comments. The files can be named arbitrarily and can have any file extension (or none).

      Default: if no path is provided, the value of the environment variable IRESEARCH_TEXT_STOPWORD_PATH is used to determine the path; if that variable is undefined, the current working directory is assumed. If the stopwords attribute is provided, no stop-words are loaded from files unless an explicit stopwordsPath is also provided.

      Note that if the stopwordsPath cannot be accessed, is missing language sub-directories, or has no files for a language required by an Analyzer, the creation of a new Analyzer is refused. If such an issue is discovered for an existing Analyzer during startup, the server aborts with a fatal error.
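
      The stopwordsPath rules above (lookup order for the base directory, the language sub-directory, and the one-word-per-line file format) can be modelled with two small functions. This is a hedged sketch rather than the server's implementation: `StopwordFiles`, `resolveDirectory` and `parseFile` are hypothetical names, and the environment value and working directory are passed in as parameters so the rules can be checked in isolation. Deriving the sub-directory from the part of the locale before the first `_`, `.` or `@`, and skipping lines that yield no word, are assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers, not a driver or server API: they model the stopwordsPath rules
// described above. The real lookup happens on the ArangoDB server's file system.
final class StopwordFiles {
    /**
     * Directory holding the stop-word files for a locale.
     *
     * @param stopwordsPath explicit stopwordsPath option, or null if not set
     * @param envValue      value of IRESEARCH_TEXT_STOPWORD_PATH, or null if undefined
     * @param cwd           current working directory
     * @param locale        Analyzer locale, e.g. "en_US.utf-8"
     */
    static Path resolveDirectory(String stopwordsPath, String envValue, Path cwd, String locale) {
        Path base;
        if (stopwordsPath != null) {
            base = Paths.get(stopwordsPath);
        } else if (envValue != null) {
            base = Paths.get(envValue);
        } else {
            base = cwd;
        }
        // Assumption: the sub-directory is the language part of the locale ("en" for "en_US.utf-8")
        String language = locale.split("[_.@]", 2)[0];
        return base.resolve(language);
    }

    /** Extracts the words from the content of one stop-word file. */
    static List<String> parseFile(String content) {
        List<String> words = new ArrayList<>();
        for (String line : content.split("\\R")) {
            // Everything from the first whitespace character onwards is a comment
            int end = 0;
            while (end < line.length() && !Character.isWhitespace(line.charAt(end))) {
                end++;
            }
            // Assumption: lines that yield no word (empty, or starting with whitespace) are skipped
            if (end > 0) {
                words.add(line.substring(0, end));
            }
        }
        return words;
    }
}
```

      On the server host, envValue would correspond to System.getenv("IRESEARCH_TEXT_STOPWORD_PATH").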

    • setStopwordsPath

      public void setStopwordsPath(String stopwordsPath)
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object