Class TextAnalyzerProperties
java.lang.Object
com.arangodb.entity.arangosearch.analyzer.TextAnalyzerProperties
- Author:
- Michele Rastelli
-
Constructor Summary
Constructors
- TextAnalyzerProperties()
Method Summary
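Modifier and Type     Method
boolean               equals(Object o)
SearchAnalyzerCase    getAnalyzerCase()
EdgeNgram             getEdgeNgram()
String                getLocale()
List<String>          getStopwords()
String                getStopwordsPath()
int                   hashCode()
boolean               isAccent()
boolean               isStemming()
void                  setAccent(boolean accent)
void                  setAnalyzerCase(SearchAnalyzerCase analyzerCase)
void                  setEdgeNgram(EdgeNgram edgeNgram)
void                  setLocale(String locale)
void                  setStemming(boolean stemming)
void                  setStopwords(List<String> stopwords)
void                  setStopwordsPath(String stopwordsPath)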
-
Constructor Details
-
TextAnalyzerProperties
public TextAnalyzerProperties()
-
-
Method Details
-
getLocale
- Returns:
- a locale in the format `language[_COUNTRY][.encoding][@variant]` (square brackets denote optional parts), e.g. `de.utf-8` or `en_US.utf-8`. Only UTF-8 encoding is meaningful in ArangoDB.
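The locale format above can be illustrated with a small parser. This is a minimal sketch, not part of the ArangoDB Java driver: the class name `LocaleFormatSketch` and the character classes accepted for each part are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the ArangoDB Java driver.
final class LocaleFormatSketch {

    // language[_COUNTRY][.encoding][@variant]
    // The allowed characters for each part are assumptions of this sketch.
    private static final Pattern LOCALE = Pattern.compile(
            "([A-Za-z]+)(?:_([A-Za-z]+))?(?:\\.([A-Za-z0-9-]+))?(?:@([A-Za-z0-9]+))?");

    /** Returns {language, country, encoding, variant}; absent optional parts are null. */
    static String[] parse(String locale) {
        Matcher m = LOCALE.matcher(locale);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected locale format: " + locale);
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        // The two example locales from the description above.
        for (String locale : new String[] {"de.utf-8", "en_US.utf-8"}) {
            String[] p = parse(locale);
            System.out.println(locale + " -> language=" + p[0] + ", country=" + p[1]
                    + ", encoding=" + p[2] + ", variant=" + p[3]);
        }
    }
}
```

A string accepted by this sketch would then be passed to setLocale; whether the server accepts it is decided by ArangoDB, not by this pattern.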
-
setLocale
-
isAccent
public boolean isAccent()
- Returns:
- true to preserve accented characters (default), false to convert accented characters to their base characters
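To make "convert accented characters to their base characters" concrete, the following sketch approximates the effect of accent = false using only the Java standard library. It is an assumption-laden illustration: the class `AccentSketch` is hypothetical, and the server performs its own normalization, which may differ from this approach in edge cases.

```java
import java.text.Normalizer;

// Hypothetical helper for illustration; ArangoDB does this server-side.
final class AccentSketch {

    /** Decomposes characters (NFD) and drops the combining marks that carry the accents. */
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // "café crème", written with Unicode escapes to stay encoding-independent.
        System.out.println(stripAccents("caf\u00e9 cr\u00e8me")); // prints "cafe creme"
    }
}
```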
-
setAccent
public void setAccent(boolean accent)
-
getAnalyzerCase
-
setAnalyzerCase
- Parameters:
- analyzerCase - defaults to SearchAnalyzerCase.lower
-
isStemming
public boolean isStemming()
- Returns:
- true to apply stemming on returned words (default), false to leave the tokenized words as-is
-
setStemming
public void setStemming(boolean stemming)
-
getEdgeNgram
- Returns:
- if present, then edge n-grams are generated for each token (word). That is, the start of the n-gram is
anchored to the beginning of the token, whereas the ngram Analyzer would produce all possible substrings from a
single input token (within the defined length restrictions). Edge n-grams can be used to cover word-based
auto-completion queries with an index, for which you should set the following other options:
- accent: false
- case: SearchAnalyzerCase.lower
- stemming: false
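The difference between edge n-grams and the substrings produced by an ngram Analyzer can be sketched as follows. This is a plain-Java illustration, not the server's implementation: the class `EdgeNgramSketch`, its method names, and the `min`/`max` length parameters are assumptions, and the sketch indexes by `char` rather than by Unicode code point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; n-gram generation happens in ArangoDB itself.
final class EdgeNgramSketch {

    /** Prefixes of the token with lengths min..max: every n-gram is anchored at index 0. */
    static List<String> edgeNgrams(String token, int min, int max) {
        List<String> result = new ArrayList<>();
        for (int n = min; n <= Math.min(max, token.length()); n++) {
            result.add(token.substring(0, n));
        }
        return result;
    }

    /** All substrings with lengths min..max, starting at any position in the token. */
    static List<String> allNgrams(String token, int min, int max) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int n = min; n <= max && start + n <= token.length(); n++) {
                result.add(token.substring(start, start + n));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(edgeNgrams("abcd", 2, 3)); // prints [ab, abc]
        System.out.println(allNgrams("abcd", 2, 3));  // prints [ab, abc, bc, bcd, cd]
    }
}
```

Because only prefixes are indexed, a partially typed word such as "qu" can match the token "quick", which is why edge n-grams suit auto-completion.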
-
setEdgeNgram
-
getStopwords
- Returns:
- an array of strings with words to omit from the result. Default: load words from stopwordsPath. To disable stop-word filtering, provide an empty array []. If both stopwords and stopwordsPath are provided, both word sources are combined.
-
setStopwords
-
getStopwordsPath
- Returns:
- the path to a directory with a language sub-directory (e.g. en for the locale en_US.utf-8) containing files with words to omit.
Each word has to be on a separate line. Everything after the first whitespace character on a line is ignored and can be used for comments. The files can be named arbitrarily and have any file extension (or none).
Default: if no path is provided, the value of the environment variable IRESEARCH_TEXT_STOPWORD_PATH determines the path; if that variable is undefined, the current working directory is assumed. If the stopwords attribute is provided, no stop-words are loaded from files unless an explicit stopwordsPath is also provided.
Note that if the stopwordsPath cannot be accessed, is missing language sub-directories, or has no files for a language required by an Analyzer, the creation of a new Analyzer is refused. If such an issue is discovered for an existing Analyzer during startup, the server aborts with a fatal error.
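The file format and the default path lookup described above can be sketched in plain Java. This is an illustration under stated assumptions, not the server's code: the class `StopwordFileSketch` and both method names are hypothetical, and the environment value is passed in as a parameter (in practice it would come from `System.getenv("IRESEARCH_TEXT_STOPWORD_PATH")` on the server host).

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; stop-word files are read by the ArangoDB server.
final class StopwordFileSketch {

    /** Keeps the text before the first whitespace on each line; the rest is a comment. */
    static Set<String> parseLines(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            int end = 0;
            while (end < line.length() && !Character.isWhitespace(line.charAt(end))) {
                end++;
            }
            if (end > 0) { // skip empty lines and lines that start with whitespace
                words.add(line.substring(0, end));
            }
        }
        return words;
    }

    /** Explicit path first, then the environment value, then the working directory. */
    static Path languageDirectory(String stopwordsPath, String envValue, String language) {
        String base = stopwordsPath != null ? stopwordsPath
                : envValue != null ? envValue
                : System.getProperty("user.dir");
        return Paths.get(base, language);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the", "and  a conjunction", "", "of\tpreposition");
        System.out.println(parseLines(lines)); // prints [the, and, of]
        System.out.println(languageDirectory(null, null, "en"));
    }
}
```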
-
setStopwordsPath
-
equals
-
hashCode
public int hashCode()
-