Example: Use Lucene Filter on String
Jan 27, 2020
If you are looking for a simple example of how to run a string through a Lucene filter in Java, here is one.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LetterTokenizer;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

private static List<String> process(String text) throws IOException {
    // Analyzer that splits the input at non-letters, then folds each token to ASCII.
    Analyzer analyzer = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer tokenizer = new LetterTokenizer();
            return new TokenStreamComponents(
                tokenizer,
                new ASCIIFoldingFilter(tokenizer));
        }
    };
    List<String> result = new ArrayList<>();
    // The field name ("*") is arbitrary here; pass the text argument, not a string literal.
    try (TokenStream tokenStream = analyzer.tokenStream("*", text)) {
        CharTermAttribute attr =
            tokenStream.addAttribute(CharTermAttribute.class);
        tokenStream.reset();
        while (tokenStream.incrementToken()) {
            result.add(attr.toString());
        }
        // Signal end of stream before the stream is closed.
        tokenStream.end();
    } finally {
        analyzer.close();
    }
    return result;
}
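In this example, we use ASCIIFoldingFilter. It converts characters outside the basic ASCII range, such as accented letters used in languages other than English, into their ASCII equivalents where one exists. For example, "é" becomes "e". Characters without an ASCII equivalent are left unchanged.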
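To get a feel for what folding does without setting up Lucene, here is a minimal sketch that approximates the idea with the JDK only. It is not Lucene's implementation: the class name FoldingSketch and the method foldAccents are made up for illustration, and the approach (Unicode decomposition followed by removal of combining marks) only handles characters that decompose into a base letter plus accents. ASCIIFoldingFilter uses its own mapping table and covers more cases, such as "ß".

```java
import java.text.Normalizer;

// Illustrative only: approximates accent folding with the JDK, not Lucene's mapping table.
class FoldingSketch {

    // Decompose accented characters (NFD), then drop the combining marks.
    static String foldAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only; the input is "Café déjà vu".
        System.out.println(foldAccents("Caf\u00e9 d\u00e9j\u00e0 vu")); // prints "Cafe deja vu"
    }
}
```

Passing the same input to the process method above should yield the tokens "Cafe", "deja", and "vu", since LetterTokenizer also splits the text at the spaces.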
Find more details on the JavaDoc page of Lucene.