我需要搜索单词“I + D”,而我的分析器不能使用+
(加号)和-
(减号)符号。我该如何搜索?
我的个人分析器:
/**
* Copyright (c) 2006 Hugo Zaragoza and Jose R. P�rez-Ag�era
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. Neither the name of copyright holders nor the names of its
* contributors may be used to endorse or promote products derived
* from this software without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
* TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL COPYRIGHT HOLDERS OR CONTRIBUTORS
* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
/**
* Spanish Lucene analyzer
* @author Hugo Zaragoza and Jose R. P�rez-Ag�era
*/
public class SpanishAnalyzer extends Analyzer {
private Set stopSet;
/**
* Creates the Lucene Spanish Analyzer
* @throws IOException
*/
public SpanishAnalyzer() throws IOException {
super();
stopSet = StopFilter.makeStopSet(loadStopWords());
}
/** Constructs a {@link StandardTokenizer} filtered by a {@link
StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. */
public TokenStream tokenStream(String fieldName, Reader reader) {
TokenStream result = new StandardTokenizer(reader);
result = new StandardFilter(result);
result = new LowerCaseFilter(result);
result = new StopFilter(result, stopSet);
result = new SpanishStemmerFilter(result);
return result;
}
/**
* Loads the spanish stop-words list
* @throws IOException
*/
private static String[] loadStopWords() throws IOException {
InputStream inputStream = new FileInputStream("stopwords-spanish.txt");
//InputStream inputStream = new FileInputStream("/home/becario/Escritorio/CVTKAxel/lib/stopwords-spanish.txt");
Reader reader = new InputStreamReader(inputStream);
BufferedReader br = new BufferedReader(reader);
String line = br.readLine();
ArrayList<String> list = new ArrayList<String>();
while (line != null) {
list.add(line.trim());
line = br.readLine();
}
String stopWords[] = new String[list.toArray().length];
for (int i = 0; i < list.toArray().length; i++) {
stopWords[i] = (String) list.get(i);
}
return stopWords;
}
}
答案 0 :(得分:1)
“它不起作用”是什么意思?分析仪应该能够正常处理这些字符。你指的是QueryParser吗?如果是这样,您可以绕过它并手动创建查询,例如TermQuery。
Query q = new TermQuery(new Term("field", "I+D"));
或者你指的是StandardTokenizer在非单词字符(例如'+'或' - ')上分割标记这一事实?如果是这样,您可以简单地使用另一个(例如WhitespaceTokenizer)或实现自己的。