Query expansion in Lucene

Date: 2016-04-23 05:21:23

Tags: lucene wordnet

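I am new to Lucene and I am trying to do query expansion.

I have referred to these two posts (first, second) and managed to reuse their code in a way that works with version 6.0.0, since the API used in those posts is deprecated.

The problem is that either I am not getting a result, or I am not accessing the result (the expanded query) properly.

Here is my code: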

import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.text.ParseException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.analysis.synonym.WordnetSynonymParser;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.*;


public class Graph extends Analyzer 
{ 

  protected static TokenStreamComponents createComponents(String fieldName, Reader reader) throws ParseException{
      System.out.println("1");
    // TODO Auto-generated method stub
    Tokenizer source = new ClassicTokenizer();

    source.setReader(reader);
    TokenStream filter = new StandardFilter( source);

    filter = new LowerCaseFilter(filter);
    SynonymMap mySynonymMap = null;

    try {

        mySynonymMap = buildSynonym();

    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    filter = new SynonymFilter(filter, mySynonymMap, false);     

    return new TokenStreamComponents(source, filter);

}

private static SynonymMap buildSynonym() throws IOException, ParseException
{    System.out.print("build");
    File file = new File("wn\\wn_s.pl");

    InputStream stream = new FileInputStream(file);

    Reader rulesReader = new InputStreamReader(stream); 
    SynonymMap.Builder parser = null;
    parser = new WordnetSynonymParser(true, true, new StandardAnalyzer(CharArraySet.EMPTY_SET));
    System.out.print(parser.toString());
   ((WordnetSynonymParser) parser).parse(rulesReader);  
    SynonymMap synonymMap = parser.build();
    return synonymMap;
}

public static void main (String[] args) throws UnsupportedEncodingException, IOException, ParseException
{
Reader reader = new FileReader("C:\\input.txt"); // here I have the queries that I want to expand 
TokenStreamComponents TSC = createComponents( "" , new StringReader("some text goes here")); 
System.out.print(TSC); // How do I get the result from TSC?
}

    @Override
    protected TokenStreamComponents createComponents(String string) 
    {
      throw new UnsupportedOperationException("Not supported yet."); //To change body of generated methods, choose Tools | Templates.
    }
 } 

Please suggest a way to access the expanded query!

1 answer:

Answer 0: (score: 3)

So, are you just trying to figure out how to iterate over the terms in TSC in your main method?

Something like this:

// Needs: import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
TokenStreamComponents TSC = createComponents("", new StringReader("some text goes here"));
TokenStream stream = TSC.getTokenStream();
CharTermAttribute termattr = stream.addAttribute(CharTermAttribute.class);
stream.reset(); // must be called before the first incrementToken()
while (stream.incrementToken()) {
    // each iteration yields one term: an original token or an injected synonym
    System.out.println(termattr.toString());
}
stream.end();   // completes the TokenStream workflow
stream.close(); // releases the underlying reader
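
To make concrete what kind of output to expect from a synonym-expanded token stream, here is a minimal standalone sketch in plain Java, with no Lucene dependency. It is not how Lucene's SynonymFilter is implemented; it only mimics the idea of mapping every word in a WordNet synset to the other words in that synset, roughly what an expanding synonym map does. The class name, the method names and the sample entries are hypothetical, and the sketch assumes wn_s.pl lines have the shape s(synset_id,w_num,'word',ss_type,sense_number,tag_count).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: plain Java, not Lucene's implementation.
final class SynonymExpansionSketch {

    // Assumed line shape: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
    private static final Pattern ENTRY =
            Pattern.compile("^s\\((\\d+),\\d+,'(.*)',[a-z],\\d+,\\d+\\)\\.$");

    /** Groups words by synset id, then maps each word to the other words of its synsets. */
    static Map<String, Set<String>> buildSynonyms(List<String> prologLines) {
        Map<String, Set<String>> bySynset = new LinkedHashMap<>();
        for (String line : prologLines) {
            Matcher m = ENTRY.matcher(line.trim());
            if (!m.matches()) {
                continue; // skip anything that is not a synset entry
            }
            // Prolog escapes an apostrophe inside a quoted atom by doubling it.
            String word = m.group(2).replace("''", "'").toLowerCase(Locale.ROOT);
            bySynset.computeIfAbsent(m.group(1), k -> new LinkedHashSet<>()).add(word);
        }
        Map<String, Set<String>> synonyms = new LinkedHashMap<>();
        for (Set<String> group : bySynset.values()) {
            for (String word : group) {
                Set<String> others = synonyms.computeIfAbsent(word, k -> new LinkedHashSet<>());
                for (String other : group) {
                    if (!other.equals(word)) {
                        others.add(other);
                    }
                }
            }
        }
        return synonyms;
    }

    /** Lowercases and whitespace-splits the query, emitting each token followed by its synonyms. */
    static List<String> expand(String query, Map<String, Set<String>> synonyms) {
        List<String> terms = new ArrayList<>();
        for (String token : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            terms.add(token);
            terms.addAll(synonyms.getOrDefault(token, Collections.emptySet()));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up sample entries in the assumed format (not real WordNet ids).
        List<String> lines = Arrays.asList(
                "s(100000001,1,'car',n,1,0).",
                "s(100000001,2,'automobile',n,1,0).",
                "s(100000002,1,'fast',a,1,0).",
                "s(100000002,2,'quick',a,1,0).");
        Map<String, Set<String>> synonyms = buildSynonyms(lines);
        System.out.println(expand("Fast car", synonyms)); // prints [fast, quick, car, automobile]
    }
}
```

In a real analyzer chain the synonyms are emitted as extra tokens at the same position as the original term rather than appended to a flat list, and multi-word entries need additional handling, so treat this only as a picture of what the expanded term list looks like.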