Lucene query on a numeric field finds nothing

Asked: 2014-05-21 07:55:21

Tags: search lucene numeric

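I am trying to understand how the Lucene query syntax works, so I wrote this small program. With a NumericRangeQuery I can find the documents I want, but when I parse the same search criteria with a QueryParser I get no hits. I know that differences like this can be caused by the analyzer, but the StandardAnalyzer I am using does not remove numeric values.

Can someone tell me what I am doing wrong? Thanks.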

package org.burre.lucene.matching;

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.*;
import org.apache.lucene.util.Version;

public class SmallestEngine {
  private static final Version VERSION=Version.LUCENE_48;
  private StandardAnalyzer analyzer = new StandardAnalyzer(VERSION);
  private Directory index = new RAMDirectory();

  private Document buildDoc(String name, int beds) {
    Document doc = new Document();
    doc.add(new StringField("name", name, Field.Store.YES));
    doc.add(new IntField("beds", beds, Field.Store.YES));
    return doc;
  }

  public void buildSearchEngine() throws IOException {
    IndexWriterConfig config = new IndexWriterConfig(VERSION,
            analyzer);

    IndexWriter w = new IndexWriter(index, config);
    // Generate 10 houses with 0 to 3 beds
    for (int i=0;i<10;i++)
        w.addDocument(buildDoc("house"+(100+i),i % 4));
    w.close();
  }
  /**
   * Execute the query and show the result
   */
  public void search(Query q) throws IOException {
    System.out.println("executing query\""+q+"\"");
    IndexReader reader = DirectoryReader.open(index);
    try {
        IndexSearcher searcher = new IndexSearcher(reader);
        ScoreDoc[] hits = searcher.search(q, 10).scoreDocs;
        System.out.println("Found " + hits.length + " hits.");
        for (int i = 0; i < hits.length; ++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println(""+(i+1)+". " + d.get("name") + ", beds:"
                    + d.get("beds"));
        }
    } finally {
        if (reader != null)
            reader.close();
    }
  }

  public static void main(String[] args) throws IOException, ParseException {
    SmallestEngine me = new SmallestEngine();
    me.buildSearchEngine();
    System.out.println("SearchByRange");
    me.search(NumericRangeQuery.newIntRange("beds", 3, 3,true,true));
    System.out.println("-----------------");
    System.out.println("SearchName");
    me.search(new QueryParser(VERSION,"name",me.analyzer).parse("house107"));
    System.out.println("-----------------");
    System.out.println("Search3Beds");
    me.search(new QueryParser(VERSION,"beds",me.analyzer).parse("3"));
    System.out.println("-----------------");
    System.out.println("Search3BedsInRange");
    me.search(new QueryParser(VERSION,"name",me.analyzer).parse("beds:[3 TO 3]"));
   }
}

The output of the program is:

SearchByRange
executing query"beds:[3 TO 3]"
Found 2 hits.
1. house103, beds:3
2. house107, beds:3
-----------------
SearchName
executing query"name:house107"
Found 1 hits.
1. house107, beds:3
-----------------
Search3Beds
executing query"beds:3"
Found 0 hits.
-----------------
Search3BedsInRange
executing query"beds:[3 TO 3]"
Found 0 hits.

3 answers:

Answer 0 (score: 0)

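You need to use NumericRangeQuery to search on numeric fields. QueryParser knows nothing about numeric fields, so it builds a plain TermQuery or TermRangeQuery on the text "3", and that text does not match the terms an IntField writes to the index.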

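The answer here may give you some insight.

The answer here also says:

  For numeric values (longs, dates, floats, etc.) you need to use NumericRangeQuery. Otherwise Lucene has no idea how you want to define similarity.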

Answer 1 (score: 0)

What you need to do is write your own QueryParser:

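import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class CustomQueryParser extends QueryParser {

    // Same (Version, String, Analyzer) constructor the question uses for QueryParser.
    public CustomQueryParser(Version matchVersion, String field, Analyzer analyzer) {
        super(matchVersion, field, analyzer);
    }

    @Override
    public Query newTermQuery(Term term) {
        if (term.field().equals("beds")) {
            // One way to fill this in: an exact match on an IntField is a
            // numeric range whose lower and upper bounds are the same value.
            int value = Integer.parseInt(term.text());
            return NumericRangeQuery.newIntRange("beds", value, value, true, true);
        } else {
            return super.newTermQuery(term);
        }
    }

    @Override
    public Query newRangeQuery(String field, String part1, String part2, boolean startInclusive, boolean endInclusive) {
        if (field.equals("beds")) {
            // A null bound leaves that side of the numeric range open.
            Integer min = (part1 == null) ? null : Integer.valueOf(part1);
            Integer max = (part2 == null) ? null : Integer.valueOf(part2);
            return NumericRangeQuery.newIntRange(field, min, max, startInclusive, endInclusive);
        } else {
            return super.newRangeQuery(field, part1, part2, startInclusive, endInclusive);
        }
    }
}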

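Answer 2 (score: 0)

It seems you always have to use NumericRangeQuery for numeric conditions (thanks, Mindas), so, as he suggested, I created my own, smarter QueryParser. Using the Apache commons-lang function StringUtils.isNumeric() I can create a more generic QueryParser: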

public class IntelligentQueryParser extends QueryParser {
    // take over super constructors

    @Override
    protected org.apache.lucene.search.Query newRangeQuery(String field,
            String part1, String part2, boolean part1Inclusive, boolean part2Inclusive) {
        // Check both bounds; otherwise Integer.parseInt(part2) can throw on "[3 TO abc]".
        if (StringUtils.isNumeric(part1) && StringUtils.isNumeric(part2)) {
            return NumericRangeQuery.newIntRange(field, Integer.parseInt(part1),
                    Integer.parseInt(part2), part1Inclusive, part2Inclusive);
        }
        return super.newRangeQuery(field, part1, part2, part1Inclusive, part2Inclusive);
    }

    @Override
    protected org.apache.lucene.search.Query newTermQuery(
            org.apache.lucene.index.Term term) {
        if (StringUtils.isNumeric(term.text())) {
            // An exact numeric match is a range whose two bounds are the same value.
            int value = Integer.parseInt(term.text());
            return NumericRangeQuery.newIntRange(term.field(), value, value, true, true);
        }
        return super.newTermQuery(term);
    }
}

Just wanted to share.
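One caveat about the StringUtils.isNumeric() check above: it only tests that every character is a digit. It therefore rejects "-3" and accepts values such as "99999999999", which then make Integer.parseInt throw a NumberFormatException. A possible alternative, sketched here with plain JDK classes, is to attempt the parse and fall back to the default query when it fails. The class name IntBounds and the method tryParseInt are hypothetical names chosen for this illustration; they are not part of Lucene or commons-lang.

```java
import java.util.OptionalInt;

// Hypothetical helper for illustration only; not part of Lucene or commons-lang.
class IntBounds {

    /**
     * Returns the int value of text, or an empty OptionalInt when text is
     * null, is not a decimal integer, or does not fit into an int.
     */
    public static OptionalInt tryParseInt(String text) {
        if (text == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            // Covers non-numeric text as well as values outside the int range.
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("3"));           // OptionalInt[3]
        System.out.println(tryParseInt("-3"));          // OptionalInt[-3]
        System.out.println(tryParseInt("99999999999")); // OptionalInt.empty
        System.out.println(tryParseInt("house107"));    // OptionalInt.empty
    }
}
```

Inside newTermQuery or newRangeQuery, the parser could call tryParseInt on each part and build the NumericRangeQuery only when the result is present, delegating to super otherwise.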