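How to implement a search with multiple filters using Lucene.Net

Date: 2014-05-26 09:59:43

Tags: lucene.net

I am new to Lucene.Net. I want to implement search over a database of clients. I have the following scenario:

  • Users search for clients within the currently selected city.
  • To search for clients in another city, the user has to change the city and run the search again.
  • To refine the search results, we need to offer filters on Area (multiple values), Pincode, and so on. In other words, I need the Lucene equivalent of the following SQL queries: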

    SELECT * FROM CLIENTS
         WHERE CITY = N'City1'
         AND (Area like N'%area1%' OR Area like N'%area2%')
    
    SELECT * FROM CLIENTS
        WHERE CITY IN ('MUMBAI', 'DELHI')
        AND CLIENTTYPE IN ('GOLD', 'SILVER')
    

Below is the code I use to search with the city applied as a filter:

private static IEnumerable<ClientSearchIndexItemDto> _search(string searchQuery, string city, string searchField = "")
{
    // validation
    if (string.IsNullOrEmpty(searchQuery.Replace("*", "").Replace("?", "")))
        return new List<ClientSearchIndexItemDto>();

    // set up Lucene searcher
    using (var searcher = new IndexSearcher(_directory, false))
    {
        var hits_limit = 1000;
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // search by single field
        if (!string.IsNullOrEmpty(searchField))
        {
            var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, searchField, analyzer);
            var query = parseQuery(searchQuery, parser);
            var hits = searcher.Search(query, hits_limit).ScoreDocs;
            var results = _mapLuceneToDataList(hits, searcher);
            analyzer.Close();
            searcher.Dispose();
            return results;
        }
        else // search by multiple fields (ordered by RELEVANCE)
        {
            var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[]
            {
                "ClientId",
                "ClientName",
                "ClientTypeNames",
                "CountryName",
                "StateName",
                "DistrictName",
                "City",
                "Area",
                "Street",
                "Pincode",
                "ContactNumber",
                "DateModified"
            }, analyzer);
            var query = parseQuery(searchQuery, parser);
            var f = new FieldCacheTermsFilter("City",new[] { city });
            var hits = searcher.Search(query, f, hits_limit, Sort.RELEVANCE).ScoreDocs;
            var results = _mapLuceneToDataList(hits, searcher);
            analyzer.Close();
            searcher.Dispose();
            return results;
        }
    }
}

Now I have to provide more filters on Area, Pincode, and so on, where Area can take multiple values. I tried a BooleanQuery as follows:

var cityFilter = new TermQuery(new Term("City", city));
var areasFilter = new FieldCacheTermsFilter("Area", areas); // areas is a string[]

BooleanQuery filterQuery = new BooleanQuery();
filterQuery.Add(cityFilter, Occur.MUST);
filterQuery.Add(areasFilter, Occur.MUST); // does not compile: filterQuery.Add has no overload that accepts a string[]

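If we do the same thing for a single area, it works fine.

I have also tried ChainedFilter, as shown below, but it does not seem to meet the requirement. The following code performs an OR between the city and the areas, whereas the requirement is an OR among the supplied areas within the given city.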

var f = new ChainedFilter(new Filter[] { cityFilter, areasFilter });

Can anyone tell me how to achieve this in Lucene.Net? Any help would be greatly appreciated.

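2 answers:

Answer 0 (score: 14)

You are looking for BooleanFilter. Almost any query object has a matching filter object.

Look at TermsFilter (from Lucene.Net.Contrib.Queries) if your index does not meet the requirements of FieldCacheTermsFilter. The documentation of the latter says: "this filter requires that the field contains only a single term for all documents".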

var cityFilter = new FieldCacheTermsFilter("CITY", new[] {"MUMBAI", "DELHI"});
var clientTypeFilter = new FieldCacheTermsFilter("CLIENTTYPE", new [] { "GOLD", "SILVER" });

var areaFilter = new TermsFilter();
areaFilter.AddTerm(new Term("Area", "area1"));
areaFilter.AddTerm(new Term("Area", "area2"));

var filter = new BooleanFilter();
filter.Add(new FilterClause(cityFilter, Occur.MUST));
filter.Add(new FilterClause(clientTypeFilter, Occur.MUST));
filter.Add(new FilterClause(areaFilter, Occur.MUST));

IndexSearcher searcher = null; // TODO.
Query query = null; // TODO.
Int32 hits_limit = 0; // TODO.
var hits = searcher.Search(query, filter, hits_limit, Sort.RELEVANCE).ScoreDocs;
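
To connect this to the question's inputs (one city plus a string[] of areas), the same BooleanFilter can be built from variables. The following is only a minimal sketch: the class and method names (ClientFilters, Build) are hypothetical, and it assumes Lucene.Net 3.0.x with the Contrib.Queries assembly referenced, the field names "City" and "Area" from the question, and values that match the indexed terms exactly (an analyzed field would, for example, hold lowercased tokens). Note that these are exact term matches, not the substring matching of SQL LIKE '%area1%'.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class ClientFilters
{
    // Hypothetical helper, not part of Lucene.Net.
    // Builds: City = city AND (Area = areas[0] OR Area = areas[1] OR ...)
    public static Filter Build(string city, string[] areas)
    {
        var filter = new BooleanFilter();

        // The city is required. FieldCacheTermsFilter assumes a single term
        // per document in the "City" field.
        filter.Add(new FilterClause(
            new FieldCacheTermsFilter("City", new[] { city }), Occur.MUST));

        // A TermsFilter accepts a document if it contains ANY of the added
        // terms, which gives the OR among areas.
        if (areas != null && areas.Length > 0)
        {
            var areaFilter = new TermsFilter();
            foreach (var area in areas)
            {
                areaFilter.AddTerm(new Term("Area", area));
            }
            filter.Add(new FilterClause(areaFilter, Occur.MUST));
        }

        return filter;
    }
}
```

The result can be passed to searcher.Search(query, filter, hits_limit, Sort.RELEVANCE) in place of the single city filter used in the question.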

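Answer 1 (score: 2)

What you are looking for is a nested boolean query: you have an OR group (over your cities), and that whole OR group is itself combined with the other conditions as an AND.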

filter1 AND filter2 AND filter3 AND (filtercity1 OR filtercity2 OR filtercity3)

There is a good description of how to do this here:

How to create nested boolean query with lucene API (a AND (b OR c))?
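
The nested shape above could be sketched with plain BooleanQuery objects roughly as follows. This is an illustration under assumptions, not a definitive implementation: the class and method names (NestedQuerySketch, Build) are hypothetical, the field names "City" and "Area" are taken from the question, and the values are assumed to match the indexed terms exactly.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class NestedQuerySketch
{
    // Hypothetical helper, not part of Lucene.Net.
    // Builds: userQuery AND City:city AND (Area:a1 OR Area:a2 OR ...)
    public static Query Build(Query userQuery, string city, string[] areas)
    {
        var combined = new BooleanQuery();
        combined.Add(userQuery, Occur.MUST);
        combined.Add(new TermQuery(new Term("City", city)), Occur.MUST);

        if (areas != null && areas.Length > 0)
        {
            // Inner group: with only SHOULD clauses, at least one of them
            // has to match, so the group behaves as an OR.
            var anyArea = new BooleanQuery();
            foreach (var area in areas)
            {
                anyArea.Add(new TermQuery(new Term("Area", area)), Occur.SHOULD);
            }

            // The OR group as a whole is required, which yields
            // a AND b AND (c OR d).
            combined.Add(anyArea, Occur.MUST);
        }

        return combined;
    }
}
```

Unlike a filter, clauses added to the query take part in scoring. If the city and area restrictions should not affect relevance, one option is to build only those clauses as a BooleanQuery and pass it to the search wrapped in a QueryWrapperFilter.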