Question

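I am trying to set up a Solr index to search a database of product information. For this, I populated a database with product details and am using Solr 6.0.0. Given a product's details (title, brand, other keywords), I want to know whether the database contains a product that closely matches those details. I have run the dataimport and created the index. However, when I search, the matching products all get the same score, even though the products listed are different. I tried different combinations of search keywords, but the results are similar in every case. I have also tried different Tokenizers and Filters.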

A sample of the schema.xml I tried:

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="example" version="1.5">
 <field name="id" type="Int"  indexed="true" stored="true"/>
  <field name="name" type="text_general"  indexed="true" stored="true" />
  <field name="brand" type="text_general"  indexed="true" stored="true"/>
  <field name="category" type="text_general"  indexed="true" stored="true"/>
  <field name="description" type="text_general" indexed="true" stored="true" /> 
  <field name="catchall" type="text_general" indexed="true" stored="true" multiValued="true" />
    <copyField source="id" dest="catchall" />
    <copyField source="name" dest="catchall" />
    <copyField source="brand" dest="catchall" />
    <copyField source="category" dest="catchall" />
    <copyField source="description" dest="catchall" />
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>catchall</defaultSearchField>
    <types>
        <fieldtype name="string" class="solr.StrField" sortMissingLast="true" />
        <fieldtype name="Int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
        <fieldtype name="text_general" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <charFilter class="solr.HTMLStripCharFilterFactory"/>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" 
                    splitOnNumerics="1"
                    splitOnCaseChange="1"
                    generateNumberParts="1"
                    catenateWords="0"
                    catenateNumbers="0"
                    catenateAll="0"
                    preserveOriginal="1"
                    />

            <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
          <analyzer type="query">
            <charFilter class="solr.HTMLStripCharFilterFactory"/>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" 
                    splitOnNumerics="1"
                    splitOnCaseChange="1"
                    generateNumberParts="1"
                    catenateWords="0"
                    catenateNumbers="0"
                    catenateAll="0"
                    preserveOriginal="1"
                    />
            <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
        </fieldtype>
        <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
    </types>
</schema>

Edit

The entity definition in data-config.xml is as follows:

<entity name="master_products"  
    pk="id"
    query="select p.* ,b.*  from master_products p ,master_brands b  where b.id=p.brand_id"
    deltaImportQuery="SELECT * FROM master_products WHERE product_name='${dataimporter.delta.product_name}' "
    >
    <!-- or b.brnad='${dataimporter.delta.brand}' -->

     <field column="product_name" name="name"/> 
     <field column="product_description" name="description"/> 
     <field column="id" name="id"/>
     <field column="mrp" name="mrp"/> 
     <field column="brand" name="brand"/>


  <entity name="master_brands" 
    query="select * from master_brands"
    deltaImportQuery="select * from master_brands where id ={master_products.brand_id}" processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache" >

  </entity>

  <entity name="master_product_categories" 
    query="select * from master_product_categories"
    deltaImportQuery="select * from master_product_categories where id ={master_products.   product_category_id}" processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache" >
    <field column="category" name="category" />
  </entity>

 </entity>

The modified query is as follows:

http://localhost:8983/solr/myproducts/select?fl=* score&fq=brand:Nikon&fq=mrp:28950*&indent=on&q=name:*"Nikon D3200 (Black) DSLR with  AF-S 18-55mm VR Kit Lens"*&wt=json

I would like help achieving this goal. Could you guide me toward a configuration that suits my purpose? Thanks in advance.

Answer 1

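Wildcard queries are constant scoring, meaning they do not change the score of the documents they match. You probably want to use a regular query (not a wildcard) to get proper scoring across documents.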
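As a rough sketch of what a non-wildcard version of the request could look like, the Python below builds a select URL that searches the name field with ordinary terms. The host, core name, and name field come from the question; the helper names escape_terms and build_scored_url are made up for illustration, and escaping every special character is an assumption about how the title text should be treated.

```python
import re
from urllib.parse import urlencode

# Host and core name as used in the question; adjust for your setup.
SOLR_SELECT = "http://localhost:8983/solr/myproducts/select"

# Characters that have special meaning in the standard Lucene query syntax.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_terms(text):
    """Backslash-escape query syntax characters so the text is read as plain terms."""
    return _SPECIAL.sub(r"\\\1", text)


def build_scored_url(title):
    """Build a select URL that matches the title as regular terms (hypothetical helper)."""
    params = {
        # No leading or trailing '*', so the query is a normal scored term query.
        "q": "name:(" + escape_terms(title) + ")",
        "fl": "*,score",
        "wt": "json",
        "indent": "on",
    }
    return SOLR_SELECT + "?" + urlencode(params)
```

With a request like this, documents that match more of the title terms should end up with different scores instead of one shared constant.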

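Range queries [a TO z], prefix queries a*, and wildcard queries a*b are constant-scoring (all matching documents get an equal score). The scoring factors tf, idf, index boost, and coord are not used. There is no limit on the number of terms matched (as there was in past versions of Lucene).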

fq clauses do not affect the score; they only filter the result set.
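To illustrate the split between the scored part and the filters, here is a minimal sketch that keeps the text being ranked in q and passes each restriction as its own fq parameter. The URL and field names are taken from the question; build_select_url is a hypothetical helper, not part of any Solr client.

```python
from typing import Iterable
from urllib.parse import urlencode

# Host and core name as used in the question; adjust for your setup.
SOLR_SELECT = "http://localhost:8983/solr/myproducts/select"


def build_select_url(query: str, filters: Iterable[str]) -> str:
    """Build a select URL with one scored q and any number of fq filters."""
    params = [("q", query), ("fl", "*,score"), ("wt", "json")]
    # Each filter becomes a separate fq parameter; filters narrow the
    # result set but contribute nothing to the score.
    params += [("fq", f) for f in filters]
    return SOLR_SELECT + "?" + urlencode(params)


url = build_select_url("name:(Nikon D3200)", ["brand:Nikon", "mrp:28950*"])
```

Anything that should influence the ranking belongs in q; a wildcard such as mrp:28950* is harmless inside fq because filters are never scored.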

Solr configuration for scored search
