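I have implemented a Solr MoreLikeThis handler to find similar customers.
I have 2 customers with different names who live at the same address. I want to pass an entity_id to Solr and have it return all customers with a similar name/address. With the click of a button, the customer can then link the two customers together.
I use the SolariumBundle to do this in code, but it should be enough to get it working with a raw query first; if that works, I can adapt it to Solarium myself.
This is my solrconfig.xml: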
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_36</luceneMatchVersion>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <updateHandler class="solr.DirectUpdateHandler2" />
  <requestDispatcher handleSelect="true" >
    <requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
  </requestDispatcher>
  <!-- request handlers -->
  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">
      <int name="mlt.mintf">2</int>
      <int name="mlt.mindf">1</int>
      <int name="mlt.minwl">5</int>
      <int name="mlt.maxwl">1000</int>
      <int name="mlt.maxqt">50</int>
      <int name="mlt.maxntp">50000</int>
      <bool name="mlt.boost">true</bool>
      <str name="mlt.fl">customer_data,entity_data,street</str>
      <bool name="mlt.match.include">false</bool>
    </lst>
  </requestHandler>
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
  <!-- config for the admin interface -->
  <admin>
    <defaultQuery>solr</defaultQuery>
  </admin>
</config>
The relevant part of schema.xml is:
<fields>
  <!-- general -->
  <field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
  <field name="type" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="entity_id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
  <field name="sort_id" type="int" indexed="true" stored="true" multiValued="false"/>
  <field name="external_id" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="status" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="language" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="created" type="int" indexed="true" stored="true" multiValued="false"/>
  <field name="name" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="email" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="city" type="string" indexed="true" stored="false" multiValued="false"/>
  <field name="country" type="string" indexed="true" stored="false" multiValued="false"/>
  <field name="street" type="string" indexed="true" stored="false" multiValued="false"/>
  <field name="zipcode" type="string" indexed="true" stored="false" multiValued="false"/>
  <field name="entity_data" type="text_ngrm" indexed="true" stored="true" multiValued="true"/>
  <field name="customer_data" type="text_ngrm" indexed="true" stored="true" multiValued="true" termVectors="true" />
  <!-- Entity data filling -->
  <copyField source="entity_id" dest="entity_data"/>
  <copyField source="briljant_id" dest="entity_data"/>
  <copyField source="name" dest="entity_data"/>
  <copyField source="email" dest="entity_data"/>
  <!-- End entity data -->
  <!-- Customer data -->
  <copyField source="name" dest="customer_data"/>
  <copyField source="email" dest="customer_data"/>
  <copyField source="city" dest="customer_data"/>
  <copyField source="country" dest="customer_data"/>
  <copyField source="street" dest="customer_data"/>
  <copyField source="zipcode" dest="customer_data"/>
  <!-- End customer data -->
</fields>
I currently execute this query:
http://localhost:8983/solr/core0/mlt?q=entity_id%3A50&wt=json&indent=true&mlt.fl:customer_data
This does return results for customers with a similar name.
For example, if customer_id:50 (the one I query for) has the name "Foo Bar", it returns customers named "Foo Bar", "Bar Foo" and "John Foo". The similarity on street/country/zipcode, however, does not work.
In debug:parsedquery I can see different mutations of customer_data:Foo customer_data:Bar customer_data:Foo Bar, ... but nothing for the address part.
How can I make sure the query is built as:
customer_data:Foo customer_data:Bar customer_data:teststreet customer_data:Antwerp
?
Answer 0: (score: 1)
Fields defined as type string are not tokenized, so MLT will find fewer similar documents.
Change the affected fields to a type based on the solr.TextField class, and it should work.
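As a rough sketch of what that could look like in schema.xml: the type name text_address and its analyzer chain are my own assumptions, not something taken from the question's schema, and the existing text type already used for name may serve just as well. Only the type attribute of the address fields changes.

```xml
<!-- Hypothetical tokenized type; "text_address" is an assumed name, not from the original schema -->
<fieldType name="text_address" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split values into word tokens and lowercase them -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Address fields switched from "string" to the tokenized type -->
<field name="city" type="text_address" indexed="true" stored="false" multiValued="false"/>
<field name="country" type="text_address" indexed="true" stored="false" multiValued="false"/>
<field name="street" type="text_address" indexed="true" stored="false" multiValued="false"/>
<field name="zipcode" type="text_address" indexed="true" stored="false" multiValued="false"/>
```

After a schema change like this, the documents have to be reindexed before the new analysis takes effect.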