In Zend_Search_Lucene, I index with the code below. I changed the default analyzer so that numeric values can be searched as well.
public function executeIndexIT() {
    $path = '/home/project/mgh/lib/';
    set_include_path(get_include_path() . PATH_SEPARATOR . $path);
    require_once '/home/project/mgh/lib/Zend/Search/Lucene.php';
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());
    $index = new Zend_Search_Lucene('/home/project/mgh/data/search_file/lucene.customer.index', true);
    $filenames1 = '/home/project/mgh/web/cvcollection/data8/ASBABranches10546.pdf';
    $filenames2 = '/home/project/mgh/web/cvcollection/data2/manoj_new10550.pdf';
    $fc1 = htmlentities("'" . $this->ConvertPDF($filenames1) . "'");
    $fc2 = htmlentities("'" . $this->ConvertPDF($filenames2) . "'");
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::unIndexed('URL', $filenames1));
    $doc->addField(Zend_Search_Lucene_Field::text('contents', $fc1));
    $index->addDocument($doc);
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::unIndexed('URL', $filenames2));
    $doc->addField(Zend_Search_Lucene_Field::text('contents', $fc2));
    $index->addDocument($doc);
    $index->commit();
    exit;
}
After indexing, I search with this code:
public function executeSearchLucene() {
    $path = '/home/project/mgh/lib/';
    set_include_path(get_include_path() . PATH_SEPARATOR . $path);
    require_once('Zend/Search/Lucene.php');
    Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());
    $hits = array();
    $txtSearch = '@';
    try {
        $query = Zend_Search_Lucene_Search_QueryParser::parse($txtSearch);
    } catch (Zend_Search_Lucene_Search_QueryParserException $e) {
        echo "Query syntax error: " . $e->getMessage() . "\n";
    }
    $index = new Zend_Search_Lucene('/home/project/mgh/data/search_file/lucene.customer.index');
    //**added on 29 may**/
    $results = $index->find($query);
    echo count($results);
    foreach ($results as $result) {
        echo "<pre>";
        var_dump($result->URL);
    }
    exit;
}
Here $fc2 contains a few e-mail addresses that I need to search for, but I get 0 hits. How can I search for characters such as @ or ! in Zend_Search_Lucene?
Answer (score: 0)
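This only works for keyword fields, because keyword fields are not tokenized. So you need to make sure the e-mail addresses (or any other text containing special characters) are supplied as separate data, as in the example below. You also cannot use the query parser, because it converts the query into a Zend_Search_Lucene_Search_Query_Preprocessing_Term object: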
echo('<pre>');
var_dump(Zend_Search_Lucene_Search_QueryParser::parse("*@*"));
var_dump(Zend_Search_Lucene_Search_QueryParser::parse("@"));
echo('</pre>');
die();
According to the documentation, this kind of query is "not actually involved in query execution".
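The following sketch shows why the question's search for @ finds nothing. It is a rough, pure-PHP approximation of what the TextNum_CaseInsensitive analyzer does to text: runs of ASCII letters and digits become lowercase tokens, and everything else is discarded. The function name approxTextNumTokens is hypothetical. The splitting rule is an assumption modeled on the ZF1 TextNum analyzers, not Zend's actual code.

```php
<?php
// Approximation (assumption, not Zend code) of the TextNum CaseInsensitive
// analyzer: every run of ASCII letters/digits becomes one lowercase token,
// and all other characters, including '@' and '!', are thrown away.
function approxTextNumTokens(string $text): array
{
    preg_match_all('/[a-zA-Z0-9]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

// An address in a text field ends up as separate plain-word terms.
echo implode(', ', approxTextNumTokens('Mail: Test@Test.com')), "\n"; // mail, test, test, com

// A query consisting only of '@' yields no terms, so nothing can match.
echo count(approxTextNumTokens('@')), "\n"; // 0
```

A keyword field skips this step entirely and stores "test@test.com" as a single term. That is why the example below can still find it.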
So working code looks like this:
// Assumes Zend/Search/Lucene.php is already loaded, as in the question.
$index = Zend_Search_Lucene::create('/tmp/index');
// The addresses go into keyword fields, so each one is stored as a single
// un-tokenized term instead of being split up by the analyzer.
$doc1 = new Zend_Search_Lucene_Document;
$doc1->addField(Zend_Search_Lucene_Field::text('title', 'Some Title Here'))
     ->addField(Zend_Search_Lucene_Field::keyword('content', 'test@test.com'));
$index->addDocument($doc1);
$doc2 = new Zend_Search_Lucene_Document;
$doc2->addField(Zend_Search_Lucene_Field::text('title', 'Another title Here'))
     ->addField(Zend_Search_Lucene_Field::keyword('content', 'test!test.com'));
$index->addDocument($doc2);
$index->commit();
// Allow patterns that start with '*' (no literal prefix). This is a static
// setting, so one call covers both queries below.
Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0);
// Build the wildcard query directly instead of using the query parser.
// No field is given to the term, so all indexed fields are searched.
$term = new Zend_Search_Lucene_Index_Term("*@*");
$query = new Zend_Search_Lucene_Search_Query_Wildcard($term);
$hits = $index->find($query);
echo('<pre>');
var_dump(count($hits));
foreach ($hits as $hit) {
    var_dump($hit->title);
    var_dump($hit->content);
}
echo('</pre>');
// Same approach for '!'.
$term = new Zend_Search_Lucene_Index_Term("*!*");
$query = new Zend_Search_Lucene_Search_Query_Wildcard($term);
$hits = $index->find($query);
echo('<pre>');
var_dump(count($hits));
foreach ($hits as $hit) {
    var_dump($hit->title);
    var_dump($hit->content);
}
echo('</pre>');
die();
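The wildcard query above compares the pattern against the terms stored in the index. The following sketch imitates that matching in plain PHP on the two keyword terms from the example, which shows why *@* selects only the first document and *!* only the second. The helper wildcardMatches is hypothetical. Its pattern-to-regex conversion (* means any run of characters, ? means one character) is an approximation, not Zend's implementation.

```php
<?php
// Illustrative sketch (not Zend code): test one un-tokenized term against a
// wildcard pattern where '*' = any run of characters, '?' = one character.
function wildcardMatches(string $pattern, string $term): bool
{
    // Escape all regex metacharacters first, then turn the escaped
    // wildcards back into their regex equivalents.
    $regex = '/^' . str_replace(['\*', '\?'], ['.*', '.'], preg_quote($pattern, '/')) . '$/s';
    return preg_match($regex, $term) === 1;
}

// The two keyword terms stored in the example index.
$terms = ['test@test.com', 'test!test.com'];

foreach (['*@*', '*!*'] as $pattern) {
    $hits = array_filter($terms, fn (string $t): bool => wildcardMatches($pattern, $t));
    echo $pattern, ' => ', implode(', ', $hits), "\n";
}
// *@* => test@test.com
// *!* => test!test.com
```

This only works because the whole address is one term. If the address had been tokenized into test, test and com, no term would contain @ and the pattern could never match.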
I hope that makes it clear. Zend's Lucene implementation has quite a few limitations.
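The answer advises supplying e-mail addresses as separate data. For the question's PDFs, that means pulling the addresses out of the converted text (such as $fc2) before indexing. The following is a minimal sketch under that assumption. extractEmails is a hypothetical helper, and its regex is a deliberately simple approximation, not a full RFC 5322 address parser. The sample text is made up.

```php
<?php
// Hypothetical helper: collect e-mail-like strings from extracted PDF text so
// each one can be indexed as its own un-tokenized value.
// The regex is a simple approximation, not a complete address grammar.
function extractEmails(string $text): array
{
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $matches);
    // Drop duplicates and re-index the array from 0.
    return array_values(array_unique($matches[0]));
}

// Made-up sample text standing in for the output of ConvertPDF().
$text = "Contact: alice@example.com, sales@example.org; again alice@example.com";
echo implode("\n", extractEmails($text)), "\n";
// alice@example.com
// sales@example.org
```

Each returned address could then be passed to Zend_Search_Lucene_Field::keyword(). A Zend_Search_Lucene_Document appears to keep only one field per name. So if a PDF contains several addresses, one option is to add one small document per address that also carries the PDF's URL as an unIndexed field.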