How do I search for special characters such as @ in Zend Search Lucene?

Asked: 2012-05-29 09:24:52

Tags: php zend-framework utf-8 special-characters zend-search-lucene

I am using Zend_Search_Lucene and build the index with the code below. I changed the default analyzer so that numeric values can be searched.

public function executeIndexIT() {

   $path = '/home/project/mgh/lib/';
   set_include_path(get_include_path() . PATH_SEPARATOR . $path);       
   require_once '/home/project/mgh/lib/Zend/Search/Lucene.php';

   Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

   $index = new Zend_Search_Lucene('/home/project/mgh/data/search_file/lucene.customer.index',true);

   $filenames1='/home/project/mgh/web/cvcollection/data8/ASBABranches10546.pdf';
   $filenames2='/home/project/mgh/web/cvcollection/data2/manoj_new10550.pdf';

   $fc1=htmlentities("'".$this->ConvertPDF($filenames1)."'");       
   $fc2=htmlentities("'".$this->ConvertPDF($filenames2)."'");

   $doc = new Zend_Search_Lucene_Document();
   $doc->addField(Zend_Search_Lucene_Field::unIndexed('URL', $filenames1));
   $doc->addField(Zend_Search_Lucene_Field::text('contents',$fc1));     
   $index->addDocument($doc);

   $doc = new Zend_Search_Lucene_Document();
   $doc->addField(Zend_Search_Lucene_Field::unIndexed('URL', $filenames2));
   $doc->addField(Zend_Search_Lucene_Field::text('contents',$fc2));     
   $index->addDocument($doc);

   $index->commit();
   exit;
}

After indexing, I search with the code below:

public function executeSearchLucene() {

    $path = '/home/project/mgh/lib/';
    set_include_path(get_include_path() . PATH_SEPARATOR . $path);
    require_once('Zend/Search/Lucene.php');

    Zend_Search_Lucene_Analysis_Analyzer::setDefault(new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());

    $hits = array();
    $txtSearch='@';
    try {
        $query = Zend_Search_Lucene_Search_QueryParser::parse($txtSearch);
    } catch (Zend_Search_Lucene_Search_QueryParserException $e) {
        echo "Query syntax error: " . $e->getMessage() . "\n";
    }

    $index = new Zend_Search_Lucene('/home/project/mgh/data/search_file/lucene.customer.index');

    //**added on 29 may**/      
    $results = $index->find($query);
    echo count($results);
    foreach ( $results as $result ) {
        echo "<pre>";
        var_dump($result->URL); 
   }
   exit;
}

Here $fc2 contains a few email addresses, and I need to search for them. But I get 0 hits.

How can I search for characters such as @ and ! with Zend_Search_Lucene?

1 Answer:

Answer 0 (score: 0)

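This only works with keyword fields, because they are not tokenized. So you need to make sure the email addresses (or any other text with special characters) are supplied as separate data, as in the working example further down. Also, you cannot use the query parser, because it converts the input into a Zend_Search_Lucene_Search_Query_Preprocessing_Term object: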

echo('<pre>');
var_dump(Zend_Search_Lucene_Search_QueryParser::parse("*@*"));
var_dump(Zend_Search_Lucene_Search_QueryParser::parse("@"));
echo('</pre>');
die();

According to the documentation, that query type is not actually involved in query execution.
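To see why a text field cannot match @, it helps to look at what a TextNum-style analyzer leaves behind. The snippet below is only a rough stand-in, assuming the analyzer keeps runs of ASCII letters and digits and lowercases them; sketchTokenize is a hypothetical helper, not part of Zend_Search_Lucene.

```php
<?php
// Hypothetical helper: approximates a letters-and-digits, case-insensitive
// analyzer. This is an assumption for illustration, not library code.
function sketchTokenize(string $text): array
{
    // Keep only runs of ASCII letters and digits; everything else
    // (including @, ! and .) acts as a separator and is discarded.
    preg_match_all('/[a-zA-Z0-9]+/', $text, $matches);

    // Lowercase each token to mimic the case-insensitive variant.
    return array_map('strtolower', $matches[0]);
}

// The address is split into separate terms; the @ itself never reaches the index.
print_r(sketchTokenize('Contact: Test@Test.com'));

// A query consisting only of @ produces no terms at all.
print_r(sketchTokenize('@'));
```

Under that assumption the index holds terms like test and com but nothing containing @, which is consistent with the 0 hits reported in the question.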

So working code looks like this:

$index = Zend_Search_Lucene::create('/tmp/index');

$doc1 = new Zend_Search_Lucene_Document;
$doc1->addField(Zend_Search_Lucene_Field::text('title', 'Some Title Here'))
    ->addField(Zend_Search_Lucene_Field::keyword('content', 'test@test.com'));
$index->addDocument($doc1);

$doc2 = new Zend_Search_Lucene_Document;
$doc2->addField(Zend_Search_Lucene_Field::text('title', 'Another title Here'))
    ->addField(Zend_Search_Lucene_Field::keyword('content', 'test!test.com'));
$index->addDocument($doc2);

$index->commit();

Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0);
$term  = new Zend_Search_Lucene_Index_Term("*@*");
$query = new Zend_Search_Lucene_Search_Query_Wildcard($term);

$hits = $index->find($query);
echo('<pre>');
var_dump(count($hits));
foreach($hits as $hit) {
    var_dump($hit->title);
    var_dump($hit->content);
}
echo('</pre>');

Zend_Search_Lucene_Search_Query_Wildcard::setMinPrefixLength(0);
$term  = new Zend_Search_Lucene_Index_Term("*!*");
$query = new Zend_Search_Lucene_Search_Query_Wildcard($term);

$hits = $index->find($query);
echo('<pre>');
var_dump(count($hits));
foreach($hits as $hit) {
    var_dump($hit->title);
    var_dump($hit->content);
}
echo('</pre>');

die();
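The difference between the two field types can also be shown without an index. The sketch below assumes a wildcard query is checked against whole stored terms; wildcardToRegex and matchTerms are hypothetical helpers written for this illustration, and the term lists are made up to mirror the documents above.

```php
<?php
// Hypothetical helper: turn a wildcard pattern (* and ?) into an anchored regex.
function wildcardToRegex(string $pattern): string
{
    $quoted = preg_quote($pattern, '/');
    // preg_quote escapes * and ?; swap the escaped forms for regex wildcards.
    return '/^' . strtr($quoted, ['\\*' => '.*', '\\?' => '.']) . '$/';
}

// Hypothetical helper: return the terms that the wildcard pattern matches in full.
function matchTerms(string $pattern, array $terms): array
{
    $regex = wildcardToRegex($pattern);
    return array_values(array_filter($terms, function ($term) use ($regex) {
        return preg_match($regex, $term) === 1;
    }));
}

// Keyword fields store the whole value as a single term.
$keywordTerms = ['test@test.com', 'test!test.com'];
// A tokenized text field would only hold the alphanumeric pieces.
$tokenizedTerms = ['test', 'com'];

var_dump(matchTerms('*@*', $keywordTerms));   // one match: test@test.com
var_dump(matchTerms('*!*', $keywordTerms));   // one match: test!test.com
var_dump(matchTerms('*@*', $tokenizedTerms)); // no matches
```

This is only a model of the behaviour, but it shows why the special character has to survive indexing as part of a single term before any wildcard query can find it.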

Hope this is clear now. Zend's Lucene implementation has many limitations.