Question

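I have about 250 KB of static HTML that I need to search, so I thought I would use Zend Lucene. Creating the index takes a few seconds and everything works fine, except that when I search for "about" I end up with this: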

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 
3503812093817007931 bytes) in /var/www/u1938159/data/www/-----
/protected/vendors/Zend/Search/Lucene/Storage/File/Filesystem.php on line 163

Other words seem to work fine. Also, the files contain some foreign-language text, so I have to use the case-insensitive analyzer:

Zend_Search_Lucene_Analysis_Analyzer::setDefault(
    new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8_CaseInsensitive()
);
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');

In that case it takes an eternity to load and does not work at all:

Error occured while file reading.

Does Lucene have serious problems, or did I mess something up myself?

Answer 1

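Lucene doesn't have these problems, but Zend_Search_Lucene does. I'm not sure how much searching you need to do, or whether this is a one-off thing, but I would look into Apache Solr or ElasticSearch.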

Can you expand your question with some data?

There are also some hosted services; let me know if you need more pointers.

Answer 2

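I don't know what the specific problem with Zend Lucene is, but if you want to search relatively small HTML files, you might want to try grep. For example, on the command line: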

cat file.html | grep -i about

to find lines containing the word "about".

Or

cat file.html | grep -i -o -P '.{30}About.{30}'

if you want only 30 characters on either side of the word.
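Since the question mentions several files, the same idea can be extended to a whole directory. The following is only a minimal sketch: the function name `search_html` and the directory argument are hypothetical, and it assumes a grep that supports `-r` and `--include` (GNU grep does). Note that case-insensitive matching of non-ASCII text depends on the current locale.

```shell
# Hypothetical helper (not part of any library): list the HTML files under
# a directory that contain a search term, ignoring case.
search_html() {
    term=$1
    dir=${2:-.}   # default to the current directory
    # -r: recurse into subdirectories
    # -i: ignore case
    # -l: print only the names of matching files
    # -F: treat the term as a literal string, not a regular expression
    grep -r -i -l -F --include='*.html' -- "$term" "$dir"
}
```

For example, `search_html about ./site` would print the path of every `.html` file under `./site` (an assumed directory name) that contains "about" in any letter case.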

Zend_Search_Lucene tries to allocate 3503812093817007931 bytes
