Elasticsearch crashes with big data

Date: 2018-04-13 08:03:00

Tags: php sql elasticsearch elastica

So I have a query that returns 500k rows, and I then loop over the results, adding 20 documents per result. Elasticsearch stops responding (the page loads forever) and eventually throws a "Couldn't connect to host, Elasticsearch down?" error. So it crashes while looping over all the data.

What could make Elasticsearch stop responding like this? Can I not use it to index millions of records? Is it an Elasticsearch configuration problem?

Here is the code I use for the loop:

$query = tep_db_query(" 
// Query giving 500k results
");

$achatsDocs = array();

while($array_collections = tep_db_fetch_array($query)){
    // loop over the query results


    $achatsDocs[] = new \Elastica\Document('', \Glam\HttpUtils::jsonEncode(
        array(
            // documents
        )
    ));
}

$achatsReportType->addDocuments($achatsDocs);
$achatsReportType->getIndex()->refresh();

I was told to send a reasonable number of documents per bulk request, for example 1000, rather than sending everything at once. So I did this:

// while there is still data left to process
while(condition) {

    $achatsDocs = array(); // start each batch with an empty document list

    $query = tep_db_query("
        // get first/next 1000
    ");

    // collect this batch of 1000 rows
    while($array_collections = tep_db_fetch_array($query)) {

        $achatsDocs[] = new \Elastica\Document('', \Glam\HttpUtils::jsonEncode(
            array(
                // 20 documents
            )
        ));
    }

    $achatsReportType->addDocuments($achatsDocs);
    $achatsReportType->getIndex()->refresh();

    // go over next 1000
    $limit_start = $limit_start + 1000;
    $limit_end = $limit_end + 1000;


}

But it still crashes even after this change; it ends up adding about 70k results before failing with:

Fatal error: Uncaught exception 'Elastica\Exception\Connection\HttpException' with message 'Unknown error:52' in /var/www/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php:167
Stack trace:
#0 /var/www/vendor/ruflin/elastica/lib/Elastica/Request.php(171): Elastica\Transport\Http->exec(Object(Elastica\Request), Array)
#1 /var/www/vendor/ruflin/elastica/lib/Elastica/Client.php(621): Elastica\Request->send()
#2 /var/www/vendor/ruflin/elastica/lib/Elastica/Bulk.php(360): Elastica\Client->request('_bulk', 'PUT', '{"index":{"_ind...', Array)
#3 /var/www/vendor/ruflin/elastica/lib/Elastica/Client.php(314): Elastica\Bulk->send()
#4 /var/www/vendor/ruflin/elastica/lib/Elastica/Index.php(150): Elastica\Client->addDocuments(Array)
#5 /var/www/vendor/ruflin/elastica/lib/Elastica/Type.php(196): Elastica\Index->addDocuments(Array)
#6 /var/www/htdocs/adm54140/achatsReport_map.php(280): Elastica\Type->addDocuments(Array)
#7 {main}
  thrown in /var/www/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php on line 167

array(4) {
  ["code"]=> string(7) "E_ERROR"
  ["message"]=> string(928) "Uncaught exception 'Elastica\Exception\Connection\HttpException' with message 'Unknown error:52' in /var/www/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php:167 Stack trace: #0 /var/www/vendor/ruflin/elastica/lib/Elastica/Request.php(171): Elastica\Transport\Http->exec(Object(Elastica\Request), Array) #1 /var/www/vendor/ruflin/elastica/lib/Elastica/Client.php(621): Elastica\Request->send() #2 /var/www/vendor/ruflin/elastica/lib/Elastica/Bulk.php(360): Elastica\Client->request('_bulk', 'PUT', '{"index":{"_ind...', Array) #3 /var/www/vendor/ruflin/elastica/lib/Elastica/Client.php(314): Elastica\Bulk->send() #4 /var/www/vendor/ruflin/elastica/lib/Elastica/Index.php(150): Elastica\Client->addDocuments(Array) #5 /var/www/vendor/ruflin/elastica/lib/Elastica/Type.php(196): Elastica\Index->addDocuments(Array) #6 /var/www/htdocs/adm54140/achatsReport_map.php(280): Elastica\Type->addDocuments(Array) #7 {main} thrown"
  ["file"]=> string(63) "/var/www/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php"
  ["line"]=> int(167)
}

And then Elasticsearch itself goes down.

2 answers:

Answer 0 (score: 0):

Real-time pagination over "big" result sets (subjective, of course, but I'd say 500k+ results is big enough) is difficult for distributed data in general, and Elasticsearch is no exception. But there is a solution: use the Scroll API to walk through the results. I think it is a better fit for your needs.
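Since you are already using Elastica, a minimal (untested) sketch of the Scroll API from PHP might look like the following. This assumes a reasonably recent Elastica release where the \Elastica\Scroll iterator exists; $client and the index name my_index are placeholders to adapt:

```php
<?php
// Sketch only: walk a large result set page by page with the Scroll API.
// Assumes $client is an existing \Elastica\Client and 'my_index' is your index.
$search = new \Elastica\Search($client);
$search->addIndex('my_index');

$query = new \Elastica\Query(new \Elastica\Query\MatchAll());
$query->setSize(1000); // documents fetched per scroll round-trip
$search->setQuery($query);

// The scroll context is kept alive for '1m' between round-trips; the
// iterator fetches successive pages until the result set is exhausted.
$scroll = new \Elastica\Scroll($search, '1m');

foreach ($scroll as $resultSet) {
    foreach ($resultSet as $result) {
        // process $result->getData() here
    }
}
```

Each iteration holds only one page of results in memory, which is what makes this approach viable for result sets in the hundreds of thousands.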

If you want to know why the cluster crashed, take a look at its logs.

Answer 1 (score: 0):

I also ran into problems when importing datasets larger than 400k documents. I ended up using the Bulk API, splitting the data into chunks of 1000 documents (as mentioned above).

Elasticsearch Bulk API

Maybe there is a problem with the Elastica client, so I suggest saving each chunk of 1000 documents into a json file with the following format:

{ "index" : { "_index" : "my_index", "_type" : "mappingType", "_id" :  "1234"} }
{ "name" : "John", "age" : 12, "_id" :  "1234"}
{ "index" : { "_index" : "my_index", "_type" : "mappingType", "_id" :  "1235"} }
{ "name" : "Maria", "age" : 17, "_id" :  "1235"}

Then create a .sh script that contains one of the following lines per json file:

curl -u user:"pass" -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@es_request1.json"
curl -u user:"pass" -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@es_request2.json"
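If you would rather generate those NDJSON files from PHP, a chunk-and-write sketch could look like this. Everything here is a placeholder to adapt: $rows is assumed to be an array of associative arrays, and my_index, mappingType, and the id field are hypothetical names:

```php
<?php
// Sketch: split an array of rows into 1000-document chunks and write one
// NDJSON bulk file per chunk (es_request1.json, es_request2.json, ...).
$chunks = array_chunk($rows, 1000);
foreach ($chunks as $i => $chunk) {
    $lines = array();
    foreach ($chunk as $row) {
        // action/metadata line, then the document source line
        $lines[] = json_encode(array(
            'index' => array('_index' => 'my_index', '_type' => 'mappingType', '_id' => $row['id']),
        ));
        $lines[] = json_encode($row);
    }
    // The _bulk endpoint requires a trailing newline after the last line.
    file_put_contents('es_request' . ($i + 1) . '.json', implode("\n", $lines) . "\n");
}
```

The resulting files can then be sent with the curl commands above.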