I'm trying to convert this script to use the new official Elasticsearch client instead of the older (now deprecated) ElasticSearch.pm, but I can't get the scrolled search to work. Here's what I have:
    #! /usr/bin/perl

    use strict;
    use warnings;
    use 5.010;

    use Elasticsearch ();
    use Elasticsearch::Scroll ();

    my $es = Elasticsearch->new(
        nodes    => 'http://api.metacpan.org:80',
        cxn      => 'NetCurl',
        cxn_pool => 'Static::NoPing',
        #log_to   => 'Stderr',
        #trace_to => 'Stderr',
    );

    say 'Getting all results at once works:';
    my $results = $es->search(
        index => 'v0',
        type  => 'release',
        body  => {
            filter => { range => { date => { gte => '2013-11-28T00:00:00.000Z' } } },
            fields => [qw(author archive date)],
        },
    );

    foreach my $hit (@{ $results->{hits}{hits} }) {
        my $field = $hit->{fields};
        say "@$field{qw(date author archive)}";
    }

    say "\nUsing a scrolled search does not work:";
    my $scroller = Elasticsearch::Scroll->new(
        es          => $es,
        index       => 'v0',
        search_type => 'scan',
        size        => 100,
        type        => 'release',
        body        => {
            filter => { range => { date => { gte => '2013-11-28T00:00:00.000Z' } } },
            fields => [qw(author archive date)],
        },
    );

    while (my $hit = $scroller->next) {
        my $field = $hit->{fields};
        say "@$field{qw(date author archive)}";
    } # end while $hit
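The first search, where I get all the results in a single chunk, works fine. But the second search, where I try to scroll through the results, produces: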
Using a scrolled search does not work:
[Request] ** [http://api.metacpan.org:80]-[500]
ActionRequestValidationException[Validation Failed: 1: scrollId is missing;],
called from sub Elasticsearch::Transport::try {...}
at .../Try/Tiny.pm line 83. With vars: {'body' =>
'ActionRequestValidationException[Validation Failed: 1: scrollId is missing;]',
'request' => {'path' => '/_search/scroll','serialize' => 'std',
'body' => 'c2Nhbjs1OzE3MjU0NjM2MjowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2NDowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MTowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MDowakFELUU3VFFibTJIZW1ibUo0SUdROzE3MjU0NjM2MzowakFELUU3VFFibTJIZW1ibUo0SUdROzE7dG90YWxfaGl0czoxNDQ7',
'method' => 'GET','qs' => {'scroll' => '1m'},'ignore' => [],
'mime_type' => 'application/json'},'status_code' => 500}
What am I doing wrong? I'm using Elasticsearch 0.75 and Elasticsearch-Cxn-NetCurl 0.02 with Perl 5.18.1.

Answer 0 (score: 1)

I finally got it working using the newer Search::Elasticsearch official client. Here's the short version:
    #! /usr/bin/perl

    use strict;
    use warnings;
    use 5.010;

    use Search::Elasticsearch ();

    my $es = Search::Elasticsearch->new(
        cxn_pool => 'Static::NoPing',
        nodes    => 'api.metacpan.org:80',
    );

    my $scroller = $es->scroll_helper(
        index       => 'v0',
        type        => 'release',
        search_type => 'scan',
        scroll      => '2m',
        size        => 100,
        body        => {
            fields => [qw(author archive date)],
            query  => { range => { date => { gte => '2015-02-01T00:00:00.000Z' } } },
        },
    );

    while (my $hit = $scroller->next) {
        my $field = $hit->{fields};
        say "@$field{qw(date author archive)}";
    } # end while $hit
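Note that the records are not sorted when you do a scrolled search. I wound up dumping the records into a temporary database and sorting them locally. The updated script is on GitHub.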
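For a small result set, an in-memory sort may be enough instead of a temporary database. The following is only a sketch of that simpler alternative, not what the updated script does. The sample hits are made up for illustration, and it assumes each field comes back as a plain string, as the loop above expects.

```perl
#! /usr/bin/perl

use strict;
use warnings;
use 5.010;

# Hypothetical sample hits, shaped like the ones the loop above prints.
# In real use these would be collected from $scroller->next.
my @hits = (
    { fields => { date => '2015-02-03T10:00:00.000Z', author => 'BOB',   archive => 'Foo-1.0.tar.gz' } },
    { fields => { date => '2015-02-01T08:30:00.000Z', author => 'ALICE', archive => 'Bar-0.2.tar.gz' } },
    { fields => { date => '2015-02-02T23:59:59.000Z', author => 'CAROL', archive => 'Baz-3.1.tar.gz' } },
);

# Timestamps that share one format and time zone (as these do) sort
# chronologically when compared as plain strings.
my @sorted = sort { $a->{fields}{date} cmp $b->{fields}{date} } @hits;

foreach my $hit (@sorted) {
    my $field = $hit->{fields};
    say "@$field{qw(date author archive)}";
}
```

This holds every hit in memory at once, so it only suits result sets that fit comfortably in RAM.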
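Answer 1 (score: 0)

I don't have a direct answer, but I may have a way to track down the problem.

I followed the link to Elasticsearch::Client and found a scroll() method:
https://metacpan.org/pod/Elasticsearch::Client::Direct#scroll

This method takes scroll and scroll_id as parameters. scroll is how long (for example '1m' for one minute) you can keep calling the scroll method before the search expires. scroll_id is a marker for where the last call to scroll() left off.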
    $results = $e->scroll(
        scroll    => '1m',
        scroll_id => $id
    );
Elasticsearch::Scroll is an object-oriented wrapper around scroll() that hides scroll and scroll_id.
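To make the role of scroll_id more concrete, here is a rough sketch of the kind of loop such a wrapper presumably runs for you. It is not the module's actual implementation. FakeClient and fetch_all are hypothetical names invented for this example, and the stub serves canned pages so the loop can be run without a server. The _scroll_id key is the field an Elasticsearch response uses to carry the scroll ID.

```perl
#! /usr/bin/perl

use strict;
use warnings;
use 5.010;

# Hypothetical stand-in for a real client. It returns canned pages so the
# loop can be run offline; a real client would talk to the server instead.
{
    package FakeClient;

    my @pages = (
        [],    # a 'scan' search returns no hits with the initial response
        [ { fields => { author => 'A' } }, { fields => { author => 'B' } } ],
        [ { fields => { author => 'C' } } ],
        [],    # an empty page means the scroll is exhausted
    );

    sub new { return bless { next_page => 0 }, shift }

    sub search { my ($self) = @_; return $self->_response }

    sub scroll {
        my ($self, %args) = @_;
        die "scroll_id is missing\n" unless defined $args{scroll_id};
        return $self->_response;
    }

    sub _response {
        my ($self) = @_;
        my $n = $self->{next_page}++;
        return { _scroll_id => "id-$n", hits => { hits => $pages[$n] // [] } };
    }
}

# Collect every hit by passing the scroll ID from each response
# into the next scroll() call until a page comes back empty.
sub fetch_all {
    my ($es, %search_args) = @_;
    my $scroll = $search_args{scroll} //= '1m';

    my $results = $es->search(%search_args);
    my @hits    = @{ $results->{hits}{hits} };

    while (1) {
        $results = $es->scroll(
            scroll    => $scroll,
            scroll_id => $results->{_scroll_id},
        );
        my @page = @{ $results->{hits}{hits} };
        last unless @page;
        push @hits, @page;
    }
    return @hits;
}

my @hits = fetch_all(FakeClient->new, search_type => 'scan', size => 100);
say scalar(@hits), ' hits: ', join ',', map { $_->{fields}{author} } @hits;
```

Run as is, this prints "3 hits: A,B,C". The point is only that each scroll() call needs the ID from the previous response, so an error about a missing scroll ID suggests that the ID is not reaching the server in the form it expects.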
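I would run perl -d on your script, step into $scroller->next, and follow it down the rabbit hole as far as you can. Something in there is attempting the search that should populate scroll_id (or scrollId), and it is failing.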
My description is admittedly very rough. I came across an exact description of the scroll ID while Googling, but I can't seem to find it again.