Question

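I am trying to fetch data from Elasticsearch using the elasticsearch-dsl Python library. I need all the data from the last 15 minutes. The problem is that retrieval is extremely slow: 2.2 million hits take a very long time. Here is my code: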

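import time

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

# Result lists filled by the scan loop below
source_ip, destination_ip = [], []
source_port, destination_port = [], []

start_time = time.time()
try:
    client = Elasticsearch(['IP_HERE'])
    # Match every document from the last 15 minutes
    s = Search(using=client, index="firewallv2-*", doc_type='doc').filter('range', **{'@timestamp': {'gte': 'now-15m', 'lt': 'now'}})
    # Note: response is not used below; scan() issues its own scroll requests
    response = s.execute()
except Exception as e:
    print(e)
    print("error in getting data from FIREWALL")

try:
    # scan() iterates over all matching hits
    for hit1 in s.scan():
        source_ip.append(hit1.to_dict().get('Source IP'))
        destination_ip.append(hit1.to_dict().get('Destination IP'))
        destination_port.append(hit1.to_dict().get('Destination Port'))
        source_port.append(hit1.to_dict().get('Source Port'))
except Exception as e:
    print("not able to parse json data")

elapsed_time = time.time() - start_time
print("Time to get data from server " + str(elapsed_time))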

There is more code after this, but I am only posting the main slow part. The rest is pure Python. Here is my output:

Time to get data from server 893.599892855
Time to store data into variables 27.647258997
Time to process for loop 9.32531404495

All times are in seconds. As you can see, retrieving the 2.2 million hits takes by far the longest, about 894 seconds.

I also tried bulk_size=10000, and several other bulk_size values, without success.

How can I improve the performance of the scan method?
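
For reference, here is a minimal sketch of the per-hit work in the loop above, rewritten so that each document is read as a dict once instead of calling to_dict() four times per hit. The helper name collect_columns and the sample documents are hypothetical and only for illustration; the field names are the ones from the code above. It runs on plain dicts, so it does not need a cluster, and it is not a measured fix for the 894 seconds.

```python
from typing import Any, Dict, Iterable, List, Mapping

# Field names taken from the code in the question.
FIELDS = ["Source IP", "Destination IP", "Destination Port", "Source Port"]


def collect_columns(
    docs: Iterable[Mapping[str, Any]], fields: List[str] = FIELDS
) -> Dict[str, List[Any]]:
    """Collect one list per field; missing fields are stored as None."""
    columns: Dict[str, List[Any]] = {name: [] for name in fields}
    for doc in docs:
        # Each document is already a plain mapping, so there is a single
        # lookup per field and no repeated conversion per hit.
        for name in fields:
            columns[name].append(doc.get(name))
    return columns


# Hypothetical stand-in documents (not real firewall data).
sample = [
    {"Source IP": "10.0.0.1", "Destination IP": "10.0.0.2",
     "Destination Port": 443, "Source Port": 51000},
    {"Source IP": "10.0.0.3", "Destination Port": 80},
]
columns = collect_columns(sample)
```

With elasticsearch-dsl, the iterable could be something like (hit.to_dict() for hit in s.source(FIELDS).params(size=5000).scan()), which asks the server to return only those four fields and uses a larger scroll page. Whether source filtering or the page size actually helps on this cluster is an assumption that would need to be timed.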

0 answers: