Question

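I'm new to Elasticsearch, so please forgive me if this is a very basic question.

At my workplace, we have ELK properly set up.

Because of the huge data volume, we only keep 14 days of data. My question is: how can I read that data in Python and then store the analysis results in some NoSQL database?

For now, my main goal is to read data from the Elastic cluster into Python, as a DataFrame or in any other format.

I want to fetch it at different time intervals, e.g. 1 day, 1 week, 1 month, etc.

I've been struggling with this for the past week.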

Answer 1

You can achieve this with the code below.

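To create a DataFrame object for an index:

# Create a DataFrame object
from pandasticsearch import DataFrame
df = DataFrame.from_es(url='http://localhost:9200', index='indexname')

To print the schema of the index:

df.print_schema()

After that, you can perform the usual DataFrame operations on df.

If you want to query Elasticsearch directly and parse the raw results, do the following, and then load the result into your final DataFrame:

from elasticsearch import Elasticsearch
es = Elasticsearch('http://localhost:9200')
result_dict = es.search(index="indexname", body={"query": {"match_all": {}}})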
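A minimal sketch of turning the raw `result_dict` returned by `es.search` into a pandas DataFrame, assuming the standard response layout where each matching document sits under `hits.hits[i]._source`. The helper name `hits_to_frame` and the sample response are hypothetical. Note that `search` returns only the first 10 hits unless you pass a larger `size`.

```python
import pandas as pd


def hits_to_frame(result_dict):
    """Flatten the documents of an Elasticsearch search response into a DataFrame."""
    hits = result_dict["hits"]["hits"]
    # Each hit carries its document in "_source"; keep "_id" so rows can be traced back.
    rows = [{"_id": hit["_id"], **hit["_source"]} for hit in hits]
    # json_normalize flattens nested objects into dotted column names (e.g. "host.name").
    return pd.json_normalize(rows)


# Hypothetical response, shaped like the output of es.search(...)
sample = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "indexname", "_id": "1",
             "_source": {"@timestamp": "2020-01-01T10:00:00Z",
                         "host": {"name": "web-1"}, "status": 200}},
            {"_index": "indexname", "_id": "2",
             "_source": {"@timestamp": "2020-01-02T11:30:00Z",
                         "host": {"name": "web-2"}, "status": 404}},
        ],
    }
}

df = hits_to_frame(sample)
print(df)
```

With a live cluster you would call `hits_to_frame(es.search(...))` instead of using `sample`.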

I hope this helps.

Answer 2

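It depends on how you want to read the data from Elasticsearch: incrementally, i.e. reading only the new data that arrives each day, or in bulk. For the latter, use the bulk helpers of the Elasticsearch Python client (for reading, `elasticsearch.helpers.scan`, which wraps the scroll API; `elasticsearch.helpers.bulk` is for writing documents); for the former, you can limit yourself to simple range queries.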

Schematic code for reading bulk data: https://gist.github.com/dpkshrma/04be6092eda6ae108bfc1ed820621130

How to use the ES Bulk API to write documents:

How to use Bulk API to store the keywords in ES by using Python

https://elasticsearch-py.readthedocs.io/en/master/helpers.html#elasticsearch.helpers.bulk
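Since the Bulk API writes documents, it fits the second half of the question: storing analysis results. A minimal sketch, assuming the results are already in a pandas DataFrame; the helper name `frame_to_bulk_actions` and the index name `analysis-results` are hypothetical. Each row becomes one action dict, which is the form `elasticsearch.helpers.bulk` accepts.

```python
import pandas as pd


def frame_to_bulk_actions(df, index):
    """Yield one bulk action per DataFrame row (hypothetical helper)."""
    for record in df.to_dict(orient="records"):
        # Without an "_op_type" key, helpers.bulk defaults to the "index" operation.
        yield {"_index": index, "_source": record}


# Hypothetical analysis results
results = pd.DataFrame({"day": ["2020-01-01", "2020-01-02"], "errors": [3, 5]})
actions = list(frame_to_bulk_actions(results, "analysis-results"))
print(actions[0])
```

With a connected client, `helpers.bulk(es, frame_to_bulk_actions(results, "analysis-results"))` (after `from elasticsearch import helpers`) sends the actions in batches. Using a generator keeps memory flat for large frames.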

How to use range queries for incremental loading:

https://martinapugliese.github.io/python-for-(some)-elasticsearch-queries/

How to have Range and Match query in one elastic search query using python?
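A minimal sketch of an incremental read: a `range` query that matches only documents from the last N complete days. It assumes the timestamp field is `@timestamp` (the Logstash default) and uses Elasticsearch date math, where `/d` rounds to the day; the helper name `last_n_days_query` is hypothetical.

```python
def last_n_days_query(days, field="@timestamp"):
    """Build a search body for the last `days` complete days (hypothetical helper).

    `field` is an assumption: Logstash writes the event time to "@timestamp".
    """
    return {
        "query": {
            "range": {
                field: {
                    # From midnight `days` days ago...
                    "gte": f"now-{days}d/d",
                    # ...through the end of yesterday (lte rounds up to 23:59:59.999).
                    "lte": "now-1d/d",
                }
            }
        }
    }


# Yesterday only, e.g. for a daily job
daily = last_n_days_query(1)
weekly = last_n_days_query(7)
```

A daily job could then call `es.search(index="indexname", body=last_n_days_query(1))` to pick up only the previous day's data.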

Since you want to fetch the data at different intervals, you will also need a date histogram aggregation.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html

How to perform multiple aggregation on an object in Elasticsearch using Python?
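A minimal sketch of a `date_histogram` aggregation for the 1 day / 1 week / 1 month intervals from the question. It assumes Elasticsearch 7.2 or later, where the parameter is `calendar_interval` (older versions use `interval`), a `@timestamp` field, and hypothetical helper names and sample data. `"size": 0` skips returning the individual hits.

```python
import pandas as pd


def histogram_body(interval, field="@timestamp"):
    """Count documents per calendar bucket; interval is e.g. "1d", "1w" or "1M"."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits themselves
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": field, "calendar_interval": interval}
            }
        },
    }


def buckets_to_frame(response):
    """Turn the buckets of the "per_interval" aggregation into a DataFrame."""
    buckets = response["aggregations"]["per_interval"]["buckets"]
    return pd.DataFrame(
        {
            # "key" is the bucket start in epoch milliseconds.
            "start": pd.to_datetime([b["key"] for b in buckets], unit="ms", utc=True),
            "doc_count": [b["doc_count"] for b in buckets],
        }
    )


# Hypothetical response for histogram_body("1d")
sample = {
    "aggregations": {
        "per_interval": {
            "buckets": [
                {"key_as_string": "2020-01-01T00:00:00.000Z",
                 "key": 1577836800000, "doc_count": 42},
                {"key_as_string": "2020-01-02T00:00:00.000Z",
                 "key": 1577923200000, "doc_count": 17},
            ]
        }
    }
}
daily = buckets_to_frame(sample)
```

Swapping `"1d"` for `"1w"` or `"1M"` changes the bucket size without touching the parsing code.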

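Once you have run the Elasticsearch query, the results are collected in a temporary variable, and you can use a Python client library for a NoSQL database, such as PyMongo for MongoDB, to insert that Elasticsearch data into it.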
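A minimal sketch of that last step, assuming MongoDB as the NoSQL store and PyMongo as the client; the connection URI, database and collection names, and the helper names are placeholders. `insert_many` takes a non-empty list of dicts, so the DataFrame is converted with `to_dict(orient="records")` first.

```python
import pandas as pd


def frame_to_documents(df):
    """Convert DataFrame rows into plain dicts, the shape PyMongo expects."""
    return df.to_dict(orient="records")


def save_to_mongo(df, uri="mongodb://localhost:27017",
                  db="analytics", collection="daily_counts"):
    """Insert every row of `df` as one MongoDB document (all names are placeholders)."""
    # Imported here so the conversion helper above works without pymongo installed.
    from pymongo import MongoClient

    documents = frame_to_documents(df)
    if not documents:
        return 0  # insert_many rejects an empty list
    client = MongoClient(uri)
    try:
        result = client[db][collection].insert_many(documents)
        return len(result.inserted_ids)
    finally:
        client.close()


# Hypothetical analysis results
results = pd.DataFrame({"day": ["2020-01-01", "2020-01-02"], "doc_count": [42, 17]})
docs = frame_to_documents(results)
```

With a running MongoDB instance, `save_to_mongo(results)` would return the number of inserted documents.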

Reading Elastic Cluster data into a Python dataframe
