Question

We have some external applications in the cloud (IBM Bluemix) that write their application syslogs to the Bluemix logmet service, which internally uses the ELK stack.

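Now we would like to periodically download the logs from the cloud and upload them to a local Elasticsearch/Kibana instance. Storing the logs in the cloud service incurs a cost, and there is an additional cost if we want to search them through Kibana. The local Elasticsearch instance can delete/flush old logs that we no longer need.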

The downloaded logs look like this:

{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"Hello","type":"syslog","event_uuid":"474b78aa-6012-44f3-8692-09bd667c5822","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:15.598Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"3104b522-aba8-48e0-aef6-6291fc6f9250","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:15.598Z"}

{"instance_id_str":"0","source_id_str":"APP/PROC/WEB","app_name_str":"ABC","message":"EFG","type":"syslog","event_uuid":"d902dddb-afb7-4f55-b472-211f1d370837","origin_str":"rep","ALCH_TENANT_ID":"3213cd20-63cc-4592-b3ee-6a204769ce16","logmet_cluster":"topic3-elasticsearch_3","org_name_str":"123","@timestamp":"2017-09-29T02:30:28.636Z","message_type_str":"OUT","@version":"1","space_name_str":"prod","application_id_str":"dcd9f975-3be3-4451-a9db-6bed1d906ae8","ALCH_ACCOUNT_ID_str":"","org_id_str":"d728d5da-5346-4614-b092-e17be0f9b820","timestamp":"2017-09-29T02:30:28.636Z"}

I created an index in my local Elasticsearch:

curl -XPUT 'localhost:9200/commslog?pretty' -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "logs" : {
            "properties" : {
                "instance_id_str" : { "type" : "text" },
                "source_id_str" : { "type" : "text" },
                "app_name_str" : { "type" : "text" },
                "message" : { "type" : "text" },
                "type" : { "type" : "text" },
                "event_uuid" : { "type" : "text" },
                "ALCH_TENANT_ID" : { "type" : "text" },
                "logmet_cluster" : { "type" : "text" },
                "org_name_str" : { "type" : "text" },
                "@timestamp" : { "type" : "date" },
                "message_type_str" : { "type" : "text" },
                "@version" : { "type" : "text" },
                "space_name_str" : { "type" : "text" },
                "application_id_str" : { "type" : "text" },
                "ALCH_ACCOUNT_ID_str" : { "type" : "text" },
                "org_id_str" : { "type" : "text" },
                "timestamp" : { "type" : "date" }
            }
        }
    }
}'

Now I bulk upload the file using this command:

curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs.json'

The above command throws an error:

Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [VALUE_STRING]

The solution is to follow the rules for bulk upload, as described here:

https://discuss.elastic.co/t/bulk-insert-file-having-many-json-entries-into-elasticsearch/46470/2

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

So I manually changed a few of the log statements by adding the action line before each of them:

{ "index" : { "_index" : "commslog", "_type" : "logs" } }

This works!

Another option is to call the curl command with the _index and _type provided in the path:

curl -XPOST -H 'Content-Type: application/x-ndjson' http://localhost:9200/commslog/logs/_bulk --data-binary '@commslogs.json'

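But without the action lines, this throws the same error too.

The problem is that we cannot do this by hand for the thousands of log records we get. Is there an option where we download the log files from Bluemix and upload them without adding the action lines?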

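Note: we are not using Logstash at the moment, but:

Is it possible to use Logstash and just use grok to transform the records and add the necessary entries?
How can we bulk upload documents through Logstash?
Is Logstash the ideal solution, or can we write a program to do the transformation and upload?

Thanks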

Answer 1

As @Alain Collins said, you should be able to use filebeat directly.

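For Logstash:

It should be possible to use Logstash, but rather than grok you should use the json codec/filter, which would be much easier.
You can use the file input with Logstash to process many files and wait for it to finish (to know when it is done, use a file/stdout output, possibly with the dots codec, and wait for it to stop writing).
Rather than only transforming the files with Logstash, you should upload directly to Elasticsearch (using the elasticsearch output).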

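As for your problem, I think it would be much easier to just use a small program to add the missing action lines, or to use filebeat, unless you are experienced enough with Logstash configuration to write one faster than you could write a program that adds a line everywhere in the document.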
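
The "small program" could look roughly like the sketch below. It is only an illustration under some assumptions: the downloaded file has one JSON record per line with blank lines in between (as in the sample above), the index and type are still passed in the URL (/commslog/logs/_bulk) so an empty index action is enough, and the function names to_bulk_lines and convert_file are made up for this example.

```python
import json
from typing import Iterable, Iterator


def to_bulk_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield bulk-API lines: one action line before every JSON document."""
    # Assumption: index and type come from the request URL, so the
    # action line does not need to repeat "_index" / "_type".
    action = json.dumps({"index": {}})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between records
        json.loads(line)  # raises ValueError early on a malformed record
        yield action
        yield line


def convert_file(src: str, dst: str) -> None:
    """Read the downloaded log file and write a bulk-ready NDJSON file."""
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for out in to_bulk_lines(fin):
            # Every line, including the last one, must end with a newline.
            fout.write(out + "\n")
```

Calling convert_file("commslogs.json", "commslogs.bulk.json") (the second file name is arbitrary) should produce a file that the curl command from the question can send with --data-binary '@commslogs.bulk.json'. For very large downloads it may be worth splitting the output into several smaller requests.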

Bulk upload log messages to local Elasticsearch
