Question

I need to write an application to account for bandwidth from a sensor, which gives details of the data flow in a table like the following:

[ElasticsearchType(Name = "trafficSnapshot")]
public class TrafficSnapshot
{
    // use epoch_second @ https://mixmax.com/blog/30x-faster-elasticsearch-queries
    [Date(Format = "epoch_second")]
    public long TimeStamp { get; set; }

    [Nested]
    public Sample[] Samples { get; set; }
}

[ElasticsearchType(Name = "sample")]
public class Sample
{
    public ulong Bytes { get; set; }
    public ulong Packets { get; set; }
    public string Source { get; set; }
    public string Destination { get; set; }
}

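There are potentially going to be a lot of log entries, especially at high traffic flow per second. I believe we can contain growth by sharding/indexing by mm/dd/yyyy (and discard unneeded days by deleting old indexes). However, when I create an index with a date string I get the error Invalid NEST response built from a unsuccessful low level call on PUT: /15%2F12%2F2017. How should I define the index if I want to split into dates?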

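If I log the data in this format, is it then possible to perform a summation per IP address of the total data sent and total data received (over a date range which can be defined), or am I better off storing/indexing my data with a different structure before I progress further?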

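My full code is below; it is a first stab from tonight, so pointers are appreciated (or if I am going off track and would be better off using Logstash or similar, please do let me know).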

public static class DateTimeEpochHelpers
{
    public static DateTime FromUnixTime(this long unixTime)
    {
        var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        return epoch.AddSeconds(unixTime);
    }

    public static long ToUnixTime(this DateTime date)
    {
        var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        return Convert.ToInt64((date - epoch).TotalSeconds);
    }
}

public static class ElasticClientTrafficSnapshotHelpers
{
    public static void IndexSnapshot(this ElasticClient elasticClient, DateTime sampleTakenOn, Sample[] samples)
    {
        var timestamp = sampleTakenOn.ToUniversalTime();
        var unixTime = timestamp.ToUnixTime();
        var dateString = timestamp.Date.ToShortDateString();

        // create the index if it doesn't exist
        if (!elasticClient.IndexExists(dateString).Exists)
        {
            elasticClient.CreateIndex(dateString);
        }

        var response = elasticClient.Index(
            new TrafficSnapshot
            {
                TimeStamp = unixTime,
                Samples = samples
            },
            p => p
                .Index(dateString)
                .Id(unixTime)
        );
    }
}

class Program
{
    static void Main(string[] args)
    {
        var node = new Uri("http://localhost:9200");

        var settings = new ConnectionSettings(node);              
        var elasticClient = new ElasticClient(settings);

        var timestamp = DateTime.UtcNow;

        var samples = new[]
        {
            new Sample() {Bytes = 100, Packets = 1, Source = "193.100.100.5", Destination = "8.8.8.8"},
            new Sample() {Bytes = 1022, Packets = 1, Source = "8.8.8.8", Destination = "193.100.100.5"},
            new Sample() {Bytes = 66, Packets = 1, Source = "193.100.100.1", Destination = "91.100.100.1"},
            new Sample() {Bytes = 554, Packets = 1, Source = "193.100.100.10", Destination = "91.100.100.2"},
            new Sample() {Bytes = 89, Packets = 1, Source = "9.9.9.9", Destination = "193.100.100.20"},
        };

        elasticClient.IndexSnapshot(timestamp, samples);
    }
}

Answer 1

// use epoch_second @ https://mixmax.com/blog/30x-faster-elasticsearch-queries
[Date(Format = "epoch_second")]
public long TimeStamp { get; set; }

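I would evaluate whether this still holds true in newer versions of Elasticsearch. Also, is second precision enough for your use case? You can index a date in multiple ways to serve different purposes, e.g. for sorting, range queries, exact values, etc. You may also want to use a DateTime or DateTimeOffset type, and define a custom JsonConverter to serialize and deserialize to epoch_millis/epoch_second.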
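
As a rough illustration of the converter idea, here is a minimal sketch using System.Text.Json. This is an assumption made for the sake of a self-contained example: the NEST versions current when the question was asked rely on Json.NET, whose JsonConverter has a different shape (ReadJson/WriteJson), so the code would need adapting there. EpochSecondConverter and TimestampedDocument are hypothetical names.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical converter: represents a DateTimeOffset in JSON as whole
// seconds since the Unix epoch (sub-second precision is discarded).
public sealed class EpochSecondConverter : JsonConverter<DateTimeOffset>
{
    public override DateTimeOffset Read(
        ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => DateTimeOffset.FromUnixTimeSeconds(reader.GetInt64());

    public override void Write(
        Utf8JsonWriter writer, DateTimeOffset value, JsonSerializerOptions options)
        => writer.WriteNumberValue(value.ToUnixTimeSeconds());
}

// Hypothetical document type: the property stays strongly typed in C#,
// while the JSON sent to Elasticsearch carries a plain number.
public sealed class TimestampedDocument
{
    [JsonConverter(typeof(EpochSecondConverter))]
    public DateTimeOffset TimeStamp { get; set; }
}
```

With this in place the application code works with DateTimeOffset throughout, and the epoch conversion happens only at the serialization boundary.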

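"There are potentially going to be a lot of log entries, especially at high traffic flow per second. I believe we can contain growth by sharding/indexing by mm/dd/yyyy (and discard unneeded days by deleting old indexes)"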

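Creating an index per time interval is a very good idea for time series data. Typically, more recent data, e.g. the last day or the last week, is searched and aggregated more frequently than older data. Indexing into time-based indices lets you take advantage of a hot/warm architecture with shard allocation, whereby the most recent indices live on more powerful nodes with better IOPS, and older indices live on less powerful nodes with fewer IOPS. When you no longer need to aggregate on such data, those indices can be snapshotted into cold storage.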

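"when I create an index with a date string I get the error Invalid NEST response built from a unsuccessful low level call on PUT: /15%2F12%2F2017. How should I define the index if I want to split into dates?"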

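Don't use an index name containing /. You may want to use a format like <year>-<month>-<day>, e.g. 2017-12-16. You'll almost certainly want to take advantage of index templates to ensure that the correct mappings are applied to newly created indices, and there are a few approaches you may want to consider: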

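Use the Date Index Name processor to index documents into the correct index based on a timestamp field in the document.
Use the Rollover API to manage indices. There is a good blog post on managing time-based indices efficiently.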
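
If you keep building the index name on the client, the slash can be avoided by formatting the date explicitly instead of calling ToShortDateString(), which is culture-dependent and produced 15/12/2017 in the question. A minimal sketch follows; the DailyIndexNames helper and the "traffic-" prefix are hypothetical, chosen only for illustration. Index names must also be lowercase, which this format satisfies.

```csharp
using System;
using System.Globalization;

public static class DailyIndexNames
{
    // Returns a name such as "traffic-2017-12-15" for the UTC day of the sample.
    // The "traffic-" prefix is an assumed naming convention, not something
    // Elasticsearch requires.
    public static string For(DateTime sampleTakenOn, string prefix = "traffic-")
    {
        // An explicit custom format with the invariant culture keeps the name
        // stable regardless of the machine's regional settings.
        var day = sampleTakenOn.ToUniversalTime()
            .ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
        return prefix + day;
    }
}
```

In the question's IndexSnapshot method, the result of DailyIndexNames.For(sampleTakenOn) could stand in for dateString. A common prefix also makes it possible to target all daily indices with a wildcard such as traffic-*.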

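"If I log the data in this format, is it then possible to perform a summation per IP address of the total data sent and total data received (over a date range which can be defined), or am I better off storing/indexing my data with a different structure before I progress further?"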

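Yes. Consider whether it makes sense to nest a collection of samples on one document or to denormalize to one document per sample. Looking at the model, samples would be logically fine as separate documents, since the only shared data is the timestamp. It's possible to aggregate on both top-level documents and nested documents, but some queries may be easier to express with top-level documents. I'd recommend experimenting with both approaches to see which works better for your use case. Also, take a look at the IP data type for indexing IP addresses, and at the ingest-geoip plugin for getting geo data from IP addresses.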
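
To make the summation concrete, here is a sketch of a search request body that totals bytes per source address within a time range. It assumes the denormalized layout (one document per sample, each carrying its own timestamp), that the fields are named timeStamp, source and bytes, and that source is mapped as ip or keyword; with nested samples the aggregation would additionally need to be wrapped in a nested aggregation. TrafficQueries is a hypothetical helper that only builds the JSON.

```csharp
using System.Text.Json;

public static class TrafficQueries
{
    // Builds the body for a search request: filter to [from, to) and
    // sum the "bytes" field for each distinct "source" value.
    public static string BytesSentPerSource(long fromEpochSeconds, long toEpochSeconds)
    {
        var body = new
        {
            size = 0, // only the aggregation results are needed, not the hits
            query = new
            {
                range = new
                {
                    timeStamp = new
                    {
                        gte = fromEpochSeconds,
                        lt = toEpochSeconds,
                        format = "epoch_second"
                    }
                }
            },
            aggs = new
            {
                by_source = new
                {
                    terms = new { field = "source" },
                    aggs = new { total_bytes = new { sum = new { field = "bytes" } } }
                }
            }
        };
        return JsonSerializer.Serialize(body);
    }
}
```

Totals received per address would use the same shape with the terms aggregation on destination. Note that a terms aggregation returns only the top buckets by default, so a size setting on it may be needed when there are many addresses.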

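"My full code is below; it is a first stab from tonight, so pointers are appreciated (or if I am going off track and would be better off using Logstash or similar, please do let me know)."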

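There are many ways you could approach this. If you're looking to do it with a client, I'd recommend using the bulk API to index multiple documents per request, and putting a message queue in front of the indexing component to provide a layer of buffering. Logstash can be useful here, particularly if you need to perform additional enrichment and filtering. You may also want to look at Curator for index management.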
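
For reference, the bulk API takes a newline-delimited body in which each document is preceded by an action line. The sketch below builds such a body by hand to show the shape; BulkBodies is a hypothetical helper, and in practice the client's own bulk support would normally do this for you. Older Elasticsearch versions may also expect a _type in the action line or in the request URL, which is omitted here.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class BulkBodies
{
    // Builds the newline-delimited JSON body for a bulk request: one "index"
    // action line followed by one source line per document.
    public static string BuildIndexBody<T>(string indexName, IEnumerable<T> documents)
    {
        var sb = new StringBuilder();
        foreach (var doc in documents)
        {
            var action = new { index = new { _index = indexName } };
            sb.Append(JsonSerializer.Serialize(action)).Append('\n');
            sb.Append(JsonSerializer.Serialize(doc)).Append('\n');
        }
        // Every line, including the last one, ends with a newline,
        // as the bulk format requires.
        return sb.ToString();
    }
}
```

Sending a few hundred or a few thousand samples per request in this way avoids the per-request overhead of indexing each snapshot individually; a suitable batch size is best found by experiment.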

Elasticsearch time series database: logging samples and summing between date ranges
