Question

我在elasticsearch中有一个包含apache日志数据的索引。这就是我想要做的事情：

识别访问某个文件的所有访问者（按IP号码）（例如/signup.php）。
对我的数据执行搜索/查询/聚合，但将检查的文档限制为包含步骤1中找到的ip号的文档。

在sql世界中，我只想创建一个临时表并插入第一步中匹配的所有IP号码。接下来，我将查询我的主表并通过在IP号码上加入我的临时表来限制结果集。

我理解在elasticsearch中不可能加入。 elasticsearch documentation提出了一些处理这种情况的方法：

应用程序端加入

这似乎不太实际，因为IP号码列表可能非常大，将结果发送到客户端然后在一个巨大的术语过滤器中将其传递回elasticsearch似乎效率低下。

非正规化数据

这将涉及迭代匹配的IP号并更新索引中的任何给定IP号的每个文档，例如＆＃34; in_group＆＃34;：true，所以我可以在稍后的查询中使用它。这似乎也非常不切实际且效率低下，尤其是因为源查询（步骤1）是动态的。

嵌套对象和/或父子关系

我不确定在这种情况下是否可以使用嵌套对象动态创建新文档。在我看来，我最终会复制大部分数据。

我一般都是关于elasticsearch和noSQL的新手，所以也许我只是以错误的方式看待问题而且我不应该首先尝试模仿JOIN。 / p>

但这似乎是分割数据集的常见情况，它让我想知道我是否忽略了其他一些明显的方法呢？

任何帮助将不胜感激！

Answer 1

如果我正确理解了您的问题，您将尝试根据特定条件获取文档的子集，并使用该子集进一步查询/搜索/聚合它。

如果为true，为什么要将它存储在另一个视图（sql类型）中。 elasticsearch的主要功能是它具有过滤器的缓存功能，因此大大缩短了查询时间。使用此功能，您需要执行的所有查询/搜索/聚合都需要一个术语过滤器，该过滤器将指定您在步骤1中尝试执行的条件。现在，无论您要执行哪些操作，都可以执行此操作在已经收缩的数据集的相同查询中。

如果您有其他不同的用例，则可能会考虑更改文档（映射）的存储，以便更轻松，更快速地进行检索。

Answer 2

这是我目前使用的解决方法：

运行此bash脚本将第一个查询ip-list保存到临时索引，然后使用terms-query过滤器（在Kibana中）使用步骤1中的ip-list进行查询。

Widget _buildCardList() {
  return ListView.builder(
    shrinkWrap: true,
    physics: NeverScrollableScrollPhysics(),
    itemBuilder: (BuildContext context, int index) =>
        MyWidgetSlider(cardList[index]),
    itemCount: cardList.length,
  );
}

class MyWidgetSlider extends StatefulWidget {
  final String data;
  MyWidgetSlider(this.data) : super();

  _MyWidgetSliderState createState() => _MyWidgetSliderState();
}

class _MyWidgetSliderState extends State<MyWidgetSlider> {
  double _sliderValue;
  @override
  void initState() {
    super.initState();
    _sliderValue = 0.0;
  }

  void _setValue(double value) {
    setState(() {
      _sliderValue = value;
    });
  }

  @override
  Widget build(BuildContext context) {
    return Container(
      height: 350,
      child: Card(
        clipBehavior: Clip.antiAlias,
        shape: RoundedRectangleBorder(
          borderRadius: BorderRadius.circular(10.0),
        ),
        child: Column(
          children: <Widget>[
            ClipRRect(
              borderRadius: BorderRadius.circular(10.0),
              child: Image.asset(
                this.widget.data,
                fit: BoxFit.cover,
              ),
            ),
            Padding(
              padding: const EdgeInsets.only(top: 15.0),
              child: Text(
                '${_sliderValue.round()}' + ' ITEMS',
                style: TextStyle(color: Colors.white, fontSize: 15.0),
              ),
            ),
            SliderTheme(
              data: SliderTheme.of(context).copyWith(
                  thumbColor: Colors.white,
                  thumbShape: RoundSliderThumbShape(enabledThumbRadius: 10.0),
                  activeTrackColor: Color(0xff3ADEA7),
                  inactiveTrackColor: Colors.grey,
                  overlayColor: Colors.transparent,
                  trackHeight: 1.0),
              child: Slider(
                value: _sliderValue,
                onChanged: _setValue,
                min: 0.0,
                max: 150.0,
                divisions: 30,
              ),
            ),
          ],
        ),
        color: Colors.transparent,
        elevation: 0.0,
        margin: EdgeInsets.all(10.0),
      ),
    );
  }
}

查询DSL-“条款查询”过滤器：

#!/usr/bin/env bash

es_host='https://************'
elk_user='************'
cred=($(pass ELK/************ | tr "\n" " ")) ##password
index_name='iis-************'
index_hostname='"************"'
temp_index_path='temp1/_doc/1'
results_limit=1000
timestamp_gte='"2018-03-20T13:00:00"' #UTC
timestamp_lte='"now"'                 #UTC



resp_data="$(curl -X POST $es_host/$index_name/_search -u $elk_user:${cred[0]} -H 'Content-Type: application/json; charset=utf-8' -d @- << EOF
{
        "query": {
            "bool": {
                "must": [{
                  "match": {
                      "index_hostname": {
                        "query": $index_hostname
                      }
                          }
                        },
            {
                    "regexp": {
                      "iis.access.url":{
                        "value": ".*((jpg)|(jpeg)|(png))"
                      }
                    }
                  }],
                "must_not": {
                    "match": {
                        "iis.access.agent": {
                            "query": "Amazon+CloudFront"
                        }
                    }
                },
                "filter": {
                    "range": {
                        "@timestamp": {
                            "gte": $timestamp_gte,
                            "lte": $timestamp_lte
                        }
                    }
                }
            }
        },
  "aggs" : {
        "whatever" : {
            "terms" : { "field" : "iis.access.remote_ip", "size":$results_limit }
        }
    },
    "size" : 0
    }
EOF
)"

ip_list="$(echo "$resp_data" | jq '.aggregations.whatever.buckets[].key' | tr "\n" ",\ " | head -c -1)"

resp_data2="$(curl -X PUT $es_host/$temp_index_path -u $elk_user:${cred[0]} -H 'Content-Type: application/json; charset=utf-8' -d @- << EOF
{
"ips" : [$ip_list]
}
EOF
)"

echo "$resp_data2"

在Elasticsearch中创建数据子集的最佳方法是什么？

2 个答案: