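I am using Structured Streaming to read JSON data from Kafka, and each JSON message contains several windows of time-series data. The JSON format is as follows: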
{"id": "fd78sfsdfsd8vs",
"item": [{"data_identifier": "algid1_set1_totalcount_lstm",
"time_series": [{"time": "20200903 00:00:00", "value": 342342.12},
{"time": "20200903 00:00:05", "value": 342421.88},
{"time": "20200903 00:00:10", "value": 351232.92}]},
{"data_identifier": "algid2_set2_totalcount_lstm",
"time_series": [{"time": "20200903 00:00:00", "value": 342342.12},
{"time": "20200903 00:00:05", "value": 342421.88},
{"time": "20200903 00:00:10", "value": 351232.92}]}
]
}
I then process the JSON data into a DataFrame and run anomaly detection on the time-series data in it. The DataFrame looks like this:
+--------------+----------------------+-----------------+---------+
| id|data_identifier_method| time| value|
+--------------+----------------------+-----------------+---------+
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:10|351232.92|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:10|351232.92|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:10|351232.92|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:10|351232.92|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:10|351232.92|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:05|342421.88|
|fd78sfsdfsd8vs| algid2_set2_total...|20200903 00:00:10|351232.92|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:00|342342.12|
|fd78sfsdfsd8vs| algid1_set1_total...|20200903 00:00:05|342421.88|
+--------------+----------------------+-----------------+---------+
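For reference, the flattening from the JSON above into rows of this shape can be sketched in plain Python. This is only an illustration of the mapping, not the actual job: the function name `flatten_message` is hypothetical, and in Spark the equivalent step would presumably use `from_json` with a schema followed by `explode` on the nested arrays.

```python
import json
from typing import List, Tuple

# One output row: (id, data_identifier, time, value)
Row = Tuple[str, str, str, float]


def flatten_message(raw: str) -> List[Row]:
    """Turn one JSON message into flat rows, one per time-series point."""
    msg = json.loads(raw)
    rows: List[Row] = []
    for item in msg["item"]:
        for point in item["time_series"]:
            rows.append(
                (msg["id"], item["data_identifier"], point["time"], point["value"])
            )
    return rows


# The sample message from the question.
sample = """
{"id": "fd78sfsdfsd8vs",
 "item": [{"data_identifier": "algid1_set1_totalcount_lstm",
           "time_series": [{"time": "20200903 00:00:00", "value": 342342.12},
                           {"time": "20200903 00:00:05", "value": 342421.88},
                           {"time": "20200903 00:00:10", "value": 351232.92}]},
          {"data_identifier": "algid2_set2_totalcount_lstm",
           "time_series": [{"time": "20200903 00:00:00", "value": 342342.12},
                           {"time": "20200903 00:00:05", "value": 342421.88},
                           {"time": "20200903 00:00:10", "value": 351232.92}]}]}
"""

for row in flatten_message(sample):
    print(row)
```

One message yields six rows, three per `data_identifier`.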
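Given how Structured Streaming works, I would like each JSON message to be processed independently, unaffected by the other messages. Is this achievable? If so, how can it be done?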
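To make concrete what "independently" means here, the following is a minimal plain-Python sketch, not a Spark implementation. It assumes every row carries a per-message key (`message_key` is hypothetical; in Spark it might be built from the Kafka source columns `topic`, `partition` and `offset` if `id` is not unique per message), and it uses a placeholder detector (relative deviation from the group median) in place of the real anomaly-detection model. The point is only that each (message, `data_identifier`) group is evaluated using its own points and nothing else.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# One input row: (message_key, data_identifier, time, value)
Row = Tuple[str, str, str, float]
Point = Tuple[str, float]


def detect_per_message(
    rows: List[Row], tolerance: float = 0.02
) -> Dict[Tuple[str, str], List[Point]]:
    """Group rows by (message_key, data_identifier) and flag outliers per group.

    Placeholder rule (an assumption, not the real model): a point is flagged
    when it deviates from its own group's median by more than `tolerance`
    times that median.
    """
    groups: Dict[Tuple[str, str], List[Point]] = defaultdict(list)
    for message_key, identifier, time, value in rows:
        groups[(message_key, identifier)].append((time, value))

    flagged: Dict[Tuple[str, str], List[Point]] = {}
    for group_key, points in groups.items():
        med = median(v for _, v in points)
        limit = tolerance * abs(med)
        flagged[group_key] = [(t, v) for t, v in points if abs(v - med) > limit]
    return flagged


# Two messages with the same identifier; each is evaluated on its own points.
rows = [
    ("m1", "algid1_set1_totalcount_lstm", "20200903 00:00:00", 342342.12),
    ("m1", "algid1_set1_totalcount_lstm", "20200903 00:00:05", 342421.88),
    ("m1", "algid1_set1_totalcount_lstm", "20200903 00:00:10", 351232.92),
    ("m2", "algid1_set1_totalcount_lstm", "20200903 00:00:00", 1.0),
    ("m2", "algid1_set1_totalcount_lstm", "20200903 00:00:05", 1.0),
    ("m2", "algid1_set1_totalcount_lstm", "20200903 00:00:10", 1.0),
]

for key, points in detect_per_message(rows).items():
    print(key, points)
```

In a streaming job, the same per-group logic could plausibly run inside `foreachBatch`, where each micro-batch is an ordinary DataFrame that can be grouped by the message key; whether that fits depends on the detector and on the Spark version in use.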