I have the following DataFrame:
+------------------------+--------------------+---+---+----------+----------------------------------------------------------------------------------------------+
|_id |h |inc|op |ts |webhooks |
+------------------------+--------------------+---+---+----------+----------------------------------------------------------------------------------------------+
|5926115bffecf947d9fdf965|-3783513890158363801|148|u |1564077339|[[5,,,], [1, 2019-07-25 17:55:39.813,, 2019-07-25 17:55:39.819], [0,,,], [2,,,], [3,,,]] |
|5926115bffecf947d9fdf965|-6421919050082865687|151|u |1564077339|[[5,,,], [1, 2019-07-25 17:55:39.822,, 2019-07-25 17:55:39.845], [0,,,], [2,,,], [3,,,]] |
|5926115bffecf947d9fdf965|-1953717027542703837|155|u |1564077339|[[5,,,], [1, 2019-07-25 17:55:39.873,, 2019-07-25 17:55:39.878], [0,,,], [2,,,], [3,,,]] |
|5926115bffecf947d9fdf965|7260191374440479618 |159|u |1564077339|[[5,,,], [1, 2019-07-25 17:55:39.945,, 2019-07-25 17:55:39.951], [0,,,], [2,,,], [3,,,]] |
|57d17de901cc6a6c9e0000ab|-2430099739381353477|131|u |1564077339|[[5,,,], [1,,,], [0, 2019-07-25 17:55:39.722, error, 2019-07-25 17:55:39.731], [2,,,], [3,,,]]|
|5b9bf21bffecf966c2878b11|4122669520839049341 |30 |u |1564077341|[[5,,,], [1,,,], [0,, listening, 2019-07-25 17:55:41.453], [2,,,], [3,,,]] |
|5b9bf21bffecf966c2878b11|4122669520839049341 |30 |u |1564077341|[[5,,,], [1,,,], [0,, listening, 2019-07-25 17:55:41.453], [2,,,], [3,,,]] |
|5b9bf21bffecf966c2878b11|-7191334145177061427|60 |u |1564077341|[[5,,,], [1,,,], [0,,, 2019-07-25 17:55:41.768], [2,,,], [3,,,]] |
|5b9bf21bffecf966c2878b11|1897433358396319399 |58 |u |1564077341|[[5,,,], [1,,,], [0,,, 2019-07-25 17:55:41.767], [2,,,], [3,,,]] |
|5b9bf21bffecf966c2878b11|1897433358396319399 |58 |u |1564077341|[[5,,,], [1,,,], [0,,, 2019-07-25 17:55:41.767], [2,,,], [3,,,]] |
|58c6d048edbb6e09eb177639|8363076784039152000 |23 |u |1564077342|[[5,,,], [1,,,], [0,,, 2019-07-25 17:55:42.216], [2,,,], [3,,,]] |
|5b9bf21bffecf966c2878b11|-7191334145177061427|60 |u |1564077341|[[5,,,], [1,,,], [0,,, 2019-07-25 17:55:41.768], [2,,,], [3,,,]] |
|58c6d048edbb6e09eb177639|8363076784039152000 |23 |u |1564077342|[[5,,,], [1,,,], [0,,, 2019-07-25 17:55:42.216], [2,,,], [3,,,]] |
|5ac6a0d3b795b013a5a73a43|-3790832816225805697|36 |u |1564077346|[[5,,,], [1,,,], [0,,,], [2, 2019-07-25 17:55:46.384,, 2019-07-25 17:55:46.400], [3,,,]] |
|5ac6a0d3b795b013a5a73a43|-1747137668935062717|34 |u |1564077346|[[5,,,], [1,,,], [0,,,], [2, 2019-07-25 17:55:46.385,, 2019-07-25 17:55:46.398], [3,,,]] |
|5ac6a0d3b795b013a5a73a43|-1747137668935062717|34 |u |1564077346|[[5,,,], [1,,,], [0,,,], [2, 2019-07-25 17:55:46.385,, 2019-07-25 17:55:46.398], [3,,,]] |
|5ac6a0d3b795b013a5a73a43|-3790832816225805697|36 |u |1564077346|[[5,,,], [1,,,], [0,,,], [2, 2019-07-25 17:55:46.384,, 2019-07-25 17:55:46.400], [3,,,]] |
|5ac6a0d3b795b013a5a73a43|6060575882395080442 |63 |u |1564077346|[[5,,,], [1,,,], [0,,,], [2, 2019-07-25 17:55:46.506,, 2019-07-25 17:55:46.529], [3,,,]] |
|5ac6a0d3b795b013a5a73a43|6060575882395080442 |63 |u |1564077346|[[5,,,], [1,,,], [0,,,], [2, 2019-07-25 17:55:46.506,, 2019-07-25 17:55:46.529], [3,,,]] |
|594e88f1ffecf918a14c143e|736029767610412482 |58 |u |1564077346|[[5,,,], [1,,,], [0, 2019-07-25 17:55:46.503,, 2019-07-25 17:55:46.513], [2,,,], [3,,,]] |
+------------------------+--------------------+---+---+----------+----------------------------------------------------------------------------------------------+
with the following schema:
root
|-- _id: string (nullable = true)
|-- h: string (nullable = true)
|-- inc: string (nullable = true)
|-- op: string (nullable = true)
|-- ts: string (nullable = true)
|-- webhooks: array (nullable = false)
| |-- element: struct (containsNull = false)
| | |-- index: string (nullable = false)
| | |-- failed_at: string (nullable = true)
| | |-- status: string (nullable = true)
| | |-- updated_at: string (nullable = true)
In the webhooks column, some elements hold only a single value (the index):
[[5,,,], [1, 2019-07-25 17:55:39.813,, 2019-07-25 17:55:39.819], [0,,,], [2,,,], [3,,,]]
How can I remove the elements that contain only a number, so that each row ends up with something like this:
[[1, 2019-07-25 17:55:39.813,, 2019-07-25 17:55:39.819]]
[[1, 2019-07-25 17:55:39.822,, 2019-07-25 17:55:39.845]]
Thanks.
Answer 0 (score: 0)
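First, explode your webhooks column so that each array element ends up on its own row, e.g.
import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._ // needed for the $"..." column syntax
val exploded = df.withColumn("webhooks", explode($"webhooks"))
Then, on the exploded DataFrame (not the original df), keep only the rows where at least one field other than index is set, like this:
val result = exploded.where(col("webhooks").getField("failed_at").isNotNull || col("webhooks").getField("status").isNotNull || col("webhooks").getField("updated_at").isNotNull)
I can't test this against your DataFrame, so it may not give exactly the result you want, but you can use my code as a reference to get there.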
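The question asks for one filtered array per row (e.g. `[[1, 2019-07-25 17:55:39.813,, 2019-07-25 17:55:39.819]]`), while explode produces one row per element. As a swapped-in alternative (not the answer's method), the sketch below filters inside each array with the SQL higher-order function `filter`, which assumes Spark 2.4 or later. It also assumes the empty positions in the `show` output are nulls rather than empty strings. The names `Webhook`, `Event`, `KeepPopulatedWebhooks`, `sample` and `keepPopulated` are hypothetical, and only two rows of the question's data are used.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical case classes mirroring the question's schema (every field is a string).
case class Webhook(index: String, failed_at: String, status: String, updated_at: String)
case class Event(_id: String, h: String, inc: String, op: String, ts: String, webhooks: Seq[Webhook])

object KeepPopulatedWebhooks {
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[*]").appName("webhooks").getOrCreate()

  // Two rows taken from the question; the empty positions are assumed to be nulls.
  def sample(): DataFrame = {
    import spark.implicits._
    def empty(i: String) = Webhook(i, null, null, null)
    Seq(
      Event("5926115bffecf947d9fdf965", "-3783513890158363801", "148", "u", "1564077339",
        Seq(empty("5"),
          Webhook("1", "2019-07-25 17:55:39.813", null, "2019-07-25 17:55:39.819"),
          empty("0"), empty("2"), empty("3"))),
      Event("57d17de901cc6a6c9e0000ab", "-2430099739381353477", "131", "u", "1564077339",
        Seq(empty("5"), empty("1"),
          Webhook("0", "2019-07-25 17:55:39.722", "error", "2019-07-25 17:55:39.731"),
          empty("2"), empty("3")))
    ).toDF()
  }

  // Keeps, inside each row's array, only the elements where a field other than index is set.
  // Uses the SQL higher-order function filter (Spark 2.4+), so no explode/regroup is needed.
  def keepPopulated(df: DataFrame): DataFrame =
    df.withColumn("webhooks", expr(
      "filter(webhooks, w -> w.failed_at IS NOT NULL OR w.status IS NOT NULL OR w.updated_at IS NOT NULL)"))

  def main(args: Array[String]): Unit = {
    keepPopulated(sample()).show(false)
    spark.stop()
  }
}
```

On Spark 3.0+ the same predicate can also be written with the Scala API `org.apache.spark.sql.functions.filter(col("webhooks"), w => ...)`. If the blank fields turn out to be empty strings instead of nulls, the predicate would need extra checks such as `w.status <> ''`.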