I have read a huge file into a DataFrame; each line of the file contains a JSON object like the following:
{
"userId": "12345",
"vars": {
"test_group": "group1",
"brand": "xband"
},
"modules": [
{
"id": "New"
},
{
"id": "Default"
},
{
"id": "BestValue"
},
{
"id": "Rating"
},
{
"id": "DeliveryMin"
},
{
"id": "Distance"
}
]
}
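How can I manipulate the DataFrame so that only the module with id = "Default" is kept? In other words, how do I remove every other module whose id is not "Default"?
Answer 0: (score: 1)
As you said, each line of your file is a JSON object in the following format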
{"userId":"12345","vars":{"test_group":"group1","brand":"xband"},"modules":[{"id":"New"},{"id":"Default"},{"id":"BestValue"},{"id":"Rating"},{"id":"DeliveryMin"},{"id":"Distance"}]}
{"userId":"12345","vars":{"test_group":"group1","brand":"xband"},"modules":[{"id":"New"},{"id":"Default"},{"id":"BestValue"},{"id":"Rating"},{"id":"DeliveryMin"},{"id":"Distance"}]}
If that is the case, you can use the json API of sqlContext to read the JSON file into a dataframe, as shown below
val df = sqlContext.read.json("path to json file")
which should give you the following dataframe
+--------------------------------------------------------------------+------+--------------+
|modules |userId|vars |
+--------------------------------------------------------------------+------+--------------+
|[[New], [Default], [BestValue], [Rating], [DeliveryMin], [Distance]]|12345 |[xband,group1]|
|[[New], [Default], [BestValue], [Rating], [DeliveryMin], [Distance]]|12345 |[xband,group1]|
+--------------------------------------------------------------------+------+--------------+
and the following schema
root
|-- modules: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- id: string (nullable = true)
|-- userId: string (nullable = true)
|-- vars: struct (nullable = true)
| |-- brand: string (nullable = true)
| |-- test_group: string (nullable = true)
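Since the file is described as huge, it may be worth noting that the schema printed above can also be written by hand and passed to the reader, which avoids the extra pass Spark makes over the data to infer it. The following is only a sketch under stated assumptions: it uses SparkSession (Spark 2.2 or later) rather than the sqlContext from the answer, and it reads one sample line from memory so that it runs without a file. The names spark, sample and schema are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Assumption: Spark 2.2+, where SparkSession is the entry point.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explicit-schema-sketch")
  .getOrCreate()
import spark.implicits._

// Hand-written equivalent of the inferred schema shown above.
val schema = StructType(Seq(
  StructField("modules", ArrayType(StructType(Seq(StructField("id", StringType))))),
  StructField("userId", StringType),
  StructField("vars", StructType(Seq(
    StructField("brand", StringType),
    StructField("test_group", StringType)
  )))
))

// One sample line held in memory; a real job would pass the file path to json(...) instead.
val sample = """{"userId":"12345","vars":{"test_group":"group1","brand":"xband"},"modules":[{"id":"New"},{"id":"Default"}]}"""

// With an explicit schema, Spark does not need to scan the data to infer column types.
val df = spark.read.schema(schema).json(Seq(sample).toDS)
df.printSchema()
```

For the real file, the in-memory Dataset would be replaced by the path, as in the answer's sqlContext.read.json call.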
The last step is to explode modules.id and keep only the rows whose value is Default
import org.apache.spark.sql.functions.explode
import sqlContext.implicits._
val finaldf = df.withColumn("modules", explode($"modules.id"))
  .filter($"modules" === "Default")
which should give you
+-------+------+--------------+
|modules|userId|vars |
+-------+------+--------------+
|Default|12345 |[xband,group1]|
|Default|12345 |[xband,group1]|
+-------+------+--------------+
I hope the answer is helpful
Update
Written out as JSON, this dataframe looks like
{"modules":"Default","userId":"12345","vars":{"brand":"xband","test_group":"group1"}}
{"modules":"Default","userId":"12345","vars":{"brand":"xband","test_group":"group1"}}
But if you need the output in the following form
{"modules":{"id":"Default"},"userId":"12345","vars":{"brand":"xband","test_group":"group1"}}
{"modules":{"id":"Default"},"userId":"12345","vars":{"brand":"xband","test_group":"group1"}}
then you should explode modules instead of modules.id, and filter on the id field of the exploded struct (modules.id).
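The variant that explodes modules itself, so that each row keeps the {"id": ...} struct, could look roughly like the sketch below. It is not taken from the original answer: it assumes Spark 2.2 or later with a SparkSession, and it builds the dataframe from two in-memory sample lines (shortened to three modules) so that it runs without a file. The names spark, line and nested are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Assumption: Spark 2.2+, where SparkSession replaces sqlContext as the entry point.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-modules-sketch")
  .getOrCreate()
import spark.implicits._

// Shortened copy of the sample record from the question.
val line = """{"userId":"12345","vars":{"test_group":"group1","brand":"xband"},"modules":[{"id":"New"},{"id":"Default"},{"id":"BestValue"}]}"""
val df = spark.read.json(Seq(line, line).toDS)

// Explode the array of structs (one row per module), then keep the rows whose id is Default.
val nested = df
  .withColumn("modules", explode($"modules"))
  .filter($"modules.id" === "Default")

// Print each remaining row as a JSON string.
nested.toJSON.collect().foreach(println)
```

Because the struct is exploded rather than its id field, the modules column stays a struct with a single id field, and the filter has to address modules.id instead of modules.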