Question

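I have a DataFrame with a string key and a JSON string value. The idea is to apply some business logic to parse the JSON based on the key. For each key there can be multiple JSON strings, so I chose to do a groupBy on the key with collect_list. This is something I would like to avoid: the parsing logic for one JSON is independent of the other JSONs with the same key, so I don't want to shuffle data across multiple nodes.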
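The per-record nature of the parsing described above can be sketched in plain Python (not Spark). This is only an illustration under stated assumptions: the keys ("order", "user"), the parser functions, and the sample records are all hypothetical and do not come from the actual data.

```python
import json

# Hypothetical per-key business logic; keys and fields are made up for illustration.
def parse_order(doc):
    return {"total": doc["qty"] * doc["price"]}

def parse_user(doc):
    return {"name": doc["name"].upper()}

# Dispatch table: which parser applies to which key.
PARSERS = {"order": parse_order, "user": parse_user}

def parse_record(key, value):
    """Parse one (key, JSON string) record using only that record's own data."""
    doc = json.loads(value)
    return key, PARSERS[key](doc)

# Several records may share a key, but none of them needs the others.
records = [
    ("order", '{"qty": 2, "price": 5}'),
    ("user", '{"name": "ann"}'),
    ("order", '{"qty": 1, "price": 3}'),
]

# Each record is handled on its own, so no grouping by key is required here.
parsed = [parse_record(k, v) for k, v in records]
```

In Spark terms this has the shape of a row-wise transformation (for example a map over the rows), which typically does not need a shuffle. Whether that fits depends on what the business logic needs downstream.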

I would like to know whether anyone has worked with a similar use case and whether there is a better way to do this.

Dataframe

== Physical Plan ==

ObjectHashAggregate(keys=[key#6], functions=[collect_list(value#7, 0, 0)])
+- Exchange hashpartitioning(key#6, 200)
   +- ObjectHashAggregate(keys=[key#6], functions=[partial_collect_list(value#7, 0, 0)])
      +- *(1) Project [value#3 AS key#6, value#4 AS value#7]
         +- *(1) SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, scala.Tuple2, true]._1, true, false) AS value#3, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, scala.Tuple2, true]._2, true, false) AS value#4]
            +- Scan ExternalRDDScan[obj#2]

Alternative to Spark groupBy with a non-numeric aggregation operation

0 answers: