Question

I am using the following JSON schema in my Cloudant database:

{...
 departureWeather:{
    temp:30,
    otherfields:xyz
 },
 arrivalWeather:{
    temp:45,
    otherfields: abc
 }
 ...
}

I then load the data into a DataFrame using the cloudant-spark connector. If I try to select fields like this:

df.select("departureWeather.temp", "arrivalWeather.temp")

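I end up with a DataFrame that has 2 columns with the same name, i.e. temp. It looks like the Spark data source framework flattens the name using only its last part.

Is there an easy way to deduplicate the column names?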

Answer 1

You can use aliases to give each selected column a distinct name:

df.select(
    col("departureWeather.temp").alias("departure_temp"),
    col("arrivalWeather.temp").alias("arrival_temp")
)
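
If there are many nested fields, writing each alias by hand gets tedious. The following is a minimal sketch, not part of the original answer, that derives a flat alias from each dotted path by replacing the dots with underscores. The helper names `alias_for` and `alias_map` are hypothetical, and the resulting names (for example `departureWeather_temp`) differ from the hand-picked ones above.

```python
def alias_for(path: str) -> str:
    """Flatten a dotted field path, e.g. 'a.b' becomes 'a_b'."""
    return path.replace(".", "_")


def alias_map(paths):
    """Map each dotted path to its flattened alias.

    Raises ValueError if two paths would flatten to the same alias,
    since that would reintroduce duplicate column names.
    """
    aliases = {}
    seen = set()
    for path in paths:
        name = alias_for(path)
        if name in seen:
            raise ValueError(f"alias collision for {path!r}: {name!r}")
        seen.add(name)
        aliases[path] = name
    return aliases


# Field paths taken from the question.
paths = ["departureWeather.temp", "arrivalWeather.temp"]
aliases = alias_map(paths)
print(aliases)
```

Assuming PySpark and `from pyspark.sql.functions import col`, the mapping could then be applied with `df.select([col(p).alias(a) for p, a in aliases.items()])`.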