Spark partitionBy and partition columns in JSON output

Time: 2018-06-04 23:30:16

Tags: apache-spark apache-spark-sql

I am reading JSON in Spark:

{"bucket": "B01", "actionType": "A1", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B02", "actionType": "A2", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B03", "actionType": "A3", "preaction": "NULL", "postaction": "NULL"}

val df = spark.read.json("actions.json").toDF()

Now I write the same data back out as JSON, as shown below:

df.write.format("json").mode("append").partitionBy("bucket", "actionType").save("output.json")

and the records in output.json look like this:

{"preaction":"NULL","postaction":"NULL"}

The bucket and actionType columns are missing from the JSON output; I need the partitionBy columns to appear in the output as well.
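(This is expected behavior: partitionBy moves the partition column values out of the records and into the directory path, producing files such as output.json/bucket=B01/actionType=A1/part-*.json. Reading the output directory back restores the columns through partition discovery. A minimal sketch of that round trip, assuming the write above has already run:

// Spark re-attaches bucket and actionType from the
// bucket=.../actionType=... directory names on read.
val restored = spark.read.json("output.json")
restored.printSchema()
// bucket and actionType reappear in the inferred schema.

The question is about keeping them inside the JSON records themselves.)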

1 answer:

Answer 0 (score: 0)

You can try a workaround by creating duplicate columns to be used in partitionBy:

import org.apache.spark.sql.functions._

df.select(Seq(col("bucket").as("bucketCopy"), col("actionType").as("actionTypeCopy")) ++ df.columns.map(col): _*)
  .write.format("json")
  .mode("append")
  .partitionBy("bucketCopy", "actionTypeCopy")
  .save("output.json")
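With this workaround the copied columns drive the directory layout (output.json/bucketCopy=B01/actionTypeCopy=A1/...) while the original bucket and actionType columns stay inside every record, so the written files contain lines like the following (a sketch based on the sample data above; field order may differ):

{"bucket":"B01","actionType":"A1","preaction":"NULL","postaction":"NULL"}

The trade-off is that each partition value is stored twice: once in the directory name and once in every record.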