I am reading JSON in Spark. The input file actions.json contains:
{"bucket": "B01", "actionType": "A1", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B02", "actionType": "A2", "preaction": "NULL", "postaction": "NULL"}
{"bucket": "B03", "actionType": "A3", "preaction": "NULL", "postaction": "NULL"}
val df = spark.read.json("actions.json")
Now I write the same content back out as JSON, like this:

df.write.format("json").mode("append").partitionBy("bucket", "actionType").save("output.json")

The records in the files under output.json look like this:
{"preaction":"NULL","postaction":"NULL"}
The bucket and actionType columns are missing from the JSON output. I need the partitionBy columns to appear in the output as well.
Answer (score: 0)
You can work around this by adding copies of the columns you want to partition by, and partitioning on the copies instead:

import org.apache.spark.sql.functions._

df.select(Seq(col("bucket").as("bucketCopy"), col("actionType").as("actionTypeCopy")) ++ df.columns.map(col): _*)
  .write.format("json").mode("append").partitionBy("bucketCopy", "actionTypeCopy").save("output.json")

The original bucket and actionType columns then remain in the JSON files, while the copies are consumed by the partitioning.
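For context: the columns disappear because partitionBy moves their values into the output directory names (e.g. output.json/bucket=B01/actionType=A1/part-...) rather than into the file contents. When the partitioned output is read back with Spark, partition discovery reconstructs those columns from the directory names. A minimal sketch of that round trip, assuming the same actions.json input and a local Spark session:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("partition-roundtrip")
  .master("local[*]")
  .getOrCreate()

// Write partitioned output: bucket and actionType become directory
// names (bucket=B01/actionType=A1/...), not fields inside the JSON files.
val df = spark.read.json("actions.json")
df.write.format("json").mode("append")
  .partitionBy("bucket", "actionType")
  .save("output.json")

// Reading the partitioned directory back restores bucket and actionType
// as columns, recovered from the bucket=.../actionType=... paths.
val roundTrip = spark.read.json("output.json")
roundTrip.printSchema() // schema includes bucket and actionType again
```

So the column-copy trick above is only needed when the individual JSON files themselves must contain the partition columns, e.g. for downstream consumers that read the files without Spark's partition discovery.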