For example, I have a DataFrame as follows:
var tmp_df = sqlContext.createDataFrame(Seq(
("One", "Sagar", 1),
("Two", "Ramesh" , 2),
("Three", "Suresh", 3),
("One", "Sagar", 5)
)).toDF("ID", "Name", "Balance");
Now I want to write all records of the above DataFrame that share the same ID into one file. Please advise.
Answer 0 (score: 0)
import sqlContext.implicits._ // enables the $"col" column syntax used below
// Find IDs that occur more than once and rename the ID column to idstowrite
val idsMoreThanOne = tmp_df.groupBy($"ID").count().filter($"count" > 1).withColumnRenamed("ID", "idstowrite")
idsMoreThanOne.show()
// Join back with the original DataFrame to recover the full rows for those IDs
val joinedDf = idsMoreThanOne.join(tmp_df, tmp_df("ID") === idsMoreThanOne("idstowrite"), "left")
joinedDf.show()
// Select only the columns we want
val dfToWrite = joinedDf.select("ID", "Name", "Balance")
dfToWrite.show()
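The snippet above only builds the DataFrame; it does not write anything out. If the goal is for all rows with the same ID to end up together in one file, a rough sketch is shown below. It assumes Spark 2.x or later with a local `SparkSession` (the question uses the older `sqlContext`), CSV output, and a hypothetical output path `/tmp/records_by_id`; none of these come from the original post.

```scala
import org.apache.spark.sql.SparkSession

object WriteById {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x+ running locally; the original post uses sqlContext instead
    val spark = SparkSession.builder()
      .appName("write-by-id")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the question
    val tmpDf = Seq(
      ("One", "Sagar", 1),
      ("Two", "Ramesh", 2),
      ("Three", "Suresh", 3),
      ("One", "Sagar", 5)
    ).toDF("ID", "Name", "Balance")

    // Hypothetical output location, chosen only for illustration
    val outputPath = "/tmp/records_by_id"

    tmpDf
      .repartition($"ID")        // move all rows with the same ID into the same partition
      .write
      .mode("overwrite")
      .partitionBy("ID")         // one sub-directory per ID value, e.g. ID=One/
      .option("header", "true")
      .csv(outputPath)

    spark.stop()
  }
}
```

With `partitionBy("ID")` the ID value is encoded in the directory name (`ID=One/`, `ID=Two/`, ...) rather than stored inside the file, and repartitioning on the same column first should leave one data file per directory. If instead only the duplicated IDs are wanted in a single file, something like `dfToWrite.coalesce(1).write.csv(somePath)` may be closer to what is needed, where `somePath` is again a placeholder.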