How to get partitions from Spark?

Time: 2018-02-27 11:29:20

Tags: scala apache-spark partitioning

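I have a DataFrame that needs to be joined with another DataFrame. To reduce the amount of data, I want to select only the rows of the second DataFrame whose partition column matches the partitions of the first. See the code: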

// get partition values (like 2017-01-01, 2017-01-02 etc)
val partitionValues = leftDataFrame.someFunctionHere()
rightDataFrame.createOrReplaceTempView("view")
// approximate syntax here: partitionValues must render as a
// comma-separated list of quoted literals for the IN clause to parse
val rightDataFrameReduced = sparkSession
    .sql(s"select * from view where my_partition_col IN ($partitionValues)")
rightDataFrameReduced.createOrReplaceTempView("right_df")
leftDataFrame.createOrReplaceTempView("left_df")
// approximate syntax here
sparkSession.sql("select * from right_df join left_df ON right_df.id = left_df.id")

So the question is: what should I use instead of leftDataFrame.someFunctionHere() to get the partition values while avoiding a full scan of the records?
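
For illustration only, here is a minimal sketch of two ways the placeholder could be filled in. It rests on assumptions the question does not confirm: that leftDataFrame also carries my_partition_col, that the values are plain strings such as dates with no quote characters, and (for the second variant) that the left data is registered as a partitioned table in the catalog. The object and method names (PartitionValues, fromDataFrame, fromCatalog, toInList, reduceRight) are hypothetical helpers, not Spark API. Whether the distinct query can be answered without reading data files depends on the data source, Spark version, and configuration, so checking the physical plan with explain() is advisable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; none of these method names are part of Spark.
object PartitionValues {

  // Variant 1: distinct values of the partition column.
  // Only the partition column is selected, so a columnar, partitioned source
  // does not need to read the other columns (not guaranteed to skip all I/O).
  def fromDataFrame(df: DataFrame, partitionCol: String): Seq[String] =
    df.select(col(partitionCol).cast("string"))
      .where(col(partitionCol).isNotNull)
      .distinct()
      .collect()
      .map(_.getString(0))
      .toSeq

  // Variant 2: ask the catalog. Assumes `table` is a partitioned catalog table.
  // SHOW PARTITIONS returns one string per partition, e.g. "my_partition_col=2017-01-01".
  def fromCatalog(spark: SparkSession, table: String, partitionCol: String): Seq[String] =
    spark.sql(s"SHOW PARTITIONS $table")
      .collect()
      .map(_.getString(0))
      .flatMap(_.split("/"))                       // multi-level partitions: a=1/b=2
      .filter(_.startsWith(partitionCol + "="))
      .map(_.substring(partitionCol.length + 1))
      .toSeq
      .distinct

  // Renders values for a SQL IN (...) list.
  // Assumes the values contain no quote characters (e.g. dates).
  def toInList(values: Seq[String]): String =
    values.map(v => s"'$v'").mkString(", ")

  // Filters the right side with the DataFrame API instead of string-built SQL.
  def reduceRight(right: DataFrame, partitionCol: String, values: Seq[String]): DataFrame =
    right.where(col(partitionCol).isin(values: _*))
}
```

With these helpers, the placeholder line could become `val partitionValues = PartitionValues.fromDataFrame(leftDataFrame, "my_partition_col")`, followed either by `PartitionValues.reduceRight(rightDataFrame, "my_partition_col", partitionValues)` or by passing `PartitionValues.toInList(partitionValues)` into the IN clause. Collecting the values to the driver is only reasonable when the number of partitions is small.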

0 answers:

No answers