<table>
<tr>
<th>Id</th>
<th>Name</th>
</tr>
<tr>
<td>1</td>
<td>A</td>
</tr>
<tr>
<td>2</td>
<td>B</td>
</tr>
<tr>
<td>3</td>
<td>C</td>
</tr>
<tr>
<td>4 .. </td>
<td>..</td>
</tr>
</table>
After every x rows, I want to split it into two, three, or x DataFrames. To make the idea easier to follow, here is the logic of the approach I have in mind:
def divideDF(df: DataFrame, delimiter: Int): Seq[DataFrame] = {
  val num = df.count
  val end = math.ceil(num.toDouble / delimiter).toInt // this is the number of DFs I want to receive
  var i = 0
  while (i < end) {
    // split df into multiple DataFrames
    i += 1
  }
  ??? // return the resulting DataFrames
}
I would really appreciate your help; if you need more information, I will provide it :)
Answer 0 (score: 0)

Try using the randomSplit function:
import org.apache.spark.sql.SparkSession

object SampleFoo extends App {
  val spark = SparkSession
    .builder()
    .master("local[2]")
    .getOrCreate()
  spark.sparkContext.setLogLevel("WARN")
  import spark.implicits._

  // Split the Dataset into four roughly equal random parts (seed = 1)
  val splits = (1 to 100)
    .toDS
    .randomSplit(Array(.25, .25, .25, .25), 1)

  println(splits.length)   // number of resulting Datasets: 4
  splits.head.printSchema()
  splits.foreach(s => s.show(40))
}
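Note that randomSplit shuffles the rows and the part sizes are only approximate, so it does not split "after x rows" exactly. If contiguous chunks of a fixed size are required, a minimal sketch of an alternative (reusing the question's divideDF name; the rowsPerSplit parameter is my naming, and this is one possible approach, not the answer's) could number the rows with zipWithIndex and filter one chunk per index:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch: split df into contiguous chunks of `rowsPerSplit` rows each.
def divideDF(df: DataFrame, rowsPerSplit: Int)(implicit spark: SparkSession): Seq[DataFrame] = {
  val schema  = df.schema
  val indexed = df.rdd.zipWithIndex() // pairs each Row with its 0-based position
  val numSplits = math.ceil(df.count.toDouble / rowsPerSplit).toInt
  (0 until numSplits).map { i =>
    // keep only the rows whose position falls inside chunk i
    val part = indexed
      .filter { case (_, idx) => idx / rowsPerSplit == i }
      .map(_._1)
    spark.createDataFrame(part, schema)
  }
}
```

This triggers one job per chunk plus a count, so it is only a starting point; the row order it preserves is the RDD partition order.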