Here is my driver program (in pseudocode):
// read all the files
df1 = spark.read(f1)
df2 = spark.read(f2)
df3 = spark.read(f3)
...
df10 = spark.read(f10)
// cross join each pair and report the count
cdf1 = df1.crossJoin(df2)
print(cdf1.count())
...
cdf5 = df9.crossJoin(df10)
print(cdf5.count())
When I run spark-submit and open the tracker UI, I see each job execute sequentially, one after another. I expected the loads to happen in parallel, and each cross join to run in parallel as well.
Where is my mistake?
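One relevant detail: an action like `count()` blocks the driver thread until the job finishes, so issuing the actions one after another in a single thread serializes the jobs regardless of cluster capacity. As a minimal sketch (plain Python, no Spark, with a hypothetical `blocking_count` standing in for `df_a.crossJoin(df_b).count()`), the two driver patterns look like this:

```python
# Sketch, not Spark: each blocking call stands in for a Spark action
# such as crossJoin(...).count(), which blocks the driver until done.
from concurrent.futures import ThreadPoolExecutor
import time

def blocking_count(pair_id):
    # Hypothetical stand-in for a blocking Spark action on one pair.
    time.sleep(0.2)
    return pair_id

# Sequential driver: five blocking calls back to back, ~5 * 0.2s total.
start = time.time()
seq = [blocking_count(i) for i in range(5)]
sequential_elapsed = time.time() - start

# Concurrent driver: each blocking call runs on its own thread, ~0.2s total.
start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    par = list(pool.map(blocking_count, range(5)))
concurrent_elapsed = time.time() - start
```

In Spark terms, submitting each action from its own driver thread (or collecting the results with a thread pool as above) lets the scheduler run the independent jobs concurrently, assuming the cluster has free executor slots.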