
Here is my driver program (in pseudocode):

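```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# read all the files (f1 ... f10 are placeholders for the input paths;
# load() uses the default data source, spark.sql.sources.default)
df1 = spark.read.load(f1)
df2 = spark.read.load(f2)
df3 = spark.read.load(f3)

...

df10 = spark.read.load(f10)

# cross-join each pair and print the count
cdf1 = df1.crossJoin(df2)
print(cdf1.count())
...
cdf5 = df9.crossJoin(df10)
print(cdf5.count())
```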

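When I run spark-submit and open the tracker UI, I see the jobs executing one after another. I expected the loads to run in parallel with each other, and the cross joins to run in parallel with each other.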

What am I doing wrong?
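A hedged sketch of one likely cause and fix, not a confirmed answer. Reading with `spark.read` is mostly lazy (depending on the format, it may run a small job, for example to infer a schema), while an action such as `count()` blocks the driver thread that calls it until its job finishes. A driver that calls `count()` five times in a row therefore submits five jobs one after another. Spark's job-scheduling documentation notes that jobs inside one application can run at the same time when they are submitted from separate threads. The helper below, `run_actions_concurrently`, is a hypothetical name (not a Spark API) that does this with Python's standard `concurrent.futures` thread pool. It accepts plain zero-argument callables, so it can be tried without a cluster; the Spark usage in the trailing comments assumes the DataFrames from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_actions_concurrently(
    actions: List[Callable[[], T]], max_workers: int = 5
) -> List[T]:
    """Run each zero-argument action in its own worker thread.

    A Spark action such as DataFrame.count() blocks the thread that calls it,
    so calling several actions from separate threads lets Spark schedule the
    resulting jobs concurrently (executor resources permitting).
    Results are returned in the same order as `actions`; an exception raised
    by any action is re-raised here by Future.result().
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every action before waiting on any of them.
        futures = [pool.submit(action) for action in actions]
        # Collect the results in input order, not completion order.
        return [future.result() for future in futures]


# Hypothetical usage with the DataFrames from the question (requires PySpark):
# pairs = [(df1, df2), ..., (df9, df10)]
# counts = run_actions_concurrently([a.crossJoin(b).count for a, b in pairs])
# for count in counts:
#     print(count)
```

Running the jobs in threads only lets them overlap; they still share the same executors, so one large cross join can take most of the available cores. Jobs are scheduled FIFO by default; setting `spark.scheduler.mode` to `FAIR` (for example with `spark-submit --conf spark.scheduler.mode=FAIR ...`) shares resources between concurrent jobs more evenly.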

Spark parallel operations
