I wanted to test and tweak this in PySpark, so I set up the following:
df1 = sqlContext.createDataFrame(
    [(1, "a", 2.0), (2, "b", 3.0), (3, "c", 3.0), (4, "h", 5.0)],
    ("x1", "x2", "x3"))
df2 = sqlContext.createDataFrame(
    [(1, "f", -1.0), (2, "p", 0.0), (5, "g", -9.0), (7, "h", -2.0)],
    ("x1", "x2", "x3"))
df = df1.join(df2, (df1.x1 == df2.x1) | (df1.x2 == df2.x2))
df.show()
It fails with the error message: AnalysisException: u'Cartesian joins could be prohibitively expensive and are disabled by default. To explicitly enable them, please set spark.sql.crossJoin.enabled = true;'
But if I replace | with & and run the same query, it completes without any error. Can you tell me what the problem is?
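The difference comes down to what kind of join plan the engine can build. This pure-Python sketch (an illustration of the planning idea, not Spark code) shows why: with & the condition is a conjunction of equalities, so rows can be hashed on the composite key (x1, x2) and only matching buckets compared, i.e. an ordinary equi-join. With | a match on either column suffices, so there is no single key to hash on, and the engine must examine every (left, right) pair: a Cartesian product followed by a filter, which Spark refuses to plan by default.

```python
# Same sample data as the question, as plain Python tuples.
df1 = [(1, "a", 2.0), (2, "b", 3.0), (3, "c", 3.0), (4, "h", 5.0)]
df2 = [(1, "f", -1.0), (2, "p", 0.0), (5, "g", -9.0), (7, "h", -2.0)]

# AND condition: hash df2 on the composite key (x1, x2), then probe.
# Only rows in the same bucket are ever compared (equi-join style).
index = {}
for row in df2:
    index.setdefault((row[0], row[1]), []).append(row)
and_join = [l + r for l in df1 for r in index.get((l[0], l[1]), [])]

# OR condition: no usable hash key, so every (left, right) pair must be
# checked -- effectively a Cartesian product plus a filter.
or_join = [l + r for l in df1 for r in df2
           if l[0] == r[0] or l[1] == r[1]]

print(and_join)  # [] -- no pair of rows agrees on both x1 and x2
print(or_join)   # three rows: matches on x1=1, x1=2, and x2="h"
```

The OR version touches all len(df1) * len(df2) pairs, which is exactly the cost Spark's Cartesian-join guard is protecting against.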
Answer 0 (score: 0)
Update your Spark version. The code runs on my machine without any warnings or errors on Python 2.7 and 3.6 with Spark 2.2.0. Alternatively, as the error message itself suggests, you can set spark.sql.crossJoin.enabled = true on your current version to allow the join.
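If upgrading is not an option, a minimal sketch of the workaround the error message points to (a config fragment, assuming a Spark 2.x `spark` session; on a 1.x `sqlContext` the setting name is the same):

```python
# Allow Spark to plan the disjunctive join as a cross join + filter.
# Note: this can be prohibitively expensive on large inputs, which is
# exactly why it is disabled by default.
spark.conf.set("spark.sql.crossJoin.enabled", "true")

df = df1.join(df2, (df1.x1 == df2.x1) | (df1.x2 == df2.x2))
df.show()
```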