AssertionError: all exprs should be Column

Asked: 2017-11-13 13:41:17

Tags: python apache-spark pyspark

I am combining two PySpark DataFrames with union and aggregating the result as follows:

exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)

But I get this error:

AssertionError: all exprs should be Column

What is wrong?

1 answer:

Answer 0: (score: 4)

exprs = [max(x) for x in ["col1","col2"]]

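calls Python's built-in max, which returns the character with the highest ASCII value in each string, i.e. ['o', 'o']. These are plain strings rather than Column objects, which is why agg raises the assertion.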
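To see this without Spark, here is a small sketch in plain Python. The variable name column_names is only illustrative and does not come from the question:

```python
# Python's built-in max() treats a string as a sequence of characters
# and returns the character with the highest code point.
column_names = ["col1", "col2"]
exprs = [max(name) for name in column_names]

print(exprs)                              # ['o', 'o']
print([type(e).__name__ for e in exprs])  # ['str', 'str']
```

Each element is an ordinary str, so none of them is a Column object.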

Using the max from pyspark.sql.functions instead works:

>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]