I am combining two PySpark DataFrames as follows:
exprs = [max(x) for x in ["col1","col2"]]
df = df1.union(df2).groupBy(['campk', 'ppk']).agg(*exprs)
But I get this error:
AssertionError: all exprs should be Column
What is wrong?
Answer 0: (score: 4)
exprs = [max(x) for x in ["col1","col2"]]
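returns the character with the highest ASCII value in each string, i.e. ['o', 'o'], because max here is Python's built-in function, not a Spark one. agg() then receives plain strings instead of Column objects, which triggers the assertion.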
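To see this without Spark, here is a minimal sketch in plain Python. The variable names `cols` and `exprs` are only illustrative; the point is that the built-in max iterates over the characters of each string.

```python
# Python's built-in max() treats a string as an iterable of characters
# and returns the character with the highest code point.
cols = ["col1", "col2"]
exprs = [max(name) for name in cols]
print(exprs)  # ['o', 'o']

# The results are plain str objects, not Spark Column expressions,
# which is why agg(*exprs) rejects them.
print(all(isinstance(e, str) for e in exprs))  # True
```
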
Referencing the correct max, the one from pyspark.sql.functions, works:
>>> from pyspark.sql import functions as F
>>> exprs = [F.max(x) for x in ["col1","col2"]]
>>> print(exprs)
[Column<max(col1)>, Column<max(col2)>]