Question

I tried to create a max column with the code below. The sum column works.

Sum:

for col in list_names:
    for month in [3,6,9,12]:
        companies = companies.withColumn(col + 'sum_'+ str(month) + '_months', sum(companies[col + ult_pats2[month_ix - ix]] for ix in range(month)) )

Max:

for col in list_names:
    for month in [3,6,9,12]:
        companies = companies.withColumn(col + 'max_'+ str(month) + '_months', max(companies[col + ult_pats2[month_ix - ix]] for ix in range(month)) )

The error message is:

"ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."

Answer 1

To me this looks like the max function is being overwritten by another package. Try:

import pyspark.sql.functions as f

and then refer to it as f.max(...).
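One caveat worth checking against the PySpark documentation: `pyspark.sql.functions.max` is an aggregate function that reduces one column over many rows, whereas the question asks for the largest value across several columns within each row, which is what `greatest` computes. The plain-Python sketch below is only an analogy on made-up data (the `rows` list and the column names `m1`, `m2`, `m3` are hypothetical), not Spark code.

```python
# Hypothetical data: each dict stands in for one DataFrame row.
rows = [
    {"m1": 3, "m2": 9, "m3": 4},
    {"m1": 8, "m2": 1, "m3": 5},
]

# Aggregate-style max: one column, reduced over all rows (one result).
column_max = max(row["m1"] for row in rows)

# Row-wise "greatest": several columns compared within each row
# (one result per row).
row_greatest = [max(row["m1"], row["m2"], row["m3"]) for row in rows]

print(column_max)    # 8
print(row_greatest)  # [9, 8]
```

The question needs the second shape: a new value for every row, computed from several monthly columns.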

Answer 2

In the end it worked using sf.greatest, with this code:

import pyspark.sql.functions as sf

for col in list_names:
    for month in [3, 6, 9, 12]:
        # Row-wise maximum over the last `month` monthly columns
        companies = companies.withColumn(
            'max_' + col + str(month) + '_months',
            sf.greatest(*[sf.col(col + ult_pats2[month_ix - ix]) for ix in range(month)]))
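As to why the built-in `sum` worked while the built-in `max` did not: `sum` only needs `+`, which a Spark `Column` overloads to build a new expression, while `max` compares elements with `>` and then needs a plain True/False from the result, which `Column` refuses to provide. That refusal is the ValueError from the question. The sketch below mimics this with a hypothetical stand-in class, `FakeColumn`. It is not PySpark's implementation, only a minimal model of the behaviour, assumed here so the example runs without a Spark session.

```python
class FakeColumn:
    """Hypothetical stand-in for a lazily evaluated column expression."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        # Addition builds a new expression instead of computing a number.
        return FakeColumn(f"({self.expr} + {_expr(other)})")

    def __radd__(self, other):
        # sum() starts from 0, so 0 + column must work as well.
        return FakeColumn(f"({_expr(other)} + {self.expr})")

    def __gt__(self, other):
        # A comparison is also just another expression, not True/False.
        return FakeColumn(f"({self.expr} > {_expr(other)})")

    def __bool__(self):
        # An unevaluated expression has no single truth value.
        raise ValueError("Cannot convert column into bool")


def _expr(value):
    # Render either a FakeColumn or a plain Python value as text.
    return value.expr if isinstance(value, FakeColumn) else repr(value)


cols = [FakeColumn("a"), FakeColumn("b"), FakeColumn("c")]

total = sum(cols)   # works: only uses + (and the reflected +)
print(total.expr)   # (((0 + a) + b) + c)

try:
    max(cols)       # fails: max() needs bool(b > a)
except ValueError as err:
    print("max failed:", err)
```

Under this reading, `greatest` works because it builds the comparison as a single column expression that Spark evaluates per row, so Python never has to turn a column into a boolean.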

Cannot create a max column with a list comprehension. The sum column works
