Question

I am trying to find the quarter start date from a date column. When I write it using selectExpr(), I get the expected result:

df.selectExpr("add_months(history_effective_month,-(month(history_effective_month)%3)+1) as history_effective_qtr","history_effective_month").show(5)

Output:

history_effective_qtr   history_effective_month

       2017-07-01                2017-06-01
       2016-04-01                2016-05-01
       2015-10-01                2015-09-01
       2012-01-01                2012-01-01
       2012-01-01                2012-01-01
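
For reference, the month offset in that expression can be traced in plain Python as a sanity check. This is only a sketch: shift_months and quarter_expr are hypothetical helper names, and it assumes the input is always the first day of a month, as in the sample rows.

```python
from datetime import date


def shift_months(d: date, n: int) -> date:
    """Move a first-of-month date by n months (n may be negative)."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)


def quarter_expr(d: date) -> date:
    """Mirror add_months(d, -(month(d) % 3) + 1) from the query above."""
    return shift_months(d, -(d.month % 3) + 1)


# Same input dates as the distinct sample rows in the output.
for d in (date(2017, 6, 1), date(2016, 5, 1), date(2015, 9, 1), date(2012, 1, 1)):
    print(quarter_expr(d), d)
```

Months divisible by 3 get an offset of +1, which is why 2017-06-01 maps to 2017-07-01 in the first row.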

But when I add the same logic in .withColumn(), I get TypeError: Column is not iterable:

df.withColumn("history_effective_quarter",add_months('history_effective_month',-(month('history_effective_month')%3)+1))

TypeError                                 Traceback (most recent call last)
<ipython-input-259-0bb78d27d2a7> in <module>()
      1

~/anaconda3/lib/python3.6/site-packages/pyspark/sql/column.py in __iter__(self)
    248
    249     def __iter__(self):
--> 250         raise TypeError("Column is not iterable")
    251
    252     # string methods

TypeError: Column is not iterable

The workaround I am using is as follows:

df = df.selectExpr('*', "date_sub(history_effective_date," \
   "dayofmonth(history_effective_date)-1) as history_effective_month")

Answer 1

TL;DR Just use select:

select(*cols)

Projects a set of expressions and returns a new DataFrame.

from pyspark.sql.functions import add_months, month

df.select(
   "history_effective_quarter", add_months('history_effective_month',
   -(month('history_effective_month') % 3) + 1))

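Your code doesn't work because withColumn:

withColumn(colName, col)

Returns a new DataFrame by adding a column or replacing the existing column that has the same name.

is used to add a single column.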
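
If the column does have to be added with withColumn, one commonly used option is to keep the SQL string from the selectExpr call and wrap it in pyspark.sql.functions.expr, so the month offset is evaluated by Spark SQL instead of being passed to the Python add_months helper (which, in older PySpark releases, appears to expect a plain integer as its second argument). A minimal sketch, assuming df is the DataFrame from the question; the names QUARTER_SQL and with_quarter_column are hypothetical:

```python
# SQL text taken from the question's selectExpr call (without the alias).
QUARTER_SQL = (
    "add_months(history_effective_month, "
    "-(month(history_effective_month) % 3) + 1)"
)


def with_quarter_column(df, name="history_effective_quarter"):
    """Return df with one extra column computed from QUARTER_SQL.

    Hypothetical helper; assumes df has a date column named
    history_effective_month.
    """
    # Imported here so the module can be loaded without a Spark installation.
    from pyspark.sql.functions import expr

    # expr() parses the string into a Column, so withColumn receives the
    # (name, Column) pair it expects.
    return df.withColumn(name, expr(QUARTER_SQL))
```

Usage would be along the lines of with_quarter_column(df).show(5).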
