I am trying to create a new column in a DataFrame based on the values of three other columns. Here is the code I wrote for it:
from pyspark.sql.functions import when

dataFrame = dataFrame.withColumn('net_inventory_qty', when((dataFrame.raw_wip_fg_indicator == 'RAW MATERIALS') |
    (dataFrame.raw_wip_fg_indicator == 'WIP') |
    (dataFrame.raw_wip_fg_indicator == 'FINISHED GOODS'), dataFrame.total_stock_qty + dataFrame.sit_qty)
    .otherwise(dataFrame.sit_qty))
But when I run the Glue job, it throws this error:
pyspark.sql.utils.AnalysisException: u"cannot resolve '(`total_stock_qty` + `sit_qty`)' due to data type mismatch: differing types in '(`total_stock_qty` + `sit_qty`)' (struct<double:double,string:string> and double)
What am I missing? Any suggestions would be helpful.
Answer (score: 0)
Check your schema. Going by the error message, the two columns in the addition have these types:
total_stock_qty: struct<double:double,string:string>
sit_qty: double
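You can call printSchema() or show() on the DataFrame first to confirm this. Because total_stock_qty is a struct rather than a number, Spark cannot add it to sit_qty; you need to extract or cast a numeric value from the struct before doing the addition.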