I am trying to select some values from a PySpark dataframe based on a few rules, and I am getting an exception in PySpark.
from pyspark.sql import functions as F
df.select(df.card_key,F.when((df.tran_sponsor = 'GAMES') & (df.location_code = '9145'),'ENTERTAINMENT').when((df.tran_sponsor = 'XYZ') & (df.location_code = '123'),'eBOOKS').when((df.tran_sponsor = 'XYZ') & (df.l_code.isin(['123', '234', '345', '456', '567', '678', '789', '7878', '67', '456']) ),'FINANCE').otherwise(df.tran_sponsor)).show()
I am getting the exception below. Can you suggest what is wrong?
文件"",第1行 df.select(df.card_key,F.when((df.tran_sponsor =' GAMES')&(df.location_code =' 9145'),' ENTERTAINMENT' ;)。((df.tran_sponsor =' XYZ')&(df.location_code =' 123'),' eBOOKS')。when((df .tran_sponsor =' XYZ')&(df.l_code.isin([' 6001',' 6002',' 6003',& #39; 6004',' 6005',' 6006',' 6007',' 6008',' 6009&# 39;,' 6010',' 6011',' 6012',' 6013',' 6014'])) '作者&#39)否则(df.tran_sponsor))示出了()。 ^ SyntaxError:语法无效
Answer (score: 2)
Well, I just figured it out. The problem was nothing more than the assignment operator: the conditions used = where the comparison operator == was needed. :(
df.select(df.card_key,
          F.when((df.tran_sponsor == 'GAMES') & (df.location_code == '9145'), 'ENTERTAINMENT')
           .when((df.tran_sponsor == 'XYZ') & (df.location_code == '123'), 'eBOOKS')
           .when((df.tran_sponsor == 'XYZ') & (df.l_code.isin(['123', '234', '345', '456', '567', '678', '789', '7878', '67', '456'])), 'FINANCE')
           .otherwise(df.tran_sponsor)).show()
It works fine now. Thanks to everyone who took the time to look into it.
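As a side note, the same rules can be written with F.col and given an explicit output column name via .alias. This is just a sketch assuming the column names from the question (card_key, tran_sponsor, location_code, l_code); the output name "category" is only an illustrative choice, not something from the original post.

from pyspark.sql import functions as F

# Same CASE WHEN logic as the accepted fix, with an explicit alias.
# Column names are taken from the question; "category" is a made-up output name.
category = (
    F.when((F.col('tran_sponsor') == 'GAMES') & (F.col('location_code') == '9145'), 'ENTERTAINMENT')
     .when((F.col('tran_sponsor') == 'XYZ') & (F.col('location_code') == '123'), 'eBOOKS')
     .when((F.col('tran_sponsor') == 'XYZ') & F.col('l_code').isin(['123', '234', '345', '456', '567', '678', '789', '7878', '67']), 'FINANCE')
     .otherwise(F.col('tran_sponsor'))
     .alias('category')
)

df.select('card_key', category).show()

Without the alias, Spark auto-generates a long column name from the whole CASE WHEN expression, so naming the result explicitly keeps the output tidy.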