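I am trying to apply a function that returns three items to a column of a DataFrame. It works in some cases but not in others. I then realized this might be because of NULL values. Here is a simplified version of my code: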
import pandas as pd
def proc(x):
    return ([x,1,2], [x+1,3,4], [x+2,5,6])
## This works fine.
df = pd.DataFrame({'a':[1,2,3]})
df['new1'],df['new2'], df['new3'] = df.a.apply(lambda x:proc(x))
## But this throws the 'too many values to unpack' error.
df2 = pd.DataFrame({'a':[1,2,3, float('nan')]})
df2['new1'],df2['new2'], df2['new3'] = df2.a.apply(lambda x:proc(x))
Why does adding float('nan') to the df['a'] column cause this error?
Answer 0 (score: 2)
Use zip to unpack the values:
def proc(x):
    return ([x,1,2], [x+1,3,4], [x+2,5,6])
df2 = pd.DataFrame({'a':[1,2,4, float('nan')]})
df2['new1'], df2['new2'], df2['new3'] = zip(*df2['a'].apply(proc))
     a         new1         new2         new3
0  1.0  [1.0, 1, 2]  [2.0, 3, 4]  [3.0, 5, 6]
1  2.0  [2.0, 1, 2]  [3.0, 3, 4]  [4.0, 5, 6]
2  4.0  [4.0, 1, 2]  [5.0, 3, 4]  [6.0, 5, 6]
3  NaN  [nan, 1, 2]  [nan, 3, 4]  [nan, 5, 6]
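As for why the original code fails: as far as I can tell, the NaN itself is not the cause. Iterating over the Series returned by apply yields one result per row, so unpacking it into three targets only works when the frame happens to have exactly three rows. Here is a minimal sketch of that; the names three_rows, four_rows and error_message are only for illustration and are not from the question.

```python
import pandas as pd

def proc(x):
    return ([x, 1, 2], [x + 1, 3, 4], [x + 2, 5, 6])

# Three rows, one of them NaN: three results for three targets, so no error.
three_rows = pd.DataFrame({'a': [1, 2, float('nan')]})
r1, r2, r3 = three_rows['a'].apply(proc)

# Four rows: iterating the result yields four items for three targets.
four_rows = pd.DataFrame({'a': [1, 2, 3, float('nan')]})
results = four_rows['a'].apply(proc)
n_results = len(results)

try:
    n1, n2, n3 = results
    error_message = None
except ValueError as exc:
    # Same error as in the question.
    error_message = str(exc)
```

zip(*...) avoids the problem because it regroups the per-row tuples by position, which always gives three sequences no matter how many rows there are.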
Alternatively, unpack into the correct number of targets and return the same number of items from proc:
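def proc(x):
    return ([x,1,2], [x+1,3,4], [x+2,5,6], [x+3,7,8])
df2 = pd.DataFrame({'a':[1,2,4, float('nan')]})
# No zip here: each target receives one row's result, so four rows need four targets.
df2['new1'], df2['new2'], df2['new3'], df2['new4'] = df2['a'].apply(proc)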
     a         new1         new2         new3         new4
0  1.0  [1.0, 1, 2]  [2.0, 1, 2]  [4.0, 1, 2]  [nan, 1, 2]
1  2.0  [2.0, 3, 4]  [3.0, 3, 4]  [5.0, 3, 4]  [nan, 3, 4]
2  4.0  [3.0, 5, 6]  [4.0, 5, 6]  [6.0, 5, 6]  [nan, 5, 6]
3  NaN  [4.0, 7, 8]  [5.0, 7, 8]  [7.0, 7, 8]  [nan, 7, 8]
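Note that the two outputs above are laid out differently. With direct unpacking each new column holds everything proc returned for one row, while with zip each new column holds one position of proc's return value across all rows. A small sketch to compare the two groupings side by side; by_row and by_position are illustrative names, and the three-row frame is only an assumption to keep the example short.

```python
import pandas as pd

def proc(x):
    return ([x, 1, 2], [x + 1, 3, 4], [x + 2, 5, 6])

df = pd.DataFrame({'a': [1, 2, 4]})
results = df['a'].apply(proc)

# Direct unpacking hands out one item per row.
by_row = list(results)

# zip(*...) hands out one item per position in proc's return value.
by_position = list(zip(*results))

first_row_result = by_row[0]   # everything proc returned for a == 1
first_items = by_position[0]   # the first item of proc's result for every row
```

If the goal is one column per returned item, which is what the question seems to want, the zip version is probably the one to use.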