I have a piece of code that works on its own, but when I put it inside a loop (or use the df.apply() method) it stops working.
The code is:
import pandas as pd
from functools import partial

datadf=pd.DataFrame(data,columns=['X1','X2'])
for i in datadf.index.values.tolist():
    row=datadf.loc[i]
    x1=row['X1']
    x2=row['X2']
    set1=set([x1,x2])
    links=data2[data2['Xset']==set1]
    df1=pd.DataFrame(range(1,11),columns=['year'])
    def idlist1(row,var1):
        year=row['year']
        id1a=links[(links['xx1']==var1) & (links['year']==year)]
        id1a=id1a['id1'].values.tolist()
        id1b=links[(links['xx2']==var1) & (links['year']==year)]
        id1b=id1b['id2'].values.tolist()
        id1=list(set(id1a+id1b))
        return id1
    df1['id1a']=df1.apply(partial(idlist1,var1=x1),axis=1)
    #...(do other stuff to return a value using "df1")
    del df1
Here data2 is another DataFrame. In this code I am trying to match the values of (x1,x2) against data2.
The code works fine outside the loop, that is, when I specify (x1,x2) directly. But when I put the code inside a loop or use df.apply, I always get the error message
ValueError: could not broadcast input array from shape (0) into shape (1)
I don't understand why. Can anyone help? Thanks!
(By the way, the pandas version is 0.18.0.)
The full error message is:
  File "<ipython-input-229-541c0f3a4d2f>", line 19, in <module>
    df1['id1a']=df1.apply(partial(idlist1,var1=x1),axis=1)
  File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 4042, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 4155, in _apply_standard
    result = self._constructor(data=results, index=index)
  File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 223, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 359, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 5250, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File "/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 3933, in create_block_manager_from_arrays
    construction_error(len(arrays), arrays[0].shape, axes, e)
  File "/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 3895, in construction_error
    raise e
ValueError: could not broadcast input array from shape (0) into shape (1)
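Update: I found that the df.apply method does not seem to work inside the loop, so I converted every apply call in the loop into an explicit loop, and the code now works. Although I have "sort of" solved the problem, I am still confused about why this happens. If anyone knows why, I would really appreciate an answer. Thanks!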
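For readers who want to see what that workaround might look like, here is a minimal sketch that fills the list-valued column with a plain loop (a list comprehension) instead of df1.apply(..., axis=1). The contents of links are invented for illustration, since data2 is not shown in the question, and idlist1 is a hypothetical variant that takes the year directly rather than a row and returns a sorted list.

```python
import pandas as pd

# Hypothetical stand-in for `links`; the real data2 is not shown in the question.
links = pd.DataFrame({
    'xx1':  ['a', 'b', 'a'],
    'xx2':  ['b', 'a', 'c'],
    'year': [1, 1, 2],
    'id1':  [10, 20, 30],
    'id2':  [11, 21, 31],
})

def idlist1(year, var1, links):
    """Collect the ids linked to var1 in the given year, from either side."""
    same_year = links[links['year'] == year]
    id1a = same_year.loc[same_year['xx1'] == var1, 'id1'].tolist()
    id1b = same_year.loc[same_year['xx2'] == var1, 'id2'].tolist()
    # sorted() is an addition here, only to make the output order predictable
    return sorted(set(id1a + id1b))

df1 = pd.DataFrame({'year': list(range(1, 4))})

# One list per row, built by an explicit loop rather than df1.apply(..., axis=1).
# Lists of different lengths (including empty ones) are stored as objects.
df1['id1a'] = [idlist1(year, 'a', links) for year in df1['year']]
```

Because the column is assigned from an ordinary Python list with one element per row, pandas does not have to decide how to combine the returned lists, which is the step where the traceback above fails.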
Answer 0 (score: 1)
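It may be because row is defined more than once: once as a parameter of the function def idlist1(row,var1): and once as row=datadf.loc[i]. You could try renaming one of them and see whether that helps.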
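One quick way to test this suggestion is to check whether the two names actually interfere. In Python a function parameter is local to that function, so the sketch below uses made-up dictionaries in place of the DataFrame rows (an assumption for illustration only) and checks that the outer row is untouched after the call.

```python
# Outer variable, standing in for row = datadf.loc[i]
row = {'X1': 'a', 'X2': 'b'}

def idlist1(row, var1):
    # Inside the function, `row` refers to the argument that was passed in,
    # not to the outer variable with the same name.
    return (row['year'], var1)

# Call with a different object, standing in for a row of df1
result = idlist1({'year': 3}, 'a')

# The outer `row` still holds its original value after the call
outer_unchanged = (row == {'X1': 'a', 'X2': 'b'})
```

If the outer row is unchanged, as it is here, the name clash by itself is unlikely to explain the error, although renaming one of the two still makes the code easier to follow.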