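(Python code does not work in a loop) pandas.DataFrame.apply() does not work in a loop

Time: 2016-08-15 04:27:28

Tags: python loops pandas

I have a piece of code that works on its own, but when I put it in a loop (or use the df.apply() method) it does not work. The code is: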

import pandas as pd
from functools import partial
datadf=pd.DataFrame(data,columns=['X1','X2'])
for i in datadf.index.values.tolist():
    row=datadf.loc[i]
    x1=row['X1']
    x2=row['X2']
    set1=set([x1,x2])
    links=data2[data2['Xset']==set1]
    df1=pd.DataFrame(range(1,11),columns=['year'])
    def idlist1(row,var1):
        year=row['year']
        id1a=links[(links['xx1']==var1) & (links['year']==year)]
        id1a=id1a['id1'].values.tolist()
        id1b=links[(links['xx2']==var1) & (links['year']==year)]
        id1b=id1b['id2'].values.tolist()
        id1=list(set(id1a+id1b))
        return id1
    df1['id1a']=df1.apply(partial(idlist1,var1=x1),axis=1)
    #...(do other stuffs to return a value using "df1")
    del df1

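Here data2 is another DataFrame, and I am trying to match the values of (x1, x2) against it. The code works fine outside the loop, that is, when I specify (x1, x2) directly. But when I put the code in a loop or use df.apply, I always get this error message: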

ValueError: could not broadcast input array from shape (0) into shape (1) 

I don't understand why. Can anyone help? Thanks! (By the way, the pandas version is 0.18.0.) The full error message is:

File "<ipython-input-229-541c0f3a4d2f>", line 19, in <module>
df1['id1a']=df1.apply(partial(idlist1,var1=x1),axis=1)

File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 4042, in apply
return self._apply_standard(f, axis, reduce=reduce)

File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py",  line 4155, in _apply_standard
result = self._constructor(data=results, index=index)

File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 223, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)

File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 359, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)

File "/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 5250, in _arrays_to_mgr
return create_block_manager_from_arrays(arrays, arr_names, axes)

File "/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 3933, in create_block_manager_from_arrays
construction_error(len(arrays), arrays[0].shape, axes, e)

File "/anaconda2/lib/python2.7/site-packages/pandas/core/internals.py", line 3895, in construction_error
raise e

ValueError: could not broadcast input array from shape (0) into shape (1)

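Update: I found that the df.apply method does not get along with the loop, so I converted every apply inside the loop into an explicit loop, and the code now works. Although I have "sort of" solved the problem, I am still puzzled about why this happens. If anyone knows why, I would really appreciate an answer. Thanks!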
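A minimal sketch of the kind of workaround described in the update, replacing df.apply with an explicit loop (here a list comprehension). The contents of `links` are made-up stand-in data, and passing `year` and `links` as arguments to `idlist1` is an assumption made for illustration, not the asker's actual code:

```python
import pandas as pd

# Hypothetical stand-in for the asker's `links` frame (invented data).
links = pd.DataFrame({
    'xx1': ['a', 'b', 'a'],
    'xx2': ['b', 'a', 'c'],
    'year': [1, 1, 2],
    'id1': [10, 20, 30],
    'id2': [40, 50, 60],
})
df1 = pd.DataFrame({'year': range(1, 4)})

def idlist1(year, var1, links):
    """Collect the ids linked to var1 in the given year, from either side."""
    same_year = links['year'] == year
    from_first = links.loc[(links['xx1'] == var1) & same_year, 'id1'].tolist()
    from_second = links.loc[(links['xx2'] == var1) & same_year, 'id2'].tolist()
    return sorted(set(from_first + from_second))

# Build the column with a plain loop instead of df1.apply(..., axis=1),
# so pandas never has to infer a result shape from lists of varying length.
df1['id1a'] = [idlist1(year, 'a', links) for year in df1['year']]
```

Because the column is built from an ordinary Python list, each cell simply holds whatever list the function returned, including an empty one for years with no match.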

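1 Answer:

Answer 0: (score: 1)

It may be because row is defined more than once: once as a parameter of the function def idlist1(row,var1): and once as row=datadf.loc[i]. You could try renaming one of them and see whether that helps.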
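The renaming suggested here could look like the sketch below. The names `year_row`, `pair_row`, and `count_ids`, and all of the data, are hypothetical, and the function returns a count instead of a list only to keep the example small. Whether the rename alone removes the error is not confirmed anywhere in this thread:

```python
import pandas as pd
from functools import partial

# Hypothetical stand-ins for the asker's frames (invented data).
links = pd.DataFrame({
    'xx1': ['a', 'b'],
    'xx2': ['b', 'a'],
    'year': [1, 2],
    'id1': [10, 20],
    'id2': [30, 40],
})
datadf = pd.DataFrame({'X1': ['a'], 'X2': ['b']})
df1 = pd.DataFrame({'year': [1, 2]})

def count_ids(year_row, var1):
    """Count the ids linked to var1 in the year held by year_row."""
    year = year_row['year']
    same_year = links['year'] == year
    ids_a = links.loc[(links['xx1'] == var1) & same_year, 'id1'].tolist()
    ids_b = links.loc[(links['xx2'] == var1) & same_year, 'id2'].tolist()
    return len(set(ids_a + ids_b))

for i in datadf.index:
    pair_row = datadf.loc[i]  # outer row, renamed so it differs from the parameter
    counts = df1.apply(partial(count_ids, var1=pair_row['X1']), axis=1)
```

Distinct names make it obvious which row each line refers to: `pair_row` is a row of datadf, while `year_row` is the row of df1 that apply passes in.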