Hello fellow programmers,
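I am trying to copy rows into a DataFrame using logical (boolean) indexing, but I get NaNs. The idea is to use logical indexing to replace many rows of a DataFrame at once.
Here is how I expect it to work: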
# pseudo code: create a logical (boolean) vector
logical_vector = df.loc[:, 'colname'] == x
# pseudo code: use the logical vector to index a dataframe
df.loc[logical_vector, :] = df1.loc[logical_vector2, :]
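This is a basic operation that, where it is supported, allows fast vectorized (matrix) operations. How can I solve this, preferably without loops?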
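For comparison, plain NumPy arrays behave the way the pseudo code above expects, because boolean-mask assignment on an array is purely positional: there are no row labels to align. The small array below is made-up stand-in data, with column 0 playing the role of a 0/1 flag column.

```python
import numpy as np

# Stand-in data: column 0 acts as the 0/1 flag, column 1 holds a value
arr = np.array([[1, 10], [0, 0], [1, 20], [0, 0]])
dst = arr[:, 0] == 0  # target rows (positions 1 and 3)
src = arr[:, 0] == 1  # source rows (positions 0 and 2)

# NumPy pairs the selected rows by position: 1st source -> 1st target, and so on
arr[dst] = arr[src]
print(arr)
```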
I created an example to illustrate the problem:
import numpy as np
import pandas as pd
# create example 10x5 dataframe containing random numbers
x = pd.DataFrame(np.random.randn(10, 5), columns=['ac', 'bc', 'cc', 'dc', 'ec'],
                 index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'])
# add a column containing information for use in example of logical indexing
x['t']= [1,0,1,0,1,0,1,0,1,0]
x
Out[76]:
ac bc cc dc ec t
a -1.029517 1.936904 1.143655 0.708996 -1.218484 1
b 1.836638 -0.723243 -0.501546 -2.046355 0.248156 0
c 2.369828 0.559880 -0.878904 0.673454 -0.630927 1
d -0.629210 1.261608 -0.190508 -0.582700 0.068166 0
e 1.500134 0.534379 0.375362 0.849761 -1.675824 1
f 1.399520 0.038366 -0.137986 0.156580 -0.674619 0
g -1.359863 0.433721 -0.625973 -0.477530 -0.542612 1
h -0.694573 -0.196907 -0.372210 0.464188 -1.217399 0
i 1.357809 -0.017611 0.539137 -1.016894 0.172672 1
j 0.366195 0.750404 -0.055895 0.358795 0.181593 0
Then, when I try to do the replacement with the logical indexes, I get this:
# Use the logical indexes x.loc[:,'t']==0 and x.loc[:,'t']==1 to select target
# and source rows in x. This should replace every row that has 0 in column t
# with a row that has 1 in column t
x.loc[x.loc[:,'t']==0,:]=x.loc[x.loc[:,'t']==1,:]
x
Out[78]:
ac bc cc dc ec t
a -1.029517 1.936904 1.143655 0.708996 -1.218484 1.0
b NaN NaN NaN NaN NaN NaN
c 2.369828 0.559880 -0.878904 0.673454 -0.630927 1.0
d NaN NaN NaN NaN NaN NaN
e 1.500134 0.534379 0.375362 0.849761 -1.675824 1.0
f NaN NaN NaN NaN NaN NaN
g -1.359863 0.433721 -0.625973 -0.477530 -0.542612 1.0
h NaN NaN NaN NaN NaN NaN
i 1.357809 -0.017611 0.539137 -1.016894 0.172672 1.0
j NaN NaN NaN NaN NaN NaN
Whereas I expected this:
Out[76]:
ac bc cc dc ec t
a -1.029517 1.936904 1.143655 0.708996 -1.218484 1
b -1.029517 1.936904 1.143655 0.708996 -1.218484 1
c 2.369828 0.559880 -0.878904 0.673454 -0.630927 1
d 2.369828 0.559880 -0.878904 0.673454 -0.630927 1
e 1.500134 0.534379 0.375362 0.849761 -1.675824 1
f 1.500134 0.534379 0.375362 0.849761 -1.675824 1
g -1.359863 0.433721 -0.625973 -0.477530 -0.542612 1
h -1.359863 0.433721 -0.625973 -0.477530 -0.542612 1
i 1.357809 -0.017611 0.539137 -1.016894 0.172672 1
j 1.357809 -0.017611 0.539137 -1.016894 0.172672 1
Am I missing something?
Answer 0 (score: 0)
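Good question. The problem is that the index of the right-hand side does not match the index of the left-hand side: pandas aligns the assignment on row labels, and because none of the source labels appear among the target rows, those rows are filled with NaN. The following fixes it for a simpler example: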
import pandas as pd
df = pd.DataFrame({'a': [1, 0, 1, 0], 'b': [2, 0, 2, 0]})
a b
0 1 2
1 0 0
2 1 2
3 0 0
# give the source rows the target rows' labels, so alignment pairs them up
df.loc[df['a'] == 0, :] = df.loc[df['a'] == 1, :].set_index(df.loc[df['a'] == 0, :].index)
a b
0 1 2
1 1 2
2 1 2
3 1 2
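To see where the NaNs in the question come from, and that the same `set_index` relabeling removes them, here is a minimal sketch on a small, deterministic stand-in for the question's frame `x` (the integer values are invented for illustration; the question used random numbers).

```python
import pandas as pd

# Small stand-in for the question's frame x; values are illustrative only
x = pd.DataFrame(
    {"ac": [10, 0, 20, 0], "bc": [11, 0, 21, 0], "t": [1, 0, 1, 0]},
    index=["a", "b", "c", "d"],
)

dst = x["t"] == 0  # rows to overwrite: labels b, d
src = x["t"] == 1  # rows to copy from: labels a, c

# The right-hand side keeps its own labels (a, c). Assignment aligns on labels,
# so target rows b and d find no match; reindexing shows what they would receive.
rhs = x.loc[src, :]
print(rhs.reindex(x.loc[dst, :].index))  # all NaN

# Relabel the source rows with the target labels before assigning
fixed = x.copy()
fixed.loc[dst, :] = rhs.set_index(x.loc[dst, :].index)
print(fixed)
```

Rows are paired in order, so `b` receives `a`'s values and `d` receives `c`'s, which matches the pattern the question expected.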
If you know that the shapes are the same, you can simply assign the underlying values:
df=pd.DataFrame({'a':[1,0,1,0],'b':[2,0,2,0]})
# .values drops the index, so the rows are assigned by position instead of by label
df.loc[df['a'] == 0, :] = df.loc[df['a'] == 1, :].values
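Because the `.values` assignment pairs rows purely by order, it may be worth guarding it with a row-count check when the two masks are built independently. The helper name `copy_rows` below is hypothetical, and `to_numpy()` is used as the newer spelling of `.values`; this is a sketch, not a pandas API.

```python
import pandas as pd

def copy_rows(df, dst_mask, src_mask):
    """Hypothetical helper: overwrite rows selected by dst_mask with rows
    selected by src_mask, pairing them by position (1st source -> 1st target).
    Both masks must select the same number of rows."""
    n_dst, n_src = int(dst_mask.sum()), int(src_mask.sum())
    if n_dst != n_src:
        raise ValueError(f"mask sizes differ: {n_dst} target vs {n_src} source rows")
    out = df.copy()
    # to_numpy() drops the index, so no label alignment takes place
    out.loc[dst_mask, :] = df.loc[src_mask, :].to_numpy()
    return out

df = pd.DataFrame({'a': [1, 0, 1, 0], 'b': [2, 0, 2, 0]})
result = copy_rows(df, df['a'] == 0, df['a'] == 1)
print(result)
```

Working on a copy leaves the original `df` untouched; drop the `copy()` if in-place modification is preferred.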