Question

Hello fellow programmers,

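I am trying to copy rows into a DataFrame using logical (boolean) indexing, but I get NaNs. The idea is to use logical indexing to quickly replace multiple rows in a DataFrame: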

An example of how I expect it to work:

# pseudo code: example to create logical vector
logical_vector = df.loc[:,'colname']==x
# pseudo code: example to use logical vector to index a dataframe
df.loc[logical_vector,:]=df1.loc[logical_vector2,:]

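This is one of the basic operations that, where available, allows fast matrix manipulation. How can I solve this, preferably without loops?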

I created an example to illustrate the problem:

import numpy as np
import pandas as pd

# create example 10x5 dataframe containing random numbers
x=pd.DataFrame(np.random.randn(10,5),columns=['ac','bc','cc','dc','ec'],
               index=['a','b','c','d','e','f','g','h','i','j'])

# add a column containing information for use in example of logical indexing
x['t']= [1,0,1,0,1,0,1,0,1,0]

x
Out[76]: 
         ac        bc        cc        dc        ec  t
a -1.029517  1.936904  1.143655  0.708996 -1.218484  1
b  1.836638 -0.723243 -0.501546 -2.046355  0.248156  0
c  2.369828  0.559880 -0.878904  0.673454 -0.630927  1
d -0.629210  1.261608 -0.190508 -0.582700  0.068166  0
e  1.500134  0.534379  0.375362  0.849761 -1.675824  1
f  1.399520  0.038366 -0.137986  0.156580 -0.674619  0
g -1.359863  0.433721 -0.625973 -0.477530 -0.542612  1
h -0.694573 -0.196907 -0.372210  0.464188 -1.217399  0
i  1.357809 -0.017611  0.539137 -1.016894  0.172672  1
j  0.366195  0.750404 -0.055895  0.358795  0.181593  0

Then, when I try to replace the rows using these indexes, I get this:

# Use logical indexes x.loc[:,'t']==0 and x.loc[:,'t']==1 to select rows and
# write data into x. This should replace all rows that contain 0 in column
# t with the rows that have 1 in column t
x.loc[x.loc[:,'t']==0,:]=x.loc[x.loc[:,'t']==1,:]

x
Out[78]: 
         ac        bc        cc        dc        ec    t
a -1.029517  1.936904  1.143655  0.708996 -1.218484  1.0
b       NaN       NaN       NaN       NaN       NaN  NaN
c  2.369828  0.559880 -0.878904  0.673454 -0.630927  1.0
d       NaN       NaN       NaN       NaN       NaN  NaN
e  1.500134  0.534379  0.375362  0.849761 -1.675824  1.0
f       NaN       NaN       NaN       NaN       NaN  NaN
g -1.359863  0.433721 -0.625973 -0.477530 -0.542612  1.0
h       NaN       NaN       NaN       NaN       NaN  NaN
i  1.357809 -0.017611  0.539137 -1.016894  0.172672  1.0
j       NaN       NaN       NaN       NaN       NaN  NaN

Whereas I was expecting this:

Out[76]: 
         ac        bc        cc        dc        ec  t
a -1.029517  1.936904  1.143655  0.708996 -1.218484  1
b -1.029517  1.936904  1.143655  0.708996 -1.218484  1
c  2.369828  0.559880 -0.878904  0.673454 -0.630927  1
d  2.369828  0.559880 -0.878904  0.673454 -0.630927  1
e  1.500134  0.534379  0.375362  0.849761 -1.675824  1
f  1.500134  0.534379  0.375362  0.849761 -1.675824  1
g -1.359863  0.433721 -0.625973 -0.477530 -0.542612  1
h -1.359863  0.433721 -0.625973 -0.477530 -0.542612  1
i  1.357809 -0.017611  0.539137 -1.016894  0.172672  1
j  1.357809 -0.017611  0.539137 -1.016894  0.172672  1

Am I missing something?

Answer 1

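Good question. The problem is that the index of the right-hand side does not match the index of the left-hand side, so pandas aligns on row labels and fills the rows it cannot match with NaN. The following solves it for a simpler example: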

df=pd.DataFrame({'a':[1,0,1,0],'b':[2,0,2,0]})

   a  b
0  1  2
1  0  0
2  1  2
3  0  0

df.loc[df['a']==0,:]=df.loc[df['a']==1,:].set_index(df.loc[df['a']==0,:].index)

   a  b
0  1  2
1  1  2
2  1  2
3  1  2
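To see why the labels matter, here is a rough sketch that uses reindex as a stand-in for the label alignment that happens during the assignment. The names target_index, source, aligned and relabelled are only illustrative, and the columns are floats here (an assumption made for this sketch) so that NaN fits without changing the dtype:

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 0.0, 1.0, 0.0], 'b': [2.0, 0.0, 2.0, 0.0]})

target_index = df.index[df['a'] == 0]  # row labels 1 and 3 (rows to overwrite)
source = df.loc[df['a'] == 1, :]       # row labels 0 and 2 (rows to copy)

# Looking up the target labels in the source finds nothing,
# because the source has no rows labelled 1 or 3, so everything is NaN.
aligned = source.reindex(target_index)
print(aligned)

# Relabelling the source rows with the target labels makes them line up.
relabelled = source.set_index(target_index)
print(relabelled)
```

Once the source rows carry the same labels as the rows being overwritten, the assignment has something to match on, which is what the set_index call above achieves.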

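If you know the shapes are the same, you can simply take the values, which drops the index so that no alignment takes place: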

df=pd.DataFrame({'a':[1,0,1,0],'b':[2,0,2,0]})
df.loc[df['a']==0,:]=df.loc[df['a']==1,:].values
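Applied to the frame from the question, the same idea might look like the sketch below. It uses to_numpy(), which the pandas documentation recommends over .values, and it assumes two things that are not in the original post: a seeded random generator for reproducibility, and a float 't' column so that no dtype conversion is involved.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for this sketch
x = pd.DataFrame(rng.standard_normal((10, 5)),
                 columns=['ac', 'bc', 'cc', 'dc', 'ec'],
                 index=list('abcdefghij'))
x['t'] = [1.0, 0.0] * 5  # float flags: an assumption made for this sketch

# The right-hand side is a plain 5x6 array, so rows are paired by position:
# a -> b, c -> d, e -> f, g -> h, i -> j.
x.loc[x['t'] == 0, :] = x.loc[x['t'] == 1, :].to_numpy()
print(x)
```

This only works when both selections have the same number of rows and the positional pairing is the one you want; otherwise relabelling with set_index is the safer route.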
