Question

My question is as follows:
Suppose I have two data frames in pandas with the same number of columns, for example:

A= 1 2
   3 4 
   8 9

and

B= 7 8
   4 0

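There is also a boolean vector whose length is exactly the number of rows of A plus the number of rows of B (here 5), and whose number of 1s equals the number of rows in B, which means two 1s in this example. Say Bool= 0 1 0 1 0.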

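My goal is to merge A and B into a larger data frame C such that the rows of B land at the positions of the 1s in Bool, so for this example it would give me:

C= 1 2
   7 8
   3 4
   4 0
   8 9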

Do you know how to do this? It would help me a great deal if you do. Thank you for reading.

Answer 1

One option is to create an empty data frame of the expected shape and then fill in the values from A and B:

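import pandas as pd
import numpy as np

# the example frames from the question
A = pd.DataFrame([[1, 2], [3, 4], [8, 9]])
B = pd.DataFrame([[7, 8], [4, 0]])
Bool = np.array([0, 1, 0, 1, 0]).astype(bool)

# allocate an uninitialized frame of the final shape; every cell is overwritten below
df = pd.DataFrame(np.empty((A.shape[0] + B.shape[0], A.shape[1])), columns=A.columns)

df.loc[Bool, :] = B.values
df.loc[~Bool, :] = A.values

# np.empty allocates floats, so cast back to the same data types as A (thanks to @piRSquared)
df = df.astype(A.dtypes)

df
#   0   1
#0  1   2
#1  7   8
#2  3   4
#3  4   0
#4  8   9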
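
The same fill-by-mask idea can be wrapped in a small helper that validates the mask before filling. This is only a minimal sketch: the name `interleave_by_mask` is hypothetical (it is not part of pandas), and it assumes both frames have the same columns and compatible dtypes.

```python
import numpy as np
import pandas as pd


def interleave_by_mask(a, b, mask):
    """Hypothetical helper: rows of `b` go where `mask` is True, rows of `a` elsewhere."""
    mask = np.asarray(mask, dtype=bool)
    if len(mask) != len(a) + len(b):
        raise ValueError("mask length must equal len(a) + len(b)")
    if mask.sum() != len(b):
        raise ValueError("number of True entries must equal len(b)")
    # pick a dtype that can hold the values of both frames
    dtype = np.result_type(a.to_numpy(), b.to_numpy())
    out = np.empty((len(mask), a.shape[1]), dtype=dtype)
    out[mask] = b.to_numpy()
    out[~mask] = a.to_numpy()
    return pd.DataFrame(out, columns=a.columns)


A = pd.DataFrame([[1, 2], [3, 4], [8, 9]])
B = pd.DataFrame([[7, 8], [4, 0]])
C = interleave_by_mask(A, B, [0, 1, 0, 1, 0])
print(C)
```

Filling a NumPy array first and building the frame once at the end avoids the intermediate float frame, at the cost of collapsing mixed column dtypes into a single common dtype.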

Answer 2

Here is a pandas-only solution that reindexes the original data frames and then concatenates them:

Bool = pd.Series([0, 1, 0, 1, 0], dtype=bool) 
B.index = Bool[ Bool].index
A.index = Bool[~Bool].index
pd.concat([A, B]).sort_index()  # sort_index() puts the rows back into interleaved order
#   0  1
#0  1  2
#1  7  8
#2  3  4
#3  4  0
#4  8  9
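
Note that assigning to `A.index` and `B.index` changes the original frames. If that is not wanted, the same relabel-then-concatenate idea can be written without mutation. A minimal sketch, assuming a recent pandas where `set_axis` returns a new object:

```python
import pandas as pd

A = pd.DataFrame([[1, 2], [3, 4], [8, 9]])
B = pd.DataFrame([[7, 8], [4, 0]])
Bool = pd.Series([0, 1, 0, 1, 0], dtype=bool)

# set_axis returns relabelled copies, so A and B keep their original indexes
C = pd.concat([
    A.set_axis(Bool[~Bool].index, axis=0),  # A rows take the positions where Bool is False
    B.set_axis(Bool[Bool].index, axis=0),   # B rows take the positions where Bool is True
]).sort_index()
print(C)
```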

Answer 3

The following approach generalizes to more than two groups. Starting from

A = pd.DataFrame([[1,2],[3,4],[8,9]])    
B = pd.DataFrame([[7,8],[4,0]])    
C = pd.DataFrame([[9,9],[5,5]])
bb = pd.Series([0, 1, 0, 1, 2, 2, 0])

we can use

pd.concat([A, B, C]).iloc[bb.rank(method='first').astype(int) - 1].reset_index(drop=True)

which gives

In [269]: pd.concat([A, B, C]).iloc[bb.rank(method='first').astype(int) - 1].reset_index(drop=True)
Out[269]: 
   0  1
0  1  2
1  7  8
2  3  4
3  4  0
4  9  9
5  5  5
6  8  9

This works because rank with method='first' ranks the values by value and breaks ties by order of appearance. This means we get something like

In [270]: pd.Series([1, 0, 0, 1, 0]).rank(method='first')
Out[270]: 
0    4.0
1    1.0
2    2.0
3    5.0
4    3.0
dtype: float64

which is exactly (after subtracting one) the iloc order in which we want to select the rows.
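
One way to see why this is the right order: the zero-based rank is the inverse of a stable argsort of the group labels. The sketch below checks this on the `bb` series from above; the names `take`, `order` and `inverse` are only illustrative.

```python
import numpy as np
import pandas as pd

bb = pd.Series([0, 1, 0, 1, 2, 2, 0])

# for each output row, the position in the concatenated frame to take it from
take = bb.rank(method='first').astype(int) - 1

# stable argsort: order[k] is the output slot where concatenated row k ends up
order = np.argsort(bb.to_numpy(), kind='stable')

# inverting that permutation gives, for each output slot, the concatenated row to take
inverse = np.empty_like(order)
inverse[order] = np.arange(len(order))

print(take.tolist())
print(inverse.tolist())
```

Both lists are the same permutation, so either can be passed to `iloc`.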

Combine 2 pandas data frames according to a boolean vector

3 answers: