I have a DataFrame like the following:
                     dates         0
numbers letters
0       a       2013-01-01  0.261092
                2013-01-02 -1.267770
                2013-01-03  0.008230
        b       2013-01-01 -1.515866
                2013-01-02  0.351942
                2013-01-03 -0.245463
        c       2013-01-01 -0.253103
                2013-01-02 -0.385411
                2013-01-03 -1.740821
1       a       2013-01-01 -0.108325
                2013-01-02 -0.212350
                2013-01-03  0.021097
        b       2013-01-01 -1.922214
                2013-01-02 -1.769003
                2013-01-03 -0.594216
        c       2013-01-01 -0.419775
                2013-01-02  1.511700
                2013-01-03  0.994332
2       a       2013-01-01 -0.020299
                2013-01-02 -0.749474
                2013-01-03 -1.478558
        b       2013-01-01 -1.357671
                2013-01-02  0.161185
                2013-01-03 -0.658246
        c       2013-01-01 -0.564796
                2013-01-02 -0.333106
                2013-01-03 -2.814611
Now I receive a list of queries, such as:
   numbers letters
0        0       b
1        1       c
2        0       b
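I need to select the rows whose index matches each entry in the list, in query order and including repeats. The expected result is: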
                     dates         0
numbers letters
0       b       2013-01-01 -1.515866
                2013-01-02  0.351942
                2013-01-03 -0.245463
1       c       2013-01-01 -0.419775
                2013-01-02  1.511700
                2013-01-03  0.994332
0       b       2013-01-01 -1.515866
                2013-01-02  0.351942
                2013-01-03 -0.245463
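How can I select specific data from a MultiIndex DataFrame to answer a query list that contains duplicate rows? Note that the query list is much longer than the DataFrame, so I need a method that is fast enough.
(PS: there is another question similar to this one, but without duplicate queries: How to select a subset from a Multi-Index Dataframe based on conditions from another DataFrame)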
Answer 0 (score: 1)
If you convert the second DataFrame to a MultiIndex, you can simply pass it to DataFrame.loc:
In [2]: idx = df2.set_index(['numbers', 'letters']).index
In [3]: print(df.loc[idx])
                     dates         0
numbers letters
0       b       2013-01-01 -1.515866
        b       2013-01-02  0.351942
        b       2013-01-03 -0.245463
1       c       2013-01-01 -0.419775
        c       2013-01-02  1.511700
        c       2013-01-03  0.994332
0       b       2013-01-01 -1.515866
        b       2013-01-02  0.351942
        b       2013-01-03 -0.245463
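For anyone who wants to try this end to end, here is a minimal self-contained sketch. It is not the .loc call from the answer: it swaps in a left merge on the two key columns, which should also keep the query order and repeat rows for duplicate queries. The frame is rebuilt with random values (so the numbers will differ from the question), and the names df, df2 and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question.
# Assumption: values are random, so they will not match the question's numbers.
dates = pd.date_range("2013-01-01", periods=3)
pairs = pd.MultiIndex.from_product(
    [[0, 1, 2], list("abc")], names=["numbers", "letters"]
)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "dates": np.tile(dates.to_numpy(), len(pairs)),
        0: rng.standard_normal(len(pairs) * len(dates)),
    },
    index=pairs.repeat(len(dates)),  # each (numbers, letters) pair appears 3 times
)

# The query list, including a repeated query (0, 'b').
df2 = pd.DataFrame({"numbers": [0, 1, 0], "letters": ["b", "c", "b"]})

# A left merge keeps the order of the queries and emits every matching
# row of df once per query, so duplicate queries yield duplicate blocks.
result = (
    df2.merge(df.reset_index(), on=["numbers", "letters"], how="left")
       .set_index(["numbers", "letters"])
)
print(result)
```

With how="left", a query that matches nothing in df would show up as a single row of missing values rather than raising, which may or may not be what you want for a long query list.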