Closest value within a common index

Time: 2019-05-07 07:07:06

Tags: pandas

Given this dataset df:

import numpy as np
import pandas as pd

data = ['dog', 'cat', 'rabbit', 'elephant']
i = data*3  # each label repeated three times
base = pd.DataFrame(np.random.randn(12, 2), index=i, columns=list('AB'))
marker = pd.DataFrame(np.random.randn(4,1), index=data, columns=['marker'])

df = base.join(marker)

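For each index label, how can I get the row whose df['A'] value is closest to its marker?

I went through this link, but could not extract one row per unique index value.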

2 answers:

Answer 0 (score: 1)

Use:

import numpy as np
import pandas as pd

np.random.seed(123)  # fixed seed so the output below is reproducible

data = ['dog', 'cat', 'rabbit', 'elephant']
i = data*3
base = pd.DataFrame(np.random.randn(12, 2), index=i, columns=list('AB'))
marker = pd.DataFrame(np.random.randn(4,1), index=data, columns=['marker'])

print (base)
                 A         B
dog      -1.085631  0.997345
cat       0.282978 -1.506295
rabbit   -0.578600  1.651437
elephant -2.426679 -0.428913
dog       1.265936 -0.866740
cat      -0.678886 -0.094709
rabbit    1.491390 -0.638902
elephant -0.443982 -0.434351
dog       2.205930  2.186786
cat       1.004054  0.386186
rabbit    0.737369  1.490732
elephant -0.935834  1.175829

print (marker)
            marker
dog      -1.253881
cat      -0.637752
rabbit    0.907105
elephant -1.428681

First sort the index with DataFrame.sort_index. The reason is to avoid ValueError: cannot reindex from a duplicate axis in the final filtering step.

base = base.sort_index()
print (base)
                 A         B
cat       0.282978 -1.506295
cat      -0.678886 -0.094709
cat       1.004054  0.386186
dog      -1.085631  0.997345
dog       1.265936 -0.866740
dog       2.205930  2.186786
elephant -2.426679 -0.428913
elephant -0.443982 -0.434351
elephant -0.935834  1.175829
rabbit   -0.578600  1.651437
rabbit    1.491390 -0.638902
rabbit    0.737369  1.490732
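
The sorting step above groups equal labels together. As a minimal sketch on hypothetical toy values (not the answer's data), the snippet below assumes you also want rows with the same label to keep their original relative order, which is why it passes kind="mergesort" (a stable sort); the answer itself uses the default.

```python
import pandas as pd

# Toy frame with repeated index labels (hypothetical values).
toy = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=["dog", "cat", "dog", "cat", "dog", "cat"],
)

# A stable sort groups equal labels together and keeps their original order.
toy_sorted = toy.sort_index(kind="mergesort")

print(toy_sorted.index.is_monotonic_increasing)  # labels are now in sorted order
print(toy_sorted["A"].tolist())
```

After sorting, all "cat" rows come before all "dog" rows, and within each label the rows appear in the order they had in toy.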

Subtract the marker column with Series.sub and take the absolute values, then filter by boolean indexing against the per-group min computed with GroupBy.transform:

s = base['A'].sub(marker['marker']).abs()
s2 = base.loc[s.groupby(level=0).transform('min').eq(s), 'A']
print (s2)
cat        -0.678886
dog        -1.085631
elephant   -0.935834
rabbit      0.737369
Name: A, dtype: float64
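
To make the intermediate steps visible, here is a small sketch on hypothetical toy values picked so the distances can be checked by hand. It assumes the marker has already been repeated once per row, so both Series share the same index and no alignment is needed.

```python
import pandas as pd

# Hypothetical toy values; the marker is already repeated per row.
a = pd.Series([1.0, 4.0, 10.0, 7.5], index=["cat", "cat", "dog", "dog"], name="A")
m = pd.Series([3.0, 3.0, 8.0, 8.0], index=a.index, name="marker")

dist = a.sub(m).abs()                               # |A - marker| for every row
group_min = dist.groupby(level=0).transform("min")  # per-label minimum, repeated on each row
mask = dist.eq(group_min)                           # True where a row attains its label's minimum

print(dist.tolist())
print(group_min.tolist())
print(mask.tolist())
print(a[mask])
```

Because transform returns a result with the same length as its input, the comparison is row by row. One consequence is that if two rows of the same label are equally close to the marker, both are kept.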

Edit:

df = base.join(marker)
df['marker'] = df['A'].sub(df['marker']).abs()
s2 = df.loc[df.groupby(level=0)['marker'].transform('min').eq(df['marker']) , 'A']
print (s2)
cat        -0.678886
dog        -1.085631
elephant   -0.935834
rabbit      0.737369
Name: A, dtype: float64
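
If exactly one full row per label is wanted even when distances tie, one possible alternative (not taken from either answer) is to sort by distance and keep the first occurrence of each label. The sketch below uses hypothetical toy data, and the helper column name dist is an assumption introduced for illustration.

```python
import pandas as pd

# Hypothetical toy data; both "dog" rows are equally close to the dog marker.
base = pd.DataFrame(
    {"A": [1.0, 5.0, 2.0, 9.0], "B": [10, 20, 30, 40]},
    index=["dog", "dog", "cat", "cat"],
)
marker = pd.DataFrame({"marker": [3.0, 4.0]}, index=["dog", "cat"])

df = base.join(marker)
df["dist"] = (df["A"] - df["marker"]).abs()  # helper column (hypothetical name)

# Stable sort by distance, then keep the first occurrence of every label.
ordered = df.sort_values("dist", kind="mergesort")
closest = ordered[~ordered.index.duplicated(keep="first")].sort_index()
print(closest)
```

This returns whole rows rather than only column A. Which of several tied rows survives depends on the row order of the joined frame.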

Answer 1 (score: 0)

For the DataFrame:

                 A         B    marker
cat      -1.364769 -0.723230  0.069315
cat      -1.141256 -0.124800  0.069315
cat      -1.658259 -0.881559  0.069315
dog      -0.277469 -0.621357 -1.389664
dog      -0.854505  0.282091 -1.389664
dog      -1.000602  0.171808 -1.389664
elephant -0.673019  0.202090 -0.735848
elephant  1.729002 -0.052014 -0.735848
elephant  3.083791  0.623577 -0.735848
rabbit   -0.946095  0.536181 -2.455088
rabbit    0.644441 -1.476657 -2.455088
rabbit    1.614225 -0.806389 -2.455088

...a one-line solution with the code...

df.iloc[df.reset_index().groupby('index').apply(lambda g: abs(g.A - g.marker).idxmin())]

...gives...

                 A         B    marker
cat      -1.141256 -0.124800  0.069315
dog      -1.000602  0.171808 -1.389664
elephant -0.673019  0.202090 -0.735848
rabbit   -0.946095  0.536181 -2.455088
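
The same idea can be sketched without groupby.apply by calling idxmin on the grouped distances directly. This is a variant under stated assumptions rather than the answer's own code: columns A and marker are copied from the DataFrame above (B is left out for brevity), and the names dist, pos and closest are hypothetical.

```python
import pandas as pd

# Columns A and marker copied from the DataFrame shown above (B omitted).
labels = ["cat"] * 3 + ["dog"] * 3 + ["elephant"] * 3 + ["rabbit"] * 3
df = pd.DataFrame(
    {
        "A": [-1.364769, -1.141256, -1.658259,
              -0.277469, -0.854505, -1.000602,
              -0.673019, 1.729002, 3.083791,
              -0.946095, 0.644441, 1.614225],
        "marker": [0.069315] * 3 + [-1.389664] * 3
                  + [-0.735848] * 3 + [-2.455088] * 3,
    },
    index=labels,
)

# Distances on a positional 0..n-1 index, so idxmin yields row positions.
dist = (df["A"] - df["marker"]).abs().reset_index(drop=True)
pos = dist.groupby(df.index.to_numpy()).idxmin()  # one position per label
closest = df.iloc[pos.to_numpy()]
print(closest)
```

Resetting to a positional index is what makes the labels returned by idxmin usable with iloc; on the original duplicated index they would be ambiguous.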