Given this dataset df:
import numpy as np
import pandas as pd

data = ['dog', 'cat', 'rabbit', 'elephant']
i = data*3
base = pd.DataFrame(np.random.randn(12, 2), index=i, columns=list('AB'))
marker = pd.DataFrame(np.random.randn(4,1), index=data, columns=['marker'])
df = base.join(marker)
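For each index label, how can I get the row whose df['A'] value is closest to its marker?
I went through this link, but could not extract one row per unique index.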
Answer 0: (score: 1)
Use:
np.random.seed(123)
data = ['dog', 'cat', 'rabbit', 'elephant']
i = data*3
base = pd.DataFrame(np.random.randn(12, 2), index=i, columns=list('AB'))
marker = pd.DataFrame(np.random.randn(4,1), index=data, columns=['marker'])
print (base)
A B
dog -1.085631 0.997345
cat 0.282978 -1.506295
rabbit -0.578600 1.651437
elephant -2.426679 -0.428913
dog 1.265936 -0.866740
cat -0.678886 -0.094709
rabbit 1.491390 -0.638902
elephant -0.443982 -0.434351
dog 2.205930 2.186786
cat 1.004054 0.386186
rabbit 0.737369 1.490732
elephant -0.935834 1.175829
print (marker)
marker
dog -1.253881
cat -0.637752
rabbit 0.907105
elephant -1.428681
Sort the index with DataFrame.sort_index. The reason is to avoid ValueError: cannot reindex from a duplicate axis in the final filtering step:
base = base.sort_index()
print (base)
A B
cat 0.282978 -1.506295
cat -0.678886 -0.094709
cat 1.004054 0.386186
dog -1.085631 0.997345
dog 1.265936 -0.866740
dog 2.205930 2.186786
elephant -2.426679 -0.428913
elephant -0.443982 -0.434351
elephant -0.935834 1.175829
rabbit -0.578600 1.651437
rabbit 1.491390 -0.638902
rabbit 0.737369 1.490732
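As a small illustration of what the sort does to a duplicated index, here is a sketch on hypothetical toy data (the frame `toy` and its values are not from the question). It assumes a stable sort (`kind="mergesort"`) is wanted so that rows sharing a label keep their original relative order; the default sort kind does not promise that.

```python
import pandas as pd

# Hypothetical toy frame with repeated, unsorted index labels.
toy = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]},
                   index=["dog", "cat", "dog", "cat"])

# A stable sort groups equal labels together and keeps their original order.
toy_sorted = toy.sort_index(kind="mergesort")

print(toy_sorted)
print(toy_sorted.index.is_monotonic_increasing)
```

After sorting, all rows for one label are contiguous and the index is monotonic, which is the state the later steps are run on.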
Subtract the columns with Series.sub and take the absolute values, then filter by boolean indexing, using GroupBy.transform with min to build the mask:
s = base['A'].sub(marker['marker']).abs()
s2 = base.loc[s.groupby(level=0).transform('min').eq(s), 'A']
print (s2)
cat -0.678886
dog -1.085631
elephant -0.935834
rabbit 0.737369
Name: A, dtype: float64
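The mask logic can be checked by hand on a tiny frame. The following is a minimal sketch with hypothetical values (not the random data above), chosen so the closest row per label is obvious. It joins the marker first, so the distance Series and the frame share exactly the same index.

```python
import pandas as pd

# Hypothetical toy data: duplicated index labels, already sorted.
base = pd.DataFrame({"A": [1.0, 4.0, 10.0, 20.0, 26.0]},
                    index=["cat", "cat", "dog", "dog", "dog"])
marker = pd.DataFrame({"marker": [3.5, 21.0]}, index=["cat", "dog"])

# Attach each label's marker to its rows, then take the absolute distance.
df = base.join(marker)
dist = df["A"].sub(df["marker"]).abs()

# transform('min') broadcasts each group's smallest distance to every row
# of that group, so eq() gives a boolean mask with the same index as df.
mask = dist.groupby(level=0).transform("min").eq(dist)
closest = df.loc[mask, "A"]
print(closest)
```

Note that if two rows of the same label are equally close to the marker, this mask keeps both of them.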
EDIT:
df = base.join(marker)
df['marker'] = df['A'].sub(df['marker']).abs()
s2 = df.loc[df.groupby(level=0)['marker'].transform('min').eq(df['marker']) , 'A']
print (s2)
cat -0.678886
dog -1.085631
elephant -0.935834
rabbit 0.737369
Name: A, dtype: float64
Answer 1: (score: 0)
For the DataFrame:
A B marker
cat -1.364769 -0.723230 0.069315
cat -1.141256 -0.124800 0.069315
cat -1.658259 -0.881559 0.069315
dog -0.277469 -0.621357 -1.389664
dog -0.854505 0.282091 -1.389664
dog -1.000602 0.171808 -1.389664
elephant -0.673019 0.202090 -0.735848
elephant 1.729002 -0.052014 -0.735848
elephant 3.083791 0.623577 -0.735848
rabbit -0.946095 0.536181 -2.455088
rabbit 0.644441 -1.476657 -2.455088
rabbit 1.614225 -0.806389 -2.455088
...a one-line solution with the code...
df.iloc[df.reset_index().groupby('index').apply(lambda g: abs(g.A - g.marker).idxmin())]
...gives...
A B marker
cat -1.141256 -0.124800 0.069315
dog -1.000602 0.171808 -1.389664
elephant -0.673019 0.202090 -0.735848
rabbit -0.946095 0.536181 -2.455088
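The same idea can also be written without apply. This is a sketch under the assumption that the index is unnamed, so reset_index() produces a column called 'index'. The toy values and the helper column name `dist` are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame with the marker already joined in.
df = pd.DataFrame(
    {"A": [1.0, 4.0, 10.0, 20.0, 26.0],
     "marker": [3.5, 3.5, 21.0, 21.0, 21.0]},
    index=["cat", "cat", "dog", "dog", "dog"],
)

# After reset_index the row labels are 0..n-1, i.e. equal to row positions.
tmp = df.reset_index()
tmp["dist"] = (tmp["A"] - tmp["marker"]).abs()

# idxmin returns, per group, the label (here: position) of the smallest distance.
pos = tmp.groupby("index")["dist"].idxmin()

# Positions are valid for iloc on the original frame, which keeps its labels.
result = df.iloc[pos.to_numpy()]
print(result)
```

Unlike a mask built with transform('min'), idxmin returns only the first row when several rows of a label are equally close.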