Question

import numpy as np
import pandas as pd

ind = [0, 1, 2]
cols = ['A','B','C']
df = pd.DataFrame(np.arange(9).reshape((3,3)),columns=cols)

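Suppose you have a pandas DataFrame df, built as above (a 3x3 frame holding the values 0 through 8, with columns A, B and C).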

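If you want to pick a single element from each column in cols, taking the row given by the matching entry of ind, the output should look like a Series: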

 A  0
 B  4
 C  8

What I have tried so far is:

 df.loc[ind,cols]

which gives an undesired result: the whole 3x3 DataFrame instead of one value per column.

Any suggestions?

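Context: the next step is to map the output of a df.idxmax() call on one DataFrame onto another DataFrame with the same column names and index, but I can probably figure that out once I know how to do the transformation above.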

Answer 1

You can use DataFrame.lookup() (deprecated in pandas 1.2.0 and removed in 2.0.0, so this needs an older pandas):

In [6]: pd.Series(df.lookup(df.index, df.columns), index=df.columns)
Out[6]:
A    0
B    4
C    8
dtype: int32

Or:

In [14]: pd.Series(df.lookup(ind, cols), index=df.columns)
Out[14]:
A    0
B    4
C    8
dtype: int32

Explanation:

In [12]: df.lookup(df.index, df.columns)
Out[12]: array([0, 4, 8])
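On a pandas version where DataFrame.lookup() is not available, the same label pairing can be sketched with NumPy indexing. This is only one possible substitute, not the method used in this answer; the names row_pos, col_pos and picked are made up for the example.

```python
import numpy as np
import pandas as pd

ind = [0, 1, 2]
cols = ['A', 'B', 'C']
df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

# Translate row and column labels into integer positions.
# Note: get_indexer returns -1 for a label that is not found.
row_pos = df.index.get_indexer(ind)
col_pos = df.columns.get_indexer(cols)

# Advanced indexing pairs the positions element-wise: (row_pos[i], col_pos[i]).
picked = pd.Series(df.to_numpy()[row_pos, col_pos], index=cols)
print(picked)
```

Because the labels are converted to positions first, this should also work when the index is not the default 0..n-1 range.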

Answer 2

Here is a vectorized approach using NumPy advanced indexing. It selects one element per column, given the row index in ind for each column:

pd.Series(df.values[ind, np.arange(len(ind))], df.columns)

Sample run:

In [107]: ind = [0, 2, 1] # different one than sample for variety
     ...: cols = ['A','B','C']
     ...: df = pd.DataFrame(np.arange(9).reshape((3,3)),columns=cols)
     ...: 

In [109]: df
Out[109]: 
   A  B  C
0  0  1  2
1  3  4  5
2  6  7  8

In [110]: pd.Series(df.values[ind, np.arange(len(ind))], df.columns)
Out[110]: 
A    0
B    7
C    5
dtype: int64
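The same indexing can be applied to the idxmax() use case mentioned in the question's context. A minimal sketch, assuming two frames with the same column names and index; scores and other are hypothetical names and the data is invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: `scores` decides which row to pick per column,
# `other` supplies the values that get picked.
scores = pd.DataFrame({'A': [1, 9, 3], 'B': [7, 2, 5], 'C': [4, 6, 8]})
other = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=['A', 'B', 'C'])

best_rows = scores.idxmax()  # row label of the maximum in each column

# idxmax returns labels, so convert them to positions before indexing the array.
row_pos = other.index.get_indexer(best_rows.to_numpy())
col_pos = other.columns.get_indexer(best_rows.index)

result = pd.Series(other.to_numpy()[row_pos, col_pos], index=best_rows.index)
print(result)
```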

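Runtime test

Let's compare the proposed approach against the other vectorized option, the pandas built-in lookup method from @MaxU's solution, to see how the two vectorized methods perform. First with more columns: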

In [111]: ncols = 10000
     ...: df = pd.DataFrame(np.random.randint(0,9,(100,ncols)))
     ...: ind = np.random.randint(0,100,(ncols)).tolist()
     ...: 

# @MaxU's solution
In [112]: %timeit pd.Series(df.lookup(ind, df.columns), index=df.columns)
1000 loops, best of 3: 718 µs per loop

# Proposed in this post    
In [113]: %timeit pd.Series(df.values[ind, np.arange(len(ind))], df.columns)
1000 loops, best of 3: 410 µs per loop

In [114]: ncols = 100000
     ...: df = pd.DataFrame(np.random.randint(0,9,(100,ncols)))
     ...: ind = np.random.randint(0,100,(ncols)).tolist()
     ...: 

# @MaxU's solution
In [115]: %timeit pd.Series(df.lookup(ind, df.columns), index=df.columns)
100 loops, best of 3: 8.83 ms per loop

# Proposed in this post
In [116]: %timeit pd.Series(df.values[ind, np.arange(len(ind))], df.columns)
100 loops, best of 3: 5.76 ms per loop
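To repeat this kind of comparison outside IPython, something like the following could be used. It is a rough sketch with the standard timeit module, and it compares the NumPy indexing against a plain Python loop rather than against lookup; by_numpy and by_loop are hypothetical helper names, and the sizes are kept small so it runs quickly.

```python
import timeit

import numpy as np
import pandas as pd

ncols = 1000
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 9, size=(100, ncols)))
ind = rng.integers(0, 100, size=ncols).tolist()

def by_numpy():
    # One advanced-indexing call: row ind[c] from column c.
    return pd.Series(df.to_numpy()[ind, np.arange(len(ind))], index=df.columns)

def by_loop():
    # Same selection, one scalar access per column.
    return pd.Series([df.iat[r, c] for c, r in enumerate(ind)], index=df.columns)

# Check that both variants agree before timing them.
assert by_numpy().tolist() == by_loop().tolist()

for fn in (by_numpy, by_loop):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1e3:.3f} ms per call")
```

Absolute numbers will depend on the machine and on the pandas and NumPy versions, so only the relative difference is meaningful.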

Answer 3

If you prefer to use .loc, there is another way, using a MultiIndex:

df1=df.reset_index().melt('index').set_index(['index','variable'])
df1.loc[list(zip(df.index,df.columns))]
Out[118]: 
                value
index variable       
0     A             0
1     B             4
2     C             8
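A shorter route to the same MultiIndex idea might be DataFrame.stack(), which yields a Series keyed by (row label, column label). This is a sketch of an alternative, not the code from this answer; stacked and picked are names chosen for the example.

```python
import numpy as np
import pandas as pd

ind = [0, 1, 2]
cols = ['A', 'B', 'C']
df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

# stack() moves the column labels into an inner index level.
stacked = df.stack()

# Select one (row, column) pair per column, then drop the row level
# so the result is indexed by column name only.
picked = stacked.loc[list(zip(ind, cols))].droplevel(0)
print(picked)
```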

Answer 4

You can zip the index values and columns you want to retrieve values for, then build a Series from the result:

pd.Series([df.loc[id_, col] for id_, col in zip(ind, cols)], df.columns)
A    0
B    4
C    8

Or, if you always need just the diagonal values:

pd.Series(np.diag(df), df.columns)

which will be much faster.
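Note that the diagonal shortcut only matches the zip version when ind happens to be 0, 1, 2, ... in column order. A small sketch to illustrate this, where pick is a hypothetical helper wrapping the list comprehension above:

```python
import numpy as np
import pandas as pd

cols = ['A', 'B', 'C']
df = pd.DataFrame(np.arange(9).reshape((3, 3)), columns=cols)

def pick(ind):
    # Generic version: one (row label, column label) pair per column.
    return pd.Series([df.loc[r, c] for r, c in zip(ind, cols)], index=cols)

diagonal = pd.Series(np.diag(df), index=cols)

# Same values when ind walks the diagonal...
same = pick([0, 1, 2]).tolist() == diagonal.tolist()
# ...but not for any other choice of rows.
different = pick([0, 2, 1]).tolist() != diagonal.tolist()
print(same, different)
```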

Answer 5

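There should be a more direct way, but this is what I could come up with:

val = [df.iloc[i, i] for i in range(len(df.columns))]  # positional, so it does not depend on the index labels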
pd.Series(val, index = df.columns)

A    0
B    4
C    8
dtype: int64

Selecting single values from a pandas DataFrame using lists