Question

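I'm trying to extract data from a pandas DataFrame using indexing, and I've hit a problem I don't know how to solve. Two of my rows have exactly the same name, but each holds its own data. Below is my line of code. As you can see, I list a row named Basic twice (once for each of the two rows), and each of those rows has its own values: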

i_s = i_s.loc[['Revenue','Cost of Revenue', 'Gross profit', 'Operating expenses', 'Total operating expenses', 'Operating income', 'Net income', 'Earnings per share', 'Basic', 'Weighted average shares outstanding', 'Basic', 'EBITDA']]

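When I index with the code above, the rows labeled Basic show up four times instead of twice. For each 'Basic' I list in .loc, pandas selects every row named Basic and inserts both of them at that spot: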

Revenue                                       0.0  1.059400e+10  9.789000e+09   
Cost of Revenue                               NaN           NaN           NaN   
Gross profit                                  2.0  6.420000e+09  5.691000e+09   
Operating expenses                            3.0  4.989000e+09  4.924000e+09   
Total operating expenses                      3.0  4.989000e+09  4.924000e+09   
Operating income                              8.0  1.431000e+09  7.670000e+08   
Net income                                   14.0  7.370000e+08  2.890000e+08   
Earnings per share                           16.0           NaN           NaN   
Basic                                        17.0  1.400000e+00  6.200000e-01   
Basic                                        20.0  5.254150e+08  5.145740e+08   
Weighted average shares outstanding          19.0           NaN           NaN   
Basic                                        17.0  1.400000e+00  6.200000e-01   
Basic                                        20.0  5.254150e+08  5.145740e+08   
EBITDA                                       22.0  1.838000e+09  1.150000e+09
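A minimal sketch that reproduces this behaviour, using a small hypothetical DataFrame in place of the real income statement (the column name "value" and the numbers are illustrative only). When the index contains a duplicated label, `.loc` with a list returns every matching row for each occurrence of that label in the list:

```python
import pandas as pd

# Hypothetical stand-in for the income statement: "Basic" appears twice
df = pd.DataFrame(
    {"value": [1.40, 525415000.0, 737000000.0]},
    index=["Basic", "Basic", "Net income"],
)

# Each "Basic" in the list matches BOTH rows labelled "Basic",
# so the result has 5 rows instead of 3
result = df.loc[["Net income", "Basic", "Basic"]]
print(result)
```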

For example:

    A    B   C
   foo   0   10
   foo   1   11
   foo   1   12
   foo   1   13
   foo   1   14

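I want to use .loc to select rows by their label 'foo' (the values in column 'A'), but df.loc['foo'] returns all five of them. I want the output to show only some of them rather than all, like this: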

    A    B   C
   foo   1   12
   foo   1   14

Does anyone know how to solve this? How can I do it with .loc?

Selecting a row that has the same name as another row

Answer 1

You can use .iloc to select only the integer positions you need. See the pandas documentation on how to select data in pandas and on the iloc attribute for details.

For example:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['a', 'a', 'b'])

Select the first of the two columns labeled "a" (together with column "b") by position:

df.iloc[:, [0, 2]]

returns

   a  b
0  0  2
1  3  5
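The same positional selection works on rows, which is what the question asks about. A minimal sketch that rebuilds the question's small example, assuming column A is used as the index (as `df.loc['foo']` in the question implies), and picks the third and fifth 'foo' rows by position:

```python
import pandas as pd

# Rebuild the question's example with the duplicated label "foo" as the index
df = pd.DataFrame(
    {"A": ["foo"] * 5, "B": [0, 1, 1, 1, 1], "C": [10, 11, 12, 13, 14]}
).set_index("A")

# df.loc["foo"] returns all five rows, so pick rows by integer position instead
subset = df.iloc[[2, 4]]

# If other labels were present, narrow to "foo" first, then pick by position
subset_foo = df.loc["foo"].iloc[[2, 4]]
print(subset)
```

The second form is useful when the DataFrame mixes several labels: `.loc` narrows to the label, and `.iloc` then counts positions only within that label's rows.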

Answer 2

It's not the most elegant approach, but the following should work.

Suppose you want to extract the rows with these index labels:

rows = ['Revenue','Cost of Revenue', 'Gross profit', 'Operating expenses',
        'Total operating expenses', 'Operating income', 'Net income', 
        'Earnings per share', 'Basic', 'Weighted average shares outstanding', 
        'Basic', 'EBITDA']


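import numpy as np

# get the location of each distinct label: get_loc returns an int for a
# unique label, and a slice or boolean mask for a duplicated one
loc_dict = {e: df.index.get_loc(e) for e in set(rows)}

# normalise every entry (int, slice or mask) to a list of integer positions
positions = np.arange(len(df))
loc_dict = {k: np.atleast_1d(positions[v]).tolist() for k, v in loc_dict.items()}

# extract the rows with iloc, consuming one position per occurrence in rows
df.iloc[[loc_dict[e].pop(0) for e in rows]]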

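Note that although iloc is used here, the result keeps the rows in the order you listed them in rows, and each repeated label (such as 'Basic') takes the next matching row in the DataFrame.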
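If you would rather keep using .loc, as the question asks, one alternative (a different technique from the answer above, not part of it) is to number repeated labels with `groupby(...).cumcount()` and index by (label, occurrence) pairs in a MultiIndex. A sketch on a hypothetical miniature of the question's data (the column name "value" and the numbers are illustrative):

```python
import pandas as pd

# Hypothetical miniature of the question's data: "Basic" appears twice
df = pd.DataFrame(
    {"value": [737e6, 1.40, 525415000.0]},
    index=["Net income", "Basic", "Basic"],
)

# Number each occurrence of a label: 0 for the first, 1 for the second, ...
occurrence = df.groupby(level=0).cumcount()

# Make the index unique by pairing each label with its occurrence number
df_unique = df.set_index(occurrence.to_numpy(), append=True)

# .loc can now address each "Basic" row individually
result = df_unique.loc[[("Net income", 0), ("Basic", 0), ("Basic", 1)]]
print(result)
```

The trade-off is that callers must know which occurrence they want (0 for the first 'Basic', 1 for the second), but the selection stays label-based.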
