Question

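I am trying to return a specific item from a Pandas DataFrame via conditional selection (and do not want to have to reference the index to do so).

Here is an example:

I have the following dataframe:

   Code  Colour      Fruit
0     1     red      apple
1     2  orange     orange
2     3  yellow     banana
3     4   green       pear
4     5    blue  blueberry

I enter the following code to search for the code for blueberries:

df[df['Fruit'] == 'blueberry']['Code']

This returns:

4    5
Name: Code, dtype: int64

which is of type:

pandas.core.series.Series

But what I actually want returned is the number 5, of type:

numpy.int64

which I can get if I enter the following code:

df[df['Fruit'] == 'blueberry']['Code'][4]

i.e. referencing the index gives the number 5, but I don't want to have to reference the index!

Is there another syntax I can use here to achieve the same thing?

Thanks!

Update

Another idea is this code:

[snippet not preserved in the source]

However, this does not seem particularly elegant (and it references the index). Is there a more concise and precise method that does not need to reference the index, or is this strictly necessary?

Thanks!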

Answer 1

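Let's try this:

df.loc[df['Fruit'] == 'blueberry','Code'].values[0]

Output:

5

First, use .loc to access the values in your dataframe, with boolean indexing for row selection and the column label for column selection. Then convert the returned Series to an array of values. Since there is only one value in that array, you can use index '[0]' to get the scalar value out of that single-element array.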
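For anyone who wants to run this, here is a minimal sketch of the same steps. It rebuilds the DataFrame from the question; the names `mask`, `codes` and `code` are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame shown in the question.
df = pd.DataFrame({
    "Code": [1, 2, 3, 4, 5],
    "Colour": ["red", "orange", "yellow", "green", "blue"],
    "Fruit": ["apple", "orange", "banana", "pear", "blueberry"],
})

# Step 1: boolean mask that is True only for the blueberry row.
mask = df["Fruit"] == "blueberry"

# Step 2: .loc with the mask (rows) and a column label returns a Series.
codes = df.loc[mask, "Code"]

# Step 3: .values is a NumPy array; [0] is a position in that array,
# not the DataFrame's index label (which is 4 for this row).
code = codes.values[0]

print(code)                          # 5
print(isinstance(code, np.integer))  # True
```

Note that the `[0]` here is positional, so it keeps working even if the DataFrame's index labels change.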

Answer 2

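Referencing the index is a requirement (unless you use next() ^), since a pd.Series is not guaranteed to hold only one value.

You can use pd.Series.values to extract the values as an array. This also works if you have multiple matches: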

res = df.loc[df['Fruit'] == 'blueberry', 'Code'].values

# array([5], dtype=int64)

df2 = pd.concat([df]*5)
res = df2.loc[df2['Fruit'] == 'blueberry', 'Code'].values

# array([5, 5, 5, 5, 5], dtype=int64)

To get a list from the numpy array, you can use .tolist():

res = df.loc[df['Fruit'] == 'blueberry', 'Code'].values.tolist()

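Both the array and the list version can be indexed intuitively, e.g. res[0] for the first item.

^ If you are really opposed to using an index, you can use next() to iterate: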

next(iter(res))
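One caveat worth spelling out: res[0] raises IndexError and next(iter(res)) raises StopIteration when nothing matches. Below is a small sketch, assuming you would rather get a fallback value in that case. The helper name `first_code` and the `None` default are hypothetical and not part of the answer above.

```python
import pandas as pd

# Rebuild the DataFrame shown in the question.
df = pd.DataFrame({
    "Code": [1, 2, 3, 4, 5],
    "Colour": ["red", "orange", "yellow", "green", "blue"],
    "Fruit": ["apple", "orange", "banana", "pear", "blueberry"],
})


def first_code(frame, fruit, default=None):
    """Return the first matching Code, or `default` if no row matches."""
    res = frame.loc[frame["Fruit"] == fruit, "Code"].values
    # next() takes an optional second argument that is returned when the
    # iterator is exhausted, so an empty result does not raise.
    return next(iter(res), default)


print(first_code(df, "blueberry"))  # 5
print(first_code(df, "mango"))      # None
```

If there are several matches, this quietly returns the first one, so check len(res) first if duplicates would be a problem.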

Answer 3

You can also set the 'Fruit' column as an index

df_fruit_index = df.set_index('Fruit')

and extract the value from the 'Code' column based on the fruit you choose

df_fruit_index.loc['blueberry','Code']
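A runnable sketch of this approach, with one caveat that I believe is worth knowing: the lookup only returns a scalar when the index label is unique. The second DataFrame (`dup`) is a made-up example to show what happens with duplicated labels.

```python
import pandas as pd

# Rebuild the DataFrame shown in the question.
df = pd.DataFrame({
    "Code": [1, 2, 3, 4, 5],
    "Colour": ["red", "orange", "yellow", "green", "blue"],
    "Fruit": ["apple", "orange", "banana", "pear", "blueberry"],
})

df_fruit_index = df.set_index("Fruit")

# With a unique index, a (row label, column label) lookup returns a scalar.
print(df_fruit_index.index.is_unique)  # True
code = df_fruit_index.loc["blueberry", "Code"]
print(code)  # 5

# If the label occurs more than once, the same lookup returns a Series.
dup = pd.concat([df, df]).set_index("Fruit")
dup_result = dup.loc["blueberry", "Code"]
print(type(dup_result).__name__)  # Series
```

So this reads nicely when each fruit appears once, but it does not remove the underlying issue raised in the other answers: a selection is not guaranteed to contain exactly one value.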

How do I extract a value from a Pandas DataFrame rather than a Series (without referencing the index)?

3 answers: