Question

In the code below, I iterate over the groups of a groupby object and print the first item of column b in each group.

import pandas as pd

d = {
    'a': [1, 2, 3, 4, 5, 6],
    'b': [10, 20, 30, 10, 20, 30],
}

df = pd.DataFrame(d)
groups = df.groupby('b')

for name, group in groups:
    first_item_in_b = group['b'].tolist()[0]
    print(first_item_in_b)

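Because groupby uses hierarchical indexing, I have to convert b to a list first in order to select its first element.

How can I avoid this overhead?

I can't simply drop tolist() like this: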

first_item_in_b = group['b'][0]

because it raises a KeyError.

Answer 1

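You can use Index.get_loc to get the position of column b, which lets you select purely by position with DataFrame.iat or DataFrame.iloc. Alternatively, use DataFrame.at with the first value of the index and the column name.

It is also possible to select column b by label first and then pick the first value by position with Series.iat or Series.iloc: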

for name, group in groups:
    # first value by position, using the integer location of column 'b'
    first_item_in_b = group.iat[0, group.columns.get_loc('b')]
    # first value by label, using the group's first index label
    first_item_in_b = group.at[group.index[0], 'b']

    # select the column, then take the first value by position (fast scalar access)
    first_item_in_b = group['b'].iat[0]
    # alternative
    first_item_in_b = group['b'].iloc[0]
    print(first_item_in_b)

10
20
30
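
To see why the plain group['b'][0] from the question fails, it may help to look at the index of each group. The sketch below is self-contained and reuses the data from the question. The names index_labels, first_values and label_lookup_failed are only illustrative. It suggests that every group keeps the row labels of the original DataFrame, so the label 0 exists only in the first group, while positional access works for all of them.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [10, 20, 30, 10, 20, 30],
})

index_labels = {}  # group key -> original row labels kept by that group
first_values = {}  # group key -> first value of b, selected by position

for name, group in df.groupby('b'):
    index_labels[int(name)] = group.index.tolist()
    first_values[int(name)] = int(group['b'].iat[0])

print(index_labels)  # {10: [0, 3], 20: [1, 4], 30: [2, 5]}
print(first_values)  # {10: 10, 20: 20, 30: 30}

# The group for b == 20 holds the labels 1 and 4, so a lookup
# by the label 0 has nothing to find and raises KeyError.
group_20 = df.groupby('b').get_group(20)
try:
    group_20['b'][0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(label_lookup_failed)  # True
```

So tolist() only worked because it threw the index away. iat and iloc ignore the labels in the same way without building a list.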

Answer 2

Use iloc:

import pandas as pd

d = {
    'a': [1, 2, 3, 4, 5, 6],
    'b': [10, 20, 30, 10, 20, 30],
}

df = pd.DataFrame(d)
groups = df.groupby('b')

for name, group in groups:
    first_item_in_b = group['b'].iloc[0]
    print(first_item_in_b)

Output:

10
20
30

Edit:

Or use iat, the fast integer location scalar accessor.
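
As a rough comparison of the two accessors, here is a minimal self-contained sketch using the same data. The list names firsts_iat and firsts_iloc and the variable first_row are made up for illustration. Both accessors return the same scalar for a single integer position. The practical difference is that iat accepts only a single position, whereas iloc also accepts slices and lists and then returns a Series.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [10, 20, 30, 10, 20, 30],
})

firsts_iat = []
firsts_iloc = []

for name, group in df.groupby('b'):
    # Both calls select by position, so the group's index labels do not matter.
    firsts_iat.append(int(group['b'].iat[0]))
    firsts_iloc.append(int(group['b'].iloc[0]))

print(firsts_iat)   # [10, 20, 30]
print(firsts_iloc)  # [10, 20, 30]

# iloc with a slice returns a one-element Series rather than a scalar;
# the original row label (1 for the b == 20 group) is preserved.
first_row = df.groupby('b').get_group(20)['b'].iloc[:1]
print(first_row.index.tolist())  # [1]
```

If only one scalar per group is needed, iat is the narrower tool for the job. iloc is the more general one.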

Select the first element of a groupby object's group by index, without converting to a list
