Question

I have a pandas DataFrame and I am using the groupby() function to group things the way I want, except that pandas skips duplicate values and only shows the unique ones.

Here is a sample DataFrame:

import pandas as pd

data = [
    ['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 38, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 4, 'Mathematics & Statistics']]

df = pd.DataFrame(data, columns=['Provider', 'Type', 'Downloads JR1 2017', 'Field'])

Now I can use the groupby function to group the rows and convert the result to a list:

jr1_provider = df.groupby(['Provider', 'Field', 'Downloads JR1 2017'], as_index=False).sum().values.tolist()

Here is the output:

[['American Mathematical Society', 'Mathematics & Statistics', 2, 'JournalJournal'], ['American Mathematical Society', 'Mathematics & Statistics', 4, 'Journal'], ['American Mathematical Society', 'Mathematics & Statistics', 38, 'Journal']]

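However, the output should contain 4 items. Instead, I only get 3. I can see that the duplicate value has been dropped from the result, because two of the rows have the value 2 in the 'Downloads JR1 2017' column.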

Why does this happen? How can I get all the results?

The output I am after is the name of the Provider together with the total of 'Downloads JR1 2017'. Example:

['American Mathematical Society', 46]

Answer 1

You could take a look at transform, which returns a result for every original row instead of one row per group:

jr1_provider = provider_subset.groupby(['Provider', 'Field', 'Downloads JR1 2017'], as_index=False).transform('sum').values.tolist()
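
As a rough illustration of what transform does, the sketch below rebuilds a trimmed version of the sample data from the question instead of using `provider_subset`, and groups by 'Provider' only. Both choices, along with the column name 'Provider total', are assumptions made here for illustration and are not part of the answer above. Because transform broadcasts each group's result back onto its rows, all four rows survive and each one carries the group total.

```python
import pandas as pd

# Trimmed version of the question's sample data (only the columns needed here).
df = pd.DataFrame(
    {
        'Provider': ['American Mathematical Society'] * 4,
        'Downloads JR1 2017': [2, 2, 38, 4],
    }
)

# transform('sum') returns one value per input row: the sum of that row's group.
# 'Provider total' is a hypothetical column name chosen for this sketch.
df['Provider total'] = df.groupby('Provider')['Downloads JR1 2017'].transform('sum')

rows = df.values.tolist()
print(len(rows))  # 4, no rows are collapsed
print(rows)
```

This keeps the row count intact, but if a single total per provider is what you need, an aggregation such as sum() is likely the better fit.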

Answer 2

Based on the additional details in your comments:

df.groupby(['Provider', 'Field'], as_index=False).sum()
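
To see why a row appeared to vanish and how to arrive at the requested total, here is a minimal sketch built on the sample data from the question. Putting 'Downloads JR1 2017' in the group keys makes every distinct value its own group, so the two rows with the value 2 land in a single group. Grouping by 'Provider' alone and summing only the downloads column gives one total per provider. The names `group_sizes` and `totals` are illustrative and do not come from the original post.

```python
import pandas as pd

data = [
    ['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 2, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 38, 'Mathematics & Statistics'],
    ['American Mathematical Society', 'Journal', 4, 'Mathematics & Statistics'],
]
df = pd.DataFrame(data, columns=['Provider', 'Type', 'Downloads JR1 2017', 'Field'])

# With the downloads column as a group key, rows sharing a value share a group:
# the group for value 2 holds two rows, the groups for 4 and 38 hold one each.
group_sizes = df.groupby(['Provider', 'Field', 'Downloads JR1 2017']).size().tolist()
print(group_sizes)  # [2, 1, 1]

# Group by provider only and sum just the numeric downloads column.
totals = (
    df.groupby('Provider', as_index=False)['Downloads JR1 2017']
    .sum()
    .values.tolist()
)
print(totals)  # [['American Mathematical Society', 46]]
```

Selecting the downloads column before calling sum() also avoids concatenating the string column 'Type', which is where the 'JournalJournal' value in the question's output came from.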

Python Pandas - groupby() skips duplicate values in a DataFrame
