Best way to retrieve many aggregate counts from a dataframe?

Date: 2019-03-17 22:33:44

Tags: python pandas report

I have a dataframe that I need to retrieve many metrics from. Dataframe columns are the following:

Consumer_ID|Client|Campaign|Date

I am trying to get the unique count of the Consumer_ID column for various combinations of the Client, Campaign, and Date columns. So far I have come up with two solutions:

  1. Groupby statements with count as the agg function for every combination of client, campaign, and date.
  2. Writing for loops and filtering on every combination of the client, campaign and date columns and then using the nunique() function to get the final count.
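For a single combination of key columns, the two approaches above can be sketched side by side as follows. This is a minimal illustration that assumes small, made-up sample data (the values in df are hypothetical) and uses Client plus Campaign as the example key. It shows that the filter-and-nunique loop and a single groupby produce the same counts, with the groupby doing the work in one call.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "Consumer_ID": [1, 2, 2, 3, 3, 4],
    "Client": ["A", "A", "A", "B", "B", "B"],
    "Campaign": ["x", "x", "x", "x", "y", "y"],
    "Date": ["2019-03-01", "2019-03-01", "2019-03-01",
             "2019-03-01", "2019-03-02", "2019-03-02"],
})

keys = ["Client", "Campaign"]  # one example combination of key columns

# Approach 1: a single groupby with nunique() as the aggregation
by_groupby = df.groupby(keys)["Consumer_ID"].nunique()

# Approach 2: loop over the key combinations present in the data,
# filter the rows for each one, then call nunique() on Consumer_ID
by_loop = {}
for client, campaign in df[keys].drop_duplicates().itertuples(index=False):
    mask = (df["Client"] == client) & (df["Campaign"] == campaign)
    by_loop[(client, campaign)] = df.loc[mask, "Consumer_ID"].nunique()

print(by_groupby)
print(by_loop)
```

Either way, one block of code is still needed per combination of key columns, which is the repetition the question is asking how to avoid.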

My question: is there a cleaner, more Pythonic way of getting the unique count of one column for all available combinations of the other columns?

Example (annoying) solution using groupbys. Yes, understood, but is there a more Pythonic way to get every combination of the groupby columns? For example, right now, to get all combinations I'd have to write:

df.groupby(['Client']).Consumer_ID.nunique()
df.groupby(['Client', 'Campaign']).Consumer_ID.nunique()
df.groupby(['Client', 'Campaign', 'Date']).Consumer_ID.nunique()
df.groupby(['Client', 'Date']).Consumer_ID.nunique()

4 answers:

Answer 0 (score: 1)

If I understand correctly:

df.groupby(df.columns.drop('Consumer_ID').tolist(), as_index=False).nunique()
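A self-contained sketch of this answer, run on hypothetical sample data, might look like the following. One small deviation from the one-liner: it selects Consumer_ID explicitly before nunique() so the output holds only the distinct-consumer count per group, without relying on how a given pandas version treats the grouping columns in DataFrameGroupBy.nunique(). Note that it still covers just one combination, namely all non-ID columns grouped together.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "Consumer_ID": [1, 2, 2, 3, 3, 4],
    "Client": ["A", "A", "A", "B", "B", "B"],
    "Campaign": ["x", "x", "x", "x", "y", "y"],
    "Date": ["2019-03-01", "2019-03-01", "2019-03-01",
             "2019-03-01", "2019-03-02", "2019-03-02"],
})

# Every column except the one being counted becomes a grouping key
group_cols = df.columns.drop("Consumer_ID").tolist()

# Count distinct consumers per group and turn the keys back into columns
counts = df.groupby(group_cols)["Consumer_ID"].nunique().reset_index()
print(counts)
```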

Answer 1 (score: 0)

I believe what you're looking for is:

df.groupby(['Client', 'Campaign', 'Date']).Consumer_ID.nunique()

Answer 2 (score: 0)

You can use a pivot table, as follows:

import pandas as pd
pd.pivot_table(df, index=['Client', 'Campaign', 'Date'], values='Consumer_ID', aggfunc=pd.Series.nunique)
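The pivot-table answer can be tried end to end with the sketch below, which assumes hypothetical sample data. It passes the string "nunique" as aggfunc, which pivot_table forwards to the groupby aggregation and which should give the same result as the answer's pd.Series.nunique. Like a single groupby, it produces counts for one combination of key columns per call.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "Consumer_ID": [1, 2, 2, 3, 3, 4],
    "Client": ["A", "A", "A", "B", "B", "B"],
    "Campaign": ["x", "x", "x", "x", "y", "y"],
    "Date": ["2019-03-01", "2019-03-01", "2019-03-01",
             "2019-03-01", "2019-03-02", "2019-03-02"],
})

# Key columns form the row index; Consumer_ID is aggregated by distinct count
table = pd.pivot_table(
    df,
    index=["Client", "Campaign", "Date"],
    values="Consumer_ID",
    aggfunc="nunique",
)
print(table)
```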

Answer 3 (score: 0)

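Answered my own question. I used itertools combinations to create every possible combination of columns, then used those to do all the groupby aggregations. Example code below: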

import itertools
cols = df.columns.drop('Consumer_ID')  # only the grouping candidates, not the column being counted
combinations = [j for i in range(len(cols)) for j in itertools.combinations(cols, i + 1)]

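Then I can use the different column combinations in the 'combinations' list to do all the groupby aggregations, instead of writing the groupby statement many times.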
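Put together, the self-answer's approach might look like the sketch below. It runs on hypothetical sample data and leaves Consumer_ID out of the grouping candidates, since grouping by the column being counted always yields 1. The names combos and unique_counts, and storing each result in a dict keyed by the tuple of column names, are illustrative choices rather than part of the original answer.

```python
import itertools

import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    "Consumer_ID": [1, 2, 2, 3, 3, 4],
    "Client": ["A", "A", "A", "B", "B", "B"],
    "Campaign": ["x", "x", "x", "x", "y", "y"],
    "Date": ["2019-03-01", "2019-03-01", "2019-03-01",
             "2019-03-01", "2019-03-02", "2019-03-02"],
})

# Grouping candidates: every column except the one being counted
cols = df.columns.drop("Consumer_ID")

# Every non-empty subset of the key columns, from single columns up to all of them
combos = [c for r in range(1, len(cols) + 1) for c in itertools.combinations(cols, r)]

# One groupby + nunique() per subset, stored under the tuple of column names
unique_counts = {
    combo: df.groupby(list(combo))["Consumer_ID"].nunique()
    for combo in combos
}

for combo, counts in unique_counts.items():
    print(combo)
    print(counts, end="\n\n")
```

With three key columns this yields 2**3 - 1 = 7 groupbys, so the number of aggregations grows quickly as more key columns are added.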

Thanks!