Aggregating values for a specific ID in a DataFrame in Python

Time: 2018-06-22 00:29:12

Tags: python pandas dataframe group-by pandas-groupby

If I have a DataFrame such as

id     quantity     date
 1     2.0          6-12-18
 1     3.0          6-20-18
 1     3.0          6-22-18
 1     1.0          5-12-18
 2     5.0          6-10-18
 2     1.0          6-15-18
 2     1.0          6-11-18
 3     4.0          7-10-18
 3     4.0          7-15-18
 3     4.0          7-16-18

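I want to find how the values in the "quantity" column vary for each specific ID.

My thinking was that I could aggregate the quantity values associated with each ID, sort them by date, and remove duplicates from the list of values created for each ID. My idea was to use df.groupby together with pd.Series.unique.

The goal is something that looks like this: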

id     quantity                 date
 1     1.0, 3.0, 3.0, 2.0       5-12-18, 6-12-18, 6-20-18, 6-22-18 
 2     5.0, 1.0, 1.0            6-10-18, 6-11-18, 6-15-18
 3     4.0, 4.0, 4.0            7-10-18, 7-15-18, 7-16-18       

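Then I want to create a new column in the DataFrame that says whether each quantity value increased, decreased, or stayed the same relative to the previous one, so it would look like this: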

id     quantity                 trend
 1     1.0, 3.0, 3.0, 2.0       inc, same, dec 
 2     5.0, 1.0, 1.0            dec, same
 3     4.0, 4.0, 4.0            same, same

Thanks :)

1 Answer:

Answer 0: (score: 1)

Input (df)

   id  quality       date
0   1      2.0 2018-06-12
1   1      3.0 2018-06-20
2   1      3.0 2018-06-22
3   1      1.0 2018-05-12
4   2      5.0 2018-06-10
5   2      1.0 2018-06-15
6   2      1.0 2018-06-11
7   3      4.0 2018-07-10
8   3      4.0 2018-07-15
9   3      4.0 2018-07-16

Code

import pandas as pd

# date column: one list of dates per id
df0 = df.groupby('id')['date'].apply(list).reset_index(drop=False)

# quality column: one list of values per id
df1 = df.groupby('id')['quality'].apply(list).reset_index(drop=False)

# trend column: label each row by the sign of its change from the previous row of the same id
df['delta'] = df.groupby('id')['quality'].diff()
df.loc[df.delta > 0, 'trend'] = 'inc'
df.loc[df.delta == 0, 'trend'] = 'same'
df.loc[df.delta < 0, 'trend'] = 'dec'
# the first row of each id has no previous value, so drop its (missing) label
df2 = df.groupby('id')['trend'].apply(list).apply(lambda x: x[1:]).reset_index(drop=False)

# merge all three on id
df3 = pd.merge(df1, df0, on='id', how='left')
df3 = pd.merge(df3, df2, on='id', how='left')

# join each list into a comma-separated string (removes the brackets)
df3['quality'] = df3.quality.apply(lambda x: ", ".join(str(e) for e in x))
# str() keeps this working whether the dates are strings or Timestamps
df3['date'] = df3.date.apply(lambda x: ", ".join(str(d) for d in x))
df3['trend'] = df3.trend.apply(lambda x: ", ".join(x))

Output (df3)

    id  quality             date                    trend
0   1   2.0, 3.0, 3.0, 1.0  6-12-18, 6-20-18, ...   inc, same, dec
1   2   5.0, 1.0, 1.0       6-10-18, 6-15-18, ...   dec, same
2   3   4.0, 4.0, 4.0       7-10-18, 7-15-18, ...   same, same
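
Note that the code above keeps rows in their original order, while the question asks for the values to be ordered by date within each id. A minimal sketch of that variant follows, assuming the question's column name quantity (the answer above calls it quality) and dates in month-day-year form. The helper label_trend is a hypothetical name introduced only for illustration, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed so sorting is chronological.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "quantity": [2.0, 3.0, 3.0, 1.0, 5.0, 1.0, 1.0, 4.0, 4.0, 4.0],
    "date": ["6-12-18", "6-20-18", "6-22-18", "5-12-18", "6-10-18",
             "6-15-18", "6-11-18", "7-10-18", "7-15-18", "7-16-18"],
})
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%y")


def label_trend(values):
    """Hypothetical helper: label each step between consecutive values."""
    labels = []
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            labels.append("inc")
        elif curr < prev:
            labels.append("dec")
        else:
            labels.append("same")
    return ", ".join(labels)


# Sort chronologically within each id, then collect one entry per id.
ordered = df.sort_values(["id", "date"])
grouped = ordered.groupby("id")
result = pd.DataFrame({
    "quantity": grouped["quantity"].apply(list),
    "date": grouped["date"].apply(lambda s: ", ".join(s.dt.strftime("%m-%d-%y"))),
})

# Compute the trend from the numeric lists before turning them into strings.
result["trend"] = result["quantity"].apply(label_trend)
result["quantity"] = result["quantity"].apply(lambda v: ", ".join(map(str, v)))
result = result.reset_index()
print(result)
```

With the rows sorted by date, id 1 becomes 1.0, 2.0, 3.0, 3.0 with trend inc, inc, same, which differs from the unsorted output shown above. Ids 2 and 3 keep the trends dec, same and same, same.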