如果我有一个数据框,例如
id quantity date
1 2.0 6-12-18
1 3.0 6-20-18
1 3.0 6-22-18
1 1.0 5-12-18
2 5.0 6-10-18
2 1.0 6-15-18
2 1.0 6-11-18
3 4.0 7-10-18
3 4.0 7-15-18
3 4.0 7-16-18
我想找到与特定ID相关联的“数量”列值的偏差。
我当时想我可以聚合与特定ID相关联的数量值,并按日期对数量值进行排序,并从为每个ID创建的整数列表中消除重复项。我的想法是使用 df.groupby 以及 pd.series.unique
目标是看起来像这样:
id quantity date
1 1.0, 3.0, 3.0, 2.0 5-12-18, 6-12-18, 6-20-18, 6-22-18
2 5.0, 1.0, 1.0 6-10-18, 6-11-18, 6-15-18
3 4.0, 4.0, 4.0 7-10-18, 7-15-18, 7-16-18
然后我想在数据框中创建一个新列,其中将说明数量的值是增加,减少还是保持不变,因此看起来像这样:
id quantity trend
1 1.0, 3.0, 3.0, 2.0 inc, same, dec
2 5.0, 1.0, 1.0 dec, same
3 4.0, 4.0, 4.0 same
谢谢:)
答案 0 :(得分:1)
输入(df
)
id quality date
0 1 2.0 2018-06-12
1 1 3.0 2018-06-20
2 1 3.0 2018-06-22
3 1 1.0 2018-05-12
4 2 5.0 2018-06-10
5 2 1.0 2018-06-15
6 2 1.0 2018-06-11
7 3 4.0 2018-07-10
8 3 4.0 2018-07-15
9 3 4.0 2018-07-16
代码
# date column (lists)
df0 = df.groupby('id')['date'].apply(list).reset_index(drop=False)
# quality column (lists)
df1 = df.groupby('id')['quality'].apply(list).reset_index(drop=False)
# trend column (lists)
df['delta'] = df.quality.diff(1)
df.loc[df.delta > 0, 'trend'] = 'inc'
df.loc[df.delta == 0, 'trend'] = 'same'
df.loc[df.delta < 0, 'trend'] = 'dec'
df2 = df.groupby('id')['trend'].apply(list).apply(lambda x: x[1:]).reset_index(drop=False)
# merge all
df3 = pd.merge(df1, df0, on='id', how='left')
df3 = pd.merge(df3, df2, on='id', how='left')
# remove brackets
df3['quality'] = df3.quality.apply(lambda x: ", ".join(repr(e) for e in x))
df3['date'] = df3.date.apply(lambda x: ", ".join(x))
df3['trend'] = df3.trend.apply(lambda x: ", ".join(x))
输出(df3
)
id quality date trend
0 1 2.0, 3.0, 3.0, 1.0 6-12-18, 6-20-18, ... inc, same, dec
1 2 5.0, 1.0, 1.0 6-10-18, 6-15-18, ... dec, same
2 3 4.0, 4.0, 4.0 7-10-18, 7-15-18, ... same, same