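I came across this question, which is very similar to what I am trying to do: python pandas groupby calculate change
The only problem is that my DataFrame is considerably more complex: it has several more value columns whose differences I also want to compute, plus some string columns that I need to keep but obviously cannot take a numeric difference of.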
Group  Date        Value  Leader   Quantity
A      01-02-2016  16.0   John     1
A      01-03-2016  15.0   John     1
B      01-02-2016  16.0   Phillip  1
B      01-03-2016  13.0   Phillip  1
C      01-02-2016  16.0   Bob      1
C      01-03-2016  16.0   Bob      1
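Is there a way to change the code so that the difference is applied only to the float columns, without having to use loc / iloc to specify which columns are floats? I would then get something like this: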
Date        Group  Change in Value  Leader   Change in Quantity
2016-01-02  A      NaN              John     NaN
2016-01-03  A      -0.062500        John     0
2016-01-02  B      NaN              Phillip  NaN
2016-01-03  B      -0.187500        Phillip  0
2016-01-02  C      NaN              Bob      NaN
2016-01-03  C      0.000000         Bob      0
Also, is it possible to use diff instead of pct_change? Ideally, I would get something like this:
Date        Group  Leader   Change in Value  Change in Quantity
2016-01-02  A      John     NaN              NaN
2016-01-03  A      John     -1.0             0.0
2016-01-02  B      Phillip  NaN              NaN
2016-01-03  B      Phillip  -3.0             0.0
2016-01-02  C      Bob      NaN              NaN
2016-01-03  C      Bob      0.0              0.0
More details about my actual dataset:
Thanks!
Answer 0 (score: 2)
Use select_dtypes and join:
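# Select the numeric columns by dtype instead of listing them by hand
df1 = df.select_dtypes('number')
# Drop them from df, then join back their per-group percentage change
df_final = df.drop(columns=df1.columns).join(
    df1.groupby(df['Group']).pct_change().add_prefix('Change_in_'))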
Out[10]:
Group Date Leader Change_in_Value Change_in_Quantity
0 A 01-02-2016 John NaN NaN
1 A 01-03-2016 John -0.0625 0.0
2 B 01-02-2016 Phillip NaN NaN
3 B 01-03-2016 Phillip -0.1875 0.0
4 C 01-02-2016 Bob NaN NaN
5 C 01-03-2016 Bob 0.0000 0.0
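For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same select_dtypes / groupby / join steps. The intermediate names `numeric` and `changes` are my own, and I am assuming the rows are already in date order within each group, as they are in the sample.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "C", "C"],
    "Date": ["01-02-2016", "01-03-2016"] * 3,
    "Value": [16.0, 15.0, 16.0, 13.0, 16.0, 16.0],
    "Leader": ["John", "John", "Phillip", "Phillip", "Bob", "Bob"],
    "Quantity": [1, 1, 1, 1, 1, 1],
})

# Numeric columns are picked by dtype, so none has to be named explicitly
numeric = df.select_dtypes("number")

# Group the numeric columns by the Group column of the original frame;
# the grouping Series is aligned with `numeric` through the shared index
changes = numeric.groupby(df["Group"]).pct_change().add_prefix("Change_in_")

# Keep the non-numeric columns and attach the changes by index
df_final = df.drop(columns=numeric.columns).join(changes)
print(df_final)
```

The string columns (Group, Date, Leader) pass through untouched because they are never handed to pct_change in the first place.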
Using diff: simply replace pct_change with diff.
# Same steps as with pct_change, but taking the per-group absolute difference
df1 = df.select_dtypes('number')
df_final = df.drop(columns=df1.columns).join(
    df1.groupby(df['Group']).diff().add_prefix('Change_in_'))
Out[15]:
Group Date Leader Change_in_Value Change_in_Quantity
0 A 01-02-2016 John NaN NaN
1 A 01-03-2016 John -1.0 0.0
2 B 01-02-2016 Phillip NaN NaN
3 B 01-03-2016 Phillip -3.0 0.0
4 C 01-02-2016 Bob NaN NaN
5 C 01-03-2016 Bob 0.0 0.0
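Since the two variants differ only in the method name, they could be wrapped in one small function. This is just a sketch: `add_group_changes` and its parameters are hypothetical names of mine, not part of pandas, and it again assumes the rows are already ordered within each group.

```python
import pandas as pd


def add_group_changes(df, group_col, method="diff", prefix="Change_in_"):
    """Hypothetical helper: per-group change of every numeric column."""
    if method not in ("diff", "pct_change"):
        raise ValueError("method must be 'diff' or 'pct_change'")
    # Leave the grouping column out, in case it is numeric itself
    numeric = df.drop(columns=[group_col]).select_dtypes("number")
    grouped = numeric.groupby(df[group_col])
    # Call grouped.diff() or grouped.pct_change() depending on `method`
    changes = getattr(grouped, method)().add_prefix(prefix)
    return df.drop(columns=numeric.columns).join(changes)


# Sample data copied from the question
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "C", "C"],
    "Date": ["01-02-2016", "01-03-2016"] * 3,
    "Value": [16.0, 15.0, 16.0, 13.0, 16.0, 16.0],
    "Leader": ["John", "John", "Phillip", "Phillip", "Bob", "Bob"],
    "Quantity": [1, 1, 1, 1, 1, 1],
})

by_diff = add_group_changes(df, "Group", method="diff")
by_pct = add_group_changes(df, "Group", method="pct_change")
print(by_diff)
```

If the real data is not sorted by date, sorting it (for example with sort_values on the group and date columns) before calling the helper would be needed, because both diff and pct_change compare each row with the previous row of its group.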
Answer 1 (score: 0)
You can do it like this:
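# Collect every column whose name starts with 'Value'
cols = [col for col in df.columns if str(col).startswith('Value')]
for col in cols:
    # Relative change between each row and the next one (shift(-1));
    # note that this does not restart at group boundaries
    df['Change ' + str(col)] = (df[col] - df[col].shift(-1)) / df[col]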
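Note that a loop like this compares each row with the next row across the whole frame, so it does not respect the groups, and its sign and denominator differ from pct_change. As a rough sketch of how the same loop could be made group-aware, the shift can be taken per group. The column names "Forward change ..." and "Backward change ..." are made up for illustration.

```python
import pandas as pd

# Reduced version of the sample data from the question
df = pd.DataFrame({
    "Group": ["A", "A", "B", "B", "C", "C"],
    "Value": [16.0, 15.0, 16.0, 13.0, 16.0, 16.0],
})

# Fix the list of value columns before adding any new columns
value_cols = [c for c in df.columns if str(c).startswith("Value")]

for col in value_cols:
    nxt = df.groupby("Group")[col].shift(-1)   # next row within the same group
    prev = df.groupby("Group")[col].shift(1)   # previous row within the same group
    # Forward-looking form used by the loop: (current - next) / current
    df["Forward change " + col] = (df[col] - nxt) / df[col]
    # Backward-looking form, which is what pct_change computes
    df["Backward change " + col] = (df[col] - prev) / prev

print(df)
```

The backward-looking column matches the pct_change output shown in the first answer; the forward-looking one puts the NaN on the last row of each group instead of the first.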