DataFrame difference between rows based on multiple columns

Time: 2018-04-11 04:44:35

Tags: python-3.x pandas dataframe

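I am trying to compute differences between rows based on multiple columns. The dataset is very large, so I posted dummy data that illustrates the problem (an image in the original post; the code below uses the columns pet, name, date and weight).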

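I want to compute the daily weight difference at the pet + name level. So far the only solution I have come up with is to concatenate those columns and build a MultiIndex from the new column and the date column, but I think there should be a better way. In the real dataset, more than 3 columns identify the groups within which I compute the row differences.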

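# build a combined key from the pet and name columns
df['pet_name'] = df.pet + df.name

df.set_index(['pet_name', 'date'], inplace=True)
df.sort_index(inplace=True)

df['diffs'] = np.nan

# diff the weights within each pet_name group
for idx in df.index.levels[0]:
    df.diffs[idx] = df.weight[idx].diff()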

2 answers:

Answer 0 (score: 2)

Based on your description, you could try groupby.
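
A minimal sketch of what a groupby-based approach could look like, assuming the column names from the question (pet, name, date, weight). The sample values are made up for illustration:

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    'pet':    ['cat', 'cat', 'cat', 'dog', 'dog'],
    'name':   ['tom', 'tom', 'tom', 'rex', 'rex'],
    'date':   pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03',
                              '2018-01-01', '2018-01-02']),
    'weight': [4.0, 4.5, 4.25, 10.0, 9.5],
})

# Sort so the rows of each pet/name pair are in date order.
df = df.sort_values(['pet', 'name', 'date'])

# diff() restarts in every (pet, name) group,
# so the first row of each group is NaN.
df['diffs'] = df.groupby(['pet', 'name'])['weight'].diff()
print(df)
```

No helper column or MultiIndex is needed; the group keys are simply passed as a list.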


Answer 1 (score: 2)

Use groupby on 2 columns:

df.groupby(['pet', 'name'])['weight'].diff()
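
Since groupby accepts a list of columns, the same call should also cover the case from the question where more than three columns identify a group. The sketch below wraps it in a hypothetical helper, `add_group_diff` (not part of pandas), and uses made-up data with an extra `owner` column that is purely an assumption. It also shows a case where concatenating key columns merges two different groups, which passing the columns as a list avoids:

```python
import pandas as pd

def add_group_diff(df, keys, order_col, value_col, out_col='diffs'):
    """Hypothetical helper: row-to-row difference of value_col within each key group."""
    out = df.sort_values(list(keys) + [order_col]).copy()
    out[out_col] = out.groupby(list(keys))[value_col].diff()
    return out

# Made-up data: 'ab' + 'c' and 'a' + 'bc' both concatenate to 'abc'.
df = pd.DataFrame({
    'pet':    ['ab', 'ab', 'a', 'a'],
    'name':   ['c', 'c', 'bc', 'bc'],
    'owner':  ['x', 'x', 'x', 'x'],
    'date':   pd.to_datetime(['2018-01-01', '2018-01-02',
                              '2018-01-01', '2018-01-02']),
    'weight': [1.0, 2.0, 10.0, 12.0],
})

# Three key columns: the two pets stay in separate groups.
res = add_group_diff(df, ['pet', 'name', 'owner'], 'date', 'weight')

# Concatenated key for comparison: both pets collapse into one group 'abc'.
df['pet_name'] = df['pet'] + df['name']
merged = add_group_diff(df, ['pet_name'], 'date', 'weight')

print(res[['pet', 'name', 'date', 'weight', 'diffs']])
print(merged[['pet_name', 'date', 'weight', 'diffs']])
```

With the list of keys, each pet/name/owner combination gets its own NaN first row. With the concatenated key there is only one group, so weights of different pets are subtracted from each other.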

All together:

#convert dates to datetimes
df['date'] = pd.to_datetime(df['date'])
#sorting
df = df.sort_values(['pet', 'name','date'])
#get differences per group
df['diffs'] = df.groupby(['pet', 'name'])['weight'].diff()

Sample:

import numpy as np
import pandas as pd

np.random.seed(123)

N = 100
L = list('abc')
df = pd.DataFrame({'pet': np.random.choice(L, N),
                   'name': np.random.choice(L, N),
                   'date': pd.Series(pd.date_range('2015-01-01', periods=int(N/10)))
                              .sample(N, replace=True),
                   'weight':np.random.rand(N)})


df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['pet', 'name','date'])
#date is also a grouping key here, so differences are taken only within the same day
df['diffs'] = df.groupby(['pet', 'name', 'date'])['weight'].diff()

df['pet_name'] = df.pet + df.name
df = df.sort_values(['pet_name','date'])
df['diffs1'] = df.groupby(['pet_name', 'date'])['weight'].diff()
print (df.head(20))
        date name pet    weight     diffs pet_name    diffs1
1 2015-01-02    a   a  0.105446       NaN       aa       NaN
2 2015-01-03    a   a  0.845533       NaN       aa       NaN
2 2015-01-03    a   a  0.980582  0.135049       aa  0.135049
2 2015-01-03    a   a  0.443368 -0.537214       aa -0.537214
3 2015-01-04    a   a  0.375186       NaN       aa       NaN
6 2015-01-07    a   a  0.715601       NaN       aa       NaN
7 2015-01-08    a   a  0.047340       NaN       aa       NaN
9 2015-01-10    a   a  0.236600       NaN       aa       NaN
0 2015-01-01    b   a  0.777162       NaN       ab       NaN
2 2015-01-03    b   a  0.871683       NaN       ab       NaN
3 2015-01-04    b   a  0.988329       NaN       ab       NaN
4 2015-01-05    b   a  0.918397       NaN       ab       NaN
4 2015-01-05    b   a  0.016119 -0.902279       ab -0.902279
5 2015-01-06    b   a  0.095530       NaN       ab       NaN
5 2015-01-06    b   a  0.894978  0.799449       ab  0.799449
5 2015-01-06    b   a  0.365719 -0.529259       ab -0.529259
5 2015-01-06    b   a  0.887593  0.521874       ab  0.521874
7 2015-01-08    b   a  0.792299       NaN       ab       NaN
7 2015-01-08    b   a  0.313669 -0.478630       ab -0.478630
7 2015-01-08    b   a  0.281235 -0.032434       ab -0.032434
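
In the output above, `diffs` is NaN on the first row of every date because `date` is one of the groupby keys, so each difference is taken only among rows that share the same pet, name and date. If the goal is the change from one day to the next within a pet/name pair, as in the question, `date` would presumably be left out of the keys and used only for sorting. A small sketch with made-up values comparing the two choices (the column names `same_day` and `consecutive` are introduced here for illustration):

```python
import pandas as pd

# Made-up data: one pet/name pair, two measurements on the second day.
# The rows are already in date order, so no sorting is needed here.
df = pd.DataFrame({
    'pet':    ['a', 'a', 'a', 'a'],
    'name':   ['b', 'b', 'b', 'b'],
    'date':   pd.to_datetime(['2015-01-01', '2015-01-02',
                              '2015-01-02', '2015-01-03']),
    'weight': [1.0, 1.5, 2.5, 2.0],
})

# date in the keys: differences only among rows of the same day
df['same_day'] = df.groupby(['pet', 'name', 'date'])['weight'].diff()

# date left out of the keys: differences between consecutive rows
# of each pet/name pair
df['consecutive'] = df.groupby(['pet', 'name'])['weight'].diff()
print(df)
```

Here `same_day` has a value only for the second measurement on 2015-01-02, while `consecutive` has a value for every row except the first one of the pair.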