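I have a dataframe and want to subtract two columns of the previous row, but only when the previous row has the same Names value. If it does not, I want the result to be NaN, filled with -. My groupby expression raises TypeError: 'Series' objects are mutable, thus they cannot be hashed, which is very vague. What am I missing?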
import pandas as pd
df = pd.DataFrame(data=[['Person A', 5, 8], ['Person A', 13, 11], ['Person B', 11, 32], ['Person B', 15, 20]], columns=['Names', 'Value', 'Value1'])
df['diff'] = df.groupby('Names').apply(df['Value'].shift(1) - df['Value1'].shift(1)).fillna('-')
print(df)
Expected output:
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
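Answer 0 (score: 4)
You can add lambda x and change df['Value'] to x['Value'] (and likewise for Value1), because apply expects a function that it calls once per group, not an already computed Series. Then call reset_index so the result lines up with the original index: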
# Parentheses let the method chain span several lines.
df['diff'] = (
    df.groupby('Names')
      .apply(lambda x: x['Value'].shift(1) - x['Value1'].shift(1))
      .fillna('-')
      .reset_index(drop=True)
)
print(df)
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
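Another solution, using DataFrameGroupBy.shift:
# Select several columns with a list; the tuple form ['Value','Value1'] fails on pandas 2.0 and later.
df1 = df.groupby('Names')[['Value', 'Value1']].shift()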
print(df1)
   Value  Value1
0    NaN     NaN
1    5.0     8.0
2    NaN     NaN
3   11.0    32.0
df['diff'] = (df1.Value - df1.Value1).fillna('-')
print(df)
      Names  Value  Value1 diff
0  Person A      5       8    -
1  Person A     13      11   -3
2  Person B     11      32    -
3  Person B     15      20  -21
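One thing to keep in mind: filling with '-' turns diff into an object column that mixes numbers and strings, so later arithmetic on it will not work directly. A hedged sketch of one way around that is to keep the numeric column with NaN and build the '-' placeholder only in a copy meant for display. The name shown is a hypothetical helper variable, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['Person A', 5, 8], ['Person A', 13, 11],
          ['Person B', 11, 32], ['Person B', 15, 20]],
    columns=['Names', 'Value', 'Value1'],
)

# Previous row's values within each Names group; the first row per group is NaN.
prev = df.groupby('Names')[['Value', 'Value1']].shift()

# Keep the real column numeric (float with NaN) so it stays usable in calculations.
df['diff'] = prev['Value'] - prev['Value1']

# Build the '-' placeholder only in a copy used for printing.
shown = df.copy()
shown['diff'] = shown['diff'].map(lambda v: '-' if pd.isna(v) else str(int(v)))
print(shown)
```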
Answer 1 (score: 1)
You can also do it like this:
import glob2
import os
bam_dirs = {os.path.dirname(p) for p in glob2.glob('/data2/**/*.bam')}
print(bam_dirs)