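Pandas: compute the difference between the current row's value and the next row's value, depending on a condition in a different column

Time: 2017-12-06 09:39:58

Tags: python-3.x pandas dataframe

I have a DataFrame: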

import pandas as pd  # from_items was removed in pandas 1.0, so from_dict is used here
df = pd.DataFrame.from_dict({'A': [10, 'foo'], 'B': [440, 'foo'], 'C': [790, 'bar'], 'D': [800, 'bar'], 'E': [7000, 'foo']}, orient='index', columns=['position', 'foobar'])

which looks like this:

    position foobar
A   10       foo
B   440      foo
C   790      bar
D   800      bar
E   7000     foo

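For each position, I want the difference between it and the next position whose foobar value is the opposite one. Normally I would use the shift method on the position column: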

df[comparisonCol].shift(-1) - df[comparisonCol]

But when the foobar column decides which position to compare against, I don't know how to do it.

The result should look like this:
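
For illustration, here is a minimal sketch of what the plain shift produces on this frame. The frame is rebuilt with the ordinary `pd.DataFrame` constructor so the snippet is self-contained, and the name `naive` is only a label chosen for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"position": [10, 440, 790, 800, 7000],
     "foobar": ["foo", "foo", "bar", "bar", "foo"]},
    index=list("ABCDE"),
)

# Difference to the immediately following row, ignoring foobar entirely
naive = df["position"].shift(-1) - df["position"]
print(naive)
```

Rows A and C come out as 430 and 10, because the row directly below them has the same foobar value. The desired values for those rows are 780 and 6210.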

    position foobar difference
A   10       foo      780
B   440      foo      350
C   790      bar      6210
D   800      bar      6200
E   7000     foo      NaN

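1 Answer:

Answer 0 (score: 2)

I think that if foobar has only 2 unique values, every change between consecutive groups (labelled in the Series a below) is a switch to the opposite value, so you can use: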

#identify consecutive groups
a = df['foobar'].ne(df['foobar'].shift()).cumsum()
print (a)
A    1
B    1
C    2
D    2
E    3
Name: foobar, dtype: int32
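
The one-liner that builds a can be split into its two steps to see how the group labels arise. This is only a sketch on a standalone Series; the names `changed` and `labels` are introduced here for illustration.

```python
import pandas as pd

foobar = pd.Series(["foo", "foo", "bar", "bar", "foo"], index=list("ABCDE"))

# True wherever the value differs from the previous row
# (the first row is compared against NaN, so it is always True)
changed = foobar.ne(foobar.shift())

# A running count of the changes gives one label per consecutive run
labels = changed.cumsum()

print(changed.tolist())  # [True, False, True, False, True]
print(labels.tolist())   # [1, 1, 2, 2, 3]
```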

#get the first position value of each consecutive group in a
b = df.groupby(a)['position'].first()
print (b)
foobar
1      10
2     790
3    7000
Name: position, dtype: int64

#add 1 to a to point at the next group, map it to that group's first position, then subtract the row's own position
df['difference'] = a.add(1).map(b) - df['position']
print (df)
   position foobar  difference
A        10    foo       780.0
B       440    foo       350.0
C       790    bar      6210.0
D       800    bar      6200.0
E      7000    foo         NaN

Details:

print (a.add(1).map(b))
A     790.0
B     790.0
C    7000.0
D    7000.0
E       NaN
Name: foobar, dtype: float64
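
The three steps can also be wrapped into a single helper. The sketch below assumes, as the answer does, that foobar holds only two distinct values, so the next consecutive group always carries the opposite value. The function name `diff_to_next_run` and its parameter names are hypothetical and not part of pandas.

```python
import pandas as pd


def diff_to_next_run(df, value_col="position", key_col="foobar"):
    """Difference between each row's value and the first value of the next run.

    Assumes key_col has only two distinct values, so the next run of
    equal keys is always the "opposite" one.
    """
    # Label consecutive runs of equal keys: 1, 1, 2, 2, 3, ...
    run_id = df[key_col].ne(df[key_col].shift()).cumsum()
    # First value of each run, indexed by run label
    first_of_run = df.groupby(run_id)[value_col].first()
    # Look up the first value of the following run; the last run maps to NaN
    return run_id.add(1).map(first_of_run) - df[value_col]


df = pd.DataFrame(
    {"position": [10, 440, 790, 800, 7000],
     "foobar": ["foo", "foo", "bar", "bar", "foo"]},
    index=list("ABCDE"),
)
df["difference"] = diff_to_next_run(df)
print(df)
```

With more than two distinct values in foobar, the same code would return the difference to the next run with a different value, which is not necessarily a specific "opposite" one.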