I have a DataFrame:
import pandas as pd
df = pd.DataFrame.from_dict({'A': [10, 'foo'], 'B': [440, 'foo'], 'C': [790, 'bar'], 'D': [800, 'bar'], 'E': [7000, 'foo']}, orient='index', columns=['position', 'foobar'])
which looks like this:
position foobar
A 10 foo
B 440 foo
C 790 bar
D 800 bar
E 7000 foo
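For each row, I want the difference between its position and the next position whose value in the foobar column is the opposite one. Normally I would use the shift method to shift the position column: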
df[comparisonCol].shift(-1) - df[comparisonCol]
But I don't know how to do this when the foobar column has to decide which position to compare against.
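To make the gap concrete, here is a small sketch of what the plain shift approach produces on the sample data. The frame is rebuilt with the regular DataFrame constructor, and the variable name naive is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"position": [10, 440, 790, 800, 7000],
     "foobar": ["foo", "foo", "bar", "bar", "foo"]},
    index=list("ABCDE"),
)

# Row-to-row difference: every position is compared with the very next row,
# no matter what foobar contains.
naive = df["position"].shift(-1) - df["position"]
print(naive)
```

Rows A and C are compared with a neighbour that has the same foobar value, so the plain shift gives them the wrong reference position.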
The desired result is:
position foobar difference
A 10 foo 780
B 440 foo 350
C 790 bar 6210
D 800 bar 6200
E 7000 foo NaN
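Answer 0 (score: 2)
If foobar contains only 2 unique values, every change between consecutive groups is a switch to the opposite value, so you can label the consecutive groups in a Series a and look up the first position of the next group: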
#identify consecutive groups
a = df['foobar'].ne(df['foobar'].shift()).cumsum()
print (a)
A 1
B 1
C 2
D 2
E 3
Name: foobar, dtype: int32
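The one-liner that builds a can be split into its intermediate steps. The sketch below does that on a standalone Series; the names previous, starts_run and run_id are illustrative and are not used elsewhere in the answer.

```python
import pandas as pd

foobar = pd.Series(["foo", "foo", "bar", "bar", "foo"],
                   index=list("ABCDE"), name="foobar")

previous = foobar.shift()         # value of the row above; NaN for the first row
starts_run = foobar.ne(previous)  # True wherever the value differs from the row above
run_id = starts_run.cumsum()      # running count of run starts = label of the current run

print(pd.DataFrame({"foobar": foobar, "previous": previous,
                    "starts_run": starts_run, "run_id": run_id}))
```

The first row always counts as a run start because its value is compared against NaN.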
# get the first position value of each consecutive group in a
b = df.groupby(a)['position'].first()
print (b)
foobar
1 10
2 790
3 7000
Name: position, dtype: int64
# add 1 to a so each row maps to the first position of the next group, then subtract the row's own position
df['difference'] = a.add(1).map(b) - df['position']
print (df)
position foobar difference
A 10 foo 780.0
B 440 foo 350.0
C 790 bar 6210.0
D 800 bar 6200.0
E 7000 foo NaN
Details:
print (a.add(1).map(b))
A 790.0
B 790.0
C 7000.0
D 7000.0
E NaN
Name: foobar, dtype: float64
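The steps above can be collected into one self-contained sketch. The helper name diff_to_next_run and its parameters are hypothetical and are introduced here only for illustration; the logic is the same three steps used above.

```python
import pandas as pd


def diff_to_next_run(frame, value_col="position", key_col="foobar"):
    """Difference between the first value of the next run of key_col and each row's value."""
    # Label consecutive runs of equal key_col values: 1, 1, 2, 2, 3, ...
    run_id = frame[key_col].ne(frame[key_col].shift()).cumsum()
    # First value_col entry of every run, indexed by run label
    first_of_run = frame.groupby(run_id)[value_col].first()
    # Look up the first value of the *next* run (label + 1); the last run gets NaN
    return run_id.add(1).map(first_of_run) - frame[value_col]


df = pd.DataFrame(
    {"position": [10, 440, 790, 800, 7000],
     "foobar": ["foo", "foo", "bar", "bar", "foo"]},
    index=list("ABCDE"),
)
df["difference"] = diff_to_next_run(df)
print(df)
```

Note that this only finds the next run with a different value. That equals the opposite value when foobar has exactly two unique values, which is the assumption stated at the top of the answer.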