Pandas: identify the timestamps where the sign changes and compute group sums

Time: 2015-10-19 16:57:11

Tags: python pandas apply

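I have a dataframe that I am processing row by row. I am currently using iterrows(), which I know is slow, and would rather use apply(). However, I am not sure how to use apply() here, or whether it is possible at all.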

The 'edges' data:

time      raw_signal  amp_change  edge   edge_dir
2.73105   499.878     -22.583     TRUE   decr
2.7311    477.295     -24.414     TRUE   decr
2.73115   452.881     -25.025     TRUE   decr
2.7312    427.856     -21.362     TRUE   decr
2.7315    412.598     28.076      TRUE   incr
2.73155   440.674     25.024      TRUE   incr
8.5267    490.112     -24.414     TRUE   decr
8.52675   465.698     -30.517     TRUE   decr
8.5268    435.181     -25.635     TRUE   decr
8.70805   413.208     21.362      TRUE   incr
8.7081    434.57      24.414      TRUE   incr
10.7113   487.671     -20.752     TRUE   decr
10.71135  466.919     -34.79      TRUE   decr
10.7114   432.129     -37.842     TRUE   decr
10.71145  394.287     -24.414     TRUE   decr
10.9586   367.432     25.634      TRUE   incr
10.95865  393.066     34.79       TRUE   incr
10.9587   427.856     32.349      TRUE   incr
10.95875  460.205     20.142      TRUE   incr
12.35745  477.295     -23.193     TRUE   decr
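To experiment with the loop and the answer, the sample table can be loaded into a DataFrame. This is a minimal sketch, assuming the table is pasted as whitespace-separated text; the DATA string and the edges_by_time name are illustrative, not from the question. The loop compares index values as times, so it appears to expect time as the index; that is an inference from the code, not something the question states.

```python
import io

import pandas as pd

# The question's sample table, pasted as whitespace-separated text.
DATA = """\
time raw_signal amp_change edge edge_dir
2.73105 499.878 -22.583 TRUE decr
2.7311 477.295 -24.414 TRUE decr
2.73115 452.881 -25.025 TRUE decr
2.7312 427.856 -21.362 TRUE decr
2.7315 412.598 28.076 TRUE incr
2.73155 440.674 25.024 TRUE incr
8.5267 490.112 -24.414 TRUE decr
8.52675 465.698 -30.517 TRUE decr
8.5268 435.181 -25.635 TRUE decr
8.70805 413.208 21.362 TRUE incr
8.7081 434.57 24.414 TRUE incr
10.7113 487.671 -20.752 TRUE decr
10.71135 466.919 -34.79 TRUE decr
10.7114 432.129 -37.842 TRUE decr
10.71145 394.287 -24.414 TRUE decr
10.9586 367.432 25.634 TRUE incr
10.95865 393.066 34.79 TRUE incr
10.9587 427.856 32.349 TRUE incr
10.95875 460.205 20.142 TRUE incr
12.35745 477.295 -23.193 TRUE decr
"""

# sep=r"\s+" splits on any run of whitespace; "TRUE" is parsed as a boolean.
edges = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

# The iterrows() loop treats index values as times, so give it a time index.
edges_by_time = edges.set_index("time")
print(edges_by_time.head())
```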

The loop applied to each row:

# assumes edges is indexed by time, e.g. edges = edges.set_index('time')
start = None
direction = None  # renamed from dir, which shadows the built-in
sum_amp = 0
for index, row in edges.iterrows():

    # this will collapse the multiple incr/decr together by taking only the first one seen
    # the others will get their edge set to False
    # it also assumes that the distance between multiple incr/decr is less than some threshold
    if start is None:
        start = index
        direction = row.edge_dir
        sum_amp = row.amp_change
    else:
        if row.edge_dir == direction and abs(start - index) < 0.01:
            edges.loc[index, 'edge'] = False
            sum_amp += row.amp_change  # sum amp increase so we can get an overall for this edge
        else:
            edges.loc[start, 'amp_change'] = sum_amp
            sum_amp = row.amp_change
            start = index
            direction = row.edge_dir

# the loop only writes a sum when the next edge starts, so flush the last one
if start is not None:
    edges.loc[start, 'amp_change'] = sum_amp

This should produce:

time      raw_signal  amp_change  edge   edge_dir
2.73105   499.878     -93.384     TRUE   decr
2.7311    477.295     -24.414     FALSE  decr
2.73115   452.881     -25.025     FALSE  decr
2.7312    427.856     -21.362     FALSE  decr
2.7315    412.598     53.1        TRUE   incr
2.73155   440.674     25.024      FALSE  incr
8.5267    490.112     -80.566     TRUE   decr
8.52675   465.698     -30.517     FALSE  decr
8.5268    435.181     -25.635     FALSE  decr
8.70805   413.208     45.776      TRUE   incr
8.7081    434.57      24.414      FALSE  incr
10.7113   487.671     -117.798    TRUE   decr
10.71135  466.919     -34.79      FALSE  decr
10.7114   432.129     -37.842     FALSE  decr
10.71145  394.287     -24.414     FALSE  decr
10.9586   367.432     112.915     TRUE   incr
10.95865  393.066     34.79       FALSE  incr
10.9587   427.856     32.349      FALSE  incr
10.95875  460.205     20.142      FALSE  incr
12.35745  477.295     -23.193     TRUE   decr
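The same collapsing can be done without iterrows() or apply(): flag the rows that start a new edge, number the runs with cumsum(), and sum each run with groupby().transform(). This is a sketch under one stated assumption: a run is broken when edge_dir changes or when the gap to the previous row reaches 0.01, whereas the loop measures the gap from the first row of the run, so a long run of closely spaced rows could be split differently. collapse_edges is a hypothetical helper name, and time is kept as an ordinary column here.

```python
import pandas as pd

# First 11 rows of the question's 'edges' data; time is an ordinary column here.
edges = pd.DataFrame(
    {
        "time": [2.73105, 2.7311, 2.73115, 2.7312, 2.7315, 2.73155,
                 8.5267, 8.52675, 8.5268, 8.70805, 8.7081],
        "amp_change": [-22.583, -24.414, -25.025, -21.362, 28.076, 25.024,
                       -24.414, -30.517, -25.635, 21.362, 24.414],
        "edge": [True] * 11,
        "edge_dir": ["decr"] * 4 + ["incr"] * 2 + ["decr"] * 3 + ["incr"] * 2,
    }
)


def collapse_edges(df, threshold=0.01):
    """Vectorized sketch of the question's iterrows() loop (hypothetical helper).

    Assumption: a new edge starts when edge_dir changes or when the gap to the
    previous row is at least `threshold`. The loop measures the gap from the
    first row of the run instead, so long runs may be split differently.
    """
    df = df.copy()
    # shift() and diff() give NaN in row 0; "decr" != NaN is True, so row 0 starts a run.
    new_edge = (df["edge_dir"] != df["edge_dir"].shift()) | (
        df["time"].diff() >= threshold
    )
    run_id = new_edge.cumsum()  # 1, 1, 1, 1, 2, 2, 3, ... one id per run
    run_sum = df.groupby(run_id)["amp_change"].transform("sum")
    # Like the loop: the first row of each run carries the run's total ...
    df.loc[new_edge, "amp_change"] = run_sum[new_edge]
    # ... and every other row of the run gets edge = False.
    df["edge"] = df["edge"] & new_edge
    return df


result = collapse_edges(edges)
print(result)
```

On all 20 rows of the question's data (edge_dir alternates and every run spans far less than 0.01), this gives the same edge and amp_change values as the expected output, up to floating-point rounding.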

1 answer:

Answer 0 (score: 2)

How about this one-liner:

In [16]:

df['New_amp_change'] = np.hstack((np.diff(~(np.sign(df.amp_change.shift(1))<0)), True))

In [40]:

df.loc[df.New_amp_change,'amp_change'] = df.groupby(df.New_amp_change.cumsum()).amp_change.sum().values
In [42]:

print(df)
        time  raw_signal  amp_change  edge edge_dir New_amp_change
0    2.73105     499.878     -93.384  True     decr           True
1    2.73110     477.295     -24.414  True     decr          False
2    2.73115     452.881     -25.025  True     decr          False
3    2.73120     427.856     -21.362  True     decr          False
4    2.73150     412.598      53.100  True     incr           True
5    2.73155     440.674      25.024  True     incr          False
6    8.52670     490.112     -80.566  True     decr           True
7    8.52675     465.698     -30.517  True     decr          False
8    8.52680     435.181     -25.635  True     decr          False
9    8.70805     413.208      45.776  True     incr           True
10   8.70810     434.570      24.414  True     incr          False
11  10.71130     487.671    -117.798  True     decr           True
12  10.71135     466.919     -34.790  True     decr          False
13  10.71140     432.129     -37.842  True     decr          False
14  10.71145     394.287     -24.414  True     decr          False
15  10.95860     367.432     112.915  True     incr           True
16  10.95865     393.066      34.790  True     incr          False
17  10.95870     427.856      32.349  True     incr          False
18  10.95875     460.205      20.142  True     incr          False
19  12.35745     477.295     -23.193  True     decr           True

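1. Shift amp_change down by one position (shift(1)).

2. Check the sign: np.sign(...) < 0 is True for negative values (the ~ then inverts the mask, which does not change where it flips in the next step).

3. Check whether the sign changed between neighboring rows (np.diff(), which on a boolean array returns True where adjacent elements differ).

4. Pad a True at the end (np.diff() returns an array one element shorter than its input).

5. Group by the cumulative sum of the newly created New_amp_change column (groupby() with cumsum()) to get the sum of each group.

6. Assign the group sums back to the sign-change rows (the edges?) in the original dataframe.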
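The six steps can be checked one at a time on a small frame. This sketch swaps steps 1-4 (shift, np.diff, True padding) for a direct comparison of each row's sign with the previous row's; it is a substitution, not the answer's code. The difference matters at the ends: the answer's mask flags row 0 only when the first amp_change is negative, and it always flags the last row, even if that row continues the previous group. The sketch also copies the flags into edge, which the answer's output leaves as True.

```python
import numpy as np
import pandas as pd

# amp_change values from the first 11 rows of the question's data.
df = pd.DataFrame(
    {"amp_change": [-22.583, -24.414, -25.025, -21.362, 28.076, 25.024,
                    -24.414, -30.517, -25.635, 21.362, 24.414]}
)

sign = np.sign(df["amp_change"])

# Steps 1-4 as one comparison: a row starts a new group when its sign differs
# from the previous row's sign. shift() leaves NaN in row 0 and NaN != x is
# True, so row 0 always starts a group and no True padding is needed.
df["New_amp_change"] = sign != sign.shift()

# Step 5: cumsum() turns the flags into group ids 1, 1, 1, 1, 2, 2, 3, ...
group_id = df["New_amp_change"].cumsum()
group_sums = df.groupby(group_id)["amp_change"].sum()

# Step 6: one sum per group, written to the first row of that group
# (groupby sorts the ids, so the order matches the flagged rows).
df.loc[df["New_amp_change"], "amp_change"] = group_sums.values

# The question also wants edge = False on every row but the first of a group.
df["edge"] = df["New_amp_change"]
print(df)
```

On the full 20-row data this flags the same rows (0, 4, 6, 9, 11, 15, 19) as the answer's New_amp_change column.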