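In pandas, how do I count consecutive positive and negative values in a row?

Time: 2016-08-06 15:36:05

Tags: python pandas numpy

In Python pandas or numpy, is there a built-in function, or a combination of functions, that can count how many positive or negative values occur in a row?

Think of it like roulette: counting how many blacks or reds come up consecutively.

Example input series: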

Date
2000-01-07    -3.550049
2000-01-10    28.609863
2000-01-11    -2.189941
2000-01-12     4.419922
2000-01-13    17.690185
2000-01-14    41.219971
2000-01-18     0.000000
2000-01-19   -16.330078
2000-01-20     7.950195
2000-01-21     0.000000
2000-01-24    38.370117
2000-01-25     6.060059
2000-01-26     3.579834
2000-01-27     7.669922
2000-01-28     2.739991
2000-01-31    -8.039795
2000-02-01    10.239990
2000-02-02    -1.580078
2000-02-03     1.669922
2000-02-04     7.440186
2000-02-07    -0.940185

Desired output:

-  in a row 5 times
+  in a row 4 times
++  in a row once
++++  in a row once
+++++++ in a row once

3 answers:

Answer 0: (score: 1)

You can use the itertools.groupby() function.

import itertools

l = [-3.550049, 28.609863, -2.189941,  4.419922, 17.690185, 41.219971,  0.000000, -16.330078,  7.950195,  0.000000, 38.370117,  6.060059,  3.579834,  7.669922,  2.739991, -8.039795, 10.239990, -1.580078,  1.669922,  7.440186, -0.940185]

r_pos = {}
r_neg = {}
# Group consecutive values by sign. Zeros count as positive here (e >= 0),
# which is the convention the output shown for this answer reflects.
for k, v in itertools.groupby(l, lambda e: e >= 0):
    count = len(list(v))  # length of the current run
    r = r_pos if k else r_neg
    r[count] = r.get(count, 0) + 1

# Sort by run length so the report is printed shortest run first.
for k, v in sorted(r_neg.items()):
    print('%s in a row %s time(s)' % ('-' * k, v))

for k, v in sorted(r_pos.items()):
    print('%s in a row %s time(s)' % ('+' * k, v))

Output:

- in a row 6 time(s)
+ in a row 2 time(s)
++ in a row 1 time(s)
++++ in a row 1 time(s)
+++++++ in a row 1 time(s)

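Depending on what you consider a positive value, you can change the lambda passed to groupby: e > 0 groups zeros with the negatives, while e >= 0 groups them with the positives.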
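To make the effect of that choice concrete, here is a minimal sketch that runs the same grouping with both predicates on the series from the question. The helper name count_runs and its return shape are made up for this illustration; they are not part of itertools or of the answer above.

```python
from collections import Counter
from itertools import groupby

values = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
          0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
          3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
          1.669922, 7.440186, -0.940185]


def count_runs(values, is_positive):
    """Hypothetical helper: return (positive_runs, other_runs) as Counters
    mapping run length -> number of runs of that length."""
    pos, other = Counter(), Counter()
    for flag, group in groupby(values, key=is_positive):
        run_length = sum(1 for _ in group)
        (pos if flag else other)[run_length] += 1
    return pos, other


# Zeros grouped with the negatives.
strict_pos, strict_other = count_runs(values, lambda e: e > 0)
# Zeros grouped with the positives.
loose_pos, loose_other = count_runs(values, lambda e: e >= 0)

print(dict(strict_pos), dict(strict_other))
# {1: 3, 3: 1, 5: 1, 2: 1} {1: 6, 2: 1}
print(dict(loose_pos), dict(loose_other))
# {1: 2, 4: 1, 7: 1, 2: 1} {1: 6}
```

With e > 0, the two zeros break up the longer positive streaks and one of them merges with the negative that follows it, which is why a negative run of length 2 appears.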

Answer 1: (score: 1)

Nonnegatives:

from functools import reduce  # For Python 3.x
ser = df['x'] >= 0
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
c[ser & (ser != ser.shift(-1))].value_counts()
Out: 
1.0    2
7.0    1
4.0    1
2.0    1
Name: x, dtype: int64

Negatives:

ser = df['x'] < 0
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
c[ser & (ser != ser.shift(-1))].value_counts()

Out: 
1.0    6
Name: x, dtype: int64

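Basically, it creates a boolean series and takes the cumulative count between turning points (the count restarts whenever the sign changes). For example, for nonnegatives, c is: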

Out: 
0     0.0
1     1.0  # turning point
2     0.0
3     1.0
4     2.0
5     3.0
6     4.0  # turning point
7     0.0
8     1.0
9     2.0
10    3.0
11    4.0
12    5.0
13    6.0
14    7.0  # turning point
15    0.0
16    1.0  # turning point
17    0.0
18    1.0
19    2.0  # turning point
20    0.0
Name: x, dtype: float64

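Now, to identify the turning points, the condition is that the current value is True and differs from the next one. Selecting those rows gives the run lengths, and value_counts() tallies them.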
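The expanding().apply() call re-scans the whole prefix at every row, so it can get slow on long series. A common alternative, sketched here rather than taken from the answer, is to label each run with a cumulative sum of the turning points and then measure the groups. The variable names below are illustrative, and the sketch assumes the question's values are in a plain Series instead of df['x'].

```python
import pandas as pd

s = pd.Series([-3.550049, 28.609863, -2.189941, 4.419922, 17.690185,
               41.219971, 0.000000, -16.330078, 7.950195, 0.000000,
               38.370117, 6.060059, 3.579834, 7.669922, 2.739991,
               -8.039795, 10.239990, -1.580078, 1.669922, 7.440186,
               -0.940185])

nonneg = s.ge(0)  # True for nonnegative values, as in the answer above

# True wherever a new run begins; the first row has a NaN diff, so it also counts.
run_start = nonneg.astype(int).diff().ne(0)
run_id = run_start.cumsum()  # 1, 2, 3, ... one label per run

lengths = nonneg.groupby(run_id).size().to_numpy()  # length of each run, in order
run_is_nonneg = nonneg[run_start].to_numpy()        # sign flag of each run

nonneg_counts = pd.Series(lengths[run_is_nonneg]).value_counts().to_dict()
neg_counts = pd.Series(lengths[~run_is_nonneg]).value_counts().to_dict()

print(nonneg_counts)
print(neg_counts)
```

On the question's data this gives the same tallies as the expanding version: nonnegative runs of length 1 twice and of lengths 2, 4 and 7 once each, and six negative runs of length 1.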

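Answer 2: (score: 1)

This is what I have come up with so far. It works, and it outputs counts of how many times negative, positive, and zero values occur in a row. Maybe someone can make it more concise using some of the suggestions posted above by ayhan and Ghilas.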

from collections import Counter

import numpy as np

ser = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971, 0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059, 3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078, 1.669922, 7.440186, -0.940185]

c = 0  # length of the current run
zeros, neg_counts, pos_counts = [], [], []
for i in range(len(ser)):
    c += 1
    s = np.sign(ser[i])
    # A run ends at the last element or when the next value has a different sign.
    if i == len(ser) - 1 or s != np.sign(ser[i + 1]):
        if s == 0:
            zeros.append(c)
        elif s == -1:
            neg_counts.append(c)
        elif s == 1:
            pos_counts.append(c)
        c = 0

print(Counter(neg_counts))
print(Counter(pos_counts))
print(Counter(zeros))

Out:

Counter({1: 6})
Counter({1: 3, 3: 1, 5: 1, 2: 1})
Counter({1: 2})
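As one possible way to shorten this, here is a sketch that keeps the three-way split (negative, zero, positive) but replaces the explicit loop with NumPy array operations. It is an illustration using the series from the question, not the answer author's code, and the variable names are chosen for this example.

```python
from collections import Counter

import numpy as np

values = np.array([-3.550049, 28.609863, -2.189941, 4.419922, 17.690185,
                   41.219971, 0.000000, -16.330078, 7.950195, 0.000000,
                   38.370117, 6.060059, 3.579834, 7.669922, 2.739991,
                   -8.039795, 10.239990, -1.580078, 1.669922, 7.440186,
                   -0.940185])

signs = np.sign(values)  # -1.0, 0.0 or 1.0 per element

# Index of the first element of every run: position 0 plus each sign change.
starts = np.flatnonzero(np.r_[True, signs[1:] != signs[:-1]])
# Run length = distance from one run start to the next (or to the end).
lengths = np.diff(np.r_[starts, len(signs)])
run_signs = signs[starts]

neg = Counter(lengths[run_signs == -1].tolist())
pos = Counter(lengths[run_signs == 1].tolist())
zero = Counter(lengths[run_signs == 0].tolist())

print(neg)   # Counter({1: 6})
print(pos)   # Counter({1: 3, 3: 1, 5: 1, 2: 1})
print(zero)  # Counter({1: 2})
```

Because zeros are treated as their own category here, the positive streaks are shorter than in the answers that use a nonnegative test.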