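In Python, pandas, or NumPy, is there a built-in function or combination of functions that counts how many times positive or negative values occur in a row (that is, consecutively)?
Think of it as counting streaks of consecutive blacks or reds in roulette.
Example input series: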
Date
2000-01-07 -3.550049
2000-01-10 28.609863
2000-01-11 -2.189941
2000-01-12 4.419922
2000-01-13 17.690185
2000-01-14 41.219971
2000-01-18 0.000000
2000-01-19 -16.330078
2000-01-20 7.950195
2000-01-21 0.000000
2000-01-24 38.370117
2000-01-25 6.060059
2000-01-26 3.579834
2000-01-27 7.669922
2000-01-28 2.739991
2000-01-31 -8.039795
2000-02-01 10.239990
2000-02-02 -1.580078
2000-02-03 1.669922
2000-02-04 7.440186
2000-02-07 -0.940185
Expected output:
- in a row 5 times
+ in a row 4 times
++ in a row once
++++ in a row once
+++++++ in a row once
Answer 0 (score: 1)
You can use the itertools.groupby() function.
import itertools

l = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971, 0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059, 3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078, 1.669922, 7.440186, -0.940185]
r_pos = {}  # streak length -> number of positive streaks
r_neg = {}  # streak length -> number of non-positive streaks
# groupby yields one (key, run) pair per stretch of consecutive equal keys
for k, v in itertools.groupby(l, lambda e: e > 0):
    count = len(list(v))
    r = r_pos if k else r_neg
    if count not in r:
        r[count] = 0
    r[count] += 1
for k, v in sorted(r_neg.items()):
    print('%s in a row %s time(s)' % ('-' * k, v))
for k, v in sorted(r_pos.items()):
    print('%s in a row %s time(s)' % ('+' * k, v))
Output
- in a row 6 time(s)
-- in a row 1 time(s)
+ in a row 3 time(s)
++ in a row 1 time(s)
+++ in a row 1 time(s)
+++++ in a row 1 time(s)
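Zeros are grouped with the negative values here. Depending on what you consider a positive value, you can change the e > 0 test in the lambda (for example, to e >= 0 to count zeros as positive).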
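As a rough sketch of changing the key function, the variant below keys groupby on a three-way sign so that zeros form streaks of their own instead of being merged into either side. The names sign and streak_counts are hypothetical helpers introduced here for illustration; they are not part of the answer above.

```python
from collections import Counter
from itertools import groupby

values = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
          0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
          3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
          1.669922, 7.440186, -0.940185]


def sign(x):
    """Return -1, 0 or 1 (hypothetical helper used as the groupby key)."""
    return (x > 0) - (x < 0)


def streak_counts(seq):
    """Map each sign to a Counter of {streak length: number of streaks}."""
    result = {-1: Counter(), 0: Counter(), 1: Counter()}
    for s, run in groupby(seq, key=sign):
        # Consume the run to get its length, then tally it under its sign
        result[s][sum(1 for _ in run)] += 1
    return result


counts = streak_counts(values)
for s, symbol in ((-1, '-'), (1, '+'), (0, '0')):
    for length, times in sorted(counts[s].items()):
        print('%s in a row %s time(s)' % (symbol * length, times))
```

With a three-way key, a zero always ends the streak on either side of it, which may or may not be what you want for price-change data.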
Answer 1 (score: 1)
Nonnegatives:
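from functools import reduce  # For Python 3.x
# df is assumed to be a DataFrame holding the series above in column 'x'
ser = df['x'] >= 0
# Running streak length: add 1 while the flag is True, reset to 0 when it is False
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
# Keep the count at the last row of each True streak and tally the lengths
c[ser & (ser != ser.shift(-1))].value_counts()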
Out:
1.0 2
7.0 1
4.0 1
2.0 1
Name: x, dtype: int64
Negatives:
ser = df['x'] < 0
c = ser.expanding().apply(lambda r: reduce(lambda x, y: x + 1 if y else x * y, r))
c[ser & (ser != ser.shift(-1))].value_counts()
Out:
1.0 6
Name: x, dtype: int64
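Basically, it creates a boolean series and takes the cumulative count between turning points (the count restarts whenever the sign changes). For example, for the nonnegatives, c is: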
Out:
0 0.0
1 1.0 # turning point
2 0.0
3 1.0
4 2.0
5 3.0
6 4.0 # turning point
7 0.0
8 1.0
9 2.0
10 3.0
11 4.0
12 5.0
13 6.0
14 7.0 # turning point
15 0.0
16 1.0 # turning point
17 0.0
18 1.0
19 2.0 # turning point
20 0.0
Name: x, dtype: float64
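Now, to identify the turning points, the condition is that the current value is True and differs from the next one. Selecting those rows gives the streak lengths, and value_counts() tallies them.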
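The expanding/reduce approach rescans the whole prefix for every row, so it can get slow on long series. A possible alternative (a different technique from the one in this answer, shown only as a sketch) is to label each run with a cumulative sum of sign changes and then count within each label. The column name 'x' follows the answer; run_id, flags and is_end are names made up for this example.

```python
import pandas as pd

values = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971,
          0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059,
          3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078,
          1.669922, 7.440186, -0.940185]
df = pd.DataFrame({'x': values})

ser = df['x'] >= 0                       # True for nonnegative values
# A new run starts wherever the flag differs from the previous row,
# so the cumulative sum of those changes gives every run its own id
run_id = (ser != ser.shift()).cumsum()

# Running count inside each run; False runs stay at 0 because they sum zeros
flags = ser.astype(int)
c = flags.groupby(run_id).cumsum()

# Turning points: the flag is True and the next row differs (or is the end)
is_end = ser & (ser != ser.shift(-1))
streaks = c[is_end].value_counts().sort_index()
print(streaks)
```

Here c should hold the same running counts as in the answer, only as integers rather than floats. For the negatives, swap the first condition for df['x'] < 0.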
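Answer 2 (score: 1)
This is what I have come up with so far. It works and outputs a count of how many times negative, positive, and zero values occur in a row. Maybe someone can make it more concise using some of the suggestions posted above by ayhan and Ghilas.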
from collections import Counter
import numpy as np

ser = [-3.550049, 28.609863, -2.189941, 4.419922, 17.690185, 41.219971, 0.000000, -16.330078, 7.950195, 0.000000, 38.370117, 6.060059, 3.579834, 7.669922, 2.739991, -8.039795, 10.239990, -1.580078, 1.669922, 7.440186, -0.940185]
c = 0  # length of the current streak
zeros, neg_counts, pos_counts = [], [], []
for i in range(len(ser)):
    c += 1
    s = np.sign(ser[i])
    # A streak ends at the last element or when the next sign differs
    if i == len(ser) - 1 or s != np.sign(ser[i + 1]):
        if s == 0:
            zeros.append(c)
        elif s == -1:
            neg_counts.append(c)
        elif s == 1:
            pos_counts.append(c)
        c = 0
print(Counter(neg_counts))
print(Counter(pos_counts))
print(Counter(zeros))
Out:
Counter({1: 6})
Counter({1: 3, 3: 1, 5: 1, 2: 1})
Counter({1: 2})
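On the request for something more concise: one option is a run-length encoding built from NumPy array operations instead of an explicit loop. This is only a sketch under the same three-way sign convention as the loop above; starts, lengths and run_signs are names chosen for this example.

```python
from collections import Counter

import numpy as np

ser = np.array([-3.550049, 28.609863, -2.189941, 4.419922, 17.690185,
                41.219971, 0.000000, -16.330078, 7.950195, 0.000000,
                38.370117, 6.060059, 3.579834, 7.669922, 2.739991,
                -8.039795, 10.239990, -1.580078, 1.669922, 7.440186,
                -0.940185])

signs = np.sign(ser)
# A run starts at index 0 and wherever the sign differs from the previous one
starts = np.flatnonzero(np.r_[True, signs[1:] != signs[:-1]])
# Run lengths are the gaps between consecutive starts (and the array end)
lengths = np.diff(np.r_[starts, len(signs)])
run_signs = signs[starts]

neg_counts = Counter(lengths[run_signs == -1].tolist())
pos_counts = Counter(lengths[run_signs == 1].tolist())
zeros = Counter(lengths[run_signs == 0].tolist())
print(neg_counts)
print(pos_counts)
print(zeros)
```

This assumes a non-empty one-dimensional array; an empty input would need a separate guard.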