I have a DataFrame like this:
  RTD   I
0  BA  32
1  BA  15
2  BA  22
3  BA  75
4  BA  28
5  BA  32
6  BA   7
Now I want to compute the minimum and maximum number of consecutive rows in which the value 32 does not appear.
The code is (see @MaxU):
len(x) - np.argwhere(x.I == 32).max() - 1
Out = 1 (this is correct)
len(x) - np.argwhere(x.I == 32).min() - 1
Out = 6 (this is wrong, because the result should be 4)
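To make the two results concrete, here is a small sketch of what each expression measures on the sample data. The names `pos`, `rows_after_last`, `rows_after_first` and `gap_between` are illustrative and not from the original code, and `np.flatnonzero` on a plain NumPy array is swapped in for `np.argwhere` on the Series. Both expressions count the rows from one occurrence of 32 to the end of the frame, so neither of them looks at the gap between the two 32s:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'RTD': ['BA'] * 7, 'I': [32, 15, 22, 75, 28, 32, 7]})

# Positions of the rows where I == 32 (here: 0 and 5)
pos = np.flatnonzero(x['I'].to_numpy() == 32)

rows_after_last = len(x) - pos.max() - 1    # rows after the last 32   -> 1
rows_after_first = len(x) - pos.min() - 1   # rows after the first 32  -> 6
gap_between = pos[1] - pos[0] - 1           # rows between the two 32s -> 4

print(rows_after_last, rows_after_first, gap_between)
```

The expected value 4 is the gap between the two occurrences, which is why subtracting from `len(x)` cannot produce it here.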
The solution I found is:
import pandas as pd
import numpy as np

df = pd.DataFrame({'RTD': ['BA']*7, 'I': [32, 15, 22, 75, 28, 32, 7]})
print(df)

def rolling_count(val):
    # Length of the current run of identical consecutive values
    if val == rolling_count.previous:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count

rolling_count.count = 0        # static variable
rolling_count.previous = None  # static variable

df['count'] = df['I'] == 32    # True where the value is 32
ddf = df['count'].apply(rolling_count)
print('delay maximum', max(ddf))
# .values passes a plain NumPy array to np.argwhere
DelayMinimum = len(df) - np.argwhere((df.I == 32).values).max() - 1
print(DelayMinimum)
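As a rough sketch of a vectorized alternative to the stateful `rolling_count` function: a cumulative sum over the `I == 32` mask gives every row between two 32s the same group id, and summing the inverted mask per group gives the length of each run without 32. The names `is_32`, `run_id`, `run_lengths`, `delay_min` and `delay_max` are made up for this example. It assumes that empty gaps (two adjacent 32s) should be ignored. Note that `rolling_count` above counts runs of any repeated value in the boolean column, so it would also count a run of consecutive 32s, which this version does not.

```python
import pandas as pd

df = pd.DataFrame({'RTD': ['BA'] * 7, 'I': [32, 15, 22, 75, 28, 32, 7]})

is_32 = df['I'].eq(32)

# The id increases by one at every 32, so all rows up to the next 32 share it
run_id = is_32.cumsum()

# Count the non-32 rows in each group, then drop empty gaps (assumption)
run_lengths = (~is_32).groupby(run_id).sum()
run_lengths = run_lengths[run_lengths > 0]

delay_min = int(run_lengths.min())
delay_max = int(run_lengths.max())
print(delay_min, delay_max)  # 1 4
```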
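Answer 0 (score: 0)
If the index is numbered 0 to n-1, you can select only the rows whose value is 32 and then take the first difference of their index.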
np.diff(np.append(-2, df.query('I==32').index.values)) -1
I'm not sure about the first value, but this should get you very close.
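On the sample data the expression above returns [1, 4]. The leading 1 comes from the -2 that was prepended, not from an actual run of rows, and the rows after the last 32 are not included. One possible way to cover both ends, sketched under the assumptions that the index is 0 to n-1 and that empty gaps should be dropped, is to pad the positions with -1 and `len(df)`. The names `pos`, `bounds` and `gaps` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'RTD': ['BA'] * 7, 'I': [32, 15, 22, 75, 28, 32, 7]})

# Positions of the rows where I == 32
pos = np.flatnonzero(df['I'].to_numpy() == 32)

# Sentinels just before the first row and just after the last row
bounds = np.concatenate(([-1], pos, [len(df)]))

# Number of rows strictly between neighbouring boundaries
gaps = np.diff(bounds) - 1
gaps = gaps[gaps > 0]  # assumption: ignore empty gaps

print(gaps.min(), gaps.max())  # 1 4
```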
Answer 1 (score: 0)
A somewhat brute-force solution, but it works. I've included the whole code, so you can correct me if I've misunderstood something:
import pandas as pd
import numpy as np

df = pd.DataFrame({'RTD': ['BA']*7, 'I': [32, 15, 22, 75, 28, 32, 7]})

# Index labels of the rows where I == 32
occurrences = df[df['I'] == 32].index.values

# Longest gap between two consecutive occurrences of 32
max_diff = 0
for i in range(len(occurrences) - 1):
    curr_diff = occurrences[i + 1] - occurrences[i] - 1
    if curr_diff > max_diff:
        max_diff = curr_diff

# Shortest gap, treating the end of the frame as a final boundary
min_diff = len(df['I'])
occurrences = np.append(occurrences, len(df))
for i in range(len(occurrences) - 1):
    curr_diff = occurrences[i + 1] - occurrences[i] - 1
    if curr_diff < min_diff:
        min_diff = curr_diff