Question

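I have a dataset similar to this:

For this dataset, I want to perform a task that goes through the data and, whenever a run of consecutive values beyond the cutoff lasts longer than M, records the length of that run.

The cutoff and M will be system parameters.

So if the cutoff is 0.32 and M is 1, it would print out a list like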

[2, 4, 3, 2]

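Logic: the first two values in the second column are greater than 0.32, and the length of that run is greater than M = 1, so it prints 2, then 4, 3, 2, and so on.

I need help writing the function so that if x > cutoff and the length of the broken run is > M, it prints out the length of the broken frames (the same as above). Any help?

The structure should look like this (I am not sure what to put in place of XXX):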

import sys


def get_input(filename):
    with open(filename) as f:
        next(f) # skip the first line
        input_list = []
        for line in f:
            input_list.append(float(line.split()[1]))

    return input_list


def countwanted(input_list, wantbroken, cutoff,M):

    def whichwanted(x):
        if(wantbroken): return x > cutoff
        else: return x < cutoff

    # XXX I think here I need to add the criteria for M but not sure how?

filename=sys.argv[1]
wantbroken=(sys.argv[2]=='b' or sys.argv[2]=='B')
cutoff=float(sys.argv[3])
M=int(sys.argv[4])

input_list = get_input(filename)

broken,lifebroken=countwanted(input_list,True,cutoff,M)
#closed,lifeclosed=countwanted(input_list,False,cutoff,M)
print(lifebroken)
#print(lifeclosed)

Or maybe there is a simpler way to write this.

Answer 1

You can use numpy, which makes life a lot easier.

First, let's look at the file loader. np.loadtxt can do the same thing in one line.

import numpy as np

y = np.loadtxt(filename, skiprows=1, usecols=1)  # skip the header, keep the second column
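
To see what that call returns without a real file, here is a small sketch that feeds np.loadtxt an in-memory text buffer. The column layout (a frame index followed by a value) and the sample numbers are assumptions made up for illustration, since the original dataset is not shown.

```python
import io
import numpy as np

# Hypothetical two-column data with one header line (not the asker's real file).
sample = io.StringIO(
    "frame value\n"
    "1 0.40\n"
    "2 0.35\n"
    "3 0.10\n"
)

# skiprows=1 drops the header line; usecols=1 keeps only the second column.
y = np.loadtxt(sample, skiprows=1, usecols=1)
print(y)
```

The result is a one-dimensional float array, which plays the same role as input_list in the question.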

Now create a mask of the values that exceed the threshold:

b = (y > cutoff)  # I think you can figure out how to switch the sense of the test
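
One possible way to switch the sense of the test, mirroring the wantbroken flag from the question, is sketched below. The helper name make_mask and the sample values are hypothetical and only serve the illustration.

```python
import numpy as np

def make_mask(y, cutoff, wantbroken=True):
    """Return a boolean mask: above cutoff if wantbroken, else below it."""
    y = np.asarray(y, dtype=float)
    return y > cutoff if wantbroken else y < cutoff

# Made-up sample values, not taken from the original dataset.
values = [0.40, 0.35, 0.10, 0.50]
print(make_mask(values, 0.32, wantbroken=True))
print(make_mask(values, 0.32, wantbroken=False))
```

Values exactly equal to the cutoff fall in neither mask, which matches the strict comparisons used in whichwanted.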

The rest is simple, and is based on this question:

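b = np.r_[0, b, 0]        # pad both ends with 0 so every run has a start and an end
d = np.diff(b)            # +1 where a run starts, -1 where it ends
start, = np.where(d > 0)  # indices where the state switches up
end, = np.where(d < 0)    # indices where the state switches down
lengths = end - start     # run lengths (named lengths to avoid shadowing the built-in len)

Now you can apply M to lengths:

result = lengths[lengths >= M]  # use > M if runs must be strictly longer than M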
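
Put together, the steps above might look like the following sketch. The function name run_lengths and the sample values are hypothetical. The filter keeps runs of at least M, as in the snippet above, so change the comparison to > M if runs must be strictly longer than M.

```python
import numpy as np

def run_lengths(y, cutoff, M):
    """Lengths of consecutive runs of values above cutoff, keeping runs of at least M."""
    b = np.asarray(y) > cutoff             # boolean mask of values above the cutoff
    padded = np.r_[0, b.astype(int), 0]    # pad so runs touching either end are closed
    d = np.diff(padded)                    # +1 at each run start, -1 at each run end
    start, = np.where(d > 0)
    end, = np.where(d < 0)
    lengths = end - start
    return lengths[lengths >= M]

# Made-up sample values: runs above 0.32 have lengths 2, 1 and 3.
values = [0.4, 0.5, 0.1, 0.6, 0.2, 0.7, 0.8, 0.9]
print(run_lengths(values, 0.32, 2))
```

Padding with zeros is what lets np.diff see a start for a run at the very first element and an end for a run at the very last one.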

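If you would rather work with lists, itertools.groupby also offers a nice solution:

import itertools as it

# group consecutive values by whether they exceed the cutoff, then keep long enough runs
grouper = it.groupby(y, key=lambda x: x > cutoff)
result = [x for x in (len(list(group)) for key, group in grouper if key) if x >= M]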
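
As a rough sketch of how the groupby idea could fill in the XXX placeholder from the question: the version below assumes countwanted should return the number of qualifying runs together with their lengths (a guess based on how the question unpacks the result into two names), and it uses a strict > M comparison because the question asks for runs longer than M.

```python
from itertools import groupby

def countwanted(input_list, wantbroken, cutoff, M):
    def whichwanted(x):
        # above the cutoff counts as "broken", below it as "closed"
        return x > cutoff if wantbroken else x < cutoff

    lengths = []
    # groupby yields one group per run of consecutive items with the same key
    for wanted, group in groupby(input_list, key=whichwanted):
        n = sum(1 for _ in group)
        if wanted and n > M:  # assumption: strictly longer than M
            lengths.append(n)
    # assumption: first return value is the number of qualifying runs
    return len(lengths), lengths

# Made-up sample values: runs above 0.32 have lengths 2, 1 and 3.
values = [0.4, 0.5, 0.1, 0.6, 0.2, 0.7, 0.8, 0.9]
broken, lifebroken = countwanted(values, True, 0.32, 1)
print(lifebroken)
```

This version needs no numpy and works on the plain list returned by get_input.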
