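Compressing a list of numbers into unique non-overlapping time ranges using Python

Time: 2018-02-21 04:54:39

Tags: python algorithm python-2.7 numpy

I come from biology and am very new to Python and ML. Our lab has a black-box ML model that outputs a sequence like this: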

Predictions =
[1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0]  

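Each value represents a predicted time frame lasting 0.25 seconds.
1 means High.
0 means Not-High.

How can I convert these predictions into [start, stop, label] entries, grouping consecutive identical values into longer ranges? For example, the first 10 values are ones and cover 0 to 10 * 0.25 s, so the first range and label would be

[0.0, 2.5, High]
Next come 13 zeros ===> start = 2.5, stop = 13 * 0.25 + 2.5, label = Not-High
and therefore
[2.5, 5.75, Not-High]

So the final result would be a list of lists (ranges) with unique, non-overlapping intervals and their labels, like this: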

[[0.0,2.5, High],
[2.5, 5.75, Not-High],
[5.75,6.50, High] ..

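What I tried:
1. Count the number of values in Predictions
2. Generate two ranges, one starting at zero and the other starting at 0.25
3. Merge these two lists into tuples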

import numpy as np  
len_pred = len(Predictions) 
range_1 = np.arange(0,len_pred,0.25)
range_2 = np.arange(0.25,len_pred,0.25)
new_range = zip(range_1,range_2)  

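With this I can get the ranges, but I am missing the labels. It seems like a simple problem, but I am going around in circles.

Please advise. Thanks.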

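3 Answers:

Answer 0: (score: 4)

You can iterate over the list and create a range whenever you detect a change. With this approach you also need to account for the final range, which no change ever closes. It may not be super clean, but it should work.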

Updated to fix some type errors.
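
The approach this answer describes (walk the list, emit a range on every change, then close the final range) could be sketched roughly as follows. This is only a minimal illustration, not the answerer's original code; the function name `to_ranges`, the `frame` parameter, and the label strings are assumptions chosen to match the question.

```python
def to_ranges(predictions, frame=0.25):
    """Collapse a 0/1 sequence into [start, stop, label] runs (hypothetical helper)."""
    labels = {1: "High", 0: "Not-High"}  # label names assumed from the question
    ranges = []
    if len(predictions) == 0:
        return ranges
    start = 0  # index of the first frame in the current run
    for i in range(1, len(predictions)):
        if predictions[i] != predictions[i - 1]:
            # A change closes the current run at the start of frame i
            ranges.append([start * frame, i * frame, labels[predictions[start]]])
            start = i
    # The final run is never closed by a change, so close it explicitly
    ranges.append([start * frame, len(predictions) * frame, labels[predictions[start]]])
    return ranges


Predictions = [1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,
               0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0]
result = to_ranges(Predictions)
print(result[:3])
```

Tracking only the start index of the current run keeps the loop small, and the stop of one range is by construction the start of the next, so the intervals cannot overlap.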

Answer 1: (score: 4)

Using diff() and where(), you can find all the indices where the value changes:

import numpy as np

p = np.array([1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0])

idx = np.r_[0, np.where(np.diff(p) != 0)[0]+1, len(p)]
t = idx * 0.25

np.c_[t[:-1], t[1:], p[idx[:-1]]]

Output:

array([[  0.  ,   2.5 ,   1.  ],
       [  2.5 ,   5.75,   0.  ],
       [  5.75,   6.5 ,   1.  ],
       [  6.5 ,   6.75,   0.  ],
       [  6.75,   7.  ,   1.  ],
       [  7.  ,   7.25,   0.  ],
       [  7.25,   7.5 ,   1.  ],
       [  7.5 ,   7.75,   0.  ],
       [  7.75,   8.  ,   1.  ],
       [  8.  ,   8.25,   0.  ],
       [  8.25,   9.5 ,   1.  ],
       [  9.5 ,  10.25,   0.  ],
       [ 10.25,  11.75,   1.  ],
       [ 11.75,  12.  ,   0.  ]])
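
The array above stores the label as 1.0 or 0.0. If the [start, stop, label] form from the question is wanted, the same indices can be turned into a plain list with string labels. The sketch below reuses the `idx` and `t` computation from this answer; the `labels` mapping and the `labeled` name are assumptions added for illustration.

```python
import numpy as np

p = np.array([1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,
              0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0])

# Run boundaries: 0, every index where the value changes, and len(p)
idx = np.r_[0, np.where(np.diff(p) != 0)[0] + 1, len(p)]
t = idx * 0.25  # convert frame indices to seconds

# Hypothetical mapping from the numeric prediction to the question's labels
labels = {0: "Not-High", 1: "High"}

# Pair each run's start and stop time with the label of its first frame
labeled = [[float(start), float(stop), labels[int(value)]]
           for start, stop, value in zip(t[:-1], t[1:], p[idx[:-1]])]

print(labeled[:3])
```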

Answer 2: (score: 3)

If I understood correctly, I think something like this should work.

compact_prediction = list()
sequence = list()  # Holds the current run as [start, end, label]

last_prediction = 0

for index, prediction in enumerate(Predictions):
    if index == 0:
        sequence.append(0.0)  # The first run starts at time zero

    # For every later frame, the current run ends only when the
    # prediction differs from the previous one, signaling a change
    elif prediction != last_prediction:
        # The run ends where the current frame starts
        sequence.append(index * 0.25)

        # Label the run based on the last prediction
        if last_prediction == 1:
            sequence.append('High')
        else:
            sequence.append('Not-High')

        # Append to the compact list and reset the sequence
        compact_prediction.append(sequence)
        sequence = list()

        # After resetting, record the start of the new run
        sequence.append(index * 0.25)

    # Save the last prediction so we can check whether it changed
    last_prediction = prediction

# Close the final run, which no change ever terminates
if sequence:
    sequence.append(len(Predictions) * 0.25)
    sequence.append('High' if last_prediction == 1 else 'Not-High')
    compact_prediction.append(sequence)

print(compact_prediction)

Result: [[0.0, 2.5, 'High'], [2.5, 5.75, 'Not-High'], [5.75, 6.5, 'High'], [6.5, 6.75, 'Not-High'], [6.75, 7.0, 'High'], [7.0, 7.25, 'Not-High'], [7.25, 7.5, 'High'], [7.5, 7.75, 'Not-High'], [7.75, 8.0, 'High'], [8.0, 8.25, 'Not-High'], [8.25, 9.5, 'High'], [9.5, 10.25, 'Not-High'], [10.25, 11.75, 'High'], [11.75, 12.0, 'Not-High']]
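
The same run-grouping idea can also be expressed with `itertools.groupby` from the standard library, which is a different technique from the explicit loop in this answer. The following is a minimal sketch under that assumption; the `compress` name, the `frame` parameter, and the shortened `sample` input are made up for illustration.

```python
from itertools import groupby


def compress(predictions, frame=0.25):
    """Group consecutive equal values into [start, stop, label] (hypothetical helper)."""
    ranges = []
    start = 0  # frame index where the current run begins
    for value, run in groupby(predictions):
        length = sum(1 for _ in run)  # number of frames in this run
        stop = start + length
        label = "High" if value == 1 else "Not-High"
        ranges.append([start * frame, stop * frame, label])
        start = stop  # the next run begins where this one ends
    return ranges


sample = [1, 1, 0, 0, 0, 1]  # shortened example input, not the full list
print(compress(sample))
# [[0.0, 0.5, 'High'], [0.5, 1.25, 'Not-High'], [1.25, 1.5, 'High']]
```

Because `groupby` yields the final run like any other, no separate step is needed to close the last range.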