Question

I come from biology and am very new to Python and ML. Our lab has a black-box ML model that outputs a sequence like this:

Predictions =
[1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0]

Each value is the prediction for a time frame lasting 0.25 seconds.
1 means High.
0 means Not-High.

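How can I convert these predictions into [start, stop, label] entries, so that longer runs of the same value are grouped? For example, the first 10 ones cover 0 to 10 * .25 s, so the first range and label would be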

[[0.0, 2.5, High]
Next there are 13 zeros ===> start = 2.5, stop = 13 * .25 + 2.5, label = Not-High
and therefore
[2.5, 5.75, Not-High]

So the final result would be a list of lists (ranges) with unique, non-overlapping intervals and their labels, like this:

[[0.0,2.5, High],
[2.5, 5.75, Not-High],
[5.75,6.50, High] ..

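What I have tried:
1. Count the number of values in Predictions
2. Generate two ranges, one starting at zero and the other starting at 0.25
3. Zip these two lists into tuples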

import numpy as np  
len_pred = len(Predictions) 
range_1 = np.arange(0,len_pred,0.25)
range_2 = np.arange(0.25,len_pred,0.25)
new_range = zip(range_1,range_2)

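With this I can get the ranges, but I am missing the labels. It seems like a simple problem, but I am running around in circles.

Please advise. Thanks.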

Answer 1

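You can iterate through the list and create a range whenever a change is detected. With this approach you also need to account for the final range. It might not be super clean, but it should work.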
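A minimal sketch of the loop described above, written under assumptions: the function name `to_ranges`, the `frame` parameter, and the label strings are illustrative choices, not taken from the answer. It closes a range whenever the value changes and treats the end of the list as one more change, which covers the final range.

```python
def to_ranges(predictions, frame=0.25):
    """Collapse per-frame 0/1 predictions into [start, stop, label] ranges."""
    labels = {1: "High", 0: "Not-High"}  # assumed label names
    ranges = []
    start = 0  # index of the first frame in the current run
    for i in range(1, len(predictions) + 1):
        # Close the run when the value changes, or when the end of the
        # list is reached (the "final range" case).
        if i == len(predictions) or predictions[i] != predictions[start]:
            ranges.append([start * frame, i * frame, labels[predictions[start]]])
            start = i
    return ranges


print(to_ranges([1, 1, 1, 0, 0, 1]))
# [[0.0, 0.75, 'High'], [0.75, 1.25, 'Not-High'], [1.25, 1.5, 'High']]
```

Each stop time equals the next start time, so the ranges are contiguous and do not overlap.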

Updated to fix some type errors.

Answer 2

Using diff() and where(), you can find all the indices at which the value changes:

import numpy as np

p = np.array([1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,1,0,1,0,1,1,1,1,1,0,0,0,1,1,1,1,1,1,0])

idx = np.r_[0, np.where(np.diff(p) != 0)[0]+1, len(p)]
t = idx * 0.25

np.c_[t[:-1], t[1:], p[idx[:-1]]]

Output:

array([[  0.  ,   2.5 ,   1.  ],
       [  2.5 ,   5.75,   0.  ],
       [  5.75,   6.5 ,   1.  ],
       [  6.5 ,   6.75,   0.  ],
       [  6.75,   7.  ,   1.  ],
       [  7.  ,   7.25,   0.  ],
       [  7.25,   7.5 ,   1.  ],
       [  7.5 ,   7.75,   0.  ],
       [  7.75,   8.  ,   1.  ],
       [  8.  ,   8.25,   0.  ],
       [  8.25,   9.5 ,   1.  ],
       [  9.5 ,  10.25,   0.  ],
       [ 10.25,  11.75,   1.  ],
       [ 11.75,  12.  ,   0.  ]])
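The array above holds the label as a number (1.0 or 0.0). If the [start, stop, label] form from the question is wanted, one possible follow-up is to index a small lookup array with the run values. This is only a sketch on a shortened input; the `names` array and the conversion to plain Python types are assumptions, not part of the answer.

```python
import numpy as np

p = np.array([1, 1, 1, 0, 0, 1])  # shortened example input

# Indices where a new run starts, plus the end of the array.
idx = np.r_[0, np.where(np.diff(p) != 0)[0] + 1, len(p)]
t = idx * 0.25

# Lookup array: position 0 holds the label for 0, position 1 the label for 1.
names = np.array(["Not-High", "High"])

result = [
    [float(start), float(stop), str(label)]
    for start, stop, label in zip(t[:-1], t[1:], names[p[idx[:-1]]])
]
print(result)
# [[0.0, 0.75, 'High'], [0.75, 1.25, 'Not-High'], [1.25, 1.5, 'High']]
```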

Answer 3

If I understood correctly, I think something like this should work.

compact_prediction = list()
sequence = list()  # This will contain each sequence list [start, end, label]

last_prediction = 0

for index, prediction in enumerate(Predictions):
    if index == 0:
        sequence.append(0.0)  # It's the first sequence, so it starts at zero

    # For every later prediction, we end the current sequence only when
    # the last prediction differs from the current one, signaling a change
    elif prediction != last_prediction:
        # The sequence ends where the current frame begins
        sequence.append(index * 0.25)

        # And we set the label based on the last prediction
        if last_prediction == 1:
            sequence.append('High')
        else:
            sequence.append('Not-High')

        # Append to our compact list and reset the sequence
        compact_prediction.append(sequence)
        sequence = list()

        # After resetting the sequence we append the start of the new one
        sequence.append(index * 0.25)

    # Save the last prediction so we can check if it changed
    last_prediction = prediction

# Close the final sequence, which the loop never ends on its own
if sequence:
    sequence.append(len(Predictions) * 0.25)
    sequence.append('High' if last_prediction == 1 else 'Not-High')
    compact_prediction.append(sequence)

print(compact_prediction)

Result: [[0.0, 2.5, 'High'], [2.5, 5.75, 'Not-High'], [5.75, 6.5, 'High'], [6.5, 6.75, 'Not-High'], [6.75, 7.0, 'High'], [7.0, 7.25, 'Not-High'], [7.25, 7.5, 'High'], [7.5, 7.75, 'Not-High'], [7.75, 8.0, 'High'], [8.0, 8.25, 'Not-High'], [8.25, 9.5, 'High'], [9.5, 10.25, 'Not-High'], [10.25, 11.75, 'High'], [11.75, 12.0, 'Not-High']]
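For comparison, the same grouping can probably be written more compactly with `itertools.groupby` from the standard library, which yields one group per run of equal consecutive values. This is a sketch under assumptions: the function name `compact` and the `frame` parameter are hypothetical, and it is not the answerer's code.

```python
from itertools import groupby


def compact(predictions, frame=0.25):
    """Group consecutive equal predictions into [start, end, label] lists."""
    labels = {1: "High", 0: "Not-High"}  # assumed label names
    out = []
    start = 0  # index of the first frame of the current run
    for value, run in groupby(predictions):
        length = sum(1 for _ in run)  # number of frames in this run
        out.append([start * frame, (start + length) * frame, labels[value]])
        start += length
    return out


print(compact([1, 1, 0]))
# [[0.0, 0.5, 'High'], [0.5, 0.75, 'Not-High']]
```

Because `groupby` also emits the last run, no separate handling of the final range is needed.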

Compress a list of numbers into unique non-overlapping time ranges using Python

3 Answers: