Question

I have a DataFrame with time-series data, like this:

(TP = time point)

gene number   TP1   TP2   TP3   TP4   TP5   TP6
gene1         0.4   0.2   0.1   0.5   0.8   1.9
gene2         0.3   0.05  0.5   0.8   1.0   1.7
....

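For each row (gene), I want to identify the TP at which the value first becomes more than 4 times the minimum of that time series, with the additional condition that the identified TP must come after the TP of the minimum. So for gene2 I am interested in TP3 rather than TP1 (which is also more than 4 times the minimum at TP2), because TP1 occurs earlier in the series than the minimum at TP2.

So the result of the script I am trying to build would be: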

gene1    TP4
gene2    TP3
...

My data is a numpy array.

Answer 1

Here is one way:

import pandas as pd

df = pd.DataFrame({'TP1': [.4, .3], 'TP2': [.2, .05], 'TP3': [.1, .5], 'TP4': [.5, .8], 'TP5': [.8, 1.0], 'TP6': [1.9, 1.7]}, index=['gene1', 'gene2'])

def f(x):
    # position and value of the first occurrence of the row minimum
    min_pos, min_val = [e for e in enumerate(x) if e[1] == x.min()][0]
    # positions after the minimum whose value is greater than 4 * minimum
    hits = [i for i, v in enumerate(x) if v > min_val * 4 and i > min_pos]
    # label of the first such position (raises IndexError if no position qualifies)
    return x.index[hits[0]]

Applying it to each row returns:

df.apply(f, axis=1)

gene1    TP4
gene2    TP3
dtype: object
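
As noted in the comment above, a row with no qualifying time point would make `f` fail with an IndexError. The following is a minimal sketch of a variant that returns None in that case. The name `first_tp_after_min` and the `factor` parameter are illustrative and not part of the original answer, and the sketch assumes the column labels are unique.

```python
import pandas as pd

df = pd.DataFrame(
    {"TP1": [0.4, 0.3], "TP2": [0.2, 0.05], "TP3": [0.1, 0.5],
     "TP4": [0.5, 0.8], "TP5": [0.8, 1.0], "TP6": [1.9, 1.7]},
    index=["gene1", "gene2"],
)

def first_tp_after_min(row, factor=4):
    # Hypothetical helper: keep only the time points that come after
    # the first occurrence of the row minimum.
    after_min = row.loc[row.idxmin():].iloc[1:]
    # Of those, keep the values strictly greater than factor * minimum.
    hits = after_min[after_min > factor * row.min()]
    # Label of the first match, or None when no time point qualifies.
    return hits.index[0] if len(hits) else None

result = df.apply(first_tp_after_min, axis=1)
print(result)
```

On the sample data this gives TP4 for gene1 and TP3 for gene2, the same as `f`.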

Answer 2

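You can first create a mask ma that is False for every value in a row that comes before the row's minimum. Then use this mask to find, in each row, the values from the minimum onwards that reach 4 times the minimum (marked True):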

>>> ma = df.values.argmin(axis=1)[:,None] <= np.arange(df.shape[1])
>>> df.ge(4*df.min(axis=1), axis=0) & ma
         TP1    TP2    TP3   TP4   TP5   TP6
gene1  False  False  False  True  True  True
gene2  False  False   True  True  True  True

You can then use idxmax to retrieve the label of the first True value in each row of this boolean DataFrame (which I've called df1):

>>> df1.idxmax(axis=1)
gene1    TP4
gene2    TP3
dtype: object
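
Since the question mentions that the data is a numpy array, the same masking idea can be sketched in plain NumPy. This is a rough sketch under two assumptions that go beyond the answer above: the mask uses a strict `>` so the minimum's own position is never selected, and rows without any qualifying time point are reported as None (idxmax and argmax would otherwise point at the first column of an all-False row). The `labels` list and the third, flat row are made up for illustration.

```python
import numpy as np

labels = ["TP1", "TP2", "TP3", "TP4", "TP5", "TP6"]
data = np.array([
    [0.4, 0.2, 0.1, 0.5, 0.8, 1.9],   # gene1
    [0.3, 0.05, 0.5, 0.8, 1.0, 1.7],  # gene2
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # hypothetical flat row, no match
])

# Column position of each row's (first) minimum, shaped for broadcasting.
min_pos = data.argmin(axis=1)[:, None]
# True for columns strictly after the minimum.
after_min = np.arange(data.shape[1]) > min_pos
# True where the value reaches at least 4 times the row minimum.
big_enough = data >= 4 * data.min(axis=1, keepdims=True)

hits = after_min & big_enough
first = hits.argmax(axis=1)   # first True per row (0 if the row has none)
found = hits.any(axis=1)      # whether the row has any True at all

result = [labels[i] if ok else None for i, ok in zip(first, found)]
print(result)
```

The `found` check is what separates a real match in the first column from an all-False row, which argmax alone cannot distinguish.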

Identify the time point in a DataFrame based on a condition for each time series
