Question

I have a data frame with many columns and many ids in its rows; the example below contains only 2 ids.

    id  group   time    gene1   gene2   gene3   …
1   1   A        1       1       2       2      …
2   1   A        2       2       5       4      …
3   1   A        3       3       8       5      …
4   1   A        4       3       8       6      …
5   1   A        5       3       8       7      …
6   1   B       -2       0       0       9      …
7   1   B        1       0       1       1      …
8   1   B        5       7       5       0      …
9   2   A        1       1       2       2      …
10  2   A        2       2       5       3      …
11  2   A        3       3       4       4      …
12  2   A        4       4       3       3      …
13  2   A        5       6       0       6      …
14  2   B       -2       0       0       8      …
15  2   B        1       1       0       1      …
16  2   B        5       7       5       0      …
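To experiment with the rules, the example can be rebuilt as an R data frame. This is a sketch under assumptions: only gene1 to gene3 are included because the remaining columns (the "…") are not given, and `group` is stored as a plain character column.

```r
# Reconstruction of the example above; columns beyond gene3 ("…") are omitted
# because their values are not given in the question.
df <- data.frame(
  id    = rep(1:2, each = 8),
  group = rep(c(rep("A", 5), rep("B", 3)), times = 2),
  time  = rep(c(1:5, -2, 1, 5), times = 2),
  gene1 = c(1, 2, 3, 3, 3, 0, 0, 7,  1, 2, 3, 4, 6, 0, 1, 7),
  gene2 = c(2, 5, 8, 8, 8, 0, 1, 5,  2, 5, 4, 3, 0, 0, 0, 5),
  gene3 = c(2, 4, 5, 6, 7, 9, 1, 0,  2, 3, 4, 3, 6, 8, 1, 0)
)
str(df)
```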

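Within each subject (id), I want to replace values with NA according to the following conditions, applied to each gene column separately (row numbers are counted within each id):

If the values in rows 6 and 7 (group B at times -2 and 1) are both 0, then:
1. If the value in row 5 (group A at time 5) is 0, set all of that subject's values to NA.
2. If the value in row 5 is not 0, set every value except those in rows 5 to 8 to NA.

If the values in rows 6 and 7 are not both 0, no values need to change.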

So the output table should look like this:

    id  group   time    gene1   gene2   gene3   …
1   1   A        1      NA       2       2      …
2   1   A        2      NA       5       4      …
3   1   A        3      NA       8       5      …
4   1   A        4      NA       8       6      …
5   1   A        5       3       8       7      …
6   1   B       -2       0       0       9      …
7   1   B        1       0       1       1      …
8   1   B        5       7       5       0      …
9   2   A        1       1      NA       2      …
10  2   A        2       2      NA       3      …
11  2   A        3       3      NA       4      …
12  2   A        4       4      NA       3      …
13  2   A        5       6      NA       6      …
14  2   B       -2       0      NA       8      …
15  2   B        1       1      NA       1      …
16  2   B        5       7      NA       0      …
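Read per gene column, the rules reduce to a small function applied to each id's block of rows. The sketch below assumes every id has exactly 8 rows in the order shown; `na_rule` is a hypothetical helper name, and base R's `ave()` is used to apply it within each id while keeping the original row order. Only the id and gene columns are rebuilt here.

```r
# Hypothetical helper: apply the rules to one gene column of one id,
# assuming the id's 8 rows are in the order shown in the question.
na_rule <- function(x) {
  if (isTRUE(x[6] == 0 && x[7] == 0)) {  # rows 6 and 7 are both 0
    if (isTRUE(x[5] == 0)) {
      x[] <- NA          # rule 1: every row of this id becomes NA
    } else {
      x[1:4] <- NA       # rule 2: all rows except 5-8 become NA
    }
  }
  x                      # otherwise the column is returned unchanged
}

df <- data.frame(
  id    = rep(1:2, each = 8),
  gene1 = c(1, 2, 3, 3, 3, 0, 0, 7,  1, 2, 3, 4, 6, 0, 1, 7),
  gene2 = c(2, 5, 8, 8, 8, 0, 1, 5,  2, 5, 4, 3, 0, 0, 0, 5),
  gene3 = c(2, 4, 5, 6, 7, 9, 1, 0,  2, 3, 4, 3, 6, 8, 1, 0)
)

# ave() splits each column by id, applies na_rule to each piece,
# and puts the results back in the original row order.
for (g in grep("^gene", names(df), value = TRUE)) {
  df[[g]] <- ave(df[[g]], df$id, FUN = na_rule)
}
df
```

With this data only gene1 of id 1 and gene2 of id 2 change, which matches the expected output table.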

Answer 1

It's hard to tell exactly what you want from the rules as outlined, but I'll try to follow your logic:

for (i in unique(df$id)) {
    df_sub <- df[df$id == i, ]  # isolate the rows of this id (assumes 8 rows in the order shown)
    for (g in c("gene1", "gene2", "gene3")) {  # treat each gene column separately
        if (df_sub[6, g] == 0 && df_sub[7, g] == 0) {  # rows 6 AND 7 must both be 0
            if (df_sub[5, g] == 0) {  # check whether row 5 is 0
                df_sub[, g] <- NA  # rule 1: NA for every row of this id
            } else {
                df_sub[1:4, g] <- NA  # rule 2: NA everywhere except rows 5-8
            }
        } else {
            print(paste0("id ", i, ", ", g, ": rows 6 and 7 are not both 0 - no need to change"))
        }
    }
    df[df$id == i, ] <- df_sub  # write the amended rows back into the original data frame
}

Update: I hope this code makes sense.
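Because the loop indexes rows by position (`df_sub[5, g]`, `df_sub[6, g]`, ...), it gives wrong results if an id's rows are sorted differently. A possible variant, sketched here under the assumption that each id has exactly one row for (A, 5), (B, -2) and (B, 1), finds those rows by their `group` and `time` labels instead. It also reads "rows 5 to 8" as group A at time 5 plus all group B rows, which is an interpretation of the question; `apply_na_rules` is a hypothetical name.

```r
# Hypothetical helper: applies the NA rules per id and per gene column,
# locating the reference rows by group/time labels rather than by position.
apply_na_rules <- function(df, genes = grep("^gene", names(df), value = TRUE)) {
  for (i in unique(df$id)) {
    rows <- df$id == i
    d    <- df[rows, ]
    b_m2 <- d$group == "B" & d$time == -2   # assumed unique per id
    b_1  <- d$group == "B" & d$time == 1    # assumed unique per id
    a_5  <- d$group == "A" & d$time == 5    # assumed unique per id
    keep <- a_5 | d$group == "B"            # interpretation of "rows 5 to 8"
    for (g in genes) {
      v <- d[[g]]
      # isTRUE() keeps if() from failing on NA comparisons
      if (isTRUE(v[b_m2] == 0 && v[b_1] == 0)) {
        if (isTRUE(v[a_5] == 0)) v[] <- NA else v[!keep] <- NA
        d[[g]] <- v
      }
    }
    df[rows, ] <- d
  }
  df
}

df <- data.frame(
  id    = rep(1:2, each = 8),
  group = rep(c(rep("A", 5), rep("B", 3)), times = 2),
  time  = rep(c(1:5, -2, 1, 5), times = 2),
  gene1 = c(1, 2, 3, 3, 3, 0, 0, 7,  1, 2, 3, 4, 6, 0, 1, 7),
  gene2 = c(2, 5, 8, 8, 8, 0, 1, 5,  2, 5, 4, 3, 0, 0, 0, 5),
  gene3 = c(2, 4, 5, 6, 7, 9, 1, 0,  2, 3, 4, 3, 6, 8, 1, 0)
)
res     <- apply_na_rules(df)
res_rev <- apply_na_rules(df[16:1, ])  # same data with the rows reversed
```

Reversing the rows gives the same values, just in reversed order, which the position-based loop would not.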

How can I change values to NA based on other conditions?
