Add a flag to a column based on a condition

Time: 2019-04-12 11:44:47

Tags: r

Suppose you have a data.frame like this:

          FDR_1      Label_1     FDR_2      Label_2  
          0.001        NA        0.45         NA
          0.34         NA         6           NA
          0.2          NA         3           NA
          2            NA         2.5         NA
          4            NA        0.001        NA           

For a total of 10,000 rows and 3,000 columns, you want the following output:

          FDR_1      Label_1     FDR_2      Label_2
          0.001        NA        0.45         NA
          0.34         NA         6           Y
          0.2          NA         3           Y
          2            Y         2.5          Y
          4            Y        0.001         NA

In other words, you want to add a Y "flag" to the rows where an FDR_* column contains a value > 2.

I tried:

lapply(mydf, function(x) ifelse(mydf[, grepl( "FDR" , names(mydf) ) > 2, .....) 

but I don't know how to continue and add the flag.

Can anyone help me?

Thanks in advance

4 answers:

Answer 0 (score: 3)

We can do this in base R with

df1[!i1] <- 'Y'[(NA^(df1[i1] <= 2))]
df1
#   FDR_1 Label_1 FDR_2 Label_2
#1 0.001    <NA> 0.450    <NA>
#2 0.340    <NA> 6.000       Y
#3 0.200    <NA> 3.000       Y
#4 2.000    <NA> 2.500       Y
#5 4.000       Y 0.001    <NA>

where

i1 <-  grepl("^FDR", names(df1))

Data

df1 <- structure(list(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = c(NA, 
 NA, NA, NA, NA), FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = c(NA, 
 NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -5L
 ))
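
The NA^(...) idiom above is compact but easy to misread: NA^TRUE is NA and NA^FALSE is 1, so indexing 'Y' with the result yields NA or "Y". A more explicit sketch of the same idea follows. It assumes, as in the sample data, that the FDR_k and Label_k columns appear in matching order; the name flag is introduced here for illustration only.

```r
# Sample data, same values as in the answer above
df1 <- data.frame(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = NA,
                  FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = NA)

i1 <- grepl("^FDR", names(df1))  # TRUE for FDR columns, FALSE for Label columns

# Logical matrix with one column per FDR column: TRUE where the value exceeds 2
flag <- df1[i1] > 2

# Write "Y" into the Label columns at the positions marked TRUE
# (assumption: the k-th Label column belongs to the k-th FDR column)
df1[!i1][flag] <- "Y"
df1
```

Changing > to >= would also flag a value of exactly 2, which is what the expected output in the question shows.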

Answer 1 (score: 2)

We can use base R's split.default to split the columns on the number in their names, i.e.

do.call(cbind,
        lapply(split.default(df, gsub('\\D+', '', names(df))), function(i) {
          # flag the Label column (2nd) where the FDR column (1st) is >= 2
          i[2] <- replace(i[2], i[1] >= 2, 'Y')
          i
        }))

#  1.FDR_1 1.Label_1 2.FDR_2 2.Label_2
#1   0.001      <NA>   0.450      <NA>
#2   0.340      <NA>   6.000         Y
#3   0.200      <NA>   3.000         Y
#4   2.000         Y   2.500         Y
#5   4.000         Y   0.001      <NA>
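
Two side effects of this approach are worth knowing about. cbind prefixes the column names with the group key ("1.FDR_1"), and split.default orders the groups by sorting the keys as character strings, so with many columns ("1", "10", "100", ...) the column order can change. A hedged sketch of one way to keep the original names and order; the names groups, flagged and out are made up for this example:

```r
# Sample data, same values as in the question
df <- data.frame(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = NA,
                 FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = NA)

# One small data frame per numeric suffix ("1", "2", ...)
groups <- split.default(df, gsub("\\D+", "", names(df)))

# Within each group, column 1 is assumed to be FDR and column 2 Label
flagged <- lapply(groups, function(g) {
  g[[2]][g[[1]] >= 2] <- "Y"
  g
})

# unname() stops cbind from prefixing the column names with the group key;
# indexing by names(df) restores the original column order
out <- do.call(cbind, unname(flagged))[names(df)]
out
```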

Answer 2 (score: 1)

A loop-"free" variant using base R's reshape:

df <- structure(list(FDR_1   = c(0.001, 0.34, 0.2, 2, 4), 
                     Label_1 = c(NA, NA, NA, NA, NA), 
                     FDR_2   = c(0.45, 6, 3, 2.5, 0.001), 
                     Label_2 = c(NA, NA, NA, NA, NA)), 
                class     = "data.frame", 
                row.names = c(NA, -5L))

mv <- lapply(split(names(df), 
            gsub("(.+)_\\d+", 
                 "\\1", 
                 names(df))), sort)

data_long <- reshape(df, 
                     varying   = mv, 
                     direction = "long", 
                     v.names   = names(mv))
data_long$Label[data_long$FDR >= 2] <- "Y"
reshape(data_long)
#     id FDR_1 Label_1 FDR_2 Label_2
# 1.1  1 0.001    <NA> 0.450    <NA>
# 2.1  2 0.340    <NA> 6.000       Y
# 3.1  3 0.200    <NA> 3.000       Y
# 4.1  4 2.000       Y 2.500       Y
# 5.1  5 4.000       Y 0.001    <NA>

Answer 3 (score: 0)

You can also try the tidyverse

library(tidyverse)
read.table(text="  FDR_1      Label_1     FDR_2      Label_2  
          0.001        NA        0.45         NA
          0.34         NA         6           NA
          0.2          NA         3           NA
          2            NA         2.5         NA
          4            NA        0.001        NA    ", header=T) %>% 
  rownames_to_column() %>% 
  gather(k, v, -rowname) %>% 
  separate(k, into = c("k1", "k2")) %>% 
  spread(k1, v) %>% 
  mutate(Label = ifelse(FDR >= 2, "Y", Label)) %>% 
  gather(k, v, -rowname, -k2) %>% 
  unite(k, k2, k) %>% # changing the colnames a little bit
  spread(k, v) %>% 
  select(-1)    
#   1_FDR 1_Label 2_FDR 2_Label
# 1 0.001    <NA>  0.45    <NA>
# 2  0.34    <NA>     6       Y
# 3   0.2    <NA>     3       Y
# 4     2       Y   2.5       Y
# 5     4       Y 0.001    <NA>
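
gather() and spread() have since been superseded in tidyr by pivot_longer() and pivot_wider(). A minimal sketch of the same round trip with the newer functions, assuming tidyr 1.0.0 or later and dplyr are installed; the helper column names row and set are made up for this example. Because the ".value" specification keeps FDR and Label in separate columns, FDR stays numeric and the original column names survive.

```r
library(dplyr)
library(tidyr)

# Sample data, same values as in the question
df <- data.frame(FDR_1 = c(0.001, 0.34, 0.2, 2, 4), Label_1 = NA,
                 FDR_2 = c(0.45, 6, 3, 2.5, 0.001), Label_2 = NA)

out <- df %>%
  mutate(row = row_number()) %>%                        # row id for the round trip
  pivot_longer(-row,
               names_to  = c(".value", "set"),          # "FDR_1" -> column FDR, set "1"
               names_sep = "_") %>%
  mutate(Label = ifelse(FDR >= 2, "Y", NA_character_)) %>%
  pivot_wider(id_cols = row,
              names_from  = set,
              values_from = c(FDR, Label)) %>%          # rebuilds FDR_1, Label_1, ...
  select(all_of(names(df)))                             # drop row id, restore column order
out
```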