Efficient way to change all occurrences of a specific value in a data frame column in R

Time: 2014-08-08 06:13:34

Tags: r performance dataframe

I have a huge data frame (28987853 rows) of the following form:

head(ratRawData)
  ratGene        ratReplicate alignment RNAtype
1     C4b Thymus_M_GSM1328751         2     REG
2    Rpl4 Thymus_M_GSM1328751         4     REG
3    Dntt Thymus_M_GSM1328751         3     DUP
4  Sptbn1 Thymus_M_GSM1328751         2     DUP
5  Ndufb7 Thymus_M_GSM1328751         2     REG
6 Ndufb10 Thymus_M_GSM1328751         2     REV

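What I want to do is change every occurrence of DUP in RNAtype to REV. Since this data frame is quite large, I would like to know an efficient way to do this. Thanks in advance!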

1 Answer:

Answer 0 (score: 3)

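I timed three approaches on simulated data with the same number of rows: logical indexing with ==, matching with grepl(), and relabelling the factor levels.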

> set.seed(357)
> rat.raw.data <- data.frame(col1 = sample(letters, 28987853, replace = TRUE),
+                            col2 = sample(1:10, 28987853, replace = TRUE),
+                            col3 = sample(LETTERS, 28987853, replace = TRUE),
+                            rna = sample(c("REG", "DUP", "REV"), 28987853, replace = TRUE),
+                            stringsAsFactors = TRUE)  # the default before R 4.0; levels<- below needs a factor
> 
> 
> dusty <- rat.raw.data
> system.time({dusty$rna[dusty$rna == "DUP"] <-  "REV"})
   user  system elapsed 
   3.37    0.24    3.64 
> 
> akrun <- rat.raw.data
> system.time({akrun$rna[grepl("DUP", akrun$rna)]<- "REV"})
   user  system elapsed 
   5.06    0.04    5.18 
> 
> roman <- rat.raw.data
> system.time({levels(roman$rna) <- c("REV", "REG", "REV")})
   user  system elapsed 
   1.08    0.13    1.20 
> head(dusty)
  col1 col2 col3 rna
1    c    3    P REV
2    b    7    B REG
3    h    6    T REV
4    f    3    H REV
5    q    6    F REG
6    m    9    F REV
> head(akrun)
  col1 col2 col3 rna
1    c    3    P REV
2    b    7    B REG
3    h    6    T REV
4    f    3    H REV
5    q    6    F REG
6    m    9    F REV
> head(roman)
  col1 col2 col3 rna
1    c    3    P REV
2    b    7    B REG
3    h    6    T REV
4    f    3    H REV
5    q    6    F REG
6    m    9    F REV
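
The levels() call in the timings relies on two things that are easy to miss: the rna column has to be a factor (character columns are no longer converted automatically since R 4.0), and the replacement vector is matched to the existing levels by position ("DUP", "REG", "REV" in sorted order). A minimal sketch on a toy vector, assuming only base R, that renames the level by name instead of by position; the names toy, by_level and by_element are made up for illustration:

```r
# Toy stand-in for the RNAtype column. stringsAsFactors = TRUE makes it a
# factor on any R version (the default changed to FALSE in R 4.0).
toy <- data.frame(rna = c("REG", "DUP", "REV", "DUP", "REG", "REV"),
                  stringsAsFactors = TRUE)

# Levels are sorted, so this is "DUP" "REG" "REV".
stopifnot(identical(levels(toy$rna), c("DUP", "REG", "REV")))

# Rename by level name rather than by position, so the order of the
# levels does not matter. Duplicate labels are merged into one level.
by_level <- toy$rna
levels(by_level)[levels(by_level) == "DUP"] <- "REV"

# Element-wise replacement for comparison (works because "REV" is
# already a valid level of the factor).
by_element <- toy$rna
by_element[by_element == "DUP"] <- "REV"

# Both give the same values; only the level-based version drops "DUP"
# from the set of levels.
stopifnot(identical(as.character(by_level), as.character(by_element)))
stopifnot(!("DUP" %in% levels(by_level)))
stopifnot("DUP" %in% levels(by_element))
```

On the full data frame the same assignment would be applied to ratRawData$RNAtype, provided that column is a factor. Note also that grepl("DUP", ...) matches any value containing "DUP", not only values equal to it, so it is not a drop-in replacement for == when other values could contain that substring.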