How to remove rows that start with a special character in R

Date: 2015-02-10 16:31:55

Tags: r

I have a data frame and I want to remove all rows that start with #. Can anyone tell me how to do this? Thanks in advance.

    #ID_REF = The name of the probe set, blank for control probes
    #VALUE = The signal value calculated by MAS5, normalized
    #ABS_CALL = The detection value calculated by the MAS5
    #DETECTION P-VALUE = The detection p-value calculated by the MAS5
    *ID_REF**   VALUE** ABS_CALL**  DETECTION P-VALUE*
    AFFX-BioB-5_at  757.7   P   0.00039
    AFFX-BioB-M_at  933.7   P   0.000095
    AFFX-BioB-3_at  525.6   P   0.000095
    AFFX-BioC-5_at  1999.5  P   0.000044
    AFFX-BioC-3_at  2339.5  P   0.000044
    AFFX-BioDn-5_at 4321.3  P   0.000044
    AFFX-BioDn-3_at 9229.4  P   0.00007
    AFFX-CreX-5_at  21949.9 P   0.000044
    AFFX-CreX-3_at  26022.8 P   0.000044
    AFFX-DapX-5_at  1171.1  P   0.00006
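
If the rows are already in a data frame (for example because the annotation lines were read in as ordinary rows), a minimal sketch of the literal request is to drop every row whose first column starts with #, allowing leading whitespace. The data frame df, its column names and its values below are hypothetical, chosen only to resemble the sample above.

```r
# Hypothetical data frame in which '#' annotation lines ended up as rows;
# column names and values are illustrative only.
df <- data.frame(
  ID_REF = c("#VALUE = The signal value calculated by MAS5",
             "  #ABS_CALL = The detection value",
             "AFFX-BioB-5_at", "AFFX-BioB-M_at"),
  VALUE = c(NA, NA, 757.7, 933.7),
  stringsAsFactors = FALSE
)

# TRUE for rows whose first column starts with '#', after optional whitespace
starts_with_hash <- grepl("^[[:space:]]*#", df$ID_REF)

# Keep the other rows and renumber them from 1
df_clean <- df[!starts_with_hash, , drop = FALSE]
rownames(df_clean) <- NULL
print(df_clean)
```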

1 Answer:

Answer 0 (score: 1)

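In some of the lines the comment character (#) is not the first character. One way is to strip leading and trailing spaces, drop every line that contains a # with grepl (giving lines2), and then read the remaining lines with read.csv:
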
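lines <- readLines('awaited.csv')
# strip leading and trailing spaces from every line
lines1 <- gsub('^ +| +$', '', lines)
# drop every line that contains '#' (the '^.*#' alternative already covers '^#')
lines2 <- lines1[!grepl('^#|^.*#', lines1)]
# parse the remaining lines as CSV, keeping the column names unchanged
d1 <- read.csv(text=lines2, check.names=FALSE, stringsAsFactors=FALSE)
str(d1)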
#'data.frame':  54682 obs. of  4 variables:
# $ *ID_REF**         : chr  "AFFX-BioB-5_at" "AFFX-BioB-M_at" "AFFX-BioB-3_at" "AFFX-BioC-5_at" ...
# $ VALUE**           : num  758 934 526 2000 2340 ...
# $ ABS_CALL**        : chr  "P" "P" "P" "P" ...
# $ DETECTION P-VALUE*: num  3.9e-04 9.5e-05 9.5e-05 4.4e-05 4.4e-05 4.4e-05 7.0e-05 4.4e-05 4.4e-05 6.0e-05 ...
head(d1,3)
#       *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE*
#1 AFFX-BioB-5_at   757.7          P            3.9e-04
#2 AFFX-BioB-M_at   933.7          P            9.5e-05
#3 AFFX-BioB-3_at   525.6          P            9.5e-05
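
A self-contained sketch of the same idea, run on a few hypothetical lines in place of awaited.csv (assumed to be comma-separated, since the answer reads it with read.csv). This variant drops only the lines whose first non-blank character is #, which follows the question's wording more literally than dropping every line that contains a #.

```r
# Hypothetical stand-in for readLines('awaited.csv'), assuming a
# comma-separated file with indented '#' annotation lines.
lines <- c(
  "#ID_REF = The name of the probe set, blank for control probes",
  "    #VALUE = The signal value calculated by MAS5, normalized",
  "ID_REF,VALUE,ABS_CALL,DETECTION P-VALUE",
  "AFFX-BioB-5_at,757.7,P,0.00039",
  "AFFX-BioB-M_at,933.7,P,0.000095",
  "AFFX-BioB-3_at,525.6,P,0.000095"
)

# Keep only lines whose first non-blank character is not '#'.
# trimws() needs R >= 3.2.0 and startsWith() needs R >= 3.3.0.
keep <- !startsWith(trimws(lines), "#")

# Parse what is left as CSV; check.names = FALSE keeps "DETECTION P-VALUE"
d1 <- read.csv(text = lines[keep], check.names = FALSE,
               stringsAsFactors = FALSE)
str(d1)
```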

Alternatively, after using sub to remove all the other characters before the # (.*) in those lines, you can use the comment.char='#' argument of read.csv:

d2 <- read.csv(text=sub('.*(#.*)', '\\1', lines), check.names=FALSE, stringsAsFactors=FALSE, comment.char='#')
dim(d2)
#[1] 54682     4
head(d2,3)
#       *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE*
#1 AFFX-BioB-5_at   757.7          P            3.9e-04
#2 AFFX-BioB-M_at   933.7          P            9.5e-05
#3 AFFX-BioB-3_at   525.6          P            9.5e-05
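
To try the comment.char approach without the original file, here is a self-contained sketch on a small hypothetical sample; the lines vector stands in for readLines('awaited.csv') and is assumed, not taken from the real file.

```r
# Hypothetical stand-in for readLines('awaited.csv'), assuming a
# comma-separated file whose annotation lines may be indented.
lines <- c(
  "#ID_REF = The name of the probe set, blank for control probes",
  "    #VALUE = The signal value calculated by MAS5, normalized",
  "    #ABS_CALL = The detection value calculated by the MAS5",
  "ID_REF,VALUE,ABS_CALL,DETECTION P-VALUE",
  "AFFX-BioB-5_at,757.7,P,0.00039",
  "AFFX-BioB-M_at,933.7,P,0.000095",
  "AFFX-BioB-3_at,525.6,P,0.000095"
)

# '.*(#.*)' is greedy: on a line containing '#' it keeps the text from the
# last '#' onward, so that line now starts with '#'. Lines without '#'
# are returned unchanged.
moved <- sub(".*(#.*)", "\\1", lines)

# With comment.char = "#", lines that begin with '#' are treated as
# comments and skipped.
d2 <- read.csv(text = moved, check.names = FALSE,
               stringsAsFactors = FALSE, comment.char = "#")
dim(d2)
```

Note that comment.char = "#" also cuts off any data field containing a #, so this approach relies on # appearing only in the annotation lines.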
