I have a data frame and I want to remove all the lines that start with #. Can anyone tell me how to do this? Thanks in advance.
#ID_REF = The name of the probe set, blank for control probes
#VALUE = The signal value calculated by MAS5, normalized
#ABS_CALL = The detection value calculated by the MAS5
#DETECTION P-VALUE = The detection p-value calculated by the MAS5
*ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE*
AFFX-BioB-5_at 757.7 P 0.00039
AFFX-BioB-M_at 933.7 P 0.000095
AFFX-BioB-3_at 525.6 P 0.000095
AFFX-BioC-5_at 1999.5 P 0.000044
AFFX-BioC-3_at 2339.5 P 0.000044
AFFX-BioDn-5_at 4321.3 P 0.000044
AFFX-BioDn-3_at 9229.4 P 0.00007
AFFX-CreX-5_at 21949.9 P 0.000044
AFFX-CreX-3_at 26022.8 P 0.000044
AFFX-DapX-5_at 1171.1 P 0.00006
Answer 0 (score: 1)
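In some of the lines, the comment character (#) is not the first character. One way is to use grepl to remove the lines that contain the comment character (giving "lines2") and then read the result with read.csv: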
lines <- readLines('awaited.csv')
lines1 <- gsub('^ +| +$', '', lines)
lines2 <- lines1[!grepl('^#|^.*#', lines1)]
d1 <- read.csv(text=lines2, check.names=FALSE, stringsAsFactors=FALSE)
str(d1)
#'data.frame': 54682 obs. of 4 variables:
# $ *ID_REF** : chr "AFFX-BioB-5_at" "AFFX-BioB-M_at" "AFFX-BioB-3_at" "AFFX-BioC-5_at" ...
# $ VALUE** : num 758 934 526 2000 2340 ...
# $ ABS_CALL** : chr "P" "P" "P" "P" ...
# $ DETECTION P-VALUE*: num 3.9e-04 9.5e-05 9.5e-05 4.4e-05 4.4e-05 4.4e-05 7.0e-05 4.4e-05 4.4e-05 6.0e-05 ...
head(d1,3)
# *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE*
#1 AFFX-BioB-5_at 757.7 P 3.9e-04
#2 AFFX-BioB-M_at 933.7 P 9.5e-05
#3 AFFX-BioB-3_at 525.6 P 9.5e-05
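The filtering step can be tried without the original file. The following is a minimal sketch on a made-up, comma-separated miniature of the data; the sample lines and the names `sample_lines`, `kept`, and `d_small` are assumptions for illustration, not part of the original answer.

```r
# Hypothetical miniature of the file; the fields are assumed to be comma-separated.
sample_lines <- c(
  "#ID_REF = The name of the probe set",
  "  #VALUE = The signal value calculated by MAS5",
  "ID_REF,VALUE,ABS_CALL,DETECTION P-VALUE",
  "AFFX-BioB-5_at,757.7,P,0.00039",
  "AFFX-BioB-M_at,933.7,P,0.000095"
)

# Keep only the lines that do not contain the comment character anywhere.
kept <- sample_lines[!grepl("#", sample_lines, fixed = TRUE)]

# Parse the remaining lines; check.names = FALSE preserves the header text as-is.
d_small <- read.csv(text = kept, check.names = FALSE, stringsAsFactors = FALSE)
str(d_small)
```

Note that matching `#` anywhere also drops any data line that happens to contain `#` in one of its fields. If only lines whose first non-blank character is `#` should go, a pattern such as `'^\\s*#'` (without `fixed = TRUE`) would be the stricter choice.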
Alternatively, you can use the comment.char='#' argument of read.csv after removing, with sub('.*(#.*)', ...), all the characters that precede # in those lines:
d2 <- read.csv(text=sub('.*(#.*)', '\\1', lines),
check.names=FALSE, stringsAsFactors=FALSE, comment.char='#')
dim(d2)
#[1] 54682 4
head(d2,3)
# *ID_REF** VALUE** ABS_CALL** DETECTION P-VALUE*
#1 AFFX-BioB-5_at 757.7 P 3.9e-04
#2 AFFX-BioB-M_at 933.7 P 9.5e-05
#3 AFFX-BioB-3_at 525.6 P 9.5e-05
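The same idea can be checked on a small in-memory example. This is only a sketch on made-up, comma-separated sample lines; `sample_lines2`, `cleaned`, and `d2_small` are hypothetical names introduced here.

```r
# Hypothetical miniature of the file; the fields are assumed to be comma-separated.
sample_lines2 <- c(
  "#ID_REF = The name of the probe set",
  "  #VALUE = The signal value calculated by MAS5",
  "ID_REF,VALUE,ABS_CALL,DETECTION P-VALUE",
  "AFFX-BioB-5_at,757.7,P,0.00039",
  "AFFX-BioB-M_at,933.7,P,0.000095"
)

# Strip everything that precedes '#', so each comment line begins with '#'.
cleaned <- sub('.*(#.*)', '\\1', sample_lines2)

# read.csv disables comment handling by default (comment.char = ""),
# so it has to be switched on explicitly.
d2_small <- read.csv(text = cleaned, check.names = FALSE,
                     stringsAsFactors = FALSE, comment.char = '#')
dim(d2_small)
```

Because `comment.char` discards everything from `#` to the end of a line, a data line that contains `#` in a field would be lost after the `sub()` step, so this approach suits files where `#` appears only in comment lines.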