I have a text file that looks like this:
"Saved at:19 January 2015, 1:01 PM"
"Course" "Time"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:28 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:27 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:26 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:25 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
The DI Module Exam contains 16 mul..."
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
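When using read.delim, I specify skip=1 and then use the second line as the header. Sometimes a line such as line 11 (which could contain anything) should be skipped during import. I was wondering whether there is a way, preferably in base R, to import only the lines that begin with "EDPY 301 (SEM J4 Wi14)". For reference, this is the code I use to import the text file: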
read.delim("path to the file",header=T,stringsAsFactors=FALSE,strip.white=TRUE,na.strings=c("NA",""),skip=1)
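Thanks,
Answer 0 (score: 1)
I don't know of a way to exclude lines conditionally with read.table, but reading the file with readLines and then using grep or grepl to build a vector of the lines to keep seems to work: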
Lines <- readLines(textConnection('"Saved at:19 January 2015, 1:01 PM"
"Course" "Time"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:28 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:27 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:26 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:25 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
The DI Module Exam contains 16 mul..."
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"'))
good <- grep("^\\\"EDPY", Lines)
inp <- read.table(text=Lines[good], col.names = c("Course","Time" ))
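The pattern string needs three backslashes after the start-of-line anchor (^): the first two produce a single literal backslash in the regular expression, and the third escapes the double quote inside the R string.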
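If the header line should be kept as well, as in the question, the same readLines approach can be wrapped in a small helper. The following is only a sketch: read_filtered and its arguments are hypothetical names, not part of base R, and it assumes the fields are separated by whitespace, as in the sample (for a tab-delimited file, sep = "\t" would need to be passed to read.table). It uses startsWith (base R 3.3.0 or later) for a fixed-string prefix test, so no regex escaping is needed. Writing the regex in single quotes, '^"EDPY', also avoids the backslashes.

```r
# Hypothetical helper (not from the original answer): keep the header line
# plus every line that starts with a given prefix, then parse the result.
read_filtered <- function(lines, prefix, header_line = 2) {
  # Fixed-string prefix test; TRUE for lines beginning with `prefix`
  keep <- startsWith(lines, prefix)
  # Always retain the header line ("Course" "Time" in the sample)
  keep[header_line] <- TRUE
  # Drop anything before the header, such as the "Saved at" line
  keep[seq_len(header_line - 1)] <- FALSE
  read.table(text = lines[keep], header = TRUE, stringsAsFactors = FALSE)
}

# Shortened copy of the sample; in practice: lines <- readLines("path to the file")
lines <- c(
  '"Saved at:19 January 2015, 1:01 PM"',
  '"Course" "Time"',
  '"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:28 PM"',
  '"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"',
  'The DI Module Exam contains 16 mul..."',
  '"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"'
)

inp <- read_filtered(lines, '"EDPY 301 (SEM J4 Wi14)"')
str(inp)
```

With the shortened sample, the stray "The DI Module Exam..." line and the "Saved at" line are dropped, and the remaining three rows are parsed with Course and Time as column names.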