Importing specific lines of a text file while keeping the header

Asked: 2015-01-19 21:27:32

Tags: r text import

I have a text file that looks like this:

"Saved at:19 January 2015, 1:01 PM"
"Course"    "Time"  
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:28 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:27 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:26 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:25 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
The DI Module Exam contains 16 mul..."
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"

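When I use read.delim, I specify skip=1 so that the second line is used as the header. Sometimes a line such as line 11 (which could contain anything) should also be skipped during import. I am wondering whether there is a way, preferably in base R, to:

  1. skip the first line,
  2. use the second line as the header, and
  3. skip lines that do not start with "EDPY 301 (SEM J4 Wi14)".

For reference, this is the code I use to import the text file: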

    read.delim("path to the file",header=T,stringsAsFactors=FALSE,strip.white=TRUE,na.strings=c("NA",""),skip=1)
    

Thanks,

1 answer:

Answer 0 (score: 1)

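I don't know of a way to exclude lines conditionally with read.table itself, but reading the file with readLines and then building a vector of lines to keep with grep or grepl seems to work: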

Lines <- readLines(textConnection('"Saved at:19 January 2015, 1:01 PM"
"Course"    "Time"  
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:28 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:27 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:26 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:25 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
The DI Module Exam contains 16 mul..."
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"'))

good <- grep("^\\\"EDPY", Lines)
inp <- read.table(text=Lines[good], col.names = c("Course","Time" ))

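The pattern string has three backslashes after the start-of-line anchor: the first two produce a literal backslash in the regular expression, and the third escapes the double quote inside the R string literal. The resulting regex is ^\"EDPY, in which \" matches a plain double quote, so the simpler pattern '^"EDPY' should select the same lines.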
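The code above hard-codes the column names rather than taking them from the second line of the file. A minimal sketch of one way to cover all three steps from the question is shown below. The helper name `read_filtered` and its arguments are hypothetical and not part of the original answer. It assumes the fields are quoted and separated by whitespace or tabs with no empty fields, as in the sample; for tab-separated files with empty fields, `sep = "\t"` would probably be needed.

```r
# Hypothetical helper (not from the original answer): skip line 1, use line 2
# as the header, and keep only data lines that start with a quoted prefix.
read_filtered <- function(path, prefix) {
  lines  <- readLines(path)
  header <- lines[2]          # second line holds the column names
  body   <- lines[-(1:2)]     # everything after the header
  # startsWith() compares literally, so the parentheses in the prefix
  # need no regex escaping.
  keep <- startsWith(body, paste0('"', prefix))
  read.table(text = c(header, body[keep]), header = TRUE,
             stringsAsFactors = FALSE, strip.white = TRUE,
             na.strings = c("NA", ""))
}

# Usage on a small temporary file that mimics the sample data.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  '"Saved at:19 January 2015, 1:01 PM"',
  '"Course"    "Time"  ',
  '"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:28 PM"',
  'The DI Module Exam contains 16 mul..."',
  '"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"'
), tmp)
inp <- read_filtered(tmp, "EDPY 301 (SEM J4 Wi14)")
str(inp)
```

Using `startsWith` (available in base R from version 3.3.0) instead of `grep` avoids the escaping question entirely, at the cost of only supporting a fixed prefix rather than a pattern.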