Question

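I have a text file with tens of thousands of lines, interspersed with timestamps such as 2010 5 3 0 0. The timestamps do not appear at consistent positions, but the two-column data rows are consistent.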

How can I import the 2 columns (Trial and the number) while ignoring the lines that contain these timestamps?

a <- read.table('test.txt')

Currently, I get this error:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 5 did not have 2 elements

Data

 Trial  0.214526266019124
 Trial  0.213914388985549
 Trial  0.213886659329060
 Trial  0.213886587273578
2010  5  3  0  0
 Trial  0.213886587273578
 Trial  0.213256610071994
 Trial  0.213232963405967
 Trial  0.213232928149832
2011  2  3  0  0
 Trial  0.213886587273578
 Trial  0.213256610071994
 Trial  0.213232963405967
 Trial  0.213232928149832
 Trial  0.213886587273578
 Trial  0.213256610071994
 Trial  0.213232963405967
2011  2  6  0  0

Answer 1

You can combine read.table (or a similar function) with grep:

read.table(text=grep("Trial", readLines(path_to_your_file), value=TRUE))

Does this solve your problem?
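
To make the idea concrete, here is a minimal self-contained sketch of the same grep filter. It assumes the lines are already in memory (the `lines` vector stands in for readLines("test.txt")), and the column names `label` and `value` are chosen here for illustration only. The pattern is anchored so that only lines beginning with Trial, after optional whitespace, are kept.

```r
# A few lines mirroring the question's data; in practice: lines <- readLines("test.txt")
lines <- c(
  " Trial  0.214526266019124",
  " Trial  0.213914388985549",
  "2010  5  3  0  0",
  " Trial  0.213886587273578",
  "2011  2  3  0  0"
)

# Keep only lines that start with optional whitespace followed by "Trial"
trial_lines <- grep("^[[:space:]]*Trial", lines, value = TRUE)

# Parse the surviving lines as a two-column table
# (column names are illustrative, not from the original answer)
a <- read.table(text = trial_lines, col.names = c("label", "value"))
print(a)
```

Anchoring the pattern is optional for this data, but it avoids matching a line that merely contains the word Trial somewhere in the middle.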

Answer 2

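If you have perl, you can use it to clean the data and capture the output with pipe, without leaving R. Having to escape the regular expression and the quotes inside a perl "one-liner" makes it a bit awkward, so it would probably be better as a script of its own.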

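The perl pipeline here may be more complicated than you need. perl -lne 'print $1 if m/Trial (.*)/' might be enough. The version below captures each timestamp and appends it to every following line until the next timestamp is found. \W+ matches one or more non-word characters (here, the whitespace after Trial), but it needs an extra backslash to get through R's parser and reach perl: \\W+. \" keeps R from thinking the string we are giving it has ended, while still allowing string delimiters inside perl (you could use qq(..) in perl instead of "...").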

a <- read.table(
   pipe("perl -lne  '
        BEGIN{$ts=\"0 0 0 0 0\"} 
        chomp; 
        if(/Trial\\W+(.*)/){ 
           print \"$1 $ts\" 
       } else {
         $ts=$_
      }' test.txt"))

For the sample data, the first rows of the output are

         V1   V2 V3 V4 V5 V6
1 0.2145263    0  0  0  0  0
2 0.2139144    0  0  0  0  0
3 0.2138867    0  0  0  0  0
4 0.2138866    0  0  0  0  0
5 0.2138866 2010  5  3  0  0
6 0.2132566 2010  5  3  0  0
7 0.2132330 2010  5  3  0  0
8 0.2132329 2010  5  3  0  0
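
If perl is not available, the same carry-forward logic can be sketched in plain R. This is only an illustration under two assumptions: the lines are already in memory (the `lines` vector stands in for readLines("test.txt")), and every line that does not start with Trial is a timestamp. The variable names are hypothetical and the placeholder "0 0 0 0 0" mirrors the BEGIN block of the perl pipeline above.

```r
# A few lines mirroring the question's data; in practice: lines <- readLines("test.txt")
lines <- c(
  " Trial  0.214526266019124",
  " Trial  0.213914388985549",
  "2010  5  3  0  0",
  " Trial  0.213886587273578",
  " Trial  0.213256610071994",
  "2011  2  3  0  0",
  " Trial  0.213886587273578"
)

# TRUE for data rows, FALSE for timestamp rows (assumption: nothing else occurs)
is_trial <- grepl("^[[:space:]]*Trial", lines)

# Number of timestamp lines seen so far at each position (0 = none yet)
ts_index <- cumsum(!is_trial)

# Placeholder for rows before the first timestamp, followed by the timestamps
stamps <- c("0 0 0 0 0", trimws(lines[!is_trial]))

# Drop the "Trial" label, then append the timestamp currently in effect
values <- sub("^[[:space:]]*Trial[[:space:]]+", "", lines[is_trial])
a <- read.table(text = paste(values, stamps[ts_index[is_trial] + 1]))
print(a)
```

As in the perl version, each data row carries the most recent timestamp that precedes it, and rows before the first timestamp get zeros.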

How can I ignore lines when reading a text file in R?
