Question

How can we read a data file in R while ignoring the metadata at the start of the file?

In the sample file below, we want to read from the line

1446.60     35785.0

to the end of the file.

Sample excerpt

Axis    Energy  Elements=   226

...

    Etch Time (EtchTime)\s  0.000000    
    Etch Level (EtchLevel)\ 0.000000    
Energy (E)  
eV  
1446.60     35785.0 
1446.80     34955.9 
1447.00     34448.0 
1447.20     33632.6 
1447.40     32905.1 
1447.60     31976.5 

...

In addition, the values in both columns are followed by whitespace. How do we get rid of it? Using strip.white=T does not seem to help:

read.table('myFile', sep = '\t', header = F, strip.white = T)

gives

        V1 V2          V3 V4
1   1446.6 NA  35785.0000 NA
2   1446.8 NA  34955.9000 NA
3   1447.0 NA  34448.0000 NA
4   1447.2 NA  33632.6000 NA
5   1447.4 NA  32905.1000 NA

Answer 1

You can pipe the file through awk or sed so that reading starts at the lines that begin with numbers (on Linux).

 read.table(pipe("awk '/^\\s*(-?[0-9]+(\\.[0-9]*)?\\s*)+$/ {print $0}' Nyxynyx.txt"),
         header=FALSE)
 #     V1      V2
 #1 1446.6 35785.0
 #2 1446.8 34955.9
 #3 1447.0 34448.0
 #4 1447.2 33632.6
 #5 1447.4 32905.1
 #6 1447.6 31976.5

Note: Nyxynyx.txt is the data file.
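If awk is not available (for example on Windows), the same filtering idea can be sketched in plain R: read all lines, keep only those made up entirely of numbers, and hand them to read.table. The helper name read_numeric_block and the demo file contents are made up for illustration and are not part of the original answer. Leaving sep at its default (any whitespace) instead of '\t' should also take care of the trailing whitespace, since read.table then treats runs of spaces and tabs as a single separator.

```r
# Hypothetical helper: keep only the lines that consist purely of numeric fields,
# then parse them with read.table using the default whitespace separator.
read_numeric_block <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Same idea as the awk pattern: one or more numbers separated by whitespace
  pattern <- "^[[:space:]]*(-?[0-9]+(\\.[0-9]*)?[[:space:]]*)+$"
  is_data <- grepl(pattern, lines)
  read.table(text = lines[is_data], header = FALSE)
}

# Small demo on a temporary file that imitates the layout in the question
demo_file <- tempfile(fileext = ".txt")
writeLines(c("Axis\tEnergy\tElements=\t226",
             "Energy (E)\t",
             "eV\t",
             "1446.60\t35785.0\t",
             "1446.80\t34955.9\t",
             "1447.00\t34448.0\t"),
           demo_file)
demo_df <- read_numeric_block(demo_file)
print(demo_df)
```

This assumes every data line contains only plain decimal numbers. Values in scientific notation (such as 1e5) would need a wider pattern, and a metadata line that happens to contain nothing but numbers would be kept as data.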
