Can the blank gaps between data values be parsed as NA?

Time: 2013-12-01 07:53:46

Tags: r

I have a data file at c:/workspace/test.txt (it is not a fixed-width format file).

           24.9444 24.7500 24.5555 24.3389 24.0389 23.7667  NA    
24.7500 25.0167          25.6800 26.3055 24  26.1833 25      
25.5778 25.6000 25.5167 25.3944 25.1889 24.9389 24.6778 24.3833 24.0944 23.8    
25.5778 25.6000 25.5167         

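I read it with read.table and got the result below. The blank gaps are collapsed, so the values after each gap shift left.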

> read.table("c:/workspace/test.txt",blank.lines.skip=FALSE, col.names=paste("x",1:10,sep=""),sep="") ->xx    
> xx    
       x1      x2      x3      x4      x5      x6      x7      x8      x9  x10
1 24.9444 24.7500 24.5555 24.3389 24.0389 23.7667      NA      NA      NA   NA
2 24.7500 25.0167 25.6800 26.3055 24.0000 26.1833 25.0000      NA      NA   NA
3 25.5778 25.6000 25.5167 25.3944 25.1889 24.9389 24.6778 24.3833 24.0944 23.8
4 25.5778 25.6000 25.5167      NA      NA      NA      NA      NA      NA   NA  

Can I parse the data file into the following instead, with each blank gap read as NA?

         x1      x2      x3      x4      x5      x6      x7      x8    x9  x10
1 NA      24.9444 24.7500 24.5555 24.3389 24.0389 23.7667      NA      NA   NA 
2 24.7500 25.0167 NA      25.6800 26.3055 24.0000 26.1833 25.0000      NA   NA
3 25.5778 25.6000 25.5167 25.3944 25.1889 24.9389 24.6778 24.3833 24.0944 23.8
4 25.5778 25.6000 25.5167      NA      NA      NA      NA      NA      NA   NA 

1 answer:

Answer 0: (score: 0)

This replaces every run of 7 whitespace characters with NA and then reads the text normally:

 txt <- 
"           24.9444 24.7500 24.5555 24.3389 24.0389 23.7667  NA    
 24.7500 25.0167          25.6800 26.3055 24  26.1833 25      
 25.5778 25.6000 25.5167 25.3944 25.1889 24.9389 24.6778 24.3833 24.0944 23.8    
 25.5778 25.6000 25.5167         "
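 # Split the text into lines
 ttt <- readLines(textConnection(txt))
 # Turn each run of 7 whitespace characters into " NA "; fill = TRUE pads short rows with NA
 read.table(text = gsub("\\s{7}", " NA ", ttt), fill = TRUE)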

       V1      V2      V3      V4      V5      V6      V7      V8
1      NA 24.9444 24.7500 24.5555 24.3389 24.0389 23.7667      NA
2 24.7500 25.0167      NA 25.6800 26.3055 24.0000 26.1833 25.0000
3 25.5778 25.6000 25.5167 25.3944 25.1889 24.9389 24.6778 24.3833
4 25.5778 25.6000 25.5167      NA      NA      NA      NA      NA
       V9  V10
1      NA   NA
2      NA   NA
3 24.0944 23.8
4      NA   NA
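
The same idea can be combined with the column names used in the question. The following is only a minimal sketch: it embeds the sample lines instead of reading c:/workspace/test.txt, and the names `lines` and `marked` are illustrative, not from the original post. It assumes that every missing field leaves a gap of at least 7 and fewer than 14 whitespace characters, which holds for the sample data but may not hold for other files.

```r
# Sample data from the question, embedded so the sketch is self-contained.
# For the real file, readLines("c:/workspace/test.txt") could be used instead.
lines <- c(
  "           24.9444 24.7500 24.5555 24.3389 24.0389 23.7667  NA    ",
  "24.7500 25.0167          25.6800 26.3055 24  26.1833 25      ",
  "25.5778 25.6000 25.5167 25.3944 25.1889 24.9389 24.6778 24.3833 24.0944 23.8    ",
  "25.5778 25.6000 25.5167         "
)

# Mark each run of 7 whitespace characters as a missing field.
marked <- gsub("\\s{7}", " NA ", lines)

# Supplying all 10 column names fixes the column count;
# fill = TRUE pads rows that have fewer fields with NA.
xx <- read.table(text = marked, fill = TRUE,
                 col.names = paste("x", 1:10, sep = ""))
print(xx)
```

Naming the columns up front gives x1 to x10 rather than V1 to V10. If a file has gaps of other widths, the pattern `\\s{7}` would need adjusting, or a fixed-width reader may be a better fit.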