How to read a text file in R where the format changes every few lines

Asked: 2014-05-30 22:32:54

Tags: r

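Can anyone advise me on how to read a text file with this kind of content into a table? Eventually I want to plot the table. I tried several variations of averages <- read.table("name.txt", header = T, sep = "") and got this message: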

Error in read.table("name.txt", header = T, sep = "") :   more columns than column names

I need to keep the [[1]], [[2]], [[3]], [[4]] results and tell them apart in a table or data frame so that I can plot them later. Any help is much appreciated.

> dput(ll)
c("[[1]]", "Time Queue_Size", "1  0.00000000          0", "2  0.01463509          1", 
"3  0.18473331          0", "4  0.70555473          1", "5  1.10322362          0", 
"6  2.24972346          1", "7  2.32344665          0", "8  3.30621739          1", 
"9  3.37728921          2", "10 3.47074298          1", "11 3.75560929          0", 
"12 4.56816309          1", "", "[[2]]", "        Time Queue_Size", 
"1  0.0000000          0", "2  0.1080389          1", "3  0.5729134          0", 
"4  1.0917759          1", "5  1.1280721          0", "6  1.3647759          1", 
"7  1.9137004          0", "8  3.1164888          1", "9  3.1500754          0", 
"10 3.2951701          1", "11 3.9362245          0", "12 4.7629641          1", 
"", "[[3]]", "        Time Queue_Size", "1  0.0000000          0", 
"2  0.2151396          1", "3  0.5810463          0", "4  1.2669130          1", 
"5  1.2694239          0", "6  1.2890854          1", "7  1.7050347          0", 
"8  2.3904563          1", "9  2.6800687          2", "10 2.7654936          3", 
"11 2.9624973          4", "12 2.9652142          3", "13 3.0096070          4", 
"14 3.1811061          3", "15 3.5783809          2", "16 3.6793138          1", 
"17 3.9339087          0", "18 4.5799301          1", "", "[[4]]", 
"        Time Queue_Size", "1  0.0000000          0", "2  0.1200693          1", 
"3  0.3663455          2", "4  0.5931517          1", "5  0.8235883          2", 
"6  0.8590099          1", "7  0.9474114          0", "8  1.1327633          1", 
"9  1.2933192          0", "10 1.8779916          1", "11 2.2328193          0", 
"12 2.7430489          1", "13 2.8380578          2", "14 2.8465716          3", 
"15 3.0760839          4", "16 3.4489915          5", "17 3.8352777          4", 
"18 4.2612698          5")

which looks like

[[1]]
Time Queue_Size
1  0.00000000          0
2  0.01463509          1
3  0.18473331          0
4  0.70555473          1
5  1.10322362          0
6  2.24972346          1
7  2.32344665          0
8  3.30621739          1
9  3.37728921          2
10 3.47074298          1
11 3.75560929          0
12 4.56816309          1

[[2]]
        Time Queue_Size
1  0.0000000          0
2  0.1080389          1
3  0.5729134          0
4  1.0917759          1
5  1.1280721          0
6  1.3647759          1
7  1.9137004          0
8  3.1164888          1
9  3.1500754          0
10 3.2951701          1
11 3.9362245          0
12 4.7629641          1

[[3]]
        Time Queue_Size
1  0.0000000          0
2  0.2151396          1
3  0.5810463          0
4  1.2669130          1
5  1.2694239          0
6  1.2890854          1
7  1.7050347          0
8  2.3904563          1
9  2.6800687          2
10 2.7654936          3
11 2.9624973          4
12 2.9652142          3
13 3.0096070          4
14 3.1811061          3
15 3.5783809          2
16 3.6793138          1
17 3.9339087          0
18 4.5799301          1

[[4]]
        Time Queue_Size
1  0.0000000          0
2  0.1200693          1
3  0.3663455          2
4  0.5931517          1
5  0.8235883          2
6  0.8590099          1
7  0.9474114          0
8  1.1327633          1
9  1.2933192          0
10 1.8779916          1
11 2.2328193          0
12 2.7430489          1
13 2.8380578          2
14 2.8465716          3
15 3.0760839          4
16 3.4489915          5
17 3.8352777          4
18 4.2612698          5

1 answer:

Answer 0 (score: 4)

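You can use readLines to read your file. Then you can split the lines into sections by matching the [[ marker that starts each one. Finally, you read each section with read.table(text=…):
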
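# Read the raw lines, start a new group at every "[[n]]" marker, then parse each group.
ll <- readLines('filename.txt')
lapply(split(ll, cumsum(grepl('[[', ll, fixed = TRUE))),
       function(x) read.table(text = x[-1], header = TRUE))  # x[-1] drops the marker line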
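
Since the question asks to keep the [[1]] to [[4]] results distinguishable for plotting, one possible follow-up is to tag each parsed table with its section number and stack them into a single data frame. The sketch below is only an illustration: it uses a shortened copy of the data instead of readLines('filename.txt'), and the column name run is an assumption, not something from the original post.

```r
# Shortened stand-in for readLines('filename.txt'); values copied from the question.
ll <- c("[[1]]", "Time Queue_Size",
        "1  0.00000000          0", "2  0.01463509          1", "",
        "[[2]]", "        Time Queue_Size",
        "1  0.0000000          0", "2  0.1080389          1")

# One group id per line; the id increases at every "[[n]]" marker.
grp <- cumsum(grepl("[[", ll, fixed = TRUE))

# Parse each section without its marker line (blank lines are skipped by default).
tables <- lapply(split(ll, grp),
                 function(x) read.table(text = x[-1], header = TRUE))

# Add the section number as a column (hypothetical name `run`), then stack the tables.
tagged <- Map(function(d, id) { d$run <- as.integer(id); d },
              tables, names(tables))
combined <- do.call(rbind, tagged)
```

From there, something like plot(Queue_Size ~ Time, data = subset(combined, run == 1), type = "s") should draw one section as a step plot, though the best plotting approach depends on what you want to show.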

EDIT: an explanation of the idea

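Using cumsum is a classic way to turn a logical vector into an integer vector of group ids (used here to split the lines). grepl creates a logical vector that is TRUE on each marker line; cumsum increases by one at every TRUE, so all lines of a section share the same id, and split uses those ids as its grouping vector. An example: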

cumsum(c(T,F,F,T,F))
[1] 1 1 1 2 2

Then you can use the result to split a vector:

split(1:5,cumsum(c(T,F,F,T,F)))

$`1`
[1] 1 2 3

$`2`
[1] 4 5
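
To connect this back to the file, here is the same idea on a tiny character vector that mimics the marker lines. The vector x is made up for illustration. Note that fixed = TRUE makes grepl look for the literal text [[ rather than treating [ as a special regular-expression character.

```r
# Made-up lines: two sections, each introduced by a "[[n]]" marker.
x <- c("[[1]]", "a", "b", "[[2]]", "c")

# TRUE on the marker lines only: TRUE FALSE FALSE TRUE FALSE
is_marker <- grepl("[[", x, fixed = TRUE)

# Running count of markers seen so far, i.e. the section id of each line: 1 1 1 2 2
grp <- cumsum(is_marker)

# A list with one element per section; each still begins with its marker line.
parts <- split(x, grp)
```

Each element of parts still starts with its [[n]] line, which is why the answer drops it with x[-1] before calling read.table.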