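I am trying to import a large text file into R. The file contains many stray line breaks inside records, as shown below. How can I restore the data to its original one-record-per-line format? Note that the delimiter here is ~~.
This is what it looks like: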
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~53~~66014~~WB19~~2011~~Q3~~13~~3~~0~~61965~~0~~1098~~323~~775~~~~~~~~18428.79781420765~~43536.202185792346~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~11018~~WB19~~2011~~Q1~~5~~1~~0~~6045~~0~~366~~315~~51~~~~~~~~5202.6639344262294~~842.3360655737705~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~11018~~WB19~~2011~~Q3~~4~~1~~0~~6195~~0~~366~~167~~199~~~~~~~~2826.6803278688526~~3368.3196721311474~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~6027~~WB19~~2011~~Q2~~14~~1~~0~~6195~~0~~366~~184~~182~~~~~~~~3114.4262295081967~~3080.5737704918033~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~6027~~WB19~~2011~~Q3~~7~~1~~0~~6195~~0~~366~~183~~183~~~~~~~~3097.5~~3097.5~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~6027~~WB19~~2011~~Q4~~14~~1~~0~~6195~~0~~366~~87~~279~~~~~~~~1472.5819672131147~~4722.4180327868853~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~66014~~WB19~~2011~~Q1~~14~~1~~0~~6045~~0~~366~~287~~79~~~~~~~~4740.2049180327867~~1304.795081967213~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~66014~~WB19~~2011~~Q1~~9~~2~~0~~9800~~0~~732~~629~~103~~~~~~~~8198.920765027322~~1601.0792349726776~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54~~10016~~WB19~~2011~~Q4~~11~~1~~0~~8285~~0~~366~~74~~292~~~~~~~~1675.1092896174864~~6609.890710382514~~~~~~~~0~~0~~~~~~
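Answer 0 (score: 1)
The file has stray line breaks in the middle of otherwise complete data rows, so the first step is to join lines back together depending on whether a line starts with "~~". I do this iteratively with Reduce(), pasting each line onto the accumulated result. That yields a character vector of length 1, which I assign to text.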
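# Append continuation lines (those starting with "~~") directly to the
# accumulated string; any other line starts a new row after "\n".
text <- Reduce(function(x, y)
  if (grepl("^~~", y)) paste0(x, y) else paste(x, y, sep = "\n"),
  readLines("test.txt"))
# Swap the two-character delimiter for a comma, then parse the string.
data <- read.table(text = gsub("~~", ",", text), sep = ",")
data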
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31
# 1 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 53 66014 WB19 2011 Q3 13 3 0 61965 0 1098 323 775 NA NA NA 18428.798 43536.2022 NA NA NA 0 0 NA NA NA
# 2 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 54 11018 WB19 2011 Q1 5 1 0 6045 0 366 315 51 NA NA NA 5202.664 842.3361 NA NA NA 0 0 NA NA NA
# 3 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 54 11018 WB19 2011 Q3 4 1 0 6195 0 366 167 199 NA NA NA 2826.680 3368.3197 NA NA NA 0 0 NA NA NA
# 4 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 54 6027 WB19 2011 Q2 14 1 0 6195 0 366 184 182 NA NA NA 3114.426 3080.5738 NA NA NA 0 0 NA NA NA
# 5 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 54 6027 WB19 2011 Q3 7 1 0 6195 0 366 183 183 NA NA NA 3097.500 3097.5000 NA NA NA 0 0 NA NA NA
# 6 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 54 6027 WB19 2011 Q4 14 1 0 6195 0 366 87 279 NA NA NA 1472.582 4722.4180 NA NA NA 0 0 NA NA NA
# 7 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 54 66014 WB19 2011 Q1 14 1 0 6045 0 366 287 79 NA NA NA 4740.205 1304.7951 NA NA NA 0 0 NA NA NA
# 8 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 54 66014 WB19 2011 Q1 9 2 0 9800 0 732 629 103 NA NA NA 8198.921 1601.0792 NA NA NA 0 0 NA NA NA
# 9 PSU WEST BENGAL SOUTH 24 PARGANAS 1 21 54 10016 WB19 2011 Q4 11 1 0 8285 0 366 74 292 NA NA NA 1675.109 6609.8907 NA NA NA 0 0 NA NA NA
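As a possible alternative to the Reduce() loop, the same joining rule (a line starting with "~~" continues the previous record) can be sketched in a vectorized way with cumsum() and split(). This is only a sketch under assumptions: the shortened sample rows and the temporary file below are made up for illustration and stand in for test.txt, and the delimiter is swapped for a tab instead of a comma on the assumption that tabs never occur in the data.

```r
# Hypothetical sample: a shortened version of the rows in the question,
# written to a temporary file that stands in for test.txt.
sample_lines <- c(
  "PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~53~~66014~~WB19",
  "PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54",
  "~~11018~~WB19"
)
path <- tempfile(fileext = ".txt")
writeLines(sample_lines, path)

lines <- readLines(path)

# A line that does NOT start with "~~" begins a new record, so the running
# count of such lines gives each line the id of the record it belongs to.
record_id <- cumsum(!grepl("^~~", lines))

# Glue the pieces of each record back into one string.
records <- vapply(split(lines, record_id), paste0, character(1), collapse = "")

# Replace the two-character delimiter with a tab and parse.
# quote = "" and comment.char = "" keep quotes and "#" in the data literal.
data <- read.table(text = gsub("~~", "\t", records, fixed = TRUE),
                   sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)
data
```

Because the grouping is computed on the whole vector at once, this avoids growing one long string step by step, which may matter for a large file.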
Answer 1 (score: 0)
I am not sure whether the following is what you are after.
Assume the file containing the example data is named dat.txt.
fileName <- 'dat.txt'
writeLines(gsub("~~","\n",readChar(fileName, file.info(fileName)$size)))
(Output in your console:)
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
53
66014
WB19
2011
Q3
13
3
0
61965
0
1098
323
775
18428.79781420765
43536.202185792346
0
0
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
11018
WB19
2011
Q1
5
1
0
6045
0
366
315
51
5202.6639344262294
842.3360655737705
0
0
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
11018
WB19
2011
Q3
4
1
0
6195
0
366
167
199
2826.6803278688526
3368.3196721311474
0
0
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
6027
WB19
2011
Q2
14
1
0
6195
0
366
184
182
3114.4262295081967
3080.5737704918033
0
0
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
6027
WB19
2011
Q3
7
1
0
6195
0
366
183
183
3097.5
3097.5
0
0
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
6027
WB19
2011
Q4
14
1
0
6195
0
366
87
279
1472.5819672131147
4722.4180327868853
0
0
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
66014
WB19
2011
Q1
14
1
0
6045
0
366
287
79
4740.2049180327867
1304.795081967213
0
0
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
66014
WB19
2011
Q1
9
2
0
9800
0
732
629
103
8198.920765027322
1601.0792349726776
0
0
PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
10016
WB19
2011
Q4
11
1
0
8285
0
366
74
292
1675.1092896174864
6609.890710382514
0
0
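If the goal is a data frame rather than one field per line, the same whole-file read could be used to delete only the line breaks that sit directly in front of "~~" and then parse the result. The following is a minimal sketch under assumptions: the three sample lines and the temporary file are hypothetical stand-ins for dat.txt, and a tab is assumed never to occur in the data.

```r
# Hypothetical sample with one record broken across two lines,
# written to a temporary file that stands in for dat.txt.
path <- tempfile(fileext = ".txt")
writeLines(c("PSU~~1~~21~~53~~66014",
             "PSU~~1~~21~~54",
             "~~11018"), path)

# Read the whole file as a single string, as in the answer above.
raw_text <- readChar(path, file.info(path)$size)

# Normalize Windows line endings so the next step sees plain "\n".
raw_text <- gsub("\r\n", "\n", raw_text, fixed = TRUE)

# A line break immediately followed by "~~" is a stray break: remove it.
fixed_text <- gsub("\n~~", "~~", raw_text, fixed = TRUE)

# Swap the delimiter for a tab and parse the repaired text.
data <- read.table(text = gsub("~~", "\t", fixed_text, fixed = TRUE),
                   sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)
data
```

Line breaks that are not followed by "~~" are left alone, so each repaired record still ends up on its own line.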