Line breaks in a delimited file

Date: 2019-11-27 07:53:37

Tags: r text

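I am trying to import a large text file into R. The file contains many stray line breaks in the middle of records, as shown below. How can I restore the data to its original format, with one record per line? Note that the delimiter here is ~~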

This is what it looks like:


PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~53~~66014~~WB19~~2011~~Q3~~13~~3~~0~~61965~~0~~1098~~323~~775~~~~~~~~18428.79781420765~~43536.202185792346~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~11018~~WB19~~2011~~Q1~~5~~1~~0~~6045~~0~~366~~315~~51~~~~~~~~5202.6639344262294~~842.3360655737705~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~11018~~WB19~~2011~~Q3~~4~~1~~0~~6195~~0~~366~~167~~199~~~~~~~~2826.6803278688526~~3368.3196721311474~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~6027~~WB19~~2011~~Q2~~14~~1~~0~~6195~~0~~366~~184~~182~~~~~~~~3114.4262295081967~~3080.5737704918033~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~6027~~WB19~~2011~~Q3~~7~~1~~0~~6195~~0~~366~~183~~183~~~~~~~~3097.5~~3097.5~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~6027~~WB19~~2011~~Q4~~14~~1~~0~~6195~~0~~366~~87~~279~~~~~~~~1472.5819672131147~~4722.4180327868853~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~66014~~WB19~~2011~~Q1~~14~~1~~0~~6045~~0~~366~~287~~79~~~~~~~~4740.2049180327867~~1304.795081967213~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54
~~66014~~WB19~~2011~~Q1~~9~~2~~0~~9800~~0~~732~~629~~103~~~~~~~~8198.920765027322~~1601.0792349726776~~~~~~~~0~~0~~~~~~
PSU~~WEST BENGAL~~SOUTH 24 PARGANAS~~1~~21~~54~~10016~~WB19~~2011~~Q4~~11~~1~~0~~8285~~0~~366~~74~~292~~~~~~~~1675.1092896174864~~6609.890710382514~~~~~~~~0~~0~~~~~~

2 answers:

Answer 0: (score: 1)

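The file has line breaks inside what should be complete rows of data, so the first step is to join the lines back together, depending on whether a line starts with "~~". I use Reduce() to paste the lines together one at a time: a line that starts with "~~" is appended directly to the previous line, and any other line starts a new row. The result is a character vector of length 1, which I assign to text.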

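# Walk through the lines: a line starting with "~~" continues the previous
# record, anything else begins a new record (separated by "\n").
text <- Reduce(function(x, y) {
  if (grepl("^~~", y)) paste0(x, y) else paste(x, y, sep = "\n")
}, readLines("test.txt"))

# Turn the "~~" delimiter into a comma and parse the result.
# This assumes that no field contains a comma itself.
data <- read.table(text = gsub("~~", ",", text), sep = ",")
data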

#    V1          V2                V3 V4 V5 V6    V7   V8   V9 V10 V11 V12 V13   V14 V15  V16 V17 V18 V19 V20 V21       V22        V23 V24 V25 V26 V27 V28 V29 V30 V31
# 1 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 53 66014 WB19 2011  Q3  13   3   0 61965   0 1098 323 775  NA  NA  NA 18428.798 43536.2022  NA  NA  NA   0   0  NA  NA  NA
# 2 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 54 11018 WB19 2011  Q1   5   1   0  6045   0  366 315  51  NA  NA  NA  5202.664   842.3361  NA  NA  NA   0   0  NA  NA  NA
# 3 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 54 11018 WB19 2011  Q3   4   1   0  6195   0  366 167 199  NA  NA  NA  2826.680  3368.3197  NA  NA  NA   0   0  NA  NA  NA
# 4 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 54  6027 WB19 2011  Q2  14   1   0  6195   0  366 184 182  NA  NA  NA  3114.426  3080.5738  NA  NA  NA   0   0  NA  NA  NA
# 5 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 54  6027 WB19 2011  Q3   7   1   0  6195   0  366 183 183  NA  NA  NA  3097.500  3097.5000  NA  NA  NA   0   0  NA  NA  NA
# 6 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 54  6027 WB19 2011  Q4  14   1   0  6195   0  366  87 279  NA  NA  NA  1472.582  4722.4180  NA  NA  NA   0   0  NA  NA  NA
# 7 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 54 66014 WB19 2011  Q1  14   1   0  6045   0  366 287  79  NA  NA  NA  4740.205  1304.7951  NA  NA  NA   0   0  NA  NA  NA
# 8 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 54 66014 WB19 2011  Q1   9   2   0  9800   0  732 629 103  NA  NA  NA  8198.921  1601.0792  NA  NA  NA   0   0  NA  NA  NA
# 9 PSU WEST BENGAL SOUTH 24 PARGANAS  1 21 54 10016 WB19 2011  Q4  11   1   0  8285   0  366  74 292  NA  NA  NA  1675.109  6609.8907  NA  NA  NA   0   0  NA  NA  NA
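
As a variation on the approach above, the records can also be split on the literal "~~" with strsplit() instead of converting the delimiter to a comma first. The following is only a minimal sketch: the sample lines are a shortened, hypothetical stand-in for the real file, and it assumes every record ends up with the same number of fields.

```r
# Hypothetical sample: two records, the second one broken before a "~~".
lines <- c(
  "PSU~~WEST BENGAL~~1~~53~~66014~~Q3",
  "PSU~~WEST BENGAL~~1~~54",
  "~~11018~~Q1"
)

# Glue continuation lines (those starting with "~~") onto the previous line.
joined <- Reduce(function(x, y) {
  if (grepl("^~~", y)) paste0(x, y) else paste(x, y, sep = "\n")
}, lines)

# Split into records, then split each record on the literal delimiter.
records <- strsplit(joined, "\n", fixed = TRUE)[[1]]
fields  <- strsplit(records, "~~", fixed = TRUE)

# Bind the records row-wise; assumes all records have equally many fields.
data <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)

# Convert columns that look numeric, leaving the rest as character.
data[] <- lapply(data, type.convert, as.is = TRUE)
data
```

Note that strsplit() drops a trailing empty field, so for records that end in a run of delimiters (as in the question's data) the last empty column would be missing from the result.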

Answer 1: (score: 0)

Not sure whether the following is what you are after.

Assuming the file with the contents from your example is named dat.txt:

fileName <- 'dat.txt'
writeLines(gsub("~~","\n",readChar(fileName, file.info(fileName)$size)))

(On your console, this prints each field on its own line:)

PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
53
66014
WB19
2011
Q3
13
3
0
61965
0
1098
323
775



18428.79781420765
43536.202185792346



0
0



PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54

11018
WB19
2011
Q1
5
1
0
6045
0
366
315
51



5202.6639344262294
842.3360655737705



0
0



PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54

11018
WB19
2011
Q3
4
1
0
6195
0
366
167
199



2826.6803278688526
3368.3196721311474



0
0



PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54

6027
WB19
2011
Q2
14
1
0
6195
0
366
184
182



3114.4262295081967
3080.5737704918033



0
0



PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54

6027
WB19
2011
Q3
7
1
0
6195
0
366
183
183



3097.5
3097.5



0
0



PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54

6027
WB19
2011
Q4
14
1
0
6195
0
366
87
279



1472.5819672131147
4722.4180327868853



0
0



PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54

66014
WB19
2011
Q1
14
1
0
6045
0
366
287
79



4740.2049180327867
1304.795081967213



0
0



PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54

66014
WB19
2011
Q1
9
2
0
9800
0
732
629
103



8198.920765027322
1601.0792349726776



0
0



PSU
WEST BENGAL
SOUTH 24 PARGANAS
1
21
54
10016
WB19
2011
Q4
11
1
0
8285
0
366
74
292



1675.1092896174864
6609.890710382514



0
0
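
The same whole-file read can also be used to rebuild one record per line, rather than one field per line. This is only a sketch under assumptions: the temporary file and its contents are hypothetical stand-ins for dat.txt, and it assumes a stray line break always sits directly before a "~~", as in the sample from the question.

```r
# Hypothetical stand-in for dat.txt: a small sample in a temporary file.
fileName <- tempfile(fileext = ".txt")
writeLines(c("PSU~~WB~~54", "~~11018~~Q1", "PSU~~WB~~53~~66014~~Q3"), fileName)

# Read the whole file as a single string, as in the answer above.
raw <- readChar(fileName, file.info(fileName)$size)

# Normalise Windows line endings, then drop every line break that
# directly precedes the "~~" delimiter.
raw   <- gsub("\r\n", "\n", raw, fixed = TRUE)
fixed <- gsub("\n~~", "~~", raw, fixed = TRUE)

# Each remaining line is now one complete record.
records <- strsplit(fixed, "\n", fixed = TRUE)[[1]]
records
```

From here the records could be parsed the same way as in the first answer, for example by replacing "~~" with a comma and passing the text to read.table().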