How do I select a range of lines from a large text file?

Time: 2012-07-01 15:03:24

Tags: c++ algorithm

I would like to know how to select a range of lines from a text file. For example, I have a text file containing the following lines:

branch 27 : rect id 23400
rect:   -115.475609 -115.474907
    31.393650   31.411301
branch 28 : rect id 23398
rect:   -115.474907 -115.472282
    31.411301   31.417351
branch 29 : rect id 23396
rect:   -115.472282 -115.468033
    31.417351   31.427151
branch 30 : rect id 23394
rect:   -115.468033 -115.458733
    31.427151   31.438181
Non-Leaf Node:  level=1  count=31  address=53
branch 0 : rect id 42
rect:   -115.768539 -106.251556
    31.425039   31.717550
branch 1 : rect id 50
rect:   -109.559479 -106.009361
    31.296721   31.775299
branch 2 : rect id 51
rect:   -110.937401 -106.226143
    31.285870   31.771971
branch 3 : rect id 54
rect:   -109.584412 -106.069092
    31.285240   31.775230
branch 4 : rect id 56
rect:   -109.570961 -106.000954
    31.296721   31.780769
branch 5 : rect id 58
rect:   -115.806213 -106.366188
    31.400450   31.687519
branch 6 : rect id 59
rect:   -113.173859 -106.244057
    31.297440   31.627750
branch 7 : rect id 60
rect:   -115.811478 -106.278252
    31.400450   31.679470
branch 8 : rect id 61
rect:   -109.953888 -106.020111
    31.325319   31.775270
branch 9 : rect id 64
rect:   -113.070969 -106.015968
    31.331841   31.704750
branch 10 : rect id 68
rect:   -113.065689 -107.034576
    31.326300   31.770809
branch 11 : rect id 71
rect:   -112.333344 -106.059860
    31.284081   31.662920
branch 12 : rect id 73
rect:   -115.071083 -106.309677
    31.267879   31.466850
branch 13 : rect id 74
rect:   -116.094414 -106.286308
    31.236290   31.424770
branch 14 : rect id 75
rect:   -115.423264 -106.286308
    31.229691   31.415510
branch 15 : rect id 76
rect:   -116.111656 -106.313110
    31.259390   31.478300
branch 16 : rect id 77
rect:   -116.247467 -106.309677
    31.240231   31.451799
branch 17 : rect id 78
rect:   -116.170792 -106.094543
    31.156429   31.391781
branch 18 : rect id 79
rect:   -116.225723 -106.292709
    31.239960   31.442850
branch 19 : rect id 80
rect:   -116.268013 -105.769913
    31.157240   31.378111
branch 20 : rect id 82
rect:   -116.215424 -105.827202
    31.198441   31.383421
branch 21 : rect id 83
rect:   -116.095734 -105.826439
    31.197460   31.373819
branch 22 : rect id 84
rect:   -115.423264 -105.815018
    31.182640   31.368891
branch 23 : rect id 85
rect:   -116.221527 -105.776512
    31.160931   31.389830
branch 24 : rect id 86
rect:   -116.203369 -106.473831
    31.168350   31.367611
branch 25 : rect id 87
rect:   -115.727631 -106.501587
    31.189100   31.395941
branch 26 : rect id 88
rect:   -116.237289 -105.790756
    31.164780   31.358959
branch 27 : rect id 89
rect:   -115.791344 -105.990044
    31.072620   31.349529
branch 28 : rect id 90
rect:   -115.736847 -106.495079
    31.187969   31.376900
branch 29 : rect id 91
rect:   -115.721710 -106.000130
    31.160351   31.354601
branch 30 : rect id 92
rect:   -115.792236 -106.000793
    31.166620   31.378811
Leaf Node:  level=0  count=21  address=42
branch 0 : rect id 18312
rect:   -106.412270 -106.401367
    31.704750   31.717550
branch 1 : rect id 18288
rect:   -106.278252 -106.253387
    31.520321   31.548361

I only want the lines between the "Non-Leaf Node: level=1" line and the "Leaf Node: level=0" line. The file contains many such segments, and I need all of them.

1 Answer:

Answer 0: (score: 1)

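The simplest approach is to read as much of the file as possible into memory, then scan for the beginning token. Copy or process all the data from there until you find the terminating token. Some platforms provide a function for mapping a file into memory, such as mmap(), although this is not part of standard C++.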
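The scan described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function names (readWholeFile, startsWith, extractSections) are made up for illustration, and it assumes the whole file fits in memory and that each marker appears at the very start of its line, spelled exactly as in the sample data (including the two spaces after "Node:").

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the entire file into one string.
// Returns an empty string if the file cannot be opened.
std::string readWholeFile(const std::string& path) {
    std::ifstream file(path);
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return buffer.str();
}

// Returns true when `line` begins with `prefix`.
bool startsWith(const std::string& line, const std::string& prefix) {
    return line.compare(0, prefix.size(), prefix) == 0;
}

// Scans the in-memory text line by line and collects the lines that lie
// strictly between a line starting with `beginToken` and the next line
// starting with `endToken`. Each begin/end pair yields one section.
// A section that is still open at the end of the text is kept as-is.
std::vector<std::vector<std::string>> extractSections(
        const std::string& text,
        const std::string& beginToken,
        const std::string& endToken) {
    // An empty token would match every line, so reject it up front.
    assert(!beginToken.empty() && !endToken.empty());

    std::vector<std::vector<std::string>> sections;
    std::istringstream in(text);
    std::string line;
    bool inside = false;
    while (std::getline(in, line)) {
        if (!inside) {
            if (startsWith(line, beginToken)) {
                sections.emplace_back();  // start a new, empty section
                inside = true;
            }
        } else if (startsWith(line, endToken)) {
            inside = false;               // terminating token found
        } else {
            sections.back().push_back(line);
        }
    }
    return sections;
}
```

With the sample file, a call such as extractSections(readWholeFile("tree.txt"), "Non-Leaf Node:  level=1", "Leaf Node:  level=0") would return one vector of lines per segment; "tree.txt" is only a placeholder file name. For files too large to load at once, the same loop works directly on a std::ifstream.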

If the file does not change, you can save the offsets of the token lines and reuse them later.
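One way to record those offsets is sketched below. The SectionRange struct and the functions indexSections and readSection are hypothetical names chosen for this example. The sketch assumes the markers start at the beginning of a line, and that a real file would be opened in binary mode so that the values returned by tellg() are plain byte offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this on in-memory text
#include <string>
#include <vector>

// Byte range [begin, end) of one section: `begin` is the offset of the first
// line after the begin marker, `end` is the offset of the end marker line.
struct SectionRange {
    std::streamoff begin;
    std::streamoff end;
};

// One pass over the stream that remembers where each section starts and ends.
std::vector<SectionRange> indexSections(std::istream& in,
                                        const std::string& beginToken,
                                        const std::string& endToken) {
    std::vector<SectionRange> ranges;
    std::string line;
    bool inside = false;
    std::streamoff sectionStart = 0;
    while (true) {
        const std::streamoff lineStart = in.tellg();  // offset of this line
        if (!std::getline(in, line)) break;
        const bool isBegin = line.compare(0, beginToken.size(), beginToken) == 0;
        const bool isEnd = line.compare(0, endToken.size(), endToken) == 0;
        if (!inside && isBegin) {
            inside = true;
            sectionStart = in.tellg();  // first byte after the marker line
        } else if (inside && isEnd) {
            ranges.push_back({sectionStart, lineStart});
            inside = false;
        }
    }
    return ranges;
}

// Later, jump straight to a saved range and read its bytes.
std::string readSection(std::istream& in, const SectionRange& range) {
    assert(range.begin >= 0 && range.begin <= range.end);
    std::string text(static_cast<std::size_t>(range.end - range.begin), '\0');
    in.clear();  // drop the eof/fail state left behind by the indexing pass
    in.seekg(range.begin, std::ios::beg);
    in.read(&text[0], static_cast<std::streamsize>(text.size()));
    text.resize(static_cast<std::size_t>(in.gcount()));
    return text;
}
```

The ranges could also be written to a small side file, so that later runs skip the scan entirely. That only remains valid as long as the data file is not modified.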

If you really need to index by line number, create a std::map<line number, offset> variable. Read the file line by line, storing each line's number and offset as you go.
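A minimal sketch of such a line index follows. The names buildLineIndex and readLineAt are invented for illustration, and line numbers are assumed to start at 1. As with any offset-based approach, a real file would best be opened in binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, handy for trying this on in-memory text
#include <string>

// Maps each 1-based line number to the offset at which that line starts.
std::map<std::size_t, std::streamoff> buildLineIndex(std::istream& in) {
    std::map<std::size_t, std::streamoff> index;
    std::string line;
    std::size_t lineNumber = 1;
    while (true) {
        const std::streamoff offset = in.tellg();  // where this line begins
        if (!std::getline(in, line)) break;
        index[lineNumber++] = offset;
    }
    return index;
}

// Seeks to the stored offset and reads that single line into `out`.
// Returns false if the line number is not in the index or the read fails.
bool readLineAt(std::istream& in,
                const std::map<std::size_t, std::streamoff>& index,
                std::size_t lineNumber,
                std::string& out) {
    assert(lineNumber >= 1);  // line numbers are 1-based in this sketch
    const auto it = index.find(lineNumber);
    if (it == index.end()) return false;
    in.clear();  // drop the eof/fail state left behind by the indexing pass
    in.seekg(it->second, std::ios::beg);
    return static_cast<bool>(std::getline(in, out));
}
```

Because the line numbers are consecutive, a std::vector<std::streamoff> indexed by line number would hold the same information in less memory. The std::map version is shown only because it is what the answer suggests.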