Part 2 of a successful result on filling white-space

Time: 2017-10-27 13:25:21

Tags: python duplicates

So, my first question was answered correctly. For reference, you can find it here...

How to fill the white-space with info while leaving the rest unchanged?

In short, I needed this...

POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0


POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0

to become this...

BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
POLYGON_POINT -79.750000000217,42.017498354525,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
POLYGON_POINT -79.750000000217,42.085882815878,0
END_POLY

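That was accomplished successfully with a Python script. Now I find that I need to remove the duplicated line, specifically the last point line of each block. That line closes the polygon, but the build batch produces an error because the build tool closes its own polygons. Basically, I need it to end up like this...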

BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.017498354525,0
POLYGON_POINT -79.750000000217,42.016478251402,0
POLYGON_POINT -79.750598748133,42.017193264943,0
END_POLY
BEGIN_POLYGON
POLYGON_POINT -79.750000000217,42.085882815878,0
POLYGON_POINT -79.750000000217,42.082008734634,0
POLYGON_POINT -79.751045507507,42.082126409633,0
POLYGON_POINT -79.750281907508,42.083166574215,0
POLYGON_POINT -79.750781149174,42.084212672130,0
END_POLY

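And there are 3,415,978 lines to go through. Every other duplicate remover I have tried strips out the white-space and all of the wording too. Hmm.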

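3 Answers:

Answer 0 (score: 0)

As pointed out in the comments, keep a reference to the previous line and write it out only when the current line is not END_POLY: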

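with open('in.txt') as fin, open('out.txt', 'w') as fout:
    prev = None  # line held back until we know what follows it
    for i, line in enumerate(fin):
        # Write the held line unless the current line is END_POLY,
        # which drops the closing point directly before each END_POLY.
        if line.strip() != 'END_POLY' and prev is not None:
            fout.write(prev)
        prev = line
        if not i % 10000:
            print('Processing line {}'.format(i))
    # Flush the final held line (normally the last END_POLY); skipped for an empty file.
    if prev is not None:
        fout.write(prev)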
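
A slightly more defensive variant, as a sketch: the loop above drops whatever line precedes END_POLY, whereas the generator below drops it only when it repeats the block's first point. The function name drop_closing_points and its parameters are hypothetical, and it assumes the file already contains the BEGIN_POLYGON / END_POLY markers shown in the question.

```python
def drop_closing_points(lines, begin="BEGIN_POLYGON", end="END_POLY"):
    """Yield lines, skipping the line before `end` when it repeats the block's first point."""
    prev = None            # line held back until the next one is seen
    first_point = None     # stripped text of the current block's first point
    prev_is_first = False  # True while `prev` is that first point itself
    for line in lines:
        tag = line.strip()
        if prev is not None:
            is_closing_dup = (tag == end and not prev_is_first
                              and prev.strip() == first_point)
            if not is_closing_dup:
                yield prev
        if tag == begin:
            # A new block starts: forget the previous block's first point.
            first_point, prev_is_first = None, False
        elif tag != end and first_point is None:
            # First line after `begin` is the block's first point.
            first_point, prev_is_first = tag, True
        else:
            prev_is_first = False
        prev = line
    if prev is not None:
        yield prev  # flush the final held line


sample = [
    "BEGIN_POLYGON\n",
    "POLYGON_POINT 1,1,0\n",
    "POLYGON_POINT 2,2,0\n",
    "POLYGON_POINT 3,3,0\n",
    "POLYGON_POINT 1,1,0\n",
    "END_POLY\n",
]
cleaned = list(drop_closing_points(sample))
print("".join(cleaned), end="")
```

Because it is a generator, it can be streamed over the full file without loading it into memory, for example fout.writelines(drop_closing_points(fin)) with the same in.txt / out.txt handles as above.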

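Answer 1 (score: 0)

If you don't want duplicated data, you can de-duplicate each block with an ordered dict, which acts like a set that preserves order (slightly modified from @Jean-François Fabre's code in the other question). Note that this runs on the original blank-line-separated file and removes every repeated line within a block, not only the last one: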

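import itertools, collections

with open("file.txt") as f, open("fileout.txt", "w") as fw:
    # Split the input into runs of blank and non-blank lines; only the non-blank runs are polygons.
    for nonblank, block in itertools.groupby(f, key=lambda l: bool(l.strip())):
        if nonblank:
            # OrderedDict.fromkeys drops repeated lines but keeps first-seen order; END_POLY is the question's terminator.
            fw.writelines(["BEGIN_POLYGON\n"] + list(collections.OrderedDict.fromkeys(block)) + ["END_POLY\n"])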

You can see how this works on a simple list:

print(list(collections.OrderedDict.fromkeys([1,1,1,1,1,1,2,2,2,2,5,3,3,3,3,3]).keys()))

This prints [1, 2, 5, 3], so the original order is preserved.
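
To try the grouping and de-duplication on a small in-memory sample before running it over the full file, here is a minimal sketch. The generator name wrap_blocks and the sample coordinates are made up for illustration, it relies on plain dicts preserving insertion order (Python 3.7+), and it writes END_POLY to match the format in the question.

```python
import io
import itertools


def wrap_blocks(lines):
    """Yield BEGIN_POLYGON, the de-duplicated points, and END_POLY for each blank-line-separated block."""
    for nonblank, block in itertools.groupby(lines, key=lambda l: bool(l.strip())):
        if nonblank:
            yield "BEGIN_POLYGON\n"
            # dict.fromkeys keeps the first occurrence of each line, in order.
            yield from dict.fromkeys(block)
            yield "END_POLY\n"


raw = io.StringIO(
    "POLYGON_POINT 1,1,0\n"
    "POLYGON_POINT 2,2,0\n"
    "POLYGON_POINT 1,1,0\n"
    "\n"
    "\n"
    "POLYGON_POINT 5,5,0\n"
    "POLYGON_POINT 6,6,0\n"
    "POLYGON_POINT 5,5,0\n"
)
result = "".join(wrap_blocks(raw))
print(result, end="")
```

Keep in mind that this removes any repeated point inside a block, so a polygon that legitimately revisits a vertex in the middle of its outline would lose that vertex as well.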

Answer 2 (score: 0)

Although this is not Python, these kinds of edits are very simple with sed:

sed 'N;s/.*\n\(END_POLY\)/\1/' file.txt

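What it does: N appends the next line to the pattern space, so sed works on two lines at a time; if the second line starts with END_POLY, the substitution deletes the first line and leaves only END_POLY.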
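
One caveat worth checking on a sample: N consumes lines in non-overlapping pairs, so the substitution only fires when END_POLY happens to be the second line of a pair. The Python sketch below mimics that pairing so the effect can be seen without running sed. The helper sed_pairwise is hypothetical, it is an emulation rather than sed itself, and it assumes a dangling final line is printed unchanged, as GNU sed does.

```python
def sed_pairwise(lines, marker="END_POLY"):
    """Mimic reading two lines at a time and dropping the first when the second starts with `marker`."""
    out = []
    it = iter(lines)
    for first in it:
        second = next(it, None)
        if second is None:
            out.append(first)  # odd line count: the last line has no partner
            break
        if second.startswith(marker):
            out.append(second)  # the substitution removes the first line of the pair
        else:
            out.extend([first, second])
    return out


# Six lines: END_POLY is the second line of the third pair, so the duplicate is removed.
even_block = ["BEGIN_POLYGON", "A", "B", "C", "A", "END_POLY"]
print(sed_pairwise(even_block))  # ['BEGIN_POLYGON', 'A', 'B', 'C', 'END_POLY']

# Five lines: END_POLY is left without a partner, so nothing is removed.
odd_block = ["BEGIN_POLYGON", "A", "B", "A", "END_POLY"]
print(sed_pairwise(odd_block))  # ['BEGIN_POLYGON', 'A', 'B', 'A', 'END_POLY']
```

In the question's sample every block has an even number of lines, so the pairing lines up; a block with an odd number of lines shifts the pairing for everything that follows it. The line-by-line approach in Answer 0 does not depend on this parity.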