Remove lines with identical coordinates at a fixed position in a file

Time: 2012-03-30 14:44:19

Tags: python

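I have a file with x/y coordinates and I'm trying to solve the following problem. The file contains all kinds of information, but the coordinates always sit at the same position within a line, like this: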

IMPORTANT information 12213   1541515      COORDINATEX.COORDINATEY
IMPORTANT assadad213114141 asdadad         COORDINATEX.COORDINATEY
IMPORTANT assadad2ssss4141 asdadad         COORDINATEX.COORDINATEY
IMPORTANT ass 141 asd135566666666d         COORDINATEX.COORDINATEY

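What I want is to remove all lines in which the coordinates (COORDINATEX.COORDINATEY) are the same and the first 10 characters of the part marked IMPORTANT are the same, except the first one. I already tried `sort -u` on Unix, but that doesn't work, because it only removes lines that are identical in their entirety, which is not the case here.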

Example:

IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE1 fsafasdasd!38aaa!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1
IMPORTANTLINE2 sadasda333333333dadadada COORDINATEX.COORDINATE1

Should become:

IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1

Thanks in advance!

3 answers:

Answer 0 (score: 1)

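For each line you read from the file, take the parts that define a duplicate and join them into a single string. Check a set to see whether it already contains that string; if it doesn't, write the line to the output and add the string to the set.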
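A minimal sketch of that approach, assuming (as in the question's example) that the parts defining a duplicate are the first whitespace-separated field and the last one (the coordinate). The generator name `dedupe` and the tab separator are illustrative choices, not something the answer specifies:

```python
def dedupe(lines):
    """Yield each line whose key (first field + coordinate) has not been seen yet."""
    seen = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Join the parts that define a duplicate into a single string. A tab
        # separator is safe because split() leaves no whitespace inside the
        # fields, so e.g. "ab"+"c" and "a"+"bc" cannot produce the same key.
        key = fields[0] + "\t" + fields[-1]
        if key not in seen:
            seen.add(key)
            yield line


sample = [
    'IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1',
    'IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144',
    'IMPORTANTLINE1 fsafasdasd!38aaa!"¤#(!¤! COORDINATEX.COORDINATEY1',
    'IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1',
    'IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2',
    'IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1',
    'IMPORTANTLINE2 sadasda333333333dadadada COORDINATEX.COORDINATE1',
]

for line in dedupe(sample):
    print(line)
```

On the sample this prints the four lines of the expected output. The set gives an average O(1) membership test, and only the first line for each key is kept, so the original order is preserved.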

Answer 1 (score: 1)

So each line has four whitespace-separated fields, and the significant characters are in the second field, right?

lines = []
found_lines = set()
with open("mydatafile.dat", "rt") as data_file:
    for line in data_file:
        # skip blank lines (the last line of a file is often blank)
        if not line.strip():
            continue
        # split the line into its four whitespace-separated fields
        imp, field1, x, y = line.split()
        # keep only the significant characters of field1
        field1 = field1[1:10]  # "first 10 chars, except first"
        if (field1, x, y) in found_lines:
            continue
        found_lines.add((field1, x, y))  # set.add() takes a single tuple
        lines.append(line)
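The snippet above expects exactly four whitespace-separated fields; the lines in the question's example split into three, so `imp, field1, x, y = line.split()` would raise `ValueError` on them. A hedged variant that keys on the first and last fields instead, and writes the surviving lines to a second file, might look like this. The function name `dedupe_file` and the UTF-8 encoding are assumptions, not part of the answer:

```python
def dedupe_file(in_path, out_path):
    """Copy in_path to out_path, keeping only the first line for each key."""
    found = set()
    # Assumes UTF-8 input, since the sample data contains non-ASCII characters.
    with open(in_path, "rt", encoding="utf-8") as src, \
         open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines, e.g. a trailing empty line
            # Key on the first field and the last field (the coordinate) as a
            # tuple; the middle field is ignored, as in the expected output.
            key = (fields[0], fields[-1])
            if key in found:
                continue
            found.add(key)  # set.add() takes one argument: the whole tuple
            dst.write(line)
```

Called as `dedupe_file("mydatafile.dat", "mydatafile_dedup.dat")` (the output name is made up), it streams the input line by line and leaves the original file untouched.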

Answer 2 (score: 0)

I think this does it:

import re

data='''
IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE1 fsafasdasd!38aaa!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1
IMPORTANTLINE2 sadasda333333333dadadada COORDINATEX.COORDINATE1
'''
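d={}          # key -> index of the first line with that key
data_out=[]

for i,line in enumerate(data.split('\n')):
    # key = "IMPORTANTLINEn" + "COORDINATEX" + the COORDINATE... part after the dot;
    # lines that don't match (e.g. the empty first and last lines) are skipped
    m=re.search(r'^(IMPORTANTLINE\d+).*(COORDINATEX)\.(COORDINATE(Y)?\d+)',line)
    if m:
        h=m.group(1)+m.group(2)+m.group(3)
        if h not in d:
            d[h]=i
            data_out.append(line)

for line in data_out:
    print(line)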

Output:

IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1