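I have a file with x/y coordinates that I am trying to clean up. The file contains all sorts of information, but the coordinates are always at the same position within a line, like this: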
IMPORTANT information 12213 1541515 COORDINATEX.COORDINATEY
IMPORTANT assadad213114141 asdadad COORDINATEX.COORDINATEY
IMPORTANT assadad2ssss4141 asdadad COORDINATEX.COORDINATEY
IMPORTANT ass 141 asd135566666666d COORDINATEX.COORDINATEY
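What I want is to remove every line whose coordinates (COORDINATEX.COORDINATEY) are identical and whose first 10 characters (the part marked IMPORTANT) are identical, keeping only the first such line. I have already tried sort -u in Unix, but that does not work because it requires the whole line to be identical, which is not the case here.
Example: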
IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE1 fsafasdasd!38aaa!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1
IMPORTANTLINE2 sadasda333333333dadadada COORDINATEX.COORDINATE1
It should end up looking like this:
IMPORTANTLINE1 713)#!=%!3839413!"¤#(!¤! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1
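Thanks in advance!
Answer 0 (score: 1)
For each line read from the file, take the parts that define a duplicate and join them into a single string. Check a set to see whether it already contains that string; if it does not, write the line to the output and add the string to the set.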
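The approach described above could be sketched roughly as follows. This is only a minimal illustration, not the answerer's own code: the function name `dedupe_lines`, the `prefix_len` parameter, and the sample lines are made up for the example, and it assumes the duplicate key is the first 10 characters of the line plus the last whitespace-separated field (the coordinates). A tuple is used as the key instead of a joined string, which serves the same purpose.

```python
def dedupe_lines(lines, prefix_len=10):
    """Keep the first line seen for each (prefix, coordinates) key, in order."""
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: a duplicate is defined by the first `prefix_len`
        # characters of the line and by its last field (the coordinates).
        key = (line[:prefix_len], fields[-1])
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# Hypothetical sample data, simplified from the question.
sample = [
    "IMPORTANTLINE1 aaa COORDINATEX.COORDINATEY1",
    "IMPORTANTLINE1 bbb COORDINATEX.COORDINATEY144",
    "IMPORTANTLINE1 ccc COORDINATEX.COORDINATEY1",
    "IMPORTANTLINE2 ddd COORDINATEX.COORDINATE2",
    "IMPORTANTLINE2 eee COORDINATEX.COORDINATE2",
]

for line in dedupe_lines(sample):
    print(line)
```

Because the set only stores keys, memory grows with the number of distinct keys rather than with the file size, and the original line order is preserved. If the rule for what counts as a duplicate differs, only the line that builds `key` should need to change.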
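Answer 1 (score: 1)
So, each line has four fields, separated by whitespace, and the characters to compare are in the second field. Is that right?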
lines = []
found_lines = set()
with open("mydatafile.dat", "rt") as data_file:
    for line in data_file:
        # avoid stopping on blank lines (usually the last line in the file is blank)
        if not line.strip():
            continue
        # separate fields; this assumes exactly four whitespace-separated fields per line
        imp, field1, x, y = line.split()
        # keep the significant chars in field1: "first 10 chars, except first"
        field1 = field1[1:10]
        if (field1, x, y) in found_lines:
            continue
        # set.add() takes a single argument, so the key is passed as one tuple
        found_lines.add((field1, x, y))
        lines.append(line)
Answer 2 (score: 0)
I think this does it:
import re
data='''
IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE1 fsafasdasd!38aaa!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1
IMPORTANTLINE2 sadasda333333333dadadada COORDINATEX.COORDINATE1
'''
d={}
data_out=[]
for i,line in enumerate(data.split('\n')):
    m=re.search(r'^(IMPORTANTLINE\d+).*(COORDINATEX)\.(COORDINATE(Y)?\d+)',line)
    if m:
        # key: the IMPORTANTLINE label plus both coordinate parts
        h=m.group(1)+m.group(2)+m.group(3)
        if h not in d:
            d[h]=i
            data_out.append(line)
for line in data_out:
    print(line)
Output:
IMPORTANTLINE1 713)#!=%!3839413!"#(!! COORDINATEX.COORDINATEY1
IMPORTANTLINE1 1339220"##"#"#"""""""""" COORDINATEX.COORDINATEY144
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE2
IMPORTANTLINE2 sadasdasdadadadadadadada COORDINATEX.COORDINATE1