Question

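I need to read a .txt file that has 2 columns and many rows with repeated names (using Python). I only want to copy one of the columns, without repeating the names in it, and write it to a new .txt file. I tried: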

g = open(file,'r')
linesg = g.readlines()
h = open(file,'w+')
linesh = h.readlines()
for line in range(len(linesg)):
     if linesg[line] in linesh:
        line += 1
     else:
        h.write(linesg[line].split('\t')[1])

But I keep getting repeated names in the .txt file. Can anyone help me? (Yes, I am new to programming in Python.) Thanks a lot!

Answer 1

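file = 'in_file.txt'       # input path (placeholder; the question does not give it)
out_file = 'out_file.txt'  # output path (placeholder); differs from the input so it is not overwritten

names = {}
with open(file, 'r') as g:
    for line in g:
        # The name is in the second tab-separated column; drop the trailing newline first
        name = line.rstrip('\n').split('\t')[1]
        names[name] = 1  # dictionary keys are unique, so repeated names collapse

# names.keys() holds every distinct name
with open(out_file, 'w') as h:
    for name in names.keys():
        h.write("%s\n" % name)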
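On Python 3.7 and later, where dictionaries keep insertion order, the same dictionary idea can probably be written more compactly with dict.fromkeys. This is only a sketch: the helper name unique_names and the sample rows are made up for illustration, and it assumes every non-blank line has at least two tab-separated columns.

```python
def unique_names(lines, sep="\t", col=1):
    """Return the distinct values of one column, in first-seen order."""
    # Skip blank lines, strip the trailing newline, and pick the column.
    values = (line.rstrip("\n").split(sep)[col] for line in lines if line.strip())
    # dict.fromkeys keeps one key per distinct value; dicts preserve
    # insertion order in Python 3.7+, so the first occurrence wins.
    return list(dict.fromkeys(values))


# Hypothetical sample data: an id column and a name column.
sample = ["1\tanna\n", "2\tbob\n", "3\tanna\n", "4\tcarl\n"]
print(unique_names(sample))  # ['anna', 'bob', 'carl']
```

A line with fewer columns than expected would raise an IndexError here, which may or may not be what you want for malformed input.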

Answer 2

sep = '\t'
with open('in_file.txt') as g:
    lines = g.readlines()
lines_out = []
for line in lines:
    line = line.strip()
    parts = line.split(sep)
    line_out = "%s\n" % (parts[0],)  # if only the first column is copied
    if line_out not in lines_out:
        lines_out.append(line_out)

with open('out_file.txt', 'w') as h:
    h.writelines(lines_out)

Change it to parts[1] to copy the second column.
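If switching columns comes up often, the column index could be made a parameter. The following is a rough sketch rather than a definitive version: the function name copy_unique_column and its arguments are hypothetical, and skipping lines that lack the requested column is an assumption about how short lines should be handled. It tracks already-written values in a set, since membership tests on a list get slow for large files.

```python
def copy_unique_column(in_path, out_path, col=0, sep="\t"):
    """Write each distinct value of column `col` to out_path once, in input order.

    Returns the number of distinct values written.
    """
    seen = set()  # average O(1) membership test, unlike `in` on a list
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            parts = line.rstrip("\n").split(sep)
            if col >= len(parts):
                continue  # assumption: silently skip lines without that column
            value = parts[col]
            if value not in seen:
                seen.add(value)
                dst.write(value + "\n")
    return len(seen)
```

Calling copy_unique_column('in_file.txt', 'out_file.txt', col=1) would then copy the second column; the output path should differ from the input path, because opening a file for writing truncates it.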

Copy a column without duplicates