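I have two files: the first contains the necessary data (1st file), and the second contains the list of lines to keep (2nd file).
I tried to do the filtering with this Python code: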
import os.path

# loading the input files
output = open('descmat.txt', 'w+')
input = open('descmat_all.txt', 'r')
lists = open('training_lines.txt', 'r')
print "Test1"
# reading the input files
list_lines = lists.readlines()
list_input = input.readlines()
print "Test2"
output.write(list_input[0])
for i in range(len(list_lines)):
    for ii in range(len(list_input)):
        position = list_input[ii].find(list_lines[i][:-1])
        if position > -1:
            output.write(list_input[ii])
            break
print "Test3"
output.close()
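However, this script does not find any matches. What is the simplest way to keep only those lines of the first file that match a line in the second file?
Answer 0 (score: 2)
For this kind of problem, Python has the set data type.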
# prepare a set of normalised training lines
# stripping new lines avoids possible problems with the last line
OK_lines = set(line.rstrip('\n') for line in open('training_lines.txt'))
# when you leave a with block, all the resources are released
# i.e., no need for file.close()
with open('descmat_all.txt') as infile:
    with open('descmat.txt', 'w') as outfile:
        for line in infile:
            # OK_lines have been stripped, input lines must be stripped as well
            if line.rstrip('\n') in OK_lines:
                outfile.write(line)
boffi@debian:~/Documents/tmp$ cat check.py
# prepare a set of normalised training lines
# stripping new lines avoids possible problems with the last line
OK_lines = set(line.rstrip('\n') for line in open('training_lines.txt'))
# when you leave a with block, all the resources are released
# i.e., no need for file.close()
with open('descmat_all.txt') as infile:
    with open('descmat.txt', 'w') as outfile:
        for line in infile:
            # OK_lines have been stripped, input lines must be stripped as well
            if line.rstrip('\n') in OK_lines:
                outfile.write(line)
boffi@debian:~/Documents/tmp$ cat training_lines.txt
ada
bob
boffi@debian:~/Documents/tmp$ cat descmat_all.txt
bob
doug
ada
doug
eddy
ada
bob
boffi@debian:~/Documents/tmp$ python check.py
boffi@debian:~/Documents/tmp$ cat descmat.txt
bob
ada
ada
bob
boffi@debian:~/Documents/tmp$
Answer 1 (score: 1)
If you read both files into lists, you can simply compare the lists. See here for how to do it.
lightbulb, RES_16M, 711, 1, 16M
lightbulb, RES_16Ms, 7112, 1, 16Mk
card, CAP_2700pf, 75, 26, 2700pf
card, CAP_2700pfs, 75, 262, 2700pff
Current, ASDba, 0, 800, "doesn't follow trend"
Current, TL741, 20, 12, "doesn't either"
out should contain the list of strings that could be matched.
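The list comparison described in this answer can be sketched as follows. This is only a minimal illustration, not the answer author's code: the function name filter_lines is hypothetical, a line is assumed to match only when it is identical once the trailing newline is stripped, and the sample names are borrowed from the demo in Answer 0.

```python
def filter_lines(data_lines, keep_lines):
    """Return the entries of data_lines that also appear in keep_lines.

    Hypothetical helper: both arguments are lists of strings, for example
    the result of readlines() on the two files from the question.
    """
    # Strip newlines so a last line without '\n' still compares equal.
    wanted = [line.rstrip('\n') for line in keep_lines]
    # Keep the original order (and duplicates) of the data file.
    out = [line for line in data_lines if line.rstrip('\n') in wanted]
    return out


# In-memory example; real use would pass the lines read from the two files.
data = ['bob\n', 'doug\n', 'ada\n', 'doug\n', 'eddy\n', 'ada\n', 'bob\n']
keep = ['ada\n', 'bob']  # the last line has no trailing newline
print(filter_lines(data, keep))
```

Membership tests on a list scan the whole list each time, so for large files the set used in Answer 0 is likely the better choice; the list version is shown only because it mirrors the wording of this answer.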
Answer 2 (score: 0)
Replacing this part of the code:
for i in range(len(list_lines)):
    for ii in range(len(list_input)):
        position = list_input[ii].find(list_lines[i][:-1])
        if position > -1:
            output.write(list_input[ii])
            break
with this:
for i in range(len(list_lines)):
    for ii in range(len(list_input)):
        if list_input[ii][:26] == list_lines[i][:-1]:
            output.write(list_input[ii])
does exactly what I need.
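The prefix comparison above checks every line of one file against every line of the other. A possible variant, sketched here under stated assumptions, does the same prefix test with a set lookup. The function name filter_by_prefix and the constant KEY_WIDTH are hypothetical; the width of 26 is taken from the slice [:26] in the code above. Unlike the nested loops, this sketch writes matches in the order of the data file, and it strips only a trailing newline instead of always dropping the last character.

```python
KEY_WIDTH = 26  # taken from the slice [:26] in the answer; adjust to the data


def filter_by_prefix(data_lines, keep_lines, width=KEY_WIDTH):
    """Return the data lines whose first `width` characters are a wanted key.

    Hypothetical helper: keep_lines holds one key per line.
    """
    # A set gives constant-time membership tests on average.
    keys = set(line.rstrip('\n') for line in keep_lines)
    return [line for line in data_lines if line[:width] in keys]


# Small in-memory example with a key width of 3 instead of 26.
data = ['ada 1 2\n', 'bob 3 4\n', 'doug 5 6\n']
keep = ['ada\n', 'bob\n']
print(filter_by_prefix(data, keep, width=3))
```

As in the original comparison, a data line shorter than the key width will not match, because its slice still carries the newline.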