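Sorry if the title is a bit confusing. I have two files, file1 and file2, each with many columns. I need to find common elements in a certain column, and where they match, the whole line from file1 should be appended to the matching line in file2:
e.g.: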
file1.txt:
[a,b,c],
[x,e,y],
...
file2.txt:
[d,e,f],
[s,p,z],
...
Note that only the element "e" matches here, so the result should be (in a new file, but containing all the information from file2.txt):
newfile.txt:
[d,e,f],[x,e,y],
[s,p,z]
...
My idea:
output = open('file2.txt', 'w')
for f in variants:
    add = ""
    if f[0] in sources:
        add = ???
    output.write("\t".join(f) + add + "\n")
output.close()
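"variants" contains the lists from file1.txt. I really don't understand how to add the rest of the information from file1.txt to the matching line in file2.txt. Please help!
Answer 0: (score: 0)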
from collections import defaultdict

def parse_data(line):
    # Returns a list of values from a line of text such as '[x,e,y],'.
    # Drops the leading '[' and the trailing '],' before splitting.
    return line[1:-2].split(',')

# Text mode ('w'/'r') so the string operations below work on Python 2 and 3.
with open('newfile.txt', 'w') as new_file, open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    mapping = defaultdict(list)
    # Zero-based indexing.
    CERTAIN_COLUMN = 1
    for line in f1:
        # Remove new-lines and get comma-separated values.
        line = line.strip()
        columns = parse_data(line)
        mapping[columns[CERTAIN_COLUMN]].append(line)
    for line in f2:
        line = line.strip()
        columns = parse_data(line)
        # Write one output line per file1 row that shares the key column.
        for matched in mapping[columns[CERTAIN_COLUMN]]:
            new_file.write('{},{},\n'.format(matched, line))
The first loop populates the dict with a search_criteria -> matched rows mapping, e.g. e -> ['[x,e,y]'].
The second loop writes out the file2.txt lines that match.
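
If you also want the unmatched file2.txt lines kept, and the file2 row placed first as in the newfile.txt example from the question, a variation along these lines might work. This is only a sketch under a few assumptions: the key is column 1 (zero-based), every output line ends with a trailing comma, and the names parse_row, merge_lines and merge_files are made up for illustration.

```python
from collections import defaultdict

KEY_COLUMN = 1  # assumed: zero-based index of the column to match on


def parse_row(line):
    """Split a line such as '[x,e,y],' into ['x', 'e', 'y']."""
    return line.strip().rstrip(',').strip('[]').split(',')


def merge_lines(file1_lines, file2_lines, key_column=KEY_COLUMN):
    """Return every file2 row, followed by any file1 rows sharing its key."""
    matches = defaultdict(list)
    for line in file1_lines:
        line = line.strip()
        row = parse_row(line)
        if len(row) > key_column:  # skip blank or too-short lines
            matches[row[key_column]].append(line.rstrip(','))

    merged = []
    for line in file2_lines:
        line = line.strip()
        if not line:
            continue
        row = parse_row(line)
        key = row[key_column] if len(row) > key_column else None
        # Unmatched rows are kept as they are; matched rows get the
        # corresponding file1 rows appended after them.
        parts = [line.rstrip(',')] + matches.get(key, [])
        merged.append(','.join(parts) + ',')
    return merged


def merge_files(path1, path2, out_path):
    """Read both input files and write the merged rows to out_path."""
    with open(path1) as f1, open(path2) as f2:
        merged = merge_lines(f1, f2)
    with open(out_path, 'w') as out:
        out.write('\n'.join(merged) + '\n')
```

Calling merge_files('file1.txt', 'file2.txt', 'newfile.txt') would then write the result to a new file and leave file2.txt untouched, which avoids the problem of opening file2.txt in 'w' mode and truncating it before it has been read.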