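I have a list of unique strings ("sample IDs"). I also have a table that contains a subset of the strings from the first list, each associated with another string ("sample characteristics") in the next column, with a space as the delimiter. For example: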
# All Sample IDs
id-001
id-002
id-003
id-004
id-005
# Subset of Samples, with associated characteristics string
id-001 'batch-1, yellow'
id-003 'batch-1, yellow'
id-005 'batch-9, blue'
# Desired Output
id-001 'batch-1, yellow'
id-002 NA
id-003 'batch-1, yellow'
id-004 NA
id-005 'batch-9, blue'
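I'm trying to combine the two lists into a single table: the first column should contain every sample ID, and the second column should contain that ID's sample-characteristics string, or "NA" if the ID is not in the second list.
I've been using this code to compare the two ID lists and find out which sample IDs have a sample-characteristics string: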
with open('FILE1.txt', 'r') as file1:
    with open('FILE2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

with open('RESULT.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
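What I can't figure out is how to get the characteristics for those IDs and combine them with the first list. I think using a dict should be the first step: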
with open('FILE1.txt', 'r') as file1, open('FILE2.txt', 'r') as file2:
    data1 = file1
    data2 = dict(file2)
I don't know how to go on from here.
Answer:
I think you're looking for something like this:
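import csv

results = {}
# Every ID from the first file becomes a key; None means "no characteristics yet".
with open('FILE1.txt') as file1:
    for id_num in file1:
        id_num = id_num.strip()
        if id_num:  # skip blank lines
            results[id_num] = None

# quotechar="'" keeps 'batch-1, yellow' together as one field even though it
# contains a space (the csv module's default quotechar is '"').
with open('FILE2.txt', newline='') as file2:
    csv_reader = csv.reader(file2, delimiter=' ', quotechar="'")
    for row in csv_reader:
        if not row:  # skip blank lines
            continue
        id_num, characteristic = row
        results[id_num] = characteristic

# Write every ID in first-file order, using 'NA' where no characteristic was found.
with open('RESULT.txt', 'w', newline='') as file_out:
    csv_writer = csv.writer(file_out, delimiter=' ', quotechar="'", lineterminator='\n')
    for id_num, characteristic in results.items():
        if characteristic is None:
            characteristic = 'NA'
        csv_writer.writerow([id_num, characteristic])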
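This sets up a dict whose keys are all the IDs from the first file.
It then loops over each line of the second file and updates the dict entry for every ID that appears there.
Finally, it writes the updated dict to a new space-delimited file, with "NA" for every ID that never got a characteristic.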
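To try the same approach without creating files, it can be wrapped in a function that takes the two files' contents as strings. This is only a sketch: the function name merge_samples and the choice of string inputs (rather than file paths) are illustrative assumptions, not part of the answer. It relies on quotechar="'" because the characteristic strings in the question are wrapped in single quotes and contain a space; with the csv module's default quote character ("), a space delimiter would split 'batch-1, yellow' into two fields.

```python
import csv
import io


def merge_samples(all_ids_text, subset_text):
    """Hypothetical helper: return a space-delimited table listing every
    sample ID with its characteristics string, or 'NA' if the ID does not
    appear in subset_text. Both arguments are whole file contents."""
    # dict.fromkeys maps each ID to None and keeps the first file's order.
    results = dict.fromkeys(
        line.strip() for line in all_ids_text.splitlines() if line.strip()
    )

    # quotechar="'" makes 'batch-1, yellow' a single field despite its space.
    reader = csv.reader(io.StringIO(subset_text), delimiter=' ', quotechar="'")
    for row in reader:
        if row:  # skip blank lines
            sample_id, characteristic = row
            results[sample_id] = characteristic

    # QUOTE_MINIMAL (the default) re-quotes only fields containing a space.
    out = io.StringIO()
    writer = csv.writer(out, delimiter=' ', quotechar="'", lineterminator='\n')
    for sample_id, characteristic in results.items():
        writer.writerow([sample_id, 'NA' if characteristic is None else characteristic])
    return out.getvalue()


all_ids = "id-001\nid-002\nid-003\nid-004\nid-005\n"
subset = "id-001 'batch-1, yellow'\nid-003 'batch-1, yellow'\nid-005 'batch-9, blue'\n"
print(merge_samples(all_ids, subset), end='')
```

With the sample data from the question, this prints the "Desired Output" table. As in the file-based version, an ID that appears only in the second input is appended at the end rather than dropped.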