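I have been stuck on this problem and hope someone can help. I am trying to iterate over column [1] of a CSV file named transcripts_test.csv and, for each string in that column, match the same string in a dictionary named OCR_dict that I build from another CSV file named coors_test.csv.
transcripts_test.csv contains: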
ENST00000347869,chr3,50126341,50156454,1
ENST00000452166,chr14,21679063,21737583,2
ENST00000452166,chr14,21679063,21737583,2
coors_test.csv contains:
chr3,141030221,141031065,Valid_10009,1000,+
chr6,141030221,141031065,Valid_10005,1000,+
chr14,141047080,141047610,Valid_10006,1000,+
Here is my code:
import csv

with open('coors_test.csv', mode='r') as coors_infile:
    coors_reader = csv.reader(coors_infile)
    for row in coors_reader:
        chromo = row[0]
        start = row[1]
        end = row[2]
        coordinates_list = [chromo,start,end]
        OCR_dict = {row[3]:coordinates_list}
        for keys,values in OCR_dict.items():
            OCR_chromosome = values[0]

with open('transcripts_test.csv', mode='r') as transcripts_infile:
    transcripts_reader = csv.reader(transcripts_infile)
    for row in transcripts_reader:
        transcript_chromosome = row[1]
        if transcript_chromosome == OCR_chromosome:
            print(transcript_chromosome, keys, OCR_chromosome)
When I run the code above, the output I get is:
chr14 Valid_10006 chr14
chr14 Valid_10006 chr14
The output I am looking for is:
chr3 Valid_10009 chr3
chr14 Valid_10006 chr14
chr14 Valid_10006 chr14
Why doesn't my code match and print chr3 Valid_10009 chr3? Any help would be greatly appreciated. Thanks!
Answer 0 (score: 2)
This is not doing what you want:
        coordinates_list = [chromo,start,end]
        OCR_dict = {row[3]:coordinates_list}
        for keys,values in OCR_dict.items():
            OCR_chromosome = values[0]
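It creates a new dict on every iteration, and that dict has only one key. The loop then walks over that single item and rebinds a local variable, so once the outer loop finishes only the values from the last row remain.
What you want is probably more like this: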
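To make the overwrite visible, here is a minimal sketch that repeats the question's loop on the same three rows, held in memory through io.StringIO instead of read from disk. The helper name last_dict_and_chromosome and the COORS constant are made up for this illustration.

```python
import csv
import io

# Same rows as coors_test.csv, kept in memory so the sketch is self-contained.
COORS = """chr3,141030221,141031065,Valid_10009,1000,+
chr6,141030221,141031065,Valid_10005,1000,+
chr14,141047080,141047610,Valid_10006,1000,+
"""


def last_dict_and_chromosome(text):
    """Repeat the question's loop and return what is left when it ends."""
    OCR_dict = {}
    OCR_chromosome = None
    for row in csv.reader(io.StringIO(text)):
        # Rebinding the name discards the dict built for the previous row.
        OCR_dict = {row[3]: [row[0], row[1], row[2]]}
        for keys, values in OCR_dict.items():
            OCR_chromosome = values[0]
    return OCR_dict, OCR_chromosome


OCR_dict, OCR_chromosome = last_dict_and_chromosome(COORS)
print(len(OCR_dict), OCR_chromosome)  # 1 chr14
```

Only Valid_10006 and chr14 are left, which is why the comparison against the transcripts file can only ever match chr14.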
import csv
from collections import defaultdict

OCR_dict = defaultdict(list)
with open('coors_test.csv', mode='r') as coors_infile:
    coors_reader = csv.reader(coors_infile)
    for row in coors_reader:
        chromo = row[0]
        start = row[1]
        end = row[2]
        # OCR_dict is a mapping `chromo -> [(start,end), (start,end), ...]`
        OCR_dict[chromo].append((start,end))

with open('transcripts_test.csv', mode='r') as transcripts_infile:
    transcripts_reader = csv.reader(transcripts_infile)
    for row in transcripts_reader:
        transcript_chromosome = row[1]
        # look that chromosome up in the dict and print it if it exists
        if transcript_chromosome in OCR_dict:
            print(transcript_chromosome, OCR_dict[transcript_chromosome])
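If the goal is the exact three-column output from the question (chromosome, OCR name, chromosome), one possible variant keeps the OCR name from column [3] under each chromosome rather than the coordinates. This is only a sketch under that assumption: the function name match_lines is hypothetical, and the two files are replaced by in-memory strings so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

# In-memory copies of the two files from the question.
COORS = """chr3,141030221,141031065,Valid_10009,1000,+
chr6,141030221,141031065,Valid_10005,1000,+
chr14,141047080,141047610,Valid_10006,1000,+
"""
TRANSCRIPTS = """ENST00000347869,chr3,50126341,50156454,1
ENST00000452166,chr14,21679063,21737583,2
ENST00000452166,chr14,21679063,21737583,2
"""


def match_lines(coors_text, transcripts_text):
    """Return one 'chromo name chromo' line per transcript/OCR match."""
    # chromosome -> list of OCR names located on that chromosome
    names_by_chromo = defaultdict(list)
    for row in csv.reader(io.StringIO(coors_text)):
        names_by_chromo[row[0]].append(row[3])

    lines = []
    for row in csv.reader(io.StringIO(transcripts_text)):
        chromo = row[1]
        # .get() avoids creating empty entries for unknown chromosomes
        for name in names_by_chromo.get(chromo, []):
            lines.append(f"{chromo} {name} {chromo}")
    return lines


for line in match_lines(COORS, TRANSCRIPTS):
    print(line)
```

With the sample data this prints the three lines the question asks for. To run it against the real files, pass open file objects to csv.reader in place of the io.StringIO wrappers.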
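Answer 1 (score: 0)
OCR_chromosome is set to the last value of chromo encountered. In other words, OCR_chromosome will be the first value in the last row of coors_test.csv, so chr14 is the only value that can match. I am not sure exactly what you want, but this should produce the chromo values you are looking for: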
import csv

# collect every chromosome that appears in coors_test.csv
chromos = set()
with open('coors_test.csv', mode='r') as coors_infile:
    for row in csv.reader(coors_infile):
        chromo = row[0]
        chromos.add(chromo)

with open('transcripts_test.csv', mode='r') as transcripts_infile:
    for row in csv.reader(transcripts_infile):
        transcript_chromosome = row[1]
        if transcript_chromosome in chromos:
            print(transcript_chromosome)