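I want to write a program that reads a file and converts short text strings, 4 characters long, into new 4-character strings. Currently, I read in a tab-delimited text file with two columns: an "old tag" and a "new tag". I'm able to successfully build a dictionary with the "old tag" as the key and the "new tag" as the value.
My problem comes when I try to use maketrans() and str.translate(). Somehow, my "old_tag" is getting converted to a "new_tag" that I don't even have in my dictionary! I've attached a screenshot of what I mean.
"P020" should get converted to "AGAC", as outlined in my dictionary.
Here's my function that does the string translation: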
def translate_tag(f_in, old_tag, trn_dict):
    """Function to convert any old tags to their new format based on the translation dictionary (variable "trn_dict")."""
    try:
        # tag_lookup = trn_dict[old_tag]
        # trans = maketrans(old_tag, tag_lookup)
        trans = maketrans(old_tag, trn_dict[old_tag])  # Just did the above two lines on one line
    except KeyError:
        print("Error in file {}! The tag {} wasn't found in the translation table. "
              "Make sure the translation table is up to date. "
              "The program will continue with the rest of the file, but this tag will be skipped!".format(f_in, old_tag))
        return None
    new_tag = old_tag.translate(trans)
    return new_tag
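Here's my translation table. It's a tab-delimited text file, with the old tag in column 1 and the new tag in column 2. I translate from the old tag to the new tag.
The odd thing is that it converts some tags just fine. For example, "P010" is translated correctly. What could be causing the problem?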
Answer 0 (score: 1)
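You shouldn't use maketrans here, because it works on individual characters (per the official documentation), not on whole strings. Instead, use a dictionary with the original text (column 1) as the key and the new text (column 2) as its value.
You can then look up any tag x with trn_dict[x], either wrapped in a try block or tested beforehand with if x in trn_dict.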
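To see why "P020" goes wrong while "P010" happens to work, here is a minimal sketch. It assumes Python 3, where the function is spelled str.maketrans; the variable names are only for illustration. The table maps one character to one character, so the repeated "0" in "P020" can only keep a single replacement, and the later one wins.

```python
# maketrans pairs characters by position: P->A, 0->G, 2->A, 0->C.
# The second mapping for "0" overwrites the first, so every "0" becomes "C".
table_p020 = str.maketrans("P020", "AGAC")
result_p020 = "P020".translate(table_p020)
print(result_p020)  # ACAC, not the intended AGAC

# "P010" -> "ATAT" only works by coincidence: both "0"s map to "T".
table_p010 = str.maketrans("P010", "ATAT")
result_p010 = "P010".translate(table_p010)
print(result_p010)  # ATAT
```

Any tag whose repeated characters need different replacements will come out wrong in the same way, which is why a plain dictionary lookup is the better fit.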
database = """P001 AAAA
P002 AAAT
P003 AAAG
P004 AAAC
P005 AATA
P006 AATT
P007 AATG
P008 AATC
P009 ATAA
P010 ATAT
P011 ATAG
P012 ATAC
P013 ATTA
P014 ATTT
P015 ATTG
P016 ATTC
P017 AGAA
P018 AGAT
P019 AGAG
P020 AGAC
P021 AGTA
P022 AGTT
P023 AGTG
P024 AGTC
""".splitlines()
# Map column 1 (old tag) to column 2 (new tag); avoid naming the loop variable "str", which shadows the built-in.
trn_dict = {line.split()[0]: line.split()[1] for line in database}

def translate_tag(old_tag, trn_dict):
    """Function to convert any old tags to their new format based on the translation dictionary (variable "trn_dict")."""
    try:
        return trn_dict[old_tag]
    except KeyError:
        print("Error! The tag {} wasn't found in the translation table. "
              "Make sure the translation table is up to date. "
              "The program will continue, but this tag will be skipped!".format(old_tag))
        return None

print(translate_tag('P020', trn_dict))
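This prints the expected value, AGAC.
(The string-to-list-to-dictionary code is just a quick way to get the data into the program; it isn't part of the approach itself.)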
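For the real tab-delimited file from the question, the dictionary could be built along these lines. This is only a sketch: load_translation_table is a hypothetical helper name, and an in-memory io.StringIO stands in for the actual file, whose path isn't given.

```python
import csv
import io

def load_translation_table(handle):
    """Build {old_tag: new_tag} from a tab-delimited file-like object."""
    trn_dict = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        trn_dict[row[0].strip()] = row[1].strip()
    return trn_dict

# Stand-in for the real file; with a file on disk you would pass
# an object opened with open(path, newline="") instead.
sample = io.StringIO("P010\tATAT\nP020\tAGAC\n")
trn_dict = load_translation_table(sample)

print(trn_dict.get("P020"))  # AGAC
print(trn_dict.get("P999"))  # None, because the tag is not in the table
```

Using dict.get here is just an alternative to the try/except in translate_tag: it returns None for unknown tags without raising KeyError.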