Unexpected string translation when using a dictionary

Asked: 2018-01-16 22:12:46

Tags: python string translate

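I want to write a program that reads a file and converts short, 4-character text strings into new 4-character strings. Currently, I read in a tab-delimited text file with two columns: "old tag" and "new tag". I can successfully build a dictionary with the "old tag" as the key and the "new tag" as the value.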

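The problem appears when I try to use maketrans() and str.translate(). Somehow my "old_tag" is being converted to a "new_tag" that I don't even have in my dictionary! I've attached a screenshot of what I mean.

"P020" should get converted to "AGAC" as outlined in my dictionary.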

The error is that variable "old_tag" should get converted to "AGAC" as outlined in my dictionary, but it's instead getting converted to "ACAC" (look at variable "new_tag"). I don't even have ACAC in my translation table!

Here is my function that performs the string translation:

def translate_tag(f_in, old_tag, trn_dict):
    """Function to convert any old tags to their new format based on the translation dictionary (variable "trn_dict")."""
    try:
        # tag_lookup = trn_dict[old_tag]
        # trans = maketrans(old_tag, tag_lookup)
        # Note: maketrans is string.maketrans in Python 2 and str.maketrans in Python 3.
        trans = maketrans(old_tag, trn_dict[old_tag])  # Just did the above two lines on one line
    except KeyError:
        print("Error in file {}! The tag {} wasn't found in the translation table. "
              "Make sure the translation table is up to date. "
              "The program will continue with the rest of the file, "
              "but this tag will be skipped!".format(f_in, old_tag))
        return None
    new_tag = old_tag.translate(trans)
    return new_tag

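Here's my translation table. It's a tab-delimited text file where the old tag is column 1 and the new tag is column 2. I translate from the old tag to the new tag.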

The strange thing is that it translates some tags just fine. For example, "P010" is translated correctly. What could be causing the problem?

1 Answer:

Answer 0 (score: 1)

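You shouldn't use maketrans here, because it works on individual characters (per the official documentation), not on whole strings. Keep the table as a dictionary instead, with the original text (column 1) as the key and the new text (column 2) as its value.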
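To see why the character-level behavior produces "ACAC", here is a small sketch of what the translation table contains. It assumes Python 3's str.maketrans (Python 2 spells it string.maketrans), and the variable names are only illustrative. Because "0" appears twice in "P020", it can keep only one target, and the last pairing (0 to C) wins.

```python
# Python 3 spelling; in Python 2 the equivalent helper is string.maketrans.
bad_table = str.maketrans("P020", "AGAC")

# The table is keyed by single characters (code points), so the repeated
# "0" is stored once, with the last target given for it ("C").
readable = {chr(src): chr(dst) for src, dst in bad_table.items()}
print(readable)

# Every "0" in the tag is now replaced by "C".
print("P020".translate(bad_table))  # ACAC

# "P010" -> "ATAT" only works by coincidence: both "0"s map to "T".
ok_table = str.maketrans("P010", "ATAT")
print("P010".translate(ok_table))  # ATAT
```

This also explains why some tags seemed to translate correctly: the result is right only when each repeated character in the old tag happens to map to the same character every time.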

With the dictionary in place, you can look up any tag x with trn_dict[x], either inside a try block or after testing beforehand with if x in trn_dict.

database = """P001    AAAA
P002    AAAT
P003    AAAG
P004    AAAC
P005    AATA
P006    AATT
P007    AATG
P008    AATC
P009    ATAA
P010    ATAT
P011    ATAG
P012    ATAC
P013    ATTA
P014    ATTT
P015    ATTG
P016    ATTC
P017    AGAA
P018    AGAT
P019    AGAG
P020    AGAC
P021    AGTA
P022    AGTT
P023    AGTG
P024    AGTC
""".splitlines()

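# Each line splits into [old_tag, new_tag]; avoid naming the loop variable "str",
# which would shadow the built-in type.
trn_dict = dict(line.split() for line in database)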

def translate_tag(old_tag, trn_dict):
    """Function to convert any old tags to their new format based on the translation dictionary (variable "trn_dict")."""
    try:
        return trn_dict[old_tag]
    except KeyError:
        # This version takes no file name, so only the tag is reported.
        print("Error! The tag {} wasn't found in the translation table. "
              "Make sure the translation table is up to date. "
              "The program will continue with the rest of the file, "
              "but this tag will be skipped!".format(old_tag))
        return None

print(translate_tag('P020', trn_dict))

This displays the expected value, AGAC.
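For comparison, the two lookup styles mentioned earlier (try versus a prior membership test) could look roughly like this. The function names and the three-entry excerpt of the table are only for illustration; dict.get is a third option from the standard dict API that returns None for a missing key.

```python
# A three-entry excerpt of the translation table, for illustration only.
trn_dict = {"P010": "ATAT", "P019": "AGAG", "P020": "AGAC"}

def lookup_with_try(tag):
    """Look the tag up directly and handle a miss afterwards."""
    try:
        return trn_dict[tag]
    except KeyError:
        return None

def lookup_with_test(tag):
    """Check membership first, then index."""
    if tag in trn_dict:
        return trn_dict[tag]
    return None

for tag in ("P020", "P999"):
    # dict.get(tag) returns None when the key is absent.
    print(tag, lookup_with_try(tag), lookup_with_test(tag), trn_dict.get(tag))
```

All three give the same result here; which one reads best mostly depends on whether a missing tag is an exceptional case in your data.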

(The string-to-list-to-dictionary code is just a quick way to get the data into the program; it isn't part of the method itself.)
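If the table lives in a tab-delimited file, as in the question, the dictionary could be built along these lines. This is only a sketch: load_translation_table is a made-up helper name, the file name in the comment is hypothetical, and it assumes the file has no header row. An in-memory io.StringIO stands in for the real file so the example runs on its own.

```python
import csv
import io

def load_translation_table(handle):
    """Build {old_tag: new_tag} from a tab-delimited file-like object."""
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        old_tag, new_tag = row[0].strip(), row[1].strip()
        table[old_tag] = new_tag
    return table

# Stand-in for something like open("translation_table.txt", newline="");
# that file name is hypothetical.
sample = io.StringIO("P019\tAGAG\nP020\tAGAC\n")
trn_dict = load_translation_table(sample)
print(trn_dict["P020"])  # AGAC
```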