Why won't this string change to uppercase?

Asked: 2015-07-02 18:17:00

Tags: python string uppercase

So I have a file of amino acids that I'm trying to read:

mdvfmkglskakegvvaaaektkqgvaeaagktkegvlyvgsktkegvvhgvatvaektk
eqvtnvggavvtgvtavaqktvegagsiaaatgfvkkdqlgkneegapqegiledmpvdp
dneayempseegyqdyepea

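I have a list of uppercase letters called aminoacids. The problem is that I can't read the sequence because its letters are lowercase. I've been trying to convert it to uppercase. Reading the file is no problem, and I think I've successfully turned its contents into a string (but maybe I haven't?).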

aminoacids = ['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
content1 = fh.readline() #first line, which is not the sequence
        #print content1
charline1 = len(content1)-1 #number of characters in the first line
        #print charline1
contentall = fh.readlines() #each line is converted into a string and put into a list
        #print contentall
numlines = len(contentall) #number of elements in list = number of lines, not the first one
        #print numlines
contentjoined = ''.join(contentall) #list elements are combined, but this includes new lines as characters
contentjoined = contentjoined.translate(None, "\n")
contentjoined = contentjoined.translate(None,''.join([i for i in contentjoined if i not in aminoacids]))
contentjoined = contentjoined.upper()
print contentjoined
numaa = len(contentjoined)
print numaa #this shouldn't be zero but it is

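Why isn't this working? What can I do to fix it? I'm inside a with statement now... that wasn't a problem before, but is it now? numaa is 0 when it shouldn't be. I realize I could add the lowercase letters to my list, but there should be a more "pythonic" way to solve this.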

3 Answers:

Answer 0: (score: 2)

Is it because you uppercase the string only after checking it against aminoacids? Try moving the line contentjoined = contentjoined.upper() up a line or two.

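When you check against aminoacids, the string is still entirely lowercase, so none of its characters match the uppercase list. Every character therefore ends up in the set of characters that str.translate deletes. The result looks like this: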

>>> c = contentjoined.translate(None,''.join([i for i in contentjoined if i not in aminoacids]))
>>> c
''

If you call upper first, you compare an uppercase string against a list of uppercase strings, so the characters actually match. It looks like this:

>>> contentjoined = contentjoined.upper()
>>> c = contentjoined.translate(None,''.join([i for i in contentjoined if i not in aminoacids]))
>>> c
'MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA'

If you want to keep the string lowercase, you can do the comparison in uppercase and leave the string itself lowercase. That looks like this:

>>> c = contentjoined.translate(None, ''.join([i for i in contentjoined if i.upper() not in aminoacids]))
>>> c
'mdvfmkglskakegvvaaaektkqgvaeaagktkegvlyvgsktkegvvhgvatvaektkeqvtnvggavvtgvtavaqktvegagsiaaatgfvkkdqlgkneegapqegiledmpvdpdneayempseegyqdyepea'
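
The two-argument call str.translate(None, deletechars) used above only exists in Python 2. As a rough sketch of the same upper-first idea for Python 3, the filtering can be done with a set and a generator expression. The helper name clean_sequence and the short sample input are made up for illustration and are not part of the question.

```python
# Hypothetical helper (not from the question): keep only amino-acid letters.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw):
    """Uppercase first, then drop every character that is not an amino-acid letter."""
    upper = raw.upper()
    return "".join(ch for ch in upper if ch in AMINO_ACIDS)


sample = "mdvfmkglsk\nakegvvaaae\n"  # made-up two-line input
cleaned = clean_sequence(sample)
print(cleaned)       # MDVFMKGLSKAKEGVVAAAE
print(len(cleaned))  # 20
```

Because the membership test runs on the already uppercased text, newlines and any other stray characters are dropped in the same pass.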

Answer 1: (score: 0)

The problem is in your translate() calls:

contentjoined = contentjoined.translate(None, "\n")
contentjoined = contentjoined.translate(None,''.join([i for i in contentjoined if i not in aminoacids]))

Here you are deleting every character of contentjoined that is not found in aminoacids (I'm not sure what data is in aminoacids). For example, if you try:

>>> temp = "this is a test string"
>>> temp.translate(None, "aeiou")
'ths s  tst strng'

So I guess every character of your string was deleted, leaving an empty string. See the translate() docs.
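
The effect described in this answer can be reproduced in a few lines. The sketch below is written for Python 3, where the deletion form of translate is spelled with str.maketrans("", "", chars) instead of translate(None, chars). The short lowercase fragment is a made-up stand-in for the real file contents.

```python
aminoacids = list("ACDEFGHIKLMNPQRSTVWY")
contentjoined = "mdvfmkglsk"  # made-up lowercase fragment, not the real file

# Same comprehension as in the question: every lowercase letter fails the
# membership test, so every character lands in the delete list.
to_delete = "".join([i for i in contentjoined if i not in aminoacids])
print(to_delete == contentjoined)  # True

# Python 3 equivalent of translate(None, to_delete): the third argument
# of str.maketrans lists the characters to delete.
result = contentjoined.translate(str.maketrans("", "", to_delete))
print(repr(result))  # ''
```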

Answer 2: (score: 0)

You can convert everything to uppercase as you read the file in. Maybe something like this?

with open('myfile.txt', 'r') as f:
    data = f.read().upper()

print(repr(data))  # repr() shows the newline characters still in the string
'MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTK\nEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDP\nDNEAYEMPSEEGYQDYEPEA\n'
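
The string above still contains the newline characters, and the question also skips a first line that is not part of the sequence. One possible way to handle both while uppercasing on read is sketched below. It uses io.StringIO in place of a real file so the example is self-contained, and the header text and short sequence lines are invented for illustration.

```python
import io

# Made-up file contents: one header line followed by two sequence lines.
fake_file = io.StringIO(">header line\nmdvfmkglsk\nakegvvaaae\n")

header = fake_file.readline()                  # first line, not part of the sequence
lines = fake_file.read().upper().splitlines()  # uppercase, then split on newlines
sequence = "".join(lines)                      # joining the pieces drops the newlines

print(sequence)       # MDVFMKGLSKAKEGVVAAAE
print(len(sequence))  # 20
```

With a real file, the same three steps would go inside the with open(...) block in place of fake_file.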