I'm currently having some trouble appending strings to a new list. When my script finishes, the list looks like this:
['MDAALLLNVEGVKKTILHGGTGELPNFITGSRVIFHFRTMKCDEERTVIDDSRQVGQPMH\nIIIGNMFKLEVWEILLTSMRVHEVAEFWCDTIHTGVYPILSRSLRQMAQGKDPTEWHVHT\nCGLANMFAYHTLGYEDLDELQKEPQPLVFVIELLQVDAPSDYQRETWNLSNHEKMKAVPV\nLHGEGNRLFKLGRYEEASSKYQEAIICLRNLQTKEKPWEVQWLKLEKMINTLILNYCQCL\nLKKEEYYEVLEHTSDILRHHPGIVKAYYVRARAHAEVWNEAEAKADLQKVLELEPSMQKA\nVRRELRLLENRMAEKQEEERLRCRNMLSQGATQPPAEPPTEPPAQSSTEPPAEPPTAPSA\nELSAGPPAEPATEPPPSPGHSLQH\n']
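I want to remove the newline characters. I've looked at other questions here, and most suggest using .rstrip, but adding it to my code gives exactly the same output. What am I missing? Apologies if this has been asked before.
My input looks like this (first 3 lines):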
>sp|Q9NZN9|AIPL1_HUMAN Aryl-hydrocarbon-interacting protein-like 1 OS=Homo sapiens OX=9606 GN=AIPL1 PE=1 SV=2
MDAALLLNVEGVKKTILHGGTGELPNFITGSRVIFHFRTMKCDEERTVIDDSRQVGQPMH
IIIGNMFKLEVWEILLTSMRVHEVAEFWCDTIHTGVYPILSRSLRQMAQGKDPTEWHVHT
from sys import argv
protein = argv[1] #fasta file
sequence = '' #string linker
get_line = False #False = not the sequence
Uniprot_ID = []
sequence_list =[]
with open(protein) as pn:
    for line in pn:
        line.rstrip("\n")
        if line.startswith(">") and get_line == False:
            sp, u_id, name = line.strip().split('|')
            Uniprot_ID.append(u_id)
            get_line = True
            continue
        if line.startswith(">") and get_line == True:
            sequence.rstrip('\n')
            sequence_list.append(sequence) #add the amino acids onto the list
            sequence = '' #resets the str
        if line != ">" and get_line == True: #if the first line is not a fasta ID and is it a sequence?
            sequence += line
print(sequence_list)
Answer 0 (score: 1)
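Per the documentation, rstrip removes trailing characters, that is, characters at the end of the string. You probably saw others use it to remove \n because that character usually appears only at the end of a line.
To replace a character everywhere in a string, use replace instead.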
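Applied to the question's data: each line is added to sequence together with its newline, so the accumulated string has newlines in the middle, and rstrip can only reach the last one. A small illustration, using a shortened, made-up stand-in for the accumulated sequence:

```python
# A shortened stand-in for the accumulated sequence (made-up data).
sequence = "MDAALLLNVE\nIIIGNMFKLE\nELSAGPPAEP\n"

# rstrip('\n') only removes newlines at the very end of the string.
only_trailing = sequence.rstrip("\n")
print(repr(only_trailing))  # 'MDAALLLNVE\nIIIGNMFKLE\nELSAGPPAEP'

# replace('\n', '') removes every newline, wherever it occurs.
all_removed = sequence.replace("\n", "")
print(repr(all_removed))  # 'MDAALLLNVEIIIGNMFKLEELSAGPPAEP'
```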
These methods do not modify your string! They return a new string, so if you want the change to stick, assign the result back to the original variable:
>>> line = 'ab\ncd\n'
>>> line.rstrip('\n')
'ab\ncd' # note: this is the immediate result, which is not assigned back to line
>>> line = line.replace('\n', '')
>>> line
'abcd'
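In the question's loop, the same rule means each stripped line has to be assigned back (or used directly) before it is concatenated. A minimal sketch, assuming the input is any iterable of lines such as an open file; join_sequence_lines is a hypothetical helper name, not part of the original code:

```python
def join_sequence_lines(lines):
    """Concatenate sequence lines, dropping each line's trailing newline."""
    sequence = ""
    for line in lines:
        # rstrip returns a new string, so the result must be assigned back.
        line = line.rstrip("\n")
        sequence += line
    return sequence


print(join_sequence_lines(["MDAA\n", "IIIG\n", "CGLA\n"]))  # MDAAIIIGCGLA
```

Stripping each line as it is read, rather than the joined string at the end, is what removes the newlines from the middle of the sequence.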
Answer 1 (score: 0)
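When I asked this question, I hadn't taken the time to read the documentation and understand my code. After doing so, I realized two things:
For the specific problem I asked about, I could simply use line.strip() to remove the '\n'.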
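A caveat worth keeping in mind: str.strip() and str.split() behave differently. strip() returns a string with the surrounding whitespace (including the newline) removed, while split() with no arguments returns a list of the whitespace-separated pieces. Only the string from strip() can be concatenated onto sequence, as in the code that follows:

```python
line = "MDAALLLNVE\n"

stripped = line.strip()  # 'MDAALLLNVE' (a str)
pieces = line.split()    # ['MDAALLLNVE'] (a list)

sequence = ""
sequence += stripped     # fine: str + str
# sequence += pieces     # would raise TypeError: a list cannot be added to a str
print(sequence)  # MDAALLLNVE
```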
from sys import argv

protein = argv[1]  # FASTA file
Uniprot_ID = []
sequence_list = []
sequence = ''  # accumulates the sequence lines of the current record
get_line = False  # False until the first header has been seen
uni_seq = {}

"""This block of code takes a UniProt FASTA file and creates a
dictionary with the UniProt ID as the key and the sequence as the value."""
with open(protein) as pn:
    for line in pn:
        if line.startswith(">"):
            if not get_line:
                sp, u_id, name = line.strip().split('|')
                Uniprot_ID.append(u_id)
                get_line = True
            else:
                # A new header: save the previous record before starting the next one
                uni_seq[u_id] = sequence
                sequence_list.append(sequence)
                sp, u_id, name = line.strip().split('|')
                Uniprot_ID.append(u_id)
                sequence = ''
        elif get_line:
            sequence += line.strip()  # strip() removes the trailing newline

# Save the last record, which has no header after it
uni_seq[u_id] = sequence
sequence_list.append(sequence)
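For reuse, the same logic can be wrapped in a function that takes any iterable of lines and returns the ID-to-sequence dictionary. This is a sketch, assuming UniProt-style headers of the form >db|ID|description as in the question; parse_uniprot_fasta is a hypothetical name, and it collects lines in a list and joins them once instead of using +=. For real projects a dedicated parser (for example Biopython's SeqIO) may be preferable. In the example, the first record is the question's sequence truncated to 40 residues and the second is made up:

```python
import io


def parse_uniprot_fasta(lines):
    """Map each UniProt ID to its full sequence (newlines removed).

    Assumes headers look like '>sp|Q9NZN9|AIPL1_HUMAN ...'.
    """
    records = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()  # drop the newline and surrounding whitespace
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Save the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # split('|', 2) keeps any '|' inside the description intact.
            _db, current_id, _description = line[1:].split("|", 2)
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    # The last record has no following header, so save it explicitly.
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


example = io.StringIO(
    ">sp|Q9NZN9|AIPL1_HUMAN Aryl-hydrocarbon-interacting protein-like 1\n"
    "MDAALLLNVEGVKKTILHGG\n"
    "TGELPNFITGSRVIFHFRTM\n"
    ">sp|TEST01|TEST_RECORD Made-up record for illustration\n"
    "ACDEFGHIK\n"
)
print(parse_uniprot_fasta(example))
# {'Q9NZN9': 'MDAALLLNVEGVKKTILHGGTGELPNFITGSRVIFHFRTM', 'TEST01': 'ACDEFGHIK'}
```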