I have a list named FirstSequenceToSplit.
It contains one item, a DNA sequence, for example:
'ATTTTACGTA'
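I can easily return the length of this item, so the user knows it is 10 characters long. What I want next is for the user to say which indices they want to extract, for example [0:6], and then to produce two items in a new list. The first item has the characters at the user-defined indices, with question marks replacing the characters that were not extracted; the second item is the inverse.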
To illustrate what I want: if the user asks for [0:5], you would get a new list containing the following items:
['ATTTT?????', '?????ACGTA']
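This is part of a larger problem. I have a set of DNA sequences in FASTA format ('>Sequence1\nATTTTACGTA', '>Sequence2\nATTGCACGTA', etc.), and I want the user to be able to select a sequence by its ID, split that sequence according to predefined input, and have the two parts named Sequence2a and Sequence2b ('>Sequence1a\n?????ACGTA', '>Sequence1b\nATTTT?????', '>Sequence2\nATTGCACGTA', etc.). So far I have approached this by printing the names of the sequences, letting the user choose one to splice, and extracting that sequence (without the ID). Once I have solved the problem shown above, I will create a new list containing the new items.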
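As I am a beginner (which I am sure is obvious by now!), I would appreciate an explanation of any code given. Thank you very much for any help you can offer.
My code so far is: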
import sys
import re
#Creating format so more user friendly
class color:
    PURPLE = '\033[95m'
    CYAN = '\033[96m'
    DARKCYAN = '\033[36m'
    BLUE = '\033[94m'
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
    END = '\033[0m'
fileName = raw_input("Give the name of the Fasta file you wish to divide up ")
# i.e TopTenFasta
#Reading in the sequences splitting them by the > symbol
in_file = open(fileName,"r")
sequences = in_file.read().split('>')[1:]
in_file.close()
#Putting all these sequences into a list
allSequences = []
for item in sequences:
    allSequences.append(item)
#Letting you know how many sequences there are in total
NumberOfSequences = len(allSequences)
print color.BOLD + "The Number of Sequences in this list is: " +color.END, NumberOfSequences
#Returning the names of the IDs to allow you to decide which ones to split
SequenceIds = []
for x in allSequences:
    SequenceIds.append(x[0:10])
print color.BOLD + "With the following names: " + color.END, "\n", "\n".join(SequenceIds)
#-----------------------Starting the Splice ------------------------------------
#-----------------------------------------------------------------------------
#------------------------------------------------------------------------------
#Choosing the sequence you wish to splice
FirstSequenceToSplitID = raw_input(color.BOLD + "Which sequence would you like to splice " + color.END)
#Seeing whether that item is in the list
for x in SequenceIds:
    if FirstSequenceToSplitID == x:
        print "valid input"
FirstSequenceToSplit = []
#making a new list (FirstSequenceToSplit) and putting into it just the sequence (no ID)
for listItem in allSequences:
    if listItem[0:10]==FirstSequenceToSplitID:
        FirstSequenceToSplit.append(listItem[11:])
#Printing the Length of the sequence to splice
for element in FirstSequenceToSplit:
    print color.BOLD + "The Length of this sequence is" + color.END, len(element)
Answer 0 (score: 1)
I would use a comprehension and zip. I have commented the code, but feel free to ask if anything is unclear.
my_str = 'ATTTTACGTA'
# This loop will check that
# - the casting to int is ok
# - exactly two numbers are entered
# - stop >= start
# - start >= 0
# - stop <= len(my_str)
while True:
    try:
        start, stop = map(int, raw_input(
            'Please enter start and stop index separated by whitespace\n').split())
        if stop < start or start < 0 or stop > len(my_str):
            raise ValueError
        break
    except ValueError:
        print 'Bad input, try again'
# Loop over all chars, check if the current index is inside range(start, stop).
# If it is, add (char, '?') to the array, if not, add ('?', char) to the array.
#
# This would give you an array of something like this:
# [('?', 'A'), ('?', 'T'), ('T', '?'), ('T', '?'), ('?', 'T'), ('?', 'A'),
# ('?', 'C'), ('?', 'G'), ('?', 'T'), ('?', 'A')]
#
# By using zip(*array), we unpack each element and save the first items as
# one tuple and the second items as another, giving you a list like this:
#
# [('?', '?', 'T', 'T', '?', '?', '?', '?', '?', '?'),
# ('A', 'T', '?', '?', 'T', 'A', 'C', 'G', 'T', 'A')]
chars = zip(*((c, '?') if i in range(start, stop) else ('?', c)
              for i, c in enumerate(my_str)))
# ''.join is used to concatenate all chars into two strings
my_lst = [''.join(s) for s in chars]
print my_lst
Example output:
Please enter start and stop index separated by whitespace
4
Bad input, try again
Please enter start and stop index separated by whitespace
5 4
Bad input, try again
Please enter start and stop index separated by whitespace
e 3
Bad input, try again
Please enter start and stop index separated by whitespace
4 5
['????T?????', 'ATTT?ACGTA']
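For reuse without the interactive prompt, the same zip-based idea could be wrapped in a function. This is only a sketch: the name mask_split and the empty-string guard are my own additions, not part of the code above, and it is written so that it should run under both Python 2 and Python 3.

```python
def mask_split(seq, start, stop):
    """Return [kept, rest]: the first string keeps seq[start:stop] and masks
    everything else with '?'; the second string is the inverse."""
    if not 0 <= start <= stop <= len(seq):
        raise ValueError('need 0 <= start <= stop <= len(seq)')
    if not seq:
        # zip(*[]) would yield nothing, so handle the empty sequence here.
        return ['', '']
    # One (kept_char, rest_char) pair per position in the sequence.
    pairs = [(c, '?') if start <= i < stop else ('?', c)
             for i, c in enumerate(seq)]
    # zip(*pairs) transposes the pairs into two tuples of characters.
    return [''.join(chars) for chars in zip(*pairs)]


result = mask_split('ATTTTACGTA', 0, 5)
print(result)  # ['ATTTT?????', '?????ACGTA']
```

With the question's example of [0:5], this returns the two strings the question asks for, and an out-of-range pair of indices raises ValueError instead of looping back to a prompt.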
Answer 1 (score: 0)
This expression will work:
[ c[0:n] + '?' * (len(c)-n), '?' * n + c[n:] ]
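To make the expression concrete, here is a small sketch that assumes c is the sequence string and n is the index at which to split, with 0 <= n <= len(c). Note that it only covers a split that starts at index 0, as in the [0:5] example. The function names split_at and split_record are hypothetical, and which half receives the 'a' or 'b' suffix is an arbitrary choice made here for illustration.

```python
def split_at(c, n):
    """Split sequence c at index n, padding each half with '?'.
    Assumes 0 <= n <= len(c)."""
    return [c[0:n] + '?' * (len(c) - n), '?' * n + c[n:]]


def split_record(seq_id, seq, n):
    """Build two FASTA-style records named seq_id + 'a' and seq_id + 'b'.
    (Hypothetical helper; the suffix assignment is an assumption.)"""
    first, second = split_at(seq, n)
    return ['>' + seq_id + 'a\n' + first, '>' + seq_id + 'b\n' + second]


halves = split_at('ATTTTACGTA', 5)
print(halves)  # ['ATTTT?????', '?????ACGTA']

records = split_record('Sequence1', 'ATTTTACGTA', 5)
print(records)  # ['>Sequence1a\nATTTT?????', '>Sequence1b\n?????ACGTA']
```

Because string slicing and repetition do all the work, no loop over the individual characters is needed.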