Substrings of a string in Python

Time: 2013-06-14 17:15:47

Tags: python string

I have this string:

str = CTGGCATAACAAGACAAAAACAAAAGCAATAAATGCTGAAAAAACAAAATGCCGTGATCGTTTGTAATACTGGAACATAGTCATGATGAATGAAGGTTTCTGAACCTGAAGAACGACCTGAAAAAGTCAAACCGCAAGAATATCACGACGCAGTGAACCAGAATAGCAACGACGAAAATGTCCAGGAAAAATCCTGGAGTCAGATTCAGGGTTATTCGTTAGTGGCAGGATTACGAAGCGTGGGGCACAGGAGATACATCTCCAGTAAGATGGCAACGTAATCGCGGGCTTCTTTTTTAAGATCAAAAGATTGCGGGGCAAAGAGCCAGTTTTCCATCAGGCCGGAAATATAGCCGCGCATAATAATTGCTGCGCGACGCGTCATTAAATCCGCAGGCAACATTTTCGCTTCAATACAATGTTTTAACGTTTGTTCTATACGGTCATAACTTTCCAGACAGAGATTACGTTGTGCCTGTTGCACAACAGCCATTTCTCCGACAAATTCGCATTTGTGGAATATAATCTCCATCAATAATCGACGCCGTTCTTCTGTCACCGTGGATTCAAGAACATGAATTAATATCTCTCTTAATACTGAGAGTGGATCGCCAGGGAATTTTGCCTGATACTCAAGCTCTAGTTCACCAATATTGGATTCTGACAGTTCCCAGATCTCACTGAACAAATCCGACTTGTCTTTAAAATGCCAGTAGATTGCACCGCGCGTAACGCCAGCTGCTTTTGCAATCTCGCCCAGCGAGGTGGATGATACCCCCTGCTGTGAGAAAAGACGTAGAGCCACATCGAGGATGTGTTGGCGCGTTTCTTGCGCTTCTTGTTTGGTTTTTCGTGCCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTGTGAATGTATGTACCATAGCACGACGATAATATAAACGCAGCAATGGGTTTATTAACTTTTGACCATTGACCAATTTGAAATCGGACACTCGAGGTTTACATA

I want to cut this string into multiple substrings, str[0:19], str[1:20], str[2:21], str[3:22], ..., and so on until the end of the string.

3 answers:

Answer 0 (score: 3)

Use string slicing:

>>> strs = "CTGGCATAACAAGACAAAAACAAAAGCAATAAATGCTGAAAAAACAAAATGCCGTGATCGTTTGTAATACTGGAACATAGTCATGATGAATGAAGGTTTCTGAACCTGAAGAACGACCTGAAAAAGTCAAACCGCAAGAATATCACGACGCAGTGAACCAGAATAGCAACGACGAAAATGTCCAGGAAAAATCCTGGAGTCAGATTCAGGGTTATTCGTTAGTGGCAGGATTACGAAGCGTGGGGCACAGGAGATACATCTCCAGTAAGATGGCAACGTAATCGCGGGCTTCTTTTTTAAGATCAAAAGATTGCGGGGCAAAGAGCCAGTTTTCCATCAGGCCGGAAATATAGCCGCGCATAATAATTGCTGCGCGACGCGTCATTAAATCCGCAGGCAACATTTTCGCTTCAATACAATGTTTTAACGTTTGTTCTATACGGTCATAACTTTCCAGACAGAGATTACGTTGTGCCTGTTGCACAACAGCCATTTCTCCGACAAATTCGCATTTGTGGAATATAATCTCCATCAATAATCGACGCCGTTCTTCTGTCACCGTGGATTCAAGAACATGAATTAATATCTCTCTTAATACTGAGAGTGGATCGCCAGGGAATTTTGCCTGATACTCAAGCTCTAGTTCACCAATATTGGATTCTGACAGTTCCCAGATCTCACTGAACAAATCCGACTTGTCTTTAAAATGCCAGTAGATTGCACCGCGCGTAACGCCAGCTGCTTTTGCAATCTCGCCCAGCGAGGTGGATGATACCCCCTGCTGTGAGAAAAGACGTAGAGCCACATCGAGGATGTGTTGGCGCGTTTCTTGCGCTTCTTGTTTGGTTTTTCGTGCCATATGTTCGTGAATTTACAGGCGTTAGATTTACATACATTTGTGAATGTATGTACCATAGCACGACGATAATATAAACGCAGCAATGGGTTTATTAACTTTTGACCATTGACCAATTTGAAATCGGACACTCGAGGTTTACATA"
>>> substrings = [strs[i:i+19] for i in range(len(strs) - 19 + 1)]
>>> substrings
['CTGGCATAACAAGACAAAA', 'TGGCATAACAAGACAAAAA', 'GGCATAACAAGACAAAAAC',...]
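
One detail worth checking with slices like these is where the loop stops. A string of length n has n - k + 1 windows of width k, and any start position beyond that yields a shorter slice, because Python slicing silently truncates at the end of the string. Below is a minimal sketch on a short made-up string; the helper name `sliding_windows` and the sample sequence are illustrative and not part of the original answer.

```python
def sliding_windows(s, width):
    """Return every full-width substring of s, stepping one character at a time.

    Hypothetical helper for illustration; not part of the original answer.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # A string of length n has n - width + 1 full windows (none if n < width).
    return [s[i:i + width] for i in range(len(s) - width + 1)]


sample = "CTGGCATA"  # short made-up sequence, 8 characters
windows = sliding_windows(sample, 5)
print(windows)  # ['CTGGC', 'TGGCA', 'GGCAT', 'GCATA']
```

With the string from the question, the same call with `width=19` should give only 19-character substrings.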

Answer 1 (score: 1)

# There are len(str) - 19 + 1 full windows; note that `str` shadows the built-in type.
chopped_str = []
for i in range(len(str) - 19 + 1):
    chopped_str.append(str[i:i+19])

Answer 2 (score: 1)

If you want to extract every 19-nucleotide sequence from the strand, this will do:

>>> SEQ_LEN = 19
>>> [strs[i:i+SEQ_LEN] for i in range(len(strs) - SEQ_LEN + 1)]

However, this is not memory-efficient, because it builds a list of all the subsequences. What do you intend to use it for?


Another way to process every subsequence of N nucleotides could be:

for seq in (strs[i:i+SEQ_LEN] for i in range(len(strs) - SEQ_LEN + 1)):
    do_something_with(seq)

For your specific problem, do_something_with would mostly update the PWM using the nucleotide positions. If you get stuck, feel free to post another question ;)
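
As a rough sketch of what such a do_something_with could look like, the code below tallies how often each base occurs at each position across all windows. It assumes that PWM here means a position weight matrix built from per-position nucleotide counts; the answer does not say how the matrix is defined. The names `iter_windows` and `count_matrix` and the short sample sequence are made up for illustration, and turning the counts into frequencies or log-odds weights is left out.

```python
def iter_windows(seq, width):
    """Lazily yield every full-width window of seq (hypothetical helper)."""
    for i in range(len(seq) - width + 1):
        yield seq[i:i + width]


def count_matrix(seq, width, alphabet="ACGT"):
    """Count, for each position in a window, how often each base occurs.

    Assumption: this raw count table is only the starting point of a PWM;
    the original answer does not define the matrix.
    """
    counts = {base: [0] * width for base in alphabet}
    for window in iter_windows(seq, width):
        for pos, base in enumerate(window):
            # Characters outside the alphabet (e.g. 'N') are skipped.
            if base in counts:
                counts[base][pos] += 1
    return counts


sample = "AACGA"  # short made-up sequence; windows are AAC, ACG, CGA
matrix = count_matrix(sample, 3)
for base in "ACGT":
    print(base, matrix[base])
```

Because `iter_windows` is a generator, only one window exists in memory at a time, which matches the memory concern raised above.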