如何在Python中有多个分隔符时列出列表?

时间:2016-10-10 18:42:47

标签: python list readfile strip

示例文件看起来像这样(全部在一行上,为了易读性而包装):

 ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
  '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
  '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
  '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n',
  '$$$\n', '\n',
  '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
  '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
  '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n',
  '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
  '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n',
  '>B42\n', 'TT-GTGGGTATC\n']

$$$分隔两组。我需要使用.strip函数并删除\n和所有"标题"。

我需要列出一个列表(如下所示)并替换" - "与Z(再次,所有在一条线上;为了易读性而包裹在这里):

  [['TCCGGGGGTATC','TCCGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC',
    'TCCGTGGGTATC',CGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC'],
   ['ATCGGGGGTATT', 'TT-GTGGGAATC','TTCGTGGGAATC', 'TT-GTGGGTATC',
    'TTCGTGGGTATT', 'TTCGGGGGTATC','TT-GTGGGTATC', 'TTCGGGGGAATC',
    'TTCGGGGGTATC', 'TTCGGGGGTATC','TT-GTGGGTATC]]

2 个答案:

答案 0 :(得分:2)

您可以利用较小长度的标题(以及其他不需要的项目)作为过滤它们的标准。首先创建一个包含一个列表的列表,然后将通过长度测试的项目附加到内部列表。

当到达分隔符 lst = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n', '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n', '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n', '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n', '$$$\n', '\n', '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n', '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n', '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n', '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n', '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n','>B42\n', 'TT-GTGGGTATC\n'] result = [[]] for x in lst: if len(x) > 6: result[-1].append(x.strip()) if x.startswith('$$$'): result.append([]) print(result) # [['TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC'], ['ATCGGGGGTATT', 'TT-GTGGGAATC', 'TTCGTGGGAATC', 'TT-GTGGGTATC', 'TTCGTGGGTATT', 'TTCGGGGGTATC', 'TT-GTGGGTATC', 'TTCGGGGGAATC', 'TTCGGGGGTATC', 'TTCGGGGGTATC', 'TT-GTGGGTATC']] 时,会在结果列表中添加一个新的子列表,并再次使用长度测试将剩余的项目添加到此新的子列表中:

result = dict_puzzle(puzzle,cell)
result['11'] = 'die'
print(result)

答案 1 :(得分:1)

以下是Moses Koledoye的答案的变体,它检查>的第一个字符并丢弃任何匹配以及任何空元素。我还包括更换" - "与" Z"。

lst = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
   '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
   '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
   '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n',
   '$$$\n', '\n',
   '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
   '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
   '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n',
   '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
   '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n',
   '>B42\n', 'TT-GTGGGTATC\n']

result = [[]]
for x in lst:
    if x.startswith('>'):
        continue
    if x.startswith('$$$'):
        result.append([])
        continue
    x = x.strip()
    if x:
        result[-1].append(x.replace("-", "Z"))
print(result)

这可以避免为任何元素的长度赋予任何特定的意义。