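Python - Remove everything from a string except certain characters

Time: 2014-02-24 21:02:13

Tags: python

I'm not sure whether this has been asked before, but I couldn't find it, so here goes: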

randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
randomList2 = []
for i in randomList:
  if i <contains any characters other than "A", "C", "G", or "T">:
    <add a string without junk to randomList2>

How do I do everything inside the <>? Thanks.

3 answers:

Answer 0: (score: 4)

>>> randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
>>> import re
>>> [re.sub("[^ACGT]+", "", s) for s in randomList]
['ACGT', 'AG', 'AGCT']

[^ACGT]+ matches one or more (+) characters other than A, C, G, or T.
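
To see what the negated character class matches in isolation, here is a minimal sketch. The names `pattern` and `sample` are illustrative and not part of the answer; the sample string is the second item from the question's list.

```python
import re

# Negated character class: one or more characters that are NOT A, C, G, or T.
pattern = re.compile(r"[^ACGT]+")

sample = "A#$..G"

# findall returns each run of junk characters the pattern matches.
junk_runs = pattern.findall(sample)

# sub replaces every such run with the empty string.
cleaned = pattern.sub("", sample)

print(junk_runs)  # ['#$..']
print(cleaned)    # AG
```

Because of the +, a whole run of junk is replaced in a single substitution instead of one character at a time.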

Some timings:

>>> import timeit
>>> setup = '''randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
... import re'''
>>> timeit.timeit(setup=setup, stmt='[re.sub("[^ACGT]+", "", s) for s in randomList]')
8.197133132976195
>>> timeit.timeit(setup=setup, stmt='[re.sub("[^ACGT]", "", s) for s in randomList]')
9.395620040786165

Without re it is faster (see @cmd's answer):

>>> timeit.timeit(setup=setup, stmt="[''.join(c for c in s if c in 'ACGT') for s in randomList]")
6.874829817476666

Faster still (see @JonClement's comment):

>>> setup='''randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]\nascii_exclude = ''.join(set('ACGT').symmetric_difference(map(chr, range(256))))'''
>>> timeit.timeit(setup=setup, stmt="""[item.translate(None, ascii_exclude) for item in randomList]""")
2.814761871275735
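
Note that item.translate(None, ascii_exclude) is the Python 2 str API. In Python 3, str.translate takes a single mapping from code points to replacements, so a rough equivalent might look like the sketch below. The names `keep`, `delete_table`, and `random_list` are illustrative, and the table only covers code points below 256, as in the original; characters outside that range would pass through unchanged.

```python
keep = "ACGT"

# Map every code point below 256 that is not in `keep` to None,
# which tells str.translate to delete that character.
delete_table = {i: None for i in range(256) if chr(i) not in keep}

random_list = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]

cleaned = [item.translate(delete_table) for item in random_list]
print(cleaned)  # ['ACGT', 'AG', 'AGCT']
```

No timing is given for this variant, since the figures above were measured on the Python 2 version.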

Also possible:

>>> setup='randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]'
>>> timeit.timeit(setup=setup, stmt="[filter(set('ACGT').__contains__, item) for item in randomList]")
4.341086316883207
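
The filter call above returns a string only in Python 2. In Python 3, filter returns a lazy iterator, so the result has to be joined back into a string. A minimal sketch, with the illustrative names `valid` and `random_list`:

```python
# Build the set once instead of on every iteration.
valid = set("ACGT")

random_list = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]

# filter yields only the characters found in `valid`; join rebuilds the string.
cleaned = ["".join(filter(valid.__contains__, item)) for item in random_list]
print(cleaned)  # ['ACGT', 'AG', 'AGCT']
```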

Answer 1: (score: 4)

re is overkill for this:

randomList2 = [''.join(c for c in s if c in 'ACGT') for s in randomList]

If you don't want to include the strings that had no junk to begin with:

valid = set("ACGT")
randomList2 = [''.join(c for c in s if c in valid) for s in randomList if any(c2 not in valid for c2 in s)]
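
The two variants can be combined into one helper. This is only a sketch: the function name `strip_junk` and the `only_changed` flag are hypothetical and not part of the answer.

```python
VALID = frozenset("ACGT")


def strip_junk(strings, valid=VALID, only_changed=False):
    """Return each string with characters outside `valid` removed.

    With only_changed=True, strings that contained no junk are skipped.
    """
    result = []
    for s in strings:
        cleaned = "".join(c for c in s if c in valid)
        # A string is unchanged exactly when it contained no junk.
        if only_changed and cleaned == s:
            continue
        result.append(cleaned)
    return result


random_list = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]
print(strip_junk(random_list))                     # ['ACGT', 'AG', 'AGCT']
print(strip_junk(random_list, only_changed=True))  # ['AG', 'AGCT']
```

Comparing the cleaned string with the original avoids scanning each string a second time with any().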

Answer 2: (score: 0)

You can use a regular expression:

import re
randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
nonACGT = re.compile('[^ACGT]')
for i in range(len(randomList)):
    randomList[i] = nonACGT.sub('', randomList[i])
print(randomList)  # parenthesized form works in both Python 2 and 3