I'm not sure whether this has been asked before, but I couldn't find it, so here goes:
randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
randomList2 = []
for i in randomList:
    if i <contains any characters other than "A","C","G", or "T">:
        <add a string without junk to randomList2>
How do I do everything inside the <>? Thanks,
Answer 0 (score: 4)
>>> randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
>>> import re
>>> [re.sub("[^ACGT]+", "", s) for s in randomList]
['ACGT', 'AG', 'AGCT']
[^ACGT]+ matches one or more (+) characters other than A, C, G, or T.
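To illustrate the negated character class, here is a minimal sketch comparing the pattern with and without the + quantifier. The variable names (random_list, run_pattern, single_pattern) are made up for this example, and the input is written as a raw string so the backslash is kept literally.

```python
import re

random_list = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]

# [^ACGT] is a negated character class: any character except A, C, G, T.
run_pattern = re.compile(r"[^ACGT]+")    # removes each run of junk in one substitution
single_pattern = re.compile(r"[^ACGT]")  # removes junk one character at a time

by_run = [run_pattern.sub("", s) for s in random_list]
by_char = [single_pattern.sub("", s) for s in random_list]

print(by_run)   # ['ACGT', 'AG', 'AGCT']
print(by_char)  # same result; only the number of substitutions differs
```

Both patterns give the same cleaned strings; the + version simply performs fewer substitutions on long runs of junk.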
Some timings:
>>> import timeit
>>> setup = '''randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
... import re'''
>>> timeit.timeit(setup=setup, stmt='[re.sub("[^ACGT]+", "", s) for s in randomList]')
8.197133132976195
>>> timeit.timeit(setup=setup, stmt='[re.sub("[^ACGT]", "", s) for s in randomList]')
9.395620040786165
Without re it is faster (see @cmd's answer):
>>> timeit.timeit(setup=setup, stmt="[''.join(c for c in s if c in 'ACGT') for s in randomList]")
6.874829817476666
Faster still (see @JonClement's comment):
>>> setup='''randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]\nascii_exclude = ''.join(set('ACGT').symmetric_difference(map(chr, range(256))))'''
>>> timeit.timeit(setup=setup, stmt="""[item.translate(None, ascii_exclude) for item in randomList]""")
2.814761871275735
Also possible:
>>> setup='randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]'
>>> timeit.timeit(setup=setup, stmt="[filter(set('ACGT').__contains__, item) for item in randomList]")
4.341086316883207
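The translate(None, ...) and filter timings above rely on Python 2 behavior: in Python 3, str.translate takes a single mapping argument and filter returns an iterator rather than a string. A rough Python 3 sketch of the translate idea, assuming a defaultdict table (the names keep, table, and cleaned are invented for this example), might look like this:

```python
from collections import defaultdict

random_list = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]

# Map the allowed code points to themselves; any other code point
# falls through to the default factory, which returns None (delete).
keep = {ord(c): c for c in "ACGT"}
table = defaultdict(lambda: None, keep)

cleaned = [s.translate(table) for s in random_list]
print(cleaned)  # ['ACGT', 'AG', 'AGCT']
```

No timing is claimed for this variant; it only shows the same technique in Python 3 form.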
Answer 1 (score: 4)
You don't need re for this:
randomList2 = [''.join(c for c in s if c in 'ACGT') for s in randomList]
If you don't want the ones that had no junk to begin with:
valid = set("ACGT")
randomList2 = [''.join(c for c in s if c in valid) for s in randomList if any(c2 not in valid for c2 in s)]
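As a quick check of how the two variants differ, the following sketch runs both on the question's data. The helper strip_junk and the names all_cleaned and only_dirty are hypothetical, introduced only for this illustration, and set.issuperset is used here as an equivalent way of asking whether a string contains any junk.

```python
random_list = ["ACGT", "A#$..G", r"..,/\]AGC]]]T"]
valid = set("ACGT")

def strip_junk(s):
    """Return s with every character outside `valid` removed."""
    return "".join(c for c in s if c in valid)

# Variant 1: clean every string, including those that were already clean.
all_cleaned = [strip_junk(s) for s in random_list]

# Variant 2: keep only the strings that contained at least one junk character.
only_dirty = [strip_junk(s) for s in random_list if not valid.issuperset(s)]

print(all_cleaned)  # ['ACGT', 'AG', 'AGCT']
print(only_dirty)   # ['AG', 'AGCT']
```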
Answer 2 (score: 0)
You can use a regular expression:
import re
randomList = ["ACGT","A#$..G","..,/\]AGC]]]T"]
nonACGT = re.compile('[^ACGT]')
for i in range(len(randomList)):
    randomList[i] = nonACGT.sub('', randomList[i])
print(randomList)