How to remove consecutive single-letter characters from a string in Python

Time: 2019-07-27 13:54:27

Tags: python

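I have a string like the one below, and I want to remove runs of consecutive single-letter characters whose length is greater than 5.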

mystring = "the nucleotide sequence of wheat triticum aestivum l chloroplastid ribosome associated 4 5 s rna is u a g u g a g c g c g a g a c g a g c g u a u a g u g u c a g u g a g u g c a g u g a u g u a u g c a g c u g a g c a u c u a c g a c g a c g a u g a coh"

My output should be as follows.

myoutput = "the nucleotide sequence of wheat triticum aestivum l chloroplastid ribosome associated 4 5 s rna is coh"

I tried the following.

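count = 0  # length of the current run of single-character tokens
for i, my in enumerate(mystring.split()):
    if len(my) == 1:
        count = count + 1
    else:
        count = 0
    if count == 5:
        print(i)  # index of the fifth token in the current run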

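In short, I want to count the single-letter characters, check whether there is a run of 5 of them, remove those 5 positions from the list, and so on.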

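However, I would like to do this in a more efficient, Pythonic way, without using a variable to track the run length and without stepping through the tokens five at a time.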

I am happy to provide more details if needed.

1 Answer:

Answer 0: (score: 1)

I believe we can solve this problem with a regular expression:

import re

# Adjacent string literals are concatenated, so each piece must end with a space.
mystring = ("the nucleotide sequence of wheat triticum aestivum l "
            "chloroplastid ribosome associated 4 5 s rna is u a "
            "g u g a g c g c g a g a c g a g c g u a u a g u g u "
            "c a g u g a g u g c a g u g a u g u a u g c a g c u "
            "g a g c a u c u a c g a c g a c g a u g a coh")
print(mystring)

# See https://regex101.com/r/aUDK7K/1
# \b: word boundary
# \w: word char
# \s+: one or more white spaces
# {5,}: 5 or more times
shorten = re.sub(r'(\b\w\s+){5,}', '', mystring)
print(shorten)