How to subtract strings in Python

Time: 2017-03-15 00:19:24

Tags: python string python-2.7 bioinformatics

Basically, if I have a string 'AJ' and another string 'AJYF', I would like to be able to write 'AJYF'-'AJ' and get 'YF'.

I tried this, but got a syntax error.

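As a side note, the string being subtracted will always be shorter than the string it is subtracted from. It will also always be like the string it is subtracted from. For example, if I have 'GTYF' and want to subtract a string of length 3 from it, that string has to be 'GTY'.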

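If possible, the full functionality I am trying to build converts a string into a list based on how long each item in the list is supposed to be. Is there a way to do that?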
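The splitting task described in the last paragraph can be sketched as follows. This is only a minimal sketch: the function name `split_by_lengths` is hypothetical, and it assumes the lengths are non-negative integers that add up to the length of the string.

```python
def split_by_lengths(text, lengths):
    """Cut text into consecutive pieces whose sizes are given by lengths."""
    # Assumption: the pieces must cover the whole string exactly.
    if sum(lengths) != len(text):
        raise ValueError('lengths must add up to len(text)')
    pieces = []
    start = 0
    for n in lengths:
        pieces.append(text[start:start + n])
        start += n
    return pieces


print(split_by_lengths('AJYF', [2, 1, 1]))  # ['AJ', 'Y', 'F']
```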

4 answers:

Answer 0: (score: 13)

An easy solution is:

>>> string1 = 'AJYF'
>>> string2 = 'AJ'
>>> if string2 in string1:
...     string1.replace(string2,'')
'YF'
>>>

Answer 1: (score: 5)

I think what you want is:

a = 'AJYF'
b = a.replace('AJ', '')
print(b)     # prints YF
a = 'GTYF'
b = a.replace('GTY', '')
print(b)     # prints F

Answer 2: (score: 2)

If the second string occurs in more than one place, replace may do something you don't want:

s1 = 'AJYFAJYF'
s2 = 'AJ'
if s1.startswith(s2):
    s3 = s1.replace(s2, '')
s3
# 'YFYF'

You can pass an extra argument to replace to indicate that you want only one replacement:

if s1.startswith(s2):
    s3 = s1.replace(s2, '', 1)
s3
# 'YFAJYF'

Or you can use the re module:

import re
if s1.startswith(s2):
    s3 = re.sub('^' + s2, '', s1)
s3
# 'YFAJYF'

The '^' ensures that s2 is replaced only when it appears at the very start of s1.
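One caveat with the re approach: s2 is used as a pattern, so characters such as '.' or '*' are treated as regex syntax. A minimal sketch of guarding against that with re.escape, using made-up input strings for illustration:

```python
import re

# Illustrative inputs (not from the question): s2 contains a regex metacharacter.
s1 = 'A.YFAJYF'
s2 = 'A.'

# Unescaped, '.' matches any character, so '^A.' would also match 'AJ'.
# re.escape makes the pattern match s2 literally.
pattern = '^' + re.escape(s2)
s3 = re.sub(pattern, '', s1)
print(s3)  # YFAJYF
```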

Another approach, suggested in the comments, is to drop the first len(s2) characters of s1:

if s1.startswith(s2):
    s3 = s1[len(s2):] 
s3
# 'YFAJYF'
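In the snippets above, s3 is never assigned when s1 does not start with s2. One way to make that case explicit is to wrap the slice in a small helper. This is a sketch only: the name `strip_prefix` is hypothetical, and raising ValueError is an assumption about what the caller wants.

```python
def strip_prefix(s, prefix):
    """Return s without its leading prefix; fail loudly if it is not there."""
    if not s.startswith(prefix):
        # Assumption: a missing prefix is an error rather than a no-op.
        raise ValueError('%r does not start with %r' % (s, prefix))
    return s[len(prefix):]


print(strip_prefix('AJYFAJYF', 'AJ'))  # YFAJYF
print(strip_prefix('GTYF', 'GTY'))     # F
```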

Some tests with the %timeit magic in ipython (python 2.7.12, ipython 5.1.0) suggest that the last approach is the fastest:

In [1]: s1 = 'AJYFAJYF'

In [2]: s2 = 'AJ'

In [3]: %timeit s3 = s1[len(s2):]
The slowest run took 24.47 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.7 ns per loop

In [4]: %timeit s3 = s1[len(s2):]
The slowest run took 32.58 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.8 ns per loop

In [5]: %timeit s3 = s1[len(s2):]
The slowest run took 21.81 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 87.4 ns per loop

In [6]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 17.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 230 ns per loop

In [7]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 17.79 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 228 ns per loop

In [8]: %timeit s3 = s1.replace(s2, '', 1)
The slowest run took 16.27 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 234 ns per loop

In [9]: import re

In [10]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 82.02 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop

In [11]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 12.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.86 µs per loop

In [12]: %timeit s3 = re.sub('^' + s2, '', s1)
The slowest run took 13.08 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.84 µs per loop
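To repeat a comparison like this outside of ipython, the standard timeit module can be used. The sketch below is not the setup used for the numbers above: the loop counts are arbitrary, and wrapping each statement in a lambda adds some call overhead, so absolute values will differ.

```python
import re
import timeit

s1 = 'AJYFAJYF'
s2 = 'AJ'

# The three approaches from this answer, wrapped as callables.
candidates = {
    'slice': lambda: s1[len(s2):],
    'replace': lambda: s1.replace(s2, '', 1),
    're.sub': lambda: re.sub('^' + s2, '', s1),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 runs of 100000 calls each, in seconds.
    timings[name] = min(timeit.repeat(func, number=100000, repeat=3))

# Print the approaches from fastest to slowest.
for name in sorted(timings, key=timings.get):
    print('%-8s %.4f s per 100000 calls' % (name, timings[name]))
```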

Answer 3: (score: 0)

If you insist on using the '-' operator, use a class that overrides the __sub__ dunder method, combined with one of the solutions above:

class String(object):
    def __init__(self, string):
        self.string = string

    def __sub__(self, other):
        if self.string.startswith(other.string):
            return self.string[len(other.string):]

    def __str__(self):
        return self.string


sub1 = String('AJYF') - String('AJ')
sub2 = String('GTYF') - String('GTY')
print(sub1)
print(sub2)

This prints:

YF
F
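Note that the class above returns a plain str from the subtraction, and returns None when the prefix does not match. A possible variant, sketched here under the assumption that a mismatch should raise an error, subclasses str so that the result supports '-' again. The name `SubStr` is hypothetical.

```python
class SubStr(str):
    """str subclass (hypothetical name) that removes a prefix with '-'."""

    def __sub__(self, other):
        if not self.startswith(other):
            # Assumption: subtracting a non-prefix is an error.
            raise ValueError('%r is not a prefix of %r' % (str(other), str(self)))
        # Wrap the slice so the result can be subtracted from again.
        return type(self)(self[len(other):])


result = SubStr('AJYF') - 'AJ'
print(result)        # YF
print(result - 'Y')  # F
```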