Remove repeated words from a string

Time: 2016-04-13 17:09:29

Tags: python regex

If I have a string like this:

my_string = 'this is is is is a string'

How can I remove the repeated `is` so that only one remains?

The string can contain any number of `is`, for example:

my_string = 'this is is a string'
other_string = 'this is is is is is is is is a string'

I think a regex solution is possible, but I'm not sure how to go about it. Thanks.

5 answers:

Answer 0 (score: 1)

You can use itertools.groupby:

from itertools import groupby
a = 'this is is is is a a a string string a a a'
print(' '.join(word for word, _ in groupby(a.split(' '))))
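
As a sketch of the same idea in reusable form: `groupby` groups runs of equal consecutive items, so keeping one key per run drops adjacent repeats while leaving non-adjacent repeats in place. The helper name `collapse_adjacent` is made up for illustration and is not part of the answer.

```python
from itertools import groupby

def collapse_adjacent(text):
    """Keep one word from each run of identical adjacent words."""
    # split() with no argument also collapses runs of whitespace.
    return ' '.join(word for word, _ in groupby(text.split()))

print(collapse_adjacent('this is is is is a a a string string a a a'))
# this is a string a
```

Note that the trailing `a` survives, because it is not adjacent to the earlier run of `a`.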

Answer 1 (score: 1)

Here is my approach:

my_string = 'this is is a string'
other_string = 'this is is is is is is is is a string'
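def getStr(s):
    res = []
    for word in s.split():
        # Keeps only the first occurrence of each word,
        # so non-adjacent repeats are dropped as well.
        if word not in res:
            res.append(word)
    return ' '.join(res)

print(getStr(my_string))
print(getStr(other_string))

Output: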

this is a string
this is a string
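
Note that this approach drops every later occurrence of a word, not only adjacent repeats. A minimal sketch of the same behavior using `dict.fromkeys`, assuming Python 3.7+ where dicts preserve insertion order (the name `unique_words` is hypothetical):

```python
def unique_words(s):
    # dict.fromkeys keeps the first occurrence of each word, in order.
    return ' '.join(dict.fromkeys(s.split()))

print(unique_words('this is is is is is is is is a string'))
# this is a string
print(unique_words('a b a'))
# a b
```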

UPDATE: a regex way to attack it:

import re
print(' '.join(re.findall(r'(?:^|)(\w+)(?:\s+\1)*', other_string)))


Answer 2 (score: 0)

If you want to remove all duplicates that directly follow one another, you can try:

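words = my_string.split()
tmp = [words[0]]  # assumes the string contains at least one word
for word in words:
    if word != tmp[-1]:  # skip a word equal to the one just kept
        tmp.append(word)
my_string = ' '.join(tmp)  # join avoids a trailing space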

Of course, if you want it to be smarter than this, it gets more complicated.
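
What "smarter" means is left open by the answer. As one hypothetical reading, the sketch below treats adjacent words as equal regardless of case and keeps the first spelling of each run. Both the function name and the case-insensitive rule are assumptions, not something the answer specifies.

```python
from itertools import groupby

def collapse_adjacent_ignore_case(text):
    # Group consecutive words by their lowercase form
    # and keep the first word of each run.
    runs = groupby(text.split(), key=str.lower)
    return ' '.join(next(group) for _, group in runs)

print(collapse_adjacent_ignore_case('This this is IS is a string'))
# This is a string
```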

Answer 3 (score: 0)

For one-liners:

for image in resultsdict['images']['VariationsSpecificPictureSet']:
    print(image['PictureURL'])

Answer 4 (score: 0)

Regex to the rescue!

((\b\w+\b)\s*\2\s*)+
# capturing group
# inner capturing group
# ... consisting of a word boundary, at least ONE word character and another boundary
# followed by whitespaces
# and the formerly captured group (aka the inner group)
# the whole pattern needs to be present at least once, but can be there
# multiple times

Python code

import re

string = """
this is is is is is is is is a string
and here is another another another another example
"""
rx = r'((\b\w+\b)\s*\2\s*)+'

string = re.sub(rx, r'\2 ', string)
print(string)
# this is a string
# and here is another example
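
One caveat: because the pattern consumes words in pairs, an odd number of repeats (for example `is is is`) appears to leave one duplicate behind, and `\2` is not anchored to a word boundary. A hedged alternative, which is a different pattern from the one in the answer, anchors on word boundaries and matches the whole run at once. The name `collapse_repeats` is made up for illustration.

```python
import re

# \b(\w+)       a whole word, captured as group 1
# (?:\s+\1\b)+  one or more repeats of that same whole word
rx = re.compile(r'\b(\w+)(?:\s+\1\b)+')

def collapse_repeats(text):
    # Replace each run of repeated words with a single copy.
    return rx.sub(r'\1', text)

print(collapse_repeats('this is is is a string'))
# this is a string
print(collapse_repeats('this is island'))
# this is island
```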

Demo

See a demo of this approach on regex101.com as well as on ideone.com.