I am trying to use a regular expression in Python to collapse words that consist of a single repeated character, for example:
good => good
gggggggg => g
So far, I have been trying:
re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
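The problem with this solution is that it also changes good to god. I only want to collapse words that are made up entirely of one repeated character.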
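To see why the attempt above alters good, it may help to list what the pattern actually matches. The following is a minimal sketch; the helper name `repeated_runs` is hypothetical and only used for illustration. Because the pattern is not anchored, it matches any run of a repeated letter, including the oo inside good.

```python
import re

# Same pattern as in the question: a lowercase letter followed by one or more copies of itself.
pattern = re.compile(r'([a-z])\1+')

def repeated_runs(word):
    """Hypothetical helper: return every substring of word that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(word)]

print(repeated_runs('good'))        # ['oo']
print(repeated_runs('gggggggg'))    # ['gggggggg']
print(pattern.sub(r'\1', 'good'))   # god
```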
Answer 0 (score: 3)
A better approach is to use a set:
def modify(s):
    # Create a set from the string
    c = set(s)
    # If the set holds only one character, convert the set to a string
    if len(c) == 1:
        return ''.join(c)
    # Otherwise return the original string
    else:
        return s
print(modify('good'))
print(modify('gggggggg'))
If you want to use a regular expression, mark the start and end of the string in the pattern with ^ and $ (inspired by @bobblebubble's comment).
import re

def modify(s):
    # Substitute with a regex that only matches if a single character is repeated,
    # marking the start and end of the string as well
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out

print(modify('good'))
print(modify('gggggggg'))
The output will be:
good
g
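The anchored pattern above only covers lowercase a-z. As a sketch of one possible generalization (not part of the original answer), `re.fullmatch` can be used instead of ^ and $, with `.` and `re.DOTALL` so that any repeated character qualifies. The function name `collapse_if_uniform` is a hypothetical choice for this example.

```python
import re

def collapse_if_uniform(s):
    """Hypothetical helper: return a single character if s is one character repeated,
    otherwise return s unchanged."""
    # fullmatch succeeds only if the whole string is a character followed by copies of itself.
    m = re.fullmatch(r'(.)\1+', s, flags=re.DOTALL)
    return m.group(1) if m else s

print(collapse_if_uniform('good'))      # good
print(collapse_if_uniform('gggggggg'))  # g
print(collapse_if_uniform('GGGG'))      # G
print(collapse_if_uniform('1111'))      # 1
```

Strings of length zero or one never match the pattern and are returned as they are.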
Answer 1 (score: 2)
You can use a trim method.
Take a look at this example:
"ggggggg".Trim('g');
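For comparison in Python, the language of the question, the closest analogue of a trim call is `str.strip`. The sketch below assumes that analogy. Note that stripping removes every leading and trailing occurrence of the character, so a word made only of g comes back empty rather than as a single g, and characters in the middle of a word are left untouched.

```python
# str.strip removes all leading and trailing occurrences of the given character.
only_g = 'ggggggg'.strip('g')
mixed = 'ggoodgg'.strip('g')

print(repr(only_g))  # ''
print(mixed)         # ood
```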
Update: for characters in the middle of the string, use this function, thanks to this answer.
In C#:
// Requires: using System.Linq;
public static string RemoveDuplicates(string input)
{
    return new string(input.ToCharArray().Distinct().ToArray());
}
In Python:
used = set()
unique = [x for x in mylist if x not in used and (used.add(x) or True)]
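However, I think none of these answers handle a case like aaaaabbbbbcda, where the string has an a at the end that does not appear in the result (abcd). For that case, use the following function that I wrote:
In: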
def unique(s):
    used = set()
    ret = list()
    s = list(s)
    for x in s:
        if x not in used:
            ret.append(x)
            # Remember only the most recent character, so that only
            # consecutive duplicates are skipped
            used = set()
            used.add(x)
    return ret

print(unique('aaaaabbbbbcda'))
Out:
['a', 'b', 'c', 'd', 'a']
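The function above keeps one character from each run of consecutive duplicates. As an alternative sketch (not the answer author's method), the standard library's `itertools.groupby` groups consecutive equal items, so taking each group's key gives the same result. The name `collapse_runs` is hypothetical.

```python
from itertools import groupby

def collapse_runs(s):
    """Hypothetical helper: keep one character from each run of consecutive duplicates."""
    # groupby yields (key, group) pairs for each run of equal consecutive items.
    return [key for key, _group in groupby(s)]

print(collapse_runs('aaaaabbbbbcda'))   # ['a', 'b', 'c', 'd', 'a']
print(''.join(collapse_runs('good')))   # god
```

Like the function above, this collapses runs inside any word, so good still becomes god; it does not address the original question of collapsing only single-character words.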
Answer 2 (score: 2)
If you don't want to use a set in your method, you can do this:
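def simplify(s):
    l = len(s)
    # Collapse only if the string is longer than one character
    # and every character equals the first one
    if l > 1 and s.count(s[0]) == l:
        return s[0]
    return s

print(simplify('good'))
print(simplify('abba'))
print(simplify('ggggg'))
print(simplify('g'))
print(simplify(''))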
Output:
good
abba
g
g
Explanation: