Python: Remove letters that occur more than N times in a row within a word

Time: 2017-12-02 11:02:42

Tags: python text

Suppose I have a sentence:

sentence = "Eveeery mondayyy I waaake upp"

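I want to create a function that removes every letter occurring more than N times in a row within a word.

So, with N = 2, the result should be:

result = Eveery mondayy I waake upp

How can I do this efficiently?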

4 answers:

Answer 0: (score: 2)

To give you a good start, here is a sample that may help:

import re

# (.) captures any character; \1+ matches one or more repeats of it
regex = r"(.)\1+"
test_str = "Eveeery mondayyy I waaake upp"
# Collapses each run to one character; use "\\1\\1" to keep two
# (for a larger N the pattern also needs a matching minimum run length)
subst = "\\1"
# count=0 replaces every match; set it to limit the number of replacements
result = re.sub(regex, subst, test_str, count=0)
if result:
    print(result)

Output:

>>>Every monday I wake up

Hope this helps.
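
The sample above collapses every run to a single character, while the question asks to keep up to N. A minimal sketch of how the same backreference idea might be generalized, assuming a hypothetical helper named `limit_runs` (the name and the `n < 1` check are illustrative, not part of the original answer):

```python
import re

def limit_runs(text, n):
    """Trim every run of a repeated character to at most n characters."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (.) captures one character; \1{n,} requires at least n more copies,
    # so the whole match is a run of n + 1 or more identical characters.
    pattern = re.compile(r"(.)\1{%d,}" % n)
    # Replace the whole run with exactly n copies of the captured character.
    return pattern.sub(lambda m: m.group(1) * n, text)

print(limit_runs("Eveeery mondayyy I waaake upp", 2))
# Eveery mondayy I waake upp
```

Because the replacement is built from the captured character, runs of any length are trimmed in a single pass.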

Answer 1: (score: 2)

A re.sub() solution:

import re

def remove_continued_char(s, n):
    # A letter, then n more copies (group 2), then any further copies to discard
    pat = re.compile(r'([a-z])(\1{' + str(n) + r'})\1*')
    return pat.sub('\\2', s)

sentence = 'Eveeery mondayyy I waaake upp'
print(remove_continued_char(sentence, 2))

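Output:

Eveery mondayy I waake upp
  • [a-z] - matches lowercase ASCII letters only
  • \1 - backreference to the first capture group, i.e. ([a-z])
  • \\2 - refers to the value of the second captured (parenthesized) group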

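Answer 2: (score: 1)

You have to iterate over the letters of the sentence while keeping track of the previous letter and how many times it has been seen in a row.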

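def del_n(n, s):
    if not s:
        return s  # nothing to do for an empty string
    so_far = 1          # length of the current run
    previous = s[0]     # character of the current run
    res = [s[0]]

    for c in s[1:]:
        if c == previous:
            so_far += 1
            if so_far > n:
                continue  # run already has n characters; drop the extra one
        else:
            previous = c
            so_far = 1
        res.append(c)
    return ''.join(res)


sentence = "Eveeery mondayyy I waaake upp"
del_n(2, sentence)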

Output:

'Eveery mondayy I waake upp'
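
The same run-tracking idea can also be expressed with the standard library's `itertools.groupby`, which groups consecutive equal items. This is an alternative sketch, not the answer's own method, and the function name `del_n_groupby` is made up for illustration:

```python
from itertools import groupby, islice

def del_n_groupby(n, s):
    """Keep at most n characters from each run of identical characters."""
    # groupby yields one group per run of consecutive equal characters;
    # islice keeps only the first n items of each run.
    return "".join("".join(islice(run, n)) for _, run in groupby(s))

print(del_n_groupby(2, "Eveeery mondayyy I waaake upp"))
# Eveery mondayy I waake upp
```

It handles the empty string without a special case, since `groupby` simply yields no groups.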

Answer 3: (score: 1)

You can try this with a plain function, without importing any modules:

sentence = "Eveeery mondayyy I waaake upp"


def no_dublicate(senten, N):
    final = []
    for word in senten.split():
        track = []
        for chara in word:
            # Skip the character if the last N kept characters are already this one
            if len(track) >= N and all(c == chara for c in track[-N:]):
                continue
            track.append(chara)
        final.append("".join(track))
    return final


print(no_dublicate(sentence, 2))

Output:

['Eveery', 'mondayy', 'I', 'waake', 'upp']
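
The result here is a list of words rather than a single string. If a sentence is wanted, as in the question, the words can be joined back with spaces. A small sketch, assuming the original spacing does not need to be preserved (`split()` discards runs of whitespace); the variable names are illustrative:

```python
# The list printed above, copied here so the snippet runs on its own.
words = ['Eveery', 'mondayy', 'I', 'waake', 'upp']

# Rebuild a sentence with single spaces between the words.
result = " ".join(words)
print(result)
# Eveery mondayy I waake upp
```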