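Check a sentence for elongated words

Time: 2013-11-24 01:26:10

Tags: python regex file sentence

I want to check whether a sentence contains elongated words, for example soooo, toooo, thaaatttt, and so on. I don't know in advance what the user might type, because I have a list of sentences that may or may not contain elongated words. How can I check for this in Python? I am new to Python.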

4 answers:

Answer 0 (score: 3)

Try this:

import re

s1 = "This has no long words"
s2 = "This has oooone long word"

# A letter followed by two or more repeats of itself, e.g. "ooo".
elong = re.compile(r"([a-zA-Z])\1{2,}")

def has_long(sentence):
    return bool(elong.search(sentence))


print(has_long(s1))  # False
print(has_long(s2))  # True
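
Since the question mentions a list of sentences, the same check can be applied across a whole list. This is only a minimal sketch: the name `sentences` and the sample strings are made up for illustration.

```python
import re

# A letter followed by two or more copies of itself, e.g. "ooo".
ELONG = re.compile(r"([a-zA-Z])\1{2,}")

def has_long(sentence):
    """Return True if any letter appears three or more times in a row."""
    return bool(ELONG.search(sentence))

# Hypothetical input; in practice this would be your own list.
sentences = [
    "This has no long words",
    "This has oooone long word",
    "That was soooo good",
]

# Keep only the sentences that contain at least one elongated word.
elongated_sentences = [s for s in sentences if has_long(s)]
print(elongated_sentences)
```

Compiling the pattern once outside the function avoids rebuilding it for every sentence.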

Answer 1 (score: 3)

@HughBothwell had a good idea. As far as I know, no English word repeats the same letter three times in a row. So you can search for words that do:

>>> from re import search
>>> mystr = "word word soooo word tooo thaaatttt word"
>>> [x for x in mystr.split() if search(r'(?i)([a-z])\1\1+', x)]
['soooo', 'tooo', 'thaaatttt']
>>>

Anything this finds is an elongated word.
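
One caveat: `str.split()` leaves punctuation attached to the words, so a sentence like "soooo, ..." would yield 'soooo,'. A possible variation, sketched here with a made-up sample sentence, is to let the regex pick out the whole word using word boundaries instead of splitting first:

```python
import re

# Whole word (letters only) that contains a letter repeated 3+ times.
# \b anchors the match at word boundaries, so punctuation is left out.
ELONGATED_WORD = re.compile(r"\b[a-z]*([a-z])\1{2,}[a-z]*\b", re.IGNORECASE)

def find_elongated(sentence):
    """Return every elongated word in the sentence, without punctuation."""
    return [m.group(0) for m in ELONGATED_WORD.finditer(sentence)]

print(find_elongated("It was soooo good, thaaatttt is tooo much!"))
# ['soooo', 'thaaatttt', 'tooo']
```

`finditer` is used rather than `findall` because, with a capturing group in the pattern, `findall` would return only the repeated letter instead of the full word.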

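Answer 2 (score: 2)

Well, you could logically make a list of every elongated word. Then loop through the words in the sentence, and for each one loop through the list to see whether it is an elongated word.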

sentence = "Hoow arre you doing?"
elongated = ["hoow", "arre", "youu", "yoou", "meee"]  # You will need a much larger list
# split() yields words; iterating over the string itself would yield characters
for word in sentence.split():
    word = word.lower()
    for e_word in elongated:
        if e_word == word:
            print("Found an elongated word!")

If you want to do what Hugh Bothwell suggested, then:

sentence = "Hooow arrre you doooing?"
elongations = ["aaa", "ooo", "rrr", "bbb", "ccc"]  # continue for all the letters
# split() yields words; iterating over the string itself would yield characters
for word in sentence.split():
    for x in elongations:
        if x in word.lower():
            print('"' + word + '" is an elongated word')
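
Rather than typing out the triple for every letter by hand, the list can be generated. A small sketch of that idea, where the helper name `elongated_words` is invented for this example:

```python
import string

# One entry per letter: "aaa", "bbb", ..., "zzz".
elongations = [letter * 3 for letter in string.ascii_lowercase]

def elongated_words(sentence):
    """Return the words that contain any tripled letter."""
    return [
        word for word in sentence.split()
        if any(x in word.lower() for x in elongations)
    ]

print(elongated_words("Hooow arrre you doooing?"))
# ['Hooow', 'arrre', 'doooing?']
```

Note that punctuation stays attached to the word because the sentence is only split on whitespace.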

Answer 3 (score: 1)

You need a list of valid English words. On *NIX systems you can use /etc/share/dict/words, /usr/share/dict/words, or an equivalent, and store all the words in a set object.
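
Loading such a file into a set might look roughly like this. The helper name `load_words`, the lowercasing, and the fallback word list are assumptions made for this sketch, and the path differs between systems:

```python
import os

def load_words(lines):
    """Build a set of lowercased words from an iterable of lines."""
    return {line.strip().lower() for line in lines if line.strip()}

# Assumed location; it varies by system and may not exist at all.
WORDS_PATH = "/usr/share/dict/words"

if os.path.exists(WORDS_PATH):
    with open(WORDS_PATH, encoding="utf-8", errors="ignore") as f:
        all_words = load_words(f)
else:
    # Tiny stand-in so the example still runs without a dictionary file.
    all_words = load_words(["so\n", "too\n", "that\n"])

print(len(all_words))
```

A set is used because membership tests such as `word in all_words` take roughly constant time, which matters when every word of every sentence is looked up.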

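Then, for each word in the sentence, you need to check that

  1. the word itself is not a valid word (i.e. word not in all_words); and
  2. when you shorten every consecutive run of a repeated letter to one or two letters, the new word is a valid word.

Here is one way you could try to extract all the possibilities: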

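    import re
    import itertools

    # A character followed by two or more copies of itself.
    regex = re.compile(r'(\w)\1{2,}')

    # get_all_words() stands for however you load your word list.
    all_words = set(get_all_words())

    def without_elongations(word):
        # Base case: no run left to shorten, so the word is a candidate.
        if regex.search(word) is None:
            return [word]
        # Shorten the first run to one letter and to two letters, then recurse.
        replacing_with_one_letter = regex.sub(r'\1', word, count=1)
        replacing_with_two_letters = regex.sub(r'\1\1', word, count=1)
        return list(itertools.chain(
            without_elongations(replacing_with_one_letter),
            without_elongations(replacing_with_two_letters),
        ))

    # sentence is assumed to hold the text being checked.
    for word in sentence.split():
        if word not in all_words:
            if any(w in all_words for w in without_elongations(word)):
                print('%(word)s is elongated' % {'word': word})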
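
To try the idea end to end without a dictionary file, here is a self-contained sketch. The tiny `all_words` set, the sample sentence, and the helper names `candidates` and `is_elongated` are stand-ins chosen for this example, not part of the answer above:

```python
import re

# Three or more of the same character in a row.
RUN = re.compile(r"(\w)\1{2,}")

# Hypothetical stand-in for a real dictionary.
all_words = {"so", "too", "that", "was", "good"}

def candidates(word):
    """All spellings obtained by shrinking each long run to 1 or 2 letters."""
    if RUN.search(word) is None:
        return [word]
    one = RUN.sub(r"\1", word, count=1)
    two = RUN.sub(r"\1\1", word, count=1)
    return candidates(one) + candidates(two)

def is_elongated(word):
    """True if the word is unknown but one of its shortened forms is known."""
    word = word.lower()
    return word not in all_words and any(c in all_words for c in candidates(word))

sentence = "that was soooo good tooo"
print([w for w in sentence.split() if is_elongated(w)])
# ['soooo', 'tooo']
```

Each long run doubles the number of candidates, so a word with n elongated runs produces 2**n spellings to look up. That is fine for ordinary text but worth keeping in mind for adversarial input.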