Question

My sentence is as follows:

mainsentence="My words aren't available give didn't give apple and did happening me"

stopwords=['are','did','word', 'able','give','happen']

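I want to remove every word that contains one of the stopwords as a substring. For example, 'word' should match 'words' and remove it, 'did' should match 'didn't' and remove it, and 'able' should remove 'available', because 'able' occurs inside 'available'. The expected result is: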

finalsentence="My apple and me"

I tried the following code:

querywords = mainsentence.split()
resultwords  = [word for word in querywords if word.lower() not in stopwords]
result = ' '.join(resultwords)
print(result)

It only works when a word matches a stopword exactly.

Please help me.

Answer 1

You can do the following:

>>> ' '.join([word for word in mainsentence.split() if not any([stopword in word for stopword in stopwords])])
'My apple and me'

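EDIT: There is no need to check in both directions; it is enough to check whether the word contains a stopword.
EDIT2: Updated the result to match the updated question parameters.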

Case-insensitive version:

' '.join([word for word in mainsentence.split() if not any([stopword.lower() in word.lower() for stopword in stopwords])])

Answer 2

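The following code satisfies the requirements as stated in your question, but the result is not necessarily what you want. The general structure of the code should be right, but you may need to change the condition for a partial match (stopword in testword):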

def filter_out_stopwords(text, stopwords):
    result = []
    for word in text.split():
        testword = word.lower()
        flag = True
        for stopword in stopwords:
            if stopword in testword:
                flag = False
                break
        if flag:
            result.append(word)
    return result


' '.join(filter_out_stopwords("My words aren't available give didn't give apple and did happening me", ['are', 'did', 'word', 'able', 'give', 'happen']))
# "My apple and me"

Alternatively, using a list comprehension and all() (any() could be used equivalently):

def filter_out_stopwords(text, stopwords):
    return [
        word for word in text.split()
        if all(stopword not in word.lower() for stopword in stopwords)]


' '.join(filter_out_stopwords("My words aren't available give didn't give apple and did happening me", ['are', 'did', 'word', 'able', 'give', 'happen']))
# "My apple and me"
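As noted at the start of this answer, plain substring matching may remove more than intended, so the match condition may need adjusting. Here is a minimal sketch of one way to make that condition swappable. The `matches` parameter and the `starts_with` helper are hypothetical names introduced only for this illustration and are not part of the code above.

```python
def filter_out_stopwords(text, stopwords, matches=None):
    """Keep only the words that no stopword matches.

    `matches(stopword, word)` is a hypothetical hook; when it is omitted,
    the plain substring test from the code above is used.
    """
    if matches is None:
        def matches(stopword, word):
            return stopword in word
    return [
        word for word in text.split()
        if not any(matches(stopword, word.lower()) for stopword in stopwords)
    ]


def starts_with(stopword, word):
    # Stricter condition: the word has to begin with the stopword.
    return word.startswith(stopword)


sentence = "My words aren't available give didn't give apple and did happening me"
stopwords = ['are', 'did', 'word', 'able', 'give', 'happen']

print(' '.join(filter_out_stopwords(sentence, stopwords)))
# My apple and me
print(' '.join(filter_out_stopwords(sentence, stopwords, starts_with)))
# My available apple and me
```

With the prefix condition, 'available' is kept because it only ends with 'able'. Which condition is right depends on what should count as a match.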

Answer 3

You can solve this kind of problem with regular expressions.

import re

You can get all the matching words like this:

words = re.findall(r'[a-z]*did[a-z]*', mainsentence)

You can also replace them:

re.sub(r'[a-z]*able[a-z]* ', '', mainsentence)

Final answer:

mainsentence="My words aren't available give didn't give apple and did happening me"

stopwords=['are','did','word', 'able','give','happen']

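for word in stopwords:
    # Remove every run of letters/apostrophes that contains the stopword,
    # together with the space that follows it.
    mainsentence = re.sub(fr'[a-z\']*{word}[a-z\']* ', '', mainsentence)
print(mainsentence)
# My apple and me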
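Note that the loop above only removes a match that is followed by a space, so a matching word at the very end of the sentence stays in place, and a stopword containing regex metacharacters would be misinterpreted. Here is a minimal sketch of one way around both, assuming a single combined, case-insensitive pattern. The helper name `remove_matching_words` is made up for this illustration.

```python
import re


def remove_matching_words(sentence, stopwords):
    if not stopwords:
        # An empty alternation would match every word, so bail out early.
        return sentence
    # Escape each stopword so characters like '.' or '+' are taken literally.
    alternatives = '|'.join(re.escape(word) for word in stopwords)
    # \S* on both sides extends the match to the whole whitespace-delimited word.
    pattern = re.compile(rf'\S*(?:{alternatives})\S*', re.IGNORECASE)
    # split()/join() collapses the gaps left behind by the removed words.
    return ' '.join(pattern.sub('', sentence).split())


mainsentence = "My words aren't available give didn't give apple and did happening me"
stopwords = ['are', 'did', 'word', 'able', 'give', 'happen']

print(remove_matching_words(mainsentence, stopwords))
# My apple and me
print(remove_matching_words("apple and did", stopwords))
# apple and
```

Compiling one pattern also means the sentence is scanned once instead of once per stopword.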

Answer 4

A more sustainable solution to your problem can be reached with the following steps.

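Expand contractions such as I've -> I have, didn't -> did not. Check out pycontractions.
Use each word's lemma to get its base form, i.e. reduce every inflected form to its root form. For example: playing, plays and played all become play. Let's call the resulting state of the corpus the clean corpus. Check out lemmatization.
Now remove all stopwords from the clean corpus.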
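
The three steps above can be sketched as follows. This is only a toy illustration under loud assumptions: the `CONTRACTIONS` table and the suffix stripping in `to_base_form` are hypothetical, hand-written stand-ins, and no pycontractions or lemmatizer API is used here. Suffix stripping is closer to crude stemming than to real lemmatization.

```python
# Hypothetical, deliberately tiny lookup tables; a real pipeline would use
# a contractions library and a proper lemmatizer instead.
CONTRACTIONS = {"aren't": "are not", "didn't": "did not", "i've": "i have"}
SUFFIXES = ("ing", "s")  # crude stand-in for lemmatization


def expand_contractions(text):
    # Step 1: replace each known contraction with its expanded form.
    return ' '.join(CONTRACTIONS.get(word.lower(), word) for word in text.split())


def to_base_form(word):
    # Step 2 (toy version): strip one known suffix, keeping at least 3 letters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def clean(text, stopwords):
    stopwords = {word.lower() for word in stopwords}
    words = expand_contractions(text).split()
    # Step 3: drop words whose base form is exactly a stopword.
    return ' '.join(w for w in words if to_base_form(w.lower()) not in stopwords)


sentence = "My words aren't available give didn't give apple and did happening me"
print(clean(sentence, ['are', 'did', 'word', 'able', 'give', 'happen']))
# My not available not apple and me
```

Because this approach compares whole normalized words, 'available' is no longer removed by 'able', and the 'not' produced by expanding contractions stays unless it is added to the stopword list.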

You may also be interested in a text cleaning module I wrote, which also includes spelling correction and can be used to build a text-cleaning pipeline.

Find and remove a word in a sentence (matching inside words) python

4 Answers: