Question

The program correctly identifies words regardless of punctuation. I am having trouble integrating this into spam_indicator(text).

def spam_indicator(text):
    text=text.split()
    w=0
    s=0
    words=[]

    for char in string.punctuation:
        text = text.replace(char, '')
    return word

    for word in text:
        if word.lower() not in words:
            words.append(word.lower())
            w=w+1
        if word.lower() in SPAM_WORDS:
            s=s+1
    return float("{:.2f}".format(s/w))


The second block is wrong. I am trying to remove the punctuation so that the function runs.

Answer 1

Try removing the punctuation first, then split the text into words.

import string

def spam_indicator(text):
    for char in string.punctuation:
        text = text.replace(char, ' ')    # N.B. replace with ' ', not ''

    text = text.split()
    w = 0
    s = 0
    words = []

    for word in text:
        if word.lower() not in words:
            words.append(word.lower())
            w=w+1
        if word.lower() in SPAM_WORDS:
            s=s+1

    return float("{:.2f}".format(s/w))
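The N.B. comment in the code above matters whenever punctuation sits between two words with no space. A minimal sketch of the difference, assuming a hypothetical helper strip_punctuation and a made-up sample string (neither is from the question):

```python
import string

def strip_punctuation(text, replacement):
    """Replace every ASCII punctuation character in text with replacement."""
    for char in string.punctuation:
        text = text.replace(char, replacement)
    return text

# Hypothetical input: note there is no space after the comma.
sample = "Win cash,prizes now!"

# Replacing with '' glues the neighbouring words together.
print(strip_punctuation(sample, '').split())    # ['Win', 'cashprizes', 'now']

# Replacing with ' ' keeps them separate, so split() sees both words.
print(strip_punctuation(sample, ' ').split())   # ['Win', 'cash', 'prizes', 'now']
```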

There are a number of improvements that can be made to your code.

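Use a set for words rather than a list. Since a set cannot contain duplicates, you don't need to check whether you have already seen the word before adding it to the set.
Use str.translate() to remove punctuation. You want to replace punctuation with whitespace so that split() will split the text into words.
Use round() instead of converting to a string and then back to a float.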

Here is an example:

import string

def spam_indicator(text):
    trans_table = {ord(c): ' ' for c in string.punctuation}
    text = text.translate(trans_table).lower()

    text = text.split()
    word_count = 0
    spam_count = 0
    words = set()

    for word in text:
        if word not in SPAM_WORDS:
            words.add(word)
            word_count += 1
        else:
            spam_count += 1

    return round(spam_count / word_count, 2)

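You need to take care not to divide by 0 if there are no non-spam words. Anyway, I'm not sure what you want the spam indicator value to be. Perhaps it should be the number of spam words divided by the total number of words (spam and non-spam), making it a value between 0 and 1?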
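One possible reading of that last suggestion is sketched below. It is only an illustration, not necessarily what the asker needs: the function name spam_ratio, the SPAM_WORDS contents, and the choice to return 0.0 for text with no words are all assumptions. It counts every occurrence of a word rather than only unique words.

```python
import string

# Hypothetical spam vocabulary; the real SPAM_WORDS is not shown in the question.
SPAM_WORDS = {"free", "winner", "cash"}

def spam_ratio(text, spam_words=SPAM_WORDS):
    """Return spam words divided by total words, rounded to 2 decimal places."""
    # Map each punctuation character to a space so split() separates the words.
    trans_table = {ord(c): ' ' for c in string.punctuation}
    words = text.translate(trans_table).lower().split()

    if not words:
        return 0.0   # assumption: text with no words scores 0 instead of raising

    spam_count = sum(1 for word in words if word in spam_words)
    return round(spam_count / len(words), 2)

print(spam_ratio("FREE cash!!! You are a winner."))   # 3 of 6 words are spam: 0.5
print(spam_ratio("..."))                               # no words at all: 0.0
```

Because the divisor is the total word count, the result always lies between 0 and 1, and the only case that needs a guard is text that contains no words.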

Python strings: punctuation fix, please
