Question

I am trying to expand all the contractions in a tweet by defining a dictionary of replacements, but I can't understand why this isn't working:

tweet = "I luv <3 my iphone & you’re awsm apple. DisplayIsAwesome, sooo happppppy  http://www.apple.com"

APPOSTOPHES = {"'s": " is", "'re":" are"}  

sentence_list = tweet.split()

print(sentence_list)

new_sentence = []

for word in sentence_list:
    for candidate_replacement in APPOSTOPHES:
        if candidate_replacement in word:
            word = word.replace(candidate_replacement, APPOSTOPHES[candidate_replacement])

    new_sentence.append(word)

rfrm = " ".join(new_sentence)
print(rfrm)

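I tried changing the dictionary to cover the most common contractions, but that did not help.

The final output sentence is exactly the same as the input sentence.

Note: the tweet is passed through an HTML parser before this step, but I doubt that affects anything.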

Answer 1

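Your input string tweet contains the typographic (curly) apostrophe ’ (U+2019, right single quotation mark) rather than the ASCII single quote ' (U+0027), so none of the dictionary keys ever match.
To cover most cases, you can extend the APPOSTOPHES dictionary as follows: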

...
APPOSTOPHES = {"'s": " is", "’s": " is", "'re":" are", "’re":" are"}

You will then get the expected result:

I luv <3 my iphone & you are awsm apple. DisplayIsAwesome, sooo happppppy http://www.apple.com
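As an alternative to listing every apostrophe variant as a separate key, the text can be normalized first so that one set of ASCII keys is enough. The following is a minimal sketch of that idea, not the asker's code: the names `CONTRACTIONS`, `APOSTROPHE_TABLE`, and `expand_contractions` are made up for illustration, and it assumes only U+2018 and U+2019 need mapping. Note that a blanket "'s" to " is" rule also rewrites possessives (for example "Apple's"), so treat the mapping as a starting point.

```python
# Hypothetical names; the mapping mirrors the dictionary from the question.
CONTRACTIONS = {"'s": " is", "'re": " are"}

# Translate typographic single quotes (U+2018, U+2019) to the ASCII apostrophe.
APOSTROPHE_TABLE = str.maketrans({"\u2018": "'", "\u2019": "'"})


def expand_contractions(text):
    """Normalize apostrophes, then apply the same replace loop as the question."""
    normalized = text.translate(APOSTROPHE_TABLE)
    words = []
    for word in normalized.split():
        for contraction, expansion in CONTRACTIONS.items():
            if contraction in word:
                word = word.replace(contraction, expansion)
        words.append(word)
    return " ".join(words)


tweet = "I luv <3 my iphone & you\u2019re awsm apple."
print(expand_contractions(tweet))
```

Because the lookup happens after normalization, the same function handles both "you're" and "you’re" without duplicating keys.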

Answer 2

It's very simple. You are using the wrong character in the APPOSTOPHES dict.

"’re" != "'re"

Try using:

APPOSTOPHES = {"’s": " is", "’re": " are"}
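Since the two characters look almost identical on screen, it can help to print their code points when debugging this kind of mismatch. This is a small sketch using the standard `unicodedata` module; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Compare the ASCII apostrophe with the character found in the tweet.
for ch in ("'", "\u2019"):
    print(describe(ch))
```

Running the same helper over each character of a suspicious word shows which apostrophe variant the input actually contains.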

Expanding English language contractions using Python
