Question

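Good afternoon. This is an odd question, so I'll do my best to explain it.

I have two inputs: a list containing several tweets, ['tweet 1', 'tweet 2', ...], and a dictionary of word values, {'word1': value1, 'word 2': value2, ...}.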

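Imagine the first tweet is:

"I love eating potatoes"

The dictionary has 500 words, each with a value, for example:

{..., 'love': 3, ..., 'potatoes': -1, ...}

The dictionary does not contain the words "I" and "eating". So, for each tweet I have, I need to find the words that are not in the dictionary and assign them that tweet's score.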

Example: I love eating potatoes = 2

So:

I = 2

eating = 2

This is what I have so far:

tweets = []
values = {}
for tweet in tweets:
    divided_tweet = tweet.split()

To get the scores, I use this:

[sum(values.get(word, 0) for word in tweet.split()) for tweet in tweets]

In short, for each tweet I need to find every word that is not in the dictionary and assign it a value.

The printed output should be:

'I':2

'eating':2

(next tweet)

'Inexistent word #3':'score of tweet #2' 

'Inexistent word #4':'score of tweet #2'

'Inexistent word #5':'score of tweet #2'

...

and so on.

Can anyone help me?

Thanks

P.S.: The values can be negative or positive.

Answer 1

You could try something like this. I'm assuming the word values in your dictionary are integers or floats, not strings:

tweets = []
values = {}
for tweet in tweets:
    twit = tweet.split()
    item_vals = []
    not_in_tweet = []
    for item in twit:
        if item in values:
            # known word: collect its value
            ival = values[item]
            item_vals.append(ival)
        else:
            # word not in dict: remember it for later
            not_in_tweet.append(item)
    # the tweet's score is the sum of the known values
    sum_items = sum(item_vals)
    for item in not_in_tweet:
        values[item] = sum_items
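
One thing to note about the loop above: it writes the new words back into values, so a word scored by an earlier tweet counts as a known word in later tweets. If that is not what you want, a minimal sketch that leaves values untouched could look like this (the function name score_unknown_words and the sample data are assumptions for illustration only):

```python
def score_unknown_words(tweets, values):
    """Return one dict per tweet mapping each unknown word to that tweet's score."""
    results = []
    for tweet in tweets:
        words = tweet.split()
        # Score of the tweet: sum of the values of the words we know.
        score = sum(values[w] for w in words if w in values)
        # Every word missing from `values` gets the tweet's score.
        results.append({w: score for w in words if w not in values})
    return results


values = {'love': 3, 'potatoes': -1}
tweets = ['I love eating potatoes', 'I love mangoes']
for unknown in score_unknown_words(tweets, values):
    print(unknown)
```

With this sample data, 'I' gets 2 from the first tweet and 3 from the second, because each tweet is scored independently.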

Answer 2

Here is a code example that should give you a way forward:

>>> import re
>>> values = {'love': 3, 'potatoes': -1}
>>> tweet = 'I love eating potatoes'
>>> tweet_words = re.split(r"\W+", tweet)
>>> tweet_value = sum(values.get(word, 0) for word in tweet_words)
>>> {w: tweet_value for w in tweet_words if w not in values}
{'I': 2, 'eating': 2}

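First, we split the tweet into words using re.split on every sequence of non-word characters (anything that is not a letter, digit, or underscore). This is better than a plain split because you won't keep apostrophes, commas, and so on. Second, we compute the tweet's value: values.get(word, 0) returns the word's value if the word is in values, and 0 otherwise. Third, we create a dict (which you can print later) whose keys are the words not in values, and assign tweet_value to each of them.

The two-pass process is unavoidable, because the overall value must be computed before it can be assigned to the unknown words.

For a complete program, just do the following: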
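
As a small illustration of the splitting step: re.split can leave an empty string at the end when the tweet ends in punctuation, and that empty string would then show up as an unknown "word". A hedged alternative, assuming you only want runs of word characters, is re.findall (the sample tweet here is made up):

```python
import re

tweet = 'I love eating potatoes!'

# Splitting on non-word characters leaves a trailing empty string here.
split_words = re.split(r"\W+", tweet)

# Collecting runs of word characters avoids the empty string.
found_words = re.findall(r"\w+", tweet)

print(split_words)
print(found_words)
```

Either way, a word such as "don't" is cut at the apostrophe, so this is only a rough tokenizer.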

tweet_value
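
A minimal sketch of what such a complete program could look like, assuming the tweets are in a list named tweets (that name, the results list, and the sample data are assumptions; re.findall is used here in place of re.split to avoid empty strings):

```python
import re

values = {'love': 3, 'potatoes': -1}
tweets = ['I love eating potatoes', 'We love mangoes']

results = []
for tweet in tweets:
    # First pass: split into words and compute the tweet's total value.
    tweet_words = re.findall(r"\w+", tweet)
    tweet_value = sum(values.get(word, 0) for word in tweet_words)
    # Second pass: give every unknown word the tweet's value.
    unknown = {w: tweet_value for w in tweet_words if w not in values}
    results.append(unknown)
    print(unknown)
```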

Answer 3

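You mentioned:

"So, for each tweet I have, I need to find the words that are not in the dictionary and assign them that tweet's score."

I'm assuming you have a dictionary that holds a score for each tweet, such as tweet_values_dc in the code below. If that is not the case, let me know where the tweets are and how their scores are obtained.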

tweets_ls = ['I love eating potatoes', 'I love eating mangoes']
tweet_values_dc = {'I love eating potatoes': 2, 'I love eating mangoes': 3}
missing_words_values_dc = {'love':3,'potatoes':-1}
for atweet in tweets_ls:
    tweet_splited = atweet.split()
    for aword in tweet_splited:
        if aword not in missing_words_values_dc:
            # unknown word: give it the score of the tweet it appears in
            aTweetValue = tweet_values_dc.get(atweet)
            missing_words_values_dc[aword] = aTweetValue
print(missing_words_values_dc)

Output

{'love': 3, 'potatoes': -1, 'I': 2, 'eating': 2, 'mangoes': 3}
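
If the per-tweet scores are not already available, here is a minimal sketch of building tweet_values_dc from the known word values. It assumes a tweet's score is the sum of the values of its known words, as in the question's example (known_values is a hypothetical name):

```python
known_values = {'love': 3, 'potatoes': -1}
tweets_ls = ['I love eating potatoes', 'I love eating mangoes']

# Score each tweet as the sum of the values of the words we already know;
# unknown words contribute 0.
tweet_values_dc = {
    atweet: sum(known_values.get(aword, 0) for aword in atweet.split())
    for atweet in tweets_ls
}
print(tweet_values_dc)
```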

Find words that are not in a dictionary and assign them a value
