I have a column of words:
> print(df['words'])
0 [awww, thats, bummer, shoulda, got, david, car...
1 [upset, that, he, cant, update, his, facebook,...
2 [dived, many, time, ball, managed, save, rest,...
3 [whole, body, feel, itchy, like, it, on, fire]
4 [no, it, not, behaving, at, all, im, mad, why,...
5 [not, whole, crew]
and another DataFrame, sentiment, that holds a "sentiment" value for each word:
> print(sentiment)
abandon -2
0 abandoned -2
1 abandons -2
2 abducted -2
3 abduction -2
4 abductions -2
5 abhor -3
6 abhorred -3
7 abhorrent -3
8 abhors -3
9 abilities 2
...
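For each row of words in df['words'], I want to sum the sentiment values of its words. Words that do not appear in sentiment should count as 0.
This is what I have so far: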
df['sentiment_value'] = Sum(df['words'].apply(lambda x: ''.join(x+x for x in sentiment))
Expected result:
print(df['sentiment_value'])
0 -5
1 2
2 15
3 -6
4 -8
...
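Answer 0 (score: 0)
If the values in the second column are stored as strings, you first need to split the data into two columns and cast the value column to a number: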
SUM(myvalue::bigint)
Then you can look up each word in the sentiment column and take its value from the sentiment_value column.
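The cast above is written in SQL style. As a rough pandas counterpart, here is a minimal sketch that assumes the lexicon is a DataFrame with hypothetical columns named word and score, where the scores were read in as strings:

```python
import pandas as pd

# Hypothetical lexicon whose scores were read in as strings.
sentiment = pd.DataFrame(
    {"word": ["abandon", "abhor", "abilities"], "score": ["-2", "-3", "2"]}
)

# Cast the strings to numbers; errors="coerce" turns unparseable values into NaN.
sentiment["score"] = pd.to_numeric(sentiment["score"], errors="coerce")

# Look up a list of words and add up their scores.
wanted = ["abandon", "abhor"]
total = sentiment.loc[sentiment["word"].isin(wanted), "score"].sum()
```

Note that isin counts each matching lexicon row once, so a word repeated in the list is not counted twice in this sketch.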
Answer 1 (score: 0)
If you make the scores a Series, with the words as its labels:
In [11]: s # e.g. sentiment.set_index("word")["score"]
Out[11]:
abandon -2
abandoned -2
abandons -2
abducted -2
abduction -2
Name: score, dtype: int64
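One way to arrive at such a Series is sketched below. It assumes the lexicon is a whitespace-separated text file with no header row; the in-memory sample and the column names word and score are illustrative, not taken from the question's actual file:

```python
import io
import pandas as pd

# Hypothetical stand-in for a headerless, whitespace-separated lexicon file.
raw = io.StringIO("abandon -2\nabandoned -2\nabhor -3\nabilities 2\n")

# header=None keeps the first line ("abandon -2") as data instead of
# treating it as column names; the names given here are assumptions.
sentiment = pd.read_csv(raw, sep=r"\s+", header=None, names=["word", "score"])

# Series of scores labelled by word.
s = sentiment.set_index("word")["score"]
```

Passing header=None matters if the first line of the file is a real entry, which the printed sentiment frame in the question seems to suggest.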
Then you can look up the scores for a list of words and sum them:
In [12]: s.loc[["abandon", "abducted"]].sum()
Out[12]: -4
So the apply becomes:
df['words'].apply(lambda ls: s.loc[ls].sum())
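If you need to support missing words (those not in s), you can use reindex, which gives NaN for unknown labels; sum skips NaN by default: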
In [21]: s.reindex(["abandon", "abducted", "missing_word"]).sum()
Out[21]: -4.0
df['words'].apply(lambda ls: s.reindex(ls).sum())
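Putting the pieces together, a minimal end-to-end sketch might look like this. The sample rows and scores are made up for illustration, and the helper name sentiment_sum is hypothetical:

```python
import pandas as pd

# Hypothetical sample data mirroring the shapes in the question.
df = pd.DataFrame(
    {"words": [["abandon", "abducted", "car"],
               ["not", "whole", "crew"],
               ["abhor", "abilities"]]}
)
s = pd.Series(
    {"abandon": -2, "abducted": -2, "abhor": -3, "abilities": 2}, name="score"
)

def sentiment_sum(words):
    # reindex returns NaN for words missing from s; sum() skips NaN,
    # so unknown words effectively count as 0.
    return int(s.reindex(words).sum())

df["sentiment_value"] = df["words"].apply(sentiment_sum)
```

The int cast is there because a reindexed Series containing NaN is float, so the sum would otherwise come back as a float such as -4.0. This approach relies on the index of s being unique, since reindex raises an error on a duplicated index.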