Question

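I have a list of strings and a Series of sentences from which all punctuation has been removed:

series = test_data["reviews"]

words = ['great', 'awesome', 'ok', 'sucky']

I need to remove every word from the Series that is not in the list, then assign the result to a new Series. I searched online and tried a few things, but could not find a solution.

Can anyone help?

This is what I have: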

new_series = []
for word in words:
    if word in significant_words:
        new_series.append(word)
print(new_series)

Thank you very much.

Answer 1

If the data contains sentences and you need lists to populate new columns, use:

import pandas as pd

words = ['great', 'awesome', 'ok', 'sucky']
test_data = pd.DataFrame({'reviews': ['great it is', 'ok good well awesome']})

def func(x):
    # a collects words missing from the list, b collects words found in it
    a, b = [], []
    for word in x.split():
        if word not in words:
            a.append(word)
        else:
            b.append(word)

    # Returning a Series makes apply() expand the result into two columns
    return pd.Series([a, b])

test_data[['out', 'in']] = test_data["reviews"].apply(func)
print(test_data)
                reviews           out             in
0           great it is      [it, is]        [great]
1  ok good well awesome  [good, well]  [ok, awesome]
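
If you would rather get a new Series of cleaned sentences (which seems closer to what the question asks) instead of list columns, something along these lines may work. This is only a minimal sketch reusing the sample data above; the names `keep` and `keep_listed_words` are made up for illustration, and it assumes words are separated by whitespace.

```python
import pandas as pd

words = ['great', 'awesome', 'ok', 'sucky']
test_data = pd.DataFrame({'reviews': ['great it is', 'ok good well awesome']})

# Hypothetical helper name: a set makes membership checks fast for long word lists
keep = set(words)

def keep_listed_words(sentence):
    # Split on whitespace, keep only the listed words, then rebuild the sentence
    return ' '.join(word for word in sentence.split() if word in keep)

# apply() runs the helper on every sentence and returns a new Series
new_series = test_data['reviews'].apply(keep_listed_words)
print(new_series)
```

Sentences that contain none of the listed words come back as empty strings, so you may want to filter those out afterwards.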

Remove words from a pandas Series that are not found in a list
