Question

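I have a CSV file, imported with Pandas, that has a column named "Keywords". Each cell in the column holds a varying number of keywords, such as "Confident", "Dark", "Mysterious", and so on.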

        Keywords
    0   Confident, Mysterious
    1   Confident
    2   Dark

I have a dictionary of synonyms for these keywords:

    terms = {'Confident': 'Cool', 'Dark': ['Gloomy', 'Negative', 'Haunting'], 'Mysterious': 'Mystical'}

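I'm trying to write something that looks up the dictionary keys in the "Keywords" column and appends the corresponding synonyms (the values) to the cell, so that the end result becomes: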

        Keywords
    0   Confident, Cool, Mysterious, Mystical
    1   Confident, Cool
    2   Dark, Gloomy, Negative, Haunting

I've tried things along these lines:

    df['Keywords'].map(terms)

or:

    df['Keywords'].apply(lambda l: [terms[e] for e in l])

...but no luck so far. Any help is appreciated!

Answer 1

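My first suggestion is to change the dictionary values so that they are all the same type. That makes it easier to build the result later. Like this: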

    terms = {
        'Confident': ['Cool'],
        'Dark': ['Gloomy', 'Negative', 'Haunting'],
        'Mysterious': ['Mystical']
    }
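If the dictionary is large, the same normalization can be done programmatically instead of by hand. The sketch below is only an illustration: the helper name `normalize` is hypothetical and not part of the original answer. The reason it matters is that looping over a bare string such as 'Cool' yields its individual characters rather than the whole word.

```python
# Dictionary as given in the question, with mixed str / list values.
terms = {
    'Confident': 'Cool',
    'Dark': ['Gloomy', 'Negative', 'Haunting'],
    'Mysterious': 'Mystical',
}


def normalize(mapping):
    """Return a copy of `mapping` in which every value is a list of strings.

    `normalize` is a hypothetical helper introduced for this example.
    """
    return {
        key: [value] if isinstance(value, str) else list(value)
        for key, value in mapping.items()
    }


terms = normalize(terms)
print(terms['Confident'])  # ['Cool']
```

After this step every lookup in `terms` can be treated as a list, so no type checks are needed later on.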

Given that, we then need to return the original words together with their synonyms.

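    def mapper(row):
        # Split on commas and trim the whitespace around each keyword
        # (stripping per keyword keeps multi-word keywords intact)
        keywords = [k.strip() for k in row['Keywords'].split(',')]

        # Collect each keyword followed by its synonyms, if it has any
        res = []
        for keyword in keywords:
            res.append(keyword)
            res.extend(terms.get(keyword, []))
        # Join back into one comma-separated string, as in the desired output
        return ', '.join(res)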

With this in place, we can call df.apply.

    import pandas as pd

    # This is what I think your dataframe looks like
    d = {'Keywords': ['Confident, Mysterious', 'Confident', 'Dark']}
    df = pd.DataFrame(data=d)
    # Write the expanded keywords back into the column
    df['Keywords'] = df.apply(mapper, axis=1)

Calling it with axis=1 means the function is applied to each row rather than to each column.
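Since only one column is involved, another option is to apply a function to that column directly, which avoids axis=1 altogether. This is a minimal sketch, assuming every cell is a plain string and that `terms` already maps each key to a list; the function name `add_synonyms` is made up for this example.

```python
import pandas as pd

# Assumes the values have already been normalized to lists.
terms = {
    'Confident': ['Cool'],
    'Dark': ['Gloomy', 'Negative', 'Haunting'],
    'Mysterious': ['Mystical'],
}


def add_synonyms(cell):
    """Expand one comma-separated cell with the synonyms found in `terms`."""
    expanded = []
    for keyword in (k.strip() for k in cell.split(',')):
        expanded.append(keyword)
        # Keywords without an entry in `terms` are kept unchanged.
        expanded.extend(terms.get(keyword, []))
    return ', '.join(expanded)


df = pd.DataFrame({'Keywords': ['Confident, Mysterious', 'Confident', 'Dark']})
# Series.apply passes each cell value (a string) to the function.
df['Keywords'] = df['Keywords'].apply(add_synonyms)
print(df['Keywords'].tolist())
# ['Confident, Cool, Mysterious, Mystical', 'Confident, Cool', 'Dark, Gloomy, Negative, Haunting']
```

Here the function receives the cell's string instead of a whole row, so it does not need to know the column name.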
