Question

I am building a word-count vector. Here is where I am so far:

I have built a pandas DataFrame in this form:

Sample DataFrame:

    file     body
0  PP3169   {'Under':1, 'natur':6, 'view':10, 'condit':2, 'human':7,...}

I also have a dictionary mapping each word to its ID.

Excerpt of the dictionary with word IDs:

{'AFOSR': '0', 'ARO': '1', 'AUC': '2', 'Accuracy': '3', 'Acknowledgments': '4', 'Active': '5', 'Adam': '6', 'Adaptive': '7', 'After': '8',...}

In the dictionary above, each word is assigned a "word ID". For example, AFOSR has ID 0, ARO has ID 1, and so on.
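Note that the IDs in this dictionary are stored as strings ('0', '1', ...). If integer keys are preferred in the final word-count vectors, one option is to convert them once up front. A minimal sketch, using the hypothetical name word_ids and only part of the excerpt shown above:

```python
# Part of the word -> ID dictionary from the question (IDs are strings)
word_ids = {'AFOSR': '0', 'ARO': '1', 'AUC': '2', 'Accuracy': '3'}

# Convert every string ID to an int once, so later lookups return ints
word_ids_int = {word: int(wid) for word, wid in word_ids.items()}

print(word_ids_int['ARO'])  # 1
```

Whether to keep string IDs or switch to ints is a design choice; the rest of the thread works with either, as long as it is applied consistently.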

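Goal: I want to replace the dictionary keys in the DataFrame with the corresponding values from the word-ID dictionary. For example, if the word 'Under' in the DataFrame has ID 477 in the word-ID dictionary, that string should be replaced by its ID, giving 477: 1, in the format <word ID of word> : <frequency of word>.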

Expected output format of the DataFrame:

    file     body
0  PP3169   {<word ID of word#1> : <frequency of word#1>, <word ID of word#2> : <frequency of word#2>, <word ID of word#3> : <frequency of word#3>,...}
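One way to produce this format for every row is to build a new ID-keyed dictionary per row with Series.apply. A minimal sketch, assuming the column is named body as in the sample and that every word in a row appears in the ID dictionary; the data, IDs and the helper to_id_counts are hypothetical:

```python
import pandas as pd

# Hypothetical toy data in the same shape as the question's DataFrame
df = pd.DataFrame({
    'file': ['PP3169', 'PP3170'],
    'body': [{'Under': 1, 'natur': 6}, {'view': 10, 'natur': 2}],
})

# Hypothetical word -> ID dictionary (string IDs, like the question's)
word_ids = {'Under': '477', 'natur': '12', 'view': '30'}

def to_id_counts(counts):
    """Return a new dict keyed by word ID instead of by word."""
    return {word_ids[word]: freq for word, freq in counts.items()}

# Replace each row's dict with its ID-keyed version
df['body'] = df['body'].apply(to_id_counts)

print(df.loc[0, 'body'])  # {'477': 1, '12': 6}
```

Building new dictionaries leaves the original ones untouched, which makes it easy to compare before and after.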

Any help with this would be appreciated.

Answer 1

I think this is the code you are looking for:

(assuming wordID is the dictionary mapping each word to its ID)

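body = df['body'][0]  # the word -> frequency dict stored in row 0
for word in list(body):  # loop over the row's own words; list() snapshots the keys before the dict changes
    body[wordID[word]] = body.pop(word)  # re-insert each frequency under its word ID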

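Because the dictionary is stored as a value in the DataFrame's body column, the column has to be indexed with [0] to reach that record (the row with index 0) during the replacement. This only updates that one row.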
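To cover every row rather than just row 0, the same pop-and-reinsert idea can be run once per index. This relies on each cell holding a reference to a dict, so mutating the dict updates the DataFrame. A minimal sketch with hypothetical data and IDs, assuming every word has an entry in wordID:

```python
import pandas as pd

# Hypothetical toy data and word -> ID dictionary
df = pd.DataFrame({'file': ['PP3169', 'PP3170'],
                   'body': [{'Under': 1, 'natur': 6}, {'view': 10}]})
wordID = {'Under': '477', 'natur': '12', 'view': '30'}

for idx in df.index:
    body = df.at[idx, 'body']  # the dict object stored in this cell
    for word in list(body):  # snapshot keys; the dict changes inside the loop
        body[wordID[word]] = body.pop(word)  # move the count to the ID key

print(df.at[1, 'body'])  # {'30': 10}
```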

Answer 2

I would try it this way:

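# body: the word -> frequency dict from one row; ids: the word -> ID dictionary
new_body = {}
for word in body:
    new_body[ids[word]] = body[word]  # key by the word's ID, keep its frequency

body = new_body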
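The loop above raises KeyError if a word has no entry in ids. A minimal sketch of the same mapping as a dict comprehension that instead skips such words; skipping is an assumed policy, and the inputs are hypothetical:

```python
# Hypothetical inputs: one row's word counts and a partial word -> ID dictionary
body = {'Under': 1, 'natur': 6, 'view': 10}
ids = {'Under': '477', 'natur': '12'}  # 'view' deliberately missing

# Same mapping as the loop, but words without an ID are dropped
new_body = {ids[word]: freq for word, freq in body.items() if word in ids}

print(new_body)  # {'477': 1, '12': 6}
```

Dropping unknown words keeps the vectors restricted to the known vocabulary; raising an error instead may be preferable if every word is expected to have an ID.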

Replace values in a pandas DataFrame with another value from a dictionary
