Filling a pandas DataFrame with a for loop

Time: 2020-10-06 15:23:57

Tags: python pandas dataframe for-loop

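I have 4 DataFrames for 4 newspapers (newspaper1, newspaper2, newspaper3, newspaper4), each with a single column of author names.

Now I want to merge these 4 DataFrames into one with 5 columns: author, plus newspaper1, newspaper2, newspaper3 and newspaper4, which hold 1/0 values (1 when the author writes for that newspaper).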

import pandas as pd 

listOfMedia = [newspaper1, newspaper2, newspaper3, newspaper4]
merged = pd.DataFrame(columns=['author', 'newspaper1', 'newspaper2', 'newspaper3', 'newspaper4'])

This loop does what I expect (it fills the author column of the merged df with names):

for item in listOfMedia:
    merged.author = item.author

But I can't figure out how to fill the newspaper columns with 1/0 values...

for item in listOfMedia:
    if item == newspaper1:
        merged['newspaper1'] = '1'
    elif item == newspaper2:
        merged['newspaper2'] = '1'
    elif item == newspaper3:
        merged['newspaper3'] = '1'
    else:
        merged['newspaper4'] = '1'

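I keep getting an error:

During handling of the above exception, another exception occurred: TypeError: attrib() got an unexpected keyword argument 'convert'

I tried googling the error, but that didn't help me pin down the problem. What am I missing here? I also think there must be a smarter way to fill the newspaper/author matrix, but I can't figure out even this simple approach. I'm using a Jupyter notebook.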

2 answers:

Answer 0: (score: 0)

You are actually setting all rows to 1, so use:

for col in merged.columns:
    merged[col].values[:] = 1
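
A loop can also fill each newspaper column row by row rather than with a single constant. The following is a minimal sketch under some assumptions: the sample frames are hypothetical (the question does not show its data), and the dict `media` is a helper introduced here to pair each frame with its column name. Pairing names with frames avoids `if item == newspaper1`, which cannot work as a condition because `==` between DataFrames compares element-wise.

```python
import pandas as pd

# Hypothetical sample data; the real frames are not shown in the question.
newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})

# Pair each frame with its column name so no DataFrame comparison is needed.
media = {
    'newspaper1': newspaper1,
    'newspaper2': newspaper2,
    'newspaper3': newspaper3,
    'newspaper4': newspaper4,
}

# One row per distinct author across all newspapers, in order of first appearance.
all_authors = pd.concat(list(media.values()), ignore_index=True)['author']
merged = pd.DataFrame({'author': all_authors.drop_duplicates().reset_index(drop=True)})

# For each newspaper, mark 1 where the author appears in that frame, else 0.
for name, frame in media.items():
    merged[name] = merged['author'].isin(frame['author']).astype(int)
```

Here `isin` returns a boolean Series aligned with `merged`, so each row gets its own value instead of the whole column being set to 1.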

Answer 1: (score: 0)

I've guessed at what I think your DataFrames look like.

newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})

First, we copy the DataFrames so that the originals are not affected:

newspaper1_temp = newspaper1.copy()
newspaper2_temp = newspaper2.copy()
newspaper3_temp = newspaper3.copy()
newspaper4_temp = newspaper4.copy()

Next, we replace each DataFrame's index with the author names:

newspaper1_temp.index = newspaper1['author']
newspaper2_temp.index = newspaper2['author']
newspaper3_temp.index = newspaper3['author']
newspaper4_temp.index = newspaper4['author']
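
As a side note, the copy and index steps above can probably be folded into one call. This is only a sketch, shown for a single frame: `set_index` returns a new DataFrame by default, and `drop=False` keeps the `author` column next to the new index, which the later steps rely on.

```python
import pandas as pd

newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})

# set_index returns a new DataFrame, so newspaper1 itself is left untouched;
# drop=False keeps the 'author' column in addition to using it as the index.
newspaper1_temp = newspaper1.set_index('author', drop=False)
```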

Then we concatenate these DataFrames (the index we set is used to match them up):

merged = pd.concat([newspaper1_temp, newspaper2_temp, newspaper3_temp, newspaper4_temp], axis=1)
merged.columns = ['newspaper1', 'newspaper2', 'newspaper3', 'newspaper4']

Finally, we replace NaN with 0, and then replace the non-zero entries (which still contain the author names) with 1:

merged = merged.fillna(0)
merged[merged != 0] = 1
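
For comparison, the same author/newspaper matrix can be sketched with `pd.crosstab`. This is an alternative technique, not the method used in this answer. The dict `frames` and the `newspaper` label column are names introduced here for illustration, and the sample data is the guessed data from above.

```python
import pandas as pd

# Same guessed sample data as above.
frames = {
    'newspaper1': pd.DataFrame({'author': ['author1', 'author2', 'author3']}),
    'newspaper2': pd.DataFrame({'author': ['author1', 'author2', 'author4']}),
    'newspaper3': pd.DataFrame({'author': ['author1', 'author2', 'author5']}),
    'newspaper4': pd.DataFrame({'author': ['author1', 'author2', 'author6']}),
}

# Stack all frames into one long table, labelling each row with its newspaper.
stacked = pd.concat(
    [df.assign(newspaper=name) for name, df in frames.items()],
    ignore_index=True,
)

# Count (author, newspaper) pairs; clip to 1 in case an author is listed twice.
matrix = pd.crosstab(stacked['author'], stacked['newspaper']).clip(upper=1)
```

The result has one row per author and one integer column per newspaper, so no NaN replacement step is needed.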