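I have 4 dataframes for 4 newspapers (newspaper1, newspaper2, newspaper3, newspaper4), each with a single column of author names.
Now I would like to merge these 4 dataframes into one with 5 columns: author, plus newspaper1, newspaper2, newspaper3 and newspaper4, which hold 1/0 values (1 when the author writes for that newspaper).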
import pandas as pd
listOfMedia = [newspaper1, newspaper2, newspaper3, newspaper4]
merged = pd.DataFrame(columns=['author', 'newspaper1', 'newspaper2', 'newspaper3', 'newspaper4'])
Although this loop does what I expect (it fills the author column of the merged df with names):
for item in listOfMedia:
    merged.author = item.author
I can't figure out how to fill the newspaper columns with 1/0 values...
for item in listOfMedia:
    if item == newspaper1:
        merged['newspaper1'] = '1'
    elif item == newspaper2:
        merged['newspaper2'] = '1'
    elif item == newspaper3:
        merged['newspaper3'] = '1'
    else:
        merged['newspaper4'] = '1'
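I keep getting an error:
During handling of the above exception, another exception occurred: TypeError: attrib() got an unexpected keyword argument 'convert'
I tried googling the error, but that didn't help me pin down the problem. What am I missing here? I also think there must be a smarter way to fill the newspaper/author matrix, but I can't get even this simple approach to work. I'm using a Jupyter notebook.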
Answer 0 (score: 0)
You are actually setting all rows to 1, so use:
for col in merged.columns:
    merged[col].values[:] = 1
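If the goal is a 1 only in the rows where an author actually appears in a given newspaper, one possible alternative is to test membership per column with `Series.isin`. The following is a minimal sketch, not the asker's code: the two sample frames and the `media` dict are assumptions made for illustration, since the real data is not shown in the question.

```python
import pandas as pd

# Hypothetical sample data; the real frames are not shown in the question.
newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})

# Map each column name to its frame, so no DataFrame == DataFrame comparison is needed.
media = {'newspaper1': newspaper1, 'newspaper2': newspaper2}

# One row per distinct author across all newspapers.
all_authors = pd.concat([df['author'] for df in media.values()]).drop_duplicates()
merged = pd.DataFrame({'author': all_authors.reset_index(drop=True)})

# 1 where the author appears in that newspaper's frame, otherwise 0.
for name, df in media.items():
    merged[name] = merged['author'].isin(df['author']).astype(int)
```

Keeping the frames in a dict keyed by column name avoids the `if item == newspaper1` chain from the question; comparing two dataframes with `==` yields an element-wise result rather than a single True/False, so it cannot be used directly in an `if`.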
Answer 1 (score: 0)
I have guessed at what your dataframes look like.
newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})
First, we copy the dataframes so that the originals are not affected:
newspaper1_temp = newspaper1.copy()
newspaper2_temp = newspaper2.copy()
newspaper3_temp = newspaper3.copy()
newspaper4_temp = newspaper4.copy()
Next, we replace the index of each dataframe with the author names:
newspaper1_temp.index = newspaper1['author']
newspaper2_temp.index = newspaper2['author']
newspaper3_temp.index = newspaper3['author']
newspaper4_temp.index = newspaper4['author']
Then we concatenate these dataframes (they are matched up by the index we just set):
merged = pd.concat([newspaper1_temp, newspaper2_temp, newspaper3_temp, newspaper4_temp], axis=1)
merged.columns = ['newspaper1', 'newspaper2', 'newspaper3', 'newspaper4']
Finally, we replace NaN with 0, and then replace the non-zero entries (which still contain author names) with 1:
merged = merged.fillna(0)
merged[merged != 0] = 1
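As a more compact variant of the same idea, the frames can be stacked into one long table and pivoted with `pd.crosstab`. This is only a sketch under the same guessed data as above; the names `frames`, `long_df`, `matrix` and `result` are made up for illustration.

```python
import pandas as pd

# Same guessed sample data as in this answer.
newspaper1 = pd.DataFrame({'author': ['author1', 'author2', 'author3']})
newspaper2 = pd.DataFrame({'author': ['author1', 'author2', 'author4']})
newspaper3 = pd.DataFrame({'author': ['author1', 'author2', 'author5']})
newspaper4 = pd.DataFrame({'author': ['author1', 'author2', 'author6']})

frames = {
    'newspaper1': newspaper1,
    'newspaper2': newspaper2,
    'newspaper3': newspaper3,
    'newspaper4': newspaper4,
}

# Stack into a long table: one row per (author, newspaper) pair.
long_df = pd.concat(
    [df.assign(newspaper=name) for name, df in frames.items()],
    ignore_index=True,
)

# Count the pairs, then cap at 1 so a repeated author still yields 1, not 2.
matrix = pd.crosstab(long_df['author'], long_df['newspaper']).clip(upper=1)

# Move the author names from the index into a regular column.
result = matrix.reset_index()
result.columns.name = None
```

Here `result` has `author` as an ordinary column followed by the four newspaper columns, which matches the five-column layout asked for in the question. Because the pairs are counted and then capped at 1, this version should also cope with an author who is listed more than once in the same frame.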