Update pandas DataFrame values from lists

Date: 2020-11-04 11:37:29

Tags: python pandas dataframe

Imagine we have the following DataFrame:

Name    ID  Phone   Email
Paul    10  000001  paul@mail.com
Sarah   20          sara@mail.com
John    30  000003  
Will    40  
Evelyn  50  000005  evelyn@mail.com

And the following lists:

['Sarah', '20', '000002', 'sara@mail.com']
['John', '30', '000003', 'john@mail.com']
['Will', '40', '000004', 'will@mail.com']

Is there a pythonic pandas way to update the None values in the DataFrame from the lists, without looping over the individual fields?

The result should be:

Name    ID  Phone   Email
Paul    10  000001  paul@mail.com
Sarah   20  000002  sara@mail.com
John    30  000003  john@mail.com
Will    40  000004  will@mail.com
Evelyn  50  000005  evelyn@mail.com

Thanks in advance!

1 answer:

Answer 0 (score: 2)

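You can create a DataFrame from the lists, set Name as the index in both DataFrames, and use DataFrame.combine_first. To keep the original row order, convert the original index to a column first, sort by that column afterwards, and finally drop it by reindexing to the original columns: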

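import pandas as pd

# Rows from the question, collected into a list of lists
L = [['Sarah', '20', '000002', 'sara@mail.com'],
     ['John', '30', '000003', 'john@mail.com'],
     ['Will', '40', '000004', 'will@mail.com']]

# Build a DataFrame from the lists and index it by Name
df1 = pd.DataFrame(L, columns=['Name','ID','Phone','Email']).set_index('Name')
print(df1)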
       ID   Phone          Email
Name                            
Sarah  20  000002  sara@mail.com
John   30  000003  john@mail.com
Will   40  000004  will@mail.com

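# Keep the original row order in an 'index' column, fill missing values from df1,
# then restore the row order and the original column order
df = (df.reset_index()
        .set_index('Name')
        .combine_first(df1)
        .reset_index()
        .sort_values('index', ignore_index=True)
        .reindex(df.columns, axis=1))
print(df)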
     Name  ID   Phone            Email
0    Paul  10  000001    paul@mail.com
1   Sarah  20  000002    sara@mail.com
2    John  30  000003    john@mail.com
3    Will  40  000004    will@mail.com
4  Evelyn  50  000005  evelyn@mail.com
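
For anyone who wants to run the steps above end to end, here is a self-contained sketch. The construction of df is an assumption (the question does not show it): missing cells are taken to be None and ID is stored as strings so that it matches the lists. The variable name result is only for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's DataFrame (missing cells as None)
df = pd.DataFrame({
    'Name': ['Paul', 'Sarah', 'John', 'Will', 'Evelyn'],
    'ID': ['10', '20', '30', '40', '50'],
    'Phone': ['000001', None, '000003', None, '000005'],
    'Email': ['paul@mail.com', 'sara@mail.com', None, None, 'evelyn@mail.com'],
})

L = [['Sarah', '20', '000002', 'sara@mail.com'],
     ['John', '30', '000003', 'john@mail.com'],
     ['Will', '40', '000004', 'will@mail.com']]
df1 = pd.DataFrame(L, columns=['Name', 'ID', 'Phone', 'Email']).set_index('Name')

result = (df.reset_index()                 # keep original row positions in 'index'
            .set_index('Name')             # align both frames on Name
            .combine_first(df1)            # fill only the missing cells from df1
            .reset_index()                 # Name back to a regular column
            .sort_values('index', ignore_index=True)  # restore original row order
            .reindex(df.columns, axis=1))  # original column order, drops 'index'
print(result)
```

This relies on the values in Name being unique in both DataFrames, since Name is used as the alignment key.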

Another idea is to use DataFrame.update, but by default it overwrites every value that has a non-NaN counterpart in df1, not only the NaNs:

df = df.set_index('Name')
df.update(df1)  # modifies df in place and returns None
df = df.reset_index()
print(df)
     Name  ID   Phone            Email
0    Paul  10  000001    paul@mail.com
1   Sarah  20  000002    sara@mail.com
2    John  30  000003    john@mail.com
3    Will  40  000004    will@mail.com
4  Evelyn  50  000005  evelyn@mail.com
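
If only the missing cells should be filled, DataFrame.update also accepts overwrite=False, which leaves existing non-NA values untouched. The sketch below is not part of the original answer. It assumes the same reconstruction of df as in the question (missing cells as None, ID as strings), and the name out is only for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's DataFrame (missing cells as None)
df = pd.DataFrame({
    'Name': ['Paul', 'Sarah', 'John', 'Will', 'Evelyn'],
    'ID': ['10', '20', '30', '40', '50'],
    'Phone': ['000001', None, '000003', None, '000005'],
    'Email': ['paul@mail.com', 'sara@mail.com', None, None, 'evelyn@mail.com'],
})

L = [['Sarah', '20', '000002', 'sara@mail.com'],
     ['John', '30', '000003', 'john@mail.com'],
     ['Will', '40', '000004', 'will@mail.com']]
df1 = pd.DataFrame(L, columns=['Name', 'ID', 'Phone', 'Email']).set_index('Name')

out = df.set_index('Name')
# overwrite=False: only cells that are NA in `out` take values from df1
out.update(df1, overwrite=False)
out = out.reset_index()
print(out)
```

Because update works in place on the existing rows and columns, the row and column order of the original DataFrame is kept without any extra sorting.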