Question

My sample data is as follows:

import pandas as pd

data1 = {'index': ['001', '001', '001', '002', '002', '003', '004', '004'],
         'type':  ['red', 'red', 'red', 'yellow', 'red', 'green', 'blue', 'blue'],
         'class': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']}
df1 = pd.DataFrame(data1, columns=['index', 'type', 'class'])
df1
    index   type    class
0   001     red     A
1   001     red     A
2   001     red     A
3   002     yellow  A
4   002     red     A
5   003     green   A
6   004     blue    A
7   004     blue    A

data2 = {'index': ['001', '001', '002', '003', '004'],
         'type':  ['red', 'red', 'yellow', 'green', 'blue'],
         'class': ['A', 'A', 'A', 'B', 'A']}
df2 = pd.DataFrame(data2, columns=['index', 'type', 'class'])
df2
    index   type    class   
0   001     red     A      
1   001     red     A      
2   002     yellow  A      
3   003     green   B      
4   004     blue    A

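In df1 every row has class = A, while in df2 the class can be A, B or C. I want to add to df2 the rows of df1 that are missing from it. df1 holds the count of each type per index: for example, if index 001 appears 3 times in df1, it should also appear 3 times in df2. The output should be: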

    index   type    class   
0   001     red     A       
1   001     red     A       
2   001     red     A       
3   002     yellow  A      
4   002     red     A       
5   003     green   A       
6   003     green   B       
7   004     blue    A       
8   004     blue    A

I tried pd.concat and pd.merge, but I keep getting duplicates or adding the wrong rows. Does anyone have an idea how to do this?
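Judging from the expected output, the rule seems to be a count per (index, type, class) combination rather than per index alone: 003 green ends up twice because the class-B row in df2 does not count toward the class-A row in df1. Assuming that reading, a minimal sketch that computes how many copies of each combination df2 still lacks (the names need, have and shortfall are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({'index': ['001', '001', '001', '002', '002', '003', '004', '004'],
                    'type': ['red', 'red', 'red', 'yellow', 'red', 'green', 'blue', 'blue'],
                    'class': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']})
df2 = pd.DataFrame({'index': ['001', '001', '002', '003', '004'],
                    'type': ['red', 'red', 'yellow', 'green', 'blue'],
                    'class': ['A', 'A', 'A', 'B', 'A']})

keys = ['index', 'type', 'class']

# Number of occurrences of each (index, type, class) combination in each frame.
need = df1.groupby(keys).size()
have = df2.groupby(keys).size()

# Copies df2 still lacks: a combination absent from df2 counts as 0,
# and combinations that only df2 has (negative difference) are clipped to 0.
shortfall = need.sub(have, fill_value=0).clip(lower=0).astype(int)
print(shortfall[shortfall > 0])
```

For the sample data this reports one missing copy each of 001 red A, 002 red A, 003 green A and 004 blue A, i.e. four rows to add.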

Answer 1

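You can concatenate df1 with the rows of df2 that do not match df1: df2[~df2.isin(df1)].dropna()
Then sort the values and call reset_index.

Strictly speaking, when isin() receives a DataFrame it compares cell by cell on matching row labels and column names, so this filter keeps only the df2 rows in which every cell differs from the df1 cell at the same position; for the sample data that is row 3 (003, green, B).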
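To check what the filter actually selects, here is a small sketch that prints the intermediate boolean mask for the sample data (the names mask and kept are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({'index': ['001', '001', '001', '002', '002', '003', '004', '004'],
                    'type': ['red', 'red', 'red', 'yellow', 'red', 'green', 'blue', 'blue'],
                    'class': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']})
df2 = pd.DataFrame({'index': ['001', '001', '002', '003', '004'],
                    'type': ['red', 'red', 'yellow', 'green', 'blue'],
                    'class': ['A', 'A', 'A', 'B', 'A']})

# With a DataFrame argument, isin() compares cell by cell where the row
# label and column name match; it does not test "is this row anywhere in df1".
mask = df2.isin(df1)
print(mask)

# ~mask turns every matching cell into NaN, and dropna() then removes any
# row containing a NaN, so only rows in which no cell matched survive.
kept = df2[~mask].dropna()
print(kept)
```

Rows 0 and 1 match df1 completely and rows 2 and 4 match on class only, so all four are dropped; only row 3 survives.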

In short, you can do it all in one line:

pd.concat([df1, df2[~df2.isin(df1)].dropna()]).sort_values(['index','type','class']).reset_index(drop=True)

This gives the following output:

    index   type    class
0   001     red     A
1   001     red     A
2   001     red     A
3   002     red     A
4   002     yellow  A
5   003     green   A
6   003     green   B
7   004     blue    A
8   004     blue    A
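Because the isin() filter above depends on which row labels line up, reordering df2 can change its result. A label-independent alternative, sketched under the assumption that rows are matched on the full (index, type, class) combination and counted with multiplicity (which reproduces the expected output): number repeated rows with groupby().cumcount(), left-merge the numbered df1 onto the numbered df2 with indicator=True, and append the 'left_only' rows to df2. The helper column name occ is arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({'index': ['001', '001', '001', '002', '002', '003', '004', '004'],
                    'type': ['red', 'red', 'red', 'yellow', 'red', 'green', 'blue', 'blue'],
                    'class': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A']})
df2 = pd.DataFrame({'index': ['001', '001', '002', '003', '004'],
                    'type': ['red', 'red', 'yellow', 'green', 'blue'],
                    'class': ['A', 'A', 'A', 'B', 'A']})

keys = ['index', 'type', 'class']

# Number repeated rows 0, 1, 2, ... within each combination, so that the
# third "001 red A" in df1 is distinct from the two copies already in df2.
left = df1.assign(occ=df1.groupby(keys).cumcount())
right = df2.assign(occ=df2.groupby(keys).cumcount())

# Left join with an indicator column: 'left_only' marks df1 rows that have
# no counterpart (same keys and same occurrence number) in df2.
merged = left.merge(right, on=keys + ['occ'], how='left', indicator=True)
missing = merged.loc[merged['_merge'] == 'left_only', keys]

# Append the missing rows to df2, then sort and renumber.
result = (pd.concat([df2, missing])
          .sort_values(keys)
          .reset_index(drop=True))
print(result)
```

Since df2 is the base of the concatenation, rows that exist only in df2, such as 003 green B, are kept regardless of their position.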

How to add missing rows from one DataFrame to another based on a condition in pandas?
