Merging pandas DataFrames with NaN for missing rows

Asked: 2019-02-05 23:14:07

Tags: python pandas dataframe join merge

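I want to use a reference calendar as a scaffold to fill in the rows that are missing from my main data. To do that, I want to merge these two DataFrames.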

import pandas as pd
import numpy as np

d1 = { 'Year': [2019,2019,2019,2019,2019,2019],
        'Week': [1,2,3,5,5,6],
        'Part': ['A','A','A','A','B','B'],
        'Static': [20,20,20,20,40,40],
        'Value': [np.nan,10,np.nan,50,30,np.nan] }

d2 = { 'Year':[2019,2019,2019,2019,2019,2019,2019,2019,2019,2019],
        'Week':[1,2,3,4,5,6,7,8,9,10] }

df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

The expected output is as follows:

    Year  Week Part  Static  Value
0   2019     1    A      20    NaN
1   2019     2    A      20   10.0
2   2019     3    A      20    NaN
3   2019     4    A      20    NaN
4   2019     5    A      20   50.0
5   2019     6    A      20    NaN
6   2019     7    A      20    NaN
7   2019     8    A      20    NaN
8   2019     9    A      20    NaN
9   2019    10    A      20    NaN
10  2019     1    B      40    NaN
11  2019     2    B      40    NaN
12  2019     3    B      40    NaN
13  2019     4    B      40    NaN
14  2019     5    B      40   30.0
15  2019     6    B      40    NaN
16  2019     7    B      40    NaN
17  2019     8    B      40    NaN
18  2019     9    B      40    NaN
19  2019    10    B      40    NaN

1 answer:

Answer 0 (score: 1)

Comments inline.

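# First, replicate `df2` for each unique Part (a cross join via a constant key).
df3 = (df2.assign(Key=1)
          .merge(pd.DataFrame({'Part': df1.Part.unique(), 'Key': 1}), on='Key')
          .drop(columns='Key'))  # positional axis (`.drop('Key', 1)`) is rejected by pandas 2.x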
df3

    Year  Week Part
0   2019     1    A
1   2019     1    B
2   2019     2    A
3   2019     2    B
4   2019     3    A
5   2019     3    B
6   2019     4    A
7   2019     4    B
8   2019     5    A
9   2019     5    B
10  2019     6    A
11  2019     6    B
12  2019     7    A
13  2019     7    B
14  2019     8    A
15  2019     8    B
16  2019     9    A
17  2019     9    B
18  2019    10    A
19  2019    10    B
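
On pandas 1.2 or later, the same replication can probably be written without the temporary `Key` column by using `how='cross'`. The sketch below is an alternative to the step above, not part of the original answer; the `parts` frame is a hypothetical stand-in for `df1.Part.unique()`.

```python
import pandas as pd

# Reference calendar from the question.
df2 = pd.DataFrame({'Year': [2019] * 10, 'Week': list(range(1, 11))})

# Hypothetical stand-in for the unique parts found in df1.
parts = pd.DataFrame({'Part': ['A', 'B']})

# Cross join: every calendar row is paired with every part.
# Assumes pandas >= 1.2, where how='cross' was introduced.
df3 = df2.merge(parts, how='cross')
print(df3)
```

The constant-key merge remains the safer choice if the code has to run on older pandas versions.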

# Next, perform left outer merge with `df1`.     
df3.merge(df1, on=['Year', 'Week', 'Part'], how='left')

    Year  Week Part  Static  Value
0   2019     1    A    20.0    NaN
1   2019     1    B     NaN    NaN
2   2019     2    A    20.0   10.0
3   2019     2    B     NaN    NaN
4   2019     3    A    20.0    NaN
5   2019     3    B     NaN    NaN
6   2019     4    A     NaN    NaN
7   2019     4    B     NaN    NaN
8   2019     5    A    20.0   50.0
9   2019     5    B    40.0   30.0
10  2019     6    A     NaN    NaN
11  2019     6    B    40.0    NaN
12  2019     7    A     NaN    NaN
13  2019     7    B     NaN    NaN
14  2019     8    A     NaN    NaN
15  2019     8    B     NaN    NaN
16  2019     9    A     NaN    NaN
17  2019     9    B     NaN    NaN
18  2019    10    A     NaN    NaN
19  2019    10    B     NaN    NaN
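
The merged result above still differs from the expected output in two ways: `Static` is NaN on the rows that came only from the calendar, and the rows are ordered by week rather than by part. One possible way to close that gap is sketched below. It assumes that `Static` is constant within each `Part`, which holds for the sample data but is not stated in the question; the names `parts`, `merged`, and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Year': [2019] * 6,
    'Week': [1, 2, 3, 5, 5, 6],
    'Part': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Static': [20, 20, 20, 20, 40, 40],
    'Value': [np.nan, 10, np.nan, 50, 30, np.nan],
})
df2 = pd.DataFrame({'Year': [2019] * 10, 'Week': list(range(1, 11))})

# Replicate the calendar for each unique Part, as in the first step of the answer.
parts = pd.DataFrame({'Part': df1['Part'].unique(), 'Key': 1})
df3 = df2.assign(Key=1).merge(parts, on='Key').drop(columns='Key')

# Left outer merge with the main data, as in the second step.
merged = df3.merge(df1, on=['Year', 'Week', 'Part'], how='left')

# Assumption: Static is constant per Part, so the first non-null value
# in each group can be broadcast to the rows that have no match in df1.
merged['Static'] = merged.groupby('Part')['Static'].transform('first').astype(int)

# Order by Part, then Week, to match the layout of the expected output.
result = merged.sort_values(['Part', 'Week']).reset_index(drop=True)
print(result)
```

If `Static` can vary within a part, this fill would silently pick one value, so a different rule (for example a forward fill by week) would be needed.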