How do I fill null values in a DataFrame based on a column in another DataFrame?

Time: 2019-04-26 22:02:55

Tags: python pandas dataframe replace

I have a DataFrame named df1:

ID     Value       Name      Score
-1      10           A         -1
-1       5           B         -1
NaN     0.2       Track C     100
NaN     0.5       Track C     200
1        0           D        100
5        0           D        200

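I want to fill the NaN values in the ID column with the IDs from df2, matched on the Score column. A single NaN row can match several IDs, so it should expand into one row per matching ID.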

df2

Score    ID
100      1
100      2
100      3
100      4
200      5
200      6
200      7

So in the end, my final DataFrame, df3, should look like this:

ID     Value       Name      Score
-1      10           A         -1
-1       5           B         -1
1       0.2       Track C     100
2       0.2       Track C     100
3       0.2       Track C     100
4       0.2       Track C     100
5       0.5       Track C     200
6       0.5       Track C     200
7       0.5       Track C     200
1        0           D        100
5        0           D        200

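How can I do this?

3 answers:

Answer 0 (score: 3)

I have a solution, but it is not elegant, and I would welcome a look from more experienced users.

To make things easier for others, here is the code that sets up the test case: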

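import re

import numpy as np
import pandas as pd

# Split each line on runs of two or more spaces so that "Track C"
# stays in one cell. Every value is read as a string.
df1 = pd.DataFrame(
    columns='ID     Value       Name      Score'.split(),
    data=[
        re.split(r'\s{2,}', line) for line in
"""
-1      10           A         -1
-1       5           B         -1
NaN     0.2       Track C     100
NaN     0.5       Track C     200
1        0           D        100
5        0           D        200
""".strip().split('\n')
    ],
)

# Turn the literal string 'NaN' into a real missing value.
df1 = df1.replace({'NaN': np.nan})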

df2 = pd.DataFrame(
    columns='Score    ID'.split(),
    data=[
        re.split(r'\s{2,}', line) for line in
"""
100      1
100      2
100      3
100      4
200      5
200      6
200      7
""".strip().split('\n')
    ],
)

My solution is:

"""
The natural first reaction is to use pd.merge().
The hurdle, however, is how to handle the fillna of the "ID" column.
Mine works, but it is too hard-coded.
"""

df = pd.merge(left=df1, right=df2, on='Score', how='left')

df['ID'] = df['ID_x'].fillna(df['ID_y'])

finalresult = df.drop(columns=['ID_x', 'ID_y']).drop_duplicates(subset=['ID','Name'])

Output:

   Value     Name Score  ID
0     10        A    -1  -1
1      5        B    -1  -1
2    0.2  Track C   100   1
3    0.2  Track C   100   2
4    0.2  Track C   100   3
5    0.2  Track C   100   4
6    0.5  Track C   200   5
7    0.5  Track C   200   6
8    0.5  Track C   200   7
9      0        D   100   1
13     0        D   200   5
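
One possible way to avoid the hard-coded drop_duplicates step is to merge on a key that is only set for rows whose ID is missing, so rows that already have an ID never pick up extra matches. The following is a minimal sketch, not part of the answer above. It assumes numeric columns rather than the string-typed setup, and the helper column names `_key` and `_new_id` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Numeric version of the example data (an assumption; the setup above uses strings).
df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})
df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})

# Merge key: the Score where ID is missing, NaN everywhere else.
key = df1["Score"].where(df1["ID"].isna()).astype(float)

# Rename df2's columns to the hypothetical helper names so they cannot clash with df1.
lookup = (
    df2.rename(columns={"Score": "_key", "ID": "_new_id"})
       .astype({"_key": float})
)

# Rows with a NaN key find no partner in lookup, so they are not duplicated.
merged = df1.assign(_key=key).merge(lookup, on="_key", how="left")
merged["ID"] = merged["ID"].fillna(merged["_new_id"])

result = merged.drop(columns=["_key", "_new_id"]).astype({"ID": int})
print(result)
```

Because the merge uses how='left', the rows stay in the original df1 order, with each NaN row expanded in place.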

Answer 1 (score: 2)

You can use pandas.merge first, then use pandas.concat along axis=0 to combine the two DataFrames:

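# For every ID in df2, take the first df1 row that has the same Score.
s = pd.merge(df2, df1, on='Score', how='left', suffixes=('', '_2'))\
      .drop(columns='ID_2')\
      .drop_duplicates('ID')

# Keep the df1 rows that already have an ID and append the filled rows.
df3 = pd.concat([df1.dropna(), s], ignore_index=True)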

Output:

print(df3)
     ID     Name  Score  Value
0  -1.0        A     -1   10.0
1  -1.0        B     -1    5.0
2   1.0        D    100    0.0
3   5.0        D    200    0.0
4   1.0  Track C    100    0.2
5   2.0  Track C    100    0.2
6   3.0  Track C    100    0.2
7   4.0  Track C    100    0.2
8   5.0  Track C    200    0.5
9   6.0  Track C    200    0.5
10  7.0  Track C    200    0.5

Answer 2 (score: 0)

Split the DataFrame, then use merge and concat to put it back together:

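# Rows with a missing ID and rows that already have one.
df1_1 = df1.loc[df1.ID.isnull()].copy()
df1_2 = df1.loc[df1.ID.notnull()].copy()
# Keep the original index through the merge so the rows can be sorted back into place.
df1_1 = df1_1.reset_index().drop(columns='ID').merge(df2, on='Score', how='left').set_index('index')

yourdf = pd.concat([df1_1, df1_2], sort=False).sort_index()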
yourdf
Out[645]: 
   Value    Name  Score   ID
0   10.0       A     -1 -1.0
1    5.0       B     -1 -1.0
2    0.2  TrackC    100  1.0
2    0.2  TrackC    100  2.0
2    0.2  TrackC    100  3.0
2    0.2  TrackC    100  4.0
3    0.5  TrackC    200  5.0
3    0.5  TrackC    200  6.0
3    0.5  TrackC    200  7.0
4    0.0       D    100  1.0
5    0.0       D    200  5.0
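
Another option, not used in any of the answers above, is to collect the candidate IDs for each Score into a list and then expand those lists with DataFrame.explode. This is a rough sketch under a few assumptions: the columns are numeric, DataFrame.explode is available (pandas 0.25 or later), and the name `ids_by_score` is only illustrative.

```python
import numpy as np
import pandas as pd

# Numeric version of the example data (an assumption).
df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})
df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})

# Map each Score to the list of IDs it has in df2.
ids_by_score = df2.groupby("Score")["ID"].agg(list).to_dict()

# A row with an ID keeps it as a one-element list; a row with NaN takes
# every ID for its Score (or stays NaN if df2 has no such Score).
id_lists = [
    ids_by_score.get(score, [id_]) if pd.isna(id_) else [id_]
    for id_, score in zip(df1["ID"], df1["Score"])
]

# explode turns each list element into its own row and repeats the other columns.
result = (
    df1.assign(ID=pd.Series(id_lists, index=df1.index))
       .explode("ID")
       .reset_index(drop=True)
       .astype({"ID": int})
)
print(result)
```

The final cast to int only works when every missing ID found a match in df2; otherwise some NaN values remain and the cast should be dropped.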