I have a DataFrame named df1:
ID   Value  Name     Score
-1   10     A        -1
-1   5      B        -1
NaN  0.2    Track C  100
NaN  0.5    Track C  200
1    0      D        100
5    0      D        200
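I want to fill the rows of df1 whose ID is NaN with the matching ID values from df2, joining on the Score column. A single Score can map to several IDs, so one NaN row may expand into several rows.
df2: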
Score ID
100 1
100 2
100 3
100 4
200 5
200 6
200 7
In the end, the resulting DataFrame should look like this:
df3:
ID  Value  Name     Score
-1  10     A        -1
-1  5      B        -1
1   0.2    Track C  100
2   0.2    Track C  100
3   0.2    Track C  100
4   0.2    Track C  100
5   0.5    Track C  200
6   0.5    Track C  200
7   0.5    Track C  200
1   0      D        100
5   0      D        200
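How can I do this?
Answer 0 (score: 3)
I have a solution, but it is not elegant, and I would welcome suggestions from more experienced users.
To make things easier for others, here is the code that sets up the test case: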
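import re

import numpy as np
import pandas as pd

# Fields are separated by two or more spaces so that "Track C" stays one value.
# Every value is parsed as a string here.
df1 = pd.DataFrame(
    columns='ID Value Name Score'.split(),
    data=[
        re.split(r'\s{2,}', line) for line in
"""
-1   10   A        -1
-1   5    B        -1
NaN  0.2  Track C  100
NaN  0.5  Track C  200
1    0    D        100
5    0    D        200
""".strip().split('\n')
    ],
)
# Turn the literal string 'NaN' into a real missing value
df1 = df1.replace({'NaN': np.nan})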
# Same parsing as above: fields are separated by two or more spaces
df2 = pd.DataFrame(
    columns='Score ID'.split(),
    data=[
        re.split(r'\s{2,}', line) for line in
"""
100  1
100  2
100  3
100  4
200  5
200  6
200  7
""".strip().split('\n')
    ],
)
My solution is:
# The natural first step is pd.merge().
# The hurdle is how to fill the NaN values in the "ID" column afterwards.
# This works, but it is too hard-coded.
df = pd.merge(left=df1, right=df2, on='Score', how='left')
# Keep the original ID where present, otherwise take the ID from df2
df['ID'] = df['ID_x'].fillna(df['ID_y'])
# The left merge also repeats rows that already had an ID; drop those extra copies
finalresult = df.drop(columns=['ID_x', 'ID_y']).drop_duplicates(subset=['ID', 'Name'])
Output:
    Value  Name     Score  ID
0   10     A        -1     -1
1   5      B        -1     -1
2   0.2    Track C  100    1
3   0.2    Track C  100    2
4   0.2    Track C  100    3
5   0.2    Track C  100    4
6   0.5    Track C  200    5
7   0.5    Track C  200    6
8   0.5    Track C  200    7
9   0      D        100    1
13  0      D        200    5
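The drop_duplicates(subset=['ID', 'Name']) step above is tied to this particular data. As a rough sketch of one less hard-coded variant (not the answerer's code), the original row label can be carried through the merge and used to discard only the extra copies of rows that already had an ID. It assumes numeric columns rather than the string-typed test setup, and that df1 has no column already named index; the names ID_df2, had_id and keep are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: numeric data instead of the string-parsed test setup
df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})
df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})

# reset_index() stores the original row label in a column named "index"
merged = df1.reset_index().merge(
    df2, on="Score", how="left", suffixes=("", "_df2")
)

# Remember which merged rows came from a df1 row that already had an ID
had_id = merged["ID"].notna()
merged["ID"] = merged["ID"].fillna(merged["ID_df2"])

# Keep every expanded copy of a NaN-ID row, but only the first copy
# of each row whose ID was already known
keep = ~had_id | ~merged.duplicated(subset="index")
result = (
    merged.loc[keep]
    .drop(columns=["index", "ID_df2"])
    .reset_index(drop=True)
)
print(result)
```

Because the filter is based on the original row label rather than on (ID, Name), it should not depend on which columns happen to be unique in the data.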
Answer 1 (score: 2)
You can first use pandas.merge, then use pandas.concat along axis=0 to combine the two DataFrames:
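# df1 and df2 are the DataFrames from the question.
# Expand df2 against df1 on Score, then keep one row per df2 ID.
s = pd.merge(df2, df1, on='Score', how='left', suffixes=('', '_2'))\
    .drop('ID_2', axis=1)\
    .drop_duplicates('ID')
# Stack the rows of df1 that already had an ID on top of the expanded rows
df3 = pd.concat([df1.dropna(), s], ignore_index=True)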
Output:
print(df3)
    ID    Name     Score  Value
0   -1.0  A        -1     10.0
1   -1.0  B        -1     5.0
2   1.0   D        100    0.0
3   5.0   D        200    0.0
4   1.0   Track C  100    0.2
5   2.0   Track C  100    0.2
6   3.0   Track C  100    0.2
7   4.0   Track C  100    0.2
8   5.0   Track C  200    0.5
9   6.0   Track C  200    0.5
10  7.0   Track C  200    0.5
Answer 2 (score: 0)
Split the DataFrame, then use merge and concat to put it back together:
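# Rows whose ID is missing, and rows whose ID is already known
df1_1 = df1.loc[df1.ID.isnull()].copy()
df1_2 = df1.loc[df1.ID.notnull()].copy()
# Attach every matching ID from df2, keeping the original row label as the index
df1_1 = df1_1.reset_index().drop(columns='ID').merge(df2, on='Score', how='left').set_index('index')
# Recombine and restore the original row order
yourdf = pd.concat([df1_1, df1_2], sort=False).sort_index()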
yourdf
Out[645]:
   Value  Name    Score  ID
0  10.0   A       -1     -1.0
1  5.0    B       -1     -1.0
2  0.2    TrackC  100    1.0
2  0.2    TrackC  100    2.0
2  0.2    TrackC  100    3.0
2  0.2    TrackC  100    4.0
3  0.5    TrackC  200    5.0
3  0.5    TrackC  200    6.0
3  0.5    TrackC  200    7.0
4  0.0    D       100    1.0
5  0.0    D       200    5.0
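The result above keeps repeated index labels (2 and 3) and float IDs. A minimal sketch, assuming numeric input columns, wraps the same split, merge and concat steps in a hypothetical helper named fill_missing_ids (not part of pandas or of the answer), then renumbers the index and converts ID to the nullable Int64 dtype. A stable sort is used on the assumption that the expanded rows should keep the order in which df2 lists their IDs.

```python
import numpy as np
import pandas as pd


def fill_missing_ids(df1, df2):
    """Expand rows of df1 with a missing ID using the Score -> ID pairs in df2.

    Hypothetical helper for illustration; assumes both frames have
    'Score' and 'ID' columns.
    """
    missing = df1["ID"].isna()
    # Rows without an ID: drop the empty column and pull all matching IDs from df2
    expanded = (
        df1.loc[missing]
        .drop(columns="ID")
        .reset_index()
        .merge(df2, on="Score", how="left")
        .set_index("index")
    )
    # Recombine with the untouched rows; mergesort is stable, so copies
    # of the same original row keep df2's ID order
    out = pd.concat([expanded, df1.loc[~missing]], sort=False)
    out = out.sort_index(kind="mergesort").reset_index(drop=True)
    # Nullable integer dtype tolerates a NaN left by an unmatched Score
    out["ID"] = out["ID"].astype("Int64")
    return out[list(df1.columns)]


df1 = pd.DataFrame({
    "ID": [-1, -1, np.nan, np.nan, 1, 5],
    "Value": [10, 5, 0.2, 0.5, 0, 0],
    "Name": ["A", "B", "Track C", "Track C", "D", "D"],
    "Score": [-1, -1, 100, 200, 100, 200],
})
df2 = pd.DataFrame({
    "Score": [100, 100, 100, 100, 200, 200, 200],
    "ID": [1, 2, 3, 4, 5, 6, 7],
})

df3 = fill_missing_ids(df1, df2)
print(df3)
```

The returned frame uses the column order of df1 and a fresh 0..n-1 index, which matches the layout of df3 requested in the question.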