I am trying to replace row values in one DataFrame with the values from another.
Here is the sample code:
import pandas as pd
import numpy as np
from pprint import pprint
raceA = ['r1','r3','r4','r5','r6','r7','r8', 'r9']
qualifierA = ['last','first','first','first','last','last','first','first']
participantA = ['rat','rat','cat','cat','rat','dog','dog','dog']
dfA = pd.DataFrame(
{'race':raceA,
'qualifier':qualifierA,
'participant':participantA
}
)
pprint(dfA)
raceB = ['r1','r2','r3','r4','r5','r6','r7','r8', 'r9','r10']
qualifierB = ['last',np.nan,np.nan,'first','first','last','last','first','first',np.nan]
participantB = ['rat','rat',np.nan,'cat','cat','rat','dog','dog',np.nan,np.nan]
dfB = pd.DataFrame(
{'race':raceB,
'qualifier':qualifierB,
'participant':participantB
}
)
pprint(dfB)
dfB.loc[dfB.race.isin(dfA.race), ['qualifier','participant']] = dfA[['qualifier','participant']]
pprint(dfB)
For example, dfA contains
r9 first dog
while dfB contains
r9 first NaN
Desired output in dfB:
r9 first dog
Actual output:
r9 NaN NaN
Could someone take a look?
Answer 0 (score: 2)
Use DataFrame.fillna with a DataFrame as the fill value, after setting race as the index on both frames:
df = dfB.set_index('race').fillna(dfA.set_index('race')).reset_index()
print(df)
  race qualifier participant
0   r1      last         rat
1   r2       NaN         rat
2   r3     first         rat
3   r4     first         cat
4   r5     first         cat
5   r6      last         rat
6   r7      last         dog
7   r8     first         dog
8   r9     first         dog
9  r10       NaN         NaN
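The NaN values in the question's attempt most likely come from index alignment: when a DataFrame is assigned through .loc, pandas matches the right-hand side on index labels, not on position. A minimal sketch of that effect and of the race-indexed fix, using cut-down stand-ins for dfA and dfB (the three- and four-row frames below are illustrative, not the question's full data):

```python
import numpy as np
import pandas as pd

# Cut-down stand-ins for the question's frames (illustrative values only).
dfA = pd.DataFrame({'race': ['r1', 'r3', 'r9'],
                    'qualifier': ['last', 'first', 'first'],
                    'participant': ['rat', 'rat', 'dog']})
dfB = pd.DataFrame({'race': ['r1', 'r2', 'r3', 'r9'],
                    'qualifier': ['last', np.nan, np.nan, 'first'],
                    'participant': ['rat', 'rat', np.nan, np.nan]})

# The question's attempt: rows 0, 2 and 3 of dfB are selected, and the
# right-hand side is looked up by those same index labels in dfA.
# dfA only has labels 0, 1, 2, so label 3 (r9) finds nothing and becomes NaN,
# while label 2 (r3) picks up dfA's row 2, which actually belongs to r9.
attempt = dfB.copy()
mask = attempt['race'].isin(dfA['race'])
attempt.loc[mask, ['qualifier', 'participant']] = dfA[['qualifier', 'participant']]

# Using race as the index on both sides makes the labels line up by race.
fixed = dfB.set_index('race').fillna(dfA.set_index('race')).reset_index()
print(attempt)
print(fixed)
```

With race as the index, each row of dfB is matched to the dfA row for the same race, which is why the fillna call above gives the desired r9 row.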
Or use update:
dfB = dfB.set_index('race')
dfA = dfA.set_index('race')
dfB.update(dfA)
print(dfB.reset_index())
  race qualifier participant
0   r1      last         rat
1   r2       NaN         rat
2   r3     first         rat
3   r4     first         cat
4   r5     first         cat
5   r6      last         rat
6   r7      last         dog
7   r8     first         dog
8   r9     first         dog
9  r10       NaN         NaN
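The two approaches give the same result here because dfA and dfB agree wherever both have a value. They are not interchangeable in general: fillna only fills missing cells and returns a new frame, while update works in place and by default also overwrites existing values. A small sketch of the difference, using hypothetical frames named target and source:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the frames disagree on r1, and r2 is missing in `target`.
target = pd.DataFrame({'participant': ['cat', np.nan]}, index=['r1', 'r2'])
source = pd.DataFrame({'participant': ['dog', 'rat']}, index=['r1', 'r2'])

# fillna returns a new frame and keeps the existing 'cat'.
filled = target.fillna(source)

# update modifies the frame in place, returns None, and replaces 'cat' with 'dog'.
updated = target.copy()
result = updated.update(source)

# overwrite=False restricts update to cells that are NaN in the original.
gentle = target.copy()
gentle.update(source, overwrite=False)

print(filled['participant'].tolist())   # ['cat', 'rat']
print(updated['participant'].tolist())  # ['dog', 'rat']
print(gentle['participant'].tolist())   # ['cat', 'rat']
```

If dfA should only fill gaps in dfB, fillna or update(..., overwrite=False) is the safer choice; if dfA is the authoritative source, plain update fits better.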
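Answer 1 (score: 1)
I would do something like this in several steps.
First, I would merge the two DataFrames and fill in the participant column: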
# Left-join dfA onto dfB; overlapping columns get the _x (dfB) and _y (dfA) suffixes.
dfB_PreProcessing = dfB.merge(dfA, on='race', how='left')
# Mark missing dfB values with '' and fall back to dfA's value in those rows.
dfB_PreProcessing['participant_x'] = dfB_PreProcessing['participant_x'].fillna('')
dfB_PreProcessing['participant'] = np.where(dfB_PreProcessing['participant_x'] == '', dfB_PreProcessing['participant_y'], dfB_PreProcessing['participant_x'])
Then clean up the qualifier column (if needed):
dfB_PreProcessing['qualifier_x'] = dfB_PreProcessing['qualifier_x'].fillna('')
dfB_PreProcessing['qualifier'] = np.where(dfB_PreProcessing['qualifier_x'] == '', dfB_PreProcessing['qualifier_y'], dfB_PreProcessing['qualifier_x'])
Then select only the columns needed for the output df:
dfB = dfB_PreProcessing.loc[:,['race','qualifier','participant']]
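The same merge-based idea can be written more compactly by skipping the empty-string placeholder and filling each column from its merged counterpart. This is only a sketch under assumptions: the '_A' suffix and the cut-down frames are made up for illustration.

```python
import numpy as np
import pandas as pd

# Cut-down stand-ins for the question's frames (illustrative values only).
dfA = pd.DataFrame({'race': ['r1', 'r3', 'r9'],
                    'qualifier': ['last', 'first', 'first'],
                    'participant': ['rat', 'rat', 'dog']})
dfB = pd.DataFrame({'race': ['r1', 'r2', 'r3', 'r9'],
                    'qualifier': ['last', np.nan, np.nan, 'first'],
                    'participant': ['rat', 'rat', np.nan, np.nan]})

# Keep dfB's column names and tag dfA's columns with a hypothetical '_A' suffix.
merged = dfB.merge(dfA, on='race', how='left', suffixes=('', '_A'))

# Fill gaps in each dfB column from the matching dfA column of the same row.
for col in ['qualifier', 'participant']:
    merged[col] = merged[col].fillna(merged[col + '_A'])

result = merged[['race', 'qualifier', 'participant']]
print(result)
```

Races that exist only in dfB (r2 here) keep their NaN values, because the left join has nothing to fill them from.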
Let me know if it works.
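Answer 2 (score: 0)
Correct me if I have misunderstood. If you want to update the rows of one or more columns, you can update the values of those columns at specific index labels. For example, to update all rows in column B: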
df = pd.DataFrame({'A':[1,2,3],'B': [4,5,6]})
df1 = pd.DataFrame({'B':[7,8,9]})
df.update(df1)
pprint(df)
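Note that update matches rows by index label, not by position, and skips NaN values in the other frame. A short sketch of a partial update, where patch is a hypothetical frame covering only some index labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4.0, 5.0, 6.0]})

# Hypothetical partial update: labels 2 and 0 only, and the value for 0 is NaN.
patch = pd.DataFrame({'B': [70.0, np.nan]}, index=[2, 0])

# Only label 2 changes: label 1 is absent from `patch`, and the NaN for
# label 0 is ignored rather than written into df.
df.update(patch)
print(df['B'].tolist())  # [4.0, 5.0, 70.0]
```

This is why the question's frames need a shared index such as race before update can pair up the right rows.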