Replace rows in one DataFrame with rows from another

Posted: 2019-06-28 10:45:19

Tags: python python-3.x pandas dataframe

I am trying to replace row values in one DataFrame with the values from another.

Here is the sample code:

import pandas as pd
import numpy as np
from pprint import pprint

raceA = ['r1','r3','r4','r5','r6','r7','r8', 'r9']
qualifierA = ['last','first','first','first','last','last','first','first']
participantA = ['rat','rat','cat','cat','rat','dog','dog','dog']
dfA = pd.DataFrame({
    'race': raceA,
    'qualifier': qualifierA,
    'participant': participantA,
})
pprint(dfA)

raceB = ['r1','r2','r3','r4','r5','r6','r7','r8', 'r9','r10']
qualifierB = ['last',np.nan,np.nan,'first','first','last','last','first','first',np.nan]
participantB = ['rat','rat',np.nan,'cat','cat','rat','dog','dog',np.nan,np.nan]
dfB = pd.DataFrame({
    'race': raceB,
    'qualifier': qualifierB,
    'participant': participantB,
})
pprint(dfB)
dfB.loc[dfB.race.isin(dfA.race), ['qualifier','participant']] = dfA[['qualifier','participant']]
pprint(dfB)

For example, in dfA:

r9     first         dog

dfB contains:

r9     first         NaN

Desired output (dfB):

r9     first         dog

Actual output:

r9       NaN         NaN

Could someone take a look?
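A minimal sketch of where the NaN in r9 comes from, using the question's data: when the right-hand side of a .loc assignment is a DataFrame, pandas aligns it on index labels and column names, not on position and not on race. The reindex call below only approximates that alignment for illustration; it is not the code path pandas uses internally.

```python
import numpy as np
import pandas as pd

# Same data as in the question
dfA = pd.DataFrame({
    'race': ['r1', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9'],
    'qualifier': ['last', 'first', 'first', 'first', 'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10'],
    'qualifier': ['last', np.nan, np.nan, 'first', 'first', 'last', 'last', 'first', 'first', np.nan],
    'participant': ['rat', 'rat', np.nan, 'cat', 'cat', 'rat', 'dog', 'dog', np.nan, np.nan],
})

cols = ['qualifier', 'participant']
mask = dfB.race.isin(dfA.race)

# Index labels of the dfB rows being assigned to: [0, 2, 3, 4, 5, 6, 7, 8]
target_labels = dfB.index[mask]
# dfA has its own default index 0..7, which has nothing to do with race
print(dfA.index.tolist())

# Approximate the label alignment: dfA's row 2 (r4) lands on dfB's row 2 (r3),
# and dfB's row 8 (r9) has no partner in dfA, so it becomes NaN
aligned = dfA[cols].reindex(target_labels)
print(aligned)
```

This is why r9 ends up as NaN NaN, and why r3 silently receives r4's values.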

3 answers:

Answer 0 (score: 2)

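Use DataFrame.fillna with a DataFrame as the fill value. Once race is the index of both frames, the values are aligned by race, so only the NaN cells in dfB are filled from dfA: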

df = dfB.set_index('race').fillna(dfA.set_index('race')).reset_index()

print(df)
  race qualifier participant
0   r1      last         rat
1   r2       NaN         rat
2   r3     first         rat
3   r4     first         cat
4   r5     first         cat
5   r6      last         rat
6   r7      last         dog
7   r8     first         dog
8   r9     first         dog
9  r10       NaN         NaN
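If you prefer to keep the .loc assignment from the question, a minimal sketch is to look up the replacement rows in dfA by race and strip their labels with .to_numpy(), so the assignment becomes positional instead of label-aligned. Unlike fillna, this overwrites the matched rows with dfA's values even where dfB already has a value; with the question's data the result is the same as above.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame({
    'race': ['r1', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9'],
    'qualifier': ['last', 'first', 'first', 'first', 'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10'],
    'qualifier': ['last', np.nan, np.nan, 'first', 'first', 'last', 'last', 'first', 'first', np.nan],
    'participant': ['rat', 'rat', np.nan, 'cat', 'cat', 'rat', 'dog', 'dog', np.nan, np.nan],
})

cols = ['qualifier', 'participant']
mask = dfB['race'].isin(dfA['race'])

# Fetch dfA's rows keyed by race, in the same order as the matched dfB rows
replacement = dfA.set_index('race').loc[dfB.loc[mask, 'race'].tolist(), cols]

# .to_numpy() drops the index, so values are written by position, not by label
dfB.loc[mask, cols] = replacement.to_numpy()
print(dfB)
```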

Or use DataFrame.update, which modifies dfB in place using the non-NA values from dfA:

dfB = dfB.set_index('race')
dfA = dfA.set_index('race')

dfB.update(dfA)

print(dfB.reset_index())
  race qualifier participant
0   r1      last         rat
1   r2       NaN         rat
2   r3     first         rat
3   r4     first         cat
4   r5     first         cat
5   r6      last         rat
6   r7      last         dog
7   r8     first         dog
8   r9     first         dog
9  r10       NaN         NaN
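fillna and update give the same result here only because dfA never disagrees with a non-missing value in dfB. A minimal sketch on made-up toy frames (left and right are hypothetical, not from the question) showing the difference: fillna returns a new frame and only fills cells that are NaN in the caller, while update writes every non-NA value from the other frame in place and returns None.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: r1 has conflicting values, r2 is missing on the left
left = pd.DataFrame({'participant': ['rat', np.nan]}, index=['r1', 'r2'])
right = pd.DataFrame({'participant': ['cat', 'dog']}, index=['r1', 'r2'])

# fillna: new frame; only the NaN in r2 is filled, r1 keeps 'rat'
filled = left.fillna(right)

# update: modifies the copy in place; r1 is overwritten with 'cat' as well
updated = left.copy()
result = updated.update(right)

print(filled)
print(updated)
```

Which one to choose depends on whether dfA should win when both frames already have a value.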

Answer 1 (score: 1)

I would do something like this in several steps.

First, merge the two DataFrames:

dfB_PreProcessing = dfB.merge(dfA,left_on='race',right_on='race',how="left")
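To see what this merge produces, a small sketch with the question's data. Here on='race' is equivalent to left_on='race', right_on='race', and indicator=True (not used in the answer) adds a _merge column recording which dfB rows found a match in dfA. The overlapping columns receive pandas' default _x (from dfB) and _y (from dfA) suffixes, which the following steps rely on.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame({
    'race': ['r1', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9'],
    'qualifier': ['last', 'first', 'first', 'first', 'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10'],
    'qualifier': ['last', np.nan, np.nan, 'first', 'first', 'last', 'last', 'first', 'first', np.nan],
    'participant': ['rat', 'rat', np.nan, 'cat', 'cat', 'rat', 'dog', 'dog', np.nan, np.nan],
})

# Left join keeps every dfB row, in dfB's order
merged = dfB.merge(dfA, on='race', how='left', indicator=True)
print(merged.columns.tolist())

# r2 and r10 have no row in dfA, so they are marked 'left_only'
print(merged[['race', '_merge']])
```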

Then clean up the participant column:

# Turn NaN into '' so missing values are easy to test for
dfB_PreProcessing['participant_x'] = dfB_PreProcessing['participant_x'].fillna('')
# Keep dfB's value where present, otherwise take dfA's
dfB_PreProcessing['participant'] = np.where(dfB_PreProcessing['participant_x'] == '', dfB_PreProcessing['participant_y'], dfB_PreProcessing['participant_x'])

Then clean up the qualifier column (if needed):

# Same treatment for the qualifier column
dfB_PreProcessing['qualifier_x'] = dfB_PreProcessing['qualifier_x'].fillna('')
dfB_PreProcessing['qualifier'] = np.where(dfB_PreProcessing['qualifier_x'] == '', dfB_PreProcessing['qualifier_y'], dfB_PreProcessing['qualifier_x'])

Finally, keep only the required columns as the output DataFrame:

dfB = dfB_PreProcessing.loc[:,['race','qualifier','participant']]


Let me know if it works.
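The cleanup steps above amount to "take dfB's value, and fall back to dfA's where it is missing". A condensed sketch of the same idea using Series.fillna with another Series; note one difference in behavior: the answer's version also treats empty strings already present in dfB as missing, while this version only treats NaN as missing.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame({
    'race': ['r1', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9'],
    'qualifier': ['last', 'first', 'first', 'first', 'last', 'last', 'first', 'first'],
    'participant': ['rat', 'rat', 'cat', 'cat', 'rat', 'dog', 'dog', 'dog'],
})
dfB = pd.DataFrame({
    'race': ['r1', 'r2', 'r3', 'r4', 'r5', 'r6', 'r7', 'r8', 'r9', 'r10'],
    'qualifier': ['last', np.nan, np.nan, 'first', 'first', 'last', 'last', 'first', 'first', np.nan],
    'participant': ['rat', 'rat', np.nan, 'cat', 'cat', 'rat', 'dog', 'dog', np.nan, np.nan],
})

merged = dfB.merge(dfA, on='race', how='left')
for col in ['qualifier', 'participant']:
    # Keep dfB's value (_x); where it is NaN, fall back to dfA's (_y)
    merged[col] = merged[f'{col}_x'].fillna(merged[f'{col}_y'])

result = merged[['race', 'qualifier', 'participant']]
print(result)
```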

Answer 2 (score: 0)

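Correct me if I have misunderstood. If you want to update the rows of one or more columns, you can update that column's values at specific index labels. For example, to update all the rows in column B: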

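import pandas as pd
from pprint import pprint

df = pd.DataFrame({'A':[1,2,3],'B': [4,5,6]})
df1 = pd.DataFrame({'B':[7,8,9]})
df.update(df1)  # in place: column B becomes 7, 8, 9; column A is unchanged
pprint(df)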
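Because update aligns on index labels and column names, it can also patch only some rows. A small sketch in which the patch frame and its values are made up for illustration: only labels 0 and 2 of column B change, while column A and row 1 are left alone.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical patch covering only index labels 0 and 2 of column B
patch = pd.DataFrame({'B': [70, 90]}, index=[0, 2])

# In place; labels and columns that are absent from patch are untouched
df.update(patch)
print(df)
```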