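Pandas DataFrame: replace multiple rows based on values in another column

Time: 2018-07-30 05:50:11

Tags: python pandas dataframe merge

I am trying to replace some values in one column of a dataframe with values from a column of another dataframe. This is what the dataframes look like. df2 has many rows and columns.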

df1

    0                   1029
0   aaaaa               Green
1   bbbbb               Green
2   fffff               Blue
3   xxxxx               Blue
4   zzzzz               Green

df2
    0       1   2     3  ....    1029
0   aaaaa   1   NaN   14         NaN
1   bbbbb   1   NaN   14         NaN
2   ccccc   1   NaN   14         Blue
3   ddddd   1   NaN   14         Blue
...    
25  yyyyy   1   NaN   14         Blue
26  zzzzz   1   NaN   14         Blue

The final df should look like this:

    0       1   2     3  ....    1029
0   aaaaa   1   NaN   14         Green 
1   bbbbb   1   NaN   14         Green
2   ccccc   1   NaN   14         Blue
3   ddddd   1   NaN   14         Blue
...    
25  yyyyy   1   NaN   14         Blue
26  zzzzz   1   NaN   14         Green

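So basically, df1[0] and df2[0] have to match, and then df2[1029] needs its values replaced by the value from the corresponding row of df1[1029] for the rows that matched. I don't want to lose any values in df2['1029'] that are not in df1['1029'].

I believe the re module in Python can do this? This is what I have so far: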

import re
for line in replace:
    line = re.sub(df1['1029'],
                  '1029',
                  line.rstrip())

print(line)

But this definitely doesn't work.

I could also use merge, as in merged1 = df1.merge(df2, left_index=True, right_index=True, how='inner'), but that does not replace the values in place.

2 Answers:

Answer 0: (score: 1)

You need:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'0':['aaaaa','bbbbb','fffff','xxxxx','zzzzz'], '1029':['Green','Green','Blue','Blue','Green']})

df2 = pd.DataFrame({'0':['aaaaa','bbbbb','ccccc','ddddd','yyyyy','zzzzz'], '1029':[None,None,'Blue','Blue','Blue','Blue']})

# Merge on the key column; '1029_x' comes from df2, '1029_y' from df1
df_ = df2.merge(df1, how='left', on=['0'])

# Take df1's value where the key matched, otherwise keep df2's value
df_['1029'] = np.where(df_['1029_y'].isna(), df_['1029_x'], df_['1029_y'])

# Drop the two helper columns created by the merge
df_ = df_.drop(columns=['1029_y','1029_x'])
print(df_)

Output:

       0   1029
0  aaaaa  Green
1  bbbbb  Green
2  ccccc   Blue
3  ddddd   Blue
4  yyyyy   Blue
5  zzzzz  Green
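
As an alternative to merging, a key lookup with Series.map can give the same kind of result in fewer steps. This is only a sketch: it assumes the keys in df1['0'] are unique, and the names lookup and result are illustrative rather than taken from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'0': ['aaaaa', 'bbbbb', 'fffff', 'xxxxx', 'zzzzz'],
                    '1029': ['Green', 'Green', 'Blue', 'Blue', 'Green']})
df2 = pd.DataFrame({'0': ['aaaaa', 'bbbbb', 'ccccc', 'ddddd', 'yyyyy', 'zzzzz'],
                    '1029': [None, None, 'Blue', 'Blue', 'Blue', 'Blue']})

# Lookup table: key -> replacement value (assumes keys in df1['0'] are unique)
lookup = df1.set_index('0')['1029']

# map() yields NaN for keys absent from df1; fillna() keeps df2's own value there
result = df2.copy()
result['1029'] = result['0'].map(lookup).fillna(result['1029'])
print(result)
```

Because the lookup is by key, the row order of the two dataframes does not matter here.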

Answer 1: (score: -1)

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'0':['aa','bb','ff','xx', 'zz'], '1029':['Green', 'Green', 'Blue', 'Blue', 'Green']})
df2 = pd.DataFrame({'0':['aa','bb','cc','dd','ff','gg','hh','xx','yy', 'zz'], '1': [1]*10, '2': [np.nan]*10, '1029':[np.nan, np.nan, 'Blue', 'Blue', np.nan, np.nan, 'Blue', 'Green', 'Blue', 'Blue']})
df1
    0   1029
0  aa  Green
1  bb  Green
2  ff   Blue
3  xx   Blue
4  zz  Green

df2
    0  1   1029   2
0  aa  1    NaN NaN
1  bb  1    NaN NaN
2  cc  1   Blue NaN
3  dd  1   Blue NaN
4  ff  1    NaN NaN
5  gg  1    NaN NaN
6  hh  1   Blue NaN
7  xx  1  Green NaN
8  yy  1   Blue NaN
9  zz  1   Blue NaN

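The assignment below fills the missing values in df2['1029'] for keys that also appear in df1. Looking each value up by key, instead of assigning a list positionally, means the result does not depend on column '0' being sorted the same way in both dataframes. Rows of df2 that already hold a value are left unchanged.

mask = df2['1029'].isna() & df2['0'].isin(df1['0'])
df2.loc[mask, '1029'] = df2.loc[mask, '0'].map(df1.set_index('0')['1029'])

df2
    0  1   1029   2
0  aa  1  Green NaN
1  bb  1  Green NaN
2  cc  1   Blue NaN
3  dd  1   Blue NaN
4  ff  1   Blue NaN
5  gg  1    NaN NaN
6  hh  1   Blue NaN
7  xx  1  Green NaN
8  yy  1   Blue NaN
9  zz  1   Blue NaN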
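
If the goal is to overwrite every matching row, including rows of df2 that already hold a value (such as zz), one option is to index both frames by the key column and call DataFrame.update. A minimal sketch, assuming the keys in df1['0'] are unique; the names left and result are made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'0': ['aa', 'bb', 'ff', 'xx', 'zz'],
                    '1029': ['Green', 'Green', 'Blue', 'Blue', 'Green']})
df2 = pd.DataFrame({'0': ['aa', 'bb', 'cc', 'dd', 'ff', 'gg', 'hh', 'xx', 'yy', 'zz'],
                    '1': [1] * 10,
                    '2': [np.nan] * 10,
                    '1029': [np.nan, np.nan, 'Blue', 'Blue', np.nan, np.nan,
                             'Blue', 'Green', 'Blue', 'Blue']})

# Index both frames by the key so update() aligns on keys, not row positions
left = df2.set_index('0')

# update() overwrites in place with the non-missing values from the other frame;
# keys that exist only in df2 (cc, dd, gg, hh, yy) are left untouched
left.update(df1.set_index('0')[['1029']])

result = left.reset_index()
print(result)
```

With this data, xx changes from Green to Blue and zz from Blue to Green, while gg stays NaN because it has no counterpart in df1.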