Python - Check whether a value is in a column and replace it with a value from a different column

Time: 2018-02-15 15:50:46

Tags: python pandas

I have two DataFrames:

DF:

 index   some_variable identifier1  identifier2 
  1        x             AB2          AB3
  2        x             BB2          BB3
  3        x             CB2          CB3
  4        y             DB2          DB3
  5        y             EB2          EB3

DFA:

 index   some_variable identifier1  identifier2 identifier3
  1        x             AB5          AB3          AB3
  2        x             BB5          BB2          AB2
  3        x             CB5          CB2          AB5
  4        y             DB5          DB3          AB3
  5        y             EB5          EB3          AB3

If an element of df['identifier1'] is in dfa['identifier2'] and some_variable equals 'x', then df['identifier2'] should be replaced with the value of dfa['identifier3'] at that index. So the condition is:

[(df['identifier1'].isin(dfa['identifier2'])) & (df['some_variable'] == 'x')]
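
For reference, the condition can be checked on the sample data with a short sketch. The frames here are rebuilt by hand from the tables above (without the index column), and the name `mask` is only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (default 0..4 index assumed)
df = pd.DataFrame({
    'some_variable': ['x', 'x', 'x', 'y', 'y'],
    'identifier1': ['AB2', 'BB2', 'CB2', 'DB2', 'EB2'],
    'identifier2': ['AB3', 'BB3', 'CB3', 'DB3', 'EB3'],
})
dfa = pd.DataFrame({
    'some_variable': ['x', 'x', 'x', 'y', 'y'],
    'identifier1': ['AB5', 'BB5', 'CB5', 'DB5', 'EB5'],
    'identifier2': ['AB3', 'BB2', 'CB2', 'DB3', 'EB3'],
    'identifier3': ['AB3', 'AB2', 'AB5', 'AB3', 'AB3'],
})

# True where identifier1 occurs anywhere in dfa['identifier2'] AND some_variable is 'x'
mask = df['identifier1'].isin(dfa['identifier2']) & (df['some_variable'] == 'x')
print(mask.tolist())  # [False, True, True, False, False]
```

Only the second and third rows satisfy both parts of the condition, which matches the rows that change in the expected output.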

I expect:

 index   some_variable identifier1  identifier2 
  1        x             AB2          AB3
  2        x             BB2          AB2
  3        x             CB2          AB5
  4        y             DB2          DB3
  5        y             EB2          EB3

I can set up the condition, but I can't figure out how to get the output.

2 answers:

Answer 0: (score: 1)

I think this is what you are trying to do:

df1

#    index some_variable identifier1 identifier2
# 0      1             x         AB2         AB3
# 1      2             x         BB2         BB3
# 2      3             x         CB2         CB3
# 3      4             y         DB2         DB3
# 4      5             y         EB2         EB3

df2

#    index some_variable identifier1 identifier2 identifier3
# 0      1             x         AB5         AB3         AB3
# 1      2             x         BB5         BB2         AB2
# 2      3             x         CB5         CB2         AB5
# 3      4             y         DB5         DB3         AB3
# 4      5             y         EB5         EB3         AB3

idx = df1['identifier1'].isin(df2['identifier2']) & (df1['some_variable'] == 'x')
df1.loc[idx, 'identifier2'] = df2['identifier3']

df1

#    index some_variable identifier1 identifier2
# 0      1             x         AB2         AB3
# 1      2             x         BB2         AB2
# 2      3             x         CB2         AB5
# 3      4             y         DB2         DB3
# 4      5             y         EB2         EB3
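
The assignment above relies on pandas aligning the right-hand Series to the selected rows by index label, so it assumes `df1` and `df2` share the same index. Below is a minimal sketch with the frames rebuilt from the tables shown (the `index` column is omitted). It uses `Series.mask` as a non-mutating alternative; this variant and the name `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'some_variable': ['x', 'x', 'x', 'y', 'y'],
    'identifier1': ['AB2', 'BB2', 'CB2', 'DB2', 'EB2'],
    'identifier2': ['AB3', 'BB3', 'CB3', 'DB3', 'EB3'],
})
df2 = pd.DataFrame({
    'some_variable': ['x', 'x', 'x', 'y', 'y'],
    'identifier1': ['AB5', 'BB5', 'CB5', 'DB5', 'EB5'],
    'identifier2': ['AB3', 'BB2', 'CB2', 'DB3', 'EB3'],
    'identifier3': ['AB3', 'AB2', 'AB5', 'AB3', 'AB3'],
})

idx = df1['identifier1'].isin(df2['identifier2']) & (df1['some_variable'] == 'x')

# Series.mask swaps in the index-aligned values from df2 wherever idx is True;
# assign returns a new frame, so df1 itself is left unchanged
result = df1.assign(identifier2=df1['identifier2'].mask(idx, df2['identifier3']))
print(result['identifier2'].tolist())  # ['AB3', 'AB2', 'AB5', 'DB3', 'EB3']
```

This form may be preferable when the original frame should stay untouched, for example inside a method chain.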

Answer 1: (score: 0)

Something like the following (though there may be a simpler way):

import pandas as pd

d1 = {'some_variable':['x','x','x','y','y'], 'identifier1':['AB2','BB2','CB2','DB2','EB2'], 'identifier2':['AB3','BB3','CB3','DB3','EB3']}
df = pd.DataFrame(d1)

d2 = {'some_variable':['x','x','x','y','y'], 'identifier1':['AB5','BB5','CB5','DB5','EB5'], 'identifier2':['AB3','BB2','CB2','DB3','EB3'], 'identifier3':['AB3','AB2','AB5','AB3','AB3']}
dfa = pd.DataFrame(d2)

# Build the condition once, then assign through .loc to avoid chained indexing
mask = df['identifier1'].isin(dfa['identifier2']) & (df['some_variable'] == 'x')
df.loc[mask, 'identifier2'] = dfa.loc[mask, 'identifier3']
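
Both answers take the replacement from the row of dfa that has the same index label as the matching row in df. If the intent is instead to take identifier3 from whichever dfa row holds the matching identifier2 value, a value-based lookup is one option. This is a sketch under that assumption, not something stated in the question. The names `lookup` and `mask` are illustrative, and dfa is deliberately reversed to show that row order no longer matters.

```python
import pandas as pd

df = pd.DataFrame({
    'some_variable': ['x', 'x', 'x', 'y', 'y'],
    'identifier1': ['AB2', 'BB2', 'CB2', 'DB2', 'EB2'],
    'identifier2': ['AB3', 'BB3', 'CB3', 'DB3', 'EB3'],
})
dfa = pd.DataFrame({
    'identifier2': ['AB3', 'BB2', 'CB2', 'DB3', 'EB3'],
    'identifier3': ['AB3', 'AB2', 'AB5', 'AB3', 'AB3'],
})

# Reverse the row order so dfa no longer lines up with df by index
dfa = dfa.iloc[::-1].reset_index(drop=True)

# Map each identifier2 value to its identifier3; keep the first match on duplicates
lookup = dfa.drop_duplicates(subset='identifier2').set_index('identifier2')['identifier3']

mask = df['identifier1'].isin(lookup.index) & (df['some_variable'] == 'x')
df.loc[mask, 'identifier2'] = df.loc[mask, 'identifier1'].map(lookup)
print(df['identifier2'].tolist())  # ['AB3', 'AB2', 'AB5', 'DB3', 'EB3']
```

On the sample data this gives the same result as the index-aligned versions, because each matching value happens to sit in the same row in both frames.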