熊猫-通过在另一个数据框中查找来替换值

时间:2018-12-17 15:39:26

标签: python pandas dataframe

我在使用Python3的pandas数据框中要解决一个问题。我有两个数据框-第一个是as;

    ID Name  Linked Model 1  Linked Model 2  Linked Model 3
0  100    A          1111.0          1112.0             NaN
1  101    B          1112.0          1113.0          1115.0
2  102    C             NaN             NaN             NaN
3  103    D          1114.0             NaN             NaN
4  104    E          1114.0          1111.0          1112.0

第二个是

   Model ID Name
0      1111    A
1      1112  A,B
2      1113    C
3      1114    D
4      1115    Q
5      1116    Z
6      1117    E
7      1118    W

因此,代码应在-例如,在 Linked Model 1 列中查找值,并在第二个数据框中的 Name 列中找到相应的值,以便ID可以像结果中所示那样用名称替换;

enter image description here

因此,如您在结果输出中看到的,None保持为None(可以替换为numpy N / As),并且第二个数据框中的名称现在被替换为它们对应的 Model ID 第一个数据帧。

我期待听到您的解决方案!

谢谢

2 个答案:

答案 0 :(得分:3)

初始化替换词典,然后使用df.replace将这些ID映射到Names。

m = df2.set_index('Model ID')['Name'].to_dict()
v = df.filter(like='Linked Model')
df[v.columns] = v.replace(m)

df
    ID Name Linked Model 1 Linked Model 2 Linked Model 3
0  100    A              A            A,B            NaN
1  101    B            A,B              C              Q
2  102    C            NaN            NaN            NaN
3  103    D              D            NaN            NaN
4  104    E              D              A            A,B

答案 1 :(得分:0)

首次尝试回答python问题,因此虽然这肯定比Coldspeed的回答更长,但使用meltmergepivot功能对我来说更有意义。 / p>

import pandas as pd
import numpy as np

# make an object from the first dataset

df_1 = pd.DataFrame(
  {"ID" : [100, 101, 102, 103, 104],
  "Name" : ["A", "B", "C", "D", "E"],
  "Linked Model 1" : [1111, 1112, np.nan, 1114, 1114],
  "Linked Model 2" : [1112, 1113, np.nan, np.nan, 1111],
  "Linked Model 3" : [np.nan, 1115, np.nan, np.nan, 1112]})

# make an object for the second data set

df_2 = pd.DataFrame(
  {"Model ID" : [1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118],
  "Name" : ["A", "A,B", "C", "D", "Q", "Z", "E", "W"]})

# tidy the data
df_1 = pd.melt(df_1, ["ID", "Name"]) 

# left join the second data set
df_1 = pd.merge(df_1, df_2, how='left', left_on='value', right_on='Model ID').reset_index()

#pivot the data back out to achieve the desired format
df_1 = df_1.pivot_table(index='ID', 
                        columns='variable', 
                        values='Name_y', 
                        aggfunc='first', 
                        dropna=False))

variable Linked Model 1 Linked Model 2 Linked Model 3
ID                                                   
100                   A            A,B            NaN
101                 A,B              C              Q
102                 NaN            NaN            NaN
103                   D            NaN            NaN
104                   D              A            A,B