Question

How do I merge one dataframe with another lookup dataframe?

This is dataframe A, whose values I want to replace:

  InfoType  IncidentType    DangerType
0   NaN          A             NaN
1   NaN          C             NaN
2   NaN          B            C
3   NaN          B            NaN

This is the lookup table:

    ID  ParamCode   ParamValue  ParmDesc1   ParamDesc2  SortOrder   ParamStatus
0   1   IncidentType    A       ABC            DEF          1            1
1   2   IncidentType    B       GHI            JKL          2            1
2   3   IncidentType    C       MNO            PQR          7            1
2   3   DangerType      C       STU            VWX          6            1

Expected output:

  InfoType  IncidentType    DangerType
0   NaN          ABC           NaN
1   NaN          MNO           NaN
2   NaN          GHI           STU
3   NaN          GHI           NaN

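Note that ParamCode holds the column names, and I need to replace the values in the corresponding columns of dataframe A with ParmDesc1. Any column in dataframe A may contain NaN, and I don't intend to drop those values; they should simply be ignored.

This is what I have done: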

ntf_cols = ['InfoType','IncidentType','DangerType']
for c in ntf_cols:
    if (c in ntf.columns) & (c in param['ParamCode'].values):
        paramValue = param['ParamValue'].unique()
        for idx, pv in enumerate(paramValue):
            ntf['NewIncidentType'] = pd.np.where(ntf.IncidentType.str.contains(pv), param['ParmDesc1'].values, "whatever")

Error:

ValueError: operands could not be broadcast together with shapes (25,) (13,) ()

Answer 1

Edit: Lambda's answer gave me an idea for how you could apply this logic pattern to many columns:

import pandas as pd

df1 = pd.DataFrame(dict(
    InfoType = [None, None, None, None],
    IncidentType = 'A C B B'.split(),
    DangerType = [None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode = 'IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue  = 'A B C C'.split(),
    ParmDesc1 = 'ABC GHI MNO STU'.split(),
))


for col in df1.columns[1:]:
    dict_map = dict(
        df2[df2.ParamCode == col][['ParamValue','ParmDesc1']].to_records(index=False)
    )
    df1[col] = df1[col].replace(dict_map)

print(df1)

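This assumes that every column after the first column of df1 is one that needs updating, and that the names of the columns to update exist as values in the 'ParamCode' column of df2.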
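If the "every column after the first" assumption is too strong, the same loop can be restricted to columns whose names actually occur in ParamCode. The following is only a minimal sketch of that idea; the helper name replace_from_lookup is hypothetical and not part of pandas or of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(dict(
    InfoType=[None, None, None, None],
    IncidentType='A C B B'.split(),
    DangerType=[None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode='IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue='A B C C'.split(),
    ParmDesc1='ABC GHI MNO STU'.split(),
))


def replace_from_lookup(df, lookup):
    """Hypothetical helper: return a copy of df with codes replaced by ParmDesc1."""
    out = df.copy()
    known_codes = set(lookup['ParamCode'])
    for col in out.columns:
        # Skip columns that the lookup table says nothing about (e.g. InfoType).
        if col not in known_codes:
            continue
        rows = lookup[lookup['ParamCode'] == col]
        mapping = dict(zip(rows['ParamValue'], rows['ParmDesc1']))
        # replace() leaves missing values and unmapped codes untouched.
        out[col] = out[col].replace(mapping)
    return out


result = replace_from_lookup(df1, df2)
print(result)
```

Working on a copy keeps df1 unchanged, which makes it easier to compare the result against the original values.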

This problem can be solved with some custom functions and pandas.Series.apply():

import pandas as pd

def find_incident_type(x):
    # Pass missing values through unchanged
    if pd.isna(x):
        return x
    return df2[
        (df2['ParamCode'] == 'IncidentType') & (df2['ParamValue']==x)
    ]["ParmDesc1"].values[0]

def find_danger_type(x):
    # Pass missing values through unchanged
    if pd.isna(x):
        return x
    return df2[
        (df2['ParamCode'] == 'DangerType') & (df2['ParamValue']==x)
    ]["ParmDesc1"].values[0]

df1 = pd.DataFrame(dict(
    InfoType = [None, None, None, None],
    IncidentType = 'A C B B'.split(),
    DangerType = [None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode = 'IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue = 'A B C C'.split(),
    ParmDesc1 = 'ABC GHI MNO STU'.split(),
))

df1['IncidentType'] = df1['IncidentType'].apply(find_incident_type)
df1['DangerType'] = df1['DangerType'].apply(find_danger_type)

print(df1)

There is very likely a more efficient way to do this. Hopefully someone who knows it can share it.

Also, the reference to df2 in the outer scope is hard-coded into the custom functions, so they only work with that variable name in the outer scope. If you don't want the functions to depend on that reference, you need to pass the lookup table in through the args parameter of pandas.Series.apply().
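One way to drop the dependence on the outer-scope df2 is to pass the lookup table and the parameter code through the args parameter of pandas.Series.apply(). This is a rough sketch rather than the answerer's own code; the function name find_desc and its parameters lookup and code are made up for illustration.

```python
import pandas as pd


def find_desc(x, lookup, code):
    """Hypothetical helper: return ParmDesc1 for value x under ParamCode == code."""
    # Pass missing values through unchanged.
    if pd.isna(x):
        return x
    match = lookup.loc[
        (lookup['ParamCode'] == code) & (lookup['ParamValue'] == x),
        'ParmDesc1',
    ]
    # Keep the original value when the lookup table has no entry for it.
    return match.iloc[0] if not match.empty else x


df1 = pd.DataFrame(dict(
    InfoType=[None, None, None, None],
    IncidentType='A C B B'.split(),
    DangerType=[None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode='IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue='A B C C'.split(),
    ParmDesc1='ABC GHI MNO STU'.split(),
))

for col in ['IncidentType', 'DangerType']:
    # Entries in args are passed to find_desc after the element itself.
    df1[col] = df1[col].apply(find_desc, args=(df2, col))

print(df1)
```

With the lookup table and code passed in explicitly, a single function covers both columns and no longer depends on any variable name in the enclosing scope.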

Answer 2

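Build a dict from the lookup table, then replace the column values of the original dataframe. Assume the original dataframe is df1 and the lookup table is df2.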

...
dict_map = dict(zip(df2.ParamCode + "-" + df2.ParamValue, df2.ParmDesc1))

df1['IncidentType'] = ("IncidentType" +'-'+ df1.IncidentType).replace(dict_map)
df1['DangerType'] = ("DangerType" +'-'+ df1.DangerType).replace(dict_map)
...
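For reference, here is a self-contained sketch of the same combined-key idea. It differs from the snippet above in one respect, which is an assumption on my part about the desired behaviour: it uses map() followed by fillna() so that a code with no entry in the lookup table keeps its original value instead of being left with the "IncidentType-" prefix. The value "Z" is invented purely to show that case.

```python
import pandas as pd

df1 = pd.DataFrame(dict(
    InfoType=[None, None, None, None],
    IncidentType=['A', 'C', 'B', 'Z'],  # 'Z' is a made-up code with no lookup entry
    DangerType=[None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode='IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue='A B C C'.split(),
    ParmDesc1='ABC GHI MNO STU'.split(),
))

# Keys look like 'IncidentType-A'; values are the descriptions.
dict_map = dict(zip(df2['ParamCode'] + '-' + df2['ParamValue'], df2['ParmDesc1']))

for col in ['IncidentType', 'DangerType']:
    # Missing values stay missing after the string concatenation.
    keys = col + '-' + df1[col]
    # map() yields NaN for unknown keys; fillna() restores the original value.
    df1[col] = keys.map(dict_map).fillna(df1[col])

print(df1)
```

A single dict keyed on both the column name and the code avoids filtering df2 once per column, at the cost of building the prefixed keys on each lookup.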

Replace values in one dataframe based on values from another dataframe