通过从其他数据框中查找替换列中的熊猫值

时间:2018-08-08 18:42:00

标签: python pandas

我有一个如下所示的参考DataFrame:

    Variables   Key Values  
0   GRTYPE      40  Total exclusions 4-year schools
1   GRTYPE      2   4-year institutions, Adjusted cohort
2   GRTYPE      3   4-year institutions, Completers 
41  CHRTSTAT    2   Revised cohort
42  CHRTSTAT    3   Exclusions
43  CHRTSTAT    4   Adjusted cohort 
57  SECTION     12  Bachelors/ equiv .
58  SECTION     23  Bachelors or equiv 2009 .

,我想使用参考数据框替换下面的主DataFrame中的值:

    GRTYPE      CHRTSTAT  SECTION
0   40             2    12      
1   2              3    12      
2   2              4    23      
3   3              2    12  
4   3              3    23  

最终结果将是:

    GRTYPE                                CHRTSTAT          SECTION
0   Total exclusions 4-year schools         Revised cohort       Bachelors/ equiv . 
1   4-year institutions, Adjusted cohort    Exclusions           Bachelors/ equiv .         
2   4-year institutions, Adjusted cohort    Adjusted cohort      Bachelors or equiv 2009 .      
3   4-year institutions, Completers         Revised cohort       Bachelors/ equiv . 
4   4-year institutions, Completers         Exclusions           Bachelors or equiv 2009 .  

在pandas或python中执行此操作的最佳方法是什么?我尝试从第一个数据帧中加入变量并从中提取变量,然后在第二个数据帧中循环,但没有得到任何结果。

2 个答案:

答案 0 :(得分:5)

使用map

您需要将VariablesKey设置为映射数据框的索引,然后只需在列上使用 map

mapping_df = mapping_df.set_index(['Variables', 'Key'])
df = df.apply(lambda x: x.map(mapping_df.loc[x.name]['Values']))

与以下相同:

mapping_df = mapping_df.set_index(['Variables', 'Key'])
df['GRTYPE'] = df.GRTYPE.map(mapping_df.loc['GRTYPE']['Values'])
df['CHRTSTAT'] = df.CHRTSTAT.map(mapping_df.loc['CHRTSTAT']['Values'])
df['SECTION'] = df.SECTION.map(mapping_df.loc['SECTION']['Values'])

输出:

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .

答案 1 :(得分:2)

使用defualtdict

from collections import defaultdict

d = defaultdict(dict)
for i, k, v in df1.itertuples(index=False):
    d[i][k] = v

pd.DataFrame(dict(zip(df2, [[d[i][k] for k in df2[i]] for i in df2])), df2.index)

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .

apply

df2.apply(
    lambda s: s.apply(
        lambda x, n: df1.set_index(['Variables', 'Key']).Values[(n, x)], n=s.name
    )
)

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .