
Question

I have a reference DataFrame that looks like this:

    Variables   Key Values  
0   GRTYPE      40  Total exclusions 4-year schools
1   GRTYPE      2   4-year institutions, Adjusted cohort
2   GRTYPE      3   4-year institutions, Completers 
41  CHRTSTAT    2   Revised cohort
42  CHRTSTAT    3   Exclusions
43  CHRTSTAT    4   Adjusted cohort 
57  SECTION     12  Bachelors/ equiv .
58  SECTION     23  Bachelors or equiv 2009 .

I want to use the reference DataFrame to replace the values in the main DataFrame below:

    GRTYPE      CHRTSTAT  SECTION
0   40             2    12      
1   2              3    12      
2   2              4    23      
3   3              2    12  
4   3              3    23

The end result would be:

    GRTYPE                                CHRTSTAT          SECTION
0   Total exclusions 4-year schools         Revised cohort       Bachelors/ equiv . 
1   4-year institutions, Adjusted cohort    Exclusions           Bachelors/ equiv .         
2   4-year institutions, Adjusted cohort    Adjusted cohort      Bachelors or equiv 2009 .      
3   4-year institutions, Completers         Revised cohort       Bachelors/ equiv . 
4   4-year institutions, Completers         Exclusions           Bachelors or equiv 2009 .

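What is the best way to do this in pandas or Python? I tried joining the variables from the first DataFrame, extracting them, and then looping over the second DataFrame, but I didn't get any results.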
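To try the answers below, the two frames from the question can be rebuilt as follows. This is a minimal sketch under a few assumptions: the names `mapping_df` and `df` are borrowed from Answer 1 (Answer 2 calls the same frames `df1` and `df2`), and the codes are assumed to be stored as integers rather than strings.

```python
import pandas as pd

# Reference (lookup) table: one row per (variable, code) pair.
# The index labels mirror the question's printout.
mapping_df = pd.DataFrame(
    {
        "Variables": ["GRTYPE", "GRTYPE", "GRTYPE",
                      "CHRTSTAT", "CHRTSTAT", "CHRTSTAT",
                      "SECTION", "SECTION"],
        "Key": [40, 2, 3, 2, 3, 4, 12, 23],
        "Values": [
            "Total exclusions 4-year schools",
            "4-year institutions, Adjusted cohort",
            "4-year institutions, Completers",
            "Revised cohort",
            "Exclusions",
            "Adjusted cohort",
            "Bachelors/ equiv .",
            "Bachelors or equiv 2009 .",
        ],
    },
    index=[0, 1, 2, 41, 42, 43, 57, 58],
)

# Main table: each column holds codes for the variable of the same name.
df = pd.DataFrame(
    {
        "GRTYPE": [40, 2, 2, 3, 3],
        "CHRTSTAT": [2, 3, 4, 2, 3],
        "SECTION": [12, 12, 23, 12, 23],
    }
)
```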

Answer 1

Using `map`

You need to set `Variables` and `Key` as the index of the mapping DataFrame, then simply call `map` on each column.

```python
mapping_df = mapping_df.set_index(['Variables', 'Key'])
df = df.apply(lambda x: x.map(mapping_df.loc[x.name]['Values']))
```

This is equivalent to:

```python
mapping_df = mapping_df.set_index(['Variables', 'Key'])
df['GRTYPE'] = df.GRTYPE.map(mapping_df.loc['GRTYPE']['Values'])
df['CHRTSTAT'] = df.CHRTSTAT.map(mapping_df.loc['CHRTSTAT']['Values'])
df['SECTION'] = df.SECTION.map(mapping_df.loc['SECTION']['Values'])
```

Output:

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .
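The key step in this answer is that selecting a single `Variables` label from the MultiIndex leaves a Series indexed by `Key`, which `Series.map` then treats as a lookup table. The sketch below runs that step on a subset of the question's data; the code `99` is a hypothetical value chosen to be absent from the reference table, to show what happens to unmatched codes.

```python
import pandas as pd

# Subset of the question's reference table, indexed by (Variables, Key).
# sort_index() avoids a pandas PerformanceWarning on the partial lookup below.
mapping_df = pd.DataFrame(
    {
        "Variables": ["GRTYPE", "GRTYPE", "CHRTSTAT"],
        "Key": [40, 2, 2],
        "Values": ["Total exclusions 4-year schools",
                   "4-year institutions, Adjusted cohort",
                   "Revised cohort"],
    }
).set_index(["Variables", "Key"]).sort_index()

# Selecting one Variables label drops that level: the result is a
# Series of Values indexed by Key alone.
grtype_lookup = mapping_df.loc["GRTYPE"]["Values"]

# 99 is a made-up code that does not appear in the reference table.
codes = pd.Series([40, 2, 99], name="GRTYPE")
mapped = codes.map(grtype_lookup)

# Unmatched codes come back as NaN; fillna(codes) keeps the original code instead.
kept = mapped.fillna(codes)
```

Whether unmatched codes should become `NaN` or keep their original value depends on the use case; the `fillna` line is one assumed option, not part of the original answer.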

Answer 2

Using `defaultdict`

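```python
import pandas as pd
from collections import defaultdict

# Build a nested dict: {variable: {key: value}}
d = defaultdict(dict)
for i, k, v in df1.itertuples(index=False):
    d[i][k] = v

# Look up every code in each column of df2, keeping df2's index
pd.DataFrame(dict(zip(df2, [[d[i][k] for k in df2[i]] for i in df2])), df2.index)
```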

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .
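The nested dictionary built with `defaultdict` has the shape `{variable: {key: value}}`. A small sketch on a subset of the data shows that structure, plus a dictionary-comprehension variant of the final `pd.DataFrame` call that uses `dict.get` to keep unknown codes instead of raising `KeyError`. The code `7` is a hypothetical value chosen to be missing from the reference table.

```python
from collections import defaultdict

import pandas as pd

# Subset of the question's reference table.
df1 = pd.DataFrame(
    {
        "Variables": ["GRTYPE", "GRTYPE", "SECTION", "SECTION"],
        "Key": [40, 2, 12, 23],
        "Values": ["Total exclusions 4-year schools",
                   "4-year institutions, Adjusted cohort",
                   "Bachelors/ equiv .",
                   "Bachelors or equiv 2009 ."],
    }
)
# 7 is a made-up GRTYPE code with no entry in df1.
df2 = pd.DataFrame({"GRTYPE": [40, 2, 7], "SECTION": [12, 23, 12]})

# defaultdict(dict) creates the inner dict the first time a variable is seen.
d = defaultdict(dict)
for var, key, value in df1.itertuples(index=False):
    d[var][key] = value

# d[col][k] would raise KeyError for code 7; .get(k, k) keeps the code instead.
out = pd.DataFrame(
    {col: [d[col].get(k, k) for k in df2[col]] for col in df2},
    index=df2.index,
)
```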

Using `apply`

```python
df2.apply(
    lambda s: s.apply(
        lambda x, n: df1.set_index(['Variables', 'Key']).Values[(n, x)], n=s.name
    )
)
```

                                 GRTYPE         CHRTSTAT                    SECTION
0       Total exclusions 4-year schools   Revised cohort         Bachelors/ equiv .
1  4-year institutions, Adjusted cohort       Exclusions         Bachelors/ equiv .
2  4-year institutions, Adjusted cohort  Adjusted cohort  Bachelors or equiv 2009 .
3       4-year institutions, Completers   Revised cohort         Bachelors/ equiv .
4       4-year institutions, Completers       Exclusions  Bachelors or equiv 2009 .
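The `apply` version above calls `df1.set_index(...)` once for every cell, which repeats the same work many times. A minimal sketch of the same lookup with the indexed Series built once, on a subset of the question's data (the name `lookup` is illustrative):

```python
import pandas as pd

# Subset of the question's reference and main tables.
df1 = pd.DataFrame(
    {
        "Variables": ["GRTYPE", "GRTYPE", "CHRTSTAT", "CHRTSTAT"],
        "Key": [40, 2, 2, 3],
        "Values": ["Total exclusions 4-year schools",
                   "4-year institutions, Adjusted cohort",
                   "Revised cohort",
                   "Exclusions"],
    }
)
df2 = pd.DataFrame({"GRTYPE": [40, 2], "CHRTSTAT": [2, 3]})

# Build the (Variables, Key) -> Values Series once, outside the per-cell lambda.
lookup = df1.set_index(["Variables", "Key"])["Values"].sort_index()

# s.name is the column name, which supplies the first level of the lookup key.
result = df2.apply(lambda s: s.map(lambda x: lookup[(s.name, x)]))
```

The result matches the original `apply` version; the only change is that the index is built once rather than once per cell.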

Replace pandas values in a column by looking them up in another DataFrame