I have a reference DataFrame that looks like this:
Variables Key Values
0 GRTYPE 40 Total exclusions 4-year schools
1 GRTYPE 2 4-year institutions, Adjusted cohort
2 GRTYPE 3 4-year institutions, Completers
41 CHRTSTAT 2 Revised cohort
42 CHRTSTAT 3 Exclusions
43 CHRTSTAT 4 Adjusted cohort
57 SECTION 12 Bachelors/ equiv .
58 SECTION 23 Bachelors or equiv 2009 .
I want to use this reference DataFrame to replace the values in the main DataFrame below:
GRTYPE CHRTSTAT SECTION
0 40 2 12
1 2 3 12
2 2 4 23
3 3 2 12
4 3 3 23
The final result would be:
GRTYPE CHRTSTAT SECTION
0 Total exclusions 4-year schools Revised cohort Bachelors/ equiv .
1 4-year institutions, Adjusted cohort Exclusions Bachelors/ equiv .
2 4-year institutions, Adjusted cohort Adjusted cohort Bachelors or equiv 2009 .
3 4-year institutions, Completers Revised cohort Bachelors/ equiv .
4 4-year institutions, Completers Exclusions Bachelors or equiv 2009 .
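What is the best way to do this in pandas or Python? I tried joining the variables from the first DataFrame and extracting them, then looping over the second DataFrame, but got nowhere.
Answer 0 (score: 5)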
map
You need to set Variables and Key as the index of the mapping DataFrame, then simply call map on each column.
mapping_df = mapping_df.set_index(['Variables', 'Key'])
df = df.apply(lambda x: x.map(mapping_df.loc[x.name]['Values']))
This is equivalent to:
mapping_df = mapping_df.set_index(['Variables', 'Key'])
df['GRTYPE'] = df.GRTYPE.map(mapping_df.loc['GRTYPE']['Values'])
df['CHRTSTAT'] = df.CHRTSTAT.map(mapping_df.loc['CHRTSTAT']['Values'])
df['SECTION'] = df.SECTION.map(mapping_df.loc['SECTION']['Values'])
Output:
GRTYPE CHRTSTAT SECTION
0 Total exclusions 4-year schools Revised cohort Bachelors/ equiv .
1 4-year institutions, Adjusted cohort Exclusions Bachelors/ equiv .
2 4-year institutions, Adjusted cohort Adjusted cohort Bachelors or equiv 2009 .
3 4-year institutions, Completers Revised cohort Bachelors/ equiv .
4 4-year institutions, Completers Exclusions Bachelors or equiv 2009 .
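For anyone who wants to run the map approach end to end, here is a minimal self-contained sketch. The two frames are rebuilt by hand from the sample data in the question; the names `lookup` and `result` are introduced only for this illustration and are not part of the original answer.

```python
import pandas as pd

# Reference (mapping) frame, retyped from the question's sample rows.
mapping_df = pd.DataFrame({
    "Variables": ["GRTYPE", "GRTYPE", "GRTYPE",
                  "CHRTSTAT", "CHRTSTAT", "CHRTSTAT",
                  "SECTION", "SECTION"],
    "Key": [40, 2, 3, 2, 3, 4, 12, 23],
    "Values": [
        "Total exclusions 4-year schools",
        "4-year institutions, Adjusted cohort",
        "4-year institutions, Completers",
        "Revised cohort",
        "Exclusions",
        "Adjusted cohort",
        "Bachelors/ equiv .",
        "Bachelors or equiv 2009 .",
    ],
})

# Main frame holding the numeric codes.
df = pd.DataFrame({
    "GRTYPE": [40, 2, 2, 3, 3],
    "CHRTSTAT": [2, 3, 4, 2, 3],
    "SECTION": [12, 12, 23, 12, 23],
})

# A Series of labels indexed by (Variables, Key).
lookup = mapping_df.set_index(["Variables", "Key"])["Values"]

# For each column, select the sub-Series for that variable (indexed by Key)
# and map the column's codes through it.
result = df.apply(lambda col: col.map(lookup.loc[col.name]))
print(result)
```

Note that map returns NaN for any code that has no matching Key under that variable, so it may be worth checking `result.isna().any()` on real data.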
Answer 1 (score: 2)
defaultdict
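from collections import defaultdict
import pandas as pd
# Build a nested lookup: {variable: {key: value}}
d = defaultdict(dict)
for i, k, v in df1.itertuples(index=False):
    d[i][k] = v
# Rebuild each column of df2 by looking up its codes under that column's name
pd.DataFrame(dict(zip(df2, [[d[i][k] for k in df2[i]] for i in df2])), df2.index)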
GRTYPE CHRTSTAT SECTION
0 Total exclusions 4-year schools Revised cohort Bachelors/ equiv .
1 4-year institutions, Adjusted cohort Exclusions Bachelors/ equiv .
2 4-year institutions, Adjusted cohort Adjusted cohort Bachelors or equiv 2009 .
3 4-year institutions, Completers Revised cohort Bachelors/ equiv .
4 4-year institutions, Completers Exclusions Bachelors or equiv 2009 .
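One thing to be aware of with the dictionary approach: the inner dictionaries are plain dicts, so `d[i][k]` raises KeyError when a code has no entry in the reference frame. The sketch below is a possible variation, not part of the original answer. It uses a trimmed-down, hand-made subset of the question's data (CHRTSTAT 4 is deliberately left out of `df1`) and falls back to the raw code via `dict.get`.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical subset of the question's frames; CHRTSTAT 4 has no label here.
df1 = pd.DataFrame({
    "Variables": ["CHRTSTAT", "CHRTSTAT", "SECTION", "SECTION"],
    "Key": [2, 3, 12, 23],
    "Values": ["Revised cohort", "Exclusions",
               "Bachelors/ equiv .", "Bachelors or equiv 2009 ."],
})
df2 = pd.DataFrame({"CHRTSTAT": [2, 3, 4], "SECTION": [12, 23, 12]})

# Nested lookup: {variable: {key: value}}
d = defaultdict(dict)
for var, key, val in df1.itertuples(index=False):
    d[var][key] = val

# .get(k, k) keeps the original code when no label exists,
# instead of raising KeyError as d[col][k] would.
result = pd.DataFrame(
    {col: [d[col].get(k, k) for k in df2[col]] for col in df2},
    index=df2.index,
)
print(result)
```

Whether an unmatched code should be kept, replaced with a placeholder, or treated as an error depends on the data; the fallback here is only one option.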
apply
df2.apply(
    lambda s: s.apply(
        lambda x, n: df1.set_index(['Variables', 'Key']).Values[(n, x)], n=s.name
    )
)
GRTYPE CHRTSTAT SECTION
0 Total exclusions 4-year schools Revised cohort Bachelors/ equiv .
1 4-year institutions, Adjusted cohort Exclusions Bachelors/ equiv .
2 4-year institutions, Adjusted cohort Adjusted cohort Bachelors or equiv 2009 .
3 4-year institutions, Completers Revised cohort Bachelors/ equiv .
4 4-year institutions, Completers Exclusions Bachelors or equiv 2009 .
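The apply version above rebuilds the indexed frame with `set_index` for every single cell. A possible refinement, sketched here on a hand-made subset of the question's data, is to build the lookup Series once and reuse it. The name `lookup` is introduced for this illustration only.

```python
import pandas as pd

# Hypothetical subset of the question's frames.
df1 = pd.DataFrame({
    "Variables": ["CHRTSTAT", "CHRTSTAT", "SECTION", "SECTION"],
    "Key": [2, 3, 12, 23],
    "Values": ["Revised cohort", "Exclusions",
               "Bachelors/ equiv .", "Bachelors or equiv 2009 ."],
})
df2 = pd.DataFrame({"CHRTSTAT": [2, 3, 2], "SECTION": [12, 23, 23]})

# Build the (Variables, Key) -> Values lookup once; sorting the MultiIndex
# keeps tuple lookups efficient.
lookup = df1.set_index(["Variables", "Key"])["Values"].sort_index()

# Outer apply walks the columns, inner apply looks up each cell by
# (column name, code).
result = df2.apply(lambda s: s.apply(lambda x: lookup.loc[(s.name, x)]))
print(result)
```

This is still an element-wise lookup, so for large frames the map approach from the first answer is likely to be faster.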