The DataFrame smaller_df looks like this:
> smaller_df.head()
MSA Code Line RPP
0 10180 1.0 91.2
1 10180 2.0 97.4
2 10180 3.0 78.7
3 10180 4.0 93.5
4 10420 1.0 90.4
...
smaller_df.dtypes gives:
MSA Code int64
Line float64
RPP float64
Wages object
dtype: object
wage_keys.head() gives:
MSA Code Average Wage
0 11260 94490.000000
1 21820 72080.000000
2 10180 71128.571429
3 13820 87338.396624
4 10420 76620.000000
...
wage_keys.dtypes is:
MSA Code int64
Average Wage float64
dtype: object
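Note that the same 'MSA Code' can appear multiple times in smaller_df, but only once in wage_keys.
I want to set a new column 'Wages' in smaller_df to the corresponding value from wage_keys.
So the new table should look like this: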
MSA Code Line RPP Wages
0 10180 1.0 91.2 71128.571429
1 10180 2.0 97.4 71128.571429
2 10180 3.0 78.7 71128.571429
3 10180 4.0 93.5 71128.571429
4 10420 1.0 90.4 76620.000000
...
I have the following code, which does the mapping by building a dictionary of wages:
wages = wage_keys.set_index('MSA Code').to_dict()
smaller_df['Wages'] = smaller_df['MSA Code'].map(wages)
The problem is that this results in:
MSA Code Line RPP Wages
0 10180 1.0 91.2 NaN
1 10180 2.0 97.4 NaN
2 10180 3.0 78.7 NaN
3 10180 4.0 93.5 NaN
4 10420 1.0 90.4 NaN
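Clearly I'm missing something. How do I get the values in the 'Wages' column set to the correct corresponding values from the wages dictionary (or the wage_keys DataFrame)?
Answer 0 (score: 1)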
Your mistake is in the conversion to a dictionary (df2 below stands in for your wage_keys). You did:
df2.set_index('MSA Code').to_dict()
# {
#     'Average Wage': {
#         11260: 94490.0,
#         21820: 72080.0,
#         10180: 71128.571429,
#         13820: 87338.396624,
#         10420: 76620.0
#     }
# }
This produces a dict of dicts whose only top-level key is 'Average Wage', so map finds none of the MSA codes among the keys and returns NaN for every row. What you should have done is:
df2.set_index('MSA Code')['Average Wage'].to_dict()
# {11260: 94490.0, 21820: 72080.0, 10180: 71128.571429, 13820: 87338.396624, 10420: 76620.0}
Or:
df2.set_index('MSA Code')['Average Wage']
MSA Code
11260 94490.000000
21820 72080.000000
10180 71128.571429
13820 87338.396624
10420 76620.000000
Name: Average Wage, dtype: float64
Both produce a mapping format that map understands. With either one, your map call produces the expected output.
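As a minimal, self-contained sketch of the fix, the snippet below rebuilds the two frames from the sample rows shown in the question (only the rows visible there, so the real data will be larger) and maps a flat Series of wages onto 'MSA Code'. The variable names follow the question; nothing here goes beyond standard pandas calls.

```python
import pandas as pd

# Sample rows copied from the question; the real frames have more rows.
smaller_df = pd.DataFrame({
    "MSA Code": [10180, 10180, 10180, 10180, 10420],
    "Line": [1.0, 2.0, 3.0, 4.0, 1.0],
    "RPP": [91.2, 97.4, 78.7, 93.5, 90.4],
})
wage_keys = pd.DataFrame({
    "MSA Code": [11260, 21820, 10180, 13820, 10420],
    "Average Wage": [94490.0, 72080.0, 71128.571429, 87338.396624, 76620.0],
})

# Select the column BEFORE converting, so the mapping is flat:
# index = MSA Code, values = Average Wage.
wages = wage_keys.set_index("MSA Code")["Average Wage"]

# Series.map accepts a Series (or a flat dict) as the lookup table.
smaller_df["Wages"] = smaller_df["MSA Code"].map(wages)
print(smaller_df)
```

If more than one column needs to be carried over, a left join such as smaller_df.merge(wage_keys, on='MSA Code', how='left') is a reasonable alternative to map.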