Replacing values in a Pandas DataFrame column using a Series lookup table

Time: 2016-06-17 00:31:53

Tags: python pandas dataframe

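I want to replace the values in one column of a DataFrame with a more accurate and complete set of values taken from a lookup table that I have prepared as a Series.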

I thought I knew how to do this, but the result is not what I expected.

Here is the DataFrame I want to fix:

In [6]: df_normalised.head(10)
Out[6]: 
  code                                          name
0    8                             Human development
1   11                                              
2    1                           Economic management
3    6         Social protection and risk management
4    5                         Trade and integration
5    2                      Public sector governance
6   11  Environment and natural resources management
7    6         Social protection and risk management
8    7                   Social dev/gender/inclusion
9    7                   Social dev/gender/inclusion

(Note the missing name in the second row, index 1.)

Here is the lookup table I created to fix it:

In [20]: names
Out[20]: 
1                              Economic management
10                               Rural development
11    Environment and natural resources management
2                         Public sector governance
3                                      Rule of law
4         Financial and private sector development
5                            Trade and integration
6            Social protection and risk management
7                      Social dev/gender/inclusion
8                                Human development
9                                Urban development
dtype: object

This is how I thought it could be done:

In [21]: names[df_normalised.head(10).code]
Out[21]: 
code
8                                Human development
11    Environment and natural resources management
1                              Economic management
6            Social protection and risk management
5                            Trade and integration
2                         Public sector governance
11    Environment and natural resources management
6            Social protection and risk management
7                      Social dev/gender/inclusion
7                      Social dev/gender/inclusion
dtype: object

However, I expected the resulting Series to have the same index as df_normalised (i.e. 0, 1, 2, 3, ...) rather than an index based on the code values.

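So I am not sure how to replace the original values in the 'name' column of df_normalised with the values from this Series, because the indexes are not the same.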

Incidentally, how is it possible for an index to contain repeated values, as it does above?

2 answers:

Answer 0 (score: 4)

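You can use the map() method, which looks up each code in the index of the Series and keeps the original row index: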

In [38]: df_normalised['name'] = df_normalised['code'].map(names)

In [39]: df_normalised
Out[39]:
   code                                          name
0     8                             Human development
1    11  Environment and natural resources management
2     1                           Economic management
3     6         Social protection and risk management
4     5                         Trade and integration
5     2                      Public sector governance
6    11  Environment and natural resources management
7     6         Social protection and risk management
8     7                   Social dev/gender/inclusion
9     7                   Social dev/gender/inclusion
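
For anyone who wants to run this without the original data, here is a minimal self-contained sketch of the map() approach. The small `df` and `names` objects are made-up stand-ins for the ones in the question, and the codes are assumed to be strings, since the lookup index in the question sorts as 1, 10, 11, 2, ... If the column and the lookup index had different dtypes, map() would find no matches and return NaN everywhere.

```python
import pandas as pd

# Hypothetical miniature lookup table: index = code, value = name.
names = pd.Series({
    "1": "Economic management",
    "8": "Human development",
    "11": "Environment and natural resources management",
})

# Hypothetical miniature frame; the second row has an empty name,
# and the last row uses a code that is not in the lookup table.
df = pd.DataFrame({
    "code": ["8", "11", "1", "11", "99"],
    "name": ["Human development", "", "Economic management",
             "Environment and natural resources management", "Unknown"],
})

# map() looks each code up in the index of `names`; the result keeps
# df's own index (0, 1, 2, ...), so it can be assigned straight back.
df["name"] = df["code"].map(names)

# Codes missing from the lookup table come back as NaN.
unmatched = df[df["name"].isna()]
print(df)
print(unmatched)
```

Note that codes absent from the lookup table end up as NaN rather than keeping their old name, so it may be worth checking for missing values after the assignment.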

Answer 1 (score: 0)

This works. However, I am pretty sure there must be a simpler way.

In [50]: df_normalised.name = pd.Series(names[df_normalised.code].values)

In [51]: df_normalised.head(10)
Out[51]: 
  code                                          name
0    8                             Human development
1   11  Environment and natural resources management
2    1                           Economic management
3    6         Social protection and risk management
4    5                         Trade and integration
5    2                      Public sector governance
6   11  Environment and natural resources management
7    6         Social protection and risk management
8    7                   Social dev/gender/inclusion
9    7                   Social dev/gender/inclusion
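
On the side question about repeated index values: a pandas index is not required to be unique, and selecting by a list of labels returns one row per requested label, so repeated codes produce repeated index entries. The sketch below uses made-up miniature data and assumes string codes. It shows that behaviour, and why dropping the index (here with to_numpy(), which plays the same role as .values above) turns the assignment into a purely positional one.

```python
import pandas as pd

# Hypothetical miniature lookup table and frame (not the original data).
names = pd.Series({
    "1": "Economic management",
    "7": "Social dev/gender/inclusion",
})
df = pd.DataFrame({"code": ["7", "1", "7"], "name": ["", "", ""]})

# Label-based selection: one result row per requested label, so the
# result's index repeats "7" because the code "7" was requested twice.
looked_up = names.loc[df["code"]]
print(looked_up.index.tolist())
print(looked_up.index.is_unique)

# Assigning `looked_up` directly would align on index labels ("7", "1")
# against df's labels (0, 1, 2), which do not match. Converting to a
# plain array discards the index, so values are assigned by position.
df["name"] = looked_up.to_numpy()
print(df)
```

The positional assignment relies on the looked-up values being in the same order as the rows of the frame, which holds here because the selection was driven by the frame's own code column.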