Why does pandas' apply sometimes change a DataFrame's values and sometimes not?

Time: 2017-04-27 07:45:36

Tags: python python-3.x pandas numpy

def replace_name(row):
    if row['Country Name'] == 'Korea, Rep.':
        row['Country Name'] = 'South Korea'
    if row['Country Name'] == 'Iran, Islamic Rep.':
        row['Country Name'] = 'Iran'
    if row['Country Name'] == 'Hong Kong SAR, China':
        row['Country Name'] = 'Hong Kong'
    return row

GDP.apply(replace_name, axis = 1)

GDP is a pd.DataFrame.

After running this, searching for 'South Korea' finds nothing: the name is still 'Korea, Rep.'.

But if I change the last line of the code to this:

GDP = GDP.apply(replace_name, axis = 1)

it works.
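A minimal sketch of the pattern that works, assuming a small hypothetical GDP-like frame (the dict RENAMES and the '2015' column with made-up values are illustrative, not the real data). apply() builds and returns a new DataFrame, so the result is assigned back; the row is also copied inside the function so the code never depends on mutating whatever object apply() passes in.

```python
import pandas as pd

# Hypothetical stand-in for GDP: a text column plus a numeric column
# (mixed dtypes). The numbers are placeholders, not real GDP figures.
GDP = pd.DataFrame({
    'Country Name': ['Korea, Rep.', 'Iran, Islamic Rep.', 'Hong Kong SAR, China', 'Japan'],
    '2015': [1.0, 2.0, 3.0, 4.0],
})

# Hypothetical lookup table replacing the chain of if-statements.
RENAMES = {
    'Korea, Rep.': 'South Korea',
    'Iran, Islamic Rep.': 'Iran',
    'Hong Kong SAR, China': 'Hong Kong',
}

def replace_name(row):
    # Copy the row so the function does not rely on in-place mutation.
    row = row.copy()
    row['Country Name'] = RENAMES.get(row['Country Name'], row['Country Name'])
    return row

# apply() returns a new DataFrame; keep it by assigning it back.
GDP = GDP.apply(replace_name, axis=1)
print(GDP['Country Name'].tolist())
```

Assigning the return value behaves the same regardless of how the input frame was read.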

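At first I thought the reason was that apply cannot change GDP itself, but when I did the same thing with another DataFrame, it actually worked. The code is: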

def change_name(row):
    if row['Country'] == "Republic of Korea":
        row['Country'] = 'South Korea'
    if row['Country'] == 'United States of America':
        row['Country'] = 'United States'
    if row['Country'] == 'United Kingdom of Great Britain and Northern Ireland':
        row['Country']  ='United Kingdom'
    if row['Country'] == 'China, Hong Kong Special Administrative Region':
        row['Country'] = 'Hong Kong'
    return row

energy.apply(change_name, axis = 1)
energy is also a pd.DataFrame.

This time, searching for 'United States' does work. The original name was 'United States of America', so the rename succeeded.

The only difference between energy and GDP is that energy was read from an Excel file and GDP from a CSV file. What causes the different results?
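One likely factor, offered here as an assumption because the thread does not confirm it: with axis=1, apply passes each row to the function as a Series. When every column shares a single dtype (for example, if placeholder text forces all columns of the Excel-derived frame to object), that row can be a view into the frame's data, so assigning into it may leak back into the original. With mixed dtypes (text names plus numeric year columns, as a GDP CSV typically has) the row is a fresh copy, so the change is lost unless the result is assigned. The pandas apply documentation states that functions which mutate the passed object are not supported, and with Copy-on-Write (the default from pandas 3.0) the write-through does not happen at all. The sketch below, using hypothetical frames with made-up values, only compares how many distinct dtypes each frame has, which is one way to check this on the real data.

```python
import pandas as pd

# Hypothetical stand-in for energy: every column stored as object.
energy_like = pd.DataFrame(
    {'Country': ['United States of America', 'Republic of Korea'],
     'Energy Supply': ['...', 100]},
    dtype=object,
)

# Hypothetical stand-in for GDP: a text column plus a float column.
gdp_like = pd.DataFrame(
    {'Country Name': ['Korea, Rep.', 'Iran, Islamic Rep.'],
     '2015': [1.0, 2.0]},
)

def distinct_dtypes(df):
    # 1 means all columns share one dtype; more than 1 means mixed dtypes.
    return len(set(df.dtypes))

print(distinct_dtypes(energy_like), energy_like.dtypes.to_dict())
print(distinct_dtypes(gdp_like), gdp_like.dtypes.to_dict())
```

On the real frames, calling distinct_dtypes(energy) and distinct_dtypes(GDP) would show whether this explanation applies.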

1 answer:

Answer 0 (score: 1)

I think it is better to use replace:

d = {'Korea, Rep.':'South Korea', 'Iran, Islamic Rep.':'Iran', 
     'Hong Kong SAR, China':'Hong Kong'}
GDP['Country Name'] = GDP['Country Name'].replace(d, regex=True)
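A small sketch of what regex=True changes, using a made-up Series (the padded second value is hypothetical). Without regex, replace only swaps values that equal a key exactly; with regex=True every key is treated as a regular expression and matching substrings are rewritten, so surrounding text such as spaces is kept.

```python
import pandas as pd

d = {'Korea, Rep.': 'South Korea', 'Iran, Islamic Rep.': 'Iran',
     'Hong Kong SAR, China': 'Hong Kong'}

# Made-up values; the second one has stray spaces around the name.
names = pd.Series(['Korea, Rep.', ' Korea, Rep. ', 'Japan'], dtype=object)

# Exact-value matching: the padded entry is left unchanged.
exact = names.replace(d)
# Regex matching: the matching substring is replaced, spaces remain.
by_regex = names.replace(d, regex=True)

print(exact.tolist())     # ['South Korea', ' Korea, Rep. ', 'Japan']
print(by_regex.tolist())  # ['South Korea', ' South Korea ', 'Japan']
```

Because the keys are regular expressions here, the '.' in 'Korea, Rep.' matches any character; wrapping keys with re.escape keeps them literal if that matters.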

Since the difference may be caused by stray whitespace in the data, this may help:

GDP['Country'] = GDP['Country'].str.strip()
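A sketch of checking for and removing that whitespace, assuming a made-up Series in place of the real column. It first lists the entries that differ from their stripped form, then strips and applies an exact-match replace.

```python
import pandas as pd

d = {'Korea, Rep.': 'South Korea', 'Iran, Islamic Rep.': 'Iran',
     'Hong Kong SAR, China': 'Hong Kong'}

# Made-up values with leading/trailing spaces on some entries.
names = pd.Series([' Korea, Rep. ', 'Iran, Islamic Rep.', 'Japan '], dtype=object)

# Entries that change when stripped carry leading/trailing whitespace.
padded = names[names != names.str.strip()]
print(padded.tolist())   # [' Korea, Rep. ', 'Japan ']

# After stripping, exact-value replace is enough.
cleaned = names.str.strip().replace(d)
print(cleaned.tolist())  # ['South Korea', 'Iran', 'Japan']
```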

Sample:

import pandas as pd

GDP = pd.DataFrame({'Country Name':[' Korea, Rep. ','a','Iran, Islamic Rep.','United States of America','s','United Kingdom of Great Britain and Northern Ireland'],
                    'Country':     ['s','Hong Kong SAR, China','United States of America','Hong Kong SAR, China','s','f']})

#print (GDP)

d = {'Korea, Rep.':'South Korea', 'Iran, Islamic Rep.':'Iran', 
     'United Kingdom of Great Britain and Northern Ireland':'United Kingdom',
     'Hong Kong SAR, China':'Hong Kong', 'United States of America':'United States'}

#replace by columns
#GDP['Country Name'] = GDP['Country Name'].replace(d, regex=True)
#GDP['Country'] = GDP['Country'].replace(d, regex=True)

#replace multiple columns
GDP[['Country Name','Country']] = GDP[['Country Name','Country']].replace(d, regex=True)
print (GDP)
         Country    Country Name
0              s     South Korea
1      Hong Kong               a
2  United States            Iran
3      Hong Kong   United States
4              s               s
5              f  United Kingdom