What is the most efficient way to change specific values in a DataFrame column?

Asked: 2018-08-13 11:52:53

Tags: python pandas dataframe split

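I need to change the values of specific items in a DataFrame column. I did this manually with a for loop. Is there a more efficient way, using an idiom or .where? I don't think the code below is the best approach...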

# Change the names of specific countries as requested
for index, row in energy.iterrows():
    if energy.loc[index, ['Country']].str.contains('United States of America').bool():
        energy.loc[index, ['Country']] = 'United States'
        print(energy.loc[index, ['Country']])

    if energy.loc[index, ['Country']].str.contains('Republic of Korea').bool():
        energy.loc[index, ['Country']] = 'South Korea'
        print(energy.loc[index, ['Country']])

    if energy.loc[index, ['Country']].str.contains('United Kingdom of Great Britain and Northern Ireland').bool():
        energy.loc[index, ['Country']] = 'United Kingdom'
        print(energy.loc[index, ['Country']])

    if energy.loc[index, ['Country']].str.contains('China, Hong Kong Special Administrative Region').bool():
        energy.loc[index, ['Country']] = 'Hong Kong'
        print(energy.loc[index, ['Country']])

3 Answers:

Answer 0 (score: 0)

You can use np.where:

import numpy as np

energy['Country'] = np.where(energy['Country'] == 'United States of America', 'United States', energy['Country'])
energy['Country'] = np.where(energy['Country'] == 'Republic of Korea', 'Korea', energy['Country'])

Or:

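# Use .loc with a boolean mask; chained indexing (energy['Country'][mask] = ...) may assign to a copy
energy.loc[energy['Country'] == 'United States of America', 'Country'] = 'United States'
energy.loc[energy['Country'] == 'Republic of Korea', 'Country'] = 'Korea'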

df:

                    Country
0  United States of America
1                     Spain
2         Republic of Korea
3                    France

Output:

         Country
0  United States
1          Spain
2          Korea
3         France
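Since the question asks about .where, here is a minimal sketch of the pandas-native counterpart, Series.mask, which replaces values where a condition is True and keeps the rest. The sample frame is an assumption that mirrors the data shown above; it is not taken from the asker's real dataset.

```python
import pandas as pd

# Hypothetical sample data mirroring the frame shown in this answer.
energy = pd.DataFrame(
    {"Country": ["United States of America", "Spain", "Republic of Korea", "France"]}
)

country = energy["Country"]
# Series.mask(cond, other): where cond is True use `other`, otherwise keep the value.
country = country.mask(country == "United States of America", "United States")
country = country.mask(country == "Republic of Korea", "Korea")
energy["Country"] = country

print(energy)
```

Series.where is the mirror image: it keeps values where the condition is True and replaces the others, so the same rename could be written with the condition inverted.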

Answer 1 (score: 0)

You can declare a dictionary that holds the mapping and then use map.

For example:

import pandas as pd

mapVal = {'United States of America': 'United States', 'Republic of Korea': 'South Korea', 'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom', 'China': 'Hong Kong', 'Hong Kong Special Administrative Region': 'Hong Kong'}    #Sample Mapping

df = pd.DataFrame({'Country': ['United States of America', 'Republic of Korea', 'United Kingdom of Great Britain and Northern Ireland', 'China', 'Hong Kong Special Administrative Region']})
df["newVal"] = df["Country"].map(mapVal)          # or overwrite in place: df["Country"] = df["Country"].map(mapVal)
print(df)

Output:

                                             Country          newVal
0                           United States of America   United States
1                                  Republic of Korea     South Korea
2  United Kingdom of Great Britain and Northern I...  United Kingdom
3                                              China       Hong Kong
4            Hong Kong Special Administrative Region       Hong Kong
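One caveat with map: any value that has no key in the dictionary becomes NaN. In the sample above every country has an entry, so this does not show. The sketch below, using a hypothetical frame that contains an unmapped country ('Spain'), falls back to the original value with fillna.

```python
import pandas as pd

# Hypothetical mapping and data; 'Spain' deliberately has no entry in the mapping.
map_val = {
    "United States of America": "United States",
    "Republic of Korea": "South Korea",
}
df = pd.DataFrame(
    {"Country": ["United States of America", "Spain", "Republic of Korea"]}
)

# map() returns NaN for values that are missing from the dictionary.
mapped = df["Country"].map(map_val)

# Keep the original name wherever the mapping produced NaN.
df["Country"] = mapped.fillna(df["Country"])

print(df)
```

If only a few values need renaming, this fallback keeps the rest of the column intact.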

Answer 2 (score: 0)

You can use the pandas replace() method:

energy
                                             Country
0                           United States of America
1                                  Republic of Korea
2  United Kingdom of Great Britain and Northern I...
3     China, Hong Kong Special Administrative Region

energy.replace(rep_map)
          Country
0   United States
1     South Korea
2  United Kingdom
3       Hong Kong

Note that replace() replaces every cell that exactly matches one of these strings, in any column of the DataFrame.
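To illustrate the note above, here is a small sketch that contrasts a frame-wide replace() with one restricted to a single column. The second column, "Partner", is a hypothetical addition made only to show the difference.

```python
import pandas as pd

rep_map = {
    "United States of America": "United States",
    "Republic of Korea": "South Korea",
}
energy = pd.DataFrame({
    "Country": ["United States of America", "Republic of Korea"],
    # Hypothetical second column that happens to contain the same strings.
    "Partner": ["Republic of Korea", "United States of America"],
})

# DataFrame.replace touches matching cells in every column.
everywhere = energy.replace(rep_map)

# Series.replace on one column leaves the other columns untouched.
only_country = energy.copy()
only_country["Country"] = only_country["Country"].replace(rep_map)

print(everywhere)
print(only_country)
```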

Data:

import pandas as pd

countries = ["United States of America",
             "Republic of Korea",
             "United Kingdom of Great Britain and Northern Ireland",
             "China, Hong Kong Special Administrative Region"]
replacements = ["United States", "South Korea", "United Kingdom", "Hong Kong"]
# Pair each long name with its short replacement
rep_map = dict(zip(countries, replacements))
energy = pd.DataFrame({"Country": countries})