Replacing missing values & updating old values in a DataFrame using NumPy and Pandas

Time: 2017-10-21 23:32:26

Tags: python pandas numpy dataframe

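I am trying to replace the missing values, which appear as '...' in my DataFrame, with np.nan values. I also want to update some old values, but my approach does not seem to work.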

Here is my code:

import numpy as np 
import pandas as pd 


def func():
    energy=pd.ExcelFile('Energy Indicators.xls').parse('Energy')
    energy=energy.iloc[16:][['Environmental Indicators: Energy','Unnamed: 3','Unnamed: 4','Unnamed: 5']].copy()
    energy.columns=['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']
    o="..."
    n=np.NaN

    # Trying to replace missing values with np.nan values 
    energy[energy['Energy Supply']==o]=n


    energy['Energy Supply']=energy['Energy Supply']*1000000


    # Here, I want to replace old values by new ones ==> Same problem 
    old=["Republic of Korea","United States of America","United Kingdom of " 
                                +"Great Britain and Northern Ireland","China, Hong "
                                +"Kong Special Administrative Region"]
    new=["South Korea","United States","United Kingdom","Hong Kong"]
    for i in range(0,4):


        energy[energy['Country']==old[i],'Country']=new[i]


    return energy

Here is the .xls file I am working with: https://drive.google.com/file/d/0B80lepon1RrYeDRNQVFWYVVENHM/view?usp=sharing

1 Answer:

Answer 0: (score: 1)

I would do this with a regex-based replace:

energy = energy.replace(r'\s*\.+\s*', np.nan, regex=True)

MaxU proposed an alternative, which will work if your cells contain nothing but the dots (no special or whitespace characters):

energy = energy.replace('...', np.nan, regex=False)
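To tie the two parts of the question together, here is a minimal sketch on a small made-up frame. The sample rows are hypothetical stand-ins, since the real spreadsheet is not reproduced here. It assumes the placeholder cells hold only dots, so the pattern is anchored with ^ and $ (my own variation on the pattern above, so that strings which merely contain a dot are left alone). It also shows one possible way to rename countries with a mapping instead of a loop. Two things in the question's code look like the likely culprits: energy[mask] = n overwrites whole rows, Country included, and energy[mask, 'Country'] is missing .loc.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the 'Energy' sheet from the question.
energy = pd.DataFrame({
    'Country': ['Republic of Korea', 'United States of America',
                'Iceland', 'American Samoa'],
    'Energy Supply': [11007, 90838, 217, '...'],
})

# Anchored variant of the pattern: only cells made up solely of dots
# (with optional surrounding whitespace) are turned into NaN.
energy = energy.replace(r'^\s*\.+\s*$', np.nan, regex=True)

# Make sure the column is numeric before scaling it.
energy['Energy Supply'] = pd.to_numeric(energy['Energy Supply']) * 1_000_000

# Rename countries through a mapping; names not listed are left unchanged.
renames = {
    'Republic of Korea': 'South Korea',
    'United States of America': 'United States',
    'United Kingdom of Great Britain and Northern Ireland': 'United Kingdom',
    'China, Hong Kong Special Administrative Region': 'Hong Kong',
}
energy['Country'] = energy['Country'].replace(renames)

print(energy)
```

If you prefer to keep the loop from the question, energy.loc[energy['Country'] == old[i], 'Country'] = new[i] should behave the same way as the mapping.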
