In pandas

Time: 2017-01-25 08:16:32

Tags: python pandas

I have a pandas DataFrame like this:

    Data Source   World Development Indicators  Unnamed: 2                         Unnamed: 3        Unnamed: 4        Unnamed: 5
    Country Name         Country Code         Indicator Name                     Indicator Code     1.960000e+03      1.961000e+03  
    Aruba                    ABW         GDP at market prices (constant 2010 US$)   NY.GDP.MKTP.KD           NaN             NaN    

To turn the first row into the column labels, I use the code

data.columns = data.iloc[0]

As a result, the data DataFrame is modified to

Country Name    Country Code    Indicator Name  Indicator Code     1960.0         1961.0        1962.0
Country Name    Country Code    Indicator Name  Indicator Code  1.960000e+03    1.961000e+03
Aruba   ABW GDP at market prices (constant 2010 US$)    NY.GDP.MKTP.KD  NaN           NaN

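Now my main problem is that the year column headers come out as floats such as 1960.0, and I would like them to be integers, i.e. 1960. Any help with this would be greatly appreciated.

2 Answers:

Answer 0: (score: 1)

Option 1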

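def rn(x):
    # Format numeric labels without decimals (1960.0 -> '1960'); the result is a string
    try:
        return '{:0.0f}'.format(x)
    except (ValueError, TypeError):
        # Non-numeric labels such as 'Country Name' are returned unchanged
        return x

# Older pandas accepted a function in rename_axis; current versions relabel with rename
df.T.set_index(0).rename(index=rn).T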

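If the labels need to be real integers rather than formatted strings, a variant of the same idea can be sketched as follows. The helper name to_int_label and the small sample frame are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd


def to_int_label(label):
    """Return whole-number floats as ints; leave every other label unchanged."""
    if isinstance(label, float) and label.is_integer():
        return int(label)
    return label


# Hypothetical frame mimicking the question: the real header sits in row 0.
df = pd.DataFrame(
    [["Country Name", "Country Code", 1960.0, 1961.0],
     ["Aruba", "ABW", None, None]]
)

# Promote row 0 to the column labels, converting 1960.0 -> 1960 on the way.
df.columns = [to_int_label(c) for c in df.iloc[0]]

# Drop the row that was promoted to the header.
df = df.iloc[1:].reset_index(drop=True)

print(df.columns.tolist())
```

Unlike the string-formatting approach, this keeps the year labels as Python ints, so a lookup such as df[1960] works.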

Answer 1: (score: 1)

If the DataFrame is created from a csv file, another possible solution is to add the parameter skiprows or header to read_csv:

import pandas as pd
import numpy as np
# pandas.compat.StringIO was removed from later pandas versions; the standard library provides it
from io import StringIO

temp=u"""Data Source;World Development Indicators;Unnamed: 2;Unnamed: 3;Unnamed: 4;Unnamed: 5
Country Name;Country Code;Indicator Name;Indicator Code;1960;1961
Aruba;ABW;GDP at market prices (constant 2010 US$);NY.GDP.MKTP.KD;NaN;NaN"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep=";", skiprows=1)
print(df)
  Country Name Country Code                            Indicator Name  \
0        Aruba          ABW  GDP at market prices (constant 2010 US$)   

   Indicator Code  1960  1961  
0  NY.GDP.MKTP.KD   NaN   NaN 

df = pd.read_csv(StringIO(temp), sep=";", header=1)
print(df)
  Country Name Country Code                            Indicator Name  \
0        Aruba          ABW  GDP at market prices (constant 2010 US$)   

   Indicator Code  1960  1961  
0  NY.GDP.MKTP.KD   NaN   NaN  

If that is not possible, check the perfect MaxU solution and add df = df[1:] to remove the first row from the data.
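Note that read_csv returns the header labels as strings ('1960', not 1960). If integer labels are required, one possible follow-up step is to relabel the digit-only columns with rename. This is a minimal sketch on shortened sample data; the csv_text variable is an assumption for illustration.

```python
from io import StringIO

import pandas as pd

# Shortened, hypothetical sample in the same layout as the data above.
csv_text = """Data Source;World Development Indicators;Unnamed: 2;Unnamed: 3
Country Name;Country Code;1960;1961
Aruba;ABW;NaN;NaN"""

# header=1 uses the second line of the file as the column labels.
df = pd.read_csv(StringIO(csv_text), sep=";", header=1)

# Convert labels made up only of digits to int; keep the others as they are.
df = df.rename(columns=lambda c: int(c) if c.isdigit() else c)

print(df.columns.tolist())
```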