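I came across a similar request on this site here and borrowed the idea from that post, but it does not work in my case. I am reading some data from an Excel sheet and trying to convert it into a Pandas DataFrame with column and row indexes.
The first row of the Excel sheet is a header of years, which I tried to apply with df.columns=df.iloc[0]
After that, running df.columns
returns: Index([None, 2014.0, 2015.0, 2016.0, 2017.0, 2018.0], dtype='object', name=0)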
My problem now is turning the column that holds the month names into the row index. I tried
df.set_index('None',inplace=True)
but this raises KeyError: 'None'
Edit: added sample data here
Update: I got it working with df.columns = ['Month', 2014, 2015, 2016, 2017, 2018]
followed by df.drop(df.index[0])
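Answer 0: (score: 1)
This works fine for me after adding 2 parameters: index_col=[0] to convert the first column to the index, and usecols with range to select every column except the empty Unnamed column: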
df = pd.read_excel('sample.xlsx', usecols=range(1, 100))
print (df)
Unnamed: 0 2014 2015 2016 2017 2018
0 Jan 42.9 47.2 43.000000 43.00 48.98
1 Feb 36.6 45.0 40.300000 43.00 45.92
2 Mar 37.8 42.8 44.805668 43.00 43.00
3 Apr 40.9 44.4 43.900000 41.30 44.46
4 May 40.5 47.1 44.200000 41.97 42.31
5 Jun 41.8 46.9 44.600000 45.70 NaN
6 Jul 40.5 45.0 43.500000 45.49 NaN
7 Aug 44.3 45.0 43.800000 44.59 NaN
8 Sep 43.8 47.3 47.600000 47.25 NaN
9 Oct 44.2 47.0 47.600000 50.08 NaN
10 Nov 44.2 43.7 50.078663 50.93 NaN
11 Dec 48.8 45.5 46.500000 48.37 NaN
df = pd.read_excel('sample.xlsx', index_col=[0], usecols = range(1, 100))
print (df)
2014 2015 2016 2017 2018
Jan 42.9 47.2 43.000000 43.00 48.98
Feb 36.6 45.0 40.300000 43.00 45.92
Mar 37.8 42.8 44.805668 43.00 43.00
Apr 40.9 44.4 43.900000 41.30 44.46
May 40.5 47.1 44.200000 41.97 42.31
Jun 41.8 46.9 44.600000 45.70 NaN
Jul 40.5 45.0 43.500000 45.49 NaN
Aug 44.3 45.0 43.800000 44.59 NaN
Sep 43.8 47.3 47.600000 47.25 NaN
Oct 44.2 47.0 47.600000 50.08 NaN
Nov 44.2 43.7 50.078663 50.93 NaN
Dec 48.8 45.5 46.500000 48.37 NaN
Alternatively, select the second column as the index and drop the Unnamed: 0 column:
df = pd.read_excel('sample.xlsx', index_col=[1])
print (df)
Unnamed: 0 2014 2015 2016 2017 2018
Jan NaN 42.9 47.2 43.000000 43.00 48.98
Feb NaN 36.6 45.0 40.300000 43.00 45.92
Mar NaN 37.8 42.8 44.805668 43.00 43.00
Apr NaN 40.9 44.4 43.900000 41.30 44.46
May NaN 40.5 47.1 44.200000 41.97 42.31
Jun NaN 41.8 46.9 44.600000 45.70 NaN
Jul NaN 40.5 45.0 43.500000 45.49 NaN
Aug NaN 44.3 45.0 43.800000 44.59 NaN
Sep NaN 43.8 47.3 47.600000 47.25 NaN
Oct NaN 44.2 47.0 47.600000 50.08 NaN
Nov NaN 44.2 43.7 50.078663 50.93 NaN
Dec NaN 48.8 45.5 46.500000 48.37 NaN
df = pd.read_excel('sample.xlsx', index_col=[1]).drop('Unnamed: 0', axis=1)
print (df)
2014 2015 2016 2017 2018
Jan 42.9 47.2 43.000000 43.00 48.98
Feb 36.6 45.0 40.300000 43.00 45.92
Mar 37.8 42.8 44.805668 43.00 43.00
Apr 40.9 44.4 43.900000 41.30 44.46
May 40.5 47.1 44.200000 41.97 42.31
Jun 41.8 46.9 44.600000 45.70 NaN
Jul 40.5 45.0 43.500000 45.49 NaN
Aug 44.3 45.0 43.800000 44.59 NaN
Sep 43.8 47.3 47.600000 47.25 NaN
Oct 44.2 47.0 47.600000 50.08 NaN
Nov 44.2 43.7 50.078663 50.93 NaN
Dec 48.8 45.5 46.500000 48.37 NaN
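If the data has already been loaded and the header row is sitting in the first data row, the same result can be reached without re-reading the file. The following is only a minimal sketch: the small hand-made frame `raw` is a hypothetical stand-in for the contents of sample.xlsx, and the label 'Month' is an assumed name for the first column.

```python
import pandas as pd

# Hypothetical stand-in for the sheet: the first row holds the year header,
# the first column holds the month names.
raw = pd.DataFrame([
    [None, 2014, 2015],
    ['Jan', 42.9, 47.2],
    ['Feb', 36.6, 45.0],
])

# Drop the header row from the data and promote it to column labels.
df = raw.iloc[1:].copy()
df.columns = ['Month'] + [int(year) for year in raw.iloc[0, 1:]]

# Use the month names as the row index and make sure the values are numeric.
df = df.set_index('Month').astype(float)
print(df)
```

Naming the first column explicitly avoids having a None label that later has to be looked up.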
Answer 1: (score: 0)
You can rename the columns like this:
df.columns = ['None',2014.0,2015.0,2016.0,2017.0,2018.0]
Now your command should work.
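The reason the rename helps is that the string 'None' and the object None are different labels, so set_index('None') finds nothing to match until a column with that exact string name exists. A minimal sketch, assuming a tiny two-row frame in place of the real data:

```python
import pandas as pd

# Assumed toy frame whose first column label is the object None, not a string.
df = pd.DataFrame(
    [['Jan', 42.9], ['Feb', 36.6]],
    columns=pd.Index([None, 2014.0], dtype=object),
)

# Looking up the string 'None' fails because no column has that label.
try:
    df.set_index('None')
    failed = False
except KeyError:
    failed = True
print('KeyError raised:', failed)

# After renaming the first column to the string 'None', the lookup succeeds.
df.columns = ['None', 2014.0]
df = df.set_index('None')
print(df)
```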
Answer 2: (score: 0)
Try it this way
df.set_index(df.iloc[:, 0])  # df.None is a SyntaxError in Python 3, so select the column by position
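A runnable variant of this idea selects the column by position, since a column labelled None cannot be reached through attribute access in Python 3. Note that passing a Series to set_index leaves the original column in the frame, so the sketch below also slices it off. The toy frame and the index name 'Month' are assumptions for illustration only.

```python
import pandas as pd

# Assumed toy frame whose first column has the label None.
df = pd.DataFrame(
    [['Jan', 42.9, 47.2], ['Feb', 36.6, 45.0]],
    columns=pd.Index([None, 2014.0, 2015.0], dtype=object),
)

# Use the first column (by position) as the index, then drop that column,
# because set_index does not remove a column passed as a Series.
df = df.set_index(df.iloc[:, 0]).iloc[:, 1:]
df.index.name = 'Month'
print(df)
```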
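Answer 3: (score: -1)
When the column name is None, you cannot set it as the index by passing the string 'None', so to use that column as the index, rename it first.
df.columns = ['First'] + list(df.columns[1:])  # safer than writing into df.columns.values, which mutates the Index in place
Then set it as the index:
df = df.set_index('First')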