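I have an Excel spreadsheet to which a new column is added every year. I want to use pandas to select the key columns from the spreadsheet and build a table showing the results for the most recent X years. My code runs, and the columns containing text are reordered as expected, but the numeric data is lost and replaced with NaN.
From answers to related questions on Stack Overflow, it seems I should pass a list of the required columns to the DataFrame's reindex method.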
import pandas as pd

def yearTable2(filename='5years.xlsx',SheetName='PartA',interactive=True,A_year=2018,nyears=3,debug=False):
    """Outputs latex code of table of nyears years of results for a given part's
    module results
    Input: filename is the excel file with the data in,
    SheetName contains the data for the part to be tabularised
    A_year is the current academic year"""
    xl=pd.ExcelFile(filename)
    df=xl.parse(SheetName)
    df2=df.round(1) # rounds numeric data to 1 decimal place
    if debug: print(df.head())
    # Have data in df2, it probably has more years of data than really needed
    # extract just the needed ones
    # Build up list of column names in required order
    column_list=["Module Name","Module Code"] # these are standard
    # now generate the years required
    for year in list(range(A_year,A_year-nyears,-1)):
        list_item=str(year*1)
        column_list.append(list_item)
    print(column_list)
    df3=df2.reindex(columns=column_list)
    return (df3) # outputs pretty Jupyter table
I call it like this: yearTable2(filename='Test.xlsx',SheetName='PartC',debug=True)
where Test.xlsx is a sample file with the following contents:
|Module Code|Module Name|2013|2014|2015|2016|2017|2018|
______________________________________________________
|abc |Harry | 23 | 45 | 32 | 54 | 56 | 12 |
|fgr |Jannice | 28 | 65 | 21 | 34 | 21 | 54 |
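I expect to get the following columns: Module Name, Module Code, 2018, 2017, 2016.
The first two columns are fine, but the numeric (year) columns contain only NaN: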
Module Name Module Code 2018 2017 2016
0 Harry abc NaN NaN NaN
1 Jannice fgr NaN NaN NaN
Answer 0 (score: 0)
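The year headers in the spreadsheet are numeric, so pandas reads them as integers (2018) rather than strings ('2018'). reindex matches labels exactly; since no column is labelled '2018', it creates new columns filled with NaN.
Instead of:
    list_item=str(year*1)
write:
    list_item=year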
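
The difference can be reproduced without the Excel file. The following is a minimal sketch, assuming the sheet is parsed with integer year headers; the in-memory DataFrame and the names `base`, `years`, `wrong`, `right`, and `also_right` are hypothetical stand-ins for illustration, not part of the original code. It also shows one possible alternative: converting all headers to strings first, so that string labels work.

```python
import pandas as pd

# Hypothetical stand-in for the parsed sheet. The year headers are ints,
# which is the assumption being illustrated here.
df = pd.DataFrame({
    "Module Code": ["abc", "fgr"],
    "Module Name": ["Harry", "Jannice"],
    2016: [54, 34],
    2017: [56, 21],
    2018: [12, 54],
})

base = ["Module Name", "Module Code"]
years = list(range(2018, 2018 - 3, -1))  # [2018, 2017, 2016]

# String labels do not match the integer headers, so reindex creates
# new columns filled with NaN.
wrong = df.reindex(columns=base + [str(y) for y in years])

# Integer labels match the existing headers, so the data is kept.
right = df.reindex(columns=base + years)

# Alternative: convert every header to a string, then use string labels.
df_str = df.copy()
df_str.columns = df_str.columns.map(str)
also_right = df_str.reindex(columns=base + [str(y) for y in years])

print(wrong)
print(right)
```

Printing `df.columns` (or the `debug` output in the question) is a quick way to check whether the headers are integers or strings before choosing either approach.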