How do I rearrange the columns of a DataFrame without losing values?

Asked: 2019-07-02 10:36:48

Tags: python-3.x pandas

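I have an Excel spreadsheet to which a new column is added every year. I want to use pandas to select the key columns from the spreadsheet and build a table showing the results for the most recent X years. My code runs, and the columns containing text appear to be reordered as expected, but the numeric data is lost and replaced with NaN.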

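From the solutions to related questions on Stack Overflow and elsewhere, it seems I should pass the list of desired columns to the DataFrame's reindex method.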
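For reference, here is a minimal sketch of how reindex treats a list of column labels. The frame and the labels "a", "b", "c" and "z" are made up for illustration and are not from the spreadsheet in question. Labels that exist keep their values; a label that does not exist produces a new column filled with NaN rather than raising an error.

```python
import pandas as pd

# Illustrative frame; column names are placeholders, not the real data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Requesting existing labels in a new order carries the values over unchanged.
reordered = df.reindex(columns=["c", "a"])

# Requesting a label that is not present creates an all-NaN column.
with_missing = df.reindex(columns=["c", "z"])

print(reordered)
print(with_missing)
```

So an all-NaN column after reindex usually means the requested label did not match any existing column label exactly.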

import pandas as pd

def yearTable2(filename='5years.xlsx',SheetName='PartA',interactive=True,A_year=2018,nyears=3,debug=False):
    """Outputs LaTeX code for a table of nyears years of results for a given
    part's module results.
    Input: filename is the Excel file containing the data,
    SheetName is the sheet holding the data for the part to be tabularised,
    A_year is the current academic year."""
    xl=pd.ExcelFile(filename)
    df=xl.parse(SheetName)
    df2=df.round(1) # rounds numeric data to 1 decimal place
    if debug: print(df.head())
    #Have data in df2, it probably has more years of data than really needed 
    # extract just the needed ones
    # Build up list of column names in required order
    column_list=["Module Name","Module Code"] # these are standard
    # now generate the years required
    for year in list(range(A_year,A_year-nyears,-1)):
        list_item=str(year*1)
        column_list.append(list_item)
    print(column_list)
    df3=df2.reindex(columns=column_list)
    return (df3) # outputs pretty Jupyter table

I call it like this: yearTable2(filename='Test.xlsx',SheetName='PartC',debug=True)

where Test.xlsx is a sample file with the following contents:

|Module Code|Module Name|2013|2014|2015|2016|2017|2018|
______________________________________________________
|abc        |Harry      | 23 | 45 | 32 | 54 | 56 | 12 |
|fgr        |Jannice    | 28 | 65 | 21 | 34 | 21 | 54 |

I expect to get the following columns: Module Name, Module Code, 2018, 2017, 2016

The first two columns are fine, but the numeric (year) columns contain only NaN:

    Module Name Module Code 2018    2017    2016
0   Harry        abc         NaN    NaN     NaN
1   Jannice      fgr         NaN    NaN     NaN

1 Answer:

Answer 0 (score: 0)

Instead of:

list_item=str(year*1)

write:

list_item=year
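The likely reason this works: when the header cells in the sheet are stored as numbers, pandas reads the year columns with integer labels (2018), not string labels ("2018"). reindex then finds no column named "2018" and fills it with NaN. The sketch below illustrates this under that assumption, using an in-memory frame as a stand-in for Test.xlsx. The helper name select_recent_years is hypothetical. It converts all labels to strings first, so the lookup works whether the headers were parsed as integers or as text.

```python
import pandas as pd

# Stand-in for the parsed sheet; assumes the year headers were read as integers.
df = pd.DataFrame(
    {
        "Module Code": ["abc", "fgr"],
        "Module Name": ["Harry", "Jannice"],
        2016: [54, 34],
        2017: [56, 21],
        2018: [12, 54],
    }
)

# Diagnosis: the integer label is present, the string label is not.
print(2018 in df.columns, "2018" in df.columns)


def select_recent_years(frame, a_year=2018, nyears=3):
    """Hypothetical helper: keep name/code columns plus the last nyears years."""
    # Convert every column label to str so int and str headers behave the same.
    frame = frame.rename(columns=str)
    years = [str(y) for y in range(a_year, a_year - nyears, -1)]
    return frame.reindex(columns=["Module Name", "Module Code"] + years)


result = select_recent_years(df)
print(result)
```

Printing df.columns.tolist() on the real data is a quick way to confirm which type the labels actually have before choosing between integer and string labels.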