pandas read_csv: dtype has no effect on a column used as index_col

Time: 2017-07-15 17:01:54

Tags: python-3.x pandas

I have the following data:

PERMNO Names,Date,Ticker Symbol,Company Name,CUSIP Header
10000,19851231,,,68391610
10000,19860331,OMFGA,OPTIMUM MANUFACTURING INC,68391610
10001,19851231,,,36720410
10001,19860131,GFGC,GREAT FALLS GAS CO,36720410
10001,19860228,GFGC,GREAT FALLS GAS CO,36720410

I am running this command:

import pandas as pd

# csv_file_path points to the CSV file shown above
actual = pd.read_csv(csv_file_path, index_col=["CUSIP Header"],
                     dtype={"CUSIP Header": str}, usecols=["Date", "CUSIP Header"],
                     parse_dates=["Date"])

However, it seems that "CUSIP Header" is not parsed as str but as a float. Indeed, when I try to call

print (actual.xs("68391610"))

I get a KeyError.
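To see why the lookup fails, it can help to inspect the index dtype directly. The sketch below is only an illustration: it reads the sample data from an in-memory string (`csv_text` is a stand-in for the real file) and leaves `dtype` out on purpose, so the column is inferred as numeric. That mimics what happens when the dtype request is ignored, regardless of pandas version.

```python
import io

import pandas as pd

# Stand-in for the real file: the sample rows from the question.
csv_text = """PERMNO Names,Date,Ticker Symbol,Company Name,CUSIP Header
10000,19851231,,,68391610
10000,19860331,OMFGA,OPTIMUM MANUFACTURING INC,68391610
10001,19851231,,,36720410
10001,19860131,GFGC,GREAT FALLS GAS CO,36720410
10001,19860228,GFGC,GREAT FALLS GAS CO,36720410
"""

# dtype is deliberately omitted, so the index column is inferred as numeric.
actual = pd.read_csv(io.StringIO(csv_text), index_col=["CUSIP Header"],
                     usecols=["Date", "CUSIP Header"], parse_dates=["Date"])

# A numeric dtype here explains the KeyError for a string key.
print(actual.index.dtype)

try:
    actual.xs("68391610")  # string key against a numeric index
    lookup_failed = False
except KeyError:
    lookup_failed = True
print("string lookup failed:", lookup_failed)

# The same label as a number is found.
rows = actual.xs(68391610)
print(rows)
```

If `actual.index.dtype` in the real script is numeric even though `dtype` was passed, the dtype setting did not reach the index column.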

1 answer:

Answer 0: (score: 1)

This is bug 9435, so remove the index_col parameter and use set_index instead:

df = pd.read_csv(csv_file_path,
            dtype = {'CUSIP Header': str}, usecols =["Date", "CUSIP Header"], 
            parse_dates=['Date']).set_index('CUSIP Header')

print (df)
                   Date
CUSIP Header           
68391610     1985-12-31
68391610     1986-03-31
36720410     1985-12-31
36720410     1986-01-31
36720410     1986-02-28

print (df.index)
Index(['68391610', '68391610', '36720410', '36720410', '36720410'],
       dtype='object', name='CUSIP Header')
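
For anyone who wants to try the workaround without the original file, here is a minimal self-contained sketch. It assumes the sample rows from the question are pasted into a string (`csv_text` is just an illustrative name) and read through `io.StringIO`; everything else follows the answer above.

```python
import io

import pandas as pd

# Stand-in for the real file: the sample rows from the question.
csv_text = """PERMNO Names,Date,Ticker Symbol,Company Name,CUSIP Header
10000,19851231,,,68391610
10000,19860331,OMFGA,OPTIMUM MANUFACTURING INC,68391610
10001,19851231,,,36720410
10001,19860131,GFGC,GREAT FALLS GAS CO,36720410
10001,19860228,GFGC,GREAT FALLS GAS CO,36720410
"""

# Read the column as str first, then promote it to the index afterwards.
df = pd.read_csv(io.StringIO(csv_text),
                 dtype={"CUSIP Header": str},
                 usecols=["Date", "CUSIP Header"],
                 parse_dates=["Date"]).set_index("CUSIP Header")

# The index labels are strings, so a string key now matches.
rows = df.xs("68391610")
print(rows)
```

Because the CUSIP values repeat, `xs` returns every row carrying that label rather than a single row.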