Pandas Read_CSV错误地读取数字

时间:2018-01-04 21:42:09

标签: python python-3.x python-2.7 pandas

方法1

def getAndBuildDatafrmeFromCsvBasic(filename):
    colTypes = {'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64', 'Volume': 'float64'}
    dfEurUsd2017 = pd.read_csv(filename, delimiter=",", index_col='Gmt time', dtype=colTypes, parse_dates=['Gmt time'])
    return dfEurUsd2017

方法2

def getAndBuildDatafrmeFromCsv(filename):
    df = pd.read_csv(filename, header=None)
    df.columns = ['date', 'Open', 'High', 'Low', 'Close', 'Volume']
    df.date = pd.to_datetime(df.date, format='%d.%m.%Y %H:%M:%S.%f')
    df.index = df['date']
    df = df[['Open', 'High', 'Low', 'Close', 'Volume']]
    return df

结果方法1

                        Open     High      Low    Close   Volume
Gmt time                                                        
2017-12-04 23:00:00  1.06672  1.06699  1.06636  1.06698  1889.56

结果方法2

                        Open     High      Low    Close   Volume
Gmt time                                                        
2017-12-04 23:00:00  1.18686  1.18699  1.18666  1.18682  2004.46

为什么method1错误地解析了Open,High,Low,Close,Volume的值? Method2生成正确的输出。我担心为什么两种方法输出完全不同的数值,即使体积不同。但是csv文件是一样的。

CSV行

04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
05.12.2017 00:30:00.000,1.18653,1.18726,1.18650,1.18709,3567.2400000000007
05.12.2017 01:00:00.000,1.18708,1.18750,1.18707,1.18738,3105.4699999999993
05.12.2017 01:30:00.000,1.18738,1.18744,1.18691,1.18732,3561.5
05.12.2017 02:00:00.000,1.18732,1.18766,1.18704,1.18740,2706.6400000000003

1 个答案:

答案 0 :(得分:1)

我添加了dayfirst = True,你的代码运行正常。

您使用的是什么熊猫版本?这些有价值的数据来自哪里?

import pandas as pd

data = '''\
Gmt time,Open,High,Low,Close,Volume
04.12.2017 23:00:00.000,1.18686,1.18699,1.18666,1.18682,2004.4599999999998
04.12.2017 23:30:00.000,1.18682,1.18706,1.18652,1.18681,1242.68
05.12.2017 00:00:00.000,1.18681,1.18691,1.18639,1.18653,2666.81
05.12.2017 00:30:00.000,1.18653,1.18726,1.18650,1.18709,3567.2400000000007
05.12.2017 01:00:00.000,1.18708,1.18750,1.18707,1.18738,3105.4699999999993
05.12.2017 01:30:00.000,1.18738,1.18744,1.18691,1.18732,3561.5
05.12.2017 02:00:00.000,1.18732,1.18766,1.18704,1.18740,2706.6400000000003
'''

with open('test.csv', 'w') as f:
    f.write(data)

def getAndBuildDatafrmeFromCsvBasic(filename):
    colTypes = {'Open': 'float64', 'High': 'float64', 'Low': 'float64', 'Close': 'float64', 'Volume': 'float64'}
    dfEurUsd2017 = pd.read_csv(filename, delimiter=",", index_col='Gmt time', dtype=colTypes, parse_dates=['Gmt time'], dayfirst=True)
    return dfEurUsd2017

print(getAndBuildDatafrmeFromCsvBasic('test.csv'))

返回:

                        Open     High      Low    Close   Volume
Gmt time                                                        
2017-12-04 23:00:00  1.18686  1.18699  1.18666  1.18682  2004.46
2017-12-04 23:30:00  1.18682  1.18706  1.18652  1.18681  1242.68
2017-12-05 00:00:00  1.18681  1.18691  1.18639  1.18653  2666.81
2017-12-05 00:30:00  1.18653  1.18726  1.18650  1.18709  3567.24
2017-12-05 01:00:00  1.18708  1.18750  1.18707  1.18738  3105.47
2017-12-05 01:30:00  1.18738  1.18744  1.18691  1.18732  3561.50
2017-12-05 02:00:00  1.18732  1.18766  1.18704  1.18740  2706.64