Cannot read a CSV file even though the data looks fine

Time: 2018-12-12 12:46:32

Tags: python pandas

  

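Hi all, I know this has been asked before, and I have tried all of the Stack Overflow answers, but nothing has worked so far. I have written files out of MT5 in CSV format, containing price data. I have checked the structure of the file in Notepad++, and I cannot tell whether there is a strange null value somewhere in the data.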

     

Can someone explain where I am going wrong? The code is as follows:

##Regressions
import pandas as pd;
import numpy as np
import codecs
CurrencyPairs=['AUDUSD','EURCHF','EURGBP','EURJPY','EURUSD','GBPUSD','USDCHF','USDJPY'];
dataFilePath = "H:\\MARKETDATA\\MetaTraderMT5FXPro\\";
nameHeaders = ('Time','Open','High','Low','Close','RealVolume','Spread','TickVolume')

%config IPCompleter.greedy=True

DataTableAudUsd = pd.read_csv(dataFilePath +'fileout_AUDUSD.csv',sep=',', engine='c');

The output from the interpreter is:

--------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-3-3713f1583685> in <module>()
----> 1 DataTableAudUsd = pd.read_csv(dataFilePath +'fileout_AUDUSD.csv',sep=',', engine='c');
      2 #DataTableEurChf = pd.read_csv(dataFilePath+'fileout_EURCHF.csv');
      3 #DataTableEurGbp = pd.read_csv(dataFilePath+'fileout_EURGBP.csv');
      4 #DataTableEurJpy = pd.read_csv(dataFilePath+'fileout_EURJPY.csv');
      5 #DataTableEurUSd = pd.read_csv(dataFilePath+'fileout_EURUSD.csv');

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653                     skip_blank_lines=skip_blank_lines)
    654 
--> 655         return _read(filepath_or_buffer, kwds)
    656 
    657     parser_f.__name__ = name

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    403 
    404     # Create the parser.
--> 405     parser = TextFileReader(filepath_or_buffer, **kwds)
    406 
    407     if chunksize or iterator:

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    762             self.options['has_index_names'] = kwds['has_index_names']
    763 
--> 764         self._make_engine(self.engine)
    765 
    766     def close(self):

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
    983     def _make_engine(self, engine='c'):
    984         if engine == 'c':
--> 985             self._engine = CParserWrapper(self.f, **self.options)
    986         else:
    987             if engine == 'python':

C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   1603         kwds['allow_leading_cols'] = self.index_col is not False
   1604 
-> 1605         self._reader = parsers.TextReader(src, **kwds)
   1606 
   1607         # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas\_libs\parsers.c:6175)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._get_header (pandas\_libs\parsers.c:9691)()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

DataTableAudUsd.head()

As I mentioned, I tried using codecs, but I don't think that is where my problem lies.

A sample of the data:

Time,Open,High,Low,Close,RealVolume,Spread,TickVolume
2010.01.04 00:00:00,0.89938,0.8995300000000001,0.8970900000000001,0.8971100000000001,0.0,30.0,1144.0
2010.01.04 01:00:00,0.89712,0.89795,0.89612,0.8963200000000001,0.0,35.0,1735.0
2010.01.04 02:00:00,0.89634,0.8964500000000001,0.8937200000000001,0.895,0.0,30.0,1771.0
2010.01.04 03:00:00,0.89502,0.89653,0.89502,0.8961300000000001,0.0,35.0,1242.0
2010.01.04 04:00:00,0.8961100000000001,0.8964800000000001,0.8947900000000001,0.8963300000000001,0.0,30.0,663.0
2010.01.04 05:00:00,0.89636,0.89724,0.8960900000000001,0.8968100000000001,0.0,30.0,678.0
2010.01.04 06:00:00,0.8968300000000001,0.8984200000000001,0.8965400000000001,0.8977400000000001,0.0,18.0,949.0
2010.01.04 07:00:00,0.8977100000000001,0.8982500000000001,0.8966900000000001,0.89822,0.0,15.0,1117.0
2010.01.04 08:00:00,0.8982100000000001,0.9006700000000001,0.8975400000000001,0.9004700000000001,0.0,21.0,1846.0
2010.01.04 09:00:00,0.90046,0.9044300000000001,0.9003700000000001,0.9037600000000001,0.0,22.0,2347.0
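The byte 0xff at position 0 in the error above is consistent with a UTF-16 little-endian byte-order mark (the bytes FF FE) at the start of the file. This would explain why the data looks fine in an editor that detects the encoding automatically. Whether this particular file is actually UTF-16 is an assumption, not something the post confirms. The following is a minimal sketch of how the first bytes could be checked; the helper names `guess_encoding_from_bom` and `sniff_file_encoding` are hypothetical and not part of pandas.

```python
import codecs

# Byte-order marks paired with the Python codec name that decodes them.
# UTF-32 is checked first because its little-endian BOM (FF FE 00 00)
# begins with the UTF-16 little-endian BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(head, default="utf-8"):
    """Return a codec name based on the BOM at the start of `head` (bytes).

    Falls back to `default` when no BOM is recognised.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def sniff_file_encoding(path, default="utf-8"):
    """Read the first four bytes of the file at `path` and guess its encoding."""
    with open(path, "rb") as handle:
        return guess_encoding_from_bom(handle.read(4), default)


if __name__ == "__main__":
    # Simulated start of a UTF-16 LE file holding the header row (assumption).
    sample = codecs.BOM_UTF16_LE + "Time,Open".encode("utf-16-le")
    print(guess_encoding_from_bom(sample))
```

If the guess turns out to be right, the returned name could be passed through the `encoding` parameter of `pd.read_csv`, for example `pd.read_csv(path, sep=',', encoding=sniff_file_encoding(path))`, where `path` stands for the full file path built in the code above.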

0 answers:

No answers yet.