I am trying to read a large amount of data (thousands of rows) from CSV files with a Python script. The data looks like this:
.....
2015-11-03 20:16:28,000;63,62;
2015-11-03 20:16:29,000;63,75;
2015-11-03 20:16:30,000;63,86;
2015-11-03 20:16:31,000;64,25;
But it seems that one of the files has an extra blank line containing 196541465 spaces, and the code crashes when it reads that file with pandas' read_csv:
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 4221, in append
elif isinstance(other, list) and not isinstance(other[0], DataFrame):
IndexError: list index out of range
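To confirm which line is the problem, one option is to scan the file for lines that are not empty but contain only whitespace. This is a minimal sketch; the helper name find_whitespace_only_lines and the sample data are hypothetical. Line numbers are 1-based, so they match the "line 8192" counting below. Note that iterating over a file still builds each line as one string, so a line of ~196 million spaces costs roughly that many bytes of memory while it is being checked.

```python
import io


def find_whitespace_only_lines(f):
    """Return the 1-based numbers of lines in the text stream `f` that are
    non-empty but contain nothing except whitespace."""
    hits = []
    for lineno, line in enumerate(f, start=1):
        # Drop the line terminator first so truly empty lines are not reported
        content = line.rstrip("\r\n")
        if content and not content.strip():
            hits.append(lineno)
    return hits


# Hypothetical sample: line 3 is whitespace-only, line 4 is truly empty
rows = [
    "2015-11-03 20:16:28,000;63,62;",
    "2015-11-03 20:16:29,000;63,75;",
    " " * 40,
    "",
    "2015-11-03 20:16:30,000;63,86;",
]
hits = find_whitespace_only_lines(io.StringIO("\n".join(rows) + "\n"))
print(hits)
```

For a real file, pass an open handle instead, e.g. `with open(path, encoding="utf-8") as f: find_whitespace_only_lines(f)` (the encoding is an assumption).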
I am using the following command:
data = pd.read_csv(input_file,
                   skiprows=[0],
                   usecols=[0, 1, 2],
                   delimiter=';',
                   decimal=',',
                   names=['date', 'angle', 'Unnamed'],
                   na_filter=False,
                   parse_dates=[0],
                   date_parser=reformat_date,
                   error_bad_lines=False,
                   skip_blank_lines=True)  # , nrows=8191)
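The culprit is line 8192: when I limit the number of rows (nrows = 8191), it works fine. I have tried many of the options from the documentation, but nothing seems to work! Any ideas?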
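One workaround, rather than relying on how skip_blank_lines treats a line that contains only spaces, is to drop whitespace-only lines yourself before pandas parses anything. This is a minimal sketch, assuming the file fits in memory; the helper name read_csv_skipping_whitespace_lines and the sample rows are hypothetical, and the read_csv arguments only loosely mirror the question's command (no header row, no custom date parser).

```python
import io

import pandas as pd


def read_csv_skipping_whitespace_lines(f, **read_csv_kwargs):
    """Read delimited text from the stream `f`, discarding lines that are
    empty or whitespace-only, then hand the rest to pd.read_csv."""
    # Keep only lines with at least one non-whitespace character
    cleaned = "".join(line for line in f if line.strip())
    return pd.read_csv(io.StringIO(cleaned), **read_csv_kwargs)


# Hypothetical sample shaped like the question's data
rows = [
    "2015-11-03 20:16:28,000;63,62;",
    "2015-11-03 20:16:29,000;63,75;",
    " " * 50,  # stand-in for the huge whitespace-only line
    "2015-11-03 20:16:30,000;63,86;",
]
sample = "\n".join(rows) + "\n"

df = read_csv_skipping_whitespace_lines(
    io.StringIO(sample),
    sep=";",
    decimal=",",  # "63,62" -> 63.62
    header=None,
    names=["date", "angle", "unused"],  # third field is empty on every row
)
print(df)
```

With a real file, open it in text mode and pass the handle in place of io.StringIO(sample).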
Answer 0 (score: 3)
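I got this error because I was trying to read a CSV file whose header had too few names compared with the number of columns (e.g. 10 columns but only 8 headers). The fix was to set index_col=False; otherwise pandas doesn't know how to handle the extra columns.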
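The behavior this answer refers to can be seen with a small example. When data rows have more fields than the header has names, pandas uses the leading extra field(s) as the index; index_col=False tells it not to. This sketch uses the trailing-delimiter case (exactly one extra, empty field per row) from the pandas IO documentation; with several extra fields the outcome may differ, as Answer 1 reports.

```python
import io

import pandas as pd

# Header has 3 names, but each data row has 4 fields (trailing delimiter)
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default: the first field of each row becomes the index, shifting the columns
df_default = pd.read_csv(io.StringIO(data))

# index_col=False: keep the first field as data in column "a"
df_no_index = pd.read_csv(io.StringIO(data), index_col=False)

print(df_default)
print(df_no_index)
```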
Answer 1 (score: 2)
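I had the same problem, and index_col=False did not work. I had 19 columns but only 17 headers. The workaround was to read the header and the data separately and then attach the header names.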
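import pandas as pd

# Read only the header row to get the 17 column names
dfcolumns = pd.read_csv('file.csv', nrows=1)
# Read the data without a header, keep the first 17 of the 19 columns,
# and attach the header names read above
df = pd.read_csv('file.csv',
                 header=None,
                 skiprows=1,
                 usecols=list(range(17)),
                 names=dfcolumns.columns)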
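The same workaround can be generalized so the column count comes from the header instead of being hard-coded as 17. This is a sketch under assumptions: the helper name read_with_short_header and the sample are hypothetical, and it parses the header row with Python's csv module instead of a first pd.read_csv(nrows=1) call.

```python
import csv
import io

import pandas as pd


def read_with_short_header(text, sep=","):
    """Read CSV text whose header row has fewer names than the data rows
    have fields, keeping only the first len(header) columns."""
    # Parse just the header row to get the column names
    names = next(csv.reader(io.StringIO(text), delimiter=sep))
    # Skip the header, select as many leading columns as there are names,
    # and attach the names explicitly
    return pd.read_csv(
        io.StringIO(text),
        sep=sep,
        header=None,
        skiprows=1,
        usecols=list(range(len(names))),
        names=names,
    )


# Hypothetical sample: 2 header names, 3 fields per data row
sample = "a,b\n1,2,3\n4,5,6\n"
df = read_with_short_header(sample)
print(df)
```

Passing usecols with exactly as many positions as there are names keeps pandas from treating the surplus fields as an index; the extra trailing columns are simply not loaded.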
Answer 2 (score: -2)
Don't forget to check the pd.read_csv options with ?pd.read_csv in IPython:
Signature: pd.read_csv(
    filepath_or_buffer: Union[str, pathlib.Path, IO[~AnyStr]],
    sep=',',
    delimiter=None,
    header='infer',
    names=None,
    index_col=None,
    usecols=None,
    squeeze=False,
    prefix=None,
    mangle_dupe_cols=True,
    dtype=None,
    engine=None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    skip_blank_lines=True,
    parse_dates=False,
    infer_datetime_format=False,
    keep_date_col=False,
    date_parser=None,
    dayfirst=False,
    cache_dates=True,
    iterator=False,
    chunksize=None,
    compression='infer',
    thousands=None,
    decimal: str = '.',
    lineterminator=None,
    quotechar='"',
    quoting=0,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    dialect=None,
    error_bad_lines=True,
    warn_bad_lines=True,
    delim_whitespace=False,
    low_memory=True,
    memory_map=False,
    float_precision=None,
)
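The ?pd.read_csv syntax only works in IPython or Jupyter. In a plain Python script, inspect.signature reports the same parameters and their defaults for whatever pandas version is installed. That matters here: the list above comes from an older release, and error_bad_lines/warn_bad_lines were deprecated in pandas 1.3 (in favor of on_bad_lines) and removed in 2.0. This is a small sketch; the tuple of option names printed is just an illustrative selection.

```python
import inspect

import pandas as pd

# Plain-Python equivalent of IPython's `?pd.read_csv` for listing parameters
params = inspect.signature(pd.read_csv).parameters

for name in ("skip_blank_lines", "skiprows", "nrows", "usecols", "index_col"):
    # Print each option with the default reported by the installed pandas
    print(name, "=", repr(params[name].default))

# Check whether the installed version still accepts error_bad_lines
print("error_bad_lines supported:", "error_bad_lines" in params)
```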