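I've been searching Stack Exchange and similar sites for a solution to this problem, and so far I haven't found one.
I'm sure someone has run into this before: I'm writing a Python script that will pull some data out of an Excel file and reshape it. The problem is that the Excel file is riddled with irregular formatting and extraneous data. So before I can get to the data table I actually need:
I have to get through sheets like this:
My plan is to use some kind of regex or string matching to work out where to split the file so I can get what I need. But the problem I'm hitting right now is that pandas freaks out whenever I try to run read_excel on this file.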
In [4]: df = pd.read_excel(open('data.xlsx','rb'), sheetname=0)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-21f5fee2b08d> in <module>()
----> 1 df = pd.read_excel(open('data.xlsx','rb'), sheetname=0)
/Users/Gus/anaconda2/lib/python2.7/site-packages/pandas/io/excel.pyc in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, names, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, squeeze, **kwds)
168 """
169 if not isinstance(io, ExcelFile):
--> 170 io = ExcelFile(io, engine=engine)
171
172 return io._parse_excel(
/Users/Gus/anaconda2/lib/python2.7/site-packages/pandas/io/excel.pyc in __init__(self, io, **kwds)
223 # N.B. xlrd.Book has a read attribute too
224 data = io.read()
--> 225 self.book = xlrd.open_workbook(file_contents=data)
226 elif isinstance(io, compat.string_types):
227 self.book = xlrd.open_workbook(io)
/Users/Gus/anaconda2/lib/python2.7/site-packages/xlrd/__init__.pyc in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
420 formatting_info=formatting_info,
421 on_demand=on_demand,
--> 422 ragged_rows=ragged_rows,
423 )
424 return bk
/Users/Gus/anaconda2/lib/python2.7/site-packages/xlrd/xlsx.pyc in open_workbook_2007_xml(zf, component_names, logfile, verbosity, use_mmap, formatting_info, on_demand, ragged_rows)
831 x12sheet = X12Sheet(sheet, logfile, verbosity)
832 heading = "Sheet %r (sheetx=%d) from %r" % (sheet.name, sheetx, fname)
--> 833 x12sheet.process_stream(zflo, heading)
834 del zflo
835
/Users/Gus/anaconda2/lib/python2.7/site-packages/xlrd/xlsx.pyc in own_process_stream(self, stream, heading)
551 self.do_dimension(elem)
552 elif elem.tag == U_SSML12 + "mergeCell":
--> 553 self.do_merge_cell(elem)
554 self.finish_off()
555
/Users/Gus/anaconda2/lib/python2.7/site-packages/xlrd/xlsx.pyc in do_merge_cell(self, elem)
607 ref = elem.get('ref')
608 if ref:
--> 609 first_cell_ref, last_cell_ref = ref.split(':')
610 first_rowx, first_colx = cell_name_to_rowx_colx(first_cell_ref)
611 last_rowx, last_colx = cell_name_to_rowx_colx(last_cell_ref)
ValueError: need more than 1 value to unpack
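The whole point of writing this program is that I shouldn't have to go into every one of these files and delete the extra information by hand. But how can I automate the process if Python won't even accept the file? I'm hoping someone has run into a similar problem before. What was your solution?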
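For reference, the splitting step I have in mind, once the file actually loads, would look roughly like the sketch below. It assumes the sheet has already been read into a list of rows, and that the real table starts at a row whose first cell matches a known header. The pattern, the function names, and the sample rows are all invented for illustration.

```python
import re

# Hypothetical pattern: assumes the real table begins at a row whose first
# cell reads "Date" or "ID". This would need adjusting to the actual header.
HEADER_PATTERN = re.compile(r"^\s*(date|id)\s*$", re.IGNORECASE)


def find_table_start(rows, pattern=HEADER_PATTERN):
    """Return the index of the first row whose first cell matches, or None."""
    for i, row in enumerate(rows):
        if row and row[0] is not None and pattern.match(str(row[0])):
            return i
    return None


def extract_table(rows, pattern=HEADER_PATTERN):
    """Split rows into (header, body), dropping everything above the header."""
    start = find_table_start(rows, pattern)
    if start is None:
        raise ValueError("no header row matched the pattern")
    return rows[start], rows[start + 1:]


# Made-up sample: a title row and a blank row sit above the table I want.
rows = [
    ["Quarterly report", None, None],
    [None, None, None],
    ["Date", "Region", "Sales"],
    ["2016-01-01", "North", 10],
    ["2016-01-02", "South", 12],
]
header, body = extract_table(rows)
```

The `header` and `body` lists could then be handed to `pd.DataFrame(body, columns=header)`, but none of this helps until the workbook can be opened in the first place.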