UnicodeDecodeError when converting an .xls file to .xlsx with Python xlrd

Asked: 2019-08-06 18:38:04

Tags: python-3.x openpyxl xlrd

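I am working on automating our monthly reporting process, which starts from a web page. The download arrives in .xls format, and I am trying to convert it to .xlsx so that I can manipulate it with openpyxl. The code downloads the Excel file to my computer, but I cannot open it with either openpyxl or xlrd because of a UnicodeDecodeError.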

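After reading this thread at Github, I tried opening the file manually and re-running the code, and the file then opened successfully. However, as the poster in that thread points out, having to open the file by hand defeats the purpose of automating the process. Does anyone know how I can get around this?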

Here is the code that raises the error:

import datetime
import os

import xlrd
from openpyxl import Workbook

# assumed definition; the original snippet did not show where `today` comes from
today = datetime.date.today()
filePath = r'C:\Users\Daly_Llama'
# os.path.join supplies the path separator that plain string concatenation left out
downloadName = os.path.join(filePath, "All Endpoints and MCUs   " + today.strftime("%Y%m%d") + '.xls')

# open_xls_as_xlsx function adaptation, original code by Ray at https://stackoverflow.com/questions/9918646/how-to-convert-xls-to-xlsx
def open_xls_as_xlsx(filename):
    # open xls file using xlrd
    xlsBook = xlrd.open_workbook(filename)
    index = 0
    nrows, ncols = 0, 0
    # advance to the first sheet that actually contains data
    while nrows * ncols == 0:
        xlsSheet = xlsBook.sheet_by_index(index)
        nrows = xlsSheet.nrows
        ncols = xlsSheet.ncols
        index += 1
    # prepare a xlsx sheet
    xlsxBook = Workbook()
    xlsxSheet = xlsxBook.active
    for row in range(nrows):
        for col in range(ncols):
            # openpyxl cells are 1-based, xlrd cells are 0-based
            xlsxSheet.cell(row=row + 1, column=col + 1).value = xlsSheet.cell_value(row, col)
    return xlsxBook

workbook = open_xls_as_xlsx(downloadName)

Here is the error I receive:

Traceback (most recent call last):
  File "C:\Users\Me\MonthlyReport.py", line 100, in <module>
    workbook = open_xls_as_xlsx(downloadName)
  File "C:\Users\Me\MonthlyReport.py", line 81, in open_xls_as_xlsx
    xlsBook = xlrd.open_workbook(filename)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 117, in open_workbook_xls
    bk.parse_globals()
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 1227, in parse_globals
    self.handle_writeaccess(data)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\book.py", line 1192, in handle_writeaccess
    strg = unpack_unicode(data, 0, lenlen=2)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\biffh.py", line 284, in unpack_unicode
    strg = unicode(rawstrg, 'utf_16_le')
  File "C:\Program Files\Python37\lib\site-packages\xlrd\timemachine.py", line 31, in <lambda>
    unicode = lambda b, enc: b.decode(enc)
  File "C:\Program Files\Python37\lib\encodings\utf_16_le.py", line 16, in decode
    return codecs.utf_16_le_decode(input, errors, True)
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x20 in position 108: truncated data
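
The traceback shows that the failure happens inside handle_writeaccess in xlrd's book.py, before any sheet data is read. One idea that is sometimes tried in this situation is to make that single step tolerant of the decode error. The sketch below is not from the original post and has not been verified against this particular file: the helper name skip_on_decode_error is hypothetical, and it assumes that xlrd.book.Book is the class owning handle_writeaccess and that nothing later in the script needs the value that method would have stored.

```python
import functools


def skip_on_decode_error(cls, method_name):
    """Replace cls.method_name with a wrapper that swallows UnicodeDecodeError."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        try:
            return original(self, *args, **kwargs)
        except UnicodeDecodeError:
            # Assumption: the record being parsed is not needed afterwards,
            # so it is skipped instead of aborting the whole load.
            return None

    setattr(cls, method_name, wrapper)


try:
    # Assumption: xlrd is installed and exposes Book.handle_writeaccess,
    # as the traceback above suggests.
    import xlrd.book
    skip_on_decode_error(xlrd.book.Book, "handle_writeaccess")
except (ImportError, AttributeError):
    pass
```

The patch has to run before xlrd.open_workbook is called. It changes xlrd's behaviour for the whole process, so it is best treated as a stopgap rather than a fix.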

1 Answer:

Answer 0 (score: 1)

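The workaround at this link is still the only viable solution I have found. I added an input command that pauses execution until the file has been opened manually, after which the script can continue.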
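
A minimal sketch of what such a pause might look like is below. The answer does not show its code, so the function name wait_for_manual_open, the prompt wording, and the file-existence check are all assumptions made for illustration.

```python
import os


def wait_for_manual_open(path, prompt=input):
    """Block until the user confirms the file at `path` was opened by hand.

    `prompt` defaults to the built-in input(); it is a parameter only so the
    pause can be replaced in automated runs or tests (an assumption, not part
    of the original answer).
    """
    prompt(f"Open '{path}' manually, then press Enter to continue... ")
    if not os.path.isfile(path):
        # Fail early with a clear error instead of a confusing one later on.
        raise FileNotFoundError(path)
    return path
```

In the script from the question, this would be called on downloadName just before open_xls_as_xlsx(downloadName), so the conversion only starts once the user has pressed Enter.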