UnicodeDecodeError when reading XLS files

Asked: 2019-08-11 21:06:57

Tags: python parsing

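I am trying to use Python to read a large number of Excel files in XLS format and extract information from them. When I run my code, I get the following warnings and error: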

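WARNING *** file size (89002) not 512 + multiple of sector size (512)

WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero

UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x20 in position 108: truncated data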
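
To illustrate what these messages mean (not necessarily why these particular files trigger them), the sketch below reproduces the same kind of decode error with a hand-made byte string and redoes the size arithmetic from the first warning. The byte string and the helper name `size_remainder` are illustrative assumptions and are not taken from xlrd.

```python
# UTF-16-LE uses two bytes per code unit, so an odd-length byte string
# ends in a dangling byte that cannot be decoded.
sample = b"A\x00 "  # 'A' followed by a lone 0x20 byte (made-up data)

decode_error = None
try:
    sample.decode("utf-16-le")
except UnicodeDecodeError as exc:
    decode_error = exc
    print(exc)


def size_remainder(file_size, sector_size=512):
    """Bytes left over after a 512-byte header and whole sectors."""
    return (file_size - 512) % sector_size


# A non-zero remainder means the file does not end on a sector boundary,
# which is what the first warning reports for a size of 89002 bytes.
remainder = size_remainder(89002)
print(remainder)
```

Both symptoms are consistent with a file whose contents stop partway through a unit, although that alone does not show how the files ended up that way.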

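Interestingly, if I first open a file manually in Excel and then run the code, it executes without any problem.

There are about 500 files in the folder, so I would like to find the cause of the error so that I can automate the process instead of opening every file by hand. Any help would be greatly appreciated!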

(An example of one of the XLS files is linked below.)

https://www.dropbox.com/s/w2r8br0nblbbr0x/A1-1a105800.XLS?dl=1

import glob
import os

import xlrd

data_year = 2007
path = 'C:/Users/hard1/Desktop/CRA/' + str(data_year)

# Collect every .xls file in the folder for the given year.
filenames = glob.glob(os.path.join(path, '*.xls'))

# Output columns, filled in by the extraction code further down.
respondent_id = []
bank_name = []
loan_amount = []
state = []
year = []

for filename in filenames:
    print(filename)
    # wb = xlrd.open_workbook(filename, encoding_override="utf_16_le")
    wb = xlrd.open_workbook(filename)

    sheet = wb.sheet_by_index(0)

    # Column M is column index 12 (zero-based).
    msa_string = sheet.cell(2, 12).value
    # The state abbreviation is the last two characters of the MSA string.
    state_string = msa_string[-2:]

    col_id = sheet.col_values(5)
    col_bank = sheet.col_values(0)
    col_loan = sheet.col_values(23)

    # ... the code that extracts information from the files follows
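
One way to find out which of the roughly 500 files fail, without opening each one by hand, is to wrap the open call and record the exceptions. The sketch below uses a hypothetical helper, `scan_workbooks`, which is not part of xlrd. The opener is passed in as a parameter so that `xlrd.open_workbook` can be supplied in the real script, and `fake_opener` is a stand-in used only for the demonstration.

```python
def scan_workbooks(paths, opener):
    """Try to open every path and return (opened, failed).

    `opener` is any callable that takes a path, e.g. xlrd.open_workbook.
    `failed` maps each unreadable path to the exception it raised.
    """
    opened, failed = [], {}
    for p in paths:
        try:
            opener(p)
        except Exception as exc:  # broad on purpose: this is a diagnostic pass
            failed[p] = exc
        else:
            opened.append(p)
    return opened, failed


# Stand-in opener (hypothetical): fails the same way for one made-up file name.
def fake_opener(path):
    if path.endswith("bad.xls"):
        b"A\x00 ".decode("utf-16-le")  # raises UnicodeDecodeError
    return path


opened, failed = scan_workbooks(["good.xls", "bad.xls"], fake_opener)
print(opened)
for path, exc in failed.items():
    print(path, type(exc).__name__)
```

In the script above this would be called as `scan_workbooks(filenames, xlrd.open_workbook)`, which would at least show whether all files fail or only some of them.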

0 answers:

No answers yet.