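I am trying to read many Excel files in XLS format with Python and extract information from them. When I run my code, I get the following warnings and error: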
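WARNING *** file size (89002) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x20 in position 108: truncated data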
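Interestingly, once I open a file manually and then run the code, it executes fine.
Since there are about 500 files in the folder, I would like to find the cause of the error so that I can automate the process without having to open each file. Any help would be much appreciated!
(Below is an example of the XLS file type.)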
https://www.dropbox.com/s/w2r8br0nblbbr0x/A1-1a105800.XLS?dl=1
import glob
import os

import xlrd

data_year = 2007
path = 'C:/Users/hard1/Desktop/CRA/' + str(data_year)

# Collect every .xls file in the folder for the chosen year
filenames = []
#count = 0
for filename in glob.glob(os.path.join(path, '*.xls')):
    #print(filename)
    #count = count+1
    filenames.append(filename)
#print(count)

# Output lists, filled by the extraction code
respondent_id = []
bank_name = []
loan_amount = []
state = []
year = []

for filename in filenames:
    print(filename)
    # wb = xlrd.open_workbook(filename, encoding_override="utf_16_le")
    wb = xlrd.open_workbook(filename)
    sheet = wb.sheet_by_index(0)
    # Column M is index 12 (0-based); row index 2 is the third row
    msa_string = sheet.cell(2, 12).value
    # The state code is the last two characters of the MSA string
    state_string = msa_string[-2:]
    col_id = sheet.col_values(5)
    col_bank = sheet.col_values(0)
    col_loan = sheet.col_values(23)
    ### And then code that extracts information from the files follows
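
As a possible diagnostic step (a minimal sketch, not a fix for the underlying error), the open call could be wrapped so that one unreadable file does not stop the whole batch, and the failing file names are collected together with their exceptions. The names `try_open_all` and `opener` are hypothetical and not part of xlrd; in the script above, `opener` would presumably be `xlrd.open_workbook`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def try_open_all(
    filenames: Iterable[str],
    opener: Callable[[str], object],
) -> Tuple[Dict[str, object], List[Tuple[str, str]]]:
    """Call `opener` on every file name; return (opened, failures).

    `opened` maps file name -> whatever `opener` returned.
    `failures` is a list of (file name, "ExceptionType: message") pairs.
    """
    opened: Dict[str, object] = {}
    failures: List[Tuple[str, str]] = []
    for filename in filenames:
        try:
            opened[filename] = opener(filename)
        except Exception as exc:
            # Deliberately broad: the goal is only to record the failure
            # and continue with the remaining files.
            failures.append((filename, f"{type(exc).__name__}: {exc}"))
    return opened, failures
```

With the list built earlier, this might be called as `opened, failures = try_open_all(filenames, xlrd.open_workbook)`. Printing `failures` afterwards would show whether all 500 files fail in the same way or only some of them.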