I need to convert XLS files to CSV so that I can load the data into a PostgreSQL database. I use the following code for the conversion:
import xlrd
import unicodecsv

def xls2csv(xls_filename, csv_filename):
    # Converts an Excel file to a CSV file.
    # If the excel file has multiple worksheets, only the first worksheet is converted.
    # Uses unicodecsv, so it will handle Unicode characters.
    # Uses a recent version of xlrd, so it should handle old .xls and new .xlsx equally well.
    wb = xlrd.open_workbook(xls_filename)
    sh = wb.sheet_by_index(0)
    fh = open(csv_filename, "wb")
    csv_out = unicodecsv.writer(fh, encoding='utf-8')
    for row_number in xrange(sh.nrows):
        csv_out.writerow(sh.row_values(row_number))
    fh.close()
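The XLS file I am using has 212 columns and at least 100 rows. The code works fine when I test it with only 4 rows, but when nrows > 5 the interpreter raises the following error: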
xls2csv ('e:/t.xls', 'e:/wh.csv')
WARNING *** file size (353829) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
*** No CODEPAGE record, no encoding_override: will use 'ascii'
*** No CODEPAGE record, no encoding_override: will use 'ascii'
Traceback (most recent call last):
File "<ipython-input-14-ccae93f2d633>", line 1, in <module>
xls2csv ('e:/t.xls', 'e:/wh.csv')
File "C:/Users/hey/.spyder/temp.py", line 10, in xls2csv
wb = xlrd.open_workbook(xls_filename)
File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\__init__.py", line 441, in open_workbook
ragged_rows=ragged_rows,
File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 119, in open_workbook_xls
bk.get_sheets()
File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 678, in get_sheets
self.get_sheet(sheetno)
File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\book.py", line 669, in get_sheet
sh.read(self)
File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\sheet.py", line 804, in read
strg = unpack_string(data, 6, bk.encoding or bk.derive_encoding(), lenlen=2)
File "C:\Users\hey\Anaconda2\lib\site-packages\xlrd\biffh.py", line 269, in unpack_string
return unicode(data[pos:pos+nchars], encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb2 in position 2: ordinal not in range(128)
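Answer 0 (score: 1)
It looks like the error is not caused by the number of rows but by a problem handling non-ASCII characters in the source file.
I suggest you try Pandas: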
import pandas as pd
df = pd.read_excel('input.xls')
df.to_csv('output.csv', encoding='utf-8')
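Note (you did not elaborate on the Postgres part) that if this is the first step in getting the data into Postgres, then once the data is loaded into a Pandas DataFrame you can send it straight to Postgres.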
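As a rough sketch of that last step, the snippet below writes a DataFrame to a SQL table with DataFrame.to_sql. The helper name frame_to_table, the table name and the sample columns are made up for illustration, and an in-memory SQLite database stands in for Postgres so the example runs without a server. For Postgres you would presumably pass a SQLAlchemy engine instead of the SQLite connection.

```python
import sqlite3

import pandas as pd


def frame_to_table(df, table_name, connection):
    """Write a DataFrame to a SQL table, replacing the table if it exists."""
    # index=False keeps the DataFrame's row index out of the table.
    df.to_sql(table_name, connection, index=False, if_exists="replace")


# Stand-in for pd.read_excel('input.xls'); these columns are hypothetical.
df = pd.DataFrame({"name": ["caf\u00e9", "m\u00b2"], "value": [1, 2]})

# SQLite is used here only so the sketch is runnable anywhere.
# For Postgres, an engine such as
#   sqlalchemy.create_engine("postgresql://user:password@localhost/mydb")
# (placeholder credentials) would take the place of this connection.
conn = sqlite3.connect(":memory:")
frame_to_table(df, "wh", conn)
print(conn.execute("SELECT name, value FROM wh").fetchall())
conn.close()
```

Going through to_sql skips the intermediate CSV file entirely, which may or may not be what you want if the CSV is also needed elsewhere.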
Answer 1 (score: 1)
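There is a decoding problem when opening the xls file. I suspect that row 5 of the xls file contains special characters. Based on the xlrd documentation, you can override the encoding that xlrd uses to convert the text to Unicode (the warning in the output above mentions encoding_override).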
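A minimal sketch of that idea follows. It first shows why the byte from the traceback (0xb2) fails under ASCII but decodes under some single-byte codecs, then passes the chosen codec to xlrd.open_workbook through its encoding_override argument. The helper names and the choice of cp1252 are assumptions for illustration; the right codec depends on the program and locale that produced the file.

```python
# Byte reported in the traceback; ASCII only covers 0x00-0x7f.
BAD_BYTE = b"\xb2"


def try_codecs(raw, candidates=("ascii", "cp1252", "latin-1")):
    """Return {codec: decoded text, or None if the codec rejects the bytes}."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


def open_workbook_as(xls_filename, encoding="cp1252"):
    """Open an .xls file, telling xlrd which codec to use for byte strings."""
    # Imported here so try_codecs works even without xlrd installed.
    import xlrd
    # cp1252 is only a guess; substitute the codec that matches your data.
    return xlrd.open_workbook(xls_filename, encoding_override=encoding)


print(try_codecs(BAD_BYTE))
```

In xls2csv, the call xlrd.open_workbook(xls_filename) would then become open_workbook_as(xls_filename, "cp1252"), or whichever codec turns out to match the file.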