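I am trying to read an Excel file in .xlsx format with the csv module, but I have had no luck with Excel files, even when I specify a dialect and an encoding. Below are my attempts with the different encodings I tried, along with the resulting errors. If anyone can point me to the correct encoding, syntax, or module for reading .xlsx files in Python, I'd appreciate it.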
With the following code, I get this error: _csv.Error: line contains NULL byte
#!/usr/bin/python
import sys, csv
with open('filelocation.xlsx', "r+", encoding="Latin1") as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)
With the following code, I get this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 16: invalid continuation byte
#!/usr/bin/python
import sys, csv
with open('filelocation.xlsx', "r+", encoding="utf-8") as inputFile:
    csvReader = csv.reader(inputFile, dialect='excel')
    for row in csvReader:
        print(row)
When I use utf-16 as the encoding, I get the following error: UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 570-571: illegal UTF-16 surrogate
Answer 0 (score: 17)
You cannot use Python's csv library for reading xlsx formatted files. You need to install and use a different library. For example, you could use xlrd as follows:
import xlrd
workbook = xlrd.open_workbook("filelocation.xlsx")
sheet = workbook.sheet_by_index(0)
for rowx in range(sheet.nrows):
    cols = sheet.row_values(rowx)
    print(cols)
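This would display all of the rows in the file as lists of columns. Note that xlrd 2.0 and later only read the legacy .xls format, so this snippet needs an xlrd version older than 2.0; openpyxl is the common choice for .xlsx today. The Python Excel website gives other possible examples.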
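The reason every csv attempt in the question fails is that an .xlsx file is not delimited text at all: it is a ZIP archive of XML parts (which is why the standard-library answers below open it with zipfile). The zero bytes in the ZIP headers are what triggered "line contains NULL byte", and the compressed data is not valid UTF-8 or UTF-16, hence the decode errors. A minimal sketch that checks this with the standard library; the helper name `looks_like_xlsx` is hypothetical, and treating `xl/workbook.xml` as the marker member is an assumption based on where Excel places the workbook part by convention.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP package that contains an Excel workbook part.

    Hypothetical helper: it only inspects the container, not the XML inside.
    """
    if not zipfile.is_zipfile(path):
        return False  # plain text files such as real CSVs end up here
    with zipfile.ZipFile(path) as z:
        names = set(z.namelist())
    # Every Office Open XML package has [Content_Types].xml at its root;
    # Excel conventionally stores the workbook part at xl/workbook.xml.
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names
```

Calling `looks_like_xlsx('filelocation.xlsx')` on the file from the question should return True, which confirms it has to be opened as a ZIP/XML package (or with a spreadsheet library) rather than as text.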
Answer 1 (score: 3)
Here is a very, very rough implementation that uses only the standard library.
def xlsx(fname, sheet=1):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    # Shared strings table: cells with t="s" store an index into this list
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    # iterparse yields each element when its closing tag has been read
    for e, el in iterparse(z.open('xl/worksheets/sheet%s.xml' % sheet)):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            # Drop the row number from the cell reference, keeping the column letters
            column_name = ''.join(x for x in el.attrib['r'] if not x.isdigit())  # AZ22
            row[column_name] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows
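One weak spot in the function above is the `strings` line: it collects every `<t>` element in xl/sharedStrings.xml as a separate entry. A string with mixed formatting (rich text) is stored as a single `<si>` item containing several `<r><t>...</t></r>` runs, so each run becomes its own entry and every later shared-string index is shifted. A sketch of a replacement that builds one string per `<si>` item, assuming the transitional SpreadsheetML namespace that Excel writes by default; the name `read_shared_strings` is hypothetical. It also skips phonetic `<rPh>` runs and returns an empty list when the workbook has no sharedStrings.xml part.

```python
from xml.etree.ElementTree import iterparse

# Main (transitional) SpreadsheetML namespace used in xl/sharedStrings.xml
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_shared_strings(z):
    """Return one string per <si> item of the shared-strings table in zipfile.ZipFile `z`."""
    if "xl/sharedStrings.xml" not in z.namelist():
        return []  # a workbook without text cells may omit this part entirely
    strings = []
    with z.open("xl/sharedStrings.xml") as f:
        for _event, el in iterparse(f):
            if el.tag != NS + "si":
                continue
            parts = []
            for child in el:
                if child.tag == NS + "t":  # plain item: <si><t>text</t></si>
                    parts.append(child.text or "")
                elif child.tag == NS + "r":  # rich-text run: <si><r><rPr/><t>text</t></r></si>
                    t = child.find(NS + "t")
                    if t is not None:
                        parts.append(t.text or "")
                # phonetic runs (<rPh>) and <phoneticPr> are deliberately skipped
            strings.append("".join(parts))
            el.clear()  # release the finished item to keep memory use low
    return strings
```

In the function above, `strings = read_shared_strings(z)` could then take the place of the list comprehension; the cell loop stays unchanged.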
(This was copied from a deleted question: https://stackoverflow.com/questions/4371163/reading-xlsx-files-using-python)
Answer 2 (score: 1)
Here is a very, very rough implementation that uses only the standard library.
def xlsx(fname):
    import zipfile
    from xml.etree.ElementTree import iterparse
    z = zipfile.ZipFile(fname)
    # Shared strings table: cells with t="s" store an index into this list
    strings = [el.text for e, el in iterparse(z.open('xl/sharedStrings.xml')) if el.tag.endswith('}t')]
    rows = []
    row = {}
    value = ''
    # iterparse yields each element when its closing tag has been read
    for e, el in iterparse(z.open('xl/worksheets/sheet1.xml')):
        if el.tag.endswith('}v'):  # <v>84</v>
            value = el.text
        if el.tag.endswith('}c'):  # <c r="A3" t="s"><v>84</v></c>
            if el.attrib.get('t') == 's':
                value = strings[int(value)]
            letter = el.attrib['r']  # AZ22
            # Strip the trailing row number, keeping only the column letters
            while letter[-1].isdigit():
                letter = letter[:-1]
            row[letter] = value
            value = ''
        if el.tag.endswith('}row'):
            rows.append(row)
            row = {}
    return rows
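Both standard-library functions return a list of dicts keyed by column letter, with empty cells simply missing from a row. Since the question wanted to end up with the csv module, here is a sketch that writes such rows out as CSV, assuming every output row should have the same set of columns. The helper names `excel_column_key` and `write_rows_as_csv` are hypothetical. Sorting the letters by `(len(name), name)` reproduces Excel's column order (A to Z, then AA, AB and so on), because shorter names always come first and names of equal length sort alphabetically in column order.

```python
import csv


def excel_column_key(name):
    """Sort key that orders column letters as Excel does: A..Z, AA..ZZ, AAA.."""
    return (len(name), name)


def write_rows_as_csv(rows, out):
    """Write a list of {column_letter: value} dicts to the text file object `out`.

    Columns that never appear in any row are not emitted; cells missing from
    a particular row are written as empty fields.
    """
    columns = sorted({col for row in rows for col in row}, key=excel_column_key)
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    for row in rows:
        writer.writerow(row)
```

For a real file, open the output with `open("out.csv", "w", newline="")` (as the csv documentation recommends) and pass it along with the result of `xlsx("filelocation.xlsx")`.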
This answer was copied from a deleted question: https://stackoverflow.com/a/22067980/131881
Answer 3 (score: -1)
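You cannot use Python's csv library to read .xlsx files. (pd.read_excel can read .xlsx when openpyxl is installed, and recent pandas versions use openpyxl as the default engine for .xlsx; older pandas versions defaulted to xlrd, which no longer reads .xlsx as of xlrd 2.0.) Below is a function I created to import an .xlsx file with openpyxl directly. It uses the first row of the imported file as the column names. Pretty straightforward.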
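import openpyxl
import pandas as pd

def import_xlsx(filepath):
    # data_only=True returns the last computed value of formula cells instead of the formula text
    wb = openpyxl.load_workbook(filename=filepath, data_only=True)
    ws = wb.active
    rows = list(ws.iter_rows(values_only=True))
    # The first row becomes the column names; the remaining rows are the data
    return pd.DataFrame(data=rows[1:], columns=rows[0])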
Example (a raw string, so that "\b" in "\big_boi" is not read as a backspace escape):
new_df = import_xlsx(r'C:\Users\big_boi\documents\my_file.xlsx')
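The same first-row-as-header idea also works without pandas. A minimal sketch, assuming the rows are tuples like those produced by openpyxl's `ws.iter_rows(values_only=True)`; the name `rows_to_records` is hypothetical, and the header row is used as-is (duplicate or empty header cells are not handled).

```python
def rows_to_records(rows):
    """Turn [header, row1, row2, ...] into a list of {header: value} dicts."""
    rows = list(rows)  # accept any iterable, e.g. a generator from iter_rows()
    if not rows:
        return []
    header, *data = rows
    # zip stops at the shorter sequence, so cells beyond the last header column are dropped
    return [dict(zip(header, row)) for row in data]
```

Returning plain dicts keeps the result usable with csv.DictWriter or json without adding a pandas dependency.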