I have a dataframe that I export to Excel, and people want it in .xlsx. I use to_excel, but when I change the extension from .xls to .xlsx, the export step takes about 9 seconds instead of 1 second. Exporting to .csv is even faster, which I suspect is because a csv is just a specially formatted text file.
Perhaps .xlsx files simply support more features and therefore take longer to write, but I'm hoping there is something I can do to prevent this.
Answer 0 (score: 4)
Pandas uses OpenPyXL by default to write xlsx files, and it can be slower than the xlwt module used for writing xls files.
Try XlsxWriter as the xlsx output engine instead:
df.to_excel('file.xlsx', sheet_name='Sheet1', engine='xlsxwriter')
It should be about as fast as the xls engine.
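A quick way to check this on your own data is to time the two engines side by side. This is a minimal sketch, assuming both the openpyxl and xlsxwriter packages are installed alongside pandas; the bench_*.xlsx file names are just placeholders:

```python
import timeit
import pandas as pd

# Build a moderately sized frame so the timing difference is visible.
df = pd.DataFrame({f"col{i}": range(20_000) for i in range(10)})

for engine in ("openpyxl", "xlsxwriter"):
    start = timeit.default_timer()
    df.to_excel(f"bench_{engine}.xlsx", sheet_name="Sheet1", engine=engine)
    print(f"{engine}: {timeit.default_timer() - start:.2f} secs")
```

Whichever engine wins on your machine can then be passed via the `engine` argument of `to_excel` in the real export.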
Answer 1 (score: 1)
According to benchmarks of the different Python-to-Excel modules, pyexcelerate has better performance. The code below copies sqlite table data into worksheets of an xlsx file. A table is only stored in the xlsx file if it has fewer than 1,000,000 rows; otherwise the data is written to a compressed csv file instead.
def passfile(datb, tables):
    """Copy tables from query results to xlsx or csv files."""
    import sqlite3
    import pandas as pd
    import timeit
    import csv
    from pyexcelerate import Workbook
    from pathlib import Path
    from datetime import date
    dat_dir = Path("C:/XML")
    db_path = dat_dir / datb
    start_time = timeit.default_timer()
    conn = sqlite3.connect(db_path)  # database connection
    today = date.today()
    tablist = []
    with open(tables, 'r') as csv_file:  # file listing the tables to collect
        csv_reader = csv.DictReader(csv_file)
        for line in csv_reader:
            tablist.append(line['table'])  # 'table' column header
    xls_file = "Param" + today.strftime("%y%m%d") + ".xlsx"
    xls_path = dat_dir / xls_file  # xlsx file path-name
    csv_path = dat_dir / "csv"  # csv path to store big tables
    wb = Workbook()  # pyexcelerate workbook init
    for line in tablist:
        try:
            df = pd.read_sql_query("select * from " + line + ";", conn)  # pandas dataframe from sqlite
            if len(df) > 1000000:  # too many rows for an Excel sheet
                print('save to csv')
                csv_loc = line + today.strftime("%y%m%d") + '.csv.gz'  # compressed csv file name
                df.to_csv(csv_path / csv_loc, compression='gzip')
            else:
                # header row (with an empty cell over the index column),
                # then one row per record, prefixed with its index label
                data = [[""] + df.columns.tolist()]
                data += [[index] + row for index, row in zip(df.index, df.values.tolist())]
                wb.new_sheet(line, data=data)
        except sqlite3.Error as error:  # sqlite error handling
            print('SQLite error: %s' % (' '.join(error.args)))
    print("saving workbook")
    wb.save(xls_path)
    end_time = timeit.default_timer()
    delta = round(end_time - start_time, 2)
    print("Took " + str(delta) + " secs")
    conn.close()

passfile("20200522_sqlite.db", "tablesSQL.csv")