Question

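I am trying to write a pandas DataFrame to an .xlsx file in which different numeric columns have different formats. For example, some columns should show two decimal places, some should show none, and some should be formatted as percentages with a "%" symbol.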

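I noticed that DataFrame.to_html() has a formatters parameter that allows exactly this by mapping different formats to different columns. However, the DataFrame.to_excel() method has no similar parameter. The closest thing it offers is float_format, which applies globally to all floating-point numbers.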
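
To make the limitation concrete, here is a minimal sketch of what float_format does. The column names and the temporary file path are made up for illustration, and the openpyxl engine is assumed to be installed. As far as I can tell, the format string is applied to every float value before it is written, so both columns end up rounded the same way.

```python
import os
import tempfile

import pandas as pd

# Illustrative data: two float columns that ideally want different formats.
df = pd.DataFrame({
    "price": [1234.5678, 99.999],
    "share": [0.12345, 0.98765],
})

path = os.path.join(tempfile.mkdtemp(), "float_format_demo.xlsx")

# float_format is applied to every float column; there is no per-column variant.
df.to_excel(path, index=False, float_format="%.2f", engine="openpyxl")

# Reading the file back shows that both columns were rounded identically.
round_trip = pd.read_excel(path, engine="openpyxl")
print(round_trip)
```

Note that this changes the values stored in the file rather than only how they are displayed, which is the same drawback as calling round() on the data.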

I have read a number of SO posts that are at least partially related to my problem, for example:

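Use the older openpyxl engine to apply formats one cell at a time. This is the approach I have had the most success with. But it means writing loops to apply formats cell by cell, keeping track of offsets, and so on.
Render percentages by changing the table data itself into strings. The idea of altering the actual data inspired me to try handling decimal places by calling round() on each column before writing to Excel. That works too, but I would like to avoid changing the data.
Others, mostly about date formats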
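
For reference, the cell-by-cell openpyxl approach from the first item can be sketched roughly as follows. The column_formats mapping, the column names, and the file path are hypothetical, and the offsets assume the default layout produced by to_excel (index in column A, header in row 1).

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

df = pd.DataFrame({
    "amount": [1010.5, 2020.25, 3030.0],
    "rate": [0.1, 0.2, 0.33],
})

path = os.path.join(tempfile.mkdtemp(), "per_cell_demo.xlsx")
df.to_excel(path, sheet_name="Sheet1", engine="openpyxl")  # index lands in column A

# Hypothetical mapping from DataFrame column name to Excel number format.
column_formats = {"amount": "#,##0.00", "rate": "0%"}

wb = load_workbook(path)
ws = wb["Sheet1"]

for name, number_format in column_formats.items():
    # +1 because openpyxl columns are 1-based, +1 more to skip the index column.
    col_idx = df.columns.get_loc(name) + 2
    # Row 1 holds the header, so the data starts on row 2.
    for row_idx in range(2, len(df) + 2):
        ws.cell(row=row_idx, column=col_idx).number_format = number_format

wb.save(path)
```

The underlying values stay untouched; only the display format of each cell changes.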

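Are there other, more convenient Excel-related functions or properties in the pandas API that could help here, or something similar in openpyxl, or perhaps some way to specify output format metadata directly on each column of the DataFrame, which different outputters would then interpret downstream?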

Answer 1

You can do this with Pandas 0.16 and the XlsxWriter engine by accessing the underlying workbook and worksheet objects:

import pandas as pd

# Create a Pandas dataframe from some data.
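# list() materializes the zip iterator so this also works on Python 3
# with older pandas versions.
df = pd.DataFrame(list(zip(
    [1010, 2020, 3030, 2020, 1515, 3030, 4545],
    [.1, .2, .33, .25, .5, .75, .45],
    [.1, .2, .33, .25, .5, .75, .45],
)))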

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')

# Get the xlsxwriter objects from the dataframe writer object.
workbook  = writer.book
worksheet = writer.sheets['Sheet1']

# Add some cell formats.
format1 = workbook.add_format({'num_format': '#,##0.00'})
format2 = workbook.add_format({'num_format': '0%'})
format3 = workbook.add_format({'num_format': 'h:mm:ss AM/PM'})

# Set the column width and format.
worksheet.set_column('B:B', 18, format1)

# Set the format but not the column width.
worksheet.set_column('C:C', None, format2)

worksheet.set_column('D:D', 16, format3)

# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also writes the file.)
writer.close()

Output: [screenshot of the formatted worksheet]

See also Working with Python Pandas and XlsxWriter.
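
The same idea can be wrapped in a small helper that takes a mapping from column name to number format, similar in spirit to the formatters argument of to_html(). This is only a sketch: write_with_column_formats and its arguments are hypothetical names, not part of pandas or XlsxWriter, and it assumes a single-level index written to column A.

```python
import os
import tempfile

import pandas as pd


def write_with_column_formats(df, path, column_formats, sheet_name="Sheet1"):
    """Write df to path, applying one Excel number format per named column."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        df.to_excel(writer, sheet_name=sheet_name)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]
        for name, num_format in column_formats.items():
            # +1 skips the index, which to_excel writes into column A.
            col = df.columns.get_loc(name) + 1
            cell_format = workbook.add_format({"num_format": num_format})
            # set_column takes zero-based first and last column numbers;
            # a width of None leaves the column width unchanged.
            worksheet.set_column(col, col, None, cell_format)


df = pd.DataFrame({
    "amount": [1010, 2020, 3030],
    "rate": [0.1, 0.2, 0.33],
})
out_path = os.path.join(tempfile.mkdtemp(), "column_formats_demo.xlsx")
write_with_column_formats(df, out_path, {"amount": "#,##0.00", "rate": "0%"})
```

Column formats set this way only affect cells that have no format of their own, so the header and index cells that pandas styles itself keep their look.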

Answer 2

As you rightly point out, applying formats to individual cells is extremely inefficient.

openpyxl 2.4 includes native support for Pandas DataFrames and named styles.

https://openpyxl.readthedocs.io/en/latest/changes.html#id7
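
A rough sketch of how these two features might be combined is shown below. The style names, column names, and the column_styles mapping are illustrative assumptions, not something taken from the openpyxl changelog. The loop still visits each cell, but it only assigns the name of a style that was registered once on the workbook.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import NamedStyle
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.DataFrame({
    "amount": [1010.5, 2020.25, 3030.0],
    "rate": [0.1, 0.2, 0.33],
})

wb = Workbook()
ws = wb.active

# Append the header row followed by the data rows; the index is left out.
for row in dataframe_to_rows(df, index=False, header=True):
    ws.append(row)

# Register one named style per kind of column (the names are illustrative).
wb.add_named_style(NamedStyle(name="money", number_format="#,##0.00"))
wb.add_named_style(NamedStyle(name="percent", number_format="0%"))

# Hypothetical mapping from DataFrame column name to named style.
column_styles = {"amount": "money", "rate": "percent"}

for name, style_name in column_styles.items():
    col_idx = df.columns.get_loc(name) + 1  # openpyxl columns are 1-based
    # min_row=2 skips the header row.
    for (cell,) in ws.iter_rows(min_row=2, min_col=col_idx, max_col=col_idx):
        cell.style = style_name

path = os.path.join(tempfile.mkdtemp(), "named_styles_demo.xlsx")
wb.save(path)
```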

Writing a pandas DataFrame to Excel with different formats for different columns
