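I want to write a pivot table from pandas to an Excel sheet, but I lose one level of header information, and I could not find a solution by searching the web.
This is what the pivot table I created in the DataFrame looks like: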
T-Class <00.5 <01.0
ZIP
0 1375.0 762.0
1 2177.0 913.0
When I write it to Excel, I lose the 'T-Class' cell and the otherwise empty 'ZIP' row that goes with it. This is the result using the xlsxwriter engine:
ZIP <00.5 <01.0
0 1375 762
1 2177 913
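Sample code for writing to Excel:
import pandas as pd

writer = pd.ExcelWriter('data.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='pivottable', header=True, index=True)
# close() writes the file; ExcelWriter.save() was removed in pandas 2.0
writer.close()
How can I solve this?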
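Answer 0 (score: -1)
Now that I have come back to the topic of exporting pivot tables from pandas DataFrames, I have found a better library for the export: openpyxl! With openpyxl you can open a predefined Excel template and write the DataFrame data below a predefined, nicely formatted header, so you no longer have to deal with unnecessary xlsxwriter errors. Here is sample code using openpyxl: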
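from openpyxl import load_workbook

# Open the predefined template. The original post omits this step;
# 'template.xlsx' is a placeholder file name.
workbook = load_workbook('template.xlsx')
workbook.active = 0
worksheet = workbook.active
worksheet.title = 'XYZ'
# number of data rows in the DataFrame
depth_df_2 = len(merged_plz_all)
# Call the helper to write the DataFrame below your predefined header
# (the header occupies rows 1-17, so the data starts in row 18).
# `spaltenindex` ("column index") is not defined in the post; it presumably
# maps a column count to the matching Excel column letter.
update_range(worksheet, merged_plz_all,
             cell_range='A18:' + str(spaltenindex[len(merged_plz_all.columns)]) + str(depth_df_2 + 17))
workbook.save('yourNicelyLookingPivotTable.xlsx')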
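The snippet above relies on a `spaltenindex` lookup that the post does not show. As a rough sketch, the same range string could be built with openpyxl's `get_column_letter`. The helper name `frame_cell_range` and its parameters are made up for illustration, and the default start row of 18 simply mirrors the `'A18:'` prefix used above.

```python
from openpyxl.utils import get_column_letter


def frame_cell_range(n_rows, n_cols, first_row=18, first_col=1):
    """Return an A1-style range string covering n_rows x n_cols cells.

    Hypothetical helper: first_row/first_col give the top-left cell
    (1-based), so the defaults start the block at A18.
    """
    if n_rows < 1 or n_cols < 1:
        raise ValueError("n_rows and n_cols must both be at least 1")
    start = f"{get_column_letter(first_col)}{first_row}"
    end = f"{get_column_letter(first_col + n_cols - 1)}{first_row + n_rows - 1}"
    return f"{start}:{end}"


# A DataFrame with 2 rows and 3 columns would occupy this block:
print(frame_cell_range(2, 3))  # A18:C19
```

With a DataFrame `df`, the call would look like `frame_cell_range(len(df), len(df.columns))`. Note that only the values are counted here, not the index, which matches how `update_range` uses `data.values`.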
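This is the required update_range method, which I found in another Stack Overflow thread. Unfortunately I did not bookmark it, so I apologize for not being able to credit the source. Personally, I think this method should be part of the openpyxl library itself!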
import numpy as np
import pandas as pd
from openpyxl.utils.exceptions import CellCoordinatesException


def update_range(worksheet, data, cell_range=None, named_range=None):
    """
    Updates an excel worksheet with the given data.
    :param worksheet: an excel worksheet
    :param data: data used to update the worksheet cell range (list, tuple, np.ndarray, pd.DataFrame)
    :param cell_range: a string representing the cell range, e.g. 'AB12:XX23'
    :param named_range: a string representing an excel named range
    """

    def clean_data(data):
        if not isinstance(data, (list, tuple, np.ndarray, pd.DataFrame)):
            raise TypeError('Invalid data, data should be an array type iterable.')
        if not len(data):
            raise ValueError('You need to provide data to update the cells')
        if isinstance(data, pd.DataFrame):
            data = data.values
        elif isinstance(data, (list, tuple)):
            data = np.array(data)
        # flatten row by row so the values line up with the flattened cells
        return np.hstack(data)

    def clean_cells(worksheet, cell_range, named_range):
        # exactly one of cell_range / named_range must be given
        if not any((cell_range, named_range)) or all((cell_range, named_range)):
            raise ValueError('`cell_range` or `named_range` should be provided.')
        # get the cell range
        if cell_range:
            try:
                cells = np.hstack(worksheet[cell_range])
            except (CellCoordinatesException, AttributeError):
                raise ValueError('The cell range provided is invalid, cell range must be in the form XX--[:YY--]')
        else:
            try:
                # get_named_range comes from older openpyxl releases and
                # may not be available in recent versions
                cells = worksheet.get_named_range(named_range)
            except TypeError:
                raise ValueError('The current worksheet {} does not contain any named range {}.'.format(
                    worksheet.title,
                    named_range))
        # checking that we have cells to update, and data
        if not len(cells):
            raise ValueError('You need to provide cells to update.')
        return cells

    cells = clean_cells(worksheet, cell_range, named_range)
    data = clean_data(data)
    # check that the data has the same dimension as cells
    if len(cells) != data.size:
        raise ValueError('Cells({}) should have the same dimension as the data({}).'.format(len(cells), data.size))
    for i, cell in enumerate(cells):
        cell.value = data[i]
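For the layout asked about in the question, one possible approach is to build the header rows by hand, so that both the 'T-Class' label and the separate 'ZIP' row end up in the sheet. The following is only a sketch under a few assumptions: the pivot table has single-level columns and a single-level index, and the helper names `pivot_to_rows` and `write_rows` are hypothetical, not part of pandas or openpyxl. The example DataFrame is rebuilt from the values shown in the question.

```python
import pandas as pd


def pivot_to_rows(df):
    """Turn a pivot table into plain rows, keeping both axis names.

    Assumes single-level columns and a single-level index.
    """
    n_cols = len(df.columns)
    # Row 1: the columns' name followed by the column labels.
    header = [df.columns.name, *df.columns.tolist()]
    # Row 2: the index name alone, as in the pandas text output.
    index_row = [df.index.name] + [None] * n_cols
    # Data rows: index label followed by the values, as native Python types.
    body = [[idx, *values]
            for idx, values in zip(df.index.tolist(), df.to_numpy().tolist())]
    return [header, index_row, *body]


def write_rows(rows, path, sheet_name="pivottable"):
    """Write the rows to a new workbook, one list per sheet row."""
    from openpyxl import Workbook  # imported here so pivot_to_rows needs only pandas

    workbook = Workbook()
    worksheet = workbook.active
    worksheet.title = sheet_name
    for row in rows:
        worksheet.append(row)  # None leaves the cell empty
    workbook.save(path)


# Example data rebuilt from the question.
df = pd.DataFrame(
    {"<00.5": [1375.0, 2177.0], "<01.0": [762.0, 913.0]},
    index=pd.Index([0, 1], name="ZIP"),
)
df.columns.name = "T-Class"

rows = pivot_to_rows(df)
for row in rows:
    print(row)
```

Calling `write_rows(rows, 'data.xlsx')` would then save the sheet with 'T-Class' in A1 and 'ZIP' in A2. Writing the rows manually gives up the default header styling that `to_excel` applies, which may or may not matter for your template.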