在Python中使用OpenPyXL包写入数据后,如何保持样式格式不变?

时间:2018-10-18 16:29:38

标签: python-2.7 openpyxl

我正在使用openpyxl库软件包来读取和写入一些数据到现有的Excel文件test.xlsx

在向其中写入数据之前,文件的内容如下所示:

enter image description here

  • 单元格A1包含高棉Unicode字符,英语字符为 Bold 样式。

  • 单元格A3使用lemons1字体,并且英文字符为斜体样式。

我正在使用以下脚本在此excel文件的单元格B2中读​​取和写入数据“是我”:

from openpyxl import load_workbook
import os
FILENAME1 = os.path.dirname(__file__)+'/test.xlsx'
from flask import make_response
from openpyxl.writer.excel import save_virtual_workbook
from app import app

@app.route('/testexel', methods=['GET'])
def testexel():
    with app.app_context():
        try:
            filename = 'test'
            workbook = load_workbook(FILENAME1, keep_links=False)
            sheet = workbook['Sheet1']
            sheet['B2']='It is me'

            response = make_response(save_virtual_workbook(workbook))
            response.headers['Cache-Control'] = 'no-cache'
            response.headers["Content-Disposition"] = "attachment; filename=%s.xlsx" % filename
            response.headers["Content-type"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; charset=utf-8"

            return response

        except Exception as e:

            raise

然后将结果excel文件的格式修改为这样,我从不希望它像这样:

enter image description here

在将数据写入原始文件之前,其格式设置样式与原始文件完全不同:

  • 单元格A1中的所有数据均为英文字符的粗体格式

  • 单元格B3的英语字符成为一种常规样式,并且将字体从前面的高棉字符更改为字体limons1

我要完成的工作是将文件的现有内容保持为原来的格式(样式和字体),同时向其中写入其他数据。

请告知我的脚本有什么问题,如何在执行上述脚本后如何保持现有样式和字体不变?谢谢。

3 个答案:

答案 0 :(得分:2)

Excel文件(扩展名为.xlsx)实际上是zip存档。 (您实际上可以使用7-zip或其他类似程序打开excel文件。)因此excel文件包含一堆xml文件,其中存储了数据。 openpyxl的作用是,在打开excel文件时从这些xml文件中读取数据,并在保存excel文件时使用xml文件创建zip存档。很难过,openpyxl读取一些xml文件,然后解析该数据,然后您可以使用openpyxl库中的功能来更改和添加数据,最后,当您保存工作簿时,openpyxl将创建xml文件,向其中写入数据并保存它们作为zip存档(这是Excel文件)。这些xml文件包含存储在excel文件中的所有数据(一个xml文件包含来自excel文件的公式,其他将包含样式,其他将包含有关excel主题的数据,依此类推)。我们只关心存储在两个xml文件中的excel文件中的字符串:

  • sharedStrings.xml

    此文件包含excel文件中的所有字符串以及这些字符串的格式,下面是一个示例:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="2" uniqueCount="2">
        <si>
            <r>
                <rPr>
                    <b/>
                    <sz val="22"/>
                    <color theme="1"/>
                    <rFont val="Calibri"/>
                    <family val="2"/>
                    <scheme val="minor"/>
                </rPr>
                <t>Hello</t>
            </r>
            <r>
                <rPr>
                    <sz val="22"/>
                    <color theme="1"/>
                    <rFont val="Calibri"/>
                    <family val="2"/>
                    <scheme val="minor"/>
                </rPr>
                <t xml:space="preserve"> ត</t>
            </r>
        </si>
        <si>
            <r>
                <t xml:space="preserve">ណ </t>
            </r>
            <r>
                <rPr>
                    <i/>
                    <sz val="24"/>
                    <color theme="1"/>
                    <rFont val="Calibri"/>
                    <family val="2"/>
                    <scheme val="minor"/>
                </rPr>
                <t>sike</t>
            </r>
        </si>
    </sst>
    
  • sheet1.xml

    此文件包含您的字符串的位置(哪个单元格包含哪个字符串)。 (在此excel文件中,每张纸都有一个文件,但就本示例而言,假设您的文件中只有一张纸。)这是一个示例:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x14ac xr xr2 xr3" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" xr:uid="{00000000-0001-0000-0000-000000000000}">
        <dimension ref="A1:C3"/>
        <sheetViews>
            <sheetView tabSelected="1" zoomScaleNormal="100" workbookViewId="0">
                <selection activeCell="A3" sqref="A3"/>
            </sheetView>
        </sheetViews>
        <sheetFormatPr defaultRowHeight="15" x14ac:dyDescent="0.25"/>
        <cols>
            <col min="1" max="1" width="20.140625" customWidth="1"/>
            <col min="2" max="2" width="10.7109375" customWidth="1"/>
        </cols>
        <sheetData>
            <row r="1" spans="1:3" ht="60.75" customHeight="1" x14ac:dyDescent="0.45">
                <c r="A1" s="4" t="s">
                    <v>0</v>
                </c>
            </row>
            <row r="2" spans="1:3" ht="19.5" customHeight="1" x14ac:dyDescent="0.35">
                <c r="A2" s="1"/>
                <c r="B2" s="3"/>
            </row>
            <row r="3" spans="1:3" ht="62.25" customHeight="1" x14ac:dyDescent="0.5">
                <c r="A3" s="5" t="s">
                    <v>1</v>
                </c>
                <c r="C3" s="2"/>
            </row>
        </sheetData>
        <pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
        <pageSetup paperSize="9" orientation="portrait" r:id="rId1"/>
    </worksheet>
    

如果您使用openpyxl打开此excel,然后保存(不更改任何数据),则sharedStrings.xml将会是这样:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <t>Hello &#6031;</t>
    </si>
    <si>
        <t>&#6030; sike</t>
    </si>
</sst>

如您所见,您将丢失所有单元格(字符串)的原始格式,而将获得某种形式的单元格合并格式(因此,如果单元格中的某些字符为粗体而另一些则不是,那么在保存文件时,整个单元格将变为粗体,或者整个单元格将变为正常)。现在人们已经要求开发人员实现这种富文本选项(link1link2),但他们感到遗憾的是,实现这样的东西会很复杂。我同意这并非易事,但我们可以做些更简单的事情:打开excel文件时可以从sharedStrings.xml获取数据,而要保存excel文件时可以使用该xml代码,但是仅适用于打开文件时存在的单元格。这可能不容易理解,所以让我们看下面的例子:

假设您有这样的excel文件: enter image description here

对于此excel文件,sharedStrings.xml将是这样:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="1" uniqueCount="1">
    <si>
        <r>
            <rPr>
                <b/>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>Hello</t>
        </r>
        <r>
            <rPr>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t xml:space="preserve"> ត</t>
        </r>
    </si>
</sst>

如果您运行以下python代码:

from openpyxl import load_workbook
workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook.active
sheet['A2'] = 'It is me'
workbook.save('out.xlsx')

文件out.xlsx如下所示:

enter image description here

对于out.xlsx文件,sharedStrings.xml将是这样:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <t>Hello &#6031;</t>
    </si>
    <si>
        <t>It is me</t>
    </si>
</sst>

所以我们要做的是使用以下xml代码:

<si>
    <r>
        <rPr>
            <b/>
            <sz val="22"/>
            <color theme="1"/>
            <rFont val="Calibri"/>
            <family val="2"/>
            <scheme val="minor"/>
        </rPr>
        <t>Hello</t>
    </r>
    <r>
        <rPr>
            <sz val="22"/>
            <color theme="1"/>
            <rFont val="Calibri"/>
            <family val="2"/>
            <scheme val="minor"/>
        </rPr>
        <t xml:space="preserve"> ត</t>
    </r>
</si>

对于包含Hello ត和以下xml代码的旧单元格A1:

<si>
    <t>It is me</t>
</si>

对于包含It is me的新单元格A2。

因此,我们可以将这个xml部分组合成这样的xml文件:

<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" uniqueCount="2">
    <si>
        <r>
            <rPr>
                <b/>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t>Hello</t>
        </r>
        <r>
            <rPr>
                <sz val="22"/>
                <color theme="1"/>
                <rFont val="Calibri"/>
                <family val="2"/>
                <scheme val="minor"/>
            </rPr>
            <t xml:space="preserve"> ត</t>
        </r>
    </si>
    <si>
        <t>It is me</t>
    </si>
</sst>

我写了一些函数来做到这一点。 (有很多代码,但是大多数只是从openpyxl复制的。如果您要更改openpyxl库,则可以用10或20行代码来完成,但这绝不是好主意,所以我宁愿复制整个函数需要先更改然后再更改一小部分。)

您可以将以下代码保存在单独的文件extendedopenpyxl.py中:

from openpyxl import load_workbook as openpyxlload_workbook
from openpyxl.reader.excel import _validate_archive, _find_workbook_part
from openpyxl.reader.worksheet import _get_xml_iter
from openpyxl.xml.functions import fromstring, iterparse, safe_iterator, tostring, Element, xmlfile, SubElement
from openpyxl.xml.constants import ARC_CONTENT_TYPES, SHEET_MAIN_NS, SHARED_STRINGS, ARC_ROOT_RELS, ARC_APP, ARC_CORE, ARC_THEME, ARC_SHARED_STRINGS, ARC_STYLE, ARC_WORKBOOK, ARC_WORKBOOK_RELS
from openpyxl.packaging.manifest import Manifest
from openpyxl.packaging.relationship import get_dependents, get_rels_path
from openpyxl.packaging.workbook import WorkbookParser
from openpyxl.packaging.extended import ExtendedProperties
from openpyxl.utils import coordinate_to_tuple
from openpyxl.cell.text import Text
from openpyxl.writer.excel import ExcelWriter as openpyxlExcelWriter
from openpyxl.writer.workbook import write_root_rels, write_workbook_rels, write_workbook
from openpyxl.writer.theme import write_theme
from openpyxl.writer.etree_worksheet import get_rows_to_write
from openpyxl.styles.stylesheet import write_stylesheet
from zipfile import ZipFile, ZIP_DEFLATED
from operator import itemgetter
from io import BytesIO
from xml.etree.ElementTree import tostring as xml_tostring
from xml.etree.ElementTree import register_namespace
from lxml.etree import fromstring as lxml_fromstring

register_namespace('', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')

def get_value_cells(workbook):
    value_cells = []
    for idx, worksheet in enumerate(workbook.worksheets, 1):
        all_rows = get_rows_to_write(worksheet)
        for row_idx, row in all_rows:
            row = sorted(row, key=itemgetter(0))
            for col, cell in row:
                if cell._value is not None:
                    if cell.data_type == 's':
                        value_cells.append((worksheet.title,(cell.row, cell.col_idx)))
    return value_cells

def check_if_lxml(element):
    if type(element).__module__ == 'xml.etree.ElementTree':
        string = xml_tostring(element)
        el = lxml_fromstring(string)
        return el
    return element

def write_string_table(workbook):
    string_table = workbook.shared_strings
    workbook_data = workbook.new_interal_value_workbook_data
    data_strings = workbook.new_interal_value_data_strings
    value_cells = get_value_cells(workbook)
    out = BytesIO()
    i = 0
    with xmlfile(out) as xf:
        with xf.element("sst", xmlns=SHEET_MAIN_NS, uniqueCount="%d" % len(string_table)):            
            for i, key in enumerate(string_table):
                sheetname, coordinates = value_cells[i]
                if coordinates in workbook_data[sheetname]:
                    value = workbook_data[sheetname][coordinates]
                    xml_el = data_strings[value]
                    el = check_if_lxml(xml_el)
                else:
                    el = Element('si')
                    text = SubElement(el, 't')
                    text.text = key
                    if key.strip() != key:
                        text.set(PRESERVE_SPACE, 'preserve')
                xf.write(el)

    return  out.getvalue()


class ExcelWriter(openpyxlExcelWriter):

    def write_data(self):
        """Write the various xml files into the zip archive."""
        # cleanup all worksheets
        archive = self._archive

        archive.writestr(ARC_ROOT_RELS, write_root_rels(self.workbook))
        props = ExtendedProperties()
        archive.writestr(ARC_APP, tostring(props.to_tree()))

        archive.writestr(ARC_CORE, tostring(self.workbook.properties.to_tree()))
        if self.workbook.loaded_theme:
            archive.writestr(ARC_THEME, self.workbook.loaded_theme)
        else:
            archive.writestr(ARC_THEME, write_theme())

        self._write_worksheets()
        self._write_chartsheets()
        self._write_images()
        self._write_charts()

        string_table_out = write_string_table(self.workbook)
        self._archive.writestr(ARC_SHARED_STRINGS, string_table_out)
        self._write_external_links()

        stylesheet = write_stylesheet(self.workbook)
        archive.writestr(ARC_STYLE, tostring(stylesheet))

        archive.writestr(ARC_WORKBOOK, write_workbook(self.workbook))
        archive.writestr(ARC_WORKBOOK_RELS, write_workbook_rels(self.workbook))

        self._merge_vba()

        self.manifest._write(archive, self.workbook)

        return

    def save(self, filename):
        self.write_data()
        self._archive.close()
        return


def get_coordinates(cell, row_count, col_count):
    coordinate = cell.get('r')
    if coordinate:
        row, column = coordinate_to_tuple(coordinate)
    else:
        row, column = row_count, col_count
    return row, column

def parse_cell(cell):
    VALUE_TAG = '{%s}v' % SHEET_MAIN_NS
    value = cell.find(VALUE_TAG)
    if value is not None:
        value = int(value.text)
    return value

def parse_row(row, row_count):
    CELL_TAG = '{%s}c' % SHEET_MAIN_NS
    if row.get('r'):
        row_count = int(row.get('r'))
    else:
        row_count += 1
    col_count = 0
    data = dict()
    for cell in safe_iterator(row, CELL_TAG):
        col_count += 1
        value = parse_cell(cell)
        if value is not None:
            coordinates = get_coordinates(cell, row_count, col_count)
            data[coordinates] = value
    return data


def parse_sheet(xml_source):
    dispatcher = ['{%s}mergeCells' % SHEET_MAIN_NS, '{%s}col' % SHEET_MAIN_NS, '{%s}row' % SHEET_MAIN_NS, '{%s}conditionalFormatting' % SHEET_MAIN_NS, '{%s}legacyDrawing' % SHEET_MAIN_NS, '{%s}sheetProtection' % SHEET_MAIN_NS, '{%s}extLst' % SHEET_MAIN_NS, '{%s}hyperlink' % SHEET_MAIN_NS, '{%s}tableParts' % SHEET_MAIN_NS]
    row_count = 0
    stream = _get_xml_iter(xml_source)
    it = iterparse(stream, tag=dispatcher)
    row_tag = '{%s}row' % SHEET_MAIN_NS
    data = dict()
    for _, element in it:
        tag_name = element.tag
        if tag_name == row_tag:
            row_data = parse_row(element, row_count)
            data.update(row_data)
            element.clear()
    return data


def get_workbook_parser(archive):
    src = archive.read(ARC_CONTENT_TYPES)
    root = fromstring(src)
    package = Manifest.from_tree(root)
    wb_part = _find_workbook_part(package)
    workbook_part_name = wb_part.PartName[1:]
    parser = WorkbookParser(archive, workbook_part_name)
    parser.parse()
    return parser, package

def get_data_strings(xml_source):
    STRING_TAG = '{%s}si' % SHEET_MAIN_NS
    strings = []
    src = _get_xml_iter(xml_source)
    for _, node in iterparse(src):
        if node.tag == STRING_TAG:
            strings.append(node)

    return strings

def load_workbook(filename, *args, **kwargs):
    workbook = openpyxlload_workbook(filename, *args, **kwargs)

    archive = _validate_archive(filename)
    parser, package = get_workbook_parser(archive)

    workbook_data = dict()
    for sheet, rel in parser.find_sheets():
        sheet_name = sheet.name
        worksheet_path = rel.target
        fh = archive.open(worksheet_path)
        sheet_data = parse_sheet(fh)
        workbook_data[sheet_name] = sheet_data

    data_strings = []
    ct = package.find(SHARED_STRINGS)
    if ct is not None:
        strings_path = ct.PartName[1:]
        strings_source = archive.read(strings_path)
        data_strings = get_data_strings(strings_source)

    workbook.new_interal_value_workbook_data = workbook_data
    workbook.new_interal_value_data_strings = data_strings
    return workbook

def save_workbook(workbook, filename,):
    archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
    writer = ExcelWriter(workbook, archive)
    writer.save(filename)
    return True

def save_virtual_workbook(workbook,):
    temp_buffer = BytesIO()
    archive = ZipFile(temp_buffer, 'w', ZIP_DEFLATED, allowZip64=True)
    writer = ExcelWriter(workbook, archive)
    try:
        writer.write_data()
    finally:
        archive.close()
    virtual_workbook = temp_buffer.getvalue()
    temp_buffer.close()
    return virtual_workbook

现在,如果您运行以下代码:

from extendedopenpyxl import load_workbook, save_workbook

workbook = load_workbook(FILENAME1, keep_links=False)
sheet = workbook['Sheet']
sheet['A2'] = 'It is me'
save_workbook(workbook, 'out.xlsx')

当我在上面的示例中使用的excel文件上运行此代码时,得到了以下结果:

enter image description here

如您所见,单元格A1中的文本的格式保持不变(Hello为粗体,而不是)。

答案 1 :(得分:1)

根据the answer to this question,您可以使用openpyxl在Excel中格式化单元格。

那里给出的答案只会将目标单元格更改为粗体,但也许您可以change the font face回到name

work.abbrev

但是,根据documentation,您只能将样式应用于整个单元格,而不能应用于单元格的一部分。因此,您需要将高棉语字符放在一个单元格中,并将英文字符放在另一个单元格中。

答案 2 :(得分:0)

我尝试了代码,但出现错误提示:

<块引用>

NameError: name 'CommentRecord' 未定义

CommentRecord 存在于文件 extendedopenpyxl.py 的第 192 行 如果添加:

,问题就解决了
from openpyxl.comments.comment_sheet import *