Question

同一页面中提到的oodocx模块将用户引用到似乎不存在的/ examples文件夹。
我已经阅读了python-docx 0.7.2的文档，以及我在Stackoverflow中可以找到的关于这个主题的所有内容，所以请相信我已经完成了我的“功课”。

Python是我所知道的唯一语言（初学者+，也许是中级），所以请不要假设任何C，Unix，xml等知识。

任务：打开一个包含单行文本的ms-word 2007+文档（为了简单起见），并用字典值替换该行中文本中出现的任何“关键”字。然后关闭文档，保持其他所有内容相同。

文本行（例如）“我们将在海中徘徊。”

from docx import Document

document = Document('/Users/umityalcin/Desktop/Test.docx')

Dictionary = {‘sea’: “ocean”}

sections = document.sections
for section in sections:
    print(section.start_type)

#Now, I would like to navigate, focus on, get to, whatever to the section that has my
#single line of text and execute a find/replace using the dictionary above.
#then save the document in the usual way.

document.save('/Users/umityalcin/Desktop/Test.docx')

我没有在文档中看到任何允许我这样做的内容 - 也许它就在那里但是我没有得到它，因为在我的关卡中没有拼写出来。

我已经关注了这个网站上的其他建议，并试图使用模块的早期版本（https://github.com/mikemaccana/python-docx），它应该有“像replace，advReplace这样的方法”，如下所示：我打开源代码python解释器，并在最后添加以下内容（这是为了避免与已安装的0.7.2版本发生冲突）：

document = opendocx('/Users/umityalcin/Desktop/Test.docx')
words = document.xpath('//w:r', namespaces=document.nsmap)
for word in words:
    if word in Dictionary.keys():
        print "found it", Dictionary[word]
        document = replace(document, word, Dictionary[word])
savedocx(document, coreprops, appprops, contenttypes, websettings,
    wordrelationships, output, imagefiledict=None)

运行此命令会产生以下错误消息：

NameError：名称'coreprops'未定义

也许我正在尝试做一些无法做到的事情 - 但如果我遗漏了一些简单的话，我将非常感谢你的帮助。

如果这很重要，我在OSX 10.9.3上使用64位版本的Enthought's Canopy

Answer 1

当前版本的python-docx没有search()函数或replace()函数。这些都是相当频繁的要求，但是对于一般情况的实现非常棘手，而且还没有成为积压的顶端。

有些人已经取得了成功，使用已经存在的设施完成了他们需要的工作。这是一个例子。顺便说一句，它与部分无关：）

for paragraph in document.paragraphs:
    if 'sea' in paragraph.text:
        print paragraph.text
        paragraph.text = 'new text containing ocean'

要在表格中搜索，您需要使用类似：

的内容

for table in document.tables:
    for cell in table.cells:
        for paragraph in cell.paragraphs:
            if 'sea' in paragraph.text:
               ...

如果你追求这条道路，你很快就会发现复杂性是什么。如果替换段落的整个文本，则会删除任何字符级格式，例如粗体或斜体的单词或短语。

顺便说一句，来自@ wnnmaw的答案的代码是针对旧版本的python-docx，并且在0.3.0之后的版本中根本不起作用。

Answer 2

我从较早版本的答案中获得了很多帮助，但是对我来说，以下代码的功能就像用word中的简单查找和替换功能一样。希望这可以帮助。

#!pip install python-docx
#start from here if python-docx is installed
from docx import Document
#open the document
doc=Document('./test.docx')
Dictionary = {"sea": "ocean", "find_this_text":"new_text"}
for i in Dictionary:
    for p in doc.paragraphs:
        if p.text.find(i)>=0:
            p.text=p.text.replace(i,Dictionary[i])
#save changed document
doc.save('./test.docx')

上述解决方案有局限性。 1）包含“ find_this_text”的段落将变成纯文本，不带任何格式，2）与“ find_this_text”在同一段落中的上下文控件将被删除，3）上下文控件或表中的“ find_this_text”将被删除不可更改。

Answer 3

我需要一些东西来替换docx中的正则表达式。我接受了scannys的回答。处理风格我用过以下答案： Python docx Replace string in paragraph while keeping style 添加了递归调用来处理嵌套表。想出了类似的东西：

import re
from docx import Document

def docx_replace_regex(doc_obj, regex , replace):

    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
                if regex.search(inline[i].text):
                    text = regex.sub(replace, inline[i].text)
                    inline[i].text = text

    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex , replace)



regex1 = re.compile(r"your regex")
replace1 = r"your replace string"
filename = "test.docx"
doc = Document(filename)
docx_replace_regex(doc, regex1 , replace1)
doc.save('result1.docx')

迭代字典：

for word, replacement in dictionary.items():
    word_re=re.compile(word)
    docx_replace_regex(doc, word_re , replacement)

请注意，只有当整个正则表达式在文档中具有相同的样式时，此解决方案才会替换正则表达式。

此外，如果在保存相同样式文本后编辑文本，则可能在单独的运行中。例如，如果您打开具有“testabcd”字符串的文档并将其更改为“test1abcd”并保存，即使面团采用相同的样式，还有3个单独的运行“test”，“1”和“abcd”，在这种情况下替换test1将无法正常工作。

这用于跟踪文档中的更改。要将其发送到一次运行，在Word中您需要转到“选项”，“信任中心”和“隐私选项”，不要“存储随机数以提高组合准确度”并保存文档。

Answer 4

分享我编写的一个小脚本 - 帮助我生成带有变量的合法 .docx 合约，同时保留原始样式。

pip install python-docx

示例：

from docx import Document
import os


def main():
    template_file_path = 'employment_agreement_template.docx'
    output_file_path = 'result.docx'

    variables = {
        "${EMPLOEE_NAME}": "Example Name",
        "${EMPLOEE_TITLE}": "Software Engineer",
        "${EMPLOEE_ID}": "302929393",
        "${EMPLOEE_ADDRESS}": "דרך השלום מנחם בגין דוגמא",
        "${EMPLOEE_PHONE}": "+972-5056000000",
        "${EMPLOEE_EMAIL}": "example@example.com",
        "${START_DATE}": "03 Jan, 2021",
        "${SALARY}": "10,000",
        "${SALARY_30}": "3,000",
        "${SALARY_70}": "7,000",
    }

    template_document = Document(template_file_path)

    for variable_key, variable_value in variables.items():
        for paragraph in template_document.paragraphs:
            replace_text_in_paragraph(paragraph, variable_key, variable_value)

        for table in template_document.tables:
            for col in table.columns:
                for cell in col.cells:
                    for paragraph in cell.paragraphs:
                        replace_text_in_paragraph(paragraph, variable_key, variable_value)

    template_document.save(output_file_path)


def replace_text_in_paragraph(paragraph, key, value):
    if key in paragraph.text:
        inline = paragraph.runs
        for item in inline:
            if key in item.text:
                item.text = item.text.replace(key, value)


if __name__ == '__main__':
    main()

Answer 5

Office开发人员中心有一个条目，开发人员已经发布了（此时获得MIT许可）一些算法的描述，这些算法似乎为此提出了解决方案（尽管在C＃中，并且需要移植）：＆＃ 34; MS Dev Centre posting

Answer 6

您第二次尝试的问题是您还没有定义savedocx所需的参数。您需要在保存之前执行之类的操作：

relationships = docx.relationshiplist() title = "Document Title" subject = "Document Subject" creator = "Document Creator" keywords = [] coreprops = docx.coreproperties(title=title, subject=subject, creator=creator, keywords=keywords) app = docx.appproperties() content = docx.contenttypes() web = docx.websettings() word = docx.wordrelationships(relationships) output = r"path\to\where\you\want\to\save"

Answer 7

他再次更改了docx py中的API ...

为了让来到这里的每个人保持理智：

react-native run-android

测试用例：

import datetime
import os
from decimal import Decimal
from typing import NamedTuple

from docx import Document
from docx.document import Document as nDocument


class DocxInvoiceArg(NamedTuple):
  invoice_to: str
  date_from: str
  date_to: str
  project_name: str
  quantity: float
  hourly: int
  currency: str
  bank_details: str


class DocxService():
  tokens = [
    '@INVOICE_TO@',
    '@IDATE_FROM@',
    '@IDATE_TO@',
    '@INVOICE_NR@',
    '@PROJECTNAME@',
    '@QUANTITY@',
    '@HOURLY@',
    '@CURRENCY@',
    '@TOTAL@',
    '@BANK_DETAILS@',
  ]

  def __init__(self, replace_vals: DocxInvoiceArg):
    total = replace_vals.quantity * replace_vals.hourly
    invoice_nr = replace_vals.project_name + datetime.datetime.strptime(replace_vals.date_to, '%Y-%m-%d').strftime('%Y%m%d')
    self.replace_vals = [
      {'search': self.tokens[0], 'replace': replace_vals.invoice_to },
      {'search': self.tokens[1], 'replace': replace_vals.date_from },
      {'search': self.tokens[2], 'replace': replace_vals.date_to },
      {'search': self.tokens[3], 'replace': invoice_nr },
      {'search': self.tokens[4], 'replace': replace_vals.project_name },
      {'search': self.tokens[5], 'replace': replace_vals.quantity },
      {'search': self.tokens[6], 'replace': replace_vals.hourly },
      {'search': self.tokens[7], 'replace': replace_vals.currency },
      {'search': self.tokens[8], 'replace': total },
      {'search': self.tokens[9], 'replace': 'asdfasdfasdfdasf'},
    ]
    self.doc_path_template = os.path.dirname(os.path.realpath(__file__))+'/docs/'
    self.doc_path_output = self.doc_path_template + 'output/'
    self.document: nDocument = Document(self.doc_path_template + 'invoice_placeholder.docx')


  def save(self):
    for p in self.document.paragraphs:
      self._docx_replace_text(p)
    tables = self.document.tables
    self._loop_tables(tables)
    self.document.save(self.doc_path_output + 'testiboi3.docx')

  def _loop_tables(self, tables):
    for table in tables:
      for index, row in enumerate(table.rows):
        for cell in table.row_cells(index):
          if cell.tables:
            self._loop_tables(cell.tables)
          for p in cell.paragraphs:
            self._docx_replace_text(p)

        # for cells in column.
        # for cell in table.columns:

  def _docx_replace_text(self, p):
    print(p.text)
    for el in self.replace_vals:
      if (el['search'] in p.text):
        inline = p.runs
        # Loop added to work with runs (strings with same style)
        for i in range(len(inline)):
          print(inline[i].text)
          if el['search'] in inline[i].text:
            text = inline[i].text.replace(el['search'], str(el['replace']))
            inline[i].text = text
        print(p.text)

有文件夹 from django.test import SimpleTestCase from docx.table import Table, _Rows from toggleapi.services.DocxService import DocxService, DocxInvoiceArg class TestDocxService(SimpleTestCase): def test_document_read(self): ds = DocxService(DocxInvoiceArg(invoice_to=""" WAW test1 Multi myfriend """,date_from="2019-08-01", date_to="2019-08-30", project_name='WAW', quantity=10.5, hourly=40, currency='USD',bank_details=""" Paypal to: bippo@bippsi.com""")) ds.save() 和 docs 在您拥有docs/output/

的同一文件夹中

例如

确保参数化并替换内容

Answer 8

对于表的情况，我不得不将@scanny的答案修改为：

for table in doc.tables:
    for col in table.columns:
        for cell in col.cells:
            for p in cell.paragraphs:

使其起作用。确实，这似乎不适用于API的当前状态：

for table in document.tables:
    for cell in table.cells:

此处的代码存在相同的问题：https://github.com/python-openxml/python-docx/issues/30#issuecomment-38658149

如何使用python-docx替换Word文档中的文本并保存

8 个答案: