Question

I'm trying to replace text in a table in a .docx file using Python. I'm fairly new to Python, so here is the code, which I'll explain afterwards:

from typing import List, Any
from docx import Document
import re
import sys


# the new label is passed as the first command-line argument
label_name = sys.argv[1]

file_name = "MyDocFile.docx"
doc = Document(file_name)
cell_text_array = []
target_index = 0


def index_cells(doc_obj):
    global cell_text_array
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                cell_text_array.append(cell.text)


def docx_replace_regex(doc_obj, regex, replace):
    global cell_text_array
    for p in doc_obj.paragraphs:
        if regex.search(p.text):
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
                if regex.search(inline[i].text):
                    text = regex.sub(replace, inline[i].text)
                    inline[i].text = text

    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace_regex(cell, regex, replace)


# index the cells in the document
index_cells(doc)


# everything after: /myregex/
target_index = cell_text_array.index('myregex')

# the text that I actually need is 3 spots after 'myregex'
target_index += 3 

former_label = cell_text_array[target_index]

# find regex and replace; re.escape makes the dots in the label match literally
regex1 = re.compile(re.escape(former_label))
replace1 = label_name
print(regex1)
print(replace1)

# call the replace function and save what has been replaced

docx_replace_regex(doc, regex1, replace1)
doc.save('result1.docx')

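The first function, index_cells(), walks through every table in 'MyDocFile.docx' and stores the text of each cell in cell_text_array. I took the next function from the internet, because I don't normally write Python, but in this case I had to (for various reasons I can't use Ruby's 'docx' module). docx_replace_regex() does what its name says: it goes through the document, finds the text that needs to be replaced, and replaces it with 'replace' (even when that text is found inside tables or other paragraphs).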

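What I want to do is basically pass a new name/label/tag (whatever you want to call it) to the script as an argument, replace the old name/label/tag in the .docx file with that argument, and save the edited document to a new .docx file.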
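The workflow described above (new label in, edited copy out) could be sketched with argparse instead of indexing sys.argv directly. This is only an illustration: the `--source` and `--output` options are hypothetical names that do not appear in the script, and their defaults are simply the file names the script already uses.

```python
import argparse


def parse_args(argv=None):
    """Parse the new label plus optional input/output paths (hypothetical options)."""
    parser = argparse.ArgumentParser(description="Replace a label inside a .docx table")
    parser.add_argument("label_name", help="new name/label/tag to write into the document")
    # Hypothetical options; the defaults mirror the file names used in the script.
    parser.add_argument("--source", default="MyDocFile.docx")
    parser.add_argument("--output", default="result1.docx")
    return parser.parse_args(argv)


# Passing a list makes the function easy to try without a real command line.
args = parse_args(["tag.name.label"])
print(args.label_name, args.source, args.output)
```

A missing argument then produces a usage message rather than an IndexError.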

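This code works fine when the name/label/tag I want to replace contains no dots. In fact, I tested it on other strings in the table and it worked well. Because this name/label/tag contains dots, I had to use re.compile(re.escape()) so that the dots are not treated as special characters. I thought that would work, but for some reason nothing has changed in the new file after it is generated.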
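One thing that may be worth checking, offered as an assumption rather than a diagnosis: Word can store one piece of visible text as several runs, and docx_replace_regex() only substitutes inside a single run at a time. The sketch below uses plain strings in place of python-docx runs (the `single_run` and `split_runs` layouts are made up) to show how a per-run search can miss a label that does match at paragraph level.

```python
import re


def replace_per_run(runs, regex, replacement):
    """Mimic the loop in docx_replace_regex: substitute inside each run separately."""
    return [regex.sub(replacement, run) for run in runs]


def replace_across_runs(runs, regex, replacement):
    """Join the runs, substitute once, and keep the result in the first run.

    Note: this throws away the formatting boundaries between runs.
    """
    if not runs:
        return []
    joined = "".join(runs)
    if not regex.search(joined):
        return list(runs)
    return [regex.sub(replacement, joined)] + [""] * (len(runs) - 1)


regex = re.compile(re.escape("tag.name.label"))

# Hypothetical run layouts for the same visible text "tag.name.label".
single_run = ["tag.name.label"]
split_runs = ["tag.", "name", ".label"]

print(replace_per_run(single_run, regex, "new.label"))      # label replaced
print(replace_per_run(split_runs, regex, "new.label"))      # unchanged: no single run matches
print(replace_across_runs(split_runs, regex, "new.label"))  # replaced after joining
```

If the real document does split the label this way, printing `[r.text for r in p.runs]` for the affected paragraph should show the pieces.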

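I printed 'regex1' and 'replace1' to see what they contain. 'regex1' has this form: re.compile('tag\.name\.label'), while 'replace1' is just tag.name.label, without any ' or ". I think this might be the cause of the misbehavior, but I'm not sure, since I'm new to Python.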
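To check whether the escaped pattern itself is at fault, it can be tested on a plain string, with no .docx file involved. The snippet below is a minimal sketch using the example label from the question. Printing a compiled pattern shows the repr of the pattern object, so the backslashes and quotes in that output are display formatting; `pattern.pattern` holds the actual pattern text.

```python
import re

former_label = "tag.name.label"
new_label = "new.tag.label"

pattern = re.compile(re.escape(former_label))

print(pattern)          # repr of the compiled pattern object
print(pattern.pattern)  # the raw pattern text, one backslash before each dot

# The escaped dots match only literal dots.
assert pattern.search("before tag.name.label after") is not None
assert pattern.search("tagXnameXlabel") is None

# A plain string works as the replacement as long as it contains no backslashes;
# passing a function avoids any escape processing of the replacement text.
result = pattern.sub(lambda match: new_label, "before tag.name.label after")
print(result)
```

If this works on plain strings, the regex and the replacement are probably fine, and the text stored in the document is the next thing to look at.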

Can someone help me? Is there something I'm missing?

Trying to replace text in a table in a .docx file based on a regex

0 answers: