将带有.sub()的\ u00A0替换为Word文档

时间:2017-11-29 17:31:48

标签: python regex docx

我想创建一个/(我的第一个)脚本,用Word文档中的不间断空格(\ u00A0)替换字符后面的空格,然后更改文本并保存更改的文档。

tldr 问:为什么' p'在for循环中评估为' space' +字符+'空格'而不是不间断的空间?

#Replace a space behind an unwanted expressions with a non-breaking space from a Word document to a new Word document.
import docx, re, sys

#get document name from command line
#if len(sys.argv) > 1:
    #name = ' '.join(sys.argv[1:])
#doc = docx.Document(name + '.docx')
doc = docx.Document('Kajla.docx')  #used this particular file for testing, will delete this line afterwards

#regex of unwanted expressions
regex = re.compile(r'''
(\s)                  #space
([aivkszuAIVKSZU])    #unwanted char
\s                    #space
''', re.VERBOSE)

#goes through each paragraph, replaces a match and saves it
for paragraph in range(len(doc.paragraphs)):
    #keeps the space and the unwated character but replaces the last space
    p = regex.sub(r'\1\2'+'\u00A0', doc.paragraphs[paragraph].text)
    #doc.paragraphs[paragraph].text = p #commented out since it wasnt working
    #print(p)      #for testing, will delete

#saves document as a copy
#doc.save(name + '2.docx')
doc.save('Kajla2.docx')

0 个答案:

没有答案