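I want to write a script (my first one) that replaces the space after certain characters in a Word document with a non-breaking space (\u00A0), updates the text, and saves the modified document.
tl;dr question: why does `p` in the for loop evaluate to space + character + space instead of ending with a non-breaking space?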
#Replace the space after an unwanted expression with a non-breaking space, reading from one Word document and saving to a new one.
import docx, re, sys
#get document name from command line
#if len(sys.argv) > 1:
#    name = ' '.join(sys.argv[1:])
#doc = docx.Document(name + '.docx')
doc = docx.Document('Kajla.docx') #used this particular file for testing, will delete this line afterwards
#regex of unwanted expressions
regex = re.compile(r'''
(\s) #space
([aivkszuAIVKSZU]) #unwanted char
\s #space
''', re.VERBOSE)
#goes through each paragraph, replaces a match and saves it
for paragraph in range(len(doc.paragraphs)):
    #keeps the space and the unwanted character but replaces the last space
    p = regex.sub(r'\1\2'+'\u00A0', doc.paragraphs[paragraph].text)
    #doc.paragraphs[paragraph].text = p #commented out since it wasn't working
    #print(p) #for testing, will delete
#saves document as a copy
#doc.save(name + '2.docx')
doc.save('Kajla2.docx')
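
To check what `p` really contains, it may help to print `repr(p)` rather than `p`, because a non-breaking space is usually drawn exactly like an ordinary space. The sketch below is a minimal check under assumptions: it uses a made-up string (`sample` is hypothetical and not taken from Kajla.docx) and a compact, non-verbose form of the same regex, so it runs without python-docx.

```python
import re

# Same pattern as above, written on one line: space, unwanted char, space
regex = re.compile(r'(\s)([aivkszuAIVKSZU])\s')

# Hypothetical test string, used here instead of a paragraph from the Word file
sample = 'jdu k lesu'

# Keep the leading space and the character, replace the trailing space
p = regex.sub(r'\1\2' + '\u00A0', sample)

print(p)        # the non-breaking space looks like an ordinary space here
print(repr(p))  # repr escapes it, so it shows up as \xa0
```

If `repr(p)` shows `\xa0` after the character, the substitution itself worked and only the display is misleading.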