I want to convert these sentences to XML:
I will meet you at 1st.
5th... OK, 5th?
today is 2nd\n
Aug.3rd
Like this:
<Text VAlign="top" VPosition="85.00">
I will meet you at 1<Font Script="super">st</Font>.
</Text>
<Text VAlign="top" VPosition="85.00">
5<Font Script="super">th</Font>... OK, 5<Font Script="super">th</Font>
</Text>
<Text VAlign="top" VPosition="85.00">
today is 2<Font Script="super">nd</Font>\n
</Text>
<Text VAlign="top" VPosition="85.00">
Aug.3<Font Script="super">rd</Font>\n
</Text>
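I'm using minidom, but after reading many posts and answers I don't mind rewriting my code with another parser. At first I thought this would be simple: just replace st|nd|rd|th with
<Font Script="super">st|nd|rd|th</Font>
and then call createTextNode() with the new string.
However, the writexml() method turns the <, > and " characters into &lt;, &gt; and &quot;. That conforms to the XML specification, but it is not readable.
What should I do? Thanks a lot.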
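For context on why the escaping happens: createTextNode() always produces character data, so any markup placed inside it is escaped on output. A minimal sketch of one way around this while staying with minidom is to create the Font element as a real node and put only the suffix in a text node. The helper name build_text_element and the digit-anchored pattern are assumptions made for illustration, not part of the original code.

```python
import re
from xml.dom import minidom

# Assumption: an ordinal suffix only counts when it directly follows a digit.
ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def build_text_element(doc, line):
    """Hypothetical helper: build one <Text> element with <Font> children."""
    text_el = doc.createElement("Text")
    text_el.setAttribute("VAlign", "top")
    text_el.setAttribute("VPosition", "85.00")
    # re.split() with a capturing group keeps the suffixes at odd indices.
    for index, piece in enumerate(ORDINAL.split(line)):
        if not piece:
            continue  # skip the empty string left after a trailing suffix
        if index % 2 == 1:
            font = doc.createElement("Font")
            font.setAttribute("Script", "super")
            font.appendChild(doc.createTextNode(piece))
            text_el.appendChild(font)
        else:
            text_el.appendChild(doc.createTextNode(piece))
    return text_el


doc = minidom.Document()
element = build_text_element(doc, "I will meet you at 1st.")
print(element.toxml())
```

Because Font is an element node here, only genuine text such as a literal < in the sentence gets escaped.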
Answer 0 (score: 1)
Here is what you can do with xml.etree.ElementTree from the standard library:
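import re
import xml.etree.ElementTree as ET

data = """I will meet you at 1st.
5th... OK, 5th?
today is 2nd
Aug.3rd"""

endings = ['st', 'th', 'nd', 'rd']
# Match a suffix only when it directly follows a digit, so words such as
# "the" or "and" are left alone.
pattern = re.compile(r'(?<=\d)(%s)\b' % "|".join(endings))

root = ET.Element('root')
for line in data.split('\n'):
    items = []
    # The capturing group makes re.split() keep the matched suffixes.
    for item in re.split(pattern, line):
        if item in endings:
            items.append('<Font Script="super">%s</Font>' % item)
        else:
            items.append(item)
    # Parse the assembled markup so that Font becomes a real child element.
    element = ET.fromstring("""<Text VAlign="top" VPosition="85.00">%s</Text>""" % ''.join(items))
    root.append(element)

# encoding='unicode' returns str rather than bytes on Python 3.
print(ET.tostring(root, encoding='unicode'))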
It produces the following XML (shown here with line breaks added for readability):
<root>
<Text VAlign="top" VPosition="85.00">I will meet you at 1<Font Script="super">st</Font>.</Text>
<Text VAlign="top" VPosition="85.00">5<Font Script="super">th</Font>... OK, 5<Font Script="super">th</Font>?</Text>
<Text VAlign="top" VPosition="85.00">today is 2<Font Script="super">nd</Font></Text>
<Text VAlign="top" VPosition="85.00">Aug.3<Font Script="super">rd</Font></Text>
</root>
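One caveat with parsing an assembled string: whatever is passed to ET.fromstring() must itself be well-formed XML, so a sentence containing a raw & or < would raise a ParseError. A possible variation, sketched here under the assumption that suffixes only follow digits, builds the same structure through the text and tail attributes and lets ElementTree do the escaping. The name make_text_element and the third sample sentence are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an ordinal suffix only counts when it directly follows a digit.
ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)\b")


def make_text_element(line):
    """Hypothetical helper: build <Text> with <Font> children, no string parsing."""
    text_el = ET.Element("Text", {"VAlign": "top", "VPosition": "85.00"})
    pieces = ORDINAL.split(line)
    # The first piece is the text before any suffix.
    text_el.text = pieces[0]
    # The remaining pieces alternate: suffix, text that follows it, suffix, ...
    for suffix, following in zip(pieces[1::2], pieces[2::2]):
        font = ET.SubElement(text_el, "Font", {"Script": "super"})
        font.text = suffix
        font.tail = following
    return text_el


root = ET.Element("root")
for line in ["I will meet you at 1st.", "5th... OK, 5th?", "Tom & Jerry, 2nd"]:
    root.append(make_text_element(line))

print(ET.tostring(root, encoding="unicode"))
```

The text that follows each suffix goes into the tail of the Font element, which is how ElementTree represents mixed content.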
Answer 1 (score: 0)
To get output with indentation and line breaks I needed lxml, which I dropped into alecxe's code.
from lxml import etree as ET
# encoding='unicode' returns str rather than bytes on Python 3.
print(ET.tostring(root, pretty_print=True, encoding='unicode'))
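If installing lxml is not an option, a rough standard-library sketch is to serialize each child of the root separately and join the results with line breaks. The function one_child_per_line is a made-up name, and the sketch assumes the root element has no attributes or text of its own. It leaves the mixed content inside each Text element untouched.

```python
import xml.etree.ElementTree as ET


def one_child_per_line(root, indent="  "):
    """Hypothetical helper: put each direct child of root on its own line.

    Assumes root carries no attributes and no text of its own.
    """
    children = [ET.tostring(child, encoding="unicode") for child in root]
    body = "\n".join(indent + child for child in children)
    return "<%s>\n%s\n</%s>" % (root.tag, body, root.tag)


root = ET.fromstring(
    '<root><Text VAlign="top" VPosition="85.00">today is 2'
    '<Font Script="super">nd</Font></Text>'
    '<Text VAlign="top" VPosition="85.00">Aug.3'
    '<Font Script="super">rd</Font></Text></root>'
)
print(one_child_per_line(root))
```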