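Objective: match the strings listed in a text file against the field names in an XML file, and replace each matching element with a comment.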
cat File.txt
RHO_BID_RT
RHO_ASK_RT
XML file content:
<field name="RHO_BID_RT" type="float" id="0x01D3" sequence="1"/>
<field name="RHO_ASK_RT" type="float" id="0x01D4" sequence="1"/>
Expected result in the XML content:
<!-- Removed RHO_BID_RT-->
<!-- Removed RHO_ASK_RT-->
CODE
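import re

word_file = 'File.txt'
xml_file = '../file.xml'

# Build one alternation from the words, escaping regex metacharacters and skipping blank lines
with open(word_file) as words:
    names = [word.strip() for word in words if word.strip()]
regex = r'<\s*field\s+name="({})"[^>]*>'.format('|'.join(map(re.escape, names)))

with open(xml_file) as xml:
    for line in xml:
        # Replace a matching <field .../> element with a comment that names the field
        line = re.sub(regex, r'<!-- Removed \1-->', line.rstrip())
        print(line)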
Answer (score: 1)
Use an XML parser, such as lxml.
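The idea is to read the list of words and build an XPath expression that checks whether the name attribute equals one of those words. Then replace each matching element with a comment by calling replace() on its parent element: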
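from lxml import etree
from lxml.etree import Comment

# Read the words, ignoring blank lines
with open('words.txt') as f:
    words = [line.strip() for line in f if line.strip()]

# e.g. //field[@name = "RHO_BID_RT" or @name = "RHO_ASK_RT"]
xpath = '//field[{}]'.format(' or '.join('@name = "%s"' % word for word in words))

tree = etree.parse('input.xml')
for element in tree.xpath(xpath):
    comment = Comment('REMOVED')
    comment.tail = element.tail  # keep the whitespace that followed the element
    # replace() is called on the element's parent, which need not be the root
    element.getparent().replace(element, comment)

print(etree.tostring(tree, encoding='unicode'))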
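The XPath above is built by pasting each word between double quotes, so a word that itself contains a double quote would break the expression. A minimal sketch, assuming lxml and an inline sample document in place of input.xml, that instead binds each word as an XPath variable (lxml's xpath() accepts variables as keyword arguments). The helper name replace_fields is hypothetical:

```python
from lxml import etree


def replace_fields(root, words):
    """Hypothetical helper: replace every <field> whose name is in `words`
    with a REMOVED comment. Each word is bound to the XPath variable $name,
    so quotes inside a word cannot break the expression."""
    for word in words:
        for element in root.xpath('//field[@name = $name]', name=word):
            comment = etree.Comment('REMOVED')
            comment.tail = element.tail  # keep the following whitespace
            element.getparent().replace(element, comment)
    return root


xml = b'<fields>\n<field name="RHO_BID_RT"/>\n<field name="RHO_ASK_RT"/>\n<field name="OTHER"/>\n</fields>'
root = replace_fields(etree.fromstring(xml), ['RHO_BID_RT', 'RHO_ASK_RT'])
print(etree.tostring(root, encoding='unicode'))
```

This runs one query per word rather than one combined query; for a short word list the difference is negligible.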
For the following contents of input.xml:
<fields>
<field name="RHO_BID_RT" type="float" id="0x01D3" sequence="1"/>
<field name="RHO_ASK_RT" type="float" id="0x01D4" sequence="1"/>
</fields>
and words.txt:
RHO_BID_RT
RHO_ASK_RT
the script prints:
<fields>
<!--REMOVED-->
<!--REMOVED-->
</fields>
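The question asks for comments that name the removed field (<!-- Removed RHO_BID_RT-->), while this answer writes a fixed REMOVED. A sketch, assuming lxml and an inline copy of the question's XML, that builds the comment text from each element's name attribute; the comment text " Removed NAME" (with a leading space) is chosen only to reproduce the question's expected format:

```python
from lxml import etree

words = {'RHO_BID_RT', 'RHO_ASK_RT'}
xml = b'''<fields>
<field name="RHO_BID_RT" type="float" id="0x01D3" sequence="1"/>
<field name="RHO_ASK_RT" type="float" id="0x01D4" sequence="1"/>
</fields>'''

root = etree.fromstring(xml)
# xpath() returns a list, so replacing elements inside the loop is safe
for element in root.xpath('//field[@name]'):
    name = element.get('name')
    if name in words:
        # Comment text ' Removed NAME' serializes as <!-- Removed NAME-->
        comment = etree.Comment(' Removed %s' % name)
        comment.tail = element.tail
        element.getparent().replace(element, comment)

result = etree.tostring(root, encoding='unicode')
print(result)
```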
Alternatively, build a set of the words and check each name attribute value in a loop:
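from lxml import etree
from lxml.etree import Comment

# A set gives fast membership tests; blank lines are ignored
with open('words.txt') as f:
    words = {line.strip() for line in f if line.strip()}

tree = etree.parse('input.xml')
for element in tree.xpath('//field[@name]'):
    if element.get('name') in words:
        comment = Comment('REMOVED')
        comment.tail = element.tail  # keep the whitespace that followed the element
        element.getparent().replace(element, comment)

print(etree.tostring(tree, encoding='unicode'))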
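If installing lxml is not an option, the same loop can be sketched with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not part of the original answer: ElementTree elements have no getparent(), so the sketch visits every potential parent and replaces matching children by index. The helper name comment_out_fields is hypothetical, and note that ElementTree's default parser discards any comments already present in the input.

```python
import xml.etree.ElementTree as ET


def comment_out_fields(root, words):
    """Hypothetical helper: replace each <field> whose name is in `words`
    with a REMOVED comment, at any depth below `root`."""
    # Snapshot the elements first so the tree is not modified mid-iteration
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == 'field' and child.get('name') in words:
                comment = ET.Comment('REMOVED')
                comment.tail = child.tail  # keep the following whitespace
                parent[index] = comment
    return root


xml = '<fields>\n<field name="RHO_BID_RT"/>\n<field name="RHO_ASK_RT"/>\n</fields>'
root = comment_out_fields(ET.fromstring(xml), {'RHO_BID_RT', 'RHO_ASK_RT'})
print(ET.tostring(root, encoding='unicode'))
```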