I have an XML document that looks like this:
<file>
<name>NAME_OF_FILE</name>
</file>
<file>
<name>NAME_OF_FILE</name>
</file>
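I'm trying to write a Python script that replaces everything between the element values (all line breaks, tags, and whitespace, i.e. everything that is not the element content itself) with ','.
The output for the file above should look like this: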
NAME_OF_FILE','NAME_OF_FILE','NAME_OF_FILE','
Here is what I have so far. I can't quite work out how Python handles line breaks:
import sys
import os
import re
source = r'c:\A\grepper.txt'
f = open(source,'r')
out = open(r'c:\A\bout.txt', 'a')
for line in f:
    one = re.sub(r"\n", '', line)
    two = re.sub(r"\r", '', one)
    three = re.sub(r'</name>.*<name>', '\',\'', two)
    out.write(three)
out.close()
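Answer 0 (score: 2)
Remove the r prefixes: a raw string keeps the backslash and the following letter as two literal characters instead of turning them into a control character. (The re module interprets the two-character pattern \n as a newline on its own, so the raw form also matches; the plain form just makes the intent explicit.)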
one = re.sub("\n", '', line)
two = re.sub("\r", '', one)
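Since the question is about how Python handles line breaks, here is a minimal sketch of the difference between the raw and the plain literal. The sample line and all variable names are made up for illustration; only `re.sub` and `str.replace` come from the thread.

```python
import re

raw = r"\n"     # two characters: a backslash and the letter n
plain = "\n"    # one character: a line feed

lengths = (len(raw), len(plain))

# Sample line with a Windows-style ending (assumed for illustration).
line = "NAME_OF_FILE\r\n"

# As regex patterns, both forms match a real newline, because the re
# module interprets the two-character escape itself.
via_raw = re.sub(r"\n", "", line)
via_plain = re.sub("\n", "", line)

# str.replace does no escape processing, so the raw form searches for a
# literal backslash followed by n and finds nothing in this line.
via_replace = line.replace(r"\n", "")
```

The prefix therefore matters for `str.replace` but not for the two `re.sub` calls in the question.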
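You can also use the string's replace() method for these simple substitutions and combine everything into one line.
line = re.sub(r'</name>.*<name>', "','", line.replace('\n', '').replace('\r', ''))
out.write(line)
However, this still does not produce the desired output: the loop sees one line at a time, and no single line contains both </name> and the next <name>, so the pattern never matches. I suggest the following instead: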
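results = []
for line in f:
    # Capture only the text between <name> and </name> on this line
    match = re.search(r'<name>(.*)</name>', line)
    if match:
        results.append(match.group(1))
# Join the captured names with ',' and write them out as one line
out.write("','".join(results) + "\n")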
This works: http://ideone.com/ik48G
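To see why the substitution from the question never fires, the following sketch runs the same pattern line by line and then over the whole text. The sample text and variable names are assumptions made for illustration, and the whole-text variant with `re.DOTALL` is an alternative shown for contrast, not the approach recommended above.

```python
import re

# Sample text mirroring the layout of the question's file (contents assumed).
text = "<file>\n<name>A</name>\n</file>\n<file>\n<name>B</name>\n</file>\n"

pattern = r"</name>.*<name>"

# Line by line: no single line holds </name> followed by another <name>.
per_line_hits = [ln for ln in text.splitlines() if re.search(pattern, ln)]

# Whole text: DOTALL lets '.' cross newlines, and the non-greedy '.*?'
# stops at the nearest following <name>.
joined = re.sub(r"</name>.*?<name>", "','", text, flags=re.DOTALL)
```

Even on the whole text, the leading `<file>` and `<name>` and the trailing closing tags remain in `joined`, which is one reason collecting the matches is simpler than replacing the gaps.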
Answer 1 (score: 0)
Instead of replacing, you may want to consider matching what you want:
import re

# data is assumed to hold the entire file contents as a single string
tag_re = re.compile(r'''
    <(?P<tag>[a-z]+)>   # First match the tag, must be a-z enclosed in <>
    (?P<value>[^<>]+)   # Match the value, anything but <>
    </(?P=tag)>         # Match the same tag we got earlier, but the closing version
''', re.VERBOSE)
print("','".join(m.group('value') for m in tag_re.finditer(data)))
Answer 2 (score: 0)
Regular expressions are the wrong tool for this. Use the xml.sax.handler module.
Untested:
import xml.sax
from xml.sax.handler import ContentHandler

class CharactersOnlyContentHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.text = ""
        self.texts = []

    def characters(self, content):
        # May be called several times for a single text node
        self.text += content

    def endElement(self, name):
        # Strip first so whitespace between tags is not recorded as a value
        text = self.text.strip()
        if text:
            self.texts.append(text)
        self.text = ""

handler = CharactersOnlyContentHandler()
xml.sax.parse(xml_file_name, handler)
print(",".join("'%s'" % s for s in handler.texts))
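One caveat: the file in the question has several top-level <file> elements, so it is not a well-formed XML document and a parser would likely reject it as is. The sketch below works around that by wrapping the text in a made-up <root> element before handing it to `xml.sax.parseString`. The `NameCollector` class, the wrapper element, and the sample data are hypothetical and not part of the answer above.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NameCollector(ContentHandler):
    """Collects the text of every <name> element (hypothetical helper)."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.in_name = False
        self.parts = []
        self.names = []

    def startElement(self, name, attrs):
        if name == "name":
            self.in_name = True
            self.parts = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if self.in_name:
            self.parts.append(content)

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self.parts))
            self.in_name = False


# Sample fragments mirroring the question's layout (contents assumed).
fragments = "<file>\n<name>A</name>\n</file>\n<file>\n<name>B</name>\n</file>\n"

# Wrap the fragments in a made-up root so the input is well-formed.
wrapped = "<root>" + fragments + "</root>"

handler = NameCollector()
xml.sax.parseString(wrapped.encode("utf-8"), handler)
joined = "','".join(handler.names)
```

Tracking whether the parser is inside <name> avoids picking up the whitespace between the other tags, at the cost of hard-coding the element name.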
Answer 3 (score: 0)
import lxml.etree
myxml = """
<filelist>
<file>
<name>FIRST FILE NAME</name>
</file>
<file>
<name>SECOND FILE NAME</name>
</file>
</filelist>
"""
root = lxml.etree.fromstring(myxml)
filenames = root.xpath('//file/name/text()')
print(', '.join(filenames))
Result:
FIRST FILE NAME, SECOND FILE NAME
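If installing lxml is not an option, roughly the same thing can be sketched with the standard library's `xml.etree.ElementTree`. This is a swapped-in alternative, not the code from this answer; it reuses the same sample document, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

myxml = """
<filelist>
  <file>
    <name>FIRST FILE NAME</name>
  </file>
  <file>
    <name>SECOND FILE NAME</name>
  </file>
</filelist>
"""

root = ET.fromstring(myxml)

# ElementTree supports only a subset of XPath and has no text() step,
# so read .text from each matched element instead.
filenames = [el.text for el in root.findall("./file/name")]
result = ", ".join(filenames)
```

Like the lxml version, this needs a single root element, so the question's file would have to be wrapped in one first.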