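I am trying to parse an XML file with ElementTree. The XML file I am reading was exported from MySQL, and when the file is created, a database entry such as c:\cygwin\bin has its '\b' translated into a backspace character. To work around this, I am trying to remove every '\b' from the XML file so that I can pass it to the elementtree.parse() method. For some reason, after removing all the '\b' entries, the whole file is not written back out.
Here is what I am doing: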
def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = map(lambda line: line.replace("\b", " "), file)
    #go to the beginning of the file
    file.seek(0);
    #overwrite with correct data
    file.writelines(lines)
    sys.exit()

'''Entry into the program'''
#test the file to see if processing is needed before parsing
for line in xml_file:
    p = re.compile("\\b") #search for '\b'
    if(p.match(line)):
        processing = True
        break #only one match needed
if processing:
    preprocess(xml_file)
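The result is that I end up with an XML file whose header has been cut off, so it fails when passed to the parser.
This is what gets removed from the XML file: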
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ROOT SYSTEM "diskreport.dtd">
<ROOT>
<row>
<field name="buildid">26960</field>
<field name="cast(status as char)">Filesystem 1K-blocks Used Available Use% Mounted on
C:cygwinin 285217976 88055920 197162056 31% /usr/bin
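Any help or ideas would be great, thanks.
Answer 0 (score: 1)
I figured out the problem: I was using p.match to look for '\b' when I really needed p.search. p.match only checks at the beginning of the line, while p.search looks for an occurrence anywhere in the line.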
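The difference between match and search can be seen in a minimal sketch like the one below. The sample line is a hypothetical stand-in modeled on the data in the question, and the names BACKSPACE, found_by_match, and found_by_search are illustrative only. The sketch also spells the backspace as "\x08", since in a regular expression the escape \b outside a character class means "word boundary" rather than a backspace.

```python
import re

# Pattern for a literal backspace character (0x08). Spelled \x08 here
# because the regex escape \b (outside [...]) is a word-boundary assertion.
BACKSPACE = re.compile("\x08")

# Hypothetical sample line with a backspace in the middle.
line = "C:cygwin\x08in 285217976 88055920"

# match() only tries the pattern at position 0 of the string.
found_by_match = BACKSPACE.match(line) is not None

# search() scans the whole string for the first occurrence.
found_by_search = BACKSPACE.search(line) is not None

# For comparison: the regex \b matches at a word boundary, so it hits on
# ordinary text that contains no backspace at all.
word_boundary_hit = re.search(r"\b", "plain text") is not None

print(found_by_match, found_by_search, word_boundary_hit)
```

If the goal is only to test for the character, the plain string test `"\x08" in line` does the same job without a regex.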
Solution:
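import re
import sys

def preprocess(file):
    # Exporting from MySQL Query Browser adds a stray backspace
    # character to the result set; remove it so the XML parser
    # can read the data.
    print("in preprocess")
    # Rewind first: the caller has already read part of the file.
    file.seek(0)
    lines = [line.replace("\b", "") for line in file]
    # Go back to the beginning of the file
    file.seek(0)
    # Overwrite with the corrected data, then drop the leftover tail
    # (the cleaned text is shorter than the original)
    file.writelines(lines)
    file.truncate()
    sys.exit()

'''Entry into the program'''
# Test the file to see if processing is needed before parsing.
# xml_file is assumed to be open for reading and writing ("r+").
p = re.compile("\x08")  # literal backspace; the regex "\\b" is a word boundary
processing = False
for line in xml_file:
    if p.search(line):  #### Changed to p.search here
        processing = True
        break  # only one match needed
if processing:
    preprocess(xml_file)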
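As an alternative to rewriting the file in place, the cleanup can be done in memory and the result handed straight to the parser. The following is only a sketch under assumptions: strip_backspaces and parse_clean are hypothetical helper names, the sample document is a shortened stand-in for the export shown in the question, and it uses the standard-library xml.etree.ElementTree rather than the standalone elementtree package.

```python
import xml.etree.ElementTree as ET


def strip_backspaces(data):
    """Return the bytes in `data` with every backspace (0x08) removed."""
    return data.replace(b"\x08", b"")


def parse_clean(path):
    """Read the file at `path`, strip backspaces, and return the root element."""
    with open(path, "rb") as handle:
        return ET.fromstring(strip_backspaces(handle.read()))


# Hypothetical, shortened stand-in for the exported file.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<ROOT><row>"
    b'<field name="buildid">26960</field>'
    b'<field name="status">C:cygwin\x08in</field>'
    b"</row></ROOT>"
)

root = ET.fromstring(strip_backspaces(sample))
status = root.find("row/field[@name='status']").text
print(root.tag, repr(status))
```

Because the original file is never modified, there is no need to seek, overwrite, or truncate, and the file position left behind by an earlier read loop does not matter.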