Question

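I'm trying to parse an XML file with ElementTree. The XML file was exported from MySQL, and when the file is created, a database entry such as c:\cygwin\bin has its '\b' translated into a backspace character. I'm trying to remove every '\b' from the XML file so that I can pass it to the elementtree.parse() method. For some reason, after removing all of the '\b' entries, the whole file does not get written back out.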

Here is what I'm doing:

def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = map(lambda line: line.replace("\b", " "), file)

    #go to the beginning of the file
    file.seek(0);

    #overwrite with correct data
    file.writelines(lines)
    sys.exit()


'''Entry into the program'''
#test the file to see if processing is needed before parsing
for line in xml_file:
    p = re.compile("\\b") #search for '\b'
    if(p.match(line)):
        processing = True
        break #only one match needed

if processing:
    preprocess(xml_file)

The result is that I end up with an XML file whose header has been cut off, so it fails when passed to the parser.

This is what gets removed from the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ROOT SYSTEM "diskreport.dtd">
<ROOT>
    <row>
      <field name="buildid">26960</field>
      <field name="cast(status as char)">Filesystem           1K-blocks      Used Available Use% Mounted on
C:cygwinin        285217976  88055920 197162056  31% /usr/bin

Any help or ideas would be great, thanks.

Answer 1

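I figured out the problem: I was using p.match to look for '\b' when I really needed p.search. p.match only checks for a match at the beginning of the line, while p.search looks for a match anywhere in the line.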
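To make the difference concrete, here is a small sketch. The sample line is made up for illustration, and the pattern uses r"\x08" to target the backspace character directly, which is an assumption on my part rather than what the original code compiles.

```python
import re

# Hypothetical sample line: a path whose "\b" became a backspace (0x08).
line = "C:\\cygwin\x08in"

# r"\x08" matches the backspace character itself. A bare r"\b" in a regular
# expression means "word boundary", not backspace.
backspace = re.compile(r"\x08")

at_start = backspace.match(line)    # only tries position 0
anywhere = backspace.search(line)   # scans the whole line

print(at_start)          # None
print(anywhere.start())  # 9
```

For a plain single-character check like this, `"\x08" in line` would work as well and avoids the regex entirely.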

Solution:

def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = map(lambda line: line.replace("\b", ""), file)

    #go to the beginning of the file
    file.seek(0);

    #overwrite with correct data
    file.writelines(lines)
    sys.exit()


'''Entry into the program'''
#test the file to see if processing is needed before parsing
for line in xml_file:
    p = re.compile("\\b")
    if(p.search(line)): ####Changed to p.search here
        processing = True
        break #only one match needed

if processing:
    preprocess(xml_file)
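A few details in the code above are worth double-checking. In Python's re module the pattern compiled from "\\b" is a word boundary rather than a backspace. The detection loop has also already read part of the file by the time preprocess runs, which is probably why only the remaining lines ended up written over the start of the file. And replacing "\b" with "" makes the content shorter, so the old tail stays on disk unless the file is truncated. Below is a minimal sketch (written for Python 3) that sidesteps all three by reading the whole file at once; the name strip_backspaces and the UTF-8 encoding are my assumptions, not part of the original script.

```python
import os
import tempfile

BACKSPACE = "\x08"  # the character Python writes as "\b" in a string literal


def strip_backspaces(path):
    """Remove backspace characters from the file at `path`, in place.

    Returns True if the file was rewritten, False if it was already clean.
    (Hypothetical helper; assumes the file is UTF-8 and fits in memory.)
    """
    # newline="" keeps the original line endings untouched.
    with open(path, "r+", encoding="utf-8", newline="") as handle:
        text = handle.read()
        if BACKSPACE not in text:
            return False
        handle.seek(0)                             # back to the start
        handle.write(text.replace(BACKSPACE, ""))
        handle.truncate()                          # drop the leftover tail
        return True


# Small demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8", newline=""
) as tmp:
    tmp.write('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<ROOT>C:\\cygwin\x08in</ROOT>\n')
    sample_path = tmp.name

changed = strip_backspaces(sample_path)
with open(sample_path, encoding="utf-8", newline="") as handle:
    cleaned = handle.read()
os.remove(sample_path)

print(changed)
print(cleaned)
```

After this runs, the file could be handed to the ElementTree parser as before. Note that the cleaned path reads C:\cygwinin, since the original "\b" text cannot be recovered once it has become a backspace.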

Overwriting an XML file
