Python script to extract content from a file

Time: 2014-09-20 12:28:47

Tags: python xml text-files

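There are similar questions on StackOverflow, but somehow none of them solved my problem. I want to extract the content of the content tag, the text inside each text tag, and the correct and incorrect feedback.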

<content ID="0">Which of the following are objectives of Sourcing?</content>
     <cyu ID="1">
        <text id="1" Type="true">Simplify the management of the procurement process</text>
        <text id="2" Type="false">Perform long-term contract management</text>
        <text id="3" Type="true">Select, develop, and maintain sources of supply</text>
        <text id="4" Type="false">Calculate maintenance and servicing costs</text>
        <text id="5" Type="true">Enable maintenance of inventory for continuous production</text>
        <correctFeedback>Great! You made the correct choice. </correctFeedback>
        <incorrectFeedback>You made an incorrect choice. </incorrectFeedback>
     </cyu>

The code I am using is:

with open("m01_004_000.xml") as infile:
    with open("whole.txt","w") as outfile:
        collector = []
        for line in infile:
            if line.startswith("<content ID ="">"):
                collector = []
            collector.append(line)
            if line.startswith("<correctFeedback>"):
                for outline in collector:
                    outfile.write(outline)

But this produces an empty whole.txt. What could be wrong? Is there another way to do this?

1 answer:

Answer 0 (score: 0)

There are several problems:

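  1. Your lines do not start the way you expect: there is whitespace before the text. So you should add

    line = line.strip()

    right after

    for line in infile:

    to remove the whitespace. Note that strip() also removes the trailing newline, so append "\n" when writing each line back out.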

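  2. Syntax: "<content ID ="">" produces the string <content ID => because of the doubled quotes (Python joins the two adjacent string literals "<content ID =" and ">"), and your file has no line like that.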

  3. You should use the xml module to parse XML files.
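
As a rough illustration of point 3, here is a minimal sketch using xml.etree.ElementTree from the standard library. It parses the sample from the question out of a string and collects the text of the wanted elements. The wrapping <root> element, the WANTED set, and the extract_lines helper are assumptions made for the example; the real file presumably has its own root element, whose name is not shown in the question.

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <root> element
# (assumption: the real file has a single root element of some kind).
SAMPLE = """<root>
<content ID="0">Which of the following are objectives of Sourcing?</content>
<cyu ID="1">
  <text id="1" Type="true">Simplify the management of the procurement process</text>
  <text id="2" Type="false">Perform long-term contract management</text>
  <text id="3" Type="true">Select, develop, and maintain sources of supply</text>
  <text id="4" Type="false">Calculate maintenance and servicing costs</text>
  <text id="5" Type="true">Enable maintenance of inventory for continuous production</text>
  <correctFeedback>Great! You made the correct choice. </correctFeedback>
  <incorrectFeedback>You made an incorrect choice. </incorrectFeedback>
</cyu>
</root>"""

# Tag names whose text should be extracted (taken from the question).
WANTED = {"content", "text", "correctFeedback", "incorrectFeedback"}


def extract_lines(root):
    """Return the stripped text of every wanted element, in document order."""
    lines = []
    for elem in root.iter():  # walks the root and all of its descendants
        if elem.tag in WANTED and elem.text:
            lines.append(elem.text.strip())
    return lines


root = ET.fromstring(SAMPLE)
lines = extract_lines(root)
for line in lines:
    print(line)
```

For the actual file, something like root = ET.parse("m01_004_000.xml").getroot() should yield the root element, and the collected lines can then be written to whole.txt with "\n".join(lines). Unlike the startswith approach, this does not depend on indentation or on how the attributes are spelled.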