Question

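There are similar questions on StackOverflow, but somehow they don't solve my problem. I want to extract the `content` tag, the contents of the `text` tags, and the correct and incorrect feedback.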

```
<content ID="0">Which of the following are objectives of Sourcing?</content>
     <cyu ID="1">
        <text id="1" Type="true">Simplify the management of the procurement process</text>
        <text id="2" Type="false">Perform long-term contract management</text>
        <text id="3" Type="true">Select, develop, and maintain sources of supply</text>
        <text id="4" Type="false">Calculate maintenance and servicing costs</text>
        <text id="5" Type="true">Enable maintenance of inventory for continuous production</text>
        <correctFeedback>Great! You made the correct choice. </correctFeedback>
        <incorrectFeedback>You made an incorrect choice. </incorrectFeedback>
     </cyu>
```

The code I'm using is:

```
with open("m01_004_000.xml") as infile:
    with open("whole.txt", "w") as outfile:
        collector = []
        for line in infile:
            if line.startswith("<content ID ="">"):
                collector = []
            collector.append(line)
            if line.startswith("<correctFeedback>"):
                for outline in collector:
                    outfile.write(outline)
```

But this gives me an empty whole.txt. What could be wrong? Is there another way to do this?

Answer 1

There are several problems:

Your lines don't start the way you expect: the text is preceded by whitespace. So you should put
```
line = line.strip()
```
right after
```
for line in infile:
```
to remove the whitespace.
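A small illustration of why the prefix check fails, using one indented line from the sample as a hypothetical input. Note that `strip()` also removes the trailing newline that every line read from a file carries.

```python
# A line as it is read from the indented file: leading spaces plus a trailing newline.
line = '     <cyu ID="1">\n'
print(line.startswith("<cyu"))      # False: the line begins with spaces

# strip() removes leading and trailing whitespace, including "\n".
stripped = line.strip()
print(stripped.startswith("<cyu"))  # True
```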
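Syntax: `"<content ID ="">"` produces the string `<content ID =>`, because the doubled quotes end one string literal and start another, and Python joins adjacent literals. You have no line like that. Note also that the line in your sample is `<content ID="0">`, with no space before the `=`.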
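A minimal sketch of the line-based approach with both fixes applied, assuming the file looks exactly like the sample above. `SAMPLE` (an in-memory stand-in for m01_004_000.xml) and `collect_blocks` are hypothetical names. It also stops at `<incorrectFeedback>` instead of `<correctFeedback>`, since the question wants both feedback lines.

```python
import io

# Adjacent string literals are concatenated at compile time:
# "<content ID =" + ">" gives the pattern below, not <content ID="">.
pattern = "<content ID ="">"
print(pattern)  # <content ID =>

# Hypothetical in-memory copy of the question's sample file.
SAMPLE = """<content ID="0">Which of the following are objectives of Sourcing?</content>
     <cyu ID="1">
        <text id="1" Type="true">Simplify the management of the procurement process</text>
        <text id="2" Type="false">Perform long-term contract management</text>
        <text id="3" Type="true">Select, develop, and maintain sources of supply</text>
        <text id="4" Type="false">Calculate maintenance and servicing costs</text>
        <text id="5" Type="true">Enable maintenance of inventory for continuous production</text>
        <correctFeedback>Great! You made the correct choice. </correctFeedback>
        <incorrectFeedback>You made an incorrect choice. </incorrectFeedback>
     </cyu>
"""

def collect_blocks(infile):
    """Group stripped lines from each <content ...> line through the
    following <incorrectFeedback> line."""
    blocks = []
    collector = []
    for line in infile:
        line = line.strip()              # drop the indentation before each tag
        if line.startswith("<content"):  # a prefix the sample's <content ID="0"> really has
            collector = []
        collector.append(line)
        if line.startswith("<incorrectFeedback>"):
            blocks.append(collector)
            collector = []
    return blocks

with io.StringIO(SAMPLE) as infile:
    for block in collect_blocks(infile):
        print("\n".join(block))
```

Writing each block to whole.txt instead of printing it works the same way as in the question. This is still fragile, though: it breaks as soon as two tags share a line or the indentation changes.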
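You should use an XML parser, such as the standard library's `xml` package (`xml.etree.ElementTree`), to parse the XML file instead of matching lines.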
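A minimal sketch with `xml.etree.ElementTree`, under two assumptions: each `content` element is followed by its `cyu` element as a sibling in matching order, and the question's fragment is wrapped in a hypothetical `<root>` element, because two top-level elements on their own are not a well-formed document. `SAMPLE` and `extract` are illustrative names.

```python
import xml.etree.ElementTree as ET

# The question's fragment, wrapped in a hypothetical <root> element.
SAMPLE = """<root>
<content ID="0">Which of the following are objectives of Sourcing?</content>
     <cyu ID="1">
        <text id="1" Type="true">Simplify the management of the procurement process</text>
        <text id="2" Type="false">Perform long-term contract management</text>
        <text id="3" Type="true">Select, develop, and maintain sources of supply</text>
        <text id="4" Type="false">Calculate maintenance and servicing costs</text>
        <text id="5" Type="true">Enable maintenance of inventory for continuous production</text>
        <correctFeedback>Great! You made the correct choice. </correctFeedback>
        <incorrectFeedback>You made an incorrect choice. </incorrectFeedback>
     </cyu>
</root>"""

def extract(root):
    """Pair each <content> with the <cyu> at the same position (an assumption)
    and pull out the question, the options, and both feedback strings."""
    items = []
    for content, cyu in zip(root.findall("content"), root.findall("cyu")):
        items.append({
            "question": (content.text or "").strip(),
            # (option text, whether its Type attribute is "true")
            "options": [((t.text or "").strip(), t.get("Type") == "true")
                        for t in cyu.findall("text")],
            "correct_feedback": cyu.findtext("correctFeedback", default="").strip(),
            "incorrect_feedback": cyu.findtext("incorrectFeedback", default="").strip(),
        })
    return items

for item in extract(ET.fromstring(SAMPLE)):
    print(item["question"])
    for text, is_true in item["options"]:
        print("[x]" if is_true else "[ ]", text)
    print("Correct feedback:", item["correct_feedback"])
    print("Incorrect feedback:", item["incorrect_feedback"])
```

For the real file, `ET.parse("m01_004_000.xml").getroot()` would take the place of `ET.fromstring(SAMPLE)`, provided the file has a single root element that contains these siblings.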
