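There are similar questions on StackOverflow, but none of them resolved my problem. I want to extract the contents of the content tag, the contents of the text elements, and the correct and incorrect feedback.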
<content ID="0">Which of the following are objectives of Sourcing?</content>
<cyu ID="1">
<text id="1" Type="true">Simplify the management of the procurement process</text>
<text id="2" Type="false">Perform long-term contract management</text>
<text id="3" Type="true">Select, develop, and maintain sources of supply</text>
<text id="4" Type="false">Calculate maintenance and servicing costs</text>
<text id="5" Type="true">Enable maintenance of inventory for continuous production</text>
<correctFeedback>Great! You made the correct choice. </correctFeedback>
<incorrectFeedback>You made an incorrect choice. </incorrectFeedback>
</cyu>
The code I am using is:
with open("m01_004_000.xml") as infile:
    with open("whole.txt","w") as outfile:
        collector = []
        for line in infile:
            if line.startswith("<content ID ="">"):
                collector = []
            collector.append(line)
            if line.startswith("<correctFeedback>"):
                for outline in collector:
                    outfile.write(outline)
But this produces an empty whole.txt. What could be wrong? Is there another way to do this?
Answer 0 (score: 0)
There are a few problems:
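Your lines do not start the way you expect: the text is preceded by whitespace. So you should add
    line = line.strip()
after
    for line in infile:
to remove that whitespace. Note that strip() also removes the trailing newline, so add "\n" back when you write each line.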
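The syntax: "<content ID ="">"
does not do what you intend. Python reads it as two adjacent string literals and concatenates them, producing the string <content ID =>
, and no line in your file looks like that.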
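The two fixes above can be sketched as follows. This is only a minimal illustration, not the asker's final code: the function name collect_question_lines and the in-memory sample list are hypothetical, and the block is flushed at </cyu> rather than at <correctFeedback> (an assumption, so that the incorrect feedback is kept too).

```python
def collect_question_lines(lines):
    """Collect stripped lines from each <content ...> line through </cyu>."""
    collector = []   # lines of the question currently being read
    collected = []   # every completed question block, flattened
    for line in lines:
        line = line.strip()  # drops leading indentation and the trailing newline
        if line.startswith("<content ID="):
            collector = []   # a new question starts here
        collector.append(line)
        if line.startswith("</cyu>"):
            collected.extend(collector)
            collector = []
    return collected


# Adjacent string literals are concatenated, so the original test string
# was really '<content ID =>' and could never match a line in the file.
assert "<content ID ="">" == "<content ID =>"

# Hypothetical sample standing in for the lines of m01_004_000.xml.
sample = [
    '  <content ID="0">Which of the following are objectives of Sourcing?</content>\n',
    '  <cyu ID="1">\n',
    '    <text id="1" Type="true">Simplify the management of the procurement process</text>\n',
    '    <correctFeedback>Great! You made the correct choice. </correctFeedback>\n',
    '    <incorrectFeedback>You made an incorrect choice. </incorrectFeedback>\n',
  '  </cyu>\n',
]
result = collect_question_lines(sample)
```

With a real file, the same function can be fed the open file object, and each collected line written back with outfile.write(line + "\n"), since strip() removed the newline.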
You should use the xml module
to parse the XML file instead of matching lines by hand.
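As a rough sketch of that suggestion, the standard library's xml.etree.ElementTree can pull out the fields directly. The snippet in the question has no single enclosing element, so the <root> wrapper below is an assumption about the real file, and extract_fields is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data; the <root> wrapper is assumed,
# because a well-formed XML document needs exactly one top-level element.
xml_text = """<root>
  <content ID="0">Which of the following are objectives of Sourcing?</content>
  <cyu ID="1">
    <text id="1" Type="true">Simplify the management of the procurement process</text>
    <text id="2" Type="false">Perform long-term contract management</text>
    <correctFeedback>Great! You made the correct choice. </correctFeedback>
    <incorrectFeedback>You made an incorrect choice. </incorrectFeedback>
  </cyu>
</root>"""


def extract_fields(root):
    """Return the question text, the answer options, and both feedback strings."""
    question = root.findtext(".//content", default="").strip()
    # Each option is (id, Type, text); attribute names are case-sensitive.
    options = [
        (t.get("id"), t.get("Type"), (t.text or "").strip())
        for t in root.iter("text")
    ]
    correct = root.findtext(".//correctFeedback", default="").strip()
    incorrect = root.findtext(".//incorrectFeedback", default="").strip()
    return question, options, correct, incorrect


# For the file on disk, ET.parse("m01_004_000.xml").getroot() would replace this call.
root = ET.fromstring(xml_text)
question, options, correct, incorrect = extract_fields(root)
```

Parsing this way does not depend on indentation or on how the tags happen to be split across lines, which is what made the line-based approach fragile.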