Parsing unconventionally formatted XML

Time: 2012-09-12 18:50:54

Tags: xml

I have the following XML:

<XMLResults><ConfMess><RCode>0</RCode><MId>0</MId></ConfMess><COURSE_DATA><THEHEADING>Review Engagements: Inquiry and Analytical Review Procedures and Reporting</THEHEADING><ABSTRACT><!--this file has been generated by v.3.2.1 8/9/2012 8:50:14 AM by JHancock (and called from 'A G&Q Database')--><html><head><title>Course Abstract</title><link rel='stylesheet' href='https://www.thelearningcenter.org/cserver/case1/css/theabstract.css' type='text/css'></head><body><div style='text-align: center;' class=h2banner>Course Abstract</div><div id="tableContainer" class="tableContainer"><table class="abstract"><tbody class="scrollContent"><tr class="abstract"><td class="abstractCaptions">Main Title</td><td class="abstract" id=courseAbstractTitle>Initial Review: Find Out About Additional Reporting Procedures</td></tr><tr class="abstract"><td class="abstractCaptions">Writer(s)</td><td class="abstract" id=authorsAbstract>Karl Booker<br>Harriet Johnson</td></tr><tr class="abstract"><td class="abstractCaptions">Current Field(s) of Study<sup>1</sup></td><td class="abstract" id=fosAbstract>4.0 study hours in 'History'</td></tr><tr class="abstract"><td class="abstractCaptions">Area Of Study</td><td class="abstract" id=courseLevelAbstract>Medium</td></tr><tr class="abstract"><td class="abstractCaptions">Value (30 min.sec.)<sup>1</sup></td><td class="abstract" id=creditHoursAbstract>3.5</td></tr><tr class="abstract"><td class="abstractCaptions">Must Haves</td><td class="abstract" id=prerequisitesAbstract>None</td></tr><tr class="abstract"><td class="abstractCaptions">Description</td><td class="abstract" id=descriptionAbstract>This topic revolves around discussing important topics in the history field and how they relate to our current situation.</td></tr><tr class="abstract"><td class="abstractCaptions">TheObjective</td><td class="abstract" id=objectivesAbstract><ul><li>Learn more about history and how our modern times have been shaped by it.<li>Plan for the future<li>Help mankind to 
learn from the past<li>Provide valuable input to others<li>Be greatful for what we have<li>Gain credit for all the hard work we put in<li>Pass this course and move on with our lives.<li>Get a good job and raise a family.<li>Get a vacation home and relax on the beach<li>Soak up the sun and get a tan</ul></td></tr><tr class="abstract" id=idExpirationRow><td class="abstractCaptions">Expires</td><td class="abstract" id=expirationAbstract>This topic is reviewed monthly for value and modified where needed.</td></tr><tr class="abstract"><td class="abstractCaptions">Item ID</td><td class="abstract" id=courseIDabstract>odt</td></tr></tbody></table></div><div id=footnote1ID class="sylFNote"><sup>1</sup>Consult your instructor for infornation on this particular topic</div><div id="idCopyright" class="copyright">© 2004 THIS SCHOOL BOARD</div></body></html></ABSTRACT></COURSE_DATA><STUDY_AREA><SUBJECT>AuditField</SUBJECT><NUMBER_HOURS>3.0</NUMBER_HOURS></FIELD_OF_STUDY></XMLResults>

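I can't seem to find a routine that will parse the "stuff" inside the <ABSTRACT>stuff</ABSTRACT> section of the XML. I suspect this is due to special characters or something similar. Can someone help me with a routine that handles this without failing?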

2 answers:

Answer 0: (score: 2)

This is not XML. It's a bunch of text with angle brackets.

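The problems are not limited to the <ABSTRACT> element, where the embedded HTML has unquoted attributes and unclosed tags such as <link>, <br> and <li>. There is also <STUDY_AREA></FIELD_OF_STUDY>, where the opening tag is closed by a differently named tag.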

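How do you fix this? You don't. You get whoever is sending you this garbage to send you valid XML. It's not as if there aren't plenty of XML editors out there. They should be using such a tool to create and/or validate their "XML".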
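To show the sender exactly where the document breaks, a quick well-formedness check can help. What follows is a minimal sketch using Python's standard-library xml.etree.ElementTree. The sample strings are shortened stand-ins for the document in the question, not the full input, and the helper name check_well_formed is made up for illustration. It also shows that a comment on its own parses fine.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Shortened stand-ins for the document in the question (assumed, not the full input).
mismatched = "<XMLResults><STUDY_AREA><SUBJECT>AuditField</SUBJECT></FIELD_OF_STUDY></XMLResults>"
unquoted = "<ABSTRACT><div class=h2banner>Course Abstract</div></ABSTRACT>"
with_comment = "<ABSTRACT><!-- generated by a tool --><p>ok</p></ABSTRACT>"

samples = [("mismatched", mismatched), ("unquoted", unquoted), ("with_comment", with_comment)]
for name, sample in samples:
    error = check_well_formed(sample)
    print(name, "->", "well-formed" if error is None else error)
```

The error message reported by the parser includes a line and column, which is usually enough for the sender to locate the offending tag.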

Answer 1: (score: 0)

It's probably because <!-- --> is a comment in XML. It isn't failing per se; the parser just treats that part as a comment.

Comments in XML

The syntax for writing comments in XML is similar to that of HTML.

<!-- This is a comment -->

Here is the reference link.

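How you solve this depends on the library you are using. Some libraries may support getting the raw text of that element. They may also return comment nodes.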

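I would probably just grep the plain text for <ABSTRACT>(.*)</ABSTRACT>. That can go wrong if a document contains more than one record, because a greedy match runs from the first <ABSTRACT> to the last </ABSTRACT>, so you may need to isolate each record first.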
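The grep idea can be sketched in Python with the standard re module. This is only an illustration under assumptions: the function name extract_abstracts and the two-record sample are hypothetical, and the pattern uses a non-greedy (.*?) instead of (.*) so that each record is captured separately.

```python
import re

# Non-greedy match so each <ABSTRACT>...</ABSTRACT> pair is captured separately;
# re.DOTALL lets "." span the line breaks inside the embedded HTML.
ABSTRACT_RE = re.compile(r"<ABSTRACT>(.*?)</ABSTRACT>", re.DOTALL)


def extract_abstracts(raw_text):
    """Return the raw contents of every <ABSTRACT> block in raw_text."""
    return ABSTRACT_RE.findall(raw_text)


# Hypothetical input with two records and a line break inside the first one.
sample = (
    "<COURSE_DATA><ABSTRACT><html><body>First\ncourse</body></html></ABSTRACT></COURSE_DATA>"
    "<COURSE_DATA><ABSTRACT><html><body>Second</body></html></ABSTRACT></COURSE_DATA>"
)

for abstract in extract_abstracts(sample):
    print(repr(abstract))
```

The extracted fragment is HTML rather than XML, so it is probably better handed to a lenient HTML parser than to an XML parser.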