I have HTML generated by Microsoft Office that looks like this:
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><span style="font-family:Symbol">
<span style="mso-list:Ignore">·<span style='font:7.0pt "Times New Roman"'>
</span></span>
</span>It’s a media conglomerate, need to understand the parts<o:p/></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-family:Symbol">
<span style="mso-list:Ignore">·<span style='font:7.0pt "Times New Roman"'>
</span></span>
</span><![endif]>Largest TV broadcaster in Mexico<o:p/></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo1">
<![if !supportLists]><span style='font-family:"Courier New"'>
<span style="mso-list:Ignore">o<span style='font:7.0pt "Times New Roman"'>
</span></span>
</span><![endif]>There’s 7 free air channels in Mexico and they have 4<o:p/></p>
<p class="MsoListParagraph" style="margin-left:1.0in;text-indent:-.25in;mso-list:l0 level2 lfo1">
<![if !supportLists]><span style='font-family:"Courier New"'>
<span style="mso-list:Ignore">o<span style='font:7.0pt "Times New Roman"'>
</span></span>
</span><![endif]>70% of citizens watch their channels<o:p/></p>
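I want to use HXT to transform the DOM structure so that all <p> elements whose style contains "mso-list:l0 level1" become <ul><li class="level1">, and all <p> elements whose style contains "mso-list:l0 level2" become <ul><li class="level2">, with each run of contiguous level2 items nested inside the level1 item that immediately precedes it.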
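The first step of that transformation is deciding which level a given <p> belongs to. A minimal sketch of that classification, using only base and operating on the raw style string; the names listLevel, splitOn and trim are hypothetical helpers, and ignoring the list id (the "l0" part) is an assumption made for brevity:

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe, mapMaybe)

-- Split a string on a delimiter character (hypothetical helper).
splitOn :: Char -> String -> [String]
splitOn d s = case break (== d) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn d rest

-- Strip leading and trailing whitespace (hypothetical helper).
trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace

-- Extract the list level from a style attribute such as
-- "text-indent:-.25in;mso-list:l0 level2 lfo1".
-- Assumption: the list id ("l0") is ignored; only "levelN" is read.
listLevel :: String -> Maybe Int
listLevel style = listToMaybe (mapMaybe fromDecl (splitOn ';' style))
  where
    -- Look only at the "mso-list:" declaration.
    fromDecl decl = do
      value <- stripPrefix "mso-list:" (trim decl)
      listToMaybe (mapMaybe levelOf (words value))
    -- Accept a word of the form "level" followed by digits.
    levelOf w = do
      digits <- stripPrefix "level" w
      if not (null digits) && all isDigit digits
        then Just (read digits)
        else Nothing
```

Applied to the style attributes in the sample above, this should yield Just 1 for the first two paragraphs and Just 2 for the last two, while the inner "mso-list:Ignore" spans yield Nothing.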
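I have tried various experiments with HXT, using the functions in Control.Arrow.ArrowNavigatableTree and getXPathTrees from Text.XML.HXT.XPath.Arrows, but after 4 hours I have had no luck.
Any suggestions? I suspect the solution involves folding over a list of sibling XmlTrees.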
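To illustrate the kind of fold I have in mind, here is a rough sketch in plain Haskell. It does not use HXT at all: Item and Entry are hypothetical stand-ins for the sibling XmlTrees, and render builds the output as an unescaped string purely for illustration.

```haskell
-- Simplified stand-in for one Office paragraph: its list level and its text.
-- (Hypothetical type; with HXT each item would be an XmlTree.)
data Item = Item { itemLevel :: Int, itemText :: String }
  deriving (Show, Eq)

-- A level-1 entry together with the level-2 texts nested under it.
data Entry = Entry { entryText :: String, entryChildren :: [String] }
  deriving (Show, Eq)

-- Right fold over the siblings. Walking from the right, level-2 items are
-- collected in 'pending' until the level-1 item that precedes them is
-- reached, at which point they become its children.
-- Level-2 items with no preceding level-1 item are silently dropped.
nest :: [Item] -> [Entry]
nest = snd . foldr step ([], [])
  where
    step (Item lvl txt) (pending, entries)
      | lvl >= 2  = (txt : pending, entries)
      | otherwise = ([], Entry txt pending : entries)

-- Render the nested structure as <ul>/<li> markup (no escaping is done).
render :: [Entry] -> String
render entries = "<ul>" ++ concatMap li entries ++ "</ul>"
  where
    li (Entry txt kids) =
      "<li class=\"level1\">" ++ txt ++ sublist kids ++ "</li>"
    sublist []   = ""
    sublist kids =
      "<ul>" ++ concatMap (\k -> "<li class=\"level2\">" ++ k ++ "</li>") kids ++ "</ul>"
```

With HXT, the same fold would presumably run over the children of the parent element, for example by collecting them with listA getChildren and feeding the regrouped list back through arrL. That wiring is an assumption and is not shown here.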
Edit:
Here is the solution I have come up with so far:
<p>