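I am trying to process irregular content inside a `div` element, namely whatever follows an `h3` heading. There is no fixed content under an `h3` heading, but I need to associate any text with its heading. There may be a `ul`, or just a `span`, or both. The main thing is not to lump together all the text under the `h3` headings.
I have been able to navigate to my `div` using the `.css` operator. If there are multiple comments, each `div` contains one or more of 4 `h3` headings, each followed by comments or a list.
How do I separate out whatever (if anything) follows an `h3` tag and ends before the next tag?
Here is a sample of the `div`s I am working with (I can already grab anything inside the `h2`, since it is the same in every `div`):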
<div class="inspection_container">
<h2 class="inspection_date_title">
<div class="calendar_list">
<span>Mar</span><strong>4</strong>
</div>Routine Inspection<small>Inspected Mar. 4, 2014</small>
</h2>
<h3>Actions taken by inspector</h3>
<ul>
<li class="Comment">
<strong>Consultation / Technical Assistance</strong><p>Instructions are given to the owner/operator to assist them with taking the proper actions to meet regulations.</p>
</li>
</ul>
</div>
<div class="inspection_container">
<h2 class="inspection_date_title">
<div class="calendar_list">
<span>Sep</span><strong>4</strong>
</div>Re-inspection<small>Inspected Sep. 4, 2013</small>
</h2>
<h3>Not in compliance</h3>
<ul>
<li class="X">
<strong>Premise is clean/sanitary</strong><p>Food premise is to be maintained in a clean and sanitary condition.</p>
</li>
</ul>
<h3>Actions taken by inspector</h3>
<ul>
<li class="Comment">
<strong>Consultation / Technical Assistance</strong><p>Instructions are given to the owner/operator to assist them with taking the proper actions to meet regulations.</p>
</li>
</ul>
</div>
<div class="inspection_container">
<h2 class="inspection_date_title">
<div class="calendar_list">
<span>Aug</span><strong>30</strong>
</div>Routine Inspection<small>Inspected Aug. 30, 2013</small>
</h2>
<h3>Not in compliance</h3>
<ul>
<li class="X">
<strong>Washrooms are cleaned regularly</strong><p>Washrooms are to be kept clean, sanitary, in good repair and must be supplied with liquid soap in a dispenser, single service/paper towels, cloth roller towel or hot air dryer and hot and cold running water.</p>
</li>
<li class="X">
<strong>Building interior is well-maintained</strong><p>Walls, floors and ceilings are to be maintained and in good repair.</p>
</li>
<li class="X">
<strong>Premise is clean/sanitary</strong><p>Food premise is to be maintained in a clean and sanitary condition.</p>
</li>
</ul>
<h3>Actions taken by inspector</h3>
<ul>
<li class="Comment">
<strong>Consultation / Technical Assistance</strong><p>Instructions are given to the owner/operator to assist them with taking the proper actions to meet regulations.</p>
</li>
</ul>
</div>
Answer 0 (score: 0)
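Provided that `h3` and `ul` elements alternate until the end of the wrapping `div`, that every `ul` is paired with an `h3`, and that your sample is representative, this should do the trick: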
//ul[count(following-sibling::h3) = count(following-sibling::ul)]
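If other elements can appear in the same position as the `ul`, but there is only ever one element between two `h3` headings, you can use this expression: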
//ul[count(following-sibling::h3) = count(following-sibling::*[not(local-name() = 'h3')])]
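As for grouping each `h3` element directly with its `ul` element, I don't think that is feasible in XPath alone. You will need to do that in Ruby. I suggest searching for the `div` elements and parsing them imperatively, counting the nodes as you go and pairing the odd and even ones (`h3` and `ul`) together.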
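The imperative approach can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses REXML from Ruby's standard distribution so that it runs without extra gems (the question does not name its parser, and the same sibling walk should carry over to an HTML parser), it relies on the sample being well-formed XML, and the helper names `text_of` and `group_by_heading` as well as the extra `span` in the sample are made up for the example. Instead of counting odd and even nodes, it starts a new group at every `h3`, so it also copes with zero or several elements under a heading.

```ruby
require 'rexml/document'

# Trimmed copy of one container from the question. The <span> is a
# hypothetical addition to show a heading followed by more than one element.
SAMPLE = <<~XML
  <div class="inspection_container">
    <h3>Not in compliance</h3>
    <ul>
      <li class="X">
        <strong>Premise is clean/sanitary</strong><p>Food premise is to be maintained in a clean and sanitary condition.</p>
      </li>
    </ul>
    <span>Follow-up required</span>
    <h3>Actions taken by inspector</h3>
    <ul>
      <li class="Comment">
        <strong>Consultation / Technical Assistance</strong><p>Instructions are given to the owner/operator to assist them with taking the proper actions to meet regulations.</p>
      </li>
    </ul>
  </div>
XML

# Hypothetical helper: collects all text below a node, skipping
# whitespace-only text nodes, and joins the pieces with single spaces.
def text_of(node)
  return node.value.strip if node.is_a?(REXML::Text)
  return '' unless node.is_a?(REXML::Element)

  node.children.map { |child| text_of(child) }.reject(&:empty?).join(' ')
end

# Hypothetical helper: walks the direct child elements of one container and
# returns [heading, [texts]] pairs. Every h3 opens a new group; every other
# element is appended to the most recent group. Elements that appear before
# the first h3 (such as the h2) are ignored.
def group_by_heading(container)
  groups = []
  container.elements.each do |el|
    if el.name == 'h3'
      groups << [text_of(el), []]
    elsif groups.any?
      groups.last[1] << text_of(el)
    end
  end
  groups
end

container = REXML::Document.new(SAMPLE).root
group_by_heading(container).each do |heading, items|
  puts heading
  items.each { |item| puts "  #{item}" }
end
```

To handle a whole page, the same helper would be called once per `inspection_container` element, which keeps the headings of different inspections from being mixed together.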