The website has a <div> with the class "entrytext".
The following code returns 75 <p> tags:
Document doc = Jsoup.connect(mUrl).get();
Elements p = doc.select("div[class=entrytext] > p");
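What I want is the following:
The first <p> looks like this:
<font>SBS</font>
It is followed by 12 <p> tags, and then by a <p> containing:
<font>KBS2</font>
How can I return all the <p> tags between <font>SBS</font> and <font>KBS2</font>?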
Edit:
<p><font color="#ff0000" size="6">SBS</font></p>
<p><img src="http://www.koreandrama.org/wp-content/uploads/2015/09/Six-Flying-Dragons-04-105x150.jpg" alt="Six Flying Dragons 04" width="105" height="150" class="alignnone size-thumbnail wp-image-46033"> </p>
<p><a href="http://www.koreandrama.org/six-flying-dragons/" target="_blank">육룡이 나르샤 / Six Flying Dragons / 六龍飛天</a><br> Broadcast period: 2015-Oct-05 to 2016-March-22<br> Air time: Monday & Tuesday 22:00</p>
repeats another 10 times
<p><font color="#ff0000" size="6">KBS2</font></p>
Answer 0 (score: 2)
You can use a CSS selector here:
div.entrytext p:has(font:containsOwn(SBS)) ~ p:not(p:has(font:containsOwn(KBS2)) ~ p):not(p:has(font:containsOwn(KBS2)))
http://try.jsoup.org/~sTCURuIFhPww_PiP1QZJE_s-WDE
div.entrytext /* Select div (A) with class entrytext */
p:has(font:containsOwn(SBS)) /* Select any p (B) descendant of (A) with <font>SBS</font> */
~ p /* Select all p preceded by (B)... */
:not(p:has(font:containsOwn(KBS2)) ~ p) /* ... but exclude any p preceded by <font>KBS2</font> */
:not(p:has(font:containsOwn(KBS2))) /* ... and exclude the p tag having <font>KBS2</font> */
String htmlpart = ""
+ "<div class=\"entrytext\">"
+ "<p>1</p>"
+ "<p><font>SBS</font></p>"
+ "<p>3</p>"
+ "<p>4</p>"
+ "<p>5</p>"
+ "<p><font>KBS2</font></p>"
+ "<p>7</p>"
+ "</div>"
;
Document doc = Jsoup.parse(htmlpart);
Elements allPs = doc.select("div.entrytext p:has(font:containsOwn(SBS)) ~ p:not(p:has(font:containsOwn(KBS2)) ~ p):not(p:has(font:containsOwn(KBS2)))");
System.out.println(allPs);
Output:
<p>3</p>
<p>4</p>
<p>5</p>
Answer 1 (score: 1)
I don't think you can put logic this complex into a single selector. I think you may need a small algorithm. Here is my solution:
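import java.util.Iterator;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String htmlpart = ""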
+ "<div class=\"entrytext\">"
+ "<p>1</p>"
+ "<p><font>SBS</font></p>"
+ "<p>3</p>"
+ "<p>4</p>"
+ "<p>5</p>"
+ "<p><font>KBS2</font></p>"
+ "<p>7</p>"
+ "</div>"
;
Document doc = Jsoup.parse(htmlpart);
Elements allPs = doc.select("div[class=entrytext] > p");
boolean rem = true;
for (Iterator<Element> elemIter = allPs.iterator(); elemIter.hasNext();) {
    Element p = elemIter.next();
    if (!p.select("font:matchesOwn(^SBS)").isEmpty() || !p.select("font:matchesOwn(^KBS2)").isEmpty()) {
        rem = !rem; // toggle the flag at each marker paragraph
        elemIter.remove(); // remove the <p> holding the font element in question
    }
    else if (rem) {
        elemIter.remove(); // drop paragraphs outside the SBS..KBS2 range
    }
}
System.out.println(allPs);
This produces the following output:
<p>3</p>
<p>4</p>
<p>5</p>
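The core of this approach does not depend on jsoup: it is a single pass over the list that keeps only what lies between two markers. The sketch below shows that idea on plain strings, assuming each string stands for the text of one <p>. The class name BetweenMarkers and the method between are hypothetical names chosen for illustration, and unlike the loop above it stops at the first end marker instead of toggling a flag and it leaves the input list unmodified.

```java
import java.util.ArrayList;
import java.util.List;

public class BetweenMarkers {
    // Returns the items strictly between the first `start` marker and the
    // next `end` marker. If no `end` marker follows, everything after
    // `start` is returned; if `start` never occurs, the result is empty.
    public static List<String> between(List<String> items, String start, String end) {
        List<String> result = new ArrayList<>();
        boolean inside = false;
        for (String item : items) {
            if (!inside) {
                // Still before the start marker: skip until it is found.
                if (item.equals(start)) {
                    inside = true;
                }
                continue;
            }
            if (item.equals(end)) {
                break; // reached the end marker, stop collecting
            }
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the text of the seven <p> tags in the sample HTML.
        List<String> paragraphs = List.of("1", "SBS", "3", "4", "5", "KBS2", "7");
        System.out.println(between(paragraphs, "SBS", "KBS2")); // prints [3, 4, 5]
    }
}
```

With jsoup, the same loop could run over the Elements returned by doc.select("div[class=entrytext] > p"), comparing each paragraph's font text to the markers instead of comparing strings directly.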