I am using jsoup to parse HTML elements.
What I'm doing is:
Elements e = document.select(".doc-type-list li a");
System.out.println(e);
This gives me:
<a class="doc-type doc-type-mtm" href="/mtm/a-d-topical.html">A & D topical</a>
<a class="doc-type doc-type-cdi" href="/cdi/a-d-cracked-skin-relief-cream.html">A + D Cracked Skin Relief cream</a>
<a class="doc-type doc-type-mtm" href="/mtm/a-200-lice-treatment.html">A-200 Lice Treatment</a>
<a class="doc-type doc-type-mtm" href="/mtm/a-25.html">A-25</a>
<a class="doc-type doc-type-cons" href="/cons/a-caro-25.html">A-Caro-25</a>
<a class="doc-type doc-type-cons" href="/cons/a-g-profen.html">A-G Profen</a>
<a class="doc-type doc-type-pro" href="/pro/a-hydrocort.html">A-Hydrocort</a>
<a class="doc-type doc-type-mtm" href="/mtm/a-methapred-injection.html">A-Methapred injection</a>
<a class="doc-type doc-type-cons" href="/cons/a-methapred.html">A-Methapred</a>
<a class="doc-type doc-type-cdi" href="/cdi/a-methapred-solution.html">A-methapred solution</a>
<a class="doc-type doc-type-pro" href="/pro/a-methapred-injection.html">A-Methapred Injection</a>
<a class="doc-type doc-type-mtm" href="/mtm/a-phedrin.html">A-Phedrin</a>
<a class="doc-type doc-type-cdi" href="/cdi/a-spaz.html">A-Spaz</a>
<a class="doc-type doc-type-cdi" href="/cdi/a-tan-12x-suspension.html">A-Tan 12X suspension</a>
<a class="doc-type doc-type-mtm" href="/mtm/a-e-r-witch-hazel.html">A.E.R. Witch Hazel</a>
<a class="doc-type doc-type-cons" href="/cons/a-b-otic.html">A / B Otic</a>
<a class="doc-type doc-type-mtm" href="/mtm/a-fish-oil.html">A / Fish Oil</a>
<a class="doc-type doc-type-mtm" href="/mtm/a-t-s.html">A / T / S</a>
<a class="doc-type doc-type-cons" href="/cons/a-t-s-topical.html">A / T / S Topical</a>
<a class="doc-type doc-type-monograph" href="/monograph/a1-proteinase-inhibitor-human.html">a1-Proteinase Inhibitor (Human)</a>
<a class="doc-type doc-type-cons" href="/cons/a200-maximum-strength-topical.html">A200 Maximum Strength Topical</a>
<a class="doc-type doc-type-cons" href="/cons/a200-time-tested-formula-topical.html">A200 Time-Tested Formula Topical</a>
<a class="doc-type doc-type-mtm" href="/mtm/abacavir.html">abacavir</a>
<a class="doc-type doc-type-cons" href="/cons/abacavir.html">abacavir</a>
<a class="doc-type doc-type-cdi" href="/cdi/abacavir-solution.html">abacavir solution</a>
<a class="doc-type doc-type-cdi" href="/cdi/abacavir.html">abacavir</a>
<a class="doc-type doc-type-ppa" href="/ppa/abacavir.html">Abacavir</a>
<a class="doc-type doc-type-mtm" href="/mtm/abacavir-and-lamivudine.html">abacavir and lamivudine</a>
<a class="doc-type doc-type-cons" href="/cons/abacavir-and-lamivudine.html">abacavir and lamivudine</a>
<a class="doc-type doc-type-ppa" href="/ppa/abacavir-and-lamivudine.html">Abacavir and Lamivudine</a>
<a class="doc-type doc-type-pro" href="/pro/abacavir-and-lamivudine-tablets.html">Abacavir and Lamivudine Tablets</a>
<a class="doc-type doc-type-monograph" href="/monograph/abacavir-sulfate.html">Abacavir Sulfate</a>
<a class="doc-type doc-type-pro" href="/pro/abacavir-sulfate-tablets.html">Abacavir Sulfate Tablets</a>
<a class="doc-type doc-type-mtm" href="/mtm/abacavir-dolutegravir-and-lamivudine.html">abacavir, dolutegravir, and lamivudine</a>
<a class="doc-type doc-type-cons" href="/cons/abacavir-dolutegravir-and-lamivudine.html">abacavir, dolutegravir, and lamivudine</a>
<a class="doc-type doc-type-cdi" href="/cdi/abacavir-dolutegravir-and-lamivudine.html">abacavir, dolutegravir, and lamivudine</a>
<a class="doc-type doc-type-ppa" href="/ppa/abacavir-dolutegravir-and-lamivudine.html">Abacavir, Dolutegravir, and Lamivudine</a>
<a class="doc-type doc-type-pro" href="/pro/abacavir-lamivudine-and-zidovudinetablets.html">Abacavir, Lamivudine and ZidovudineTablets</a>
<a class="doc-type doc-type-mtm" href="/mtm/abacavir-lamivudine-and-zidovudine.html">abacavir, lamivudine, and zidovudine</a>
<a class="doc-type doc-type-cons" href="/cons/abacavir-lamivudine-and-zidovudine.html">abacavir, lamivudine, and zidovudine</a>
But I want the text inside each <a> element collected into an array.
Answer 0 (score: 2)
You can use a method to convert the elements to a list.
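The answer does not name the method. One jsoup method that fits the description is Elements.eachText(), available in recent jsoup versions, which returns the text of every matched element as a List<String>; whether that is what the answerer meant is an assumption. A minimal sketch, using hypothetical sample markup modeled on the question (the class and helper names are also hypothetical):

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class EachTextExample {

    // Collects the text of every link matched by the question's selector.
    // Elements.eachText() skips elements whose text is empty.
    static List<String> linkTexts(Document document) {
        Elements links = document.select(".doc-type-list li a");
        return links.eachText();
    }

    public static void main(String[] args) {
        // Hypothetical markup mirroring the structure implied by the selector
        String html = "<ul class=\"doc-type-list\">"
                + "<li><a href=\"/mtm/a-d-topical.html\">A &amp; D topical</a></li>"
                + "<li><a href=\"/mtm/a-25.html\">A-25</a></li>"
                + "</ul>";
        Document document = Jsoup.parse(html);

        // Convert the List<String> into the String[] asked for in the question
        String[] texts = linkTexts(document).toArray(new String[0]);
        for (String text : texts) {
            System.out.println(text);
        }
    }
}
```

eachText() returns decoded text, so the first link comes back as "A & D topical", not with an escaped entity.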
Answer 1 (score: 1)
You need to loop over all the elements and get the inner HTML of each one:
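import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// html() returns each link's inner HTML, so "&" comes back escaped as "&amp;"
final Elements e = document.select(".doc-type-list li a");
for (final Element elem : e)
{
System.out.println(elem.html());
}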
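Printing inside the loop does not produce the array the question asks for. A minimal sketch that keeps the same loop but collects each link's text into a String[]; the markup, class name, and helper name are hypothetical. It uses text() rather than html(), because html() returns escaped markup (for example "A &amp; D topical") instead of plain text.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LinkTextArray {

    // Loops over the matched links and collects their text into an array.
    static String[] linkTexts(Document document) {
        Elements links = document.select(".doc-type-list li a");
        List<String> texts = new ArrayList<>(links.size());
        for (Element link : links) {
            // text() returns decoded, whitespace-normalized text, unlike html()
            texts.add(link.text());
        }
        return texts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical markup modeled on the question's output
        String html = "<ul class=\"doc-type-list\">"
                + "<li><a href=\"/mtm/a-d-topical.html\">A &amp; D topical</a></li>"
                + "<li><a href=\"/cdi/a-spaz.html\">A-Spaz</a></li>"
                + "</ul>";
        Document document = Jsoup.parse(html);
        for (String text : linkTexts(document)) {
            System.out.println(text);
        }
        // For comparison: html() keeps the entity escaped
        System.out.println(document.select(".doc-type-list li a").first().html());
    }
}
```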
Answer 2 (score: -1)
Get the element's own text:
for (Element element : e) {
System.out.println(element.ownText());
}
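ownText() returns only the text nodes that are direct children of the element, while text() also includes text from nested tags. For links like the ones in the question, which contain plain text only, both give the same result; they differ only when a link has child elements. A small sketch with hypothetical markup (the nested <b> and the helper name are assumptions for illustration):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class OwnTextExample {

    // Returns {text(), ownText()} for the first <a> in the given HTML.
    static String[] compareTexts(String html) {
        Element link = Jsoup.parse(html).select("a").first();
        return new String[] {link.text(), link.ownText()};
    }

    public static void main(String[] args) {
        // Plain link, as in the question: both methods agree
        String[] plain = compareTexts("<a href=\"/cdi/abacavir.html\">abacavir</a>");
        // Hypothetical link with a nested element: ownText() drops the <b> text
        String[] nested = compareTexts("<a href=\"/cdi/abacavir-solution.html\">abacavir <b>solution</b></a>");
        System.out.println(plain[0] + " / " + plain[1]);   // abacavir / abacavir
        System.out.println(nested[0] + " / " + nested[1]); // abacavir solution / abacavir
    }
}
```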