Cannot extract an attribute value with Jsoup

Time: 2017-08-09 22:45:15

Tags: java jsoup

For some reason I cannot extract the text I want from the HTML. For reference, I am trying to get the "title" attribute from the <a class="a-link-normal s-access-detail-page ..."> element.

HTML:

<div id="resultsCol" class='showRightCol'>
  <div id="centerMinus" class='leftCol'>
    <div id="atfResults" class="a-row s-result-list-parent-container">
      <ul id="s-results-list-atf" class="s-result-list s-col-1 s-col-ws-1 s-result-list-hgrid s-height-equalized s-list-view s-text-condensed">
        <li id="result_0" data-asin="B01KIZUF7Y" class="s-result-item celwidget ">
          <div class="s-item-container">
            <div class="a-fixed-left-grid">
              <div class="a-fixed-left-grid-inner" style="padding-left:218px">
                <div class="a-fixed-left-grid-col a-col-left" style="width:218px;margin-left:-218px;_margin-left:-109px;float:left;">
                  <div class="a-row">
                    <div aria-hidden="true" class="a-column a-span12 a-text-center">
                      <a class="a-link-normal a-text-normal" href="http://rads.stackoverflow.com/amzn/click/B01KIZUF7Y">
                        <img src="https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US218_.jpg" srcset="https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US218_.jpg 1x, https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US327_FMwebp_QL65_.jpg 1.5x, https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US436_FMwebp_QL65_.jpg 2x, https://images-na.ssl-images-amazon.com/images/I/51LKyGJpYJL._AC_US500_FMwebp_QL65_.jpg 2.2935x"
                          width="218" height="218" alt="Product Details" class="s-access-image cfMarker" data-search-image-load>
                      </a>
                      <div class="a-section a-spacing-none a-text-center">
                      </div>
                    </div>
                  </div>
                </div>
                <div class="a-fixed-left-grid-col a-col-right" style="padding-left:2%;*width:97.6%;float:left;">
                  <div class="a-row a-spacing-small">
                    <div class="a-row a-spacing-none scx-truncate-medium sx-line-clamp-3 s-list-title-long">
                      (want to get the title attribute from here)
                      <a class="a-link-normal s-access-detail-page  s-color-twister-title-link a-text-normal" title="MSI GAMING Radeon RX 480 GDDR5 4GB CrossFire VR Ready FinFET DirectX 12 Graphics Card (RX 480 GAMING X 4G)" href="http://rads.stackoverflow.com/amzn/click/B01KIZUF7Y">
                        <h2 data-attribute="MSI GAMING Radeon RX 480 GDDR5 4GB CrossFire VR Ready FinFET DirectX 12 Graphics Card (RX 480 GAMING X 4G)" data-max-rows="3" class="a-size-medium s-inline  s-access-title  a-text-normal">MSI GAMING Radeon RX 480 GDDR5 4GB CrossFire VR Ready FinFET DirectX 12 Graphics Card (RX 480 GAMING X 4G)
                        </h2>
                      </a>
                    </div>
                    <div class="a-row a-spacing-none">
                      <span class="a-size-small a-color-secondary">by </span>
                      <span class="a-size-small a-color-secondary">MSI</span>
                    </div>
                  </div>
                  <div class="a-row">
                    <div class="a-column a-span7">
                      <div class="a-row a-spacing-mini">
                        <div class="a-row a-spacing-none">
                          <a class="a-size-small a-link-normal a-text-normal" href="http://rads.stackoverflow.com/amzn/click/B01KIZUF7Y">
                            <span class="a-color-secondary a-text-strike"></span>
                            <span class="a-size-base a-color-base">$349.99</span>
                            <span class="a-letter-space"></span>(6 used &amp new offers)</a>
                        </div>
                      </div>
                    </div>
                    <div class="a-column a-span5 a-span-last">
                      <div class="a-row a-spacing-mini">
                        <span name="B01KIZUF7Y">

Java code:

Elements basicLink = doc.select("div.showRightCol")
                        .select("div.leftCol")
                        .select("div.a-row.s-result-list-parent-container")
                        .select("ul.s-result-list.s-col-1.s-col-ws-1.s-result-list-hgrid.s-height-equalized.s-list-view.s-text-condensed")
                        .select("li.s-result-item.celwidget")
                        .select("div.s-item-container")
                        .select("div.a-fixed-left-grid")
                        .select("div.a-fixed-left-grid-inner");//start here to get to everything

title = basicLink.select("div.a-fixed-left-grid-col.a-col-right")
                  .select("div.a-row.a-spacing-small")
                  .select("div.a-row.a-spacing-none.scx-truncate-medium.sx-line-clamp-3.s-list-title-long")
                  .select("a.a-link-normal.s-access-detail-page.s-color-twister-title-link.a-text-normal")
                  .attr("title");

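Interestingly, this code used to work. It stopped working after I changed one line, and it still fails even though I have since reverted to the original code. It should work, but I cannot tell whether my selector chain is merely inefficient or whether something is actually wrong. Thanks for your time!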

1 answer:

Answer 0 (score: 1)

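The a element has multiple classes. In the selector, you have to replace the spaces between the class names with dots: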

Element element = doc.select("a.a-link-normal.s-access-detail-page.s-color-twister-title-link.a-text-normal").first();
String title = element.attr("title");
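
As a hedged illustration of the dot-chained class selector, here is a minimal, self-contained sketch. The class name TitleExtractor, the method extractTitle, and the shortened HTML string are assumptions made for the example and are not part of the original question. It also guards against first() returning null when nothing matches, which the two lines above do not do.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TitleExtractor {

    // Hypothetical helper: returns the title attribute of the first
    // matching product link, or null when no element matches.
    static String extractTitle(String html) {
        Document doc = Jsoup.parse(html);
        // Dots chain class names on the SAME element. Listing only some of
        // the element's classes is enough; all listed ones must be present.
        Element link = doc.select("a.a-link-normal.s-access-detail-page").first();
        return link == null ? null : link.attr("title");
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup in the question (assumption).
        String html = "<div class=\"a-row a-spacing-small\">"
                + "<a class=\"a-link-normal s-access-detail-page  s-color-twister-title-link a-text-normal\""
                + " title=\"MSI GAMING Radeon RX 480\" href=\"#\">"
                + "<h2>MSI GAMING Radeon RX 480</h2></a></div>";
        System.out.println(extractTitle(html));
    }
}
```

If this returns null against the live page, the class names in the fetched HTML probably differ from the ones shown here, so it is worth printing the document that Jsoup actually received before adjusting the selector.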

For completeness: since no other element in this snippet has a title attribute, you can also do:

Element element = doc.select("[title]").first();
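
A bare [title] selector matches any element that carries a title attribute, so it may be too broad on a full results page. The sketch below, under stated assumptions, combines the attribute selector with a class and collects the titles of all result items. The class name AllTitles, the method titles, and the two-item HTML string are made up for illustration.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class AllTitles {

    // Hypothetical helper: collects the title attribute of every product
    // link that sits inside a result list item.
    static List<String> titles(String html) {
        Document doc = Jsoup.parse(html);
        // The space means "descendant of"; [title] requires the attribute.
        return doc.select("li.s-result-item a.s-access-detail-page[title]")
                  .eachAttr("title");
    }

    public static void main(String[] args) {
        // Two-item stand-in for the result list (assumption).
        String html = "<ul>"
                + "<li class=\"s-result-item celwidget\">"
                + "<a class=\"a-link-normal s-access-detail-page\" title=\"Card A\" href=\"#\">A</a></li>"
                + "<li class=\"s-result-item celwidget\">"
                + "<a class=\"a-link-normal s-access-detail-page\" title=\"Card B\" href=\"#\">B</a></li>"
                + "</ul>";
        System.out.println(titles(html));
    }
}
```

Elements.eachAttr is available in recent Jsoup releases; on an older version, iterating over the Elements and calling attr("title") on each one is an equivalent fallback.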