Jsoup: extracting a title by class

Asked: 2016-11-09 23:30:45

Tags: java jsoup

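I am trying to extract some data from this link: https://myanimelist.net/anime/season

What I specifically want is the text of each link-image element, i.e. from a section like this: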

<div class="seasonal-anime js-seasonal-anime"
 data-genre="7,42,37"><div>
<div class="title"><a href="https://myanimelist.net/anime/32867/Bungou_Stray_Dogs_2nd_Season/video" class="icon-watch-pv fl-r" title="Watch Promotional Video">Watch Promotional Video</a><p class="title-text">
    <a href="https://myanimelist.net/anime/32867/Bungou_Stray_Dogs_2nd_Season" class="link-title">Bungou Stray Dogs 2nd Season</a>
  </p>
</div>

<div class="prodsrc">
  <span class="producer"><a href="/anime/producer/4/Bones" title="Bones">Bones</a></span>
  <div class="eps">
    <span id="32867" data-eps="12" class="fl-l icon-add-episode js-btn-add-episode" title="Click to increase your watched ep number by one"></span>        <a href="https://myanimelist.net/anime/32867/Bungou_Stray_Dogs_2nd_Season/episode"><span class="js-episode-num">6</span>/<span>12 eps</span>
    </a>
  </div>

  <span class="source">Manga</span>

  <a href="https://myanimelist.net/ownlist/anime/32867/edit?hideLayout=1" title="Watching" class="Lightbox_AddEdit button_edit btn-anime-watch-status js-anime-watch-status watching">CW</a>
</div>

    <div class="genres js-genre" id="32867">
      <div class="genres-inner js-genre-inner"><span class="genre">
        <a href="/anime/genre/7/Mystery" title="Mystery">Mystery</a>
      </span><span class="genre">
        <a href="/anime/genre/42/Seinen" title="Seinen">Seinen</a>
      </span><span class="genre">
        <a href="/anime/genre/37/Supernatural" title="Supernatural">Supernatural</a>
      </span></div>
    </div>
  </div>

      <div class="image lazyload" data-bg="https://myanimelist.cdn-dena.com/images/anime/4/82293.webp">
        <a href="https://myanimelist.net/anime/32867/Bungou_Stray_Dogs_2nd_Season" class="link-image">Bungou Stray Dogs 2nd Season</a>
      </div>

      <div class="synopsis js-synopsis">
        <span class="preline">Nakajima Atsushi was kicked out of his orphanage, and now he has no place to go and no food. While he is standing by a river, on the brink of starvation, he rescues a man whimsically attempting suicide. That man is Dazai Osamu, and he and his partner Kunikida are members of a very special detective agency. They have supernatural powers and deal with cases that are too dangerous for the police or the military. They&#039;re tracking down a tiger that has appeared in the area recently, around the time Atsushi came to the area. The tiger seems to have a connection to Atsushi, and by the time the case is solved, it is clear that Atsushi&#039;s future will involve much more of Dazai and the rest of the detectives!

    (Source: MangaHelpers)</span>
        <p class="licensors" data-licensors=""></p>
      </div>

      <div class="information">
        <div class="info">
          TV -
          <span class="remain-time">
                      Oct 6, 2016, 22:30 (JST)              </span>
        </div>
        <div class="scormem">
          <span class="member fl-r" title="Members">
            72,011
          </span>
          <span class="score" title="Score">
            8.34
          </span>
        </div>
      </div>

    </div>

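I want to get the text "Bungou Stray Dogs 2nd Season". I would also like to get the data-bg value (https://myanimelist.cdn-dena.com/images/anime/4/82293.webp) and the score (8.34), and to do the same for every show on the page.

I am not sure what query to run with Jsoup, because I am still quite new to HTML and do not understand it very well.

Running this code does not get me anywhere: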

Document doc = Jsoup.connect("https://myanimelist.net/anime/season").get();
Elements shows = doc.select("div:contains(image.lazyload)");

int i = 0;
for(Element show : shows){
    System.out.println(i+". "+show.text());
    i++;
}

1 answer:

Answer 0 (score: 1)

You just need to follow the selector syntax documented for jsoup's select method.

For example, to get the link-image text:

Elements imageLinks = doc.select("a.link-image");

The other two are similar. I'm sure you can figure them out.
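For reference, here is a minimal sketch of how all three values could be read per show. The query in the question most likely returns nothing useful because :contains(...) matches on an element's text, not on its class names, whereas a class is selected with a dot, as in div.image. The class name SeasonalAnimeExtractor, the extract helper and the trimmed SAMPLE_HTML string are hypothetical and only for illustration. The selectors are taken from the HTML in the question, so they assume the live page still has that structure.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SeasonalAnimeExtractor {

    // Trimmed-down copy of one show block from the question (illustrative only).
    static final String SAMPLE_HTML =
        "<div class=\"seasonal-anime js-seasonal-anime\" data-genre=\"7,42,37\">"
      + "<div class=\"image lazyload\" data-bg=\"https://myanimelist.cdn-dena.com/images/anime/4/82293.webp\">"
      + "<a href=\"https://myanimelist.net/anime/32867/Bungou_Stray_Dogs_2nd_Season\" class=\"link-image\">"
      + "Bungou Stray Dogs 2nd Season</a>"
      + "</div>"
      + "<div class=\"information\"><div class=\"scormem\">"
      + "<span class=\"score\" title=\"Score\"> 8.34 </span>"
      + "</div></div>"
      + "</div>";

    // Returns one {title, imageUrl, score} array per div.seasonal-anime block.
    static List<String[]> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String[]> rows = new ArrayList<>();
        for (Element show : doc.select("div.seasonal-anime")) {
            // Text of the <a class="link-image"> element.
            String title = show.select("a.link-image").text();
            // The data-bg attribute sits on the <div class="image lazyload"> element.
            String imageUrl = show.select("div.image").attr("data-bg");
            // text() normalizes and trims the whitespace around the score.
            String score = show.select("span.score").text();
            rows.add(new String[] {title, imageUrl, score});
        }
        return rows;
    }

    public static void main(String[] args) {
        // For the live page, Jsoup.connect(url).get() would supply the Document instead.
        int i = 0;
        for (String[] row : extract(SAMPLE_HTML)) {
            System.out.println(i + ". " + row[0] + " | " + row[1] + " | " + row[2]);
            i++;
        }
    }
}
```

Iterating over the div.seasonal-anime containers, and not over three separate result lists, keeps each title, image URL and score together even if one of the values is missing for a show.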