Getting span text from a website using Selenium

Time: 2018-08-22 19:27:24

Tags: python selenium web-scraping automation

The website I want to scrape looks like this:

<div align="center" class="movietable">
    <span style="width:45px;height:47px;vertical-align:middle;display:table-cell;">
        <a href="browse.php?cat=19"><img border="0" src="styles/images/cat/hd.png" alt="HdO"></a>
    </span>
</div>
<div align="left" class="movietable">
    <span style="padding:0px 5px;width:455px;height:47px;vertical-align:middle;display:table-cell;">
        <a data-toggle="tooltip" data-placement="bottom" data-html="true" title="" href="details.php?id=578197" data-original-title="<img src='https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg'>">
            <b>GET THIS TEXT</b></a><br><font class="small">[Action, Horror, Sci-Fi]</font>
    </span>
</div>

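How can I extract:

  1. the text in the <b> tag (in this case, GET THIS TEXT)
  2. the content of <font class='small'> (in this case, Action, Horror, Sci-Fi)
    .movietable b works great!

  3. the img src link (in this case, https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg)

I don't know how to do this.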

3 Answers:

Answer 0 (score: 2)

Here are the CSS selectors you can use:

  1. driver.find_element_by_css_selector('div[align=left] b')
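The selector above covers only the <b> text. A minimal sketch that gathers all three items with CSS selectors, assuming a Selenium 4 driver that has already loaded the page. The helper name scrape_movie_row and the selector constants are hypothetical, and the locator is passed as the plain string "css selector" (the value of Selenium's By.CSS_SELECTOR) so the sketch does not need a selenium import. It also assumes the tooltip's src value is wrapped in single quotes, as in the question's HTML.

```python
# Hypothetical helper; `driver` is assumed to be a Selenium 4 WebDriver
# (or any object with the same find_element(by, value) method).
CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR

LINK_SELECTOR = "div.movietable[align=left] a[data-toggle=tooltip]"
GENRE_SELECTOR = "div.movietable[align=left] font.small"


def scrape_movie_row(driver):
    """Return the title, genre text and poster URL of one movie row."""
    link = driver.find_element(CSS, LINK_SELECTOR)
    # 1. text of the <b> inside the tooltip link
    title = link.find_element(CSS, "b").text
    # 2. text of <font class="small">, e.g. "[Action, Horror, Sci-Fi]"
    genres = driver.find_element(CSS, GENRE_SELECTOR).text
    # 3. the poster URL is inside the tooltip HTML "<img src='...'>";
    #    assumes the src value is wrapped in single quotes
    tooltip = link.get_attribute("data-original-title") or ""
    parts = tooltip.split("'")
    img_src = parts[1] if len(parts) > 2 else None
    return {"title": title, "genres": genres, "img_src": img_src}
```

On the question's snippet this should yield GET THIS TEXT, [Action, Horror, Sci-Fi] and the tmdb.org URL. Scoping both lookups to div.movietable[align=left] keeps them away from the category icon in the centered div.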

Answer 1 (score: 1)

You can access all of these with XPath:

1) [parents before this div]/div[2]/span/a/b 

2) [parents before this div]/div[2]/span/font

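3) [parents before this div]/div[2]/span/a, then read its data-original-title attribute (the path [parents before this div]/div[1]/span/a/img selects the category icon styles/images/cat/hd.png, not the poster)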

[parents before this div] should be /html/body/...
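Positional paths like these can be tried offline. Python's standard xml.etree.ElementTree supports a subset of XPath that includes [n] positions, so this sketch runs the same relative paths against a hand-cleaned, well-formed copy of the question's HTML (img and br self-closed, the < and > inside data-original-title escaped, style attributes dropped). It is a stand-in for Selenium, not what the answer uses, and real pages are usually not well-formed XML. The <body> root plays the role of /html/body.

```python
import xml.etree.ElementTree as ET

# Hand-cleaned, well-formed copy of the question's HTML (assumption: trimmed
# for illustration), wrapped in a <body> root element.
SNIPPET = """<body>
<div align="center" class="movietable">
  <span><a href="browse.php?cat=19"><img border="0" src="styles/images/cat/hd.png" alt="HdO"/></a></span>
</div>
<div align="left" class="movietable">
  <span>
    <a data-toggle="tooltip" href="details.php?id=578197"
       data-original-title="&lt;img src='https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg'&gt;"><b>GET THIS TEXT</b></a><br/><font class="small">[Action, Horror, Sci-Fi]</font>
  </span>
</div>
</body>"""

body = ET.fromstring(SNIPPET)

# div[2] is the second <div> child of <body>, as in the answer's paths
title = body.find("div[2]/span/a/b").text
genres = body.find("div[2]/span/font").text
tooltip = body.find("div[2]/span/a").get("data-original-title")
# div[1] is the centered div holding the category icon
icon = body.find("div[1]/span/a/img").get("src")

print(title)    # GET THIS TEXT
print(genres)   # [Action, Horror, Sci-Fi]
print(tooltip)  # <img src='https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg'>
print(icon)     # styles/images/cat/hd.png
```

The last lookup shows what div[1]/span/a/img actually points at. Absolute, position-based paths also break as soon as another div is inserted above these two, which is why class-based paths are usually sturdier.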

Answer 2 (score: 1)

To extract the items from the HTML you shared, you can use the following:

  • GET THIS TEXT

    driver.find_element_by_xpath("//div[@class='movietable' and @align='left']/span/a[@data-toggle='tooltip' and @data-placement='bottom']/b").get_attribute("innerHTML")
    
  • [Action, Horror, Sci-Fi]

    driver.find_element_by_xpath("//div[@class='movietable' and @align='left']/span//font[@class='small']").get_attribute("innerHTML")
    
  • https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg

    img_src = driver.find_element_by_xpath("//div[@class='movietable' and @align='left']/span/a[@data-toggle='tooltip' and @data-placement='bottom']").get_attribute("data-original-title")
    # data-original-title holds "<img src='...'>"; the URL is the part between the single quotes
    src = img_src.split("'")
    print(src[1])
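Two follow-up parsing steps, sketched in plain Python: turning the bracketed genre string into a list (the question asks for Action, Horror, Sci-Fi without brackets), and pulling the URL out of data-original-title with a regular expression instead of string splitting, so the result does not depend on which quote character wraps the URL. The helper names parse_genres and parse_img_src are hypothetical, and the sketch assumes the tooltip contains a single src attribute.

```python
import re

# Matches src='...' or src="..." and captures the URL between matching quotes.
_SRC_RE = re.compile(r"src\s*=\s*(['\"])(.*?)\1")


def parse_genres(text):
    """'[Action, Horror, Sci-Fi]' -> ['Action', 'Horror', 'Sci-Fi']"""
    inner = text.strip().strip("[]")
    return [g.strip() for g in inner.split(",") if g.strip()]


def parse_img_src(tooltip_html):
    """Return the src URL from tooltip HTML such as "<img src='...'>", or None."""
    match = _SRC_RE.search(tooltip_html or "")
    return match.group(2) if match else None


print(parse_genres("[Action, Horror, Sci-Fi]"))  # ['Action', 'Horror', 'Sci-Fi']
print(parse_img_src("<img src='https://trasd.tmdb.org//tqistSlQGQVlvDZHweD.jpg'>"))
```

In the answer's flow, parse_genres would take the innerHTML of the font element and parse_img_src the img_src attribute value.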