Question

I'm not sure how to get a specific result out of this:

<div class="videoPlayer">
    <div class="border-radius-player">
        <div id="allplayers" style="position:relative;width:100%;height:100%;overflow: hidden;">
            <div id="box">
                <div id="player_content" class="todo" style="text-align: center; display: block;">
                     <div id="player" class="jwplayer jew-reset jew-skin-seven jw-state-paused jw-flag-user-inactive" tabindex="0">
                         <div class="jw-media jw-reset">
                              <video class="jw-video jw-reset" x-webkit-playsinline="" src="https:EXAMPLE-URL-HERE" preload="metadata"></video>
                         </div>

How do I get the value of src from <video class="jw-video jw-reset" x-webkit-playsinline="" src="https:EXAMPLE-URL-HERE" preload="metadata"></video>?

This is what I have tried so far:

import urllib.request
from bs4 import BeautifulSoup

url = "https://someurlhere"

a = urllib.request.Request(url, headers={'User-Agent' : "Cliqz"})
b = urllib.request.urlopen(a)  # the custom User-Agent above prevents "Permission denied"

soup = BeautifulSoup(b, 'html.parser')

for video_class in soup.select("div.videoPlayer"):
    print(video_class.text)

This returns part of the page, but not the video element.

Answer 1

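urllib.request, like Requests, is a plain HTTP client: it downloads the HTML the server sends, but it cannot execute JavaScript. The jw-* classes suggest the <video> tag is inserted by the JW Player script after the page loads, which would explain why it never shows up in your soup.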
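To see the difference, here is a minimal sketch that uses hard-coded copies of the markup instead of a live request. The names rendered_html, raw_html and get_video_src are made up for illustration, and raw_html is only my guess at what the server might send before any script runs. If the <video> tag is present in the HTML you parse, its src is a plain attribute lookup; if it is missing, the lookup finds nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the markup as the browser shows it after JavaScript ran.
rendered_html = """
<div class="videoPlayer">
  <div class="jw-media jw-reset">
    <video class="jw-video jw-reset" src="https:EXAMPLE-URL-HERE" preload="metadata"></video>
  </div>
</div>
"""

# Hypothetical sample: what a plain HTTP client might receive (no <video> yet).
raw_html = '<div class="videoPlayer"><div id="player"></div></div>'


def get_video_src(html):
    """Return the src of the first <video class="jw-video"> tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    video = soup.select_one("div.videoPlayer video.jw-video")
    return video.get("src") if video is not None else None


print(get_video_src(rendered_html))  # https:EXAMPLE-URL-HERE
print(get_video_src(raw_html))       # None
```

If get_video_src returns None for the page you actually downloaded, the tag is not in the server's response and you need a different approach.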

However, there are three options you can try:

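1. Go through the HTML source (b) and see whether any JavaScript on the site contains the data you need. Often the page will carry the URL you want to scrape (I assume that's what you are after) in some kind of holder, either JavaScript code or a JSON object.
2. Look at the site's XHR requests and see whether any of them queries the video data from an external source. If so, see whether you can mimic that request to get the data you need.
3. (As a last resort) use PhantomJS + Selenium to load the site in a browser. You can find more on how to use Selenium in this SO post: https://stackoverflow.com/a/26440563/3986395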
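For the first option, a rough sketch might look like the following. It assumes the page embeds the media URL somewhere in its source as a plain or JSON-escaped string ending in .mp4, .m3u8 or .webm. The helper name find_media_urls, the extension list and the sample page are my own invention for illustration, not something taken from the site in question.

```python
import re

# Assumption: media URLs end in one of these extensions (adjust for the real site).
MEDIA_URL = re.compile(r"""https?://[^\s"'<>\\]+\.(?:mp4|m3u8|webm)(?:\?[^\s"'<>\\]*)?""")


def find_media_urls(html):
    """Return the unique media-looking URLs found anywhere in the page source."""
    # JSON embedded in scripts often escapes slashes as "\/"; undo that first.
    text = html.replace("\\/", "/")
    seen = []
    for url in MEDIA_URL.findall(text):
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical page source with a player setup call in an inline script.
sample = """
<script>
  jwplayer("player").setup({"file": "https:\\/\\/cdn.example.com\\/clip.mp4?token=abc"});
</script>
"""

print(find_media_urls(sample))  # ['https://cdn.example.com/clip.mp4?token=abc']
```

With the code from the question you could call it as find_media_urls(b.read().decode("utf-8", errors="replace")). If it returns an empty list, the URL is probably fetched separately, and the second or third option is the next thing to try.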
