Python扩展了YouTube上传

时间:2016-04-01 23:44:01

标签: python ajax youtube

所以,我正在开发一个小程序,它将自动检查并从一堆特定的YouTube频道下载新音乐。我目前正致力于获取每个频道所有上传视频的链接,我这样做就像刮刀一样。 (是的,YouTube API可能是正确的方法,但我还不知道如何正确使用它。)

from __future__ import unicode_literals
from bs4 import BeautifulSoup
import urllib.request

ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos'
r = urllib.request.urlopen(ytlink).read()
soup = BeautifulSoup(r, "html.parser")
links = soup.find_all('a', {"class": "yt-uix-sessionlink yt-uix-tile-link spf-link  yt-ui-ellipsis yt-ui-ellipsis-2"})

for tag in links:
    link = tag.get('href', None)
    if link is not None:
        print(link)

这就是我目前所拥有的,问题是,它目前只抓取前30个视频链接,因为这些是屏幕上唯一的视频链接。我已经看到,当按下"Load More"按钮时,它会执行一些由某些JavaScript启动的Ajax。我的问题是:如果所有上传都可见,我怎样才能让Python继续触发"Load More"按钮?

1 个答案:

答案 0 :(得分:1)

您可以轻松模仿ajax调用并解析返回的json输出,我们只需要提取/browse_ajax?action_continuation=... url并继续请​​求,直到它不再返回json:

from bs4 import BeautifulSoup
import requests
from urlparse import urljoin # python 3 -> from urllib.parse import urljoin


def get_links():
    # cretate all css selectors
    ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos'
    ajax_css = "button[data-uix-load-more-href]"
    link_css = "a.yt-uix-sessionlink.yt-uix-tile-link.spf-link.yt-ui-ellipsis.yt-ui-ellipsis-2"
    base = "https://www.youtube.com/"


    r = requests.get(ytlink).content
    soup = BeautifulSoup(r, "lxml")

    # yield first visible links
    for link in soup.select(link_css):
        yield urljoin(base, link["href"])

    # Load more button
    ajax = soup.select(ajax_css)[0]["data-uix-load-more-href"]

    while True:
        print(ajax)
        r = requests.get(urljoin('https://www.youtube.com/', ajax))

        # next html is stored in the json.values()
        soup = BeautifulSoup("".join(r.json().values()), "lxml")
        for link in soup.select(link_css):
            yield urljoin(base, link["href"])

        ajax = soup.select(ajax_css)
        # if empty "Load more" button would be gone
        if not ajax:
            break
        ajax = ajax[0]["data-uix-load-more-href"]

这将为您提供所有87个链接。

In [26]: links = list(get_links())
/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ1V2b3Vsdnd6Q25VVms3eW9kdUlfR3caJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk1yZ0JBQSUzRCUzRA%253D%253D
/browse_ajax?action_continuation=1&continuation=4qmFsgJAEhhVQ1V2b3Vsdnd6Q25VVms3eW9kdUlfR3caJEVnWjJhV1JsYjNNZ0FEZ0JZQUZxQUhvQk03Z0JBQSUzRCUzRA%253D%253D

In [27]: len(links)
Out[27]: 87

In [28]: print(links)
['https://www.youtube.com/watch?v=kjmzIu4VJEY', 'https://www.youtube.com/watch?v=ecRpNV8Xob8', 'https://www.youtube.com/watch?v=mdHoaoAhnMo', 'https://www.youtube.com/watch?v=3oqBKEvdrqE', 'https://www.youtube.com/watch?v=VIbvfOd34-A', 'https://www.youtube.com/watch?v=x4G8ge1VO5s', 'https://www.youtube.com/watch?v=EkW0f2iUOCc', 'https://www.youtube.com/watch?v=Ex2NIeXfYl8', 'https://www.youtube.com/watch?v=XMd4pSX-aVs', 'https://www.youtube.com/watch?v=ZS7KjUjlLWA', 'https://www.youtube.com/watch?v=ZEq9sQJLOgg', 'https://www.youtube.com/watch?v=nSgaCowC5TY', 'https://www.youtube.com/watch?v=nV5Ive_zJT4', 'https://www.youtube.com/watch?v=snThWzMroaA', 'https://www.youtube.com/watch?v=Ud6YhBCucPg', 'https://www.youtube.com/watch?v=1nSfyivyxdg', 'https://www.youtube.com/watch?v=b7hf2wqpUY4', 'https://www.youtube.com/watch?v=cVBvxkVt9wc', 'https://www.youtube.com/watch?v=pcI25yU9yso', 'https://www.youtube.com/watch?v=EMIZZS8HY8A', 'https://www.youtube.com/watch?v=xWD3Zi23rIs', 'https://www.youtube.com/watch?v=M-IbllcTi64', 'https://www.youtube.com/watch?v=U_tW_UxG8bM', 'https://www.youtube.com/watch?v=vQd0mopVnQg', 'https://www.youtube.com/watch?v=mG8NJlsg4rI', 'https://www.youtube.com/watch?v=PsaNY6xpnKY', 'https://www.youtube.com/watch?v=839h3eZMSWA', 'https://www.youtube.com/watch?v=Q_yytPtWmP0', 'https://www.youtube.com/watch?v=oGESQfB9dYM', 'https://www.youtube.com/watch?v=mO5R-1uTJhg', 'https://www.youtube.com/watch?v=wgqLck9SFOc', 'https://www.youtube.com/watch?v=GCaFEsxd-Y8', 'https://www.youtube.com/watch?v=VlpMbnOqP20', 'https://www.youtube.com/watch?v=bj1QT5bxFlA', 'https://www.youtube.com/watch?v=SMtKCu6a7gQ', 'https://www.youtube.com/watch?v=RV6x33mf4WI', 'https://www.youtube.com/watch?v=WhlXuTtmNqE', 'https://www.youtube.com/watch?v=7TWN1G5e-tg', 'https://www.youtube.com/watch?v=jgjeYTkROyk', 'https://www.youtube.com/watch?v=0hFkFoOf-aA', 'https://www.youtube.com/watch?v=yH1u_KQapfw', 'https://www.youtube.com/watch?v=5-l-FGDsbjw', 'https://www.youtube.com/watch?v=sFSgyE64Jjw', 'https://www.youtube.com/watch?v=OhDBtfvv2BM', 'https://www.youtube.com/watch?v=uFgPFi04oTo', 'https://www.youtube.com/watch?v=58a45EfYv1g', 'https://www.youtube.com/watch?v=jtYl5TbK2nc', 'https://www.youtube.com/watch?v=TI-1qxoDRnw', 'https://www.youtube.com/watch?v=Q0M90HqibHI', 'https://www.youtube.com/watch?v=Llb19v7QiXU', 'https://www.youtube.com/watch?v=sqhL_Ms6vuY', 'https://www.youtube.com/watch?v=YFFRgAjXs1Y', 'https://www.youtube.com/watch?v=8eHFG5AACHI', 'https://www.youtube.com/watch?v=_eVOx8Sw9Jg', 'https://www.youtube.com/watch?v=9s_XvG3M-UI', 'https://www.youtube.com/watch?v=lzdO01_tKFo', 'https://www.youtube.com/watch?v=uA2KkxfSW_U', 'https://www.youtube.com/watch?v=29Lt1LQtp5k', 'https://www.youtube.com/watch?v=nfJ9p5iJGz8', 'https://www.youtube.com/watch?v=cjMHd1xVlS0', 'https://www.youtube.com/watch?v=tkZ0FISTxkk', 'https://www.youtube.com/watch?v=bkhD8kYi4MI', 'https://www.youtube.com/watch?v=_bQajpTnOrY', 'https://www.youtube.com/watch?v=XglzEbcjP8c', 'https://www.youtube.com/watch?v=KBszbh6Qwag', 'https://www.youtube.com/watch?v=rVGWndVjCYg', 'https://www.youtube.com/watch?v=AgJxj2cUoyQ', 'https://www.youtube.com/watch?v=TaEVwakp_rI', 'https://www.youtube.com/watch?v=-YnpS-IaYCw', 'https://www.youtube.com/watch?v=sEFSFU2a9CY', 'https://www.youtube.com/watch?v=Jc2aVD4pwnk', 'https://www.youtube.com/watch?v=aY1dOJEv4j4', 'https://www.youtube.com/watch?v=bwjXt2pWoBE', 'https://www.youtube.com/watch?v=Dqn26tWxNsI', 'https://www.youtube.com/watch?v=wiv6JqGhcCU', 'https://www.youtube.com/watch?v=IFi47HLPqoM', 'https://www.youtube.com/watch?v=N1zdWugNdy0', 'https://www.youtube.com/watch?v=ngOBscDs3T4', 'https://www.youtube.com/watch?v=RT5dQVZ-VQY', 'https://www.youtube.com/watch?v=bifExgZW7k0', 'https://www.youtube.com/watch?v=fBEbaEgox1Y', 'https://www.youtube.com/watch?v=wDy9aGFngkY', 'https://www.youtube.com/watch?v=i06Iv0k5fVY', 'https://www.youtube.com/watch?v=2NaRXV7uyPE', 'https://www.youtube.com/watch?v=Hl0nIoLJUU0', 'https://www.youtube.com/watch?v=iXo0T4dRdgA', 'https://www.youtube.com/watch?v=i-7H5Wq0_2Y']

我离开了print(ajax)来电,这样您就可以看到它的变化。

您可以将seleniumPhantomJs结合使用,如下所示:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException,StaleElementReferenceException


ytlink = 'https://www.youtube.com/channel/UCUvoulvwzCnUVk7yoduI_Gw/videos'
hrefs = "a.yt-uix-sessionlink.yt-uix-tile-link.spf-link.yt-ui-ellipsis.yt-ui-ellipsis-2"
ajax= "button[data-uix-load-more-href]"
dr = webdriver.PhantomJS()
dr.get(ytlink)

while True:
    try:
        load_mode_b = dr.find_element_by_css_selector(ajax)
        load_mode_b.click()
    except StaleElementReferenceException as e:
        print(e)
    except NoSuchElementException as e:
        print(e)
        break

如果我们运行,我们会看到完全相同的输出:

In [32]: l = [a.get_attribute("href") for a in dr.find_elements_by_css_selector(hrefs)]

In [33]: len(l)
Out[33]: 87

In [34]: print(l)
[u'https://www.youtube.com/watch?v=kjmzIu4VJEY', u'https://www.youtube.com/watch?v=ecRpNV8Xob8', u'https://www.youtube.com/watch?v=mdHoaoAhnMo', u'https://www.youtube.com/watch?v=3oqBKEvdrqE', u'https://www.youtube.com/watch?v=VIbvfOd34-A', u'https://www.youtube.com/watch?v=x4G8ge1VO5s', u'https://www.youtube.com/watch?v=EkW0f2iUOCc', u'https://www.youtube.com/watch?v=Ex2NIeXfYl8', u'https://www.youtube.com/watch?v=XMd4pSX-aVs', u'https://www.youtube.com/watch?v=ZS7KjUjlLWA', u'https://www.youtube.com/watch?v=ZEq9sQJLOgg', u'https://www.youtube.com/watch?v=nSgaCowC5TY', u'https://www.youtube.com/watch?v=nV5Ive_zJT4', u'https://www.youtube.com/watch?v=snThWzMroaA', u'https://www.youtube.com/watch?v=Ud6YhBCucPg', u'https://www.youtube.com/watch?v=1nSfyivyxdg', u'https://www.youtube.com/watch?v=b7hf2wqpUY4', u'https://www.youtube.com/watch?v=cVBvxkVt9wc', u'https://www.youtube.com/watch?v=pcI25yU9yso', u'https://www.youtube.com/watch?v=EMIZZS8HY8A', u'https://www.youtube.com/watch?v=xWD3Zi23rIs', u'https://www.youtube.com/watch?v=M-IbllcTi64', u'https://www.youtube.com/watch?v=U_tW_UxG8bM', u'https://www.youtube.com/watch?v=vQd0mopVnQg', u'https://www.youtube.com/watch?v=mG8NJlsg4rI', u'https://www.youtube.com/watch?v=PsaNY6xpnKY', u'https://www.youtube.com/watch?v=839h3eZMSWA', u'https://www.youtube.com/watch?v=Q_yytPtWmP0', u'https://www.youtube.com/watch?v=oGESQfB9dYM', u'https://www.youtube.com/watch?v=mO5R-1uTJhg', u'https://www.youtube.com/watch?v=wgqLck9SFOc', u'https://www.youtube.com/watch?v=GCaFEsxd-Y8', u'https://www.youtube.com/watch?v=VlpMbnOqP20', u'https://www.youtube.com/watch?v=bj1QT5bxFlA', u'https://www.youtube.com/watch?v=SMtKCu6a7gQ', u'https://www.youtube.com/watch?v=RV6x33mf4WI', u'https://www.youtube.com/watch?v=WhlXuTtmNqE', u'https://www.youtube.com/watch?v=7TWN1G5e-tg', u'https://www.youtube.com/watch?v=jgjeYTkROyk', u'https://www.youtube.com/watch?v=0hFkFoOf-aA', u'https://www.youtube.com/watch?v=yH1u_KQapfw', u'https://www.youtube.com/watch?v=5-l-FGDsbjw', u'https://www.youtube.com/watch?v=sFSgyE64Jjw', u'https://www.youtube.com/watch?v=OhDBtfvv2BM', u'https://www.youtube.com/watch?v=uFgPFi04oTo', u'https://www.youtube.com/watch?v=58a45EfYv1g', u'https://www.youtube.com/watch?v=jtYl5TbK2nc', u'https://www.youtube.com/watch?v=TI-1qxoDRnw', u'https://www.youtube.com/watch?v=Q0M90HqibHI', u'https://www.youtube.com/watch?v=Llb19v7QiXU', u'https://www.youtube.com/watch?v=sqhL_Ms6vuY', u'https://www.youtube.com/watch?v=YFFRgAjXs1Y', u'https://www.youtube.com/watch?v=8eHFG5AACHI', u'https://www.youtube.com/watch?v=_eVOx8Sw9Jg', u'https://www.youtube.com/watch?v=9s_XvG3M-UI', u'https://www.youtube.com/watch?v=lzdO01_tKFo', u'https://www.youtube.com/watch?v=uA2KkxfSW_U', u'https://www.youtube.com/watch?v=29Lt1LQtp5k', u'https://www.youtube.com/watch?v=nfJ9p5iJGz8', u'https://www.youtube.com/watch?v=cjMHd1xVlS0', u'https://www.youtube.com/watch?v=tkZ0FISTxkk', u'https://www.youtube.com/watch?v=bkhD8kYi4MI', u'https://www.youtube.com/watch?v=_bQajpTnOrY', u'https://www.youtube.com/watch?v=XglzEbcjP8c', u'https://www.youtube.com/watch?v=KBszbh6Qwag', u'https://www.youtube.com/watch?v=rVGWndVjCYg', u'https://www.youtube.com/watch?v=AgJxj2cUoyQ', u'https://www.youtube.com/watch?v=TaEVwakp_rI', u'https://www.youtube.com/watch?v=-YnpS-IaYCw', u'https://www.youtube.com/watch?v=sEFSFU2a9CY', u'https://www.youtube.com/watch?v=Jc2aVD4pwnk', u'https://www.youtube.com/watch?v=aY1dOJEv4j4', u'https://www.youtube.com/watch?v=bwjXt2pWoBE', u'https://www.youtube.com/watch?v=Dqn26tWxNsI', u'https://www.youtube.com/watch?v=wiv6JqGhcCU', u'https://www.youtube.com/watch?v=IFi47HLPqoM', u'https://www.youtube.com/watch?v=N1zdWugNdy0', u'https://www.youtube.com/watch?v=ngOBscDs3T4', u'https://www.youtube.com/watch?v=RT5dQVZ-VQY', u'https://www.youtube.com/watch?v=bifExgZW7k0', u'https://www.youtube.com/watch?v=fBEbaEgox1Y', u'https://www.youtube.com/watch?v=wDy9aGFngkY', u'https://www.youtube.com/watch?v=i06Iv0k5fVY', u'https://www.youtube.com/watch?v=2NaRXV7uyPE', u'https://www.youtube.com/watch?v=Hl0nIoLJUU0', u'https://www.youtube.com/watch?v=iXo0T4dRdgA', u'https://www.youtube.com/watch?v=i-7H5Wq0_2Y']