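So, imagine we have a website, for example PewDiePie's YouTube channel page https://www.youtube.com/channel/UC-lHJZR3Gqxm24_Vd_AJ5Yw. I want to write a script that gives me his subscriber count. Do I have to use Beautiful Soup?
I know the value is stored in
<yt-formatted-string id="subscriber-count" class="style-scope ytd-c4-tabbed-header-renderer">84,831,541 subscribers</yt-formatted-string>
I have nothing to do with web dev, so this is a bunch of gibberish to me. But there must be a way to get this value without Beautiful Soup, isn't there?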
import urllib.request
import json
import webbrowser
data = urllib.request.urlopen('https://www.youtube.com/channel/UC-lHJZR3Gqxm24_Vd_AJ5Yw')
print(data)
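That is what I have so far.
Answer 0: (score: 2)
What you are doing is scraping a web page. A quick Google search turns up ways to approach it. The code you are looking for: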
import requests
from lxml import html
# Retrieve the web page
data = requests.get('https://www.youtube.com/channel/UC-lHJZR3Gqxm24_Vd_AJ5Yw')
# Parse the HTML
tree = html.fromstring(data.content)
# Find the subscriber count in the HTML tree
subscriber_count = tree.xpath('//*[contains(@class,"yt-subscription-button-subscriber-count-branded-horizontal")]/text()')[0]
# Convert to integer
subscriber_count = int(subscriber_count.replace(",",""))
print(subscriber_count)
The result at the time of writing: 84851474
If you want to learn more, you can dig deeper into san fransico choropleth map and web scraping in Python.
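On the "without Beautiful Soup" part of the question: the standard library's `html.parser` can pull the text out of an element by its `id`. The following is only a minimal sketch, run against the snippet quoted in the question rather than a live page. The names `TextById` and `subscriber_count_from_html` are made up for illustration. It also assumes the element is present in the HTML you download, which may not hold if the page fills that value in with JavaScript.

```python
from html.parser import HTMLParser


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches target_id.

    Simplification: unclosed void tags such as <br> inside the target
    element are not handled and would throw off the depth counter.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def subscriber_count_from_html(page_html, element_id="subscriber-count"):
    """Return the digits found in the element's text as an int, or None."""
    parser = TextById(element_id)
    parser.feed(page_html)
    digits = "".join(ch for ch in "".join(parser.chunks) if ch.isdigit())
    return int(digits) if digits else None


# The snippet from the question, used here as stand-in input.
sample = ('<yt-formatted-string id="subscriber-count" '
          'class="style-scope ytd-c4-tabbed-header-renderer">'
          '84,831,541 subscribers</yt-formatted-string>')
print(subscriber_count_from_html(sample))  # 84831541
```

If the function returns None on the real page, the element is most likely not in the downloaded HTML, and the API approach is the safer route.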
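Answer 1: (score: 1)
From what you are trying to do, it looks like you want the subscriber count of a given channel. For that I would use the Google YouTube API, since it is faster and more reliable than web scraping. Sample code is below.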
1) Get an API key and enable this library
https://console.developers.google.com/apis/library/youtube.googleapis.com
2) Get the channel ID of the YouTube channel, e.g. PewDiePie's is UC-lHJZR3Gqxm24_Vd_AJ5Yw
https://www.youtube.com/channel/<channel_id>
3) Make a GET request to the URL below with the specified parameters
https://www.googleapis.com/youtube/v3/channels?part=statistics&id={CHANNEL_ID}&key={YOUR_API_KEY}
3b) This returns a JSON response that you need to parse
{
  "kind": "youtube#channelListResponse",
  "etag": "\"XpPGQXPnxQJhLgs6enD_n8JR4Qk/MlIT59Jru-h7AvGc09RB7HQI6qA\"",
  "pageInfo": {
    "totalResults": 1,
    "resultsPerPage": 1
  },
  "items": [
    {
      "kind": "youtube#channel",
      "etag": "\"XpPGQXPnxQJhLgs6enD_n8JR4Qk/a5p-d8soZS1kVL3A3QlzHsJFa44\"",
      "id": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",
      "statistics": {
        "viewCount": "20374094982",
        "commentCount": "0",
        "subscriberCount": "84859110",
        "hiddenSubscriberCount": false,
        "videoCount": "3744"
      }
    }
  ]
}
Sample code for getting the subscriber count of PewDiePie's channel
import requests
url = 'https://www.googleapis.com/youtube/v3/channels?part=statistics&id=<channel_id>&key=<your_api_key>'
resp = requests.get(url=url)
data = resp.json()
sub_count = data['items'][0]['statistics']['subscriberCount']
print(sub_count)
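The snippet above raises an exception if `items` comes back empty or the count is missing. A slightly more defensive version might look like the sketch below. It uses the standard library's `urllib` instead of `requests` so it runs without extra installs. The helper names (`build_url`, `extract_subscriber_count`, `fetch_subscriber_count`) are made up for illustration. It assumes that a true `hiddenSubscriberCount` means the count is not public and so returns None in that case. The parsing step is shown on a trimmed copy of the sample response above rather than on a live call.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/channels"


def build_url(channel_id, api_key):
    """Build the request URL, letting urlencode handle escaping."""
    query = urllib.parse.urlencode(
        {"part": "statistics", "id": channel_id, "key": api_key}
    )
    return f"{API_URL}?{query}"


def extract_subscriber_count(data):
    """Return the subscriber count as an int, or None if unavailable."""
    items = data.get("items") or []
    if not items:
        return None  # no channel matched the given ID
    stats = items[0].get("statistics", {})
    if stats.get("hiddenSubscriberCount"):
        return None  # assumption: a hidden count should not be reported
    count = stats.get("subscriberCount")
    return int(count) if count is not None else None


def fetch_subscriber_count(channel_id, api_key):
    """Call the API (needs network access and a valid key) and parse the reply."""
    with urllib.request.urlopen(build_url(channel_id, api_key)) as resp:
        return extract_subscriber_count(json.load(resp))


# Parsing step only, on a trimmed copy of the sample response shown above.
sample_response = {
    "items": [
        {
            "id": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",
            "statistics": {
                "subscriberCount": "84859110",
                "hiddenSubscriberCount": False,
            },
        }
    ]
}
print(extract_subscriber_count(sample_response))  # 84859110
```

Note that the API returns the count as a string, so the conversion to `int` is needed before doing any arithmetic with it.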