How do I get the duration of scraped YouTube video data?

Time: 2019-06-18 16:45:26

Tags: python screen-scraping

I am following this repository, https://github.com/DataSnaek/Trending-YouTube-Scraper, to scrape trending videos on YouTube.

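I have configured the country codes and API key correctly. However, I want to add the video duration to my data file. I searched the YouTube API documentation and tried the following code (I added some code related to contentDetails and duration):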

...

# Used to identify columns, currently hardcoded order
header = ["video_id"] + snippet_features + ["trending_date", "tags", "length", "view_count", "likes", "dislikes",
                                            "comment_count", "thumbnail_link", "comments_disabled",
                                            "ratings_disabled", "description"]

...

def api_request(page_token, country_code):
    # Builds the URL and requests the JSON from it
    request_url = f"https://www.googleapis.com/youtube/v3/videos?part=id,statistics,contentDetails,snippet{page_token}chart=mostPopular&hl=vi&regionCode={country_code}&maxResults=50&key={api_key}"
    request = requests.get(request_url)
    if request.status_code == 429:
        print("Temp-Banned due to excess requests, please wait and continue later")
        sys.exit()
    return request.json()

...
        # Snippet, statistics and contentDetails are sub-dicts of video, containing the most useful info
        snippet = video['snippet']
        statistics = video['statistics']
        contentdetails = video['contentDetails']

...
        # The following are special case features which require unique processing, or are not within the snippet dict
        description = snippet.get("description", "")
        thumbnail_link = snippet.get("thumbnails", dict()).get("default", dict()).get("url", "")
        length = contentdetails.get("duration", "")
        trending_date = time.strftime("%y.%d.%m")
        tags = get_tags(snippet.get("tags", ["[none]"]))
        view_count = statistics.get("viewCount", 0)
...
        if 'duration' in contentdetails:
            length = contentdetails['duration']
        else:
            length = "0"

        # Compiles all of the various bits of info into one consistently formatted line
        line = [video_id] + features + [prepare_feature(x) for x in
                                        [trending_date, tags, length, view_count, likes, dislikes,
                                         comment_count, thumbnail_link, comments_disabled,
                                         ratings_disabled, description]]
        lines.append(",".join(line))
    return lines

...
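For reference, the `duration` value in `contentDetails` is an ISO 8601 duration string such as `PT4M13S`, not a number of seconds. Below is a minimal sketch of converting it, assuming only day, hour, minute and second components appear. The helper name `iso8601_duration_to_seconds` is hypothetical and not part of the repository.

```python
import re

# Matches strings such as "PT4M13S", "PT1H2M3S" or "P1DT2H".
# Assumption: week, month and year components never appear; fractional seconds are ignored.
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def iso8601_duration_to_seconds(duration):
    """Convert an ISO 8601 duration string to whole seconds (0 if it cannot be parsed)."""
    match = _DURATION_RE.match(duration or "")
    if match is None:
        return 0
    # Components that are absent from the string come back as None and count as 0.
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return (parts["days"] * 86400 + parts["hours"] * 3600
            + parts["minutes"] * 60 + parts["seconds"])


print(iso8601_duration_to_seconds("PT4M13S"))  # 4 * 60 + 13 = 253
```

With a helper like this, `length` could be stored as an integer column instead of the raw string.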

But the actual output is only the header:

video_id,title,publishedAt,channelId,channelTitle,categoryId,trending_date,tags,length,view_count,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,description
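A header with no rows suggests that no video items reached the row-building loop, although the cause is not confirmed here. One way to narrow it down is to inspect the decoded JSON before processing it. The sketch below is a hypothetical helper (`summarize_api_response` is not part of the repository) and assumes that failures arrive in the usual Google API shape, an `error` object with `code` and `message` keys.

```python
def summarize_api_response(payload):
    """Return a one-line description of a decoded videos.list response."""
    error = payload.get("error")
    if error:
        # Assumed shape: {"error": {"code": ..., "message": ...}}
        return f"API error {error.get('code')}: {error.get('message')}"
    items = payload.get("items", [])
    if not items:
        return "no items returned"
    # Count how many videos actually carry a contentDetails.duration field.
    with_duration = sum(
        1 for video in items if "duration" in video.get("contentDetails", {})
    )
    return f"{len(items)} items, {with_duration} with a duration"


# Made-up payload for illustration only, not real API output.
sample = {"items": [{"id": "abc", "contentDetails": {"duration": "PT1M"}}, {"id": "xyz"}]}
print(summarize_api_response(sample))
```

Printing this summary for the value returned by `api_request` would show whether the request failed with a status other than 429, which the code above does not check for.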

Thank you very much!

0 answers:

No answers