Question

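I'm trying to scrape YouTube to retrieve information about a group of users (about 200 people). I'm interested in finding the relationships between the users: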

Contacts
Subscribers
Subscriptions
Which videos they commented on
etc.

I've managed to get the contacts using the following code:

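import gdata.youtube
import gdata.youtube.service
from gdata.service import RequestError
from pub_author import KEY, NAME_REGEX  # the poster's own module: developer key and username regex

def get_details(name):
    """Return the usernames in a user's YouTube contact list."""
    yt_service = gdata.youtube.service.YouTubeService()
    yt_service.developer_key = KEY
    contact_feed = yt_service.GetYouTubeContactFeed(username=name)
    # Each feed entry is one contact; its title holds the contact's name.
    contacts = [e.title.text for e in contact_feed.entry]
    return contacts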

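I can't seem to get the other information I need. The reference guide says I can get an XML feed from http://gdata.youtube.com/feeds/api/users/username/subscriptions?v=2 (for an arbitrary user). However, if I try to fetch another user's subscriptions, I get a 403 error with the following message: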

User must be logged in to access these subscriptions.
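For reference, the feed URL above follows a fixed pattern with the username substituted into the path, and the response is an Atom document whose entries carry titles. The sketch below assumes that pattern and the standard Atom 1.0 namespace; `subscriptions_feed_url` and `entry_titles` are hypothetical helpers, and because the v2 GData feeds are no longer served, the example parses a small made-up Atom string instead of a live response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Standard Atom 1.0 namespace (assumed to be the one used by the GData feeds).
ATOM = "{http://www.w3.org/2005/Atom}"

def subscriptions_feed_url(username):
    """Build the v2 subscriptions feed URL for a user, following the reference-guide pattern."""
    return ("http://gdata.youtube.com/feeds/api/users/%s/subscriptions?v=2"
            % quote(username, safe=""))

def entry_titles(atom_xml):
    """Return the text of each <entry><title> element in an Atom document."""
    root = ET.fromstring(atom_xml)
    return [entry.findtext(ATOM + "title") for entry in root.iter(ATOM + "entry")]

# Made-up sample document, only to exercise the parser.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>alice</title></entry>
  <entry><title>bob</title></entry>
</feed>"""

print(subscriptions_feed_url("someuser"))
print(entry_titles(sample))
```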

If I use the gdata API:

sub_feed = yt_service.GetYouTubeSubscriptionFeed(username=name)
subs = [e.title.text for e in sub_feed.entry]

then I get the same error.

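How can I get these subscriptions without logging in? It should be possible, since this information is visible on the YouTube website without logging in.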

Also, there seems to be no feed listing a particular user's subscribers. Is this information available through the API?

Edit

So it seems this can't be done through the API. I had to do it the quick and dirty way:

mkdir -p subscriptions
while read -r f; do wget "http://www.youtube.com/profile?user=$f&view=subscriptions" --output-document "subscriptions/$f.html"; done < users.txt
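The same download step can also be done from Python, where the username is query-encoded instead of being pasted into a shell string. This is a sketch assuming Python 3's `urllib`; `profile_url` and `download_all` are hypothetical names, and the URL is simply the one the wget loop uses, so there is no guarantee the page still exists in that form.

```python
import os
import urllib.request
from urllib.parse import urlencode

def profile_url(username):
    """Profile page URL requested by the wget loop, with the username query-encoded."""
    query = urlencode({"user": username, "view": "subscriptions"})
    return "http://www.youtube.com/profile?" + query

def download_all(usernames, out_dir="subscriptions"):
    """Save each user's subscriptions page to <out_dir>/<username>.html."""
    os.makedirs(out_dir, exist_ok=True)
    for name in usernames:
        urllib.request.urlretrieve(profile_url(name),
                                   os.path.join(out_dir, name + ".html"))
```

Usage would be along the lines of reading `users.txt`, stripping each line, and passing the non-empty names to `download_all`.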

Then I used this script to extract the usernames from the downloaded HTML files:

"""Extract usernames from a YouTube profile page using a regex."""
import re
import sys

# The HTML file has two <a href="..."> tags for each user: one for the image
# thumbnail and one for the text link. The same name can therefore be matched
# more than once, so names are collected in a set to remove duplicates.
USER_LINK = re.compile(r'<a href="/user/(?P<name>[^"]+)" onmousedown')

def main():
    users = set()
    with open(sys.argv[1]) as f:
        for line in f:
            match = USER_LINK.search(line)
            if match:
                users.add(match.group('name'))
    print(sorted(users))

if __name__ == '__main__':
    main()
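A regex tied to the exact attribute order breaks if the markup changes slightly. As an alternative, the standard library's `html.parser` can read the `href` of every link regardless of attribute order. This is a sketch, not the poster's method; `UserLinkParser` and `extract_users` are hypothetical names, and the only assumption about the page is that profile links look like `/user/NAME`.

```python
from html.parser import HTMLParser

class UserLinkParser(HTMLParser):
    """Collect usernames from <a href="/user/NAME"> links."""

    def __init__(self):
        super().__init__()
        self.users = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/user/"):
            # Drop any query string; thumbnail and text links collapse in the set.
            name = href[len("/user/"):].split("?")[0]
            if name:
                self.users.add(name)

def extract_users(html):
    """Return the sorted, de-duplicated usernames linked from an HTML page."""
    parser = UserLinkParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.users)
```

Unlike the regex, this does not require the `onmousedown` attribute, so it may also pick up other `/user/` links on the page (for example the profile owner's own link); filter those out if needed.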

Answer 1

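For a user's subscriptions feed to be accessible without logging in, that user must have ticked the "Subscribe to a channel" checkbox under their Account Sharing settings.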
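Since only users who enabled that setting expose their subscriptions, a crawl over all ~200 users should expect 403 responses for the rest and skip them instead of aborting. A minimal sketch assuming Python 3's `urllib`; `fetch_public_feed` is a hypothetical helper, and its `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request

def fetch_public_feed(url, opener=urllib.request.urlopen):
    """Return the response body, or None if the server answers 403 (feed not shared)."""
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            return None  # the user has not made this feed public
        raise  # other HTTP errors are real failures
```

Collecting the usernames for which this returns `None` also tells you which part of the relationship data is missing.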

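Currently, there is no way to retrieve a channel's subscribers directly through the gdata API. In fact, there is an outstanding feature request for it that has been open for more than 3 years! See Retrieving a list of a user's subscribers?.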
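Without a subscribers feed, a partial answer is still available for the users in the question: once every crawled user's subscriptions are known, reversing those edges gives, for each channel, which of the crawled users subscribe to it. This covers only subscribers inside the crawled set, and `invert_subscriptions` is a hypothetical helper, not part of any API.

```python
from collections import defaultdict

def invert_subscriptions(subscriptions):
    """Map each channel to the sorted list of crawled users subscribed to it.

    subscriptions: dict mapping username -> iterable of channels that user subscribes to.
    """
    subscribers = defaultdict(set)
    for user, channels in subscriptions.items():
        for channel in channels:
            subscribers[channel].add(user)
    return {channel: sorted(users) for channel, users in subscribers.items()}
```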

Scraping YouTube user information
