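I am trying to get Facebook users' profile pictures from their profile pages (when the picture is available on their public profile).
I am having a hard time getting it with Beautiful Soup.
At the moment I am using the following code to find where the picture link is: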
import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()
br.set_handle_robots(False)  # do not honor robots.txt
page_open = br.open("https://www.facebook.com/zuck")
# Parse the downloaded page so it can be searched
soup = BeautifulSoup(page_open.read(), "html.parser")
x = soup.find(id="u_0_6")  # the id sometimes changes to "u_0_5"
strx = str(x)
strx[2469:2690]  # really bad choice
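With the last line I can extract the URL only as long as the preceding markup never changes, which is not something I can count on. How can I get the value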
"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c14.4.153.153/1939620_10101266232851011_437577509_n.jpg?oh=014037065b8baa2346444c66b16ddc25&oe=5547259F&__gda__=1429976114_ffd73e14776a391219e64a1ce6a4d1fb"
which is located in this part of the HTML:
<code class="hidden_elem" id="u_0_6"><!-- <div class="timelineLoggedOutSignUp"><div class="_5h60" id="pagelet_loggedout_sign_up" data-referrer="pagelet_loggedout_sign_up"></div></div><div class="fbTimelineTopSectionBase _6-d _529n"><div class="_5h60" id="pagelet_above_header_timeline" data-referrer="pagelet_above_header_timeline"></div><div id="above_header_timeline_placeholder"></div><div class="fbTimelineSection mtm fbTimelineTopSection fbTimelineLoggedOutTopSection"><div id="fbProfileCover"><div class="cover" id="u_0_2"><a class="coverWrap coverImage" href="https://www.facebook.com/photo.php?fbid=10101026493146301&set=a.941146602501.2418915.4&type=1" rel="theater" ajaxify="https://www.facebook.com/photo.php?fbid=10101026493146301&set=a.941146602501.2418915.4&type=1&src=https%3A%2F%2Ffbcdn-sphotos-a-a.akamaihd.net%2Fhphotos-ak-frc3%2Ft31.0-8%2F1275272_10101026493146301_791186452_o.jpg&smallsrc=https%3A%2F%2Ffbcdn-sphotos-a-a.akamaihd.net%2Fhphotos-ak-xap1%2Fv%2Ft1.0-9%2F1186268_10101026493146301_791186452_n.jpg%3Foh%3Dfc0981d4a65c2e984cf5c43fdc1bcc88%26oe%3D55072936%26__gda__%3D1430325870_8783e46096a8a5456fc0e745fb89f303&size=1434%2C717&source=10&player_origin=profile" title="Photo de couverture" id="fbCoverImageContainer" data-cropped="1"><img class="coverPhotoImg photo img" src="https://fbcdn-sphotos-a-a.akamaihd.net/hphotos-ak-frc3/t31.0-8/q83/c0.93.1434.531/s851x315/1275272_10101026493146301_791186452_o.jpg" style="top:0px;width:100%" data-fbid="10101026493146301" alt="Photo de couverture" /><div class="coverBorder"></div><img class="coverChangeThrobber img" src="https://fbstatic-a.akamaihd.net/rsrc.php/v2/yk/r/LOOn0JtHNzb.gif" alt="" width="16" height="16" /></a></div><div id="fbTimelineHeadline" class="clearfix"><div class="actions"><div class="_5h60 actionsDropdown" id="pagelet_timeline_profile_actions" data-referrer="pagelet_timeline_profile_actions"></div></div><div class="name"><div class="photoContainer"><div><div class="profilePicThumb"><img 
class="profilePic img" alt="Mark Zuckerberg" src="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c14.4.153.153/1939620_10101266232851011_437577509_n.jpg?oh=014037065b8baa2346444c66b16ddc25&oe=5547259F&__gda__=1429976114_ffd73e14776a391219e64a1ce6a4d1fb" /></div></div><meta itemprop="image" content="https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c14.4.153.153/s50x50/1939620_10101266232851011_437577509_n.jpg?oh=6b6cd8460210e1de160cf8a6056df416&oe=550D5F6C&__gda__=1429858477_b29a956770b6173d71cb28eb35fa99e6" /></div><h2 itemprop="name">Mark Zuckerberg<span data-hover="tooltip" data-tooltip-position="right" class="_56_f _5dzy _5d-1" id="u_0_4"></span></h2></div></div></div></div></div><div class="timelineLoggedOutPagelet"><div class="clearfix"><div class="timelineLoggedOutMain lfloat _ohe"><div class="_5h60 allFavorites" id="pagelet_all_favorites" data-referrer="pagelet_all_favorites"></div></div><div class="timelineLoggedOutRight rfloat _ohf"><div class="fbTimelineSection mtm fbTimelineCompactSection"><div class="_5h60" id="pagelet_search" data-referrer="pagelet_search"></div></div><div class="_5h60" id="pagelet_people_same_name" data-referrer="pagelet_people_same_name"></div><div class="_5h60" id="pagelet_contact" data-referrer="pagelet_contact"></div></div></div></div> --></code>
Answer 0 (score: 1)
Or, instead of scraping Facebook, you can do it the right way through their Graph API ;)
import requests
url = "http://graph.facebook.com/{}".format("zuck")
params = { "fields": "picture" }
response = requests.get(url, params=params).json()
picture_url = response['picture']['data']['url']
print(picture_url)
# output:
# https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c14.4.153.153/s50x50/1939620_10101266232851011_437577509_n.jpg?oh=6b6cd8460210e1de160cf8a6056df416&oe=550D5F6C&__gda__=1429858477_b29a956770b6173d71cb28eb35fa99e6
Explanation: the profile picture URL is a public field, so you can access it without authenticating.
Advantages:
requests is nicer to work with than urllib
Play with the Facebook Graph API: https://developers.facebook.com/tools/explorer
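The lookup in this answer indexes straight into the JSON, so it raises a KeyError whenever the response has a different shape. Below is a minimal sketch of a more defensive version, written for Python 3. The helper names `build_picture_request` and `extract_picture_url` and the sample payload are hypothetical, made up for illustration. The sketch assumes that failures arrive as a top-level "error" object.

```python
from urllib.parse import urlencode

GRAPH_BASE = "http://graph.facebook.com"  # base URL as used in the answer


def build_picture_request(username):
    """Return the URL that asks the Graph API for the `picture` field."""
    query = urlencode({"fields": "picture"})
    return "{}/{}?{}".format(GRAPH_BASE, username, query)


def extract_picture_url(payload):
    """Return the picture URL from a decoded JSON payload, or None if absent."""
    if "error" in payload:
        # Assumption: failures are reported as a top-level "error" object.
        message = payload["error"].get("message", "unknown Graph API error")
        raise RuntimeError(message)
    return payload.get("picture", {}).get("data", {}).get("url")


# Hypothetical payload with the same shape the answer's code indexes into.
sample = {
    "picture": {"data": {"url": "https://example.com/zuck.jpg"}},
    "id": "4",
}

print(build_picture_request("zuck"))
print(extract_picture_url(sample))
```

With requests, the decoded payload would come from `requests.get(url, params=params).json()` as in the answer. Keeping the extraction in its own function lets it be checked without a network call.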
Answer 1 (score: 0)
I am not sure how reliable this is, because <code><!-- <div...
looks strange to me and I know very little about HTML, but this code should work:
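# soup is the parsed page from the question
element = soup.find(id="u_0_6")
# The <code> tag's only child is an HTML comment; parse its text as markup
inner = BeautifulSoup(element.string, "html.parser")
image = inner.find('img', attrs={'class': ['profilePic']})
print(image)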
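The question notes that the id flips between "u_0_6" and "u_0_5", so a lookup by id is fragile. Below is a minimal sketch of a variant that scans every HTML comment for the `profilePic` image instead. The function name `find_profile_pic` and the sample markup are hypothetical, and the sketch assumes the page still ships the image inside a comment, as in the HTML quoted in the question.

```python
from bs4 import BeautifulSoup, Comment


def find_profile_pic(html):
    """Return the src of the first img.profilePic hidden in an HTML comment."""
    soup = BeautifulSoup(html, "html.parser")
    # Collect every comment node, whatever the id of its parent tag is.
    comments = soup.find_all(string=lambda s: isinstance(s, Comment))
    for comment in comments:
        # The comment body is itself markup, so parse it a second time.
        inner = BeautifulSoup(comment, "html.parser")
        img = inner.find("img", class_="profilePic")
        if img is not None:
            return img.get("src")
    return None


# Hypothetical markup mimicking the structure quoted in the question.
sample_html = (
    '<html><body>'
    '<code class="hidden_elem" id="u_0_9"><!-- <div class="photoContainer">'
    '<img class="profilePic img" alt="Example User" '
    'src="https://example.com/profile.jpg" /></div> --></code>'
    '</body></html>'
)

print(find_profile_pic(sample_html))
```

This still depends on the `profilePic` class name, which the site can change at any time, so it is only somewhat less brittle than slicing the string.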