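I have previously used scrapy to scrape craigslist with some success, but now I am trying to scrape an arbitrary username, and I keep getting an empty array in the scrapy shell.
The username element (e.g. xempy) is contained in:
<a class="searchPersonaName" href="https://steamcommunity.com/id/zxZEmpy">xempy</a>
The command I use to grab the actual username from the URL above is: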
response.select('//*[@id="search_results"]/div[3]/div[3]/a/text()').extract()
The URL I am trying to scrape is
https://steamcommunity.com/search/users/#filter=users&text=xempy
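I used Chrome to copy the XPath of the element I am interested in, rather than typing it by hand, to make sure it had no typos. But even when I type the whole absolute path manually, I still get an empty array when I try to retrieve the simple string for the username "xempy".
What am I doing wrong? I have scraped craigslist successfully with the same process, but it does not seem to work on Steam's site, and I cannot find any real examples of Steam scraping scripts.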
Answer 0 (score: 0)
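If you look at the actual source in your browser (right-click and choose View Source), you will see no sign of the results. The data is added dynamically by an ajax request to https://steamcommunity.com/search/SearchCommunityAjax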
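As a small illustration that is not part of the original answer: everything after the `#` in the search URL is a fragment, which the browser keeps for its own JavaScript and does not send to the server. The sketch below uses only the standard library to show which parts of the URL a plain HTTP client would actually request, which is one reason the downloaded page contains no results.

```python
from urllib.parse import urlsplit, parse_qs

search_url = "https://steamcommunity.com/search/users/#filter=users&text=xempy"
parts = urlsplit(search_url)

# What the server receives: the path and an empty query string.
print(parts.path)         # /search/users/
print(repr(parts.query))  # ''

# What stays in the browser: the fragment that carries the search terms.
print(parts.fragment)     # filter=users&text=xempy
fragment_params = parse_qs(parts.fragment)
print(fragment_params)
```

The `text` and `filter` values found in the fragment are the same ones that have to be passed as real query parameters to the ajax endpoint.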
You have to mimic the ajax request. I have used requests here, but the steps are the same with scrapy:
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"}
params = {"text": "xempy", "filter": "users", "sessionid": "", "steamid_user": "false", "page": "1"}
ajax_url = "https://steamcommunity.com/search/SearchCommunityAjax"
with requests.Session() as s:
    # send the browser-like headers defined above with every request
    s.headers.update(headers)
    r = s.get("https://steamcommunity.com/search/users/#filter=users&text=xempy")
    # need to update the session id which we get from the previous get's headers
    params["sessionid"] = next(
        c.split("=", 1)[1] for c in r.headers["set-cookie"].split(";") if c.startswith("sessionid"))
    # need to update the session headers
    s.headers.update(r.headers)
    # and also the cookies from the previous request
    s.cookies.update(r.cookies)
    result = (s.get(ajax_url, params=params).json())
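The session id extraction above splits the raw header string by hand. As an alternative sketch, not the answer's own method, the standard library's `http.cookies.SimpleCookie` can parse a single Set-Cookie value. The helper name `session_id_from_set_cookie` and the sample header are made up for illustration, and the real header sent by the site may look different.

```python
from http.cookies import SimpleCookie


def session_id_from_set_cookie(header_value):
    """Return the sessionid value from one Set-Cookie header value, or None."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    morsel = cookie.get("sessionid")
    return morsel.value if morsel is not None else None


# Hypothetical header value, invented for this example only.
sample_header = "sessionid=abc123def456; Path=/"
print(session_id_from_set_cookie(sample_header))
```

With requests, reading the cookie jar of the first response (for example `r.cookies.get("sessionid")`) may be simpler still, assuming the site sets the cookie on that response.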
If we run the requests code, you can see that we get some JSON back:
In [5]: with requests.Session() as s:
   ...:     s.headers.update(headers)
   ...:     r = s.get("https://steamcommunity.com/search/users/#filter=users&text=xempy")
   ...:     params["sessionid"] = next(
   ...:         c.split("=", 1)[1] for c in r.headers["set-cookie"].split(";") if c.startswith("sessionid"))
   ...:     s.headers.update(r.headers)
   ...:     s.cookies.update(r.cookies)
   ...:     result = (s.get(ajax_url, params=params).json())
   ...:     print(result)
   ...:
{u'html': u'\t\t<div style="float: right; padding-bottom: 2px">\r\n\t\t\t\t\t\tShowing 1 - 11 of 11\t\t\t</div>\r\n\t<div style="clear: both"></div>\r\n\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="16183171" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/zxZEmpy"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/b9/b9c886a08cf17c4f1f31ea19148d8b3bbd748762_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/zxZEmpy">xempy</a><br />\r\n\t\t\t\t\t\t\t\t \t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">zxZEmpy</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">trill</span>, <span style="color: whitesmoke">[TGIF] Mario Batali</span>, <span style="color: whitesmoke">[TGIF] Mario \xdfatali</span>, <span style="color: whitesmoke">Mario \xdfatali</span>, <span style="color: whitesmoke">[TGIF\'</span>, <span style="color: whitesmoke">[TGIF] Mario \u03b2atali</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="280326130" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/xempyjecar"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/89/8928b324ba9c12859283e8be3f11f19d9232033c_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/xempyjecar">Xempy -A-</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia <img style="margin-bottom:-2px" 
src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">xempyjecar</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">Xempy -A- NEW SEASON HYPEE</span>, <span style="color: whitesmoke">Brekija</span>, <span style="color: whitesmoke">FAIRPLAY ORGANISATION</span>, <span style="color: whitesmoke">Xempy | csgoshit.com</span>, <span style="color: whitesmoke">Xempy | csgorage.com</span>, <span style="color: whitesmoke">\u2500\u2500\u2500\u2554\u2550\u2550\u2550\u2557</span>, <span style="color: whitesmoke">XempyTheCupcake</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="315139919" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/filipppp"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/ca/caa5747851b5255a2d76699d855bf20e709af3d1_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/filipppp">Xempy -A-</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia <img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">filipppp</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">Extreeemeeee</span>, <span style="color: 
whitesmoke">Ratatatatatata</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="258386073" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/lenyagoglov"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/71/71ee8d0519c74cea0352836b188c747b36224f8f_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/lenyagoglov">Xempys</a><br />\r\n\t\t\t\t\tTed<br />\t\t\tLuxembourg <img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/lu.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">lenyagoglov</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="257927191" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/rostislavtseychuk85"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/86/8641de85a283f0d23d1cbeb35ee0c0d5ca87a83b_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/rostislavtseychuk85">Xempys</a><br />\r\n\t\t\t\t\tGabriel<br />\t\t\tLebanon <img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/lb.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: 
whitesmoke">rostislavtseychuk85</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="252811169" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/mochulskayaa"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/76/76c10b0744403468aaf8090f56ca8ddd61338925_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/mochulskayaa">Xempys</a><br />\r\n\t\t\t\t\tRichard<br />\t\t\tGuatemala <img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/gt.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">mochulskayaa</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="260028611" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/katerukhina"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/24/24241e97a6caf3bd932a01ea22afc6b3d758f1a1_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/katerukhina">Xempys</a><br />\r\n\t\t\t\t\tChristian<br />\t\t\tFiji <img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/fj.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: 
whitesmoke">katerukhina</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="292454844" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/purdenkos"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/5c/5c7f9d1b71a68ab8599ae0fe8f2c4e0445348eaa_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/purdenkos">Xempys</a><br />\r\n\t\t\t\t\tPatrik<br />\t\t\tCote D\'ivoire (Ivory Coast) <img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/ci.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">purdenkos</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="56000172" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/v2incent"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/ac/ac45a256e0a14712efff255db0105fedd80a4f0e_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/v2incent">Ext4ze ` ^0| \'Xempy^0\'</a><br />\r\n\t\t\t\t\tv2incent<br />\t\t\t \t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">v2incent</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div 
class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="297670812" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/xempy"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/62/62ea583f7f838562c73cb70e3993e27acd583aef_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/xempy">xempsanity `\xb4</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia <img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">xempy</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">XEMPYKiNGOFNOTHiNG</span>, <span style="color: whitesmoke">X3MPY</span>, <span style="color: whitesmoke">X3MPY * brother\'s on acc</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="121633219" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/Empyrk"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/6b/6b87d7a04bf211a2665b828436ad34e549f2b193_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/Empyrk">Empyrk</a><br />\r\n\t\t\t\t\tMatteo<br />\t\t\tToscana, Italy <img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/it.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: 
steamcommunity.com/id/<span style="color: whitesmoke">Empyrk</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t<div style="clear: both"></div>\r\n\t\t<div style="float: right; padding-bottom: 2px">\r\n\t\t\t\t\t\tShowing 1 - 11 of 11\t\t\t</div>\r\n\t<div style="clear: both"></div>\r\n\r\n\r\n', u'search_filter': u'users', u'search_text': u'xempy', u'success': 1, u'search_page': 1}
You just need to access result["html"] to get the HTML source.
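Once the HTML fragment is available, the usernames can be pulled out by class name rather than by a positional XPath. The following is a minimal sketch using only the standard library. The class `PersonaNameParser` is a hypothetical helper, and `sample_html` is a shortened copy of two links from the output above standing in for result["html"].

```python
from html.parser import HTMLParser


class PersonaNameParser(HTMLParser):
    """Collect (name, profile_url) pairs from <a class="searchPersonaName"> links."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_link = False  # True while inside a matching <a> tag
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "searchPersonaName" in classes:
            self._in_link = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.results.append(("".join(self._text).strip(), self._href))
            self._in_link = False


# Shortened stand-in for result["html"], copied from the output above.
sample_html = (
    '<div class="searchPersonaInfo">\r\n\t\t'
    '<a class="searchPersonaName" href="https://steamcommunity.com/id/zxZEmpy">xempy</a><br />\r\n'
    '</div>\r\n<div class="searchPersonaInfo">\r\n\t\t'
    '<a class="searchPersonaName" href="https://steamcommunity.com/id/xempyjecar">Xempy -A-</a><br />\r\n'
    '</div>'
)

parser = PersonaNameParser()
parser.feed(sample_html)
for name, url in parser.results:
    print(name, url)
```

In a Scrapy project the same fragment could presumably be wrapped in `scrapy.Selector(text=result["html"])` and queried with a selector on the `searchPersonaName` class instead.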