Scrapy shell returns an empty array on the Steam website?

Date: 2016-08-08 20:24:59

Tags: python scrapy

I've used Scrapy before with some success on Craigslist, but now I'm trying to scrape usernames from Steam, and I keep getting an empty array in the Scrapy shell.

The username (e.g. xempy) is contained in this element:

<a class="searchPersonaName" href="https://steamcommunity.com/id/zxZEmpy">xempy</a>

The command I use to scrape the actual username from the page is:

response.xpath('//*[@id="search_results"]/div[3]/div[3]/a/text()').extract()

The URL I'm trying to scrape is:

https://steamcommunity.com/search/users/#filter=users&text=xempy 

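I used Chrome to copy the XPath of the element I'm interested in, rather than typing it by hand, to make sure there were no typos. But even when I type out the full absolute path manually, I still get an empty array when I try to get the simple string "xempy" (the username).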

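What am I doing wrong? I've successfully scraped Craigslist using the same process, but it doesn't seem to work on Steam's site, and I can't find any real examples of Steam scraping scripts.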

1 Answer:

Answer 0 (score: 0)

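If you look at the actual page source in your browser (right-click and choose View Source), you will see no sign of the results. The data is added dynamically by an AJAX request to https://steamcommunity.com/search/SearchCommunityAjax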
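One detail supports this: everything after `#` in the question's URL is a fragment, and fragments are never sent to the server in the HTTP request. So the server cannot even see the search text; only the page's JavaScript reads it and then fires the AJAX call. A quick standard-library check (Python 3) shows what is actually requested:

```python
from urllib.parse import urldefrag, parse_qs

# The URL from the question, including the part after '#'.
page_url = "https://steamcommunity.com/search/users/#filter=users&text=xempy"

# urldefrag splits off the fragment; only `requested` goes over the wire.
requested, fragment = urldefrag(page_url)

print(requested)           # https://steamcommunity.com/search/users/
print(parse_qs(fragment))  # {'filter': ['users'], 'text': ['xempy']}
```

This also explains why the XPath copied from Chrome fails: Chrome builds it from the live DOM after the JavaScript has run, while Scrapy only receives the raw HTML.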

You have to mimic the AJAX request. I used requests here, but the steps are the same in Scrapy:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"}
params = {"text": "xempy", "filter": "users", "sessionid": "", "steamid_user": "false", "page": "1"}
ajax_url = "https://steamcommunity.com/search/SearchCommunityAjax"
with requests.Session() as s:
    # send browser-like headers, including X-Requested-With for the AJAX endpoint
    s.headers.update(headers)
    # loading the search page makes Steam set a sessionid cookie
    # (the part after "#" is not sent to the server)
    r = s.get("https://steamcommunity.com/search/users/#filter=users&text=xempy")
    # the Session stores cookies automatically, so read sessionid from its cookie jar
    params["sessionid"] = s.cookies.get("sessionid")
    # the stored cookies are sent back automatically; response headers
    # must not be copied into the request headers
    result = s.get(ajax_url, params=params).json()
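To follow the same steps in Scrapy, two small pieces are needed: reading the `sessionid` value from the first response's `Set-Cookie` headers, and building the AJAX URL with its query string. Below is a minimal standard-library sketch; `sessionid_from_set_cookie` and `build_ajax_url` are hypothetical helper names, and it assumes Steam still sets a `sessionid` cookie and still accepts the parameters copied from the answer (which may have changed since 2016).

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode

AJAX_URL = "https://steamcommunity.com/search/SearchCommunityAjax"


def sessionid_from_set_cookie(set_cookie_values):
    """Return the value of the `sessionid` cookie, or None if it is absent.

    `set_cookie_values` is a list of raw Set-Cookie header values (str or
    bytes); in Scrapy, response.headers.getlist("Set-Cookie") returns bytes.
    """
    for raw in set_cookie_values:
        if isinstance(raw, bytes):
            raw = raw.decode("latin-1")
        cookie = SimpleCookie()
        cookie.load(raw)  # parses "name=value; Path=/; Secure" style headers
        if "sessionid" in cookie:
            return cookie["sessionid"].value
    return None


def build_ajax_url(text, sessionid, page=1):
    """Build the SearchCommunityAjax URL with the parameters used in the answer."""
    params = {
        "text": text,
        "filter": "users",
        "sessionid": sessionid,
        "steamid_user": "false",
        "page": str(page),
    }
    return AJAX_URL + "?" + urlencode(params)
```

In a spider, the callback for the search page could call `sessionid_from_set_cookie(response.headers.getlist("Set-Cookie"))` and then yield `scrapy.Request(build_ajax_url("xempy", sid), headers={"X-Requested-With": "XMLHttpRequest"}, callback=...)`. Scrapy's cookie middleware (enabled by default) sends the stored cookies back on its own. This has not been tested against the live site.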

If we run the code, you can see that we get some JSON back:

In [5]: with requests.Session() as s:
   ...:         s.headers.update()
   ...:         r = s.get("https://steamcommunity.com/search/users/#filter=users&text=xempy")
   ...:         params["sessionid"] = next(
   ...:             c.split("=", 1)[1] for c in r.headers["set-cookie"].split(";") if c.startswith("sessionid"))
   ...:         s.headers.update(r.headers)
   ...:         s.cookies.update(r.cookies)
   ...:         result = (s.get(ajax_url, params=params).json())
   ...:         print(result)
   ...:     
{u'html': u'\t\t<div style="float: right; padding-bottom: 2px">\r\n\t\t\t\t\t\tShowing 1 - 11 of 11\t\t\t</div>\r\n\t<div style="clear: both"></div>\r\n\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="16183171" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/zxZEmpy"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/b9/b9c886a08cf17c4f1f31ea19148d8b3bbd748762_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/zxZEmpy">xempy</a><br />\r\n\t\t\t\t\t\t\t\t&nbsp;\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">zxZEmpy</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">trill</span>, <span style="color: whitesmoke">[TGIF] Mario Batali</span>, <span style="color: whitesmoke">[TGIF] Mario \xdfatali</span>, <span style="color: whitesmoke">Mario \xdfatali</span>, <span style="color: whitesmoke">[TGIF\'</span>, <span style="color: whitesmoke">[TGIF] Mario \u03b2atali</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="280326130" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/xempyjecar"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/89/8928b324ba9c12859283e8be3f11f19d9232033c_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/xempyjecar">Xempy -A-</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia&nbsp;<img style="margin-bottom:-2px" 
src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">xempyjecar</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">Xempy -A- NEW SEASON HYPEE</span>, <span style="color: whitesmoke">Brekija</span>, <span style="color: whitesmoke">FAIRPLAY ORGANISATION</span>, <span style="color: whitesmoke">Xempy | csgoshit.com</span>, <span style="color: whitesmoke">Xempy | csgorage.com</span>, <span style="color: whitesmoke">\u2500\u2500\u2500\u2554\u2550\u2550\u2550\u2557</span>, <span style="color: whitesmoke">XempyTheCupcake</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="315139919" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/filipppp"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/ca/caa5747851b5255a2d76699d855bf20e709af3d1_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/filipppp">Xempy -A-</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">filipppp</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">Extreeemeeee</span>, <span style="color: 
whitesmoke">Ratatatatatata</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="258386073" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/lenyagoglov"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/71/71ee8d0519c74cea0352836b188c747b36224f8f_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/lenyagoglov">Xempys</a><br />\r\n\t\t\t\t\tTed<br />\t\t\tLuxembourg&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/lu.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">lenyagoglov</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="257927191" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/rostislavtseychuk85"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/86/8641de85a283f0d23d1cbeb35ee0c0d5ca87a83b_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/rostislavtseychuk85">Xempys</a><br />\r\n\t\t\t\t\tGabriel<br />\t\t\tLebanon&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/lb.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: 
whitesmoke">rostislavtseychuk85</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="252811169" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/mochulskayaa"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/76/76c10b0744403468aaf8090f56ca8ddd61338925_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/mochulskayaa">Xempys</a><br />\r\n\t\t\t\t\tRichard<br />\t\t\tGuatemala&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/gt.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">mochulskayaa</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="260028611" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/katerukhina"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/24/24241e97a6caf3bd932a01ea22afc6b3d758f1a1_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/katerukhina">Xempys</a><br />\r\n\t\t\t\t\tChristian<br />\t\t\tFiji&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/fj.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: 
whitesmoke">katerukhina</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="292454844" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/purdenkos"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/5c/5c7f9d1b71a68ab8599ae0fe8f2c4e0445348eaa_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/purdenkos">Xempys</a><br />\r\n\t\t\t\t\tPatrik<br />\t\t\tCote D\'ivoire (Ivory Coast)&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/ci.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">purdenkos</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="56000172" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/v2incent"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/ac/ac45a256e0a14712efff255db0105fedd80a4f0e_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/v2incent">Ext4ze ` ^0| \'Xempy^0\'</a><br />\r\n\t\t\t\t\tv2incent<br />\t\t\t&nbsp;\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">v2incent</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div 
class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="297670812" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/xempy"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/62/62ea583f7f838562c73cb70e3993e27acd583aef_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/xempy">xempsanity `\xb4</a><br />\r\n\t\t\t\t\tIgor<br />\t\t\tSerbia&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/rs.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom URL: steamcommunity.com/id/<span style="color: whitesmoke">xempy</span></div>\r\n\t\t\t\t\t\t\t\t\t\t<div>\r\n\t\t\t\t\tAlso known as: <span style="color: whitesmoke">XEMPYKiNGOFNOTHiNG</span>, <span style="color: whitesmoke">X3MPY</span>, <span style="color: whitesmoke">X3MPY * brother\'s on acc</span>\t\t\t\t</div>\r\n\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t\t\t\t\t<div class="search_row">\r\n\t<div class="search_result_friend">\r\n\t\t\t</div>\r\n\t<div class="mediumHolder_default" data-miniprofile="121633219" style="float:left;"><div class="avatarMedium"><a href="https://steamcommunity.com/id/Empyrk"><img src="https://steamcdn-a.akamaihd.net/steamcommunity/public/images/avatars/6b/6b87d7a04bf211a2665b828436ad34e549f2b193_medium.jpg"></a></div></div>\r\n\t<div class="searchPersonaInfo">\r\n\t\t<a class="searchPersonaName" href="https://steamcommunity.com/id/Empyrk">Empyrk</a><br />\r\n\t\t\t\t\tMatteo<br />\t\t\tToscana, Italy&nbsp;<img style="margin-bottom:-2px" src="https://steamcommunity-a.akamaihd.net/public/images/countryflags/it.gif" border="0" />\t\t\t</div>\r\n\t<div style="clear:left"></div>\r\n\r\n\t\t\t<div class="search_match_info">\r\n\t\t\t\t\t\t\t\t\t\t<div>Custom 
URL: steamcommunity.com/id/<span style="color: whitesmoke">Empyrk</span></div>\r\n\t\t\t\t\t\t\t\t</div>\r\n\t\t</div>\r\n\t\t\t\t<div style="clear: both"></div>\r\n\t\t<div style="float: right; padding-bottom: 2px">\r\n\t\t\t\t\t\tShowing 1 - 11 of 11\t\t\t</div>\r\n\t<div style="clear: both"></div>\r\n\r\n\r\n', u'search_filter': u'users', u'search_text': u'xempy', u'success': 1, u'search_page': 1}

You just need to access result["html"] to get the HTML source.
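Once you have `result["html"]`, select the links by their class rather than by position; this is sturdier than the positional XPath copied from Chrome. In Scrapy that is `Selector(text=result["html"]).xpath('//a[@class="searchPersonaName"]/text()').extract()`. For a dependency-free illustration, here is a sketch using the standard library's `html.parser`; `PersonaNameParser` and `persona_names` are hypothetical names, and the check on `success` assumes the response keeps the shape shown above.

```python
from html.parser import HTMLParser


class PersonaNameParser(HTMLParser):
    """Collect (name, profile_url) pairs from <a class="searchPersonaName"> links."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None  # href of the persona link we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "searchPersonaName" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # avatar links have no class, so _href is None for them and they are skipped
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


def persona_names(result):
    """Extract (name, profile_url) pairs from the SearchCommunityAjax JSON dict."""
    if result.get("success") != 1:
        return []
    parser = PersonaNameParser()
    parser.feed(result["html"])
    parser.close()
    return parser.results
```

With the output above, the first pair would be `("xempy", "https://steamcommunity.com/id/zxZEmpy")`.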