Python requests: strange error on PythonAnywhere

Asked: 2017-10-02 22:20:45

Tags: python python-requests pythonanywhere

The following code works fine when I run it on my local machine, either in PyCharm or from a shell script:

# -*- coding: utf-8 -*-

import requests
from lxml import etree, html
import chardet

def gimme_pairs():

    url = "https://halbidoncom/sha.xml"
    page = requests.get(url).content
    encoding = chardet.detect(page)['encoding']

    if encoding != 'utf-8':
        page = page.decode(encoding, 'replace').encode('utf-8')

    doc = html.fromstring(page, base_url=url)
    print(doc)
    print(page)
    wanted = doc.xpath('//location')

    print(wanted)

    date_list = None
    tashkif_list = None

    for elem in wanted:
        date_list = elem.xpath('locationdata/timeunitdata/date/text()')
        tashkif_list = elem.xpath('locationdata/timeunitdata/element/elementvalue/text()')

On PythonAnywhere, however, this is what I get as the output for doc:

  

b'\n\n\nChallenge=355121;\nChallengeId=58551073;\nGenericErrorMessageCookies="Cookies must be enabled in order to view this page.";\n\n\n\nfunction test(var1)\n{\n\tvar var_str=""+Challenge;\n\tvar var_arr=var_str.split("");\n\tvar LastDig=var_arr.reverse()[0];\n\tvar minDig=var_arr.sort()[0];\n\tvar subvar1 = (2 * (var_arr[2]))+(var_arr[1]*1);\n\tvar subvar2 = (2 * var_arr[2])+var_arr[1];\n\tvar my_pow=Math.pow(((var_arr[0]*1)+2),var_arr[1]);\n\tvar x=(var1*3+subvar1)*1;\n\tvar y=Math.cos(Math.PI*subvar2);\n\tvar answer=x*y;\n\tanswer-=my_pow*1;\n\tanswer+=(minDig*1)-(LastDig*1);\n\tanswer=answer+subvar2;\n\treturn answer;\n}\n\n\nclient = null;\nif (window.XMLHttpRequest)\n{\n\tvar client=new XMLHttpRequest();\n}\nelse\n{\n\tif (window.ActiveXObject)\n\t{\n\t\tclient = new ActiveXObject(\'MSXML2.XMLHTTP.3.0\');\n\t};\n}\nif (!((!!client)&&(!!Math.pow)&&(!!Math.cos)&&(!![].sort)&&(!![].reverse)))\n{\n\tdocument.write("Not all needed JavaScript methods are supported.");\n\n}\nelse\n{\n\tclient.onreadystatechange = function()\n\t{\n\t\tif(client.readyState == 4)\n\t\t{\n\t\t\tvar MyCookie=client.getResponseHeader("X-AA-Cookie-Value");\n\t\t\tif ((MyCookie == null) || (MyCookie==""))\n\t\t\t{\n\t\t\t\tdocument.write(client.responseText);\n\t\t\t\treturn;\n\t\t\t}\n\t\t\t\n\t\t\tvar cookieName = MyCookie.split(\'=\')[0];\n\t\t\tif (document.cookie.indexOf(cookieName)==-1)\n\t\t\t{\n\t\t\t\tdocument.write(GenericErrorMessageCookies);\n\t\t\t\treturn;\n\t\t\t}\n\t\t\twindow.location.reload(true);\n\t\t}\n\t};\n\ty=test(Challenge);\n\tclient.open("POST",window.location,true);\n\tclient.setRequestHeader(\'X-AA-Challenge-ID\', ChallengeId);\n\tclient.setRequestHeader(\'X-AA-Challenge-Result\',y);\n\tclient.setRequestHeader(\'X-AA-Challenge\',Challenge);\n\tclient.setRequestHeader(\'Content-Type\' , \'text/plain\');\n\tclient.send();\n}\n\n\n\nJavaScript must be enabled in order to view this page.\n\n'

Things I tried:

  • Swapping requests for urllib.open()
  • Manually adding headers
  • Making sure the same packages are installed
  • Upgrading to a PA premium account

What gives? It strikes me that requests should behave the same on my machine and on theirs.

1 Answer:

Answer 0 (score: 3)

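It looks like the server you are trying to scrape has protection in place to make sure there is a real browser/person behind the request. If you format that response nicely, you will see that it sets some headers using the Challenge and ChallengeId values on the page.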
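To make the check concrete, here is a minimal sketch of how a script might recognise this challenge page and pull out the two values. The function name `parse_challenge` and the regular expressions are illustrative assumptions based only on the response body shown in the question, not on any documented format.

```python
import re

# Patterns assumed from the response in the question:
#   Challenge=355121;\nChallengeId=58551073;
CHALLENGE_RE = re.compile(rb"\bChallenge\s*=\s*(\d+)\s*;")
CHALLENGE_ID_RE = re.compile(rb"\bChallengeId\s*=\s*(\d+)\s*;")


def parse_challenge(body: bytes):
    """Return (challenge, challenge_id) as strings, or None for a normal page."""
    challenge = CHALLENGE_RE.search(body)
    challenge_id = CHALLENGE_ID_RE.search(body)
    if not (challenge and challenge_id):
        return None  # neither assignment found: probably the real content
    return (
        challenge.group(1).decode("ascii"),
        challenge_id.group(1).decode("ascii"),
    )
```

Calling `parse_challenge(page)` on the bytes returned by `requests.get(url).content` would then tell you whether you received the XML you wanted or the JavaScript challenge.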

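I assume the IPs/servers that PythonAnywhere uses have been added by the server owner to a list of blocked requesters (perhaps someone really did spam them in the past?).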

Looking more closely at those headers, I found this project, which seems to solve the same problem: https://github.com/niryariv/opentaba-server/

They check for the challenge here: https://github.com/niryariv/opentaba-server/blob/master/lib/mavat_scrape.py#L31 and solve it using this helper: https://github.com/niryariv/opentaba-server/blob/master/lib/helpers.py#L109
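For illustration, here is a rough Python port of the `test()` function from the JavaScript in the question, plus the headers that script sets. This is a sketch written from that response alone, not the code from the linked project. The names `solve_challenge` and `challenge_headers` are made up for this example, rounding the result to an integer is an assumption, and none of it has been verified against the real server.

```python
import math


def solve_challenge(challenge: str) -> str:
    """Port of the page's JavaScript test(); assumes at least three digits."""
    digits = list(challenge)
    last_digit = int(digits[-1])  # JS: var_arr.reverse()[0]
    digits.sort()                 # JS: var_arr.sort() orders the digit strings
    min_digit = int(digits[0])
    d0, d1, d2 = int(digits[0]), int(digits[1]), int(digits[2])

    subvar1 = 2 * d2 + d1              # numeric addition in the JS
    subvar2 = str(2 * d2) + digits[1]  # number + string concatenates in JS
    my_pow = (d0 + 2) ** d1
    x = int(challenge) * 3 + subvar1
    y = math.cos(math.pi * int(subvar2))  # +1 or -1 up to float error

    answer = x * y - my_pow + (min_digit - last_digit)
    # The JS ends with answer + subvar2, which is again string concatenation.
    # Rounding to an integer first is an assumption of this sketch.
    return str(round(answer)) + subvar2


def challenge_headers(challenge: str, challenge_id: str) -> dict:
    """Headers the page's script sets before POSTing back to the same URL."""
    return {
        "X-AA-Challenge-ID": challenge_id,
        "X-AA-Challenge-Result": solve_challenge(challenge),
        "X-AA-Challenge": challenge,
        "Content-Type": "text/plain",
    }
```

Judging by the script, the browser POSTs these headers to the same URL, reads a cookie from the `X-AA-Cookie-Value` response header, and then reloads the page. Doing the same with a `requests.Session` would presumably mean a `session.post(url, headers=...)` followed by setting that cookie and repeating the original GET.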

Hope that helps!