当我运行以下代码时,它返回一个空字符串
url = 'http://www.allflicks.net/wp-content/themes/responsive/processing/processing_us.php?draw=5&columns[0][data]=box_art&columns[0][name]=&columns[0][searchable]=true&columns[0][orderable]=false&columns[0][search][value]=&columns[0][search][regex]=false&columns[1][data]=title&columns[1][name]=&columns[1][searchable]=true&columns[1][orderable]=true&columns[1][search][value]=&columns[1][search][regex]=false&columns[2][data]=year&columns[2][name]=&columns[2][searchable]=true&columns[2][orderable]=true&columns[2][search][value]=&columns[2][search][regex]=false&columns[3][data]=genre&columns[3][name]=&columns[3][searchable]=true&columns[3][orderable]=true&columns[3][search][value]=&columns[3][search][regex]=false&columns[4][data]=rating&columns[4][name]=&columns[4][searchable]=true&columns[4][orderable]=true&columns[4][search][value]=&columns[4][search][regex]=false&columns[5][data]=available&columns[5][name]=&columns[5][searchable]=true&columns[5][orderable]=true&columns[5][search][value]=&columns[5][search][regex]=false&columns[6][data]=director&columns[6][name]=&columns[6][searchable]=true&columns[6][orderable]=true&columns[6][search][value]=&columns[6][search][regex]=false&columns[7][data]=cast&columns[7][name]=&columns[7][searchable]=true&columns[7][orderable]=true&columns[7][search][value]=&columns[7][search][regex]=false&order[0][column]=5&order[0][dir]=desc&start=0&length=25&search[value]=sherlock&search[regex]=false&movies=true&shows=true&documentaries=true&rating=netflix&_=1451768717982'
print requests.get(url).text
但是如果我把网址放在我的浏览器中,它会显示我的json信息。调试浏览器时我确实注意到必须安装插件篡改数据才能查看json。如果浏览器没有插件,则会出现空白页面。所以,我的理论是它必须做一些事情,而http请求正在递交,但我不是从这里去的地方。
任何帮助都会很棒。
答案 0 :(得分:3)
您需要打开session,访问主页面以设置Cookie,然后将XHR请求发送到“processing_us.php”:
url = "http://www.allflicks.net/wp-content/themes/responsive/processing/processing_us.php?draw=5&columns[0][data]=box_art&columns[0][name]=&columns[0][searchable]=true&columns[0][orderable]=false&columns[0][search][value]=&columns[0][search][regex]=false&columns[1][data]=title&columns[1][name]=&columns[1][searchable]=true&columns[1][orderable]=true&columns[1][search][value]=&columns[1][search][regex]=false&columns[2][data]=year&columns[2][name]=&columns[2][searchable]=true&columns[2][orderable]=true&columns[2][search][value]=&columns[2][search][regex]=false&columns[3][data]=genre&columns[3][name]=&columns[3][searchable]=true&columns[3][orderable]=true&columns[3][search][value]=&columns[3][search][regex]=false&columns[4][data]=rating&columns[4][name]=&columns[4][searchable]=true&columns[4][orderable]=true&columns[4][search][value]=&columns[4][search][regex]=false&columns[5][data]=available&columns[5][name]=&columns[5][searchable]=true&columns[5][orderable]=true&columns[5][search][value]=&columns[5][search][regex]=false&columns[6][data]=director&columns[6][name]=&columns[6][searchable]=true&columns[6][orderable]=true&columns[6][search][value]=&columns[6][search][regex]=false&columns[7][data]=cast&columns[7][name]=&columns[7][searchable]=true&columns[7][orderable]=true&columns[7][search][value]=&columns[7][search][regex]=false&order[0][column]=5&order[0][dir]=desc&start=0&length=25&search[value]=sherlock&search[regex]=false&movies=true&shows=true&documentaries=true&rating=netflix&_=1451768717982"
with requests.Session() as session:
session.headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
session.get("http://www.allflicks.net/")
response = session.get(url, headers={"Accept" : "application/json, text/javascript, */*; q=0.01",
"X-Requested-With": "XMLHttpRequest",
"Referer": "http://www.allflicks.net/",
"Host": "www.allflicks.net"})
print(response.json())