Problem recreating an XMLHttpRequest with Python Requests and Postman

Time: 2019-04-20 21:36:10

Tags: python web-scraping xmlhttprequest python-requests postman

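I am trying to shorten a link on bit.do using Python or Postman. (Screenshot: bit.do URL shortener form.) In Chrome everything works fine, but not with Python or Postman: I get a page back, but it contains only an error, even though I am trying to make my request look the same as the one Chrome sends. Using Chrome Developer Tools I captured two POST requests, both named url-shortener.pl. Here are those requests (Stack Overflow does not allow links to URL shorteners, so I had to escape them):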

General:
Request URL: https://bit\.do/mod_perl/url-shortener.pl
Request Method: POST
Status Code: 200 
Remote Address: 54.83.52.76:443
Referrer Policy: no-referrer-when-downgrade

Response Headers:
content-type: application/json
date: Sat, 20 Apr 2019 20:12:06 GMT
server: nginx/1.14.1
status: 200

Request Headers:
:authority: bit\.do
:method: POST
:path: /mod_perl/url-shortener.pl
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9,ru-RU;q=0.8,ru;q=0.7
content-length: 112
content-type: application/x-www-form-urlencoded; charset=UTF-8
cookie: permasession=1554914974|phkaoymp1b; __utmc=60667454; __utma=60667454.372171702.1554914974.1555785612.1555789898.5; __utmz=60667454.1555789898.5.4.utmcsr=dynomapper.com|utmccn=(referral)|utmcmd=referral|utmcct=/blog/21-sitemaps-and-seo/495-top-14-url-shorteners; __utmt=1; __utmb=60667454.3.10.1555789898
origin: https://bit\.do
referer: https://bit\.do/
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
x-requested-with: XMLHttpRequest

Form Data:
action: shorten
url: google.com
url2:  site2 
url_hash: 
url_stats_is_private: 0
permasession: 1554914974|phkaoymp1b


General:
Request URL: https://bit\.do/mod_perl/url-shortener.pl
Request Method: POST
Status Code: 200
Remote Address: 54.83.52.76:443
Referrer Policy: no-referrer-when-downgrade

Response Headers:
content-type: application/json
date: Sat, 20 Apr 2019 20:12:06 GMT
server: nginx/1.14.1
status: 200

Request Headers:
:authority: bit\.do
:method: POST
:path: /mod_perl/url-shortener.pl
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9,ru-RU;q=0.8,ru;q=0.7
content-length: 32
content-type: application/x-www-form-urlencoded; charset=UTF-8
cookie: permasession=1554914974|phkaoymp1b; __utmc=60667454; __utma=60667454.372171702.1554914974.1555785612.1555789898.5; __utmz=60667454.1555789898.5.4.utmcsr=dynomapper.com|utmccn=(referral)|utmcmd=referral|utmcct=/blog/21-sitemaps-and-seo/495-top-14-url-shorteners; __utmt=1; __utmb=60667454.3.10.1555789898
origin: https://bit\.do
referer: https://bit\.do/
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36
x-requested-with: XMLHttpRequest

Form Data:
action: get_title
url_id: 49444432

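First, I tried Postman. Parameters: (screenshot: Postman POST request parameters). Headers: (screenshot: Postman POST request headers). Response preview: (screenshot: Postman response preview). Result: "Error: Invalid URL. Please enter a valid URL." Then I switched to Python and tried Requests.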

import requests

def bitdo():
    headers = {
        'accept': '*/*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9,ru-RU;q=0.8,ru;q=0.7',
        'content-length': '112',
        'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'cookie': 'permasession=1554914974|phkaoymp1b; __utmc=60667454; __utma=60667454.372171702.1554914974.1555785612.1555789898.5; __utmz=60667454.1555789898.5.4.utmcsr=dynomapper.com|utmccn=(referral)|utmcmd=referral|utmcct=/blog/21-sitemaps-and-seo/495-top-14-url-shorteners; __utmt=1; __utmb=60667454.3.10.1555789898',
        'origin': 'https://bit\.do',
        'referer': 'https://bit\.do/',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
    }
    params = {
        'action': 'shorten',
        'url': 'ya.ru',
        'url2': 'site2',
        'url_hash': '',
        'url_stats_is_private': 0,
        'permasessions': '1554914974|phkaoymp1b'
    }

    r = requests.post('https://bit\.do/mod_perl/url-shortener.pl', params, headers=headers)
    print(r.text)

The result is:

<title>Warning</title>

<!-- head.thtml -->
<meta http-equiv="content-type" content="text/html; charset=utf-8">


<style type="text/css">

a { color: #338; text-decoration: none; }

html {
 height: 100%;
}

body {
 font-family: Arial;
 margin: 0;
 height: 100%;
 color: #404040;
}

.url {
 background-color: #f9f9f9;
 width: 100%;
 height: 16px;
 border: 1px solid #654;
 padding: 3px;
}

.round {
 background-color: white;
 border: 1px solid #bbb;
 margin-bottom: 10px;
 -moz-border-radius: 15px;
 -webkit-border-radius: 15px;
 border-radius: 15px;
 width: 100%;
 max-width: 840px;
}

.input-placeholder {
 position: absolute;
 font-style: italic;
 color: #aaa;
 margin: 0.3em 0 0 0.5em;
}

.orange_logo {
  color: #fab035;
}

.orange_text {
 color: #D04000;
}

.orange_dot {
 color: #ff8800;
 margin-left: 10px;
}


.table-striped > tbody > tr:nth-child(odd) {
  background-color: #efefef;
}

.table1 {
 border-spacing: 0;
 border-collapse: collapse;
}

.table1 th {
 text-align: center;
}

.table1 td, .table1 th {
 border: 1px solid #a0a0a0;
 padding: 5px;
}

.bitbox {
 width: 100%;
}

/* first td should have minimal width */
.bitbox td:first-child {
 width: 12px;
 padding-right: 12px;
 white-space: nowrap;
}

</style>


<body>

<div style="
 height: 30px; 
 line-height: 30px;
 background-color: #fab035;
 xtext-align: right;
 color: black; 
 padding-right: 10px;
 font-size: 0.8em;
">

<a href="http://bit.do/"><img src="/images/bit-do-url-shortener-logo-66x66.png"
alt="URL Shortener - bit.do"
style="
 width: 31px;
 height: 31px;
 margin-left: 80px;
 margin-right: 10px;
vertical-align: middle;
"
></a>

<!--
Create short link: 
<input placeholder="http://...">
-->


<!--
<b>
<a href="http://bit.do/" style="color: black">bit.do - URL Shortener</a>
</b>
-->

<!--
<a href="/admin" style="color: black">Login to manage your links</a>
-->



<table border=0 cellpadding=0 cellspacing=0 width=100%>

<tr>
 <td class=top_left>&nbsp;</td>
 <td class=top_middle><b>Warning</b></td>
 <td class=top_right>&nbsp;</td>
</tr>

<tr>
 <td class=middle_left></td>
 <td class=middle_middle>



<ul>
<br>
<pre class="warning_message">ERROR site2: Can not create short link. Contact us for API usage.
</pre>

<p>
<br>
<br>





<br>
<br>
<br>





</ul>
<p>
<a href="http://bit.do/" class="button">&#8617; Back to bit.do (url shortener)</a>

<p>


</td>
</tr>
</table>
<!-- /table all -->


<p style="float: right; text-align: right; font-size: 0.8em; color: #808080; margin: 0; margin-right: 10px;">
Follow us on Twitter: <b><a href="https://twitter.com/bitdo" target=_blank class="orange_logo" style="border: 0;">@bitdo</a>&nbsp;</b>
</p>
<br style="clear: both;">
<hr style="
 border: 0px;
 height: 1px;
 background-color: #e0e0e0;
 xbackground-color: #fab035;

">

<div style="
 margin: 0 auto;
 text-align: center;
 xbackground-color: #c8c8c8;
 bottom: 0px;
"
>
<span style="font-size: 0.7em; line-height: 35px;">
Shorten and personalize long web addresses. Get real-time traffic statistics for your links. Free service.<br>
</span>
<span style="font-size: 15px; font-family: arial;">

<a href="http://bit.do/">bit.do - home</a>

<span style="color:#bbbbbb">|</span>
<a href="/best-url-shortener.php">why bit.do is better</a>
<span style="color:#bbbbbb">|</span>
<a href="/about-us.php">about us</a> 
<span style="color:#bbbbbb">|</span>
<a href="/what-is-url-shortener.php">about url shortener</a>
<span style="color:#bbbbbb">|</span>
<a href="/contact.php">contact</a>

<!--
<a href="">FAQ</a> |
<a href="">terms</a> |
-->
</span>
<p style="margin: 7px; color: #909090; font-size: 0.5em;">
Copyright &copy; 2019 - Insite</p>

</div> <!-- /foot -->


</div>
<!-- /height 100% -->

<!-- TODO: ALREADY LOADED ? -->
<script src="/js/jquery/jquery.min.js"></script>

<script type="text/javascript">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-756399-13']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>

</html>

<!-- END OF erro.thtml -->

ERROR site2: Can not create short link. Contact us for API usage.
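Because a failed request comes back as a full HTML page rather than JSON, it can help to pull out only the warning text. The following is a minimal sketch, assuming the message always sits in a `<pre class="warning_message">` element as it does in the response above; `extract_warning` is a hypothetical helper name, and a regular expression is used only because the markup here is simple.

```python
import re


def extract_warning(html):
    """Return the text of the first <pre class="warning_message"> element, or None."""
    match = re.search(r'<pre class="warning_message">(.*?)</pre>', html, re.DOTALL)
    return match.group(1).strip() if match else None


# Fragment copied from the error page shown above.
sample = """<ul>
<br>
<pre class="warning_message">ERROR site2: Can not create short link. Contact us for API usage.
</pre>
"""

print(extract_warning(sample))
```

With Requests, the same helper could be applied to `r.text` whenever the response content type is not `application/json`.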

I expected the response to contain the shortened link. Can someone explain what is wrong with my request?

1 Answer:

Answer 0 (score: 2)

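This seems to be the minimal working code.

It needs the header 'X-Requested-With' because it is an AJAX/XHR request.

It needs permasession, but the first GET doesn't send it, so it is probably generated on the page using JavaScript. However, it works for me every time with the same permasession.

Maybe later it will need a new, fresh permasession.

It also needs the spaces in " site2 ".
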
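The captured content-length values offer a way to check that the spaces really belong to the url2 value. The sketch below re-encodes the form data from the Chrome capture with the standard library and measures the body length, assuming the browser encoded the body the same way `urllib.parse.urlencode` does (space as `+`, `|` as `%7C`). The names `captured_form` and `body_length` are illustrative only.

```python
from urllib.parse import urlencode

# Form fields as listed in Chrome's "Form Data" panel for the first request.
captured_form = {
    "action": "shorten",
    "url": "google.com",
    "url2": " site2 ",  # leading and trailing space
    "url_hash": "",
    "url_stats_is_private": 0,
    "permasession": "1554914974|phkaoymp1b",
}


def body_length(form):
    """Length in bytes of the form-encoded request body."""
    return len(urlencode(form).encode("ascii"))


with_spaces = body_length(captured_form)
without_spaces = body_length({**captured_form, "url2": "site2"})

print(urlencode(captured_form))
print(with_spaces, without_spaces)
```

Under that assumption the body with spaces is 112 bytes, matching the `content-length: 112` header in the capture, while the version without spaces is two bytes shorter.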
import requests

headers={
    'X-Requested-With': 'XMLHttpRequest', # need it
}

data = {
    'action': 'shorten',
    'url': 'https://onet.pl',
    'url2': ' site2 ', # need spaces 
    'url_hash': None,
    'url_stats_is_private': 0,
    'permasession': '1555801674|ole2ky65f9', # need it
}

r = requests.post('http://bit\.do/mod_perl/url-shortener.pl', headers=headers, data=data)

print(r.status_code)
print(r.json())

It doesn't need requests.Session(), a User-Agent header, or an initial GET request.
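To see where the question's payload departs from what Chrome sent, the two sets of form fields can be compared mechanically. This is a small sketch using only values copied from the question; `diff_forms` is a hypothetical helper, and the `url` field is ignored because it is expected to differ.

```python
# Form data captured in Chrome vs. the payload built in the question's bitdo().
chrome_form = {
    "action": "shorten",
    "url": "google.com",
    "url2": " site2 ",
    "url_hash": "",
    "url_stats_is_private": 0,
    "permasession": "1554914974|phkaoymp1b",
}
python_form = {
    "action": "shorten",
    "url": "ya.ru",
    "url2": "site2",
    "url_hash": "",
    "url_stats_is_private": 0,
    "permasessions": "1554914974|phkaoymp1b",
}


def diff_forms(expected, actual, ignore=("url",)):
    """Return human-readable differences between two form dicts."""
    problems = []
    for key in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing field: {key}")
    for key in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected field: {key}")
    for key in sorted(expected.keys() & actual.keys()):
        if key not in ignore and expected[key] != actual[key]:
            problems.append(f"{key}: expected {expected[key]!r}, got {actual[key]!r}")
    return problems


differences = diff_forms(chrome_form, python_form)
for line in differences:
    print(line)
```

The comparison flags the misspelled `permasessions` key and the missing spaces in `url2`. The headers dict in the question also omits `x-requested-with`, which is present in the Chrome capture.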


EDIT: The value 1555801674 in 'permasession': '1555801674|ole2ky65f9' is a timestamp of the current date and time.

import datetime

datetime.datetime.fromtimestamp(1555801674)

datetime.datetime(2019, 4, 21, 1, 7, 54)

Maybe ole2ky65f9 is also a timestamp, but in some shortened form.
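The split into a timestamp and a second part can be sketched as a small parser. `parse_permasession` is a hypothetical helper, and the meaning of the second part remains unknown, so it is returned untouched. The conversion uses UTC so the result does not depend on the local time zone; the 01:07:54 shown above is that same instant in the answerer's local time.

```python
from datetime import datetime, timezone


def parse_permasession(value):
    """Split 'timestamp|token' into an aware UTC datetime and the raw token."""
    timestamp, token = value.split("|", 1)
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc), token


created, token = parse_permasession("1555801674|ole2ky65f9")
print(created.isoformat(), token)
```

Applied to the permasession from the question, `1554914974|phkaoymp1b`, the same function gives a creation time ten days before the capture, which fits the observation that an old value keeps working.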