curl and the Python requests library

Posted: 2016-11-21 07:54:55

Tags: python json curl python-requests

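I was about to delete this question, but on second thought I'll keep it: it is a real-world example of why, as a developer, I should pay more attention to detail.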

I want to fetch some data from a website. The requested URL checks the content type of the request and responds accordingly.

So I tried this curl command:

curl --header "Accept: application/json, text/javascript, */*; q=0.01\r\nX-Requested-With: XMLHttpRequest\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\n" http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/margin_bal_result.php\?l\=en-us\&d\=2016/11/15\&_\=1479700586981 -v
* About to connect() to www.tpex.org.tw port 80 (#0)
*   Trying 210.63.162.130... connected
> GET /web/stock/margin_trading/margin_balance/margin_bal_result.php?l=en-us&d=2016/11/15&_=1479700586981 HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: www.tpex.org.tw
> Accept: application/json, text/javascript, */*; q=0.01\r\nX-Requested-With: XMLHttpRequest\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\nAccept-Encoding: gzip,deflate,sdch\r\n
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Mon, 21 Nov 2016 07:35:56 GMT
< Server: Apache
< Content-Type: text/html; charset=utf-8
< X-Cache: MISS from localhost
< X-Cache-Lookup: MISS from localhost:3128
< Via: 1.0 localhost (squid/3.1.19)
< Connection: close
<
{"reportDate":"2016\/11\/15","iTotalRecords":610,"aaData":[["006201","YA HORNG ELECTRONIC CO.","6","0","0","0","6","0","0.09","6,361","0","0","0","0","0","0","0.0","6,361","0",""],...}

The response is truncated, but it is basically JSON.
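Note that the verbose output above shows this JSON being served with "Content-Type: text/html; charset=utf-8", so the Content-Type header alone cannot tell the JSON response apart from the HTML page. A minimal sketch that simply tries to decode the body instead; the helper name parse_json_body is hypothetical, and the sample body is a shortened stand-in shaped like the truncated response above:

```python
import json


def parse_json_body(body):
    """Return the decoded JSON object, or None if the body is not JSON
    (for example, an HTML page returned for the wrong URL)."""
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# Shortened sample bodies (hypothetical; real responses are much longer).
json_body = '{"reportDate":"2016\\/11\\/15","iTotalRecords":610,"aaData":[]}'
html_body = '<!DOCTYPE HTML>\n<html>\n<head>'

print(parse_json_body(json_body)["reportDate"])  # 2016/11/15
print(parse_json_body(html_body))                # None
```

With requests, the same check can be applied to res.text.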

However, here is my Python code, which I don't think differs much, yet the response is HTML:

import datetime

import requests

g_tpex_headers = {
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'User-Agent': (
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        ' (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120'
        ' Chrome/37.0.2062.120 Safari/537.36'
    ),
    'X-Requested-With': 'XMLHttpRequest',
}
data_link = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal.php?l=en-us&d={}&_=1479700586981'
)
target_dt = datetime.date(2016, 11, 18)  # date taken from the log below
data = []
with requests.Session() as session:
    # Replaces the session's default headers entirely with the dict above.
    session.headers = g_tpex_headers
    res = session.get(data_link.format(target_dt.strftime('%Y/%m/%d')))
    print(res.content[:400])

Log:

send: 'GET /web/stock/margin_trading/margin_balance/margin_bal.php?l=en-us&d=2016/11/18&_=1479700586981 HTTP/1.1\r\nHost: www.tpex.org.tw\r\nX-Requested-With: XMLHttpRequest\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept: application/json, text/javascript, */*; q=0.01\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\n\r\n'
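The send: line above looks like the trace Python's http.client prints when its debug level is raised; the question doesn't say how the log was captured, so that is an assumption. A minimal sketch for getting a similar trace from requests, which (via urllib3) builds its connections on http.client's HTTPConnection class:

```python
import http.client

# Raise the debug level on the class so every connection created afterwards
# prints what it sends over the socket as "send: b'...'" lines on stdout.
# Subsequent requests.get() / session.get() calls should then show the raw
# request line and headers, which can be compared with curl -v output.
http.client.HTTPConnection.debuglevel = 1
```

Setting it back to 0 turns the trace off again.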

And the response:

<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title> HOME&nbsp;&gt;&nbsp;Mainboard&nbsp;&gt;&nbsp;Margin Trading&nbsp;&gt;&nbsp;Margin Balance</title>
<link rel="icon" type="image/ico" href="/web/images/favicon.ic

I can't see much difference. So why doesn't Python requests get a JSON response?

2 answers:

Answer 0 (score: 3)

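You are requesting different paths. In the cURL command the final path component is margin_bal_result.php, while in the Python script it is margin_bal.php. Once you change the path in the Python script to match the one in the cURL command, you get a JSON response.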
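A difference like this is easy to miss when eyeballing two long URLs. A minimal sketch that compares them part by part with the standard library's urllib.parse; the helper name diff_urls is hypothetical, and both URLs use the date 2016/11/15 so that only the path differs:

```python
from urllib.parse import parse_qs, urlsplit

curl_url = ('http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
            'margin_bal_result.php?l=en-us&d=2016/11/15&_=1479700586981')
python_url = ('http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
              'margin_bal.php?l=en-us&d=2016/11/15&_=1479700586981')


def diff_urls(a, b):
    """Return a dict of the URL parts (scheme, netloc, path, query) that differ."""
    pa, pb = urlsplit(a), urlsplit(b)
    diffs = {}
    for part in ('scheme', 'netloc', 'path'):
        if getattr(pa, part) != getattr(pb, part):
            diffs[part] = (getattr(pa, part), getattr(pb, part))
    # Compare query strings as parsed dicts so parameter order doesn't matter.
    if parse_qs(pa.query) != parse_qs(pb.query):
        diffs['query'] = (pa.query, pb.query)
    return diffs


print(diff_urls(curl_url, python_url))
```

Only the 'path' entry is reported, showing margin_bal_result.php versus margin_bal.php.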

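Update: with cURL, each header needs its own --header option rather than being joined together. Inside double quotes the shell passes \r\n through literally, so curl sent your whole string as a single Accept header value (see the "> Accept:" line in your verbose output). In your example you should use the following command instead: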

curl --header "Accept: application/json, text/javascript, */*; q=0.01" --header "X-Requested-With: XMLHttpRequest" --header "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36" http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/margin_bal_result.php\?l\=en-us\&d\=2016/11/15\&_\=1479700586981 -v > httpres.txt

This results in the following request being sent:

* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 210.63.162.130...
* Connected to www.tpex.org.tw (210.63.162.130) port 80 (#0)
> GET /web/stock/margin_trading/margin_balance/margin_bal_result.php?l=en-us&d=2016/11/15&_=1479700586981 HTTP/1.1
> Host: www.tpex.org.tw
> Accept: application/json, text/javascript, */*; q=0.01
> X-Requested-With: XMLHttpRequest
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36
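To see why the original command misbehaved, the single --header string can be split on the literal backslash sequences that the shell left in it. A minimal sketch; the helper names are hypothetical and the User-Agent value is shortened:

```python
# The value passed to the original --header, as curl received it: "\r\n"
# is four literal characters here (backslash, r, backslash, n), not CR/LF.
mashed = (
    'Accept: application/json, text/javascript, */*; q=0.01\\r\\n'
    'X-Requested-With: XMLHttpRequest\\r\\n'
    'User-Agent: Mozilla/5.0 (X11; Linux x86_64)\\r\\n'
)


def split_mashed_headers(value):
    """Split a literal-\\r\\n-joined header string into a {name: value} dict."""
    headers = {}
    for line in value.split('\\r\\n'):
        if not line:
            continue
        name, _, val = line.partition(':')
        headers[name.strip()] = val.strip()
    return headers


def to_curl_args(headers):
    """Build one --header option per header, as the corrected command does."""
    args = []
    for name, val in headers.items():
        args += ['--header', '{}: {}'.format(name, val)]
    return args


for arg in to_curl_args(split_mashed_headers(mashed)):
    print(arg)
```

The resulting dict has the same shape as g_tpex_headers in the question, which is why the requests code never had this particular problem.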

Answer 1 (score: 1)

Try to make the request in Python exactly the same as the one in curl. Your code:

data_link = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal.php?l=en-us&d={}&_=1479700586981'
)

Change it to:

data_link = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal_result.php?l=en-us&d={}&_=1479700586981'
)

After correcting data_link, I confirmed that it does work.
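One way to check that the Python request really matches curl's, without touching the network, is to let the session build the request and inspect it before sending. A minimal sketch using requests' Session.prepare_request; the header set is trimmed to two of the question's headers for illustration, and session.headers.update() is used instead of replacing the dict, so requests' own defaults (its User-Agent, Connection, and so on) also appear in the output:

```python
import requests

headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'X-Requested-With': 'XMLHttpRequest',
}
url = ('http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
       'margin_bal_result.php?l=en-us&d={}&_=1479700586981').format('2016/11/15')

with requests.Session() as session:
    session.headers.update(headers)
    # Build the request as session.get() would, merging session headers,
    # but do not send it.
    prepared = session.prepare_request(requests.Request('GET', url))

# Print it in roughly the same shape as the "> " lines of curl -v.
print(prepared.method, prepared.path_url)
for name, value in prepared.headers.items():
    print('{}: {}'.format(name, value))
```

Comparing these lines with the "> GET" line and headers in the curl -v output would have exposed the margin_bal.php path right away.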