Question

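I have a program that downloads web pages using raw sockets. I have to use raw sockets and cannot use requests, urllib, or the like. I am on a network behind a Squid proxy, so my Python program simply connects to the proxy server and issues GET requests for the objects I take from a HAR file. I tested the request with curl: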

curl https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.wgbKiK972Ko.O/m=gapi_iframes,googleapis_client,plusone/rt=j/sv=1/d=1/ed=1/rs=AItRSTOlX0YCaQmKijyj5lpKQ5AVm7UE6A/cb=gapi.loaded_0 -o out_file

The output is the whole file, as expected. I checked the response headers, and they are:

HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Type: text/javascript; charset=UTF-8
Last-Modified: Thu, 11 Dec 2014 20:44:59 GMT
Date: Fri, 12 Dec 2014 03:38:46 GMT
Expires: Sat, 12 Dec 2015 03:38:46 GMT
X-Content-Type-Options: nosniff
Server: sffe
X-XSS-Protection: 1; mode=block
Cache-Control: public, max-age=31536000
Age: 1065247
Alternate-Protocol: 443:quic,p=0.02
Transfer-Encoding: chunked

Now I try to do the same thing in Python using socket programming:

    import socket

    HOST = 'proxy.address.of.squid.proxy'
    PORT = 3128
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    url = 'https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.wgbKiK972Ko.O/m=gapi_iframes,googleapis_client,plusone/rt=j/sv=1/d=1/ed=1/rs=AItRSTOlX0YCaQmKijyj5lpKQ5AVm7UE6A/cb=gapi.loaded_0'
    httpVrsn = 'HTTP/1.1'
    domain = 'apis.google.com'
    # The full https:// URL is sent to the proxy in a plain GET request
    objReq = 'GET '+url+' '+httpVrsn+'\r\nHost: '+domain+'\r\n\r\n'
    s.send(objReq)
    data = ''
    try:
        data = s.recv(1024)
        print data
    # other non-relevant stuff

The output I get is:

HTTP/1.0 501 Not Implemented
Server: squid/3.1.19
Mime-Version: 1.0
Date: Wed, 24 Dec 2014 10:25:42 GMT
Content-Type: text/html
Content-Length: 3576
X-Squid-Error: ERR_UNSUP_REQ 0
Vary: Accept-Language
Content-Language: en
X-Cache: MISS from localhost
X-Cache-Lookup: NONE from localhost:3128
Via: 1.0 localhost (squid/3.1.19)
Connection: close

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>ERROR: The requested URL could not be retrieved</title>
<style type="text/css"><!-- 
 /*
 Stylesheet for Squid Error pages
 Adapted from design by Free CSS Templates
 http://www.freecsstemplates.org
 Released for free under a Creative Commons Attribution 2.5 License
*/

/* Page basics */
* {
  font-family: verdana, sans-serif;
}

html body {
  margin: 0;
  padding: 0;
  background: #efefef;
  font-size: 12px;
  color: #1e1e1e;
}

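So I checked this, which explains that my Squid 3.1 proxy does not support Transfer-Encoding: chunked, but it talks about POST requests, and I am not sure whether it also applies to GET requests. I also checked "Unable to test HTTP PUT-based file upload via Squid Proxy". What I cannot understand is why curl and even my browser can fetch the content from behind the same proxy on the same network, while my Python program cannot get a successful response.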

Is there a way to make my Python program work without changing the Squid proxy? I have no control over the proxy.

Answer 1

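Curl uses the CONNECT method, which is a tunnelling method. The proxy merely connects to the remote end at the TCP level, and curl does all of the communication itself, including the TLS handshake. All TCP/IP packets are just 'shovelled' back and forth by the proxy. Note, however, that silent interception (MITM) by the proxy is possible under certain conditions (for example, when an administrator or company has put its own CA certificate into your certificate store).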
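To make the CONNECT step concrete, here is a minimal raw-socket sketch of asking a proxy for a tunnel. It is written for Python 3, and the helper names (`build_connect_request`, `read_headers`, `parse_status_code`, `open_tunnel`) are made up for illustration; the proxy address and port would be the ones from the question. It assumes the proxy allows CONNECT to port 443 and needs no authentication.

```python
import socket


def build_connect_request(host, port):
    """Return the bytes of an HTTP CONNECT request for host:port."""
    target = "{}:{}".format(host, port)
    return (
        "CONNECT {0} HTTP/1.1\r\n"
        "Host: {0}\r\n"
        "\r\n"
    ).format(target).encode("ascii")


def read_headers(sock):
    """Read from sock until the blank line that ends the response headers."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise IOError("proxy closed the connection early")
        buf += chunk
    return buf


def parse_status_code(response):
    """Extract the numeric status code from the first response line."""
    status_line = response.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])


def open_tunnel(proxy_host, proxy_port, host, port=443):
    """Connect to the proxy and ask it to tunnel to host:port.

    Returns the connected socket if the proxy answers 200.
    """
    sock = socket.create_connection((proxy_host, proxy_port))
    sock.sendall(build_connect_request(host, port))
    response = read_headers(sock)
    if parse_status_code(response) != 200:
        sock.close()
        raise IOError("CONNECT refused: %r" % response.split(b"\r\n", 1)[0])
    return sock


# Hypothetical usage with the values from the question:
# tunnel = open_tunnel('proxy.address.of.squid.proxy', 3128, 'apis.google.com')
```

After a 200 reply the socket behaves like a direct TCP connection to the remote host, so nothing further arrives on it until the client starts the TLS handshake itself.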

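What your Python script does instead is ask the proxy to talk to the remote end on its behalf. For some reason your proxy cannot make TLS connections itself (the feature is not configured, was disabled at build time, or is simply not available).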
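The difference shows up in the request line. The script in the question sends the full https:// URL to the proxy in a plain GET, whereas inside a tunnel the client speaks TLS to the server and sends only the path. A rough Python 3 sketch of the second form follows; it assumes `tunnel` is a socket on which the proxy has already answered a CONNECT request with 200, and the function names are illustrative only. The standard `ssl` module does the handshake, and the URL is split by hand since urllib is ruled out.

```python
import ssl


def split_https_url(url):
    """Split 'https://host/path' into (host, path) without urllib."""
    prefix = "https://"
    if not url.startswith(prefix):
        raise ValueError("expected an https:// URL")
    host, _, rest = url[len(prefix):].partition("/")
    return host, "/" + rest


def build_get_request(host, path):
    """Build a GET with only the path, as sent to the server inside the tunnel."""
    return (
        "GET {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).format(path, host).encode("ascii")


def fetch_through_tunnel(tunnel, url):
    """Do the TLS handshake over an established tunnel and fetch url.

    Returns the raw response bytes (status line, headers and body).
    """
    host, path = split_https_url(url)
    context = ssl.create_default_context()
    # The handshake runs end to end with the server; the proxy only relays bytes.
    tls = context.wrap_socket(tunnel, server_hostname=host)
    try:
        tls.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = tls.recv(4096)
            if not data:  # server closed the connection (Connection: close)
                break
            chunks.append(data)
    finally:
        tls.close()
    return b"".join(chunks)
```

Certificate verification is left at the defaults of `ssl.create_default_context()`, so if the proxy intercepts TLS with a CA that is not in the local trust store, the handshake should fail rather than succeed silently.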

To configure Squid, see http://wiki.squid-cache.org/Features/HTTPS
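Since the response headers shown in the question include Transfer-Encoding: chunked, a raw-socket client also has to decode the chunked body itself once it does receive the response. The following is a small Python 3 sketch of such a decoder; `decode_chunked` is a made-up name, it expects the complete body (everything after the blank line that ends the headers) as bytes, and it ignores any trailers.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 chunked body (bytes) into the raw payload."""
    decoded = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ';extension' parameters.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last chunk; trailers after it are ignored in this sketch
        start = line_end + 2
        decoded.append(body[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return b"".join(decoded)
```

For example, `decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` returns `b"Wikipedia"`.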

Squid proxy gives 501 on Python request but not on curl
