The site is https://www.orias.fr. It seems to use TLS 1.2 and redirects to the /welcome page.
Here is a piece of code that checks whether it is reachable (.NET 4.5):
' Requires Imports System.Net at the top of the file
Private Function getURLStatus() As String
    ' Force TLS 1.2 (SecurityProtocolType.Tls12 is available from .NET 4.5)
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
    Dim req As HttpWebRequest = DirectCast(WebRequest.Create("https://www.orias.fr"), HttpWebRequest)
    req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
    req.AllowAutoRedirect = True ' follows HTTP 3xx redirects only
    Dim resp As HttpWebResponse = Nothing
    Try
        resp = DirectCast(req.GetResponse(), HttpWebResponse)
    Catch ex As WebException
        ' 4xx/5xx statuses raise WebException; the response is still attached to it
        resp = DirectCast(ex.Response, HttpWebResponse)
        If resp Is Nothing Then Return ex.Message
    End Try
    Using resp
        Return CInt(resp.StatusCode).ToString()
    End Using
End Function
This gives me a 404 HTTP response.
I also tried accessing this URL with cURL (7.54.1, Cygwin; the full output is pasted below). After the handshake lines, I noticed this:
$ curl -vA "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" 'https://www.orias.fr'
[...]
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 404 Not Found
But cURL is able to "get past" this response and fetch the web content. How can I do the same in my code?
cURL output
$ curl -vA "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" 'https://www.orias.fr'
* STATE: INIT => CONNECT handle 0x600057930; line 1410 (connection #-5000)
* Rebuilt URL to: https://www.orias.fr/
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => WAITRESOLVE handle 0x600057930; line 1446 (connection #0)
* Trying 160.92.131.100...
* TCP_NODELAY set
* STATE: WAITRESOLVE => WAITCONNECT handle 0x600057930; line 1527 (connection #0)
* Connected to www.orias.fr (160.92.131.100) port 443 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x600057930; line 1579 (connection #0)
* Marked for [keep alive]: HTTP default
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* STATE: SENDPROTOCONNECT => PROTOCONNECT handle 0x600057930; line 1593 (connection #0)
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: C=FR; L=Paris; O=ORIAS; CN=orias.fr
* start date: Mar 31 13:49:03 2017 GMT
* expire date: Apr 28 14:19:02 2019 GMT
* subjectAltName: host "www.orias.fr" matched cert's "www.orias.fr"
* issuer: C=US; O=Entrust, Inc.; OU=See www.entrust.net/legal-terms; OU=(c) 2012 Entrust, Inc. - for authorized use only; CN=Entrust Certification Authority - L1K
* SSL certificate verify ok.
* STATE: PROTOCONNECT => DO handle 0x600057930; line 1614 (connection #0)
> GET / HTTP/1.1
> Host: www.orias.fr
> User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
> Accept: */*
>
* STATE: DO => DO_DONE handle 0x600057930; line 1676 (connection #0)
* STATE: DO_DONE => WAITPERFORM handle 0x600057930; line 1801 (connection #0)
* STATE: WAITPERFORM => PERFORM handle 0x600057930; line 1811 (connection #0)
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 404 Not Found
< Date: Thu, 21 Dec 2017 18:45:04 GMT
* Server Apache is not blacklisted
< Server: Apache
< Set-Cookie: JSESSIONID=51A18E34C2884CDF4FBC71CE98CDD400.B8D1472E80D54428AFD2557FEED2; Path=/; Secure; HttpOnly
< Transfer-Encoding: chunked
< Content-Type: text/html;charset=ISO-8859-1
[HTML content]
Answer
In fact, the HTML that cURL gets back for https://www.orias.fr contains only a redirect:
<html>
<head>
<title></title>
<meta content="1; url=/c" http-equiv="refresh" />
</head>
<body onload="javascript:location.replace('/c')">
<!-- truncated irrelevant content (fills up space for IE) -->
</body>
</html>
The browser performs the redirect to /c either through the <meta> element or through JavaScript.
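Client code that wants to behave like the browser first has to pull that target out of the 404 body. Below is a minimal sketch, assuming the page keeps the `<meta ... content="N; url=..." http-equiv="refresh">` shape shown above. The name `ExtractMetaRefreshUrl` is hypothetical, and the regular expressions are a shortcut, not a real HTML parser; the relative target is resolved against the page address the way a browser would resolve it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MetaRefreshDemo
    ' Hypothetical helper: returns the absolute target of a
    ' <meta http-equiv="refresh" content="N; url=..."> tag, or Nothing if there is none.
    Public Function ExtractMetaRefreshUrl(html As String, baseUri As Uri) As Uri
        ' Look at every <meta ...> tag in the document
        For Each tag As Match In Regex.Matches(html, "<meta\b[^>]*>", RegexOptions.IgnoreCase)
            ' Skip tags that are not refresh directives
            If Not Regex.IsMatch(tag.Value, "http-equiv\s*=\s*[""']?refresh", RegexOptions.IgnoreCase) Then Continue For
            ' Capture the url=... part of content="N; url=..."
            Dim m As Match = Regex.Match(tag.Value, "content\s*=\s*[""'][^""']*?url\s*=\s*([^""']+)[""']", RegexOptions.IgnoreCase)
            If m.Success Then
                ' Resolve a relative target such as "/c" against the page URL
                Return New Uri(baseUri, m.Groups(1).Value.Trim())
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim html As String = "<meta content=""1; url=/c"" http-equiv=""refresh"" />"
        Console.WriteLine(ExtractMetaRefreshUrl(html, New Uri("https://www.orias.fr/"))) ' prints https://www.orias.fr/c
    End Sub
End Module
```

Quoted targets inside `content` (for example `url='/c'`) are not handled by this simplified pattern.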
For the /c URL, cURL shows a 302 redirect:
< HTTP/1.1 302 Found
[...]
< Location: https://www.orias.fr/c/portal/layout
< Content-Length: 0
< Content-Type: text/html
Then, for /c/portal/layout, cURL shows:
< HTTP/1.1 302 Found
[...]
< Location: https://www.orias.fr/welcome
< Content-Length: 0
< Content-Type: text/html;charset=UTF-8
Finally, /welcome returns 200.
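To reproduce this hop-by-hop view from .NET, automatic redirects can be switched off and each `Location` header followed by hand. This is a sketch assuming .NET Framework 4.5+ `HttpWebRequest`; `TraceRedirects` and `maxHops` are illustrative names. It starts from `/c`, because the step from `/` to `/c` is an HTML redirect that `HttpWebRequest` never follows on its own. A shared `CookieContainer` is used since the first response sets a `JSESSIONID` cookie; whether the later hops actually require it is an assumption.

```vbnet
Imports System
Imports System.Net

Module RedirectTrace
    ' Follows HTTP 3xx redirects one hop at a time and prints each step,
    ' similar to what curl -v shows. Returns the first non-3xx status.
    Function TraceRedirects(startUrl As String, Optional maxHops As Integer = 10) As Integer
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
        Dim cookies As New CookieContainer()   ' keeps JSESSIONID between hops
        Dim current As New Uri(startUrl)
        For hop As Integer = 1 To maxHops
            Dim req As HttpWebRequest = DirectCast(WebRequest.Create(current), HttpWebRequest)
            req.AllowAutoRedirect = False      ' surface every 3xx instead of following it silently
            req.CookieContainer = cookies
            req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
            Dim resp As HttpWebResponse = Nothing
            Try
                resp = DirectCast(req.GetResponse(), HttpWebResponse)
            Catch ex As WebException When ex.Response IsNot Nothing
                resp = DirectCast(ex.Response, HttpWebResponse) ' 4xx/5xx still carry a response
            End Try
            Using resp
                Dim status As Integer = CInt(resp.StatusCode)
                Dim location As String = resp.Headers("Location")
                Console.WriteLine("{0} -> {1} {2}", current, status, location)
                If status < 300 OrElse status >= 400 OrElse String.IsNullOrEmpty(location) Then
                    Return status
                End If
                current = New Uri(current, location) ' Location may be relative
            End Using
        Next
        Throw New InvalidOperationException("Too many redirects")
    End Function

    Sub Main()
        TraceRedirects("https://www.orias.fr/c")
    End Sub
End Module
```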
TL;DR: to check whether a URL is valid for a "human", behave like a "real" browser.
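Applied to the question's code, a minimal sketch of such a "browser-like" check could let `HttpWebRequest` follow the HTTP 302 chain, keep cookies in a `CookieContainer`, read the body even when the status is 404, and follow a `<meta>` refresh target the way a browser does. It does not execute JavaScript, so a page that redirects only through script would still report its raw status. `GetBrowserLikeStatus`, `maxHops` and the regular expression are illustrative assumptions; the pattern only looks for `url=` inside a `<meta>` tag's `content` attribute and is not a general HTML parser.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

Module BrowserLikeCheck
    ' Simplified: captures url=... inside the content attribute of a <meta> tag.
    Private ReadOnly MetaRefresh As New Regex(
        "<meta\b[^>]*content\s*=\s*[""'][^""']*?url\s*=\s*([^""']+)[""'][^>]*>",
        RegexOptions.IgnoreCase)

    Function GetBrowserLikeStatus(url As String, Optional maxHops As Integer = 5) As Integer
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
        Dim cookies As New CookieContainer()   ' shared so cookies such as JSESSIONID persist across hops
        Dim current As New Uri(url)
        For hop As Integer = 1 To maxHops
            Dim req As HttpWebRequest = DirectCast(WebRequest.Create(current), HttpWebRequest)
            req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
            req.AllowAutoRedirect = True       ' HttpWebRequest follows the 302 chain itself
            req.CookieContainer = cookies
            Dim resp As HttpWebResponse = Nothing
            Try
                resp = DirectCast(req.GetResponse(), HttpWebResponse)
            Catch ex As WebException When ex.Response IsNot Nothing
                resp = DirectCast(ex.Response, HttpWebResponse) ' e.g. the 404 that carries the HTML redirect
            End Try
            Using resp
                Dim body As String
                Using reader As New StreamReader(resp.GetResponseStream())
                    body = reader.ReadToEnd()
                End Using
                Dim m As Match = MetaRefresh.Match(body)
                If Not m.Success Then Return CInt(resp.StatusCode)
                ' Follow the HTML-level redirect, resolved against the final URL of this hop
                current = New Uri(resp.ResponseUri, m.Groups(1).Value.Trim())
            End Using
        Next
        Throw New InvalidOperationException("Too many meta-refresh hops")
    End Function

    Sub Main()
        Console.WriteLine(GetBrowserLikeStatus("https://www.orias.fr"))
    End Sub
End Module
```

The `maxHops` limit guards against a page that refreshes to itself.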