404 HTTP response with HttpWebRequest for an HTTPS URL that works in a browser

Date: 2017-12-21 18:53:20

Tags: .net curl https httpwebrequest

The website is https://www.orias.fr. It appears to use TLS 1.2 and redirects to the /welcome page.

Here is the code I use to check whether it is reachable (.NET 4.5):

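' Requires: Imports System.Net
Private Function getURLStatus() As String

    ' Tls12 is a member of the SecurityProtocolType enum, not of the property itself
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
    Dim req As HttpWebRequest = DirectCast(WebRequest.Create("https://www.orias.fr"), HttpWebRequest)
    req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko"
    req.AllowAutoRedirect = True
    Dim resp As HttpWebResponse = Nothing

    Try
        resp = DirectCast(req.GetResponse(), HttpWebResponse)
    Catch ex As WebException
        ' Error statuses such as 404 surface as a WebException that carries the response
        resp = DirectCast(ex.Response, HttpWebResponse)
        If resp Is Nothing Then Return ex.Message
    End Try

    ' Return the numeric status code as text, matching the declared return type
    Return CInt(resp.StatusCode).ToString()

End Function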

This gives me a 404 HTTP response.

I also tried accessing this URL with cURL (7.54.1 - Cygwin, full output pasted below). After some handshake lines, I noticed this:

$ curl -vA "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" 'https://www.orias.fr'
[...]
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 404 Not Found

cURL is able to "get past" this response and retrieve the web content. How can I do the same in my code?

cURL output

$ curl -vA "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" 'https://www.orias.fr'
* STATE: INIT => CONNECT handle 0x600057930; line 1410 (connection #-5000)
* Rebuilt URL to: https://www.orias.fr/
* Added connection 0. The cache now contains 1 members
* STATE: CONNECT => WAITRESOLVE handle 0x600057930; line 1446 (connection #0)
*   Trying 160.92.131.100...
* TCP_NODELAY set
* STATE: WAITRESOLVE => WAITCONNECT handle 0x600057930; line 1527 (connection #0)
* Connected to www.orias.fr (160.92.131.100) port 443 (#0)
* STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x600057930; line 1579 (connection #0)
* Marked for [keep alive]: HTTP default
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* STATE: SENDPROTOCONNECT => PROTOCONNECT handle 0x600057930; line 1593 (connection #0)
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: C=FR; L=Paris; O=ORIAS; CN=orias.fr
*  start date: Mar 31 13:49:03 2017 GMT
*  expire date: Apr 28 14:19:02 2019 GMT
*  subjectAltName: host "www.orias.fr" matched cert's "www.orias.fr"
*  issuer: C=US; O=Entrust, Inc.; OU=See www.entrust.net/legal-terms; OU=(c) 2012 Entrust, Inc. - for authorized use only; CN=Entrust Certification Authority - L1K
*  SSL certificate verify ok.
* STATE: PROTOCONNECT => DO handle 0x600057930; line 1614 (connection #0)
> GET / HTTP/1.1
> Host: www.orias.fr
> User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
> Accept: */*
>
* STATE: DO => DO_DONE handle 0x600057930; line 1676 (connection #0)
* STATE: DO_DONE => WAITPERFORM handle 0x600057930; line 1801 (connection #0)
* STATE: WAITPERFORM => PERFORM handle 0x600057930; line 1811 (connection #0)
* HTTP 1.1 or later with persistent connection, pipelining supported
< HTTP/1.1 404 Not Found
< Date: Thu, 21 Dec 2017 18:45:04 GMT
* Server Apache is not blacklisted
< Server: Apache
< Set-Cookie: JSESSIONID=51A18E34C2884CDF4FBC71CE98CDD400.B8D1472E80D54428AFD2557FEED2; Path=/; Secure; HttpOnly
< Transfer-Encoding: chunked
< Content-Type: text/html;charset=ISO-8859-1
[HTML content]

1 answer:

Answer 0 (score: 0)

In fact, the HTML that cURL returns for https://www.orias.fr contains nothing but a redirect:

<html>
    <head>
        <title></title>
        <meta content="1; url=/c" http-equiv="refresh" />
    </head>
    <body onload="javascript:location.replace('/c')">
        <!-- truncated irrelevant content (fills up space for IE) -->
    </body>
</html>
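Because this redirect lives in the HTML body rather than in a Location header, automatic redirect handling (`AllowAutoRedirect` in .NET, `-L` in cURL) never sees it. The following is a minimal shell sketch of picking the target out of the markup and requesting it, not a definitive implementation. The function names `meta_refresh_target` and `status_after_meta_refresh` are made up for illustration, and the pattern assumes a single-line, double-quoted `content` attribute with a lowercase `url=`, as in the page above.

```shell
# Print the target of the first <meta http-equiv="refresh"> tag found on stdin.
# Assumption: the tag sits on one line and its content attribute is double-quoted
# with a lowercase "url=", as in the page shown above.
meta_refresh_target() {
    grep -i 'http-equiv="refresh"' |
        sed -n 's/.*<meta[^>]*content="[^";]*;[[:space:]]*url=\([^"]*\)".*/\1/p' |
        head -n 1
}

# Hypothetical helper: fetch a page, then request its meta-refresh target and
# report the final status code and URL. Only handles root-relative targets ("/c").
status_after_meta_refresh() {
    base=$1   # e.g. https://www.orias.fr (no trailing slash)
    target=$(curl -s "$base/" | meta_refresh_target)
    [ -n "$target" ] || { echo "no meta refresh found" >&2; return 1; }
    # -L follows the HTTP 3xx redirects that come after the meta refresh
    curl -sL -o /dev/null -w '%{http_code} %{url_effective}\n' "$base$target"
}
```

It would be called as `status_after_meta_refresh 'https://www.orias.fr'`. The same two steps (read the body of the 404 response, then request the extracted target with redirects enabled) can be ported to HttpWebRequest; a real HTML parser would be more robust than a regular expression.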

A browser performs the redirect to /c, either through the <meta> element or through JavaScript. cURL shows a 302 redirect for the /c URL:

< HTTP/1.1 302 Found
[...]
< Location: https://www.orias.fr/c/portal/layout
< Content-Length: 0
< Content-Type: text/html

cURL then shows, for /c/portal/layout:

< HTTP/1.1 302 Found
[...]
< Location: https://www.orias.fr/welcome
< Content-Length: 0
< Content-Type: text/html;charset=UTF-8

Finally, /welcome returns 200.
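To see such a chain in one pass instead of requesting each URL by hand, the response headers cURL dumps can be condensed to one line per hop. This is only a sketch: `redirect_chain` is a hypothetical helper name, and it assumes its input is a plain header dump such as the one produced by `curl -sL -D - -o /dev/null URL`.

```shell
# Summarise an HTTP header dump read from stdin: one line per response,
# printed as "STATUS" or "STATUS -> LOCATION" when a Location header is present.
redirect_chain() {
    tr -d '\r' | awk '
        function emit() {
            if (status != "") print (loc == "" ? status : status " -> " loc)
        }
        /^HTTP\//                  { emit(); status = $2; loc = "" }  # new response starts
        tolower($1) == "location:" { loc = $2 }                      # remember redirect target
        END                        { emit() }                        # flush the last response
    '
}
```

A possible invocation is `curl -sL -D - -o /dev/null 'https://www.orias.fr/c' | redirect_chain`, where `-L` makes cURL follow each 302 and `-D -` writes the headers of every hop to standard output.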

TL;DR: to check whether a URL is valid "for a human", behave like a "real" browser.