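I am working on a scraper and need to pull data from a website. I tried a very simple WebClient DownloadString call to fetch the page, which works with other websites, but with this site the code below fails. Any help would be appreciated. Thanks. The code is in VB.Net, but I would also be happy with a working solution in C#.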
Private Function GetHtml() As String
    Dim mData As String = ""
    Try
        'ServicePointManager.ServerCertificateValidationCallback = New Security.RemoteCertificateValidationCallback(AddressOf ValidateServerCertificate)
        'ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
        With mWC
            mData = .DownloadString("https://www.adorama.com/brands")
        End With
    Catch ex As Exception
        Debug.Print(ex.Message)
        'With CertificateValidationCallback
        'The remote server returned an error: (403) Forbidden.
        'Without CertificateValidationCallback
        'The request was aborted: Could not create SSL/TLS secure channel.
    End Try
    Return mData
End Function
Private Shared Function ValidateServerCertificate(ByVal sender As Object, ByVal certificate As X509Certificate, ByVal chain As X509Chain, ByVal sslPolicyErrors As Net.Security.SslPolicyErrors) As Boolean
    If sslPolicyErrors = Net.Security.SslPolicyErrors.None Then
        Return True
    End If
    'Note: this also returns True when there are policy errors, so every certificate is accepted.
    Return True
End Function
Answer 0 (score: 1)
For this website you have to add a User-Agent header before calling the .DownloadString() method.
mWC.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0")
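To show where that line fits, here is a minimal sketch that sets the User-Agent and also enables TLS 1.2, which is the line commented out in the question. The module name, the `url` parameter, and the choice to create a new WebClient on each call are assumptions made for illustration and are not from the original post; whether the site accepts the request today is not something this sketch can guarantee.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Net

Module ScraperSketch
    ' Hypothetical helper: name and signature are illustrative, not from the post.
    Private Function GetHtml(url As String) As String
        ' Allow TLS 1.2 in addition to the protocols already enabled
        ' (SecurityProtocolType.Tls12 requires .NET Framework 4.5 or later).
        ServicePointManager.SecurityProtocol = ServicePointManager.SecurityProtocol Or SecurityProtocolType.Tls12
        Try
            Using wc As New WebClient()
                ' Set the header before the request, as the answer suggests.
                wc.Headers.Add("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0")
                Return wc.DownloadString(url)
            End Using
        Catch ex As WebException
            ' Log the failure and fall back to an empty string, as in the question.
            Debug.Print(ex.Message)
            Return ""
        End Try
    End Function

    Sub Main()
        Dim html As String = GetHtml("https://www.adorama.com/brands")
        Console.WriteLine(html.Length)
    End Sub
End Module
```

Creating the WebClient inside the function means the header is set fresh for every request, so it does not depend on headers surviving from an earlier call. No certificate validation callback is installed here, so the default certificate checks stay in place.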