I'm having trouble downloading a txt file from a website. The script below downloads HTML code rather than the actual txt file and its contents.
$WebClient = New-Object System.Net.WebClient
$WebClient.DownloadFile("https://thegivebackproject.org/CheckStatus.txt", "D:\CheckStatus.txt")
Answer 0 (score: 1)
You can simply use Invoke-WebRequest:
Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile D:\CheckStatus.txt
Answer 1 (score: 1)
Short answer
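The server is doing browser sniffing and sends a different response depending on the User-Agent header in the request. You can get the response you want by sending a canned user-agent string: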
$useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile c:\temp\CheckStatus.txt -UserAgent $useragent
Long answer
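The server behind the URL you're requesting does browser sniffing to decide what to return. If you send a User-Agent header that it recognises, it returns the response you expect (i.e. the literal text "Azeemkhan-WaseemRaza").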
If you don't include a User-Agent header (and $WebClient.DownloadFile doesn't), the server responds with an HTML page instead.
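You can see this behaviour for yourself with an HTTP tracing tool such as Fiddler. When you open the page in a browser, you see the following HTTP request/response pair: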
Request
GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
Sec-Fetch-User: ?1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Cookie: SPSI=ee952ba44e33e958f963807ede78624b
Response
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:13:57 GMT
Content-Type: text/plain
Content-Length: 20
Connection: keep-alive
Last-Modified: Thu, 07 Nov 2019 16:15:48 GMT
Accept-Ranges: bytes
X-Cache: MISS
Azeemkhan-WaseemRaza
But when you use $WebClient.DownloadFile, you see:
Request
GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org
Response
HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:14:21 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: SPSI=9c24f8993046ef610e25cc727c4a4ae2; Path=/
Set-Cookie: adOtr=obsvl; Expires=Thu, 2 Aug 2001 20:47:11 UTC; Path=/
Set-Cookie: UTGv2=D-h4d40f620bfdd6c3b77b035ee99f96621134; Expires=Wed, 11-Nov-20 08:14:21 GMT; Path=/
cache-control: no-store, no-cache, max-age=0, must-revalidate, private, max-stale=0, post-check=0, pre-check=0
Vary: Accept-Encoding
X-Cache: MISS
Accept-Ranges: bytes
5908
<!doctype html>
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>StackPath</title>
<style>
* {
box-sizing: border-box;
}
... etc...
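The two traces differ most visibly in their Content-Type header: text/plain for the browser request versus text/html; charset=UTF-8 for the WebClient request. If you want a script to notice when it has been served an HTML page instead of the file, a minimal sketch is to check that header. Test-HtmlContentType is a hypothetical helper name, and treating text/html as "wrong response" is an assumption that only holds for endpoints that are supposed to return plain text.

```powershell
# Hypothetical helper: returns $true when a Content-Type header value denotes HTML.
function Test-HtmlContentType {
    param([string]$ContentType)
    # Keep only the media type, dropping parameters such as "; charset=UTF-8"
    $mediaType = ($ContentType -split ';')[0].Trim()
    # -eq on strings is case-insensitive in PowerShell, matching how media types compare
    return $mediaType -eq 'text/html'
}

# Example usage (requires network access):
# $response = Invoke-WebRequest -Uri https://thegivebackproject.org/CheckStatus.txt -UseBasicParsing
# if (Test-HtmlContentType $response.Headers['Content-Type']) {
#     Write-Warning 'Server returned an HTML page, not the text file.'
# }
```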
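The workaround is to include a recognised User-Agent header in the request. That's easier if you use Invoke-WebRequest, as @BiNZGi suggested, instead of the WebClient class; see the code under "Short answer" above.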
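Also note that this User-Agent sniffing is specific to the "thegivebackproject.org" site and won't necessarily apply to other sites, so you don't need to add a User-Agent header to every request as a matter of habit.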