PowerShell: downloading a txt file from a website

Asked: 2019-11-12 06:15:37

Tags: powershell

I have a problem downloading a txt file from a website. The script below downloads HTML markup instead of the actual txt file and its contents.

$WebClient = New-Object System.Net.WebClient
$WebClient.DownloadFile("https://thegivebackproject.org/CheckStatus.txt", "D:\CheckStatus.txt")

2 answers:

Answer 0: (score: 1)

You can simply use Invoke-WebRequest:

Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile D:\CheckStatus.txt

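Answer 1: (score: 1)

Short answer

The server is doing browser sniffing: it sends a different response depending on the User-Agent header in the request. You can get the response you want by sending a canned user agent string: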

$useragent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
Invoke-WebRequest -URI https://thegivebackproject.org/CheckStatus.txt -OutFile c:\temp\CheckStatus.txt -UserAgent $useragent

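Long answer

The server behind the URL you are requesting uses browser sniffing to decide what content to return. If you give it a User-Agent header it recognizes, it returns the response you expect (that is, the literal text "Azeemkhan-WaseemRaza").

If you do not include a User-Agent header (and $WebClient.DownloadFile does not send one), the server responds with an HTML page instead.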
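
If you would rather stay with WebClient, one possible approach is to add the header yourself before downloading. The following is only a sketch: the function name New-BrowserLikeWebClient is hypothetical, and it assumes the server accepts the same browser-style string used in the short answer.

```powershell
# Hypothetical helper (not part of the original answer): creates a WebClient
# that sends an explicit User-Agent header with its requests.
function New-BrowserLikeWebClient {
    param(
        # Assumption: the canned browser string from the short answer is accepted by the server.
        [string] $UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
    )
    $client = New-Object System.Net.WebClient
    # WebClient sends no User-Agent by default, so set one explicitly.
    $client.Headers.Add("User-Agent", $UserAgent)
    return $client
}

# Example call (left commented out because it needs network access and a D: drive):
# $WebClient = New-BrowserLikeWebClient
# try     { $WebClient.DownloadFile("https://thegivebackproject.org/CheckStatus.txt", "D:\CheckStatus.txt") }
# finally { $WebClient.Dispose() }
```

With the header in place, the request should look more like the browser request shown in the trace, although the server may still inspect other headers.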

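If you have an HTTP tracing tool such as Fiddler installed, you can see this behavior for yourself. When you open the page in a browser, you see the following HTTP request and response pair:

Request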

GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36
Sec-Fetch-User: ?1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Accept-Encoding: gzip, deflate, br
Accept-Language: en-GB,en-US;q=0.9,en;q=0.8
Cookie: SPSI=ee952ba44e33e958f963807ede78624b

Response

HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:13:57 GMT
Content-Type: text/plain
Content-Length: 20
Connection: keep-alive
Last-Modified: Thu, 07 Nov 2019 16:15:48 GMT
Accept-Ranges: bytes
X-Cache: MISS

Azeemkhan-WaseemRaza

But when you use $WebClient.DownloadFile, you see:

Request

GET https://thegivebackproject.org/CheckStatus.txt HTTP/1.1
Host: thegivebackproject.org

Response

HTTP/1.1 200 OK
Server: nginx
Date: Tue, 12 Nov 2019 08:14:21 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: SPSI=9c24f8993046ef610e25cc727c4a4ae2; Path=/
Set-Cookie: adOtr=obsvl; Expires=Thu, 2 Aug 2001 20:47:11 UTC; Path=/
Set-Cookie: UTGv2=D-h4d40f620bfdd6c3b77b035ee99f96621134; Expires=Wed, 11-Nov-20 08:14:21 GMT; Path=/
cache-control: no-store, no-cache, max-age=0, must-revalidate, private,  max-stale=0, post-check=0, pre-check=0
Vary: Accept-Encoding
X-Cache: MISS
Accept-Ranges: bytes

5908
<!doctype html>
<head>
  <meta charset="utf-8">
  <meta http-equiv="x-ua-compatible" content="ie=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
  <title>StackPath</title>
  <style>
    * {
      box-sizing: border-box;
    }
... etc...
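
If no trace tool is at hand, the difference between the two responses can probably be observed from PowerShell by comparing the Content-Type header returned with and without a User-Agent. This is a rough sketch: Get-ResponseContentType is a hypothetical helper name, and the values mentioned in the comments are the ones from the traces above, not guaranteed results.

```powershell
# Hypothetical helper: requests a URL with WebClient and returns the
# Content-Type response header. A User-Agent is sent only if one is supplied.
function Get-ResponseContentType {
    param(
        [Parameter(Mandatory)] [string] $Uri,
        [string] $UserAgent
    )
    $client = New-Object System.Net.WebClient
    try {
        if ($UserAgent) { $client.Headers.Add("User-Agent", $UserAgent) }
        # The body is discarded; only the response headers matter here.
        $null = $client.DownloadString($Uri)
        return $client.ResponseHeaders["Content-Type"]
    }
    finally {
        $client.Dispose()
    }
}

# Example calls (commented out because they need network access):
# $url = "https://thegivebackproject.org/CheckStatus.txt"
# $ua  = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"
# Get-ResponseContentType -Uri $url                  # the trace above showed text/html
# Get-ResponseContentType -Uri $url -UserAgent $ua   # the trace above showed text/plain
```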

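The fix is to include a recognized User-Agent header in the request, which is easier if you use Invoke-WebRequest instead of the WebClient class, as @BiNZGi suggested. See the code under "Short answer" above.

Also, note that this User-Agent sniffing behavior is specific to the "thegivebackproject.org" website and does not necessarily apply to other sites, so you do not need to include a User-Agent header as a general rule.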
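
Since the header is not always needed, one option is to download normally and check afterwards whether the saved file looks like an HTML page rather than the expected text. The sketch below is a heuristic only: Test-LooksLikeHtml is a hypothetical helper name, and it assumes an unwanted HTML response starts with a doctype or html tag, as in the trace above.

```powershell
# Hypothetical helper: returns $true if the file at $Path appears to begin
# with an HTML doctype or <html> tag, which suggests the server sent a page
# instead of the requested text file.
function Test-LooksLikeHtml {
    param([Parameter(Mandatory)] [string] $Path)
    # Read only the first few lines; that is enough to spot the doctype.
    $head = (Get-Content -LiteralPath $Path -TotalCount 5) -join "`n"
    return $head -match '(?i)^\s*<(!doctype\s+html|html)\b'
}

# Example call (commented out; the path is the one used in the question):
# if (Test-LooksLikeHtml -Path "D:\CheckStatus.txt") {
#     Write-Warning "Got an HTML page; try again with a -UserAgent value."
# }
```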