Is it possible to prevent a Net::HTTP request from sending some headers?

Time: 2018-06-21 14:49:56

Tags: ruby web-scraping http-headers default-value net-http

I have the following code:

require 'uri'
require 'net/http'

URL = 'https://www.scraping-me.com/ok/dude'
uri = URI.parse(URL)
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.set_debug_output($stdout)
request = Net::HTTP::Post.new(uri.path)
request['Content-Length'] = '0'
p http.request(request)

Running it, I get the following output:

opening connection to www.scraping-me.com:443...
opened
starting SSL for www.scraping-me.com:443...
SSL established
<- "POST /ok/dude HTTP/1.1\r\nAccept-Encoding: gzip;q=1.0,deflate;q=0.6,identity;q=0.3\r\nAccept: */*\r\nUser-Agent: Ruby\r\nContent-Length: 0\r\nConnection: close\r\nHost: www.scraping-me.com\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\n"
<- ""
-> "HTTP/1.1 403 Forbidden\r\n"
-> "Server: CloudFront\r\n"
-> "Date: Thu, 21 Jun 2018 13:23:20 GMT\r\n"
-> "Content-Type: text/html\r\n"
-> "Content-Length: 560\r\n"
-> "Connection: close\r\n"
-> "X-Cache: Error from cloudfront\r\n"
-> "Via: 1.1 8c17e8fbe0b8e6fb8aa40ba7a7b911d2.cloudfront.net (CloudFront)\r\n"
-> "X-Amz-Cf-Id: vQZZvIl543OJSbMxkmMWSrpsUFYaZJ6f2VOiAI_CQd8jTAteK8X73Q==\r\n"
-> "\r\n"
reading 560 bytes...
-> "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\n<HTML><HEAD><META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=iso-8859-1\">\n<TITLE>ERROR: The request could not be satisfied</TITLE>\n</HEAD><BODY>\n<H1>403 ERROR</H1>\n<H2>The request could not be satisfied.</H2>\n<HR noshade size=\"1px\">\nRequest blocked.\n\n<BR clear=\"all\">\n<HR noshade size=\"1px\">\n<PRE>\nGenerated by cloudfront (CloudFront)\nRequest ID: vQZZvIl543OJSbMxkmMWSrpsUFYaZJ6f2VOiAI_CQd8jTAteK8X73Q==\n</PRE>\n<ADDRESS>\n</ADDRESS>\n</BODY></HTML>"
read 560 bytes
Conn close
#<Net::HTTPForbidden 403 Forbidden readbody=true>

Is it possible to prevent Net::HTTP from sending some headers, such as User-Agent?

Setting it to an empty string still sends it. The same thing happens when I set it to nil.

1 answer:

Answer 0 (score: 0)

Based on Configure Response Headers, you have to add the following before the p http.request(request) line:

request.instance_eval { @header.delete('user-agent') }
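Net::HTTP::Post includes the Net::HTTPHeader module, which also has a public delete method, so the same effect can probably be had without reaching into @header via instance_eval. A minimal sketch under that assumption, reusing the path from the question; nothing is sent over the network:

```ruby
require 'net/http'

# Build the request as in the question; nothing is sent over the network here.
request = Net::HTTP::Post.new('/ok/dude')
request['Content-Length'] = '0'

# The default User-Agent ("Ruby") is filled in when the request object is created.
puts request['User-Agent']

# Net::HTTPHeader#delete removes a header field by case-insensitive name.
request.delete('User-Agent')

# The field is now gone, so it will not be written when the request is sent.
puts request.key?('user-agent')
```

This only helps for headers that are set when the request object is built (User-Agent, Accept, Accept-Encoding). Judging by the net/http source, Host, and Connection: close when the connection was not started explicitly, are filled in with ||= at send time, so deleting those beforehand likely has no lasting effect.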

Or you can get @header to view or edit it with:

request.instance_variable_get(:@header)
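If you only need to look at the headers, the public to_hash and each_header methods of Net::HTTPHeader appear to expose the same information as reading @header directly. A sketch, again building the request without sending it:

```ruby
require 'net/http'

request = Net::HTTP::Post.new('/ok/dude')

# to_hash returns a shallow copy of the header table:
# lowercase field names mapped to arrays of values.
headers = request.to_hash
p headers

# each_header yields each lowercase name with its values joined by ", ".
request.each_header { |name, value| puts "#{name}: #{value}" }

# Removing a key from the copy does not change the request itself.
headers.delete('user-agent')
puts request['User-Agent']
```

Because to_hash returns a copy, edits should go through the request object (request.delete('Name') or request['Name'] = value) rather than through that hash.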