I need to somehow extract the plain HTTP request message from a Scrapy Request object (e.g., so that I could copy/paste the request and replay it from Burp). So given a scrapy.http.Request object, I want to obtain the corresponding request message, for example:
POST /test/demo_form.php HTTP/1.1
Host: w3schools.com

name1=value1&name2=value2
Obviously, I have all the information I need in the Request object, but trying to reconstruct the message manually would be error-prone, since I could miss some edge cases. My understanding is that Scrapy first converts this Request into a Twisted object, which then writes the headers and body to a TCP transport. So maybe I can do something similar, but write to a string instead?
UPDATE
I can get the HTTP 1.0 request message with the code below, which is based on http.py. Is there a way to do something similar for HTTP 1.1 requests / http11.py, which is what is actually being used? I would obviously like to avoid duplicating code from the Scrapy / Twisted frameworks as much as possible.
from scrapy.core.downloader import webclient
from twisted.test.proto_helpers import StringTransport

factory = webclient.ScrapyHTTPClientFactory(request)
transport = StringTransport()
protocol = webclient.ScrapyHTTPPageGetter()
protocol.factory = factory
protocol.makeConnection(transport)
request_message = transport.value()
print(request_message.decode("utf-8"))
Answer (score: 1)
Since Scrapy is open source and has plenty of extension points, this should be doable.
The request is ultimately assembled and sent in ScrapyAgent.download_request (https://github.com/scrapy/scrapy/blob/master/scrapy/core/downloader/handlers/http11.py#L270).
If you place a hook there, you can dump the request method, the request headers, and the request body.
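To illustrate what such a hook could produce, here is a minimal sketch of a helper (hypothetical, not part of Scrapy) that serializes a method, URL, headers, and body into a raw HTTP/1.1 message. It deliberately ignores edge cases such as duplicate headers, non-ASCII header values, and chunked bodies — exactly the details the question wants to delegate to the framework — so treat it only as a demonstration of what the hook can capture:

```python
from urllib.parse import urlparse

def to_raw_http11(method, url, headers, body=b""):
    """Serialize request parts into a raw HTTP/1.1 message (sketch).

    Assumes str header names/values and a bytes body; does not handle
    duplicate headers, chunked encoding, or non-ASCII values.
    """
    parts = urlparse(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [f"{method} {path} HTTP/1.1", f"Host: {parts.netloc}"]
    for name, value in headers.items():
        lines.append(f"{name}: {value}")
    # Blank line separates the header section from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

raw = to_raw_http11(
    "POST",
    "http://w3schools.com/test/demo_form.php",
    {"Content-Type": "application/x-www-form-urlencoded"},
    b"name1=value1&name2=value2",
)
print(raw.decode("ascii"))
```

The output can be pasted into Burp as-is; a real hook would instead pull `request.method`, `request.headers`, and `request.body` from the Scrapy Request before it is handed to the agent.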
To get your code in there, you can try monkey-patching ScrapyAgent, or subclassing HTTP11DownloadHandler and ScrapyAgent to perform the request logging, and then setting the subclassed HTTP11DownloadHandler as the new DOWNLOAD_HANDLER for http/https requests in your project's settings.py (for details see: https://doc.scrapy.org/en/latest/topics/settings.html#download-handlers)
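The settings.py wiring could look like the sketch below; the module path and class name `myproject.handlers.LoggingDownloadHandler` are placeholders for wherever you put your HTTP11DownloadHandler subclass:

```python
# settings.py (sketch) — route http/https downloads through a custom
# handler. "myproject.handlers.LoggingDownloadHandler" is a placeholder
# for a HTTP11DownloadHandler subclass that logs the request (method,
# headers, body) before delegating to super().download_request(...).
DOWNLOAD_HANDLERS = {
    "http": "myproject.handlers.LoggingDownloadHandler",
    "https": "myproject.handlers.LoggingDownloadHandler",
}
```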
In my opinion, this is as close as you can get to logging the requests without using a packet sniffer or a logging proxy (which might be overkill for your scenario).