From Scrapy

Time: 2017-04-22 23:20:28

Tags: python scrapy twisted

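I need to somehow extract the plain HTTP request message from a Request object in Scrapy (so that I could, for example, copy/paste this request and run it from Burp).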

So, given a scrapy.http.Request object, I would like to get the corresponding request message, for example:

POST /test/demo_form.php HTTP/1.1
Host: w3schools.com

name1=value1&name2=value2

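Obviously I have all the information I need in the Request object, but reconstructing the message by hand is error-prone, because I might miss some edge cases. My understanding is that Scrapy first converts this Request into a Twisted object and then writes the headers and body to a TCP transport. So perhaps I could do something similar, but write to a string instead?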

Update

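I can get the HTTP 1.0 request message with the following code, which is based on http.py. Is there a way to do something similar for HTTP 1.1 requests / http11.py, which is what is actually used? I obviously want to avoid duplicating code from the Scrapy/Twisted frameworks as much as possible.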

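from scrapy import Request
from scrapy.core.downloader import webclient
from twisted.test.proto_helpers import StringTransport  # twisted.internet.testing in newer Twisted

request = Request("http://w3schools.com/test/demo_form.php", method="POST", body="name1=value1&name2=value2")
factory = webclient.ScrapyHTTPClientFactory(request)
transport = StringTransport()  # in-memory transport that records everything written to it
protocol = webclient.ScrapyHTTPPageGetter()
protocol.factory = factory
protocol.makeConnection(transport)  # connectionMade() writes the request line, headers and body
request_message = transport.value()
print(request_message.decode("utf-8"))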

1 answer:

Answer 0 (score: 1)

Since Scrapy is open source and has plenty of extension points, this should be doable.

The request is ultimately assembled and sent in ScrapyAgent.download_request, in scrapy/core/downloader/handlers/http11.py: https://github.com/scrapy/scrapy/blob/master/scrapy/core/downloader/handlers/http11.py#L270

If you put a hook there, you can dump the request method, the request headers and the request body.
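As a rough illustration of what such a hook might dump, here is a minimal sketch of a formatter. It assumes only the attributes a scrapy.http.Request exposes (method and url as str, headers as a mapping of bytes names to lists of bytes values, body as bytes); the function name format_raw_request is hypothetical, and a SimpleNamespace stands in for a real Request so the snippet runs without Scrapy. It fills in Host and Content-Length the way an HTTP/1.1 client typically would, but the headers Twisted actually writes may differ, so treat the output as an approximation rather than the exact bytes on the wire.

```python
from types import SimpleNamespace
from urllib.parse import urlsplit


def format_raw_request(request) -> str:
    """Render an approximate HTTP/1.1 request message from a Scrapy-like request.

    Hypothetical helper: assumes request.method (str), request.url (str),
    request.body (bytes) and request.headers mapping bytes names to lists of
    bytes values, as Scrapy's Headers object does.
    """
    parts = urlsplit(request.url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [f"{request.method} {target} HTTP/1.1"]
    names = {name.lower() for name in request.headers}
    # Approximation: add Host from the URL if the request does not set it.
    if b"host" not in names:
        lines.append(f"Host: {parts.netloc}")
    # One line per header value; latin-1 round-trips arbitrary bytes.
    for name, values in request.headers.items():
        for value in values:
            lines.append(f"{name.decode('latin-1')}: {value.decode('latin-1')}")
    # Approximation: announce the body length if the request has a body.
    if request.body and b"content-length" not in names:
        lines.append(f"Content-Length: {len(request.body)}")
    # Headers end with an empty line, followed by the raw body.
    return "\r\n".join(lines) + "\r\n\r\n" + request.body.decode("latin-1")


# Stand-in for scrapy.http.Request, mirroring the message from the question.
req = SimpleNamespace(
    method="POST",
    url="http://w3schools.com/test/demo_form.php",
    headers={b"Content-Type": [b"application/x-www-form-urlencoded"]},
    body=b"name1=value1&name2=value2",
)
print(format_raw_request(req))
```

In a monkey-patched ScrapyAgent.download_request you would call such a function on the incoming request and then delegate to the original method, so the download itself is unchanged.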

To get your code in there, you can try monkey-patching ScrapyAgent, or subclassing HTTP11DownloadHandler so that it does the request logging and uses your own Scrapy agent, and then registering that subclass as the new download handler for http/https requests via DOWNLOAD_HANDLERS in your project's settings.py (see: https://doc.scrapy.org/en/latest/topics/settings.html#download-handlers).
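For the settings part, a minimal settings.py fragment might look like the following. DOWNLOAD_HANDLERS maps URL schemes to handler class paths and is merged with Scrapy's built-in DOWNLOAD_HANDLERS_BASE; the dotted path myproject.handlers.LoggingDownloadHandler is hypothetical and stands for your own HTTP11DownloadHandler subclass.

```python
# settings.py (fragment)
# "myproject.handlers.LoggingDownloadHandler" is a hypothetical path to your own
# HTTP11DownloadHandler subclass; adjust it to your project layout.
DOWNLOAD_HANDLERS = {
    "http": "myproject.handlers.LoggingDownloadHandler",
    "https": "myproject.handlers.LoggingDownloadHandler",
}
```

Registering the same class for both schemes keeps the logging consistent whether or not the target uses TLS.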

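In my opinion, this is the closest you will get to logging requests without using a packet sniffer or a logging proxy (which may be a bit overkill for your scenario).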