Question

I'm trying to create a Scrapy spider that follows links and finds other links, so that when I run it again it finds the links on those sites as well. Here's my current code.

At the bottom of the code, you'll see this:

        print(parents)
        print(fust)
        thefile.write(parents)
        thefile.write("~")
        thefile.write(str(response.xpath('//a/@href').extract()[counta]))
        thefile.write("\n")

When I run the spider a second time, after it has collected links, it prints the parents variable, for example:

http://roblox.com~http://en.help.roblox.com/

...but on that second run, while scraping http://en.help.roblox.com/, it prints

http://roblox.com~http://en.help.roblox.com/

...as the parents variable just before thefile.write(parents), yet when thefile.write(parents) actually runs, it seems to write

~https://en.help.roblox.com/hc/en-us/articles/115004630823-Roblox-Privacy-and-Cookie-Policy

...and the parents variable is not written to the file. How can I fix this? Thanks.
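One thing that may be worth checking (a guess, since the full spider isn't shown): if parents is read back from the file written by the previous run, it could still carry a trailing newline. In that case the parent is written, but the "~" and the link land on the next line, which would look like the output above. The sketch below reproduces that under this assumption; write_link_line is a hypothetical helper, not part of Scrapy, and the in-memory buffers stand in for thefile.

```python
import io


def write_link_line(out, parent, child):
    """Write one 'parent~child' record (hypothetical helper, not a Scrapy API)."""
    # strip() drops a trailing "\n" or "\r" that a value read from a file may carry
    out.write(f"{parent.strip()}~{child.strip()}\n")


# Assumption: the parent was read from the previous run's output and kept its newline
parent = "http://roblox.com~http://en.help.roblox.com/\n"
child = "https://en.help.roblox.com/hc/en-us/articles/115004630823-Roblox-Privacy-and-Cookie-Policy"

# repr() makes hidden whitespace visible, unlike a plain print()
print(repr(parent))

# Same write sequence as in the question, using an in-memory buffer
raw = io.StringIO()
raw.write(parent)
raw.write("~")
raw.write(child)
raw.write("\n")

# Same record with the whitespace stripped first
clean = io.StringIO()
write_link_line(clean, parent, child)

print(raw.getvalue())
print(clean.getvalue())
```

If print(repr(parents)) in the real spider shows a "\n" or "\r" at the end, stripping it before writing should keep each record on one line.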

Variable prints but is not written to file (Scrapy)

0 answers: