Question

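I have a list of URLs and want to scrape the location object of each page. The data I mean is what you get by typing "window.location" into the browser's console. For example, doing this in Chrome on www.github.com gives you the following output: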

Location {assign: function, replace: function, reload: function, ancestorOrigins: DOMStringList, origin: "https://github.com"...}

Expanding it shows more information:

Location {
    ancestorOrigins: DOMStringList 
    assign: function () { [native code] } 
    hash: "" 
    host: "github.com" 
    hostname: "github.com" 
    href: "https://github.com/" 
    origin: "https://github.com" 
    pathname: "/" 
    port: "" 
    protocol: "https:" 
    reload: function () { [native code] } 
    replace: function () { [native code] } 
    search: "" 
    toString: function toString() { [native code] } 
    valueOf: function valueOf() { [native code] } 
    __proto__: Location  
}

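I have used Python and the Mechanize library in the past, but I have never needed this functionality until now, and I don't know how to proceed. Any suggestions would be welcome.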

Answer 1

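As I understand it, you want to execute a JavaScript call on the desired web page. My suggestion is to use a headless browser. I did something similar with a framework called PyQt4. You could also use another headless browser such as PhantomJS, or you might be interested in a tool called Selenium.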
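
As a rough illustration of the Selenium route, here is a minimal sketch rather than a definitive implementation. It assumes the `selenium` package and a Chrome installation are available. The names `LOCATION_JS`, `scrape_locations`, and `make_driver` are made up for this example. The script copies the string fields of `window.location` into a plain object, since that is simpler to hand back to Python than the Location object itself.

```python
# JavaScript that copies the string-valued fields of window.location
# into a plain object, which the driver can return as a Python dict.
LOCATION_JS = """
var loc = window.location;
return {
    hash: loc.hash, host: loc.host, hostname: loc.hostname,
    href: loc.href, origin: loc.origin, pathname: loc.pathname,
    port: loc.port, protocol: loc.protocol, search: loc.search
};
"""


def scrape_locations(driver, urls):
    """Visit each URL and return a dict mapping URL -> location fields.

    `driver` is any object with `get(url)` and `execute_script(js)`,
    such as a Selenium WebDriver instance.
    """
    results = {}
    for url in urls:
        driver.get(url)  # load the page (redirects are followed)
        results[url] = driver.execute_script(LOCATION_JS)
    return results


def make_driver():
    """Hypothetical helper: start headless Chrome via Selenium."""
    from selenium import webdriver  # imported lazily; needs selenium installed

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```

Under these assumptions, you would call `scrape_locations(make_driver(), ["https://github.com/"])` and then call `quit()` on the driver when finished. Because the values are read after the page loads, `href` reflects any redirect the site performed, which is something parsing the original URL alone would not tell you.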
