Scraping nested properties with slimit - Python

Time: 2017-10-11 00:35:44

Tags: python

I am trying to get a URL from a website that has a nested structure like this:

<script>
    var.model = {
       data: {
           "alist": [{
                'url': 'http://www.here.org'
           }]
       }
    }
</script>

The problem is that I cannot find a way through this tree. I have experience with BeautifulSoup4 but not with slimit, which I am trying to learn now.

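import requests
from bs4 import BeautifulSoup, NavigableString
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

def grabLinks(base, limit):
    base_url = requests.get(base)
    soup = BeautifulSoup(base_url.content, "html.parser")
    # Only inline scripts: skip <script> tags that have a src attribute
    for script in soup.find_all("script", {'src': False}):
        if isinstance(script, NavigableString): continue
        tree = Parser().parse(script.text)
        for node in nodevisitor.visit(tree):
            # Each "key: value" pair in an object literal is an ast.Assign node
            if isinstance(node, ast.Assign) and getattr(node.left, 'value', '') == "data":
                print(getattr(node.right, 'properties'))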

I can get as far as the name "alist" (with a lot of getattr calls), but I cannot reach the dictionary of values inside it. Let me know if you need more information.

Edit: fixed! Updated code:

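import requests
from bs4 import BeautifulSoup, NavigableString
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

def grabLinks(base, limit):
    base_url = requests.get(base)
    soup = BeautifulSoup(base_url.content, "html.parser")
    for script in soup.find_all("script", {'src': False}):
        if isinstance(script, NavigableString): continue
        tree = Parser().parse(script.text)
        for node in nodevisitor.visit(tree):
            if isinstance(node, ast.Assign) and getattr(node.left, 'value', '') == "data":
                # First property of data is the "alist" pair
                sibnode = getattr(node.right, 'properties')[0]
                # alist -> first array item -> its first property ('url') -> value
                print(sibnode.right.items[0].properties[0].right.value)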
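The chain of [0] indexes above depends on 'url' being the first property of the first array item. A possibly less brittle variant matches on the key name instead, since the visitor loop already reaches every nested pair. The following is only a sketch: values_for_key is a hypothetical helper, not part of slimit, and the SimpleNamespace objects are stand-ins shaped like the nodes used above so that the example runs without slimit installed. It assumes quoted keys and string values may keep their quote characters, so it strips them either way.

```python
from types import SimpleNamespace


def values_for_key(nodes, key):
    """Collect the value of every `key: value` pair found in `nodes`.

    `nodes` is any iterable of AST nodes, for example the result of
    nodevisitor.visit(tree). Matching is duck-typed on the .left,
    .right and .value attributes that the code above already uses.
    """
    found = []
    for node in nodes:
        name = getattr(getattr(node, "left", None), "value", None)
        value = getattr(getattr(node, "right", None), "value", None)
        # Skip nodes that are not simple "name: string" pairs.
        if not isinstance(name, str) or not isinstance(value, str):
            continue
        # Assumption: quotes may be kept in the raw value, so strip them.
        if name.strip("'\"") == key:
            found.append(value.strip("'\""))
    return found


# Stand-in nodes (hypothetical, for illustration only) shaped like the
# pairs in the snippet: "alist" holds an array, 'url' holds a string.
url_pair = SimpleNamespace(
    left=SimpleNamespace(value="'url'"),
    right=SimpleNamespace(value="'http://www.here.org'"),
)
alist_pair = SimpleNamespace(
    left=SimpleNamespace(value='"alist"'),
    right=SimpleNamespace(items=[]),  # an array has no .value of its own
)

print(values_for_key([alist_pair, url_pair], "url"))
```

Inside grabLinks the call would be values_for_key(nodevisitor.visit(tree), "url"). Because the check is duck-typed, any other node type with matching .left and .right values would also be picked up, so an isinstance(node, ast.Assign) filter can be added if that matters.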

0 answers:

No answers