Question

I am trying to scrape this website:

http://www.gramfeed.com/instagram/tags#Andorra

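I am trying to get all the data from the posts. This is what I have tried, but unfortunately posts does not end up holding the list of all posts. Any idea what I am doing wrong? Thanks!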

from scrapy import Spider
from scrapy.selector import Selector


class GramfeedSpider(Spider):
    name = "gramfeed"
    allowed_domains = ["gramfeed.com"]
    start_urls = ["http://www.gramfeed.com/instagram/tags#Andorra"]

    def parse(self, response):
        """
        The lines below are a spider contract. For more info see:
        http://doc.scrapy.org/en/latest/topics/contracts.html

        @url http://www.gramfeed.com/instagram/tags#Andorra
        @scrapes name
        """
        sel = Selector(response)
        # Select every direct child div of the content container.
        posts = sel.xpath('//div[@id="content"]/div')
        # posts = sel.xpath('//div[@id="content"]/div[@class="grid-cell"]')
        # posts = sel.xpath('//div[@id="content"]/div[@onclick="showPhoto(0)"]')
        # Debug output: dump whatever the selector matched.
        print("@@@@@@")
        print(posts)
        print("@@@@@@")

Answer 1

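This is a highly dynamic web page: the results are loaded asynchronously, so you need a JavaScript engine to execute the JavaScript on this page. Scrapy on its own only downloads the initial HTML, which does not yet contain the posts. You should look into whether the scrapy-splash middleware or Selenium can solve this.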
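One detail that supports this: the tag in your start URL sits in the fragment (the part after #), and a fragment is never sent to the server in an HTTP request. A minimal Python 3 sketch using only the standard library shows what a plain HTTP client actually asks for. The variable names are mine, and I am assuming the page's script reads the fragment in the browser and then fetches the posts, which I have not verified against the site's code.

```python
from urllib.parse import urldefrag

start_url = "http://www.gramfeed.com/instagram/tags#Andorra"

# Split the URL into the part that goes into the HTTP request
# and the fragment, which stays on the client side.
requested_url, fragment = urldefrag(start_url)

print(requested_url)  # http://www.gramfeed.com/instagram/tags
print(fragment)       # Andorra
```

So the server sees the same request no matter which tag you put after the #, and the response cannot already contain the Andorra posts. A real browser, or a rendering service such as Splash, keeps the fragment available to the page's JavaScript, which is why one of those tools is needed here.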
