Scrapy is not getting all the data

Time: 2016-02-08 19:50:46

Tags: xpath syntax web-scraping scrapy

I am trying to scrape this page:

http://binpar.caicyt.gov.ar/cgi-bin/koha/opac-detail.pl?biblionumber=98723

using this code:

getline

But I only get the title of the publication, and I'm not sure why.

Cheers!

1 answer:

Answer 0 (score: 0)

You should select the inner fields by their property attribute, which the page uses to label the author, title, and ISSN:

$ scrapy shell 'http://binpar.caicyt.gov.ar/cgi-bin/koha/opac-detail.pl?biblionumber=98723'
>>> for publication in response.css('div#wrap > div.main > div.container-fluid > div.row-fluid > div.span9 > div#catalogue_detail_biblio > div.record'):
...     author = publication.css("span[property=contributor] span[property=name]::text").extract_first()
...     title = publication.css("h1[property=name]::text").extract_first()
...     issn = publication.css("span[property=issn]::text").extract_first()
...     print(author, title, issn)
... 
(u'Asociaci\xf3n Filat\xe9lica de la Rep\xfablica Argentina', u'AFRA, bolet\xedn informativo. ', u'0001-1193.')
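
To see why matching on the property attribute picks up each field, here is a minimal offline sketch. It is not Scrapy code: it uses the standard library's html.parser as a stand-in so it runs without Scrapy or network access. SAMPLE_HTML, PropertyTextParser, and extract_fields are hypothetical names, and the markup is an assumption modeled on the selectors above, not a copy of the real page.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup modeled on the selectors in the answer.
# The real page's HTML is an assumption here.
SAMPLE_HTML = """
<div class="record">
  <h1 property="name">AFRA, boletín informativo. </h1>
  <span property="contributor">
    <span property="name">Asociación Filatélica de la República Argentina</span>
  </span>
  <span property="issn">0001-1193.</span>
</div>
"""

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PropertyTextParser(HTMLParser):
    """Collect the direct text of every element carrying a property attribute."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open element: its property value or None
        self.fields = {}  # property path -> first non-empty text found

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only keep text that sits directly inside a labeled element.
        if not text or not self.stack or self.stack[-1] is None:
            return
        # Join the enclosing property values, e.g. "contributor/name".
        path = "/".join(p for p in self.stack if p is not None)
        self.fields.setdefault(path, text)


def extract_fields(html):
    """Return a dict mapping each property path to its text."""
    parser = PropertyTextParser()
    parser.feed(html)
    parser.close()
    return parser.fields


fields = extract_fields(SAMPLE_HTML)
print(fields["contributor/name"], fields["name"], fields["issn"])
```

The nested path "contributor/name" plays the same role as the descendant selector span[property=contributor] span[property=name] in the shell session: the author name and the title both use property=name, so the enclosing element is what tells them apart.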