I'm trying to scrape this page:
http://binpar.caicyt.gov.ar/cgi-bin/koha/opac-detail.pl?biblionumber=98723
using this code:
getline
But I only get the title of the publication, and I'm not sure why.
Cheers!
Answer:
You should select the inner fields by their property attribute:
$ scrapy shell http://binpar.caicyt.gov.ar/cgi-bin/koha/opac-detail.pl?biblionumber=98723
>>> for publication in response.css('div#wrap > div.main > div.container-fluid > div.row-fluid > div.span9 > div#catalogue_detail_biblio > div.record'):
...     author = publication.css("span[property=contributor] span[property=name]::text").extract_first()
...     title = publication.css("h1[property=name]::text").extract_first()
...     issn = publication.css("span[property=issn]::text").extract_first()
...     print(author, title, issn)
...
(u'Asociaci\xf3n Filat\xe9lica de la Rep\xfablica Argentina', u'AFRA, bolet\xedn informativo. ', u'0001-1193.')