Is there a way to include OR / AND in a CSS selector for web scraping?

Time: 2018-05-02 13:32:54

Tags: scrapy css-selectors scrapy-spider

What I am trying to do is scrape a website that has changed its structure, and get the <p> that belongs to either an H2 or an H3.

Currently I can do this with either H2 or H3 on its own, but it seems to produce some errors when exporting to .csv. This is what I am doing:

.contains(RESEARCHER)

Is there a way to combine them into a single expression?

1 answer:

Answer 0: (score: 2)

No. The best you can do is put both selectors in the same string, separated by a comma:

response.css(".field-item.even h2:contains(RESEARCHER) + p ::text, .field-item.even h3:contains(RESEARCHER) + p ::text").extract()
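The comma acts as the OR of CSS (a selector list), while chaining simple selectors with no space, as in .field-item.even, acts as AND. Below is a minimal sketch of building that comma-separated string for several heading tags and running it outside a spider. The helper names build_or_selector and extract_paragraphs and the sample HTML are hypothetical, made up for illustration. It assumes parsel, the selector library behind Scrapy's response.css(), is installed. Note that :contains() is a non-standard pseudo-class; it works here because Scrapy's CSS translation supports it.

```python
def build_or_selector(prefix, tags, keyword, suffix="+ p ::text"):
    """Build one selector per heading tag and join them with commas (CSS "OR")."""
    return ", ".join(
        f"{prefix} {tag}:contains({keyword}) {suffix}" for tag in tags
    )


def extract_paragraphs(html, selector):
    """Run a CSS selector against raw HTML, as response.css(selector) would."""
    # Imported here so build_or_selector() stays usable without parsel installed.
    from parsel import Selector

    return Selector(text=html).css(selector).getall()


# Hypothetical markup: one block uses <h2>, the other uses <h3>.
SAMPLE_HTML = """
<div class="field-item even"><h2>RESEARCHER</h2><p>Alice</p></div>
<div class="field-item even"><h3>RESEARCHER</h3><p>Bob</p></div>
"""

# Same string as in the answer above, generated instead of typed by hand.
SELECTOR = build_or_selector(".field-item.even", ["h2", "h3"], "RESEARCHER")
```

Calling extract_paragraphs(SAMPLE_HTML, SELECTOR) should then return the text of the paragraphs that follow either kind of heading. Inside a spider, the equivalent call would be response.css(SELECTOR).getall().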