Question

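I am scraping a website with Scrapy. In a list of divs, some are set to display:none and others to display:block. I only want data from the divs that are displayed, but I cannot get the style attribute from the divs. I also tried a solution I found on Stack Overflow, namely: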

response.xpath("//div").xpath("@style").extract()

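This gives me an empty list as output; it does not return the style attribute. Alternatively, can I get the raw HTML with Scrapy and then use Beautiful Soup to read the div's style attribute? Getting the raw HTML as a string would also help. All I want is the style attribute.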

Sample HTML:

<div class="asd">div content need to extract</div>

<div class="asd" style="display:none">no need to extract</div>

Answer 1

Based on your sample HTML, this solution may be useful (using BeautifulSoup):

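from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'html.parser')  # response: the Scrapy response object
divs = soup.select('div.asd')
theDiv = [div for div in divs if 'style' not in div.attrs]  # keep divs with no inline style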
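
The list comprehension above treats any div with a style attribute as hidden. If some visible divs carry an inline style such as display:block, a check on the style value may be closer to what the question asks for. Here is a minimal sketch under that assumption; the third div and the helper name is_hidden are made up for illustration and are not part of the original sample.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, plus one hypothetical div
# with an explicit display:block (added for illustration only).
HTML = """
<div class="asd">div content need to extract</div>
<div class="asd" style="display:none">no need to extract</div>
<div class="asd" style="display: block">also visible</div>
"""


def is_hidden(div):
    """Return True if the div's inline style sets display to none."""
    style = div.get("style", "")
    # Strip whitespace and lowercase so "display: none" and "DISPLAY:NONE" both match.
    compact = "".join(style.split()).lower()
    return "display:none" in compact


soup = BeautifulSoup(HTML, "html.parser")
visible_texts = [
    div.get_text() for div in soup.select("div.asd") if not is_hidden(div)
]
print(visible_texts)
```

This only looks at inline style attributes. Divs hidden through a stylesheet or by JavaScript would not be detected this way.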

Answer 2

I think your XPath is off. Try this:

response.xpath("//div/@style").extract()

Or:

response.xpath("//div").xpath("./@style").extract()
# notice the relative path (./) here^^