How do I select data in Scrapy from an HTML element that has both a class and an id?

Posted: 2017-03-04 15:43:59

Tags: xpath web-scraping scrapy scrapy-spider



<div class="section-body" id="section-2">
  <p>Most people with aortic stenosis do not develop symptoms until the disease is advanced. The diagnosis may have been made when the health care provider heard a heart murmur and performed tests.</p>
  <p>Symptoms of aortic stenosis include:</p>
  <ul>
    <li>Chest discomfort: The chest pain may get worse with activity and reach into the arm, neck, or jaw. The chest may also feel tight or squeezed.</li>
    <li>Cough, possibly bloody.</li>
    <li>Breathing problems when exercising.</li>
    <li>Becoming easily tired.</li>
    <li>Feeling the heartbeat (palpitations).</li>
    <li>Fainting, weakness, or dizziness with activity.</li>
  </ul>
  <p>In infants and children, symptoms include:</p>
  <ul>
    <li>Becoming easily tired with exertion (in mild cases)</li>
    <li>Failure to gain weight</li>
    <li>Poor feeding</li>
    <li>Serious breathing problems that develop within days or weeks of birth (in severe cases)</li>
  </ul>
  <p>Children with mild or moderate aortic stenosis may get worse as they get older. They are also at risk for a heart infection called bacterial endocarditis.</p>
</div></div></section>

I have the HTML above and I want to scrape the data in the list, i.e. in... I tried the following commands in Scrapy, but they didn't work: they return '[]' as output.

response.css("article div.section-body p").extract()  # returns everything under every section-body, but I only want section-2
response.css("article div.section-body.section-2 p::text").extract()
response.xpath("//article/*[contains(@id, 'setion-2')]").extract()

Please help me extract it. Thanks.

1 Answer:

Answer 0 (score: 0)

Try

response.css("article div.section-body#section-2 p::text").extract()

div.section-body#section-2 means: select the div that has both the class section-body and the ID section-2.

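Note that # selects by ID, while . selects by class. So the CSS selector posted in your question is wrong: div.section-body.section-2 looks for a div with two classes, section-body and section-2, and no element in your HTML has a section-2 class.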