Question


<div class="section-body" id="section-2"><p>Most people with aortic stenosis do not develop symptoms until the disease is advanced. The diagnosis may have been made when the health care provider heard a heart murmur and performed tests.</p><p>Symptoms of aortic stenosis include:</p><ul><li>Chest discomfort: The chest pain may get worse with activity and reach into the arm, neck, or jaw. The chest may also feel tight or squeezed.</li><li>Cough, possibly bloody.</li><li>Breathing problems when exercising.</li><li>Becoming easily tired.</li><li>Feeling the heartbeat (palpitations).</li><li>Fainting, weakness, or dizziness with activity.</li></ul><p>In infants and children, symptoms include:</p><ul><li>Becoming easily tired with exertion (in mild cases)</li><li>Failure to gain weight</li><li>Poor feeding</li><li>Serious breathing problems that develop within days or weeks of birth (in severe cases)</li></ul><p>Children with mild or moderate aortic stenosis may get worse as they get older. They are also at risk for a heart infection called bacterial endocarditis.</p></div></div></section>


I have the markup above and want to scrape the data in the lists, i.e. in ... I tried the following commands in Scrapy, but they did not work; they return '[]' as output.

 response.css("article div.section-body p").extract() <-- this is giving all info under section body but I want only under section-2
  response.css("article div.section-body.section-2 p::text").extract()
 response.xpath("//article/*[contains(@id, 'setion-2')]").extract()

Please help me extract it. Thanks.

Answer 1

Try:

response.css("article div.section-body#section-2 p::text").extract()

div.section-body#section-2 selects the div that has both the class section-body and the ID section-2.

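Note that # selects by ID, while . selects by class, so the CSS selector posted in your question is wrong: .section-2 looks for a class named section-2, but section-2 is the element's id, so nothing matches. The XPath attempt also misspells the id as 'setion-2'.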

How do I select data in Scrapy from an HTML file using both a class and an id?
