Web scraping: getting the text of a <p> that contains nested <p> and script tags

Date: 2019-01-06 16:31:47

Tags: python html web-scraping

Here is my sample:

  


<p>
    According to news reports, the brokerage firm has cut FY20-21 earnings per share (EPS) estimates by 9-38 per cent factoring in lower commodity prices and stronger rupee.

<br/>

<script type="text/javascript">document.write("<!--");if(isUserBanner=="free"&&(displayConBanner==1))document.write("-->");</script><script>googletag.cmd.push(function(){googletag.defineOutOfPageSlot('/6516239/outofpage_1x1_desktop','div-gpt-ad-1490771277198-0').addService(googletag.pubads());googletag.pubads().enableSyncRendering();googletag.enableServices();});</script>
<div class="article-middle-banner" id="div-gpt-ad-1490771277198-0">
<script>googletag.cmd.push(function(){googletag.display('div-gpt-ad-1490771277198-0');});
</script>
</div>
<script>var banHeight=$(".article-middle-banner iframe").height();if(banHeight<=1){$(".article-middle-banner").height(0);$(".article-middle-banner").next().next().remove();}</script><!-- INPAGE_BANNER --><script>displayConBanner=1;
</script>
<br/>

Chinese domestic spot HRC (hard-rolled coil) prices and export HRC prices have both declined by nearly 15 per cent over the last three months to $542/ton and $495/ton respectively, impacted by potential disruptions due to the trade war, lower than expected winter production cuts and slowdown in domestic demand, according to a report by Antique Broking. 

<br/>
<br/>

<p>
        World Steel Association (WSA) expects Chinese steel demand to be flat in 2019 in the absence of any major stimulus measures that were seen in H12018 particularly for the real estate sector, the report added.</p>

    CLSA has downgraded Tata Steel to ‘Sell’ from ‘Buy’ and has slashed the target price to Rs 460 from Rs 855, earlier. Similarly, JSW Steel has been downgraded to ‘Sell’ from ‘Underperform'. Also, the target price has been reduced Rs 260 from Rs 375, as per the reports. Hindalco has been downgraded to ‘Sell’ from ‘Underperform’ and the target price has been revised to Rs 210 from Rs 255.

</p>

I just want to get all the text inside these nested tags.

This is what I want to extract:

    According to news reports, the brokerage firm has cut FY20-21 earnings per share (EPS) estimates by 9-38 per cent factoring in lower commodity prices and stronger rupee.


Chinese domestic spot HRC (hard-rolled coil) prices and export HRC prices have both declined by nearly 15 per cent over the last three months to $542/ton and $495/ton respectively, impacted by potential disruptions due to the trade war, lower than expected winter production cuts and slowdown in domestic demand, according to a report by Antique Broking. 


        World Steel Association (WSA) expects Chinese steel demand to be flat in 2019 in the absence of any major stimulus measures that were seen in H12018 particularly for the real estate sector, the report added.

    CLSA has downgraded Tata Steel to ‘Sell’ from ‘Buy’ and has slashed the target price to Rs 460 from Rs 855, earlier. Similarly, JSW Steel has been downgraded to ‘Sell’ from ‘Underperform'. Also, the target price has been reduced Rs 260 from Rs 375, as per the reports. Hindalco has been downgraded to ‘Sell’ from ‘Underperform’ and the target price has been revised to Rs 210 from Rs 255.

1 answer:

Answer 0 (score: 0)

Use .extract() to remove the <div> and <script> tags, together with their contents:

import bs4

html = '''<p>
    According to news reports, the brokerage firm has cut FY20-21 earnings per share (EPS) estimates by 9-38 per cent factoring in lower commodity prices and stronger rupee.

<br/>

<script type="text/javascript">document.write("<!--");if(isUserBanner=="free"&&(displayConBanner==1))document.write("-->");</script><script>googletag.cmd.push(function(){googletag.defineOutOfPageSlot('/6516239/outofpage_1x1_desktop','div-gpt-ad-1490771277198-0').addService(googletag.pubads());googletag.pubads().enableSyncRendering();googletag.enableServices();});</script>
<div class="article-middle-banner" id="div-gpt-ad-1490771277198-0">
<script>googletag.cmd.push(function(){googletag.display('div-gpt-ad-1490771277198-0');});
</script>
</div>
<script>var banHeight=$(".article-middle-banner iframe").height();if(banHeight<=1){$(".article-middle-banner").height(0);$(".article-middle-banner").next().next().remove();}</script><!-- INPAGE_BANNER --><script>displayConBanner=1;
</script>
<br/>

Chinese domestic spot HRC (hard-rolled coil) prices and export HRC prices have both declined by nearly 15 per cent over the last three months to $542/ton and $495/ton respectively, impacted by potential disruptions due to the trade war, lower than expected winter production cuts and slowdown in domestic demand, according to a report by Antique Broking. 

<br/>
<br/>

<p>
        World Steel Association (WSA) expects Chinese steel demand to be flat in 2019 in the absence of any major stimulus measures that were seen in H12018 particularly for the real estate sector, the report added.</p>

    CLSA has downgraded Tata Steel to ‘Sell’ from ‘Buy’ and has slashed the target price to Rs 460 from Rs 855, earlier. Similarly, JSW Steel has been downgraded to ‘Sell’ from ‘Underperform'. Also, the target price has been reduced Rs 260 from Rs 375, as per the reports. Hindalco has been downgraded to ‘Sell’ from ‘Underperform’ and the target price has been revised to Rs 210 from Rs 255.

</p>'''


soup = bs4.BeautifulSoup(html, 'html.parser')

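# find_all('p') returns the outer <p> and the <p> nested inside it
alpha = soup.find_all('p')
for p in alpha:

    # remove every <div> (the ad banner container) from this paragraph
    while p.find('div'):
        p.find('div').extract()

    # remove every remaining <script> tag
    while p.find('script'):
        p.find('script').extract()

    p_text = p.text
    print(p_text)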

Output (the nested <p> appears twice, because find_all('p') also returns it on its own):

    According to news reports, the brokerage firm has cut FY20-21 earnings per share (EPS) estimates by 9-38 per cent factoring in lower commodity prices and stronger rupee.







Chinese domestic spot HRC (hard-rolled coil) prices and export HRC prices have both declined by nearly 15 per cent over the last three months to $542/ton and $495/ton respectively, impacted by potential disruptions due to the trade war, lower than expected winter production cuts and slowdown in domestic demand, according to a report by Antique Broking. 




        World Steel Association (WSA) expects Chinese steel demand to be flat in 2019 in the absence of any major stimulus measures that were seen in H12018 particularly for the real estate sector, the report added.

    CLSA has downgraded Tata Steel to ‘Sell’ from ‘Buy’ and has slashed the target price to Rs 460 from Rs 855, earlier. Similarly, JSW Steel has been downgraded to ‘Sell’ from ‘Underperform'. Also, the target price has been reduced Rs 260 from Rs 375, as per the reports. Hindalco has been downgraded to ‘Sell’ from ‘Underperform’ and the target price has been revised to Rs 210 from Rs 255.



        World Steel Association (WSA) expects Chinese steel demand to be flat in 2019 in the absence of any major stimulus measures that were seen in H12018 particularly for the real estate sector, the report added.
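
If the repeated last paragraph and the runs of blank lines are unwanted, one possible variation is sketched below. It is only an illustration under a few assumptions: the helper name paragraph_text and the shortened sample string are made up for this example, and it relies on html.parser keeping the inner <p> nested inside the outer one, as it does in the output above. It detaches the <script> and <div> tags with .extract(), skips any <p> that sits inside another <p>, and reads stripped_strings so that whitespace-only pieces are dropped.

```python
import bs4


def paragraph_text(html):
    """Hypothetical helper: text chunks of each outermost <p>, without repeats."""
    soup = bs4.BeautifulSoup(html, "html.parser")

    # Detach scripts and the ad container. extract() is harmless when a
    # <script> was already removed together with its parent <div>.
    for tag in soup.find_all(["script", "div"]):
        tag.extract()

    chunks = []
    for p in soup.find_all("p"):
        # A nested <p> is already covered by its outer <p>, so skip it here.
        if p.find_parent("p") is not None:
            continue
        # stripped_strings skips whitespace-only strings and HTML comments.
        for s in p.stripped_strings:
            chunks.append(" ".join(s.split()))
    return chunks


# Shortened stand-in for the HTML in the question (assumed, not the real page).
sample = """<p>
    First paragraph.
<br/>
<script>var x = 1;</script>
<div class="article-middle-banner"><script>var y = 2;</script></div>
<!-- INPAGE_BANNER -->
Second paragraph.
<p>
    Nested paragraph.</p>
    Last paragraph.
</p>"""

for chunk in paragraph_text(sample):
    print(chunk)
```

Note that a different parser (lxml, for example) may close the outer <p> when it meets the inner one, in which case the find_parent check simply never triggers.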