How to scrape specific text with BeautifulSoup in Python

Asked: 2017-09-01 20:56:09

Tags: python web-scraping beautifulsoup

From the HTML below I want to scrape only the text "Vishakhapatnam - Ankapalli [Km 2.837 to &Km; 395.870 to Km358.00(New Chainage From Km 700.544 to Km 740.255)]". How can I scrape it? Please help.

<p><b><lable style="color:#3097b0;"> Aganampudi ( Public Funded ) </lable></b> <br/>Km 728.055 - <b>NH-16 in Andhra Pradesh <br/> Stretch : </b>Vishakhapatnam - Ankapalli [Km 2.837 to &amp;Km; 395.870 to Km358.00(New Chainage From Km 700.544 to Km 740.255)] <br/> <b>Tollable Length :</b> Km 40.707 Km(s) </p>

2 answers:

Answer 0 (score: 1)

Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

from bs4 import BeautifulSoup

a = '<p><b><lable style="color:#3097b0;"> Aganampudi ( Public Funded ) </lable></b> <br/>Km 728.055 - <b>NH-16 in Andhra Pradesh <br/> Stretch : </b>Vishakhapatnam - Ankapalli [Km 2.837 to &amp;Km; 395.870 to Km358.00(New Chainage From Km 700.544 to Km 740.255)] <br/> <b>Tollable Length :</b> Km 40.707 Km(s) </p>'

b = BeautifulSoup(a, 'html.parser')

# .descendants walks every tag and text node in document order;
# item 11 is the text node that follows the "Stretch :" label.
answer = list(b.descendants)[11]

# In an interactive session:
#     list(b.descendants)[11]
#     Out[23]: 'Vishakhapatnam - Ankapalli [Km 2.837 to &Km; 395.870 to Km358.00(New Chainage From Km 700.544 to Km 740.255)] '

Answer 1 (score: 0)

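Since there is no dedicated tag around the text you want, I think your best option is to start at the first <b> and follow nextSibling until you reach it:

element = soup.b
for i in range(5):
    element = element.nextSibling

Now element holds the desired text.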
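Both answers above rely on fixed positions (index 11 among the descendants, or five sibling hops), which would break if the markup changed slightly. A possible alternative, sketched here on the assumption that the wanted text always comes directly after the <b> tag containing the label "Stretch", is to locate that label and take its next sibling. The function name extract_stretch is hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup

html = (
    '<p><b><lable style="color:#3097b0;"> Aganampudi ( Public Funded ) </lable></b> '
    '<br/>Km 728.055 - <b>NH-16 in Andhra Pradesh <br/> Stretch : </b>'
    'Vishakhapatnam - Ankapalli [Km 2.837 to &amp;Km; 395.870 to Km358.00'
    '(New Chainage From Km 700.544 to Km 740.255)] <br/> '
    '<b>Tollable Length :</b> Km 40.707 Km(s) </p>'
)


def extract_stretch(markup):
    """Return the text that follows the <b> label containing 'Stretch'.

    Assumption: the wanted text is the node immediately after that <b> tag.
    Returns None if no such label or sibling is found.
    """
    soup = BeautifulSoup(markup, 'html.parser')
    # Find the first <b> tag whose text mentions "Stretch".
    label = soup.find(lambda tag: tag.name == 'b' and 'Stretch' in tag.get_text())
    if label is None or label.next_sibling is None:
        return None
    # The next sibling is a text node; str() also covers the case of a tag.
    return str(label.next_sibling).strip()


stretch = extract_stretch(html)
print(stretch)
```

Anchoring on the label text instead of a position means extra tags inserted earlier in the paragraph would not shift the result, although it still depends on the label wording staying the same.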