How to scrape a specific word

Posted: 2017-09-04 07:24:49

Tags: python web-scraping beautifulsoup

I only want to scrape the words "Andhra Pradesh", but I'm struggling to write the query for it. Any help would be appreciated.

>>> container.findAll('b')
[<b><lable style="color:#3097b0;"> Aganampudi ( Public Funded ) </lable></b>, <b>NH-16 in Andhra Pradesh <br/> Stretch : </b>, <b>Tollable Length :</b>, <b>Fee Effective Date : </b>, <b>  Due date of toll revision : </b>, <b style="color:Orange"> (With Discounting) </b>, <b> Rest Areas : </b>, <b>Truck Lay byes :</b>, <b>Static Weigh Bridge : </b>, <b> Helpline No. : </b>, <b>Emergency Services :</b>, <b>Nearest Police Station: </b>, <b>Highway Administrator (Project Director): </b>, <b>Project Implementation Unit(PIU)</b>, <b>Regional Office(RO)</b>, <b>Representative of Consultant</b>, <b>Representative of Concessionaire: </b>, <b>Nearest Hospital(s): </b>]
>>> search1 = container.findAll('b')
>>> search1[1]
<b>NH-16 in Andhra Pradesh <br/> Stretch : </b>
>>>

1 Answer:

Answer 0 (score: 1)

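You can extract it with Python's string methods.

Once get_text() strips the tags, the string looks like "NH-16 in Andhra Pradesh  Stretch : ".

I use .index() to find where "in" starts (index 6) and where "Stretch" starts (index 25), then take the slice text[onset + 2:offset]. The + 2 skips over the two characters of "in" itself, and slicing is Python's version of a substring. Let me know if you need clarification.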
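To make the index arithmetic concrete, here is a small sketch that runs the same steps on a hardcoded copy of the cleaned string. The literal is an assumption about what search1[1].get_text() returns for the tag in the question (the removed <br/> leaves two spaces before "Stretch").

```python
# Assumed value of search1[1].get_text() for <b>NH-16 in Andhra Pradesh <br/> Stretch : </b>
text = "NH-16 in Andhra Pradesh  Stretch : "

onset = text.index('in')        # first occurrence of "in" -> 6
offset = text.index('Stretch')  # start of "Stretch" -> 25

raw = text[onset + 2:offset]    # + 2 skips the letters "i" and "n"
state = raw.strip()             # drop the surrounding spaces

print(onset, offset)  # 6 25
print(repr(raw))      # ' Andhra Pradesh  '
print(state)          # Andhra Pradesh
```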

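text = search1[1].get_text()          # "NH-16 in Andhra Pradesh  Stretch : "
onset = text.index('in')              # where "in" starts
offset = text.index('Stretch')        # where "Stretch" starts
name = text[onset + 2:offset].strip() # text between them, whitespace trimmed
print(name)                           # Python 3 print function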
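One caveat with text.index('in') is that it matches the first "in" anywhere, including inside another word: for a hypothetical label like "Link Road in Telangana  Stretch : " it would match the "in" in "Link". A sketch of a slightly more defensive variant, assuming the state name always sits between " in " (with spaces) and "Stretch", uses str.partition instead. The helper name extract_between and the second input string are hypothetical, for illustration only.

```python
def extract_between(text, start=' in ', end='Stretch'):
    """Return the trimmed text between the first `start` and the next `end`.

    Hypothetical helper for illustration; returns '' if either marker is missing.
    """
    _, found, rest = text.partition(start)  # everything after the first " in "
    if not found:
        return ''
    middle, found, _ = rest.partition(end)  # everything before "Stretch"
    if not found:
        return ''
    return middle.strip()


# Assumed get_text() output for the tag in the question
print(extract_between("NH-16 in Andhra Pradesh  Stretch : "))  # Andhra Pradesh

# Hypothetical label where "Link" contains "in"; " in " with spaces is not fooled
print(extract_between("Link Road in Telangana  Stretch : "))   # Telangana
```

Returning an empty string when a marker is missing avoids the ValueError that .index() raises, which matters if you loop over every <b> tag rather than picking search1[1] by position.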