Problem searching for text in a returned string

Time: 2014-10-25 09:59:30

Tags: string search beautifulsoup webpage

The code I have written returns:

<div id="IncidentDetailContainer"><p>The Fire Service received a call reporting a car on fire at    the above location. One fire appliance from Ashburton attended.</p><p>Fire crews confirmed one car   well alight and severley damaged by fire. The vehicle was extinguished by fire crews using two   breathing apparatus wearers and one hose reel jet. The cause of the fire is still under investigation   by the Fire Service and Police.</p><p> </p><p> </p></div>

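I want to search it and find the "Ashburton" part, but so far, whatever I use, I get either None or [] back.

My question is: is this a normal string that can be searched (and I am just doing it wrong), or can I not search it normally because I got it from a webpage's source code?

It should be simple, I know, but I still haven't got it!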

from bs4 import BeautifulSoup
from urllib import request
import sys, traceback

webpage = request.urlopen("http://www.dsfire.gov.uk/News/Newsdesk/IncidentsPast7days.cfm?siteCategoryId=3&T1ID=26&T2ID=35")
soup = BeautifulSoup(webpage)
incidents = soup.find(id="CollapsiblePanel1")
Links = []
for line in incidents.find_all('a'):
    Links.append("http://www.dsfire.gov.uk/News/Newsdesk/"+line.get('href'))
n = 0
e = len(Links)
if e == n:
    print("No Incidents Found Please Try Later")
    sys.exit(0)
while n < e:
    webpage = request.urlopen(Links[n])
    soup = BeautifulSoup(webpage)
    station =  soup.find(id="IncidentDetailContainer")
    #search string
    print(soup.body.findAll(text='Ashburton'))
    n=n+1

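FYI, if the webpage has no incidents today, the code will not search anything (obviously). That is why I included the returned string above: if you run it and get nothing, that is the reason.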

1 Answer:

Answer 0: (score: 0)

Pass a pattern to text=. Given the HTML you posted, if you want to find the tag that has "Ashburton" in it, you can use something like this:

import re
soup.find_all('p', text=re.compile(r'Ashburton'))
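
The reason a pattern helps: a plain string passed to text= is compared against each whole text node, so 'Ashburton' only matches a node that consists of exactly that word, while a compiled pattern is applied with its search() method and can match anywhere inside the node. The same distinction can be seen in plain Python. This is only a rough illustration; the sentence is copied from the question and the variable names are made up for the example.

```python
import re

# One sentence from the question's HTML, standing in for a text node.
node_text = "One fire appliance from Ashburton attended."

# Exact comparison: what a plain string filter amounts to.
exact_match = node_text == "Ashburton"

# Substring test and regex search: both look inside the string.
substring_match = "Ashburton" in node_text
regex_match = re.compile(r"Ashburton").search(node_text) is not None

print(exact_match, substring_match, regex_match)
```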

To get only the matching text rather than the tag, use:

soup.find_all(text=re.compile(r'Ashburton'))
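
As for whether the result is a "normal string": soup.find(...) returns a Tag object (or None), not a str, so the usual string operations only apply after extracting the text with get_text(). Below is a minimal sketch that puts both approaches together on a shortened copy of the HTML from the question, so it runs without fetching the page. It assumes Beautiful Soup 4.4 or later, where the text= argument is also available under the name string=; on older versions, substitute text=. The html.parser choice and the variable names are assumptions for the example.

```python
import re
from bs4 import BeautifulSoup

# Shortened copy of the markup posted in the question.
html = (
    '<div id="IncidentDetailContainer">'
    '<p>The Fire Service received a call reporting a car on fire at the '
    'above location. One fire appliance from Ashburton attended.</p>'
    '<p>Fire crews confirmed one car well alight.</p>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal a whole text node, so nothing matches here.
exact = soup.find_all(string="Ashburton")

# A compiled pattern matches anywhere inside a text node.
pattern = re.compile(r"Ashburton")
tags = soup.find_all("p", string=pattern)   # the <p> tags containing the word
strings = soup.find_all(string=pattern)     # just the matching text nodes

# Alternative: turn the Tag into an ordinary str and search that.
station = soup.find(id="IncidentDetailContainer")
found = station is not None and "Ashburton" in station.get_text()

print(exact)
print(tags)
print(strings)
print(found)
```

If only a yes/no answer per incident page is needed, the get_text() check is probably the simplest option; the find_all variants are more useful when the surrounding tag matters.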