I can't filter the results of table[3] so that they contain only the rows with today's date. I'm using this URL as my data source:
http://tides.mobilegeographics.com/locations/3881.html
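I can retrieve all the data, but my filtering doesn't work yet: I get the whole range, five days out. I only want something like this (for the current day):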
Montauk Point, Long Island Sound, New York
41.0717° N, 71.8567° W
2014-03-13 12:37 PM EDT 0.13 feet Low Tide
2014-03-13 6:51 PM EDT Sunset
2014-03-13 7:13 PM EDT 2.30 feet High Tide
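How can I get this, and then work out whether the tide will be coming in or going out within the next 40 minutes?
Thanks for your help.
My code is: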
import sre, urllib2, sys, BaseHTTPServer, datetime, re, time, pprint, smtplib
from bs4 import BeautifulSoup
from bs4.diagnose import diagnose
data = urllib2.urlopen('http://tides.mobilegeographics.com/locations/3881.html').read()
day = datetime.date.today().day
month = datetime.date.today().month
year = datetime.date.today().year
date = datetime.date.today()
soup = BeautifulSoup(data)
keyinfo = soup.find_all('h2')
str_date = datetime.date.today().strftime("%Y-%m-%d")
time_text = datetime.datetime.now() + datetime.timedelta(minutes = 20)
t_day = time_text.strftime("%Y-%m-%d")
tide_table = soup.find_all('table')[3]
pre = tide_table.findAll('pre')
dailytide = []
pattern = str_date
allmatches = re.findall(r'pattern', pre)
print allmatches
if allmatches:
    print allmatches
else:
    print "Match for " + str_date + " not found in data string \n" + datah
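Answer 0 (score: 0)
You don't need a regular expression. Just split the contents of the pre tag into lines and check whether today's date appears in each line: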
import urllib2
import datetime
from bs4 import BeautifulSoup
URL = 'http://tides.mobilegeographics.com/locations/3881.html'
soup = BeautifulSoup(urllib2.urlopen(URL))
pre = soup.find_all('table')[3].find('pre').text
today = datetime.date.today().strftime("%Y-%m-%d")
for line in pre.split('\n'):
    if today in line:
        print line
Prints:
2014-03-13 6:52 PM EDT Sunset
2014-03-13 7:13 PM EDT 2.30 feet High Tide
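The second half of the question, whether a tide turns within the next 40 minutes, could be approached roughly as follows. This is only a sketch under some assumptions: the lines in the pre text look like the sample shown in the question, the times on the page are in the same time zone as the local clock, and the water is treated as coming in when the next event is a high tide and going out when it is a low tide. The names TIDE_LINE, parse_tides, tides_within, and tide_direction are made up for this illustration and are not part of any library.

```python
import datetime
import re

# Assumed line layout, taken from the sample in the question, e.g.
# "2014-03-13 7:13 PM EDT 2.30 feet High Tide".
# Sunrise/sunset lines have no height, so they do not match.
TIDE_LINE = re.compile(
    r'^\s*(\d{4})-(\d{2})-(\d{2})\s+(\d{1,2}):(\d{2})\s+([AP]M)\s+\S+\s+'
    r'(-?\d+(?:\.\d+)?)\s+feet\s+(High|Low)\s+Tide\s*$'
)


def parse_tides(text):
    """Return a list of (naive datetime, height in feet, 'High' or 'Low')."""
    tides = []
    for line in text.split('\n'):
        match = TIDE_LINE.match(line)
        if match is None:
            continue  # not a tide line (sunset, header, blank line, ...)
        year, month, day, hour, minute, ampm, height, kind = match.groups()
        # Convert the 12-hour clock to 24-hour: 12 AM -> 0, 12 PM -> 12.
        hour24 = int(hour) % 12 + (12 if ampm == 'PM' else 0)
        when = datetime.datetime(int(year), int(month), int(day),
                                 hour24, int(minute))
        tides.append((when, float(height), kind))
    return tides


def tides_within(tides, now, minutes=40):
    """Return the tide events that fall between now and now + minutes."""
    window_end = now + datetime.timedelta(minutes=minutes)
    return [t for t in tides if now <= t[0] <= window_end]


def tide_direction(tides, now):
    """Return 'incoming' if the next event is a high tide, 'outgoing' if it
    is a low tide, or None when no later event is known."""
    upcoming = [t for t in tides if t[0] >= now]
    if not upcoming:
        return None
    next_event = min(upcoming)  # earliest upcoming event
    return 'incoming' if next_event[2] == 'High' else 'outgoing'


# Sample text copied from the question; in practice this would be `pre`.
SAMPLE = """2014-03-13 12:37 PM EDT 0.13 feet Low Tide
2014-03-13 6:51 PM EDT Sunset
2014-03-13 7:13 PM EDT 2.30 feet High Tide"""

if __name__ == '__main__':
    tides = parse_tides(SAMPLE)
    now = datetime.datetime(2014, 3, 13, 18, 40)  # fixed time for the demo
    print(tides_within(tides, now))
    print(tide_direction(tides, now))
```

With live data you would pass the pre string from the answer to parse_tides and use datetime.datetime.now() for now. Comparing naive datetimes only works if the machine's clock is in the same zone as the page (EDT in the sample), so that is worth checking before relying on the result.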