Extract only the rows with today's date from a table

Time: 2014-03-13 16:30:40

Tags: python html python-2.7 html-parsing beautifulsoup

I can't filter the results of table[3] down to only the rows that carry today's date. I'm using this URL as my data source:

http://tides.mobilegeographics.com/locations/3881.html

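I can retrieve all the data, but my filtering doesn't work yet: I get the entire range, five days out. I only want something like this (for the current day):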

Montauk Point, Long Island Sound, New York
41.0717° N, 71.8567° W

2014-03-13 12:37 PM EDT   0.13 feet  Low Tide
2014-03-13  6:51 PM EDT   Sunset
2014-03-13  7:13 PM EDT   2.30 feet  High Tide

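How can I get this, and then calculate whether the tide will be coming in or going out within the next 40 minutes?

Thanks for your help.

My code is: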

import sre, urllib2, sys, BaseHTTPServer, datetime, re, time, pprint, smtplib
from bs4 import BeautifulSoup
from bs4.diagnose import diagnose

data = urllib2.urlopen('http://tides.mobilegeographics.com/locations/3881.html').read()
day = datetime.date.today().day
month = datetime.date.today().month

year = datetime.date.today().year
date = datetime.date.today()
soup = BeautifulSoup(data)

keyinfo = soup.find_all('h2')
str_date = datetime.date.today().strftime("%Y-%m-%d")
time_text = datetime.datetime.now() + datetime.timedelta(minutes = 20)

t_day = time_text.strftime("%Y-%m-%d")
tide_table = soup.find_all('table')[3]
pre = tide_table.findAll('pre')

dailytide = []
pattern = str_date
allmatches = re.findall(r'pattern', pre)
print allmatches

if allmatches:
    print allmatches
else:
    print "Match for " + str_date + " not found in data string \n" + datah

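1 Answer:

Answer 0: (score: 0)

You don't need a regular expression here. In the question's code, r'pattern' matches the literal word "pattern" rather than the contents of the pattern variable, and findAll('pre') returns a list of tags, not a string, so re.findall cannot search it. Instead, split the text of the pre tag into lines and check whether today's date appears in each line: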

import urllib2
import datetime
from bs4 import BeautifulSoup


URL = 'http://tides.mobilegeographics.com/locations/3881.html'
soup = BeautifulSoup(urllib2.urlopen(URL))
# The tide predictions are plain text inside a <pre> tag in the fourth table.
pre = soup.find_all('table')[3].find('pre').text

# Dates on the page are formatted as YYYY-MM-DD, so match on that string.
today = datetime.date.today().strftime("%Y-%m-%d")
for line in pre.split('\n'):
    if today in line:
        print line

Prints:

2014-03-13  6:52 PM EDT   Sunset
2014-03-13  7:13 PM EDT   2.30 feet  High Tide
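
The second part of the question, checking whether a tide falls within the next 40 minutes, is not covered above. A minimal sketch of one way to do it follows, assuming every tide row has the layout shown in the question (date, time, AM/PM, zone, height, "feet", then "High Tide" or "Low Tide") and that the times are in the same local zone as the machine running the script. The names parse_tide_line, tides_within, and SAMPLE are hypothetical and used only for illustration; SAMPLE stands in for the text of the pre tag so the example runs without network access.

```python
import datetime

# Sample rows copied from the question; in practice this would be the
# text of the <pre> tag fetched from the page.
SAMPLE = """2014-03-13 12:37 PM EDT   0.13 feet  Low Tide
2014-03-13  6:51 PM EDT   Sunset
2014-03-13  7:13 PM EDT   2.30 feet  High Tide"""


def parse_tide_line(line):
    """Return (naive datetime, 'High Tide' or 'Low Tide'), or None for other rows."""
    parts = line.split()
    # Assumed tokens: date, time, AM/PM, zone, height, 'feet', 'High'/'Low', 'Tide'
    if len(parts) < 8 or parts[-1] != 'Tide':
        return None
    # The zone abbreviation is ignored; times are treated as local.
    when = datetime.datetime.strptime(' '.join(parts[:3]), '%Y-%m-%d %I:%M %p')
    return when, ' '.join(parts[-2:])


def tides_within(text, now, minutes=40):
    """List the tide events that fall between now and now + minutes."""
    limit = now + datetime.timedelta(minutes=minutes)
    events = [parse_tide_line(line) for line in text.split('\n')]
    return [e for e in events if e is not None and now <= e[0] <= limit]


# A fixed time keeps the example repeatable; use datetime.datetime.now() in practice.
now = datetime.datetime(2014, 3, 13, 18, 40)
for when, kind in tides_within(SAMPLE, now):
    print('%s %s' % (when.strftime('%Y-%m-%d %H:%M'), kind))
```

If the upcoming event is a High Tide, the water is still rising until that time; if it is a Low Tide, the water is still falling. Rows such as Sunset are skipped because they do not end in "Tide".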