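I can scrape the target values from the URL for one specific date. How should I set up the datetime and the scraping so that URLs with no target table are skipped? This is my code so far: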
import datetime

date = datetime.datetime.today()
url = "http://www.wsj.com/mdc/public/page/2_3022-mfsctrscan-moneyflow-20161205.html?mod=mdc_pastcalendar"
I know I need to put {date} into the URL to make the date dynamic; I have given a static URL here in case the URL is empty.
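To illustrate what I mean by the {date} placeholder, here is a minimal sketch. The names `URL_TEMPLATE` and `build_url` are hypothetical and not part of my script, and the template assumes the date is the only part of the address that changes from day to day.

```python
import datetime

# Hypothetical template: the static URL with its date part replaced by {date}.
URL_TEMPLATE = ("http://www.wsj.com/mdc/public/page/"
                "2_3022-mfsctrscan-moneyflow-{date}.html?mod=mdc_pastcalendar")


def build_url(day):
    """Return the page URL for the given date, formatted as YYYYMMDD."""
    return URL_TEMPLATE.format(date=day.strftime('%Y%m%d'))


# Build the address for one specific day.
print(build_url(datetime.date(2016, 12, 5)))
```

Passing `datetime.date.today()` instead of a fixed date would give the address for the current day.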
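import requests
from bs4 import BeautifulSoup
from urllib2 import urlopen  # Python 2; on Python 3 use urllib.request

date_time = urlopen(url.format(date=date.strftime('%Y%m%d')))
address = url
print('Retrieving information from: ' + address)
print('\n')
soup = BeautifulSoup(requests.get(address).content, "lxml")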
The scraping then proceeds as follows:
rows = soup.select('div#column0 table tr')[2:]
headers = ['name', 'last', 'chg', 'pct_chg',
           'total_money_flow', 'total_tick_up', 'total_tick_down', 'total_up_down_ratio',
           'block_money_flow', 'block_tick_up', 'block_tick_down', 'block_up_down_ratio']
for row in rows:
    # skip non-data rows; find() returns a Tag or None, so test its truthiness
    if row.find("td", class_="b14"):
        continue
    print(dict(zip(headers, [cell.get_text(strip=True) for cell in row.find_all('td')])))
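The behaviour I am after, skipping dates whose page has no target table, could be sketched roughly like this. Both `scrape_range` and `fetch_rows` are hypothetical names: `fetch_rows(day)` stands in for the requests/BeautifulSoup code above and is assumed to return an empty list when the selector matches no rows for that day.

```python
import datetime


def scrape_range(start, end, fetch_rows):
    """Collect rows for every day from start to end (inclusive).

    fetch_rows is a hypothetical callable that takes a date and returns a
    list of parsed rows, or an empty list when the page has no target table.
    Days with no rows are skipped.
    """
    results = {}
    day = start
    while day <= end:
        rows = fetch_rows(day)
        if rows:  # an empty list means no table was found, so skip this day
            results[day] = rows
        day += datetime.timedelta(days=1)
    return results


# Usage with a stand-in fetcher that pretends 2016-12-04 has no table.
def fake_fetch(day):
    return [] if day == datetime.date(2016, 12, 4) else [{'name': 'example'}]


found = scrape_range(datetime.date(2016, 12, 3), datetime.date(2016, 12, 5), fake_fetch)
print(sorted(found))
```

Keeping the fetching in a separate callable means the skip logic can be tried out without hitting the site.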