Question

I am scraping this Indian weather website:

http://202.54.31.7/citywx/localwx.php

In the left pane you can see all the Indian states, and hovering over one lets you pick a city/region. I selected Delhi -> Safdarjung from the left pane and saved the page locally:

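from BeautifulSoup import BeautifulSoup
import urllib, urllib2

# Download the top-level page and keep a local copy
imd_ind = urllib2.urlopen('http://202.54.31.7/citywx/localwx.php')
delhi_info = imd_ind.read()
open('delhi_info.html', 'w').write(delhi_info)

# Parse the saved copy
soup = BeautifulSoup(open('delhi_info.html'))
# Note: no parentheses, so this evaluates to the bound method instead of calling it
soup.prettify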

It prints only this much:

<bound method BeautifulSoup.prettify of <html><head><title>Local Weather Forecast</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="MSHTML 5.00.2920.0" name="GENERATOR" /></head>
<frameset border="0" cols="330,611*" frameborder="NO" framespacing="0" rows="*"><frame name="menuFrame" noresize="noResize" src="menu.php" /><frame name="mainframe" src="http://202.54.31.7/citywx/city_weather1.php?id=42182" /></frameset></html>
>

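However, if I inspect the locally saved page "delhi_info.html" in Chrome, I can see a lot of information: dates, temperatures, cloud cover and so on (that is, a lot of content). Why can't I see any of it through any BeautifulSoup method? Please help.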

Answer 1

The HTML contains frame elements. Your saved HTML file includes this:

src="http://202.54.31.7/citywx/city_weather1.php?id=42182"

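BeautifulSoup cannot scrape the contents of this frame from the saved page: the page you downloaded is only the frameset, and the browser loads each frame's document separately. You need to extract this URL, open it, and then scrape the data from that page.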
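
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python 3 and the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages, and the names FrameCollector, frame_urls and fetch_frames are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class FrameCollector(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; self-closing tags also end up here.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(html, base_url):
    """Return absolute URLs for all frames found in the given HTML text."""
    collector = FrameCollector()
    collector.feed(html)
    # Relative values such as "menu.php" are resolved against the page URL.
    return [urljoin(base_url, src) for src in collector.sources]


def fetch_frames(page_url):
    """Download page_url, then download every frame document it references."""
    with urlopen(page_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    pages = {}
    for url in frame_urls(html, page_url):
        with urlopen(url) as response:
            pages[url] = response.read().decode("utf-8", errors="replace")
    return pages
```

Calling fetch_frames('http://202.54.31.7/citywx/localwx.php') would return a dict that maps each frame URL to its HTML, and each of those documents can then be passed to BeautifulSoup for the actual scraping. With BeautifulSoup itself, the frame tags could be located in a similar way by searching the soup for 'frame' elements and reading their src attribute.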

Some problems with web scraping the IMD website
