I am using Beautiful Soup with the Yahoo Weather API (Python 2.7):
import urllib2
from bs4 import BeautifulSoup

url = 'http://weather.yahooapis.com/forecastrss?w=2344116'
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
But after this, there is no CDATA anywhere in the parsed page. Why does Beautiful Soup ignore it, and how can I stop it from ignoring CDATA?
In the XML:
<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/>
In the parsed page:
As you can see, the CDATA is missing.
Answer 0 (score: 2)
The CDATA section is not ignored; it is simply handled the way a CDATA section is meant to be handled, as text:
>>> print soup.select('description:nth-of-type(2)')[0].text
<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/><br />
<b>Current Conditions:</b><br />
Light Rain Shower, 59 F<BR />
<BR /><b>Forecast:</b><BR />
Sun - Rain/Wind. High: 63 Low: 57<br />
Mon - Rain/Wind. High: 60 Low: 53<br />
Tue - PM Showers. High: 58 Low: 55<br />
Wed - Mostly Cloudy. High: 64 Low: 57<br />
Thu - Rain. High: 63 Low: 55<br />
<br />
<a href="http://us.rd.yahoo.com/dailynews/rss/weather/Istanbul__TR/*http://weather.yahoo.com/forecast/TUXX0014_f.html">Full Forecast at Yahoo! Weather</a><BR/><BR/>
(provided by <a href="http://www.weather.com" >The Weather Channel</a>)<br/>
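To see that this is ordinary XML behaviour rather than something peculiar to Beautiful Soup, here is a minimal sketch using only Python 3's standard `xml.etree.ElementTree`. It parses a hypothetical, trimmed copy of the feed (not the live URL): the CDATA markers disappear, but their content survives as the element's plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the feed (no network access needed):
# the <description> wraps HTML in a CDATA section, as in the question.
SAMPLE_RSS = """<rss version="2.0"><channel><item>
<description><![CDATA[<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/><br />
<b>Current Conditions:</b><br />]]></description>
</item></channel></rss>"""

root = ET.fromstring(SAMPLE_RSS)
description = root.find("channel/item/description")

# The <![CDATA[ ... ]]> markers are gone after parsing; what was inside
# them is now the element's plain text, and the HTML inside was NOT
# turned into child elements.
print(len(description))  # number of child elements
print(description.text)
```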
You can parse that section as a separate document:
>>> description_soup = BeautifulSoup(soup.select('description:nth-of-type(2)')[0].text)
>>> description_soup.img
<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/>
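If you would rather not run a second Beautiful Soup pass, the standard library's HTML parser can also pull the image URL out of that text. This is a minimal sketch assuming Python 3 (`html.parser`; in Python 2.7 the module is named `HTMLParser`); `ImgSrcCollector` and `image_sources` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Hypothetical helper: collect the src of every <img> in a fragment."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. Self-closing tags such as
        # <img .../> also arrive here, because the default
        # handle_startendtag calls handle_starttag.
        if tag == "img":
            self.sources.extend(value for name, value in attrs if name == "src")


def image_sources(html_fragment):
    collector = ImgSrcCollector()
    collector.feed(html_fragment)
    collector.close()
    return collector.sources


# Hypothetical input: the start of the description text shown above.
fragment = ('<img src="http://l.yimg.com/a/i/us/we/52/11.gif"/><br />\n'
            '<b>Current Conditions:</b><br />')
print(image_sources(fragment))
```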
Note that since this is an XML feed you are parsing, consider using XML mode (this requires lxml to be installed):
soup = BeautifulSoup(page, 'xml')
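One reason XML mode helps: an HTML parser lowercases attribute names and does not know that a namespaced tag such as `<yweather:forecast ... />` is empty, so it may nest the following siblings inside it. An XML parser keeps both intact. A minimal sketch with Python 3's `xml.etree.ElementTree` on a hypothetical excerpt; the namespace URI is an assumption (use whatever the real feed's `xmlns:yweather` declares).

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the yweather prefix; hypothetical excerpt.
YW = "{http://xml.weather.yahoo.com/ns/rss/1.0}"
SAMPLE = """<item xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
<yweather:forecast day="Thu" date="30 Oct 2014" low="55" high="62" text="Light Rain" code="11" />
<guid isPermaLink="false">TUXX0014_2014_10_30_9_00_EEST</guid>
</item>"""

item = ET.fromstring(SAMPLE)
forecast = item.find(YW + "forecast")
guid = item.find("guid")

print(len(forecast))            # the self-closing tag has no children
print(guid.get("isPermaLink"))  # attribute name keeps its original case
```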
Or, better, use feedparser to parse the RSS feed.
Answer 1 (score: 1)
Why do you want the CDATA so badly? From what I can see, the same data appears a few lines further on in a more structured form:
In [28]: soup.findAll('yweather:forecast')
Out[28]:
[<yweather:forecast day="Sun" date="26 Oct 2014" low="57" high="63" text="Rain/Wind" code="12">
</yweather:forecast>,
<yweather:forecast day="Mon" date="27 Oct 2014" low="54" high="61" text="Rain/Wind" code="12">
</yweather:forecast>,
<yweather:forecast day="Tue" date="28 Oct 2014" low="56" high="59" text="Rain" code="12">
</yweather:forecast>,
<yweather:forecast day="Wed" date="29 Oct 2014" low="57" high="63" text="AM Showers" code="39">
</yweather:forecast>,
<yweather:forecast day="Thu" date="30 Oct 2014" low="55" high="62" text="Light Rain" code="11">
<guid ispermalink="false">TUXX0014_2014_10_30_9_00_EEST</guid>
</yweather:forecast>]
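The same attributes can be read with any namespace-aware XML parser. A minimal sketch with Python 3's `xml.etree.ElementTree`, assuming the feed declares the `yweather` prefix as `http://xml.weather.yahoo.com/ns/rss/1.0` (check the `xmlns:yweather` attribute of the real feed) and using a hypothetical two-entry excerpt of the data above; `forecasts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; verify against the feed's xmlns:yweather attribute.
YWEATHER_NS = "http://xml.weather.yahoo.com/ns/rss/1.0"

# Hypothetical excerpt of the feed, copied from the first two forecasts above.
SAMPLE_RSS = """<rss version="2.0" xmlns:yweather="%s"><channel><item>
<yweather:forecast day="Sun" date="26 Oct 2014" low="57" high="63" text="Rain/Wind" code="12" />
<yweather:forecast day="Mon" date="27 Oct 2014" low="54" high="61" text="Rain/Wind" code="12" />
</item></channel></rss>""" % YWEATHER_NS


def forecasts(rss_text):
    """Return one dict per <yweather:forecast> element, in document order."""
    root = ET.fromstring(rss_text)
    result = []
    for el in root.iter("{%s}forecast" % YWEATHER_NS):
        result.append({
            "day": el.get("day"),
            "date": el.get("date"),
            "low": int(el.get("low")),
            "high": int(el.get("high")),
            "text": el.get("text"),
        })
    return result


for f in forecasts(SAMPLE_RSS):
    print(f["day"], f["low"], f["high"], f["text"])
```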