I am trying to scrape news data from the Forex Factory calendar, but I have a small problem with my XML file:
import requests
from bs4 import BeautifulSoup

def get_news_calendar():
    r = requests.get('http://www.forexfactory.com/ffcal_week_this.xml')
    soup = BeautifulSoup(r.text, 'lxml')
    events = soup.find_all('event')
    for event in events:
        # note: no .text on 'date', so the whole <date> tag is printed
        print(event.find('title').text, event.find('country').text, event.find('date'),
              event.find('time').text, event.find('impact').text,
              event.find('forecast').text, event.find('previous').text)
Output:
Current Account EUR <date></date>
Retail Sales m/m GBP <date></date>
MPC Member Saunders Speaks GBP <date></date>
Core CPI m/m CAD <date></date>
CPI m/m CAD <date></date>
Trimmed CPI y/y CAD <date></date>
Median CPI y/y CAD <date></date>
Common CPI y/y CAD <date></date>
FOMC Member Kashkari Speaks USD <date></date>
Flash Manufacturing PMI USD <date></date>
Flash Services PMI USD <date></date>
Existing Home Sales USD <date></date>
IMF Meetings ALL <date></date>
IMF Meetings ALL <date></date>
Treasury Sec Mnuchin Speaks USD <date></date>
French Presidential Election EUR <date></date>
Sample XML file:
<event>
<title>German Flash Manufacturing PMI</title>
<country>EUR</country>
<date><![CDATA[04-21-2017]]></date>
<time><![CDATA[7:30am]]></time>
<impact><![CDATA[Medium]]></impact>
<forecast><![CDATA[58.1]]></forecast>
<previous><![CDATA[58.3]]></previous>
</event>
How can I print the CDATA values?
Answer 0 (score: 1)
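It looks like you have the wrong parser name. You are parsing an XML document, so you need to use lxml-xml instead of lxml. The plain lxml feature selects lxml's HTML parser, which does not keep the CDATA content, which is why the <date> elements in your output come out empty.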
Try replacing
soup = BeautifulSoup(r.text, 'lxml')
with
soup = BeautifulSoup(r.text, 'lxml-xml')
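After making this change to the get_news_calendar function and running it on the sample XML file, I get the following output (the date still appears wrapped in its tags because the code prints event.find('date') rather than event.find('date').text):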
German Flash Manufacturing PMI EUR <date>04-21-2017</date> 7:30am Medium 58.1 58.3
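To try the fix without a network request, here is a minimal sketch that runs the same lxml-xml parse over the sample event from the question, inlined as a string. It assumes bs4 and lxml are installed; parse_events, SAMPLE_XML and FIELDS are illustrative names, not part of either answer. Unlike the code above, it also calls .text on the date element, so the date prints as 04-21-2017 instead of the whole tag.

```python
from bs4 import BeautifulSoup

# The sample <event> from the question, inlined so no HTTP request is needed.
SAMPLE_XML = """<event>
<title>German Flash Manufacturing PMI</title>
<country>EUR</country>
<date><![CDATA[04-21-2017]]></date>
<time><![CDATA[7:30am]]></time>
<impact><![CDATA[Medium]]></impact>
<forecast><![CDATA[58.1]]></forecast>
<previous><![CDATA[58.3]]></previous>
</event>"""

# Child elements to read from each <event>, in print order.
FIELDS = ("title", "country", "date", "time", "impact", "forecast", "previous")


def parse_events(xml_text):
    """Return one tuple of field texts per <event> (hypothetical helper)."""
    # "lxml-xml" selects lxml's XML parser, which keeps CDATA content as text.
    soup = BeautifulSoup(xml_text, "lxml-xml")
    return [tuple(event.find(name).text for name in FIELDS)
            for event in soup.find_all("event")]


for row in parse_events(SAMPLE_XML):
    print(*row)
```

For the live feed you would pass r.text from the request in place of SAMPLE_XML.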
Answer 1 (score: 0)
Consider using lxml directly and running an XPath query over all <event> nodes, since lxml's .text attribute retrieves the CDATA content.
import requests
import lxml.etree as et

def get_news_calendar():
    r = requests.get('http://www.forexfactory.com/ffcal_week_this.xml')
    # Parse the raw bytes so lxml can honour the encoding named in the XML declaration
    data = et.fromstring(r.content)
    events = data.xpath('//event')
    for event in events:
        # .text returns the CDATA content as plain text (None for empty elements)
        print(event.find('title').text, event.find('country').text,
              event.find('date').text, event.find('time').text,
              event.find('impact').text, event.find('forecast').text,
              event.find('previous').text)

get_news_calendar()
# Bank Holiday NZD 04-16-2017 9:00pm Holiday None None
# Bank Holiday AUD 04-16-2017 10:00pm Holiday None None
# GDP q/y CNY 04-17-2017 2:00am High 6.8% 6.8%
# Industrial Production y/y CNY 04-17-2017 2:00am High 6.2% 6.3%
# Fixed Asset Investment ytd/y CNY 04-17-2017 2:00am Medium 8.8% 8.9%
# NBS Press Conference CNY 04-17-2017 2:00am Medium None None
# Retail Sales y/y CNY 04-17-2017 2:00am Low 9.7% 9.5%
# Bank Holiday CHF 04-17-2017 6:00am Holiday None None
# BOJ Gov Kuroda Speaks JPY 04-17-2017 6:15am High None None
# Bank Holiday GBP 04-17-2017 7:00am Holiday None None
# French Bank Holiday EUR 04-17-2017 7:00am Holiday None None
# ...
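If you would rather avoid third-party parsers, the standard library's xml.etree.ElementTree also reads CDATA sections as ordinary element text, with the same find/findtext style of API as lxml.etree. This is a swapped-in alternative, not what either answer used, and it is only a sketch: the <weeklyevents> root and the second "Bank Holiday" event (with empty <forecast/> and <previous/> elements) are assumptions modeled on the output above, and parse_events is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the question's sample event plus a "Bank Holiday" row modeled on
# the output above. The <weeklyevents> root and the empty elements are assumptions.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<weeklyevents>
<event>
<title>German Flash Manufacturing PMI</title>
<country>EUR</country>
<date><![CDATA[04-21-2017]]></date>
<time><![CDATA[7:30am]]></time>
<impact><![CDATA[Medium]]></impact>
<forecast><![CDATA[58.1]]></forecast>
<previous><![CDATA[58.3]]></previous>
</event>
<event>
<title>Bank Holiday</title>
<country>NZD</country>
<date><![CDATA[04-16-2017]]></date>
<time><![CDATA[9:00pm]]></time>
<impact><![CDATA[Holiday]]></impact>
<forecast />
<previous />
</event>
</weeklyevents>"""

FIELDS = ("title", "country", "date", "time", "impact", "forecast", "previous")


def parse_events(xml_bytes):
    """Map each <event> to a dict of field name -> text (hypothetical helper)."""
    # Passing bytes lets the parser use the encoding from the XML declaration.
    root = ET.fromstring(xml_bytes)
    # findtext() gives the element text (CDATA included) and "" for an element with
    # no text; the default "" is only used when the element is missing entirely.
    return [{name: event.findtext(name, default="") for name in FIELDS}
            for event in root.iter("event")]


for event in parse_events(SAMPLE_XML):
    print(*event.values())
```

With the live feed you would call parse_events(requests.get(url).content). Because findtext() returns an empty string for an empty element, blank forecasts print as nothing instead of None.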