BeautifulSoup: extract data from inside tags and output it as JSON

Time: 2016-07-25 07:20:33

Tags: python json beautifulsoup

As mentioned in my previous question, I am using Beautiful Soup in Python to retrieve weather data from a website.

Here is what the data from the website looks like:

<channel>
<title>2 Hour Forecast</title>
<source>Meteorological Services Singapore</source>
<description>2 Hour Forecast</description>
<item>
<title>Nowcast Table</title>
<category>Singapore Weather Conditions</category>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/>
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>
</weatherForecast>
</item>
</channel>
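The feed mixes two kinds of data: <validTime> holds its value as element text, while each self-closing <area/> carries everything in its attributes and has no text at all. The sketch below shows this on a trimmed copy of the snippet. It uses the standard library's xml.etree.ElementTree, swapped in for BeautifulSoup only so that it runs without extra packages; in BeautifulSoup the same attributes are available as tag['name'] or through the tag.attrs dictionary. SAMPLE is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the feed above, so the sketch runs offline.
SAMPLE = """<channel>
<title>2 Hour Forecast</title>
<item>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
</weatherForecast>
</item>
</channel>"""

root = ET.fromstring(SAMPLE)

# <validTime> stores its value as element text.
valid = root.find("item/validTime")
print(repr(valid.text))  # '3.30 pm to 5.30 pm'

# <area .../> is self-closing: it has no text, only attributes.
area = root.find("item/weatherForecast/area")
print(repr(area.text))  # None
print(area.attrib["name"], area.attrib["forecast"])  # Ang Mo Kio TL
```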

I managed to retrieve the information I need with this code:

import json

import requests
from bs4 import BeautifulSoup

weather = []  # never filled, so the file below only ever receives []

# Fetch the feed and parse it with the XML parser (requires lxml)
r = requests.get('http://www.nea.gov.sg/api/WebAPI/?dataset=2hr_nowcast&keyref=<keyrefno>')
soup = BeautifulSoup(r.content, "xml")

# Getting the valid time (stored as element text)
valid_time = soup.find('validTime').string
print("validTime: " + valid_time)

# Issue date and time are attributes of <forecastIssue/>
for item in soup.find_all('item'):
    element = item.find('forecastIssue')
    print("date: " + element['date'])
    print("time: " + element['time'])

# Each <area/> is printed as a whole tag
for area in soup.find('weatherForecast').find_all('area'):
    print(area)

# File writing: json.dump (not json.dumps) writes to a file object
with open("c:/scripts/nea.json", 'w') as outfile:
    json.dump(weather, outfile)
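For reference, json.dumps only returns a string and never writes to a file; json.dump(obj, fp) is the call that serialises an object into an open file object. A minimal sketch, using io.StringIO as a stand-in for the file opened on c:/scripts/nea.json and a hypothetical one-entry weather list:

```python
import io
import json

weather = [{"name": "Bedok", "forecast": "SH"}]  # hypothetical sample data

# json.dumps(obj) returns a str; nothing is written anywhere.
text = json.dumps(weather)
print(text)  # [{"name": "Bedok", "forecast": "SH"}]

# json.dump(obj, fp) serialises obj and writes it to the file-like object fp.
buffer = io.StringIO()  # stands in for open("c:/scripts/nea.json", "w")
json.dump(weather, buffer)
print(buffer.getvalue() == text)  # True
```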

This is the output I get (in CMD):

C:\scripts>python neaweather.py                                                     
2.30 pm to 4.30 pm                                                              
date: 25-07-2016                                                              
time: 02:30 PM                                                                 
<area forecast="LR" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>   
<area forecast="LR" lat="1.32100000" lon="103.92400000" name="Bedok"/>        
<area forecast="LR" lat="1.35077200" lon="103.83900000" name="Bishan"/>       
<area forecast="LR" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>     
<area forecast="LR" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>  
<area forecast="LR" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>

I have a few problems that I am not sure how to solve:

  1. Is there a way to retrieve the attributes of an area, such as forecast="LR" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio", without the surrounding tag?

    I tried adding ".text" to the code, but I always get an error.

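  2. I would like my output in JSON format. Since my data is not in a table like the one shown in the tutorials, how do I create a JSON file from it with Python? :/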

  3. EDIT: I have managed to get the data into a JSON file, but how do I turn the unicode strings into normal strings, since the result contains u'?
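On the u' prefix: in Python 2, BeautifulSoup returns unicode strings, and printing a list or dict shows the repr of each element, e.g. [u'Bedok']. The prefix is display syntax, not part of the data, and JSON written with json.dump or json.dumps never contains it. A minimal sketch (runs on Python 3, where every str is already unicode; record is a hypothetical name):

```python
import json

# In Python 2 this dict would print as {u'name': u'Bedok', ...};
# the u prefix is only how the repr displays unicode strings.
record = {u"name": u"Bedok", u"forecast": u"SH"}

# Serialised JSON uses plain double-quoted strings, never u'...'.
# ensure_ascii=False keeps any non-ASCII characters as-is instead of \u escapes.
text = json.dumps(record, indent=2, ensure_ascii=False)
print(text)

# Printing a single string (not a container) shows its value, not its repr.
print(record[u"name"])  # Bedok
```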

1 Answer:

Answer 0 (score: 0)

Try this in your code:
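A minimal sketch of that approach, assuming the feed layout shown in the question (one forecastIssue, one validTime, one weatherForecast) and the same "xml" parser, which needs lxml installed. Each area tag is turned into a plain dict via area.attrs, so no markup ends up in the output. parse_nowcast and SAMPLE are hypothetical names; in the real script pass r.content instead of SAMPLE and call json.dump(weather, outfile) on the file opened with open("c:/scripts/nea.json", 'w').

```python
import json

from bs4 import BeautifulSoup  # the "xml" feature requires lxml, as in the question


def parse_nowcast(xml_text):
    """Convert the 2-hour nowcast XML into plain Python data for json.dump."""
    soup = BeautifulSoup(xml_text, "xml")
    issue = soup.find("forecastIssue")
    areas = soup.find("weatherForecast").find_all("area")
    return {
        "validTime": soup.find("validTime").get_text(),
        "date": issue["date"],
        "time": issue["time"],
        # area.attrs is a dict of the tag's attributes, so no markup is kept.
        "areas": [dict(area.attrs) for area in areas],
    }


# Trimmed copy of the feed, standing in for r.content.
SAMPLE = """<channel><item>
<forecastIssue date="25-07-2016" time="02:30 PM"/>
<validTime>2.30 pm to 4.30 pm</validTime>
<weatherForecast>
<area forecast="LR" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="LR" lat="1.32100000" lon="103.92400000" name="Bedok"/>
</weatherForecast>
</item></channel>"""

weather = parse_nowcast(SAMPLE)
print(json.dumps(weather, indent=2))
```

Returning one dict per area keeps each record self-describing, which is the shape most JSON consumers expect.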
