Question

As mentioned in my previous question, I am using Beautiful Soup with Python to retrieve weather data from a website.

Here is what the website returns:

<channel>
<title>2 Hour Forecast</title>
<source>Meteorological Services Singapore</source>
<description>2 Hour Forecast</description>
<item>
<title>Nowcast Table</title>
<category>Singapore Weather Conditions</category>
<forecastIssue date="18-07-2016" time="03:30 PM"/>
<validTime>3.30 pm to 5.30 pm</validTime>
<weatherForecast>
<area forecast="TL" lat="1.37500000" lon="103.83900000" name="Ang Mo Kio"/>
<area forecast="SH" lat="1.32100000" lon="103.92400000" name="Bedok"/>
<area forecast="TL" lat="1.35077200" lon="103.83900000" name="Bishan"/>
<area forecast="CL" lat="1.30400000" lon="103.70100000" name="Boon Lay"/>
<area forecast="CL" lat="1.35300000" lon="103.75400000" name="Bukit Batok"/>
<area forecast="CL" lat="1.27700000" lon="103.81900000" name="Bukit Merah"/>
<channel>

I managed to retrieve the information I need with this code:

import requests
from bs4 import BeautifulSoup
import urllib3

#getting the ValidTime

r = requests.get('http://www.nea.gov.sg/api/WebAPI/?'
                 'dataset=2hr_nowcast&keyref=781CF461BB6606AD907750DFD1D07667C6E7C5141804F45D')
soup = BeautifulSoup(r.content, "xml")
time = soup.find('validTime').string
print "validTime: " + time

#getting the date

for currentdate in soup.find_all('item'):
    element = currentdate.find('forecastIssue')
    print "date: " + element['date']

#getting the time

for currentdate in soup.find_all('item'):
    element = currentdate.find('forecastIssue')
    print "time: " + element['time'] 

for area in soup.find('weatherForecast').find_all('area'):
    area_attrs_li = [area.attrs for area in soup.find('weatherForecast').find_all('area')]
    print area_attrs_li

Here are my results:

{'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West',   
'forecast': u'LR'}, {'lat': u'1.31200000', 'lon': u'103.86200000', 'name':  
 u'Kallang', 'forecast': u'LR'},

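How can I remove the u' from the results? I tried the methods I found by googling, but they don't seem to work.

I'm not strong in Python and have been stuck on this for quite a while.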

Edit: I tried this:

f = open("C:\\scripts\\nea.csv" , 'wt')

try:
    for area in area_attrs_li:
        writer = csv.writer(f)
        writer.writerow( (time, element['date'], element['time'], area_attrs_li))

finally:
    f.close()

print open("C:/scripts/nea.csv", 'rt').read()

However, I would like the areas to be split up, because the records in the CSV are duplicated:

Thanks.

Answer 1

Edit 1 - On topic:

You are missing escape characters:

C:\scripts>python neaweather.py
File "neaweather.py", line 30
writer.writerow( ('time', 'element['date']', 'element['time']', 'area_attrs_li') )
                                   ^
SyntaxError: invalid syntax

With the inner quotes escaped:

writer.writerow( ('time', 'element[\'date\']', 'element[\'time\']', 'area_attrs_li') )
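To illustrate the quoting rule in isolation (a small sketch, not part of the original script): a single-quoted string literal ends at the next unescaped single quote, so inner quotes either need a backslash or the literal has to use double quotes instead.

```python
# Both literals hold the same text; only the quoting style differs.
escaped = 'element[\'date\']'       # inner single quotes escaped with backslashes
double_quoted = "element['date']"   # double-quoted literal, no escaping needed

print(escaped == double_quoted)
```

Either way the result is the literal text element['date'], not the value stored in `element`.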

Edit 2:

If you want to insert the values instead:

writer.writerow( (time, element['date'], element['time'], area_attrs_li) )

Edit 3:

To split the results into separate rows:

for area in area_attrs_li:
    writer.writerow( (time, element['date'], element['time'], area) )
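A minimal sketch of the row-per-area idea, written for Python 3 and using sample values copied from the question in place of the live feed. It assumes `area_attrs_li` is a list of dicts as printed in the question; the column order in `FIELDS`, the header names, and the `io.StringIO` buffer (standing in for the real CSV file) are assumptions for illustration. In Python 2 the u'...' prefix belongs to the printed representation of a unicode string, so it shows up when a whole dict or list is turned into text, not when the individual values are written.

```python
import csv
import io

# Sample values copied from the question; the real script would take them
# from the parsed feed instead.
valid_time = "3.30 pm to 5.30 pm"
issue = {"date": "18-07-2016", "time": "03:30 PM"}
area_attrs_li = [
    {"lat": "1.34039000", "lon": "103.70500000", "name": "Jurong West", "forecast": "LR"},
    {"lat": "1.31200000", "lon": "103.86200000", "name": "Kallang", "forecast": "LR"},
]

FIELDS = ["name", "forecast", "lat", "lon"]  # assumed column order

buffer = io.StringIO()  # stands in for open("nea.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(["validTime", "date", "time"] + FIELDS)
for area in area_attrs_li:
    # One row per area, one column per attribute.
    writer.writerow([valid_time, issue["date"], issue["time"]] + [area[f] for f in FIELDS])

rows = buffer.getvalue().splitlines()
print("\n".join(rows))
```

Creating the writer once, outside the loop, is enough; each `writerow` call then adds exactly one area.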

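Edit 4: The split is not entirely correct, but it should give you a better idea of how to parse and split the data so you can adapt it to your needs. To split the area elements further, as shown in your image, you can parse them: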

for area in area_attrs_li:
    # area is a dict, so work on its printed representation
    area = str(area)

    # cut off the characters you don't need
    area = area.replace('[','')
    area = area.replace(']','')
    area = area.replace('{','')
    area = area.replace('}','')

    # replace the u' prefix and the remaining single quotes with double quotes
    area = area.replace("u'","\"").replace("'","\"")

    # split the string into a list
    areaList = area.split(",")

    # create your own csv-separator
    ownRowElement = ';'.join(areaList)

    writer.writerow( (time, element['date'], element['time'], ownRowElement) )
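Since each `area` is already a dict, a semicolon-joined cell can also be built from its values directly, without going through the printed representation. A minimal sketch under that assumption (Python 3; the key order in `KEYS` is a hypothetical choice, and the sample dict is copied from the question's output):

```python
area = {"lat": "1.34039000", "lon": "103.70500000", "name": "Jurong West", "forecast": "LR"}

KEYS = ["name", "forecast", "lat", "lon"]  # hypothetical order

# Join the attribute values with the custom separator; no brackets, braces
# or quote characters need stripping because the dict is never converted
# to its printed form.
own_row_element = ";".join(area[key] for key in KEYS)
print(own_row_element)
```

Unlike the string-replacement version, this keeps only the values and drops the key names from the cell.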

Off topic: this works for me:

import csv
import json

x="""[ 
    {'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West','forecast': u'LR'}
]"""

jsontxt = json.loads(x.replace("u'","\"").replace("'","\""))

f = csv.writer(open("test.csv", "w+"))

# Write the CSV header; remove this line if you don't need it
f.writerow(['lat', 'lon', 'name', 'forecast'])

for jsontext in jsontxt:
    f.writerow([jsontext["lat"], 
                jsontext["lon"], 
                jsontext["name"], 
                jsontext["forecast"],
                ])
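An alternative sketch, assuming the text really is a Python literal as printed in the question: `ast.literal_eval` from the standard library can parse it directly, including the u'' prefix, so the quote replacement is not needed. That replacement would corrupt any value that itself contains an apostrophe. The example targets Python 3, and `io.StringIO` stands in for the output file.

```python
import ast
import csv
import io

text = """[
    {'lat': u'1.34039000', 'lon': u'103.70500000', 'name': u'Jurong West', 'forecast': u'LR'}
]"""

# literal_eval only accepts Python literals (strings, numbers, lists,
# dicts, ...), so it parses the text without executing arbitrary code.
records = ast.literal_eval(text)

buffer = io.StringIO()  # stands in for a real file
writer = csv.DictWriter(buffer, fieldnames=["lat", "lon", "name", "forecast"])
writer.writeheader()
writer.writerows(records)

csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

`csv.DictWriter` picks the columns by key, so the order of keys inside each dict does not matter.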

Extracting data with BeautifulSoup and writing it to CSV
