Python / bs4: trying to print the temperature and city from a local website

Date: 2017-06-30 13:22:09

Tags: python parsing bs4

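I'm trying to fetch and print the current temperature and city name from a local weather website, without success. All I need is to read and print the city (Londrina), the temperature (23.1°C) and, if possible, the title of the ca-cond-firs item ("Temperatura em declínio"). That title changes depending on whether the temperature is rising or falling.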

Here is the relevant part of the site's HTML:

<div class="ca-cidade"><a href="/site/internas/conteudo/meteorologia/grafico.shtml?id=23185109">Londrina</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="/site/imagens/icones_condicoes/temperatura/temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
<li class="ca-cond"><img src="/site/imagens/icones_condicoes/vento/L.png"/><br/>10 km/h</li>
<li class="ca-cond"><div class="ur">UR</div><br/>54%</li>
<li class="ca-cond"><img src="/site/imagens/icones_condicoes/chuva.png"/><br/>0.0 mm</li>

Here is the code I have so far:

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'lxml')

id = soup.find('a', 'id=23185109')
print(id)

Any help?

4 answers:

Answer 0: (score: 2)

The code below fetches the temperature details shown on the right-hand side of the page.

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'html.parser') # parse page as html

temp_table = soup.find_all('table', {'class':'cidadeTempo'}) # get detail of table with class name cidadeTempo
for entity in temp_table:
    city_name = entity.find('h3').text # fetches name of city
    city_temp_max = entity.find('span', {'class':'tempMax'}).text # fetches max temperature
    city_temp_min = entity.find('span', {'class':'tempMin'}).text # fetches min temperature
    print("City :{} \t Max_temp: {} \t Min_temp: {}".format(city_name, city_temp_max, city_temp_min)) # prints content

Answer 1: (score: 0)

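I'm not sure what problem you ran into with your code. When I tried it, I found that I needed to use the html parser to parse the site successfully. I also used soup.findAll() to find the elements matching the desired class. Hopefully the following points you towards the answer: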

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'html.parser')

rows = soup.findAll('li', {'class': 'ca-cond-firs'})  # attrs must be a dict, not a set
print(rows)
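
To go from the matched `li` elements to the three values the question asks for, a minimal sketch follows. It parses the HTML fragment quoted in the question rather than the live page, so whether the real site still has this structure is an assumption. `SAMPLE_HTML` and `extract_reading` are illustrative names, not part of bs4.

```python
from bs4 import BeautifulSoup

# Fragment copied from the question (closing </ul> added); the live page may differ.
SAMPLE_HTML = """
<div class="ca-cidade"><a href="/site/internas/conteudo/meteorologia/grafico.shtml?id=23185109">Londrina</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="/site/imagens/icones_condicoes/temperatura/temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
<li class="ca-cond"><img src="/site/imagens/icones_condicoes/vento/L.png"/><br/>10 km/h</li>
</ul>
"""


def extract_reading(html):
    """Return (city, temperature, trend) for the first city in the markup."""
    soup = BeautifulSoup(html, 'html.parser')
    city = soup.find('div', {'class': 'ca-cidade'}).get_text(strip=True)
    temp_li = soup.find('li', {'class': 'ca-cond-firs'})
    temperature = temp_li.get_text(strip=True)
    trend = temp_li.find('img').get('title')  # None if the icon has no title
    return city, temperature, trend


city, temperature, trend = extract_reading(SAMPLE_HTML)
print("City: {}, Temperature: {}, Trend: {}".format(city, temperature, trend))
```

To run this against the real page, pass `requests.get(URL).text` to `extract_reading` instead of `SAMPLE_HTML`.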

Answer 2: (score: 0)

Here you go. You can customize the wind description based on the icon name.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The reload(sys) / sys.setdefaultencoding() hack is Python 2 only and is
# not needed here, so it has been dropped.
from bs4 import BeautifulSoup
import requests

def get_weather_data():

    URL = 'http://www.simepar.br/site/index.shtml'

    rawhtml = requests.get(URL).text
    soup = BeautifulSoup(rawhtml, 'html.parser')

    cities = soup.find('div', {"class":"ca-content-wrapper"})

    weather_data = []

    for city in cities.findAll("div", {"class":"ca-bg"}):

        name = city.find("div", {"class":"ca-cidade"}).text
        temp = city.find("li", {"class":"ca-cond-firs"}).text

        conditions = city.findAll("li", {"class":"ca-cond"})

        weather_data.append({
            "city":name,
            "temp":temp,
            "conditions":[{
                "wind":conditions[0].text +" "+what_wind(conditions[0].find("img")["src"]),
                "humidity":conditions[1].text,
                "rain":conditions[2].text,
            }]
        })


    return weather_data

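def what_wind(img):
    # str.find() returns -1 (which is truthy) when there is no match, so test
    # with `in` against the icon file name, e.g. ".../vento/NE.png" -> "NE.png".
    icon = img.rsplit("/", 1)[-1]

    if "NE" in icon:
        return "From North East"

    if "O" in icon:
        return "From West"

    if "N" in icon:
        return "From North"

    #you can add other icons here
    return ""  # unknown icon: keep the string concatenation in the caller working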


print(get_weather_data())

That gives you all the weather data on the site.
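
The icon-to-direction lookup can also be written as a dictionary keyed on the icon's file name, which avoids substring checks matching the wrong part of the path. This is only a sketch. The abbreviations assume the site names its wind icons after Portuguese compass points (L for Leste, O for Oeste), which the single `L.png` example in the question suggests but does not confirm. `WIND_NAMES` and `wind_from_icon` are hypothetical names.

```python
import posixpath

# Assumed mapping from icon file name (without extension) to a description.
WIND_NAMES = {
    "N": "From North",
    "S": "From South",
    "L": "From East",   # Leste
    "O": "From West",   # Oeste
    "NE": "From North East",
    "NO": "From North West",
    "SE": "From South East",
    "SO": "From South West",
}


def wind_from_icon(src):
    """Map an icon path such as '/.../vento/L.png' to a wind description."""
    stem = posixpath.splitext(posixpath.basename(src))[0]
    return WIND_NAMES.get(stem, "Unknown direction")


print(wind_from_icon("/site/imagens/icones_condicoes/vento/L.png"))
```

An exact match on the file name means an icon the table does not list falls through to the default instead of being mistaken for another direction.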

Answer 3: (score: 0)

You should try the CSS3 selectors in BS4. Personally, I find them easier to use than find and find_all.

from bs4 import BeautifulSoup
import requests

URL = 'http://www.simepar.br/site/index.shtml'

rawhtml = requests.get(URL).text
soup = BeautifulSoup(rawhtml, 'lxml')

# soup.select returns a list of all the elements that match the CSS3 selector

# get the text inside each <a> tag inside div.ca-cidade
cities = [cityTag.text for cityTag in soup.select("div.ca-cidade > a")] 

# get the temperature inside each li.ca-cond-firs
temps = [tempTag.text for tempTag in soup.select("li.ca-cond-firs")]

# get the temperature status from the title attribute of each li.ca-cond-firs > img
tempStatus = [tag["title"] for tag in soup.select("li.ca-cond-firs > img")]

# len(cities) == len(temps) == len(tempStatus) => This is normally true.
# zip() stops at the shortest list, so a mismatch cannot raise an IndexError.

for city, temp, status in zip(cities, temps, tempStatus):
    print("City: {}, Temperature: {}, Status: {}.".format(city, temp, status))
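
If one city is missing its temperature item or its icon title, three independent lists can drift out of step. A possible alternative is to walk the page one city at a time and look up the values relative to that city's element. The sketch below assumes the `ul.ca-condicoes` list is the next sibling of each `div.ca-cidade`, as in the fragment quoted in the question. The second city in `SAMPLE_HTML` is invented for illustration, and `readings_by_city` is a hypothetical name.

```python
from bs4 import BeautifulSoup

# Two-city fragment modelled on the question's HTML. The second city is
# made up for this example and deliberately has no title on its icon.
SAMPLE_HTML = """
<div class="ca-cidade"><a href="#">Londrina</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="temp_baixa.png" title="Temperatura em declínio"/><br/>23.1°C</li>
</ul>
<div class="ca-cidade"><a href="#">Curitiba</a></div>
<ul class="ca-condicoes">
<li class="ca-cond-firs"><img src="temp_alta.png"/><br/>18.4°C</li>
</ul>
"""


def readings_by_city(html):
    """Return one dict per city, with None for any value that is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    readings = []
    for city_div in soup.select("div.ca-cidade"):
        # Assumption: the conditions list directly follows the city div.
        block = city_div.find_next_sibling("ul", class_="ca-condicoes")
        temp_li = block.select_one("li.ca-cond-firs") if block else None
        img = temp_li.find("img") if temp_li else None
        readings.append({
            "city": city_div.get_text(strip=True),
            "temp": temp_li.get_text(strip=True) if temp_li else None,
            "status": img.get("title") if img else None,
        })
    return readings


for reading in readings_by_city(SAMPLE_HTML):
    print("City: {city}, Temperature: {temp}, Status: {status}.".format(**reading))
```

Here `img.get("title")` returns None for the icon without a title, where `tag["title"]` would raise a KeyError.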