Web scraping: print only the text of a span class

Date: 2016-04-12 11:36:15

Tags: python web-scraping beautifulsoup

I have the following code:

#!/usr/bin/python
#weather.scraper

from bs4 import BeautifulSoup
import urllib

def main():
    """weather scraper"""
    r = urllib.urlopen("https://www.wunderground.com/history/airport/KPHL/2016/1/1/MonthlyHistory.html?&reqdb.zip=&reqdb.magic=&reqdb.wmo=&MR=1").read()
    soup = BeautifulSoup(r, "html.parser")
    tables = soup.find_all("table", class_="responsive airport-history-summary-table")

    scrapedData = {}
    for table in tables:
        print 'Weather Philadelphia'

        for tr in table.find_all("tr"):
            firstTd = tr.find("td")
            if firstTd and firstTd.has_attr("class") and "indent" in firstTd['class']:
                values = {}
                tds = tr.find_all("td")
                maxVal = tds[1].find("span", class_="wx-value")
                avgVal = tds[2].find("span", class_="wx-value")
                minVal = tds[3].find("span", class_="wx-value")
                print maxVal, avgVal, minVal
                if maxVal:
                    values['max'] = maxVal.text
                if avgVal:
                    values['avg'] = avgVal.text
                if minVal:
                    values['min'] = minVal.text
                if len(tds) > 4:
                    sumVal = tds[4].find("span", class_="wx-value")
                    if sumVal:
                        values['sum'] = sumVal.text
                scrapedData[firstTd.text] = values

    print scrapedData


if __name__ == "__main__":
    main()

What it does: this scraper prints the values of a table on a website. When I run the code, it prints the following:

Weather Philadelphia
<span class="wx-value">18</span> <span class="wx-value">6</span> <span class="wx-value">-2</span>
<span class="wx-value">12</span> <span class="wx-value">1</span> <span class="wx-value">-6</span>
<span class="wx-value">6</span> <span class="wx-value">-3</span> <span class="wx-value">-11</span>
None None None
None None None
None None None
<span class="wx-value">14</span> <span class="wx-value">-7</span> <span class="wx-value">-21</span>
<span class="wx-value">35.6</span> <span class="wx-value">2.5</span> <span class="wx-value">0.0</span>
<span class="wx-value">46</span> <span class="wx-value">8</span> <span class="wx-value">0</span>
<span class="wx-value">61</span> <span class="wx-value">16</span> <span class="wx-value">0</span>
<span class="wx-value">79</span> <span class="wx-value">42</span> <span class="wx-value">27</span>
<span class="wx-value">1038</span> <span class="wx-value">1017</span> <span class="wx-value">993</span>
{u'Cooling Degree Days (base 65)': {}, u'Gust Wind': {'max': u'79', 'avg': u'42', 'min': u'27'}, u'Min Temperature': {'max': u'6', 'avg': u'-3', 'min': u'-11'}, u'Heating Degree Days (base 65)': {}, u'Dew Point': {'max': u'14', 'avg': u'-7', 'min': u'-21'}, u'Growing Degree Days (base 50)': {}, u'Snowdepth': {'max': u'46', 'avg': u'8', 'min': u'0'}, u'Sea Level Pressure': {'max': u'1038', 'avg': u'1017', 'min': u'993'}, u'Max Temperature': {'max': u'18', 'avg': u'6', 'min': u'-2'}, u'Precipitation': {'max': u'35.6', 'sum': u'66.80', 'avg': u'2.5', 'min': u'0.0'}, u'Wind': {'max': u'61', 'avg': u'16', 'min': u'0'}, u'Mean Temperature': {'max': u'12', 'avg': u'1', 'min': u'-6'}}
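The span markup shows up because find() returns a bs4 Tag, and printing a Tag prints its HTML. A minimal sketch of the difference, written for Python 3 and using a small hypothetical HTML snippet shaped like the "Max Temperature" row above (not the real wunderground page):

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modeled on one value cell from the output above.
html = '<td><span class="wx-value">18</span></td>'

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", class_="wx-value")  # a Tag object, not a string

print(span)             # the whole tag: <span class="wx-value">18</span>
print(span.text)        # only the text inside it: 18
print(span.get_text())  # equivalent to .text: 18
```

Note that the scrapedData dict at the end already holds plain strings, because the loop stores maxVal.text and friends rather than the Tag objects.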

But instead of:

<span class="wx-value">18</span> <span class="wx-value">6</span> <span class="wx-value">-2</span>

is it possible to print only the values, without the span class, like this:
18
6
-2

Thanks in advance!

2 answers:

Answer 0 (score: 0)

justtext = maxVal.get_text()

Check the documentation! https://www.crummy.com/software/BeautifulSoup/bs4/doc/
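get_text() is a method of bs4 Tag (and BeautifulSoup) objects; a plain dict such as scrapedData does not have it. A minimal sketch, written for Python 3 and using a hypothetical row modeled on the question's output, of calling it on a whole row and on individual cells:

```python
from bs4 import BeautifulSoup

# Hypothetical row modeled on the "Max Temperature" line in the question's output.
html = """
<table>
  <tr>
    <td class="indent">Max Temperature</td>
    <td><span class="wx-value">18</span></td>
    <td><span class="wx-value">6</span></td>
    <td><span class="wx-value">-2</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tr = soup.find("tr")

# get_text(separator, strip=True) joins every text node under the tag and
# drops the whitespace-only nodes between the <td> elements.
row_text = tr.get_text(" ", strip=True)
print(row_text)

# The same call on each value cell yields just the numbers.
values = [td.get_text(strip=True) for td in tr.find_all("td")[1:]]
for v in values:
    print(v)  # one value per line
```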

Answer 1 (score: 0)

You are printing the maxVal, avgVal and minVal tags themselves.

Instead, try using .text:

if maxVal and avgVal and minVal:
    print maxVal.text, avgVal.text, minVal.text
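Some rows (the "None None None" lines in the question's output) contain no wx-value span, so find() returns None there and calling .text on it raises AttributeError. A minimal sketch, written for Python 3, that skips such rows and prints one value per line; the helper name text_or_none and the HTML snippet are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical rows: the second has no wx-value spans, like the rows
# that printed "None None None" in the question.
html = """
<table>
  <tr><td class="indent">Max Temperature</td>
      <td><span class="wx-value">18</span></td>
      <td><span class="wx-value">6</span></td>
      <td><span class="wx-value">-2</span></td></tr>
  <tr><td class="indent">Heating Degree Days (base 65)</td>
      <td>-</td><td>-</td><td>-</td></tr>
</table>
"""

def text_or_none(td):
    """Return the text of the wx-value span inside td, or None if it has none."""
    span = td.find("span", class_="wx-value")
    return span.text if span is not None else None

soup = BeautifulSoup(html, "html.parser")
results = []
for tr in soup.find_all("tr"):
    tds = tr.find_all("td")
    values = [text_or_none(td) for td in tds[1:4]]  # max, avg, min cells
    if None in values:
        continue  # skip rows without values instead of crashing on None.text
    results.append(values)
    for v in values:
        print(v)
```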