Can't get the table from BeautifulSoup

Time: 2016-08-12 11:07:19

Tags: python beautifulsoup urllib2

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3
import urllib2

url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3_en.php?block_no=47401&view=1'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
table = soup.find('table')  # returns the first <table> on the page
print table

This does not produce the expected table.

I want to grab the following table:

[Screenshot of the desired table]

2 Answers:

Answer 0 (score: 1)

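There are multiple tables in the HTML. Get the second one:

tables = soup.findAll('table')
print tables[1]    # the second table

Or you can get to the table directly through its CSS class:

table = soup.find_all('table', class_='data2_s')
print table    # find_all returns a list of every matching table

Note that the above uses BeautifulSoup 4 (bs4):

from bs4 import BeautifulSoup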
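A minimal sketch of why the index-based and class-based lookups in this answer both reach the right table while find('table') does not. It assumes bs4 with Python's built-in "html.parser" and uses a hypothetical, heavily simplified page in which a layout table comes before the data table; the real JMA markup is not reproduced here:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified page: a layout table first, then the data table
# with class "data2_s", mirroring the situation described in the answers.
HTML = """
<table class="layout"><tr><td>navigation</td></tr></table>
<table class="data2_s">
  <tr><th>Year</th><th>Jan</th></tr>
  <tr><td>1938</td><td>-5.2</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

first = soup.find("table")                            # first <table> in document order
by_index = soup.find_all("table")[1]                  # second <table>, as in Answer 0
by_class = soup.find_all("table", class_="data2_s")   # list of all matching tables

print(first.get("class"))     # ['layout']
print(by_index.get("class"))  # ['data2_s']
print(len(by_class))          # 1
```

The index approach breaks silently if the site adds or removes a table above the data, so matching on the class is usually the more robust choice.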

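Answer 1 (score: 1)

First, use bs4: BeautifulSoup 3 is no longer maintained. Second, the table you want has the class data2_s. Calling find("table") just gets the first table on the page, which is not the one you want: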

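from bs4 import BeautifulSoup
import urllib2

url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3_en.php?block_no=47401&view=1'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid bs4's "no parser specified" warning
# Select the table by its class instead of taking the first <table> on the page
table = soup.select_one("table.data2_s")  # or: table = soup.find("table", class_="data2_s")
print table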
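On Python 3, urllib2 no longer exists; urlopen lives in urllib.request instead. The sketch below is one possible Python 3 version of the code above, assuming bs4 is installed. fetch_html and find_data_table are hypothetical helper names, and splitting the download from the parsing lets the parsing step be tried on any HTML string without network access:

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = "http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3_en.php?block_no=47401&view=1"


def fetch_html(url=URL):
    """Download the raw page bytes (Python 3 counterpart of urllib2.urlopen)."""
    with urlopen(url) as resp:
        return resp.read()


def find_data_table(html):
    """Return the <table class="data2_s"> element, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one takes a CSS selector; soup.find("table", class_="data2_s")
    # would locate the same element.
    return soup.select_one("table.data2_s")
```

For example, print(find_data_table(fetch_html())) prints the table, or None if the page no longer contains a table with that class, so it is worth checking for None before using the result.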

This gives you:

<table class="data2_s"><caption class="m">WAKKANAI   WMO Station ID:47401 Lat 45<sup>o</sup>24.9'N  Lon 141<sup>o</sup>40.7'E</caption><tr><th scope="col">Year</th><th scope="col">Jan</th><th scope="col">Feb</th><th scope="col">Mar</th><th scope="col">Apr</th><th scope="col">May</th><th scope="col">Jun</th><th scope="col">Jul</th><th scope="col">Aug</th><th scope="col">Sep</th><th scope="col">Oct</th><th scope="col">Nov</th><th scope="col">Dec</th><th scope="col">Annual</th></tr><tr class="mtx" style="text-align:right;"><td style="text-align:center">1938</td><td class="data_0_0_0_0">-5.2</td><td class="data_0_0_0_0">-4.9</td><td class="data_0_0_0_0">-0.6</td><td class="data_0_0_0_0">4.7</td><td class="data_0_0_0_0">9.5</td><td class="data_0_0_0_0">11.6</td><td class="data_0_0_0_0">17.9</td><td class="data_0_0_0_0">22.2</td><td class="data_0_0_0_0">16.5</td><td class="data_0_0_0_0">10.7</td><td class="data_0_0_0_0">3.3</td><td class="data_0_0_0_0">-4.7</td><td class="data_0_0_0_0">6.8</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1939</td><td class="data_0_0_0_0">-7.5</td><td class="data_0_0_0_0">-6.6</td><td class="data_0_0_0_0">-1.4</td><td class="data_0_0_0_0">4.0</td><td class="data_0_0_0_0">7.5</td><td class="data_0_0_0_0">13.0</td><td class="data_0_0_0_0">17.4</td><td class="data_0_0_0_0">20.0</td><td class="data_0_0_0_0">17.4</td><td class="data_0_0_0_0">9.7</td><td class="data_0_0_0_0">3.0</td><td class="data_0_0_0_0">-2.5</td><td class="data_0_0_0_0">6.2</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1940</td><td class="data_0_0_0_0">-6.0</td><td class="data_0_0_0_0">-5.7</td><td class="data_0_0_0_0">-0.5</td><td class="data_0_0_0_0">3.5</td><td class="data_0_0_0_0">8.5</td><td class="data_0_0_0_0">11.0</td><td class="data_0_0_0_0">16.6</td><td class="data_0_0_0_0">19.7</td><td class="data_0_0_0_0">15.6</td><td class="data_0_0_0_0">10.4</td><td class="data_0_0_0_0">3.7</td><td class="data_0_0_0_0">-1.0</td><td class="data_0_0_0_0">6.3</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1941</td><td class="data_0_0_0_0">-6.5</td><td class="data_0_0_0_0">-5.8</td><td class="data_0_0_0_0">-2.6</td><td class="data_0_0_0_0">3.6</td><td class="data_0_0_0_0">8.1</td><td class="data_0_0_0_0">11.4</td><td class="data_0_0_0_0">12.7</td><td class="data_0_0_0_0">16.5</td><td class="data_0_0_0_0">16.0</td><td class="data_0_0_0_0">10.0</td><td class="data_0_0_0_0">4.0</td><td class="data_0_0_0_0">-2.9</td><td class="data_0_0_0_0">5.4</td></tr>
<tr class="mtx" style="text-align:right;"><td style="text-align:center">1942</td><td class="data_0_0_0_0">-7.8</td><td class="data_0_0_0_0">-8.2</td><td class="data_0_0_0_0">-0.8</td><td class="data_0_0_0_0">3.5</td><td class="data_0_0_0_0">7.1</td><td class="data_0_0_0_0">12.0</td><td class="data_0_0_0_0">17.4</td><td class="data_0_0_0_0">18.4</td><td class="data_0_0_0_0">15.7</td><td class="data_0_0_0_0">10.5</td><td class="data_0_0_0_0">2.5</td><td class="data_0_0_0_0">-2.9</td><td class="data_0_0_0_0">5.6</td></tr>
etc.
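If you need the values rather than the markup, one option is to walk the rows of the table shown above. This is a sketch under assumptions: it works on an abbreviated copy of that output (header plus the 1938 and 1939 rows, first three months only), relies on the data rows carrying class="mtx" as they do there, and table_to_rows is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the output above: header plus two data rows,
# cut down to the first three months.
HTML = """
<table class="data2_s">
<tr><th scope="col">Year</th><th scope="col">Jan</th><th scope="col">Feb</th><th scope="col">Mar</th></tr>
<tr class="mtx"><td>1938</td><td class="data_0_0_0_0">-5.2</td><td class="data_0_0_0_0">-4.9</td><td class="data_0_0_0_0">-0.6</td></tr>
<tr class="mtx"><td>1939</td><td class="data_0_0_0_0">-7.5</td><td class="data_0_0_0_0">-6.6</td><td class="data_0_0_0_0">-1.4</td></tr>
</table>
"""


def table_to_rows(table):
    """Return (header, rows): header text from <th> cells, rows from <tr class="mtx">."""
    header = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr", class_="mtx"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return header, rows


soup = BeautifulSoup(HTML, "html.parser")
header, rows = table_to_rows(soup.select_one("table.data2_s"))
print(header)   # ['Year', 'Jan', 'Feb', 'Mar']
print(rows[0])  # ['1938', '-5.2', '-4.9', '-0.6']
```

The cells are kept as strings; convert them with float() only after checking that every cell holds a plain number.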