I have this code:
from bs4 import BeautifulSoup
import urllib2
from lxml import html
from lxml.etree import tostring
trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=KJFK&std_trans=translated&chk_metars=on&hoursStr=most+recent+only&chk_tafs=on&submit=Submit').read()
soup = BeautifulSoup(open(trees))
print soup.get_text()
item=soup.findAll(id="info")
print item
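However, when I type soup in my console window it gives me an error, and when my program runs it prints a very long block of HTML code, and so on. Any help would be great.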
Answer 0 (score: 0)
The first problem is in this part:
trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=KJFK&std_trans=translated&chk_metars=on&hoursStr=most+recent+only&chk_tafs=on&submit=Submit').read()
soup = BeautifulSoup(open(trees))
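trees is already a string holding the page's HTML (the return value of .read()), not a filename, so there is no need to call open() on it. Pass it to BeautifulSoup directly: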
soup = BeautifulSoup(trees, "html.parser")
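We also explicitly set html.parser as the underlying parser, so the result does not depend on which parsers happen to be installed.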
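To illustrate the point about open(), here is a minimal Python 3 sketch. It assumes bs4 is installed; the html_text snippet and variable names are made up for the example and are not taken from the weather page. It shows that BeautifulSoup accepts either a markup string or a file-like object, while open() treats its argument as a path and fails when handed HTML.

```python
import io

from bs4 import BeautifulSoup

# Illustrative markup; the real page is much larger.
html_text = "<html><body><p id='info'>hello</p></body></html>"

# BeautifulSoup accepts the markup itself as a string...
soup_from_string = BeautifulSoup(html_text, "html.parser")

# ...or any object with a read() method, such as an open file or StringIO.
soup_from_file = BeautifulSoup(io.StringIO(html_text), "html.parser")

print(soup_from_string.find(id="info").get_text())
print(soup_from_file.find(id="info").get_text())

# open() interprets its argument as a filesystem path, so passing HTML
# text to it raises an OSError subclass instead of returning a file.
try:
    open(html_text)
except OSError as exc:
    print("open() failed:", type(exc).__name__)
```

Both soups contain the same tree, so which form to pass only depends on whether you already called .read() on the response.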
Then you need to be specific about what you want to extract from the page. Here is sample code that gets the METAR text value:
from bs4 import BeautifulSoup
import urllib2
trees = urllib2.urlopen('http://aviationweather.gov/adds/metars/index?station_ids=KJFK&std_trans=translated&chk_metars=on&hoursStr=most+recent+only&chk_tafs=on&submit=Submit').read()
soup = BeautifulSoup(trees, "html.parser")
item = soup.find("strong", text="METAR text:").find_next("strong").get_text(strip=True).replace("\n", "")
print item
This prints KJFK 220151Z 20016KT 10SM BKN250 24/21 A3007 RMK AO2 SLP183 T02440206.
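The code above is Python 2 (urllib2 and the print statement). Below is a hedged Python 3 sketch of the same lookup that runs offline. The function name extract_metar and the SAMPLE_HTML markup are hypothetical, written only to mimic a "METAR text:" label followed by a strong element; the live page's layout may differ or may have changed.

```python
from bs4 import BeautifulSoup


def extract_metar(html):
    """Return the text of the <strong> after the 'METAR text:' label, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # string= is the current name of the older text= argument.
    label = soup.find("strong", string="METAR text:")
    if label is None:
        return None
    value = label.find_next("strong")
    if value is None:
        return None
    return value.get_text(strip=True).replace("\n", "")


# Hypothetical markup standing in for the downloaded page.
SAMPLE_HTML = """
<table>
  <tr>
    <td><strong>METAR text:</strong></td>
    <td><strong>KJFK 220151Z 20016KT 10SM BKN250 24/21 A3007 RMK AO2 SLP183 T02440206</strong></td>
  </tr>
</table>
"""

print(extract_metar(SAMPLE_HTML))
```

With a live page, you would pass the result of urllib.request.urlopen(url).read() to extract_metar instead of SAMPLE_HTML. Returning None when the label is missing avoids the AttributeError that the chained one-liner raises if the page layout changes.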