I am trying to extract content from <li> tags on this site: http://snowload.atcouncil.org/index.php/component/vcpsnowload/item

I want to extract the content for different cities by entering an address.
Query Date : August 04, 2017
Address : gilbert
Latitude : 33.3528264
Longitude : -111.789027
Elevation : 0 Feet
Elevation Limitation: ASCE 7* Ground Snow Load
Elevation ≤ 2,000 feet: Ground Snow Load is 0 psf
请找到我尝试提取内容的方法。
import requests
from bs4 import BeautifulSoup

page = requests.get("http://snowload.atcouncil.org/index.php/component/vcpsnowload/item")
soup = BeautifulSoup(page.content, 'html.parser')
# the query results are rendered inside a div with class "span5"
div = soup.find("div", attrs={'class': 'span5'})
print(div.text)
The problem I am facing is that it does not extract everything, only the Query Date. I also tried different parsers such as 'html.parser', 'html5lib' and 'lxml', which all produce the same result. Solutions using Selenium with Python are welcome too.
Answer 0 (score: 1)
You need to use the HTTP POST method and send the location in the form data.
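A minimal sketch of the payload; the field names come from the full script below:

data = {'optionCoordinate': '2', 'coordinate_address': 'gilbert'}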
It seems there are some characters my terminal cannot print, so I added .encode(sys.stdout.encoding, errors='replace') to the print call.
Update: from there you can get the li elements:

import requests
from bs4 import BeautifulSoup
import sys
# optionCoordinate selects the search mode; coordinate_address is the address to look up
data = {'optionCoordinate': '2', 'coordinate_address': 'gilbert'}
page = requests.post("http://snowload.atcouncil.org/index.php/component/vcpsnowload/item", data=data)
soup = BeautifulSoup(page.content,'html.parser')
div = soup.find("div",attrs={'class':'span5'})
print (div.text.encode(sys.stdout.encoding, errors='replace'))
Updated again, to write the results to CSV; the same .encode(sys.stdout.encoding, errors='replace') trick applies wherever the output encoding cannot represent a character.
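A minimal sketch of the CSV step, reusing the POST request above; the filename snow_load.csv and the two-column split are illustrative assumptions:

import csv
import requests
from bs4 import BeautifulSoup

data = {'optionCoordinate': '2', 'coordinate_address': 'gilbert'}
page = requests.post("http://snowload.atcouncil.org/index.php/component/vcpsnowload/item", data=data)
soup = BeautifulSoup(page.content, 'html.parser')
div = soup.find('div', attrs={'class': 'span5'})

# each li renders as "Label : value"; splitting once on ':' gives two columns (assumption)
with open('snow_load.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for li in div.find_all('li'):
        writer.writerow([part.strip() for part in li.text.split(':', 1)])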
Answer 1 (score: 0)
This code will get the text inside every <li></li> you are targeting on that page.
from bs4 import BeautifulSoup as BS
from requests import get
site = "http://snowload.atcouncil.org/index.php/component/vcpsnowload/item"
req = get(site)
soup = BS(req.text, 'html.parser')
# the result rows live in a ul with class "map-info" (attrs must be a dict, not a set)
ul = soup.find('ul', attrs={'class': 'map-info'})
list_items = ul.find_all('li')
for li in list_items:
    print(li.text)
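Note that a plain GET only returns the Query Date row, as the question describes; combining this loop with the POST request from Answer 0 returns the populated list. A minimal sketch under that assumption:

from bs4 import BeautifulSoup as BS
from requests import post

site = "http://snowload.atcouncil.org/index.php/component/vcpsnowload/item"
# form fields taken from Answer 0; 'gilbert' is just an example address
data = {'optionCoordinate': '2', 'coordinate_address': 'gilbert'}
req = post(site, data=data)

soup = BS(req.text, 'html.parser')
ul = soup.find('ul', attrs={'class': 'map-info'})
for li in ul.find_all('li'):
    print(li.text.strip())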
Answer 2 (score: 0)
An automated solution for extracting the content from the li tags:
from bs4 import BeautifulSoup
import requests
from selenium import webdriver

chrome_path = r"/usr/bin/chromedriver"
driver = webdriver.Chrome(chrome_path)
driver.get("http://snowload.atcouncil.org/")

# switch the form to search by address
driver.find_element_by_xpath('//*[@id="adminForm"]/fieldset/div/div[2]/div[2]/label').click()
driver.find_element_by_xpath('//*[@id="coordinate_address"]').click()

cities = ['phoenix']
for city in cities:
    print(city)
    driver.find_element_by_xpath('//*[@id="coordinate_address"]').send_keys(city)
    driver.find_element_by_xpath('//*[@id="adminForm"]/fieldset/div/div[2]/button').click()
    # re-request the results page with requests, posting the current city
    url = driver.current_url
    data = {'optionCoordinate': '2', 'coordinate_address': city}
    page = requests.post(url, data=data)
    soup = BeautifulSoup(page.content, 'html.parser')
    div = soup.find('div', attrs={'class': 'span5'})
    for li in div.find_all('li'):
        print(li.text)

driver.close()
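Since Selenium has already submitted the form, the second request via requests is not strictly necessary; the rendered page can be parsed directly. A sketch of that variant, assuming the same span5 markup and the driver/loop from the script above:

from bs4 import BeautifulSoup

# inside the loop, in place of the requests.post block:
# parse the page Selenium is currently displaying
soup = BeautifulSoup(driver.page_source, 'html.parser')
div = soup.find('div', attrs={'class': 'span5'})
for li in div.find_all('li'):
    print(li.text)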