Using Python

Time: 2016-12-21 14:45:29

Tags: python html beautifulsoup mechanize

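I am using a Python program to scrape data from here. I have done this successfully before, but this page is proving to be a challenge. I am using Beautiful Soup and mechanize. I need to be able to enter a zip code into a text box so that results are generated afterwards.

This is the snippet that contains the input text box: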



<div id="ContentPlaceHolder1_C001_pnlFindACenter" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ContentPlaceHolder1_C001_btnSearchClient')">
		
        <div style="width: 400px; float: left; padding-top: 5px;">
            <label for="ContentPlaceHolder1_C001_tbUserAddress" style="font-family: Arial; font-size: 13.3333px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; line-height: normal; text-decoration: none; text-transform: none; color: rgb(0, 0, 0); cursor: auto; display: inline-block; position: relative; z-index: 100; margin-right: -121px; left: 2px; top: 0px; opacity: 1;">Address, City or Zip:</label><input name="ctl00$ContentPlaceHolder1$C001$tbUserAddress" type="text" id="ContentPlaceHolder1_C001_tbUserAddress" class="textInField" style="width: 240px; background-image: url(&quot;&quot;); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%; cursor: auto;" data-hasqtip="21" oldtitle="Address, City or Zip:" title="" autocomplete="off" aria-describedby="qtip-21">
            <div id="divDistance" style="display: inline;">
                &nbsp;&nbsp;within&nbsp;&nbsp;
            <select name="ctl00$ContentPlaceHolder1$C001$ddlRadius" id="ContentPlaceHolder1_C001_ddlRadius">
			<option value="5">5</option>
			<option value="10">10</option>
			<option selected="selected" value="25">25</option>
			<option value="50">50</option>
			<option value="100">100</option>

		</select>
                miles
            </div>
        </div>
        <div style="width: 160px; float: left;">
            &nbsp;&nbsp;&nbsp;
            <input type="submit" name="ctl00$ContentPlaceHolder1$C001$btnSearchClient" value="Search" onclick="GeocodeLocation();return false;WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$C001$btnSearchClient&quot;, &quot;&quot;, false, &quot;&quot;, &quot;find-a-center&quot;, false, false))" id="ContentPlaceHolder1_C001_btnSearchClient" class="btnCenter">
        </div>
        <div style="clear: both;">
        </div>
        <div>
            <span onchange="" style="font-size:12px;display: inline;" data-hasqtip="22" oldtitle="<b>AASM SleepTM</b> is an innovative telemedicine system that brings your sleep doctor to you. Featuring a secure, web-based video platform, AASM SleepTM allows you to meet with your sleep doctor from a distance. These live video visits will save you time and money. AASM SleepTM also syncs with Fitbit sleep data and has an integrated sleep diary, enabling you and your doctor to monitor your sleep." title="" aria-describedby="qtip-22"><input id="ContentPlaceHolder1_C001_chkSleepTM" type="checkbox" name="ctl00$ContentPlaceHolder1$C001$chkSleepTM"><label for="ContentPlaceHolder1_C001_chkSleepTM">Only show AASM SleepTM capable sleep centers in my state</label></span>
            <a href="https://sleeptm.com/" style="font-size: 10px; margin-left: 10px; display: inline;" target="_blank" data-hasqtip="23" oldtitle="<b>AASM SleepTM</b> is an innovative telemedicine system that brings your sleep doctor to you. Featuring a secure, web-based video platform, AASM SleepTM allows you to meet with your sleep doctor from a distance. These live video visits will save you time and money. AASM SleepTM also syncs with Fitbit sleep data and has an integrated sleep diary, enabling you and your doctor to monitor your sleep." title="" aria-describedby="qtip-23">What is AASM SleepTM?</a>
        </div>
    
	</div>

These are my attempts so far:

url = 'http://www.sleepeducation.org/find-a-facility'
MILES = '100'
CODE = '33060'

Attempt 1

first = urllib2.Request(url,
                   data=urllib.urlencode({'value': CODE}),
                   headers={'User-Agent' : 'Google Chrome'                             'Cookie': 'name = ctl00$ContentPlaceHolder1$C001$tbUserAddress'})

Attempt 2

post_params = {
       'ctl00$ContentPlaceHolder1$C001$tbUserAddress': CODE
}
first = urllib.urlencode(post_params)

driver = webdriver.Chrome()
driver.get(url)
sbox = driver.find_element_by_class_name("ctl00$ContentPlaceHolder1$C001$tbUserAddress")
sbox.send_keys(CODE)
driver.find_element_by_class_name("ctl00$ContentPlaceHolder1$C001$btnSearchClient").click()

Attempt 3

br = mechanize.Browser()
br.open(url)
br.select_form(name='ctl00$ContentPlaceHolder1$C001$tbUserAddress')
br['value'] = CODE
br.submit()

http = urllib2.urlopen(br.response())
soup = BeautifulSoup(http, "html5lib")
  

Error = "no form matching name 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'"

Attempt 4

soup.find('input', {'name': 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'})['value'] = CODE
soup.find('input', {'name': 'ctl00$ContentPlaceHolder1$C001$btnSearchClient'}).click()

2 Answers:

Answer 0 (score: 1)

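If I understand your question correctly, you want to send a request with specific parameters and inspect the response. OK, let's look at the request that is sent when the form is submitted. Let's open Postman. [Post request params]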

We can see that ctl00$ContentPlaceHolder1$C001$tbUserAddress receives the value 100, while ctl00$ContentPlaceHolder1$T6B6681F0008$ddlRadius, ctl00$ContentPlaceHolder1$C001$ddlRadius and ctl00$cphTopBar$T917BC451013$rblRadius receive the radius value 25.
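As a rough illustration of what such a request body looks like on the wire, the field names can be URL-encoded with the standard library. This is only a sketch: the address value '33060' is the zip code from the question, not the value seen in the captured request, and only two of the fields are shown.

```python
from urllib.parse import urlencode, parse_qs

# Field names as they appear in the form; values are placeholders for illustration.
fields = {
    'ctl00$ContentPlaceHolder1$C001$tbUserAddress': '33060',
    'ctl00$ContentPlaceHolder1$C001$ddlRadius': '25',
}

# urlencode percent-encodes the '$' in each field name as %24.
body = urlencode(fields)
print(body)

# Decoding the body recovers the original names (each value comes back as a list).
decoded = parse_qs(body)
print(decoded)
```

Libraries such as requests perform this encoding automatically when a dict is passed as data, so the step is shown here only to make the field naming visible.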

So let's put together the data to send in the POST request and get the response we need.

I use Python requests, with lxml to parse the HTML response.

I prefer lxml: it is harder to understand, but much faster than BeautifulSoup.

import requests
from lxml import html

input_data = {
    'ctl00$cphTopBar$T917BC451013$rblRadius': 25,
    'ctl00$ContentPlaceHolder1$T6B6681F0008$ddlRadius': 25,
    'ctl00$ContentPlaceHolder1$C001$ddlRadius': 25,
    'ctl00$ContentPlaceHolder1$C001$tbUserAddress': 100
}
resp = requests.post('http://www.sleepeducation.org/find-a-facility', data=input_data)
tree = html.fromstring(resp.text)
print(tree.xpath('//div[@id="ContentPlaceHolder1_C001_map_canvas"]')[0])
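The markup in the question calls WebForm_DoPostBackWithOptions, which suggests an ASP.NET WebForms page. Such pages commonly carry hidden inputs (for example __VIEWSTATE and __EVENTVALIDATION) that the server expects to receive back with the POST. Whether this particular site requires them is an assumption, so the following is only a sketch of how they could be collected and merged with the visible fields. The names HiddenInputCollector, build_payload and sample_html are hypothetical, and the HTML fragment is made up for illustration.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attr = dict(attrs)
        # Attributes without a value are reported as None, hence the "or ''".
        if (attr.get('type') or '').lower() == 'hidden' and attr.get('name'):
            self.fields[attr['name']] = attr.get('value') or ''


def build_payload(page_html, visible_fields):
    """Merge the page's hidden inputs with the fields we want to set."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload.update(visible_fields)
    return payload


# Made-up page fragment; a real page would be fetched first.
sample_html = '''
<form method="post" action="./find-a-facility">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$ContentPlaceHolder1$C001$tbUserAddress" />
</form>
'''

payload = build_payload(sample_html, {
    'ctl00$ContentPlaceHolder1$C001$tbUserAddress': '33060',
    'ctl00$ContentPlaceHolder1$C001$ddlRadius': '25',
})
print(sorted(payload))
```

In practice the page would be fetched with requests.get and the resulting payload passed to requests.post as data. Note also that the Search button's onclick begins with GeocodeLocation();return false;, so the results may be produced by JavaScript in the browser, in which case a plain POST might not return them.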

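I don't have enough reputation to post links to the documentation. I will try to put them in the comments, or you can simply google "python requests" and "python lxml". You can also do this with BeautifulSoup:
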
from bs4 import BeautifulSoup  # BeautifulSoup 4; the old "import BeautifulSoup" is version 3
import requests

input_data = {
    'ctl00$cphTopBar$T917BC451013$rblRadius': 25,
    'ctl00$ContentPlaceHolder1$T6B6681F0008$ddlRadius': 25,
    'ctl00$ContentPlaceHolder1$C001$ddlRadius': 25,
    'ctl00$ContentPlaceHolder1$C001$tbUserAddress': 100
}
resp = requests.post('http://www.sleepeducation.org/find-a-facility', data=input_data)
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(resp.text, "html.parser")
print(soup.find("div", {"id": "ContentPlaceHolder1_C001_map_canvas"}))

Answer 1 (score: 0)

This worked for me:

from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.sleepeducation.org/find-a-facility'
CODE = '33060'  # zip code from the question
subButton = 'ContentPlaceHolder1_C001_btnSearchClient'
addyName = 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'
addyId = 'ContentPlaceHolder1_C001_tbUserAddress'

def usingChromeSelenium():
    # Raw string so the backslashes in the Windows path are not treated as escapes
    driver = webdriver.Chrome(r'C:\Users\documents\chromedriver.exe')
    driver.get(url)
    sleep(1)
    driver.find_element_by_name(addyName).send_keys(CODE)
    driver.find_element_by_id(subButton).click()
    sleep(1)
    html = driver.page_source
    return html

results = usingChromeSelenium()
soup = BeautifulSoup(results, "html.parser")

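For "webdriver.Chrome()", you have to download the chromedriver.exe application file and put the path to that file inside the parentheses. The path will probably be different on your machine.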
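To see why looking the text box up by name or id works where the question's Selenium attempt with find_element_by_class_name did not, it can help to list the attributes of the input element. The sketch below uses only the standard library on a trimmed copy of the text box from the question (style and tooltip attributes omitted). The class InputAttributes is a hypothetical helper, not part of Selenium or BeautifulSoup.

```python
from html.parser import HTMLParser


class InputAttributes(HTMLParser):
    """Record the attributes of every <input> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            self.inputs.append(dict(attrs))


# Trimmed copy of the text box markup from the question.
snippet = ('<input name="ctl00$ContentPlaceHolder1$C001$tbUserAddress" '
           'type="text" id="ContentPlaceHolder1_C001_tbUserAddress" '
           'class="textInField">')

parser = InputAttributes()
parser.feed(snippet)
box = parser.inputs[0]

print('name :', box['name'])
print('id   :', box['id'])
print('class:', box['class'])
```

The string beginning with ctl00$ is the element's name attribute, while its class attribute is textInField, so a class-name lookup using the ctl00$ string has nothing to match.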