Scraping a dynamic JavaScript site in Python does not work

Time: 2015-11-03 03:17:40

Tags: javascript python web-scraping beautifulsoup python-requests

I am trying to scrape this site:

http://courier.correos.cl/Tarificador/aspx/Cep.aspx?s=1&lsrv=20&tipo=1

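The goal is to iterate over the two drop-down menus. As a first step, I am trying to submit just one combination from Python, but it does not work. I am using requests and BeautifulSoup4.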
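
For the eventual goal of covering every pair of drop-down values, a minimal sketch of the iteration could look like the following. It assumes the two drop-downs have already been read into name-to-value dictionaries; the helper name `iter_combinations` and the sample values are hypothetical and not taken from the site.

```python
from itertools import product


def iter_combinations(origins, destinations):
    """Yield (origin_name, origin_value, destination_name, destination_value)
    for every origin/destination pair."""
    for (o_name, o_val), (d_name, d_val) in product(origins.items(), destinations.items()):
        yield o_name, o_val, d_name, d_val


# Hypothetical sample mappings; the real ones would come from the <option> tags.
origins = {"ALGARROBO": "1", "ARICA": "2"}
destinations = {"ACHAO": "10", "ANCUD": "11"}

combos = list(iter_combinations(origins, destinations))
for o_name, o_val, d_name, d_val in combos:
    print(o_name, "->", d_name)
```

Each tuple would then feed one POST request. Since ASP.NET pages usually issue fresh `__VIEWSTATE` and `__EVENTVALIDATION` values with each response, it is probably safer to re-read those fields between requests than to reuse the first ones.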

from bs4 import BeautifulSoup
import requests

url = 'http://courier.correos.cl/Tarificador/aspx/Cep.aspx?s=1&lsrv=20&tipo=1'

with requests.Session() as session:
    # update() keeps the default requests headers instead of replacing them all
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Linux; U; Android 4.0.3; ko-kr; LG-L160L Build/IML74K) AppleWebkit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30',
        'X-Requested-With': 'XMLHttpRequest'
    })
    response = session.get(url)
    # name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(response.content, "html.parser")

    # build name -> value mappings for both drop-downs, skipping the first <option>
    OriginenOptions = {
        option.get_text(strip=True): option['value']
        for option in soup.select("select#ctl00_ContentPlaceHolder1_ddlOrigenComuna option")[1:]
    }
    DestinoOptions = {
        option.get_text(strip=True): option['value']
        for option in soup.select("select#ctl00_ContentPlaceHolder1_ddlDestino option")[1:]
    }

    form = soup.find("form", id="aspnetForm")

    Origen = 'ALGARROBO'
    Destino = 'ACHAO'
    Peso = 1000
    Largo = 10
    Alto = 10
    Ancho = 10
    Tipo = 1

    params = {
        'ctl00$ContentPlaceHolder1$ddlOrigenComuna': OriginenOptions.get(Origen),
        'ctl00$ContentPlaceHolder1$ddlDestino': DestinoOptions.get(Destino),
        '__ASYNCPOST': 'true',
        'ctl00$ContentPlaceHolder1$ScriptManager1': 'tctl00$ContentPlaceHolder1$UpdatePanel1|tctl00$ContentPlaceHolder1$updTarifas',
        'ctl00$ContentPlaceHolder1$txtPeso': Peso,
        'ctl00$ContentPlaceHolder1$txtLargo': Largo,
        'ctl00$ContentPlaceHolder1$txtAncho': Ancho,
        'ctl00$ContentPlaceHolder1$txtAlto': Alto,
        'ctl00$ContentPlaceHolder1$rbtlTipoEnvio': Tipo,
        '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$btnCotizar',
        '__EVENTARGUMENT': form.find('input', {'name': '__EVENTARGUMENT'})['value'],
        '__LASTFOCUS': '',
        '__VIEWSTATE': form.find('input', {'name': '__VIEWSTATE'})['value'],
        '__VIEWSTATEGENERATOR': form.find('input', {'name': '__VIEWSTATEGENERATOR'})['value'],
        '__VIEWSTATEENCRYPTED': '',
        '__EVENTVALIDATION': form.find('input', {'name': '__EVENTVALIDATION'})['value']
    }

    response = session.post(url, data=params)
    # parse the results
    soup = BeautifulSoup(response.content, "html.parser")

    for row in soup.select("table#ctl00_ContentPlaceHolder1_GridView1 tr")[1:]:
        print(row.find_all("td")[1].text)
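
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: when `__ASYNCPOST` is `true`, ASP.NET AJAX pages typically answer with a pipe-delimited "delta" body of the form `length|type|id|content|` instead of a full HTML page, so looking for the results table in the raw response may come up empty. The sketch below splits such a body into its parts. The function name `parse_delta` and the sample string are made up for illustration, and the real response from this site may differ.

```python
def parse_delta(text):
    """Split a 'length|type|id|content|' stream into (type, id, content) tuples.

    The length field counts the characters of the content, which may itself
    contain '|' characters, so the content is sliced by length, not split.
    """
    parts = []
    pos = 0
    while pos < len(text):
        sep = text.index("|", pos)
        length = int(text[pos:sep])
        pos = sep + 1

        sep = text.index("|", pos)
        kind = text[pos:sep]
        pos = sep + 1

        sep = text.index("|", pos)
        ident = text[pos:sep]
        pos = sep + 1

        content = text[pos:pos + length]
        pos += length + 1  # skip the content and its trailing '|'
        parts.append((kind, ident, content))
    return parts


# Made-up sample in the assumed format, not captured from the real site.
html = "<table id='t'><tr><td>a|b</td></tr></table>"
sample = f"{len(html)}|updatePanel|panel1|{html}|" + "3|hiddenField|__VIEWSTATE|abc|"

parts = parse_delta(sample)
panels = {ident: content for kind, ident, content in parts if kind == "updatePanel"}
```

If the response does have this shape, the HTML fragment of the relevant update panel (here `panels["panel1"]`) could be handed to BeautifulSoup in place of the whole response body. Removing `__ASYNCPOST` and the `X-Requested-With` header to request a normal full-page postback might be a simpler alternative to try first.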

0 answers:

No answers yet