How do I send data to a specific area of an HTML page using requests in Python 3?

Time: 2019-07-15 12:44:30

Tags: python html web-scraping python-requests

I am trying to send data (a string) to the amazon.com search box, which is located at the following XPath:

//input[@id="twotabsearchtextbox"] 

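I want to do this with Python requests. I would like to be able to push any data into the search box as if I were a normal user, e.g. typing "Apple Watch" into the search box.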

Here is my code:

import requests
from lxml import html
url = "https://www.amazon.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ''AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}

page = requests.get(url,headers=headers)
tree = html.fromstring(page.content)
search_box = tree.xpath('//input[@id="twotabsearchtextbox"]')
print(search_box)

I get a good response code (200, which I have checked), and the XPath query returns the element:

[<InputElement 1b7eaea45e8 name='field-keywords' type='text'>]

My question is: how do I push the data using requests rather than Selenium or Scrapy? Thanks.

1 Answer:

Answer 0 (score: 3)

Getting search suggestions

Edit: It seems the OP wants search suggestions, so here is how to get them.

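To make an AJAX request to the suggestions endpoint you need the marketplace ID (mid), alias, and prefix parameters. You can extract the marketplace ID from the HTML using re.

You can find the original request by opening your browser's Developer Tools (F12), switching to the Network tab, and typing some text in the search box. You will see a request to completion.amazon.com.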

from bs4 import BeautifulSoup
import requests
from urllib.parse import quote
import re
from pprint import pprint

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'}
def get_html(url: str) -> str:
    res = requests.get(url, headers=headers)
    res.raise_for_status()
    html = res.text
    return html

def get_marketplace_id(html: str) -> str:
    # Raw string so that \s reaches the regex engine without an invalid-escape warning
    return re.search(r'obfuscatedMarketId:\s*"([^"]+)"', html).group(1)

def get_suggestions(mid: str, keyword: str) -> list:
    url = f'https://completion.amazon.com/api/2017/suggestions?lop=en_US&mid={mid}&alias=aps&prefix={quote(keyword)}'
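    # headers must be passed by keyword; the second positional argument of requests.get is params
    res = requests.get(url, headers=headers)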
    res.raise_for_status()
    data = res.json()
    suggestions_raw = data['suggestions']
    suggestions = []
    for it in suggestions_raw:
        suggestions.append(it['value'])
    return suggestions

html = get_html('https://www.amazon.com')
mid = get_marketplace_id(html)
pprint(get_suggestions(mid, 'apple watch'))

Output:

['apple watch band 38mm',
 'apple watch',
 'apple watch band 42mm',
 'apple watch charger',
 'apple watch band',
 'apple watch series 3',
 'apple watch series 4',
 'apple watch band 44mm series 4',
 'apple watch screen protector',
 'apple watch band 40mm series 4']
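
The two offline steps of this approach (pulling the marketplace ID out of the page and assembling the suggestions URL) can be tried without any network access. The following is only a minimal sketch: SAMPLE_HTML and the ID value in it are made up for illustration, the helper names extract_marketplace_id and build_suggestions_url are hypothetical, and the real page markup may differ or change over time.

```python
import re
from urllib.parse import quote, urlencode

# Hypothetical fragment imitating an inline script on the home page;
# the ID value is invented for illustration only.
SAMPLE_HTML = 'var cfg = {obfuscatedMarketId: "EXAMPLEMARKETID", other: 1};'

MARKET_ID_RE = re.compile(r'obfuscatedMarketId:\s*"([^"]+)"')


def extract_marketplace_id(html_text):
    """Return the marketplace ID, or None if the pattern is not found."""
    match = MARKET_ID_RE.search(html_text)
    return match.group(1) if match else None


def build_suggestions_url(mid, keyword):
    """Assemble the suggestions URL with the same parameters as the code above."""
    params = {'lop': 'en_US', 'mid': mid, 'alias': 'aps', 'prefix': keyword}
    # quote_via=quote encodes spaces as %20, matching quote() in the f-string version
    return ('https://completion.amazon.com/api/2017/suggestions?'
            + urlencode(params, quote_via=quote))


mid = extract_marketplace_id(SAMPLE_HTML)
print(mid)
print(build_suggestions_url(mid, 'apple watch'))
print(extract_marketplace_id('<html></html>'))  # no match, so None
```

Returning None when the pattern is missing avoids the AttributeError that calling .group(1) on a failed search would raise, which may be useful if the page layout changes.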


Getting search results

A simpler approach is to build the search URL instead:

search_url = 'https://www.amazon.com/s'
page = requests.get(search_url, headers=headers, params={'k': 'apple watch'})

This gives you the search results directly and saves you a request.
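
To see what "typing into the search box" amounts to at the HTTP level, it can help to look at the URL that the params dictionary turns into. The sketch below uses only the standard library and a hypothetical helper build_search_url; it assumes that requests encodes the k parameter in an equivalent way, which is worth confirming by printing page.url after a real request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_URL = 'https://www.amazon.com/s'


def build_search_url(keyword):
    """Append the keyword as the k query parameter of the search URL."""
    return SEARCH_URL + '?' + urlencode({'k': keyword})


url = build_search_url('apple watch')
print(url)

# Decoding the query string recovers the original keyword
print(parse_qs(urlsplit(url).query))
```

In other words, the search text is just a query parameter on a GET request, so no interaction with the input element itself is needed.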