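I know that requests can't actually click a button the way Selenium does, but I want to use Python requests for a program I'm writing. As I understand it, clicking a button basically submits a request to the server, in this case to add an item to the watch list (using eBay as an example). I assumed this has to be a POST. However, when I look for the POST URL for a listing like this one
and check the Network tab in the dev tools, I can't find a POST request. I only see GET requests. Am I doing something wrong?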
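Answer 0 (score: 2)
"When I check the Network tab in the dev tools, I can't find a POST request. I only see GET requests. Am I doing something wrong?"
No. You don't see a POST request because the data you send is passed as a query string, which means it is encoded in the URL.
If you open the dev tools, click the button (or even a link), and then select the first request, you can go to the params section and see all the data that was sent.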
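If you would rather copy the request URL out of the Network tab than retype each parameter, the standard library can split it for you. This is a minimal sketch: the URL below is a shortened, illustrative one built from a few of the values in this answer, not the full link eBay generates.

```python
from urllib.parse import parse_qsl, urlsplit

# Illustrative URL: a shortened stand-in for what you would copy
# from the Network tab (assumption, not the real full link).
request_url = (
    "https://www.ebay.com/myb"
    "?item=302495341359&rt=nc&etn=Watch+list&tagId=-99"
)

parts = urlsplit(request_url)

# Everything before the "?" is the URL you pass to requests.
base_url = parts._replace(query="").geturl()

# Everything after the "?" becomes the params dict ("+" decodes to a space).
params = dict(parse_qsl(parts.query))

print(base_url)
print(params)
```

The resulting `base_url` and `params` can then be passed to `requests.get(base_url, params=params)`.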
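With the requests module, form data is sent in the body of a POST request through the data keyword, while parameters encoded in the URL are passed through the params keyword, which works with both GET and POST.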
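To see the difference without sending anything, you can prepare the requests and inspect them. This is only a sketch: `https://example.com/watch` is a placeholder endpoint, and the parameter names are borrowed from this answer purely for illustration.

```python
import requests

# Placeholder endpoint (assumption); nothing is sent over the network
# because the requests are only prepared, never dispatched.
url = "https://example.com/watch"

# params= is encoded into the URL as a query string.
get_req = requests.Request(
    "GET", url, params={"item": "302495341359", "rt": "nc"}
).prepare()

# data= is form-encoded into the request body; params= still goes in the URL.
post_req = requests.Request(
    "POST", url, params={"rt": "nc"}, data={"item": "302495341359"}
).prepare()

print(get_req.url)    # query string carries the data
print(get_req.body)   # no body for this GET
print(post_req.url)   # only the params= part is in the URL
print(post_req.body)  # the data= part is in the body
print(post_req.headers["Content-Type"])
```

So a "button click" that shows up as a GET in the Network tab simply has all of its data in the first form.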
For example, this is exactly the request that is made when I add an item to the watch list:
import requests

url = 'https://www.ebay.com/myb'
headers = {
'Host': 'www.ebay.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko'
'/20100101 Firefox/61.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q='
'0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Referer': 'https://www.ebay.com/itm/Rockville-HTS56-1000w-5-1-Channel-'
'Home-Theater-System-Bluetooth-USB-8-Subwoofer/302495341359?_trkpar'
'ms=pageci%3Ac45ae602-b2bb-11e8-91a9-74dbd1807185%7Cparentrq%3Ab4e1'
'b5ee1650a9ccac0001e5fffeeebd%7Ciid%3A1',
'Cookie': 'JSESSIONID=6E3675DDC01917342E915A485425DE14; ebay=%5Ecv%3D15'
'555%5Esbf%3D%2310000100000%5Epsi%3DATBv5t4c*%5Ejs%3D1%5E; dp1=bu1p'
'/QEBfX0BAX19AQA**5d73f732^bl/IT5f552ab2^pbf/#800080000080000000000'
'05f552ac5^; s=CgAD4ACBblBUyYjU0YzFiZjkxNjUwYTk5Yjc4NzAzZjhhZmZjNjJ'
'iNjClGshN; nonsession=CgAAIABxbulCyMTUzNjM0NDk4OHgzMDI0OTUzNDEzNTl'
'4MHgyTgDKACBk+MUyYjU0YzFiZjkxNjUwYTk5Yjc4NzAzZjhhZmZjNjJiNjAAywABW'
'5LK0jZ87uR5; ak_bmsc=CBC404CFCC021D437EAEB56AB0A505C80212FF677D1B0'
'0009DC3925B8422CF41~pl1XkGGUVkQmdzLfSzxOHqS7a6B5bt6IE+YZ9pBQsojU23'
'4gAkOREldw07haa9wqBjRKkfaGqXnWck+XkoiOMH75VNvp7RX0Tswwmgd2XI2DLpTf'
'Z3Wic4ULyIjHQiolAXprZboWAssr45zCzbT1DEfphZ+3vHtD2sZcfcIUj/u5hrbWmX'
'WcqZHABtvn/XDI5z8ul1rnRe0ZM87TfkySxS09SXR1c+HoE8BVBm0WeSB6o=; npii'
'=btguid/b54c1bf91650a99b78703f8affc62b605d73f72e^cguid/b54c2316165'
'0ac3c480165a5fe7d7b1e5d73f72e^; cssg=b54c1bf91650a99b78703f8affc62'
'b60; bm_sv=43E32CFCCEEB3DE24A7853BCD296554D~5g3PXHhS+OOCPoJYdO/hGo'
'GEWmrSON6AvaW8RYaPM31Yhe4afGf1MM/OmSgHoFPrTLloRcPphW1KrOy4IjnUiiHU'
'BHq60fazRhTC9rdF6bweXE9Oyz02T4zoySTDLYfL8SJtb99/tNa5v1jarB5cjA==; '
'AMCV_A71B5B5B54F607AB0A4C98A2%40AdobeOrg=-1758798782%7CMCIDTS%7C17'
'782%7CMCMID%7C80964471016865310950758990401017778531%7CMCAID%7CNON'
'E%7CMCOPTOUT-1536352227s%7CNONE; AMCVS_A71B5B5B54F607AB0A4C98A2%40'
'AdobeOrg=1; ds2=sotr/b7pwxzzzzzzz^',
'Connection': 'keep-alive',
'Upgrade-Insecure-Requests': '1',
'DNT': '1',
}
params = {
'_trksid': 'p2047675.l1359',
'SubmitAction.AddToListVI': 'x',
'item': '302495341359',
'rt': 'nc',
'srt': '0100030000005006d271d47b3a0557eff3cbcd450ad13d38dd94ca9e'\
'2d8918de753cf5a1dfc6eeb0648b5e9c433cbf106609c6d81bed4ad1fa6'\
'fa0fdbeca4bc2e3ae88a523453c4a5620551f91a45384f9d5a4054f8e56',
'etn': 'Watch list',
'tagId': '-99',
'wt': 'f1cc17761369fcda30b0792ff44e1a09',
'ssPageName': 'VIP:watchlink:top:en',
'sourcePage': '4340',
}
requests.get(url, headers=headers, params=params)
[Edit 1]
A working code example that scrapes the "Add to watch list" button:
from requests import Session
from bs4 import BeautifulSoup
url = 'https://www.ebay.com/itm/Rockville-HTS56-1000w-5-1-Cha'\
'nnel-Home-Theater-System-Bluetooth-USB-8-Subwoofer/302'\
'495341359?_trkparms=pageci%3Ac45ae602-b2bb-11e8-91a9-7'\
'4dbd1807185%7Cparentrq%3Ab4e1b5ee1650a9ccac0001e5fffee'\
'ebd%7Ciid%3A1'


def get_watch_list_url(page_source):
    """
    Return "Add to watch list" button url.
    """
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(page_source, 'html.parser')
    for button in soup.find_all('a'):
        if button.get_text() == 'Add to watch list':
            return button.get('href')


def main():
    with Session() as session:
        response = session.get(url)  # visit item page and scrape button
        add_to_watch_list_url = get_watch_list_url(response.text)
        print('Url is:', add_to_watch_list_url)
        if add_to_watch_list_url is None:  # button not found on the page
            return
        response = session.get(add_to_watch_list_url)  # add item to watch list
        if response.ok:
            print('Item successfully added to watch list')


if __name__ == '__main__':
    main()
[Edit 2]
An example implementation using aiohttp:
import aiohttp
import asyncio
from bs4 import BeautifulSoup
url = 'https://www.ebay.com/itm/Rockville-HTS56-1000w-5-1-Cha'\
'nnel-Home-Theater-System-Bluetooth-USB-8-Subwoofer/302'\
'495341359?_trkparms=pageci%3Ac45ae602-b2bb-11e8-91a9-7'\
'4dbd1807185%7Cparentrq%3Ab4e1b5ee1650a9ccac0001e5fffee'\
'ebd%7Ciid%3A1'


def get_watch_list_url(page_source):
    soup = BeautifulSoup(page_source, "html.parser")
    for button in soup.find_all('a'):
        if button.get_text() == 'Add to watch list':
            return button.get('href')


async def main():
    async with aiohttp.ClientSession() as session:
        # Send request and get response.
        async with session.get(url) as response:
            html = await response.text()
        # Extract button url.
        add_to_watch_list_url = get_watch_list_url(html)
        if add_to_watch_list_url is None:  # button not found on the page
            return
        # Add item to watch list.
        async with session.get(add_to_watch_list_url) as response:
            status = response.status
        print('Request status code:', status)  # print the request status


if __name__ == '__main__':
    # asyncio.run creates and closes the event loop for us.
    asyncio.run(main())
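Answer 1 (score: 0)
First, you need to inspect the page that contains the button to determine what type of request it makes. Look at the surrounding <form> element. If it has a method attribute, you can see whether the form does a POST or a GET. If there is no method attribute, it defaults to GET. You will also see the exact URL in the form's action attribute. You can scrape all of this with BeautifulSoup and then make the request with requests.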
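The idea can be sketched roughly as follows. To keep the example runnable without extra packages it uses the standard library's html.parser instead of BeautifulSoup, and the HTML snippet, page URL, and field names are all made up for illustration; a real page will look different.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormParser(HTMLParser):
    """Collect method, action, and named inputs of the first <form>."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            # A form with no method attribute is submitted with GET.
            self.method = (attrs.get("method") or "get").upper()
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


# Hypothetical page URL and markup (assumptions for the example).
page_url = "https://example.com/items/42"
html = """
<form method="post" action="/watchlist/add">
  <input type="hidden" name="item" value="42">
  <input type="hidden" name="token" value="abc123">
  <button type="submit">Add to watch list</button>
</form>
"""

parser = FormParser()
parser.feed(html)

# The action is often relative, so resolve it against the page URL.
target = urljoin(page_url, parser.action)

print(parser.method, target, parser.fields)
```

With these values you would send `requests.post(target, data=parser.fields)` for a POST form, or `requests.get(target, params=parser.fields)` for a GET form.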