Scraping a verification link href from a site

Time: 2021-05-28 19:06:29

Tags: python selenium web-scraping beautifulsoup

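I want to get the verification href from a Gmailnator inbox. The message on this site contains the Discord verify href.

I want to grab this href with bs4 and pass it to the Selenium driver, e.g. driver.get(url), where url is the href.

Can someone write some code to scrape the href from the Gmailnator inbox? I did try the page source, but the page source does not contain the href.

This is the code I wrote to get the href, but the href I need (the Discord one) lives in the frame source, so I think that is why it does not show up.

Update! Everything is done and fixed: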

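import time

import requests
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# `driver` is an already-created Selenium WebDriver instance.
driver.get('https://www.gmailnator.com/inbox/#for.ev.e.r.my.girlt.m.p@gmail.com')
time.sleep(6)  # give the inbox time to load
# Open the message in the second row of the mail list.
driver.find_element(By.XPATH, '//*[@id="mailList"]/tbody/tr[2]/td/a/table/tbody/tr/td[1]').click()
time.sleep(4)
url = driver.current_url
# The third path component from the end of the message URL is the mailbox name.
email_for_data = url.split('/')[-3]
print(url)
time.sleep(2)
print('Getting Your Discord Verify link')
# Fetch the message page with requests to read the CSRF token and the cf_email value.
soup = BeautifulSoup(requests.get(url).text, "lxml")
token = soup.find("meta", {"name": "csrf-token"})["content"]
cf_email = soup.find("a", class_="__cf_email__")["data-cfemail"]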

endpoint = "https://www.gmailnator.com/mailbox/get_single_message/"

data = {
    "csrf_gmailnator_token": token,
    "action": "get_message",
    "message_id": url.split("#")[-1],
    "email": email_for_data,
}

headers = {
    "referer": f"https://www.gmailnator.com/{email_for_data}/messageid/",
    "cookie": f"csrf_gmailnator_cookie={token}; ci_session={cf_email}",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.86 "
                  "YaBrowser/21.3.0.740 Yowser/2.5 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}

r = requests.post(endpoint, data=data, headers=headers)
the_real_slim_shady = (
    BeautifulSoup(r.json()["content"], "lxml")
    .find_all("a", {"target": "_blank"})[1]["href"]
)
print(the_real_slim_shady)
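
The code above pulls the mailbox name out with split('/')[-3] and the message id with split("#")[-1], which silently returns the wrong piece if the URL shape changes. As a rough sketch, the same two values could be read with urllib.parse instead. The helper name split_message_url is hypothetical, and the URL layout https://www.gmailnator.com/<mailbox>/messageid/#<message_id> is an assumption taken from the example URL in this thread.

```python
from urllib.parse import urlsplit


def split_message_url(url):
    """Return (mailbox, message_id) from a Gmailnator message URL.

    Assumed layout (taken from the example URL in this thread):
    https://www.gmailnator.com/<mailbox>/messageid/#<message_id>
    """
    parts = urlsplit(url)
    # Non-empty path segments, e.g. ["geralddoreyestmp", "messageid"].
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[1] != "messageid" or not parts.fragment:
        raise ValueError(f"unexpected message URL: {url!r}")
    return segments[0], parts.fragment


mailbox, message_id = split_message_url(
    "https://www.gmailnator.com/geralddoreyestmp/messageid/#179b454b4c482c4d"
)
print(mailbox, message_id)
```

Raising ValueError on an unexpected URL (for example when the browser is still on the inbox page) makes the failure visible before any request is sent.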

1 Answer:

Answer 0 (score: 1)

You can fake everything with pure requests to get the Verify link. First, you need to get the token and cf_email values. After that, it is straightforward.
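
Once token and cf_email are known, the rest is a single POST. As a minimal sketch, the form data and headers could be assembled by a small helper so the mailbox name is not hard-coded. The function name build_message_request is hypothetical. The field names, cookie names, and endpoint are copied from the code in this thread, and whether the site still accepts them is an assumption. The user-agent header is left for the caller to add.

```python
ENDPOINT = "https://www.gmailnator.com/mailbox/get_single_message/"


def build_message_request(token, cf_email, mailbox, message_id):
    """Return (data, headers) for the get_single_message POST.

    Field and cookie names mirror the code in this thread; they are not
    taken from any official documentation.
    """
    data = {
        "csrf_gmailnator_token": token,
        "action": "get_message",
        "message_id": message_id,
        "email": mailbox,
    }
    headers = {
        "referer": f"https://www.gmailnator.com/{mailbox}/messageid/",
        "cookie": f"csrf_gmailnator_cookie={token}; ci_session={cf_email}",
        "x-requested-with": "XMLHttpRequest",
    }
    return data, headers


data, headers = build_message_request(
    "example-token", "example-cf", "geralddoreyestmp", "179b454b4c482c4d"
)
print(data["email"], headers["referer"])
```

The returned pair can be passed straight to requests.post(ENDPOINT, data=data, headers=headers).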

Here's how to get the link:

import requests
from bs4 import BeautifulSoup

url = "https://www.gmailnator.com/geralddoreyestmp/messageid/#179b454b4c482c4d"
soup = BeautifulSoup(requests.get(url).text, "lxml")

token = soup.find("meta", {"name": "csrf-token"})["content"]
cf_email = soup.find("a", class_="__cf_email__")["data-cfemail"]

endpoint = "https://www.gmailnator.com/mailbox/get_single_message/"

data = {
    "csrf_gmailnator_token": token,
    "action": "get_message",
    "message_id": url.split("#")[-1],
    "email": "geralddoreyestmp",
}

headers = {
    "referer": "https://www.gmailnator.com/geralddoreyestmp/messageid/",
    "cookie": f"csrf_gmailnator_cookie={token}; ci_session={cf_email}",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.86 "
                  "YaBrowser/21.3.0.740 Yowser/2.5 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}

r = requests.post(endpoint, data=data, headers=headers)
the_real_slim_shady = (
    BeautifulSoup(r.json()["content"], "lxml")
    .find_all("a", {"target": "_blank"})[1]["href"]
)
print(the_real_slim_shady)

Output (your link will be different!):

https://click.discord.com/ls/click?upn=qDOo8cnwIoKzt0aLL1cBeARJoBrGSa2vu41A5vK-2B4us-3D77CR_3Tswyie9C2vHlXKXm6tJrQwhGg-2FvQ76GD2o0Zl2plCYHULNsKdCuB6s-2BHk1oNirSuR8goxCccVgwsQHdq1YYeGQki4wtPdDA3zi661IJL7H0cOYMH0IJ0t3sgrvr2oMX-2BJBA-2BWZzY42AwgjdQ-2BMAN9Y5ctocPNK-2FUQLxf6HQusMayIeATMiTO-2BlpDytu-2FnIW4axB32RYQpxPGO-2BeHtcSj7a7QeZmqK-2B-2FYkKA4dl5q8I-3D
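
The code above takes the second target="_blank" anchor by index, which depends on the layout of the email. A possibly sturdier variant, sketched below, picks the link by hostname instead. The helper name pick_discord_link is hypothetical, and the host click.discord.com is an assumption based only on the sample output above.

```python
from urllib.parse import urlsplit


def pick_discord_link(hrefs, host="click.discord.com"):
    """Return the first href whose hostname equals `host`, or None."""
    for href in hrefs:
        # hostname is None for relative links, "#", mailto: and similar.
        if urlsplit(href).hostname == host:
            return href
    return None


candidates = [
    "https://discord.com/",
    "mailto:someone@example.com",
    "https://click.discord.com/ls/click?upn=example",
]
print(pick_discord_link(candidates))
```

With the message HTML already parsed, the hrefs could come from [a["href"] for a in soup.find_all("a", href=True)], and the result, if it is not None, can then be handed to driver.get(...) as the question asked.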