Hey, I'm trying to scrape some data from aliexpress, but whenever I visit any URL it asks me to log in before showing the page. I don't know how to log in to a website automatically. Some people apparently use cookies for this, but I don't know how to use cookies. Here is my code:
import csv

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

g = csv.writer(open('aliexpressnew.csv', 'a', newline='', encoding="utf-8"))
# g.writerow(['Product Name', 'Price', 'Category', 'Subcategory'])

links = [
    "https://www.aliexpress.com/category/205838503/iphones.html?spm=2114.search0103.0.0.6ab01fbbfe33Rm&site=glo&g=n&needQuery=n&tag="
]

for i in links:
    getlink = i
    while getlink != 0:
        chromepath = r'C:\Users\Faisal\Desktop\python\chromedriver.exe'
        driver = webdriver.Chrome(chromepath)
        driver.get(getlink)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        # (the code that sets itemsname1, title, price and subcat2 was lost in the paste)
        img_block = itemsname1.find(class_='img-container left-block util-clearfix').find(class_='img')
        # prefer the 'j-p4plog' variant of the link, fall back to the plain one
        pic = img_block.find(class_='picRind j-p4plog') or img_block.find(class_='picRind ')
        img = pic.find('img')
        image = img.get('src') or img.get('image-src')
        image3 = 'http:' + str(image)
        print(title)
        print(price)
        # print(rating2)
        print(image3)
        g.writerow([title, price, subcat2, image])
        next1 = soup.find(class_='ui-pagination-navi util-left')
        if next1.find(class_="page-end ui-pagination-next ui-pagination-disabled"):
            getlink = 0
        else:
            next22 = next1.find(class_='page-next ui-pagination-next')
            getlink = "http:" + next22.get('href')
        driver.close()
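The cookie approach mentioned above can be sketched like this: log in once through Selenium, then hand the session cookies to plain `requests` calls. This is an illustration, not part of the original code; `cookies_to_dict` is a hypothetical helper name, but the `name`/`value` fields are the ones Selenium's `get_cookies()` actually returns.

```python
def cookies_to_dict(selenium_cookies):
    """Convert driver.get_cookies() output (a list of dicts with at least
    'name' and 'value' keys) into a {name: value} mapping that can be
    passed as the `cookies=` argument of requests."""
    return {c["name"]: c["value"] for c in selenium_cookies}

# Usage after logging in through the driver (needs a live browser session):
#   jar = cookies_to_dict(driver.get_cookies())
#   requests.get(getlink, cookies=jar)
```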
Answer 0 (score: 0)
First, you need to authenticate after opening the site through the Selenium driver; you don't actually need cookies for this.

Start by inspecting the page to find an id the driver can locate, then fill in the input with send_keys:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

delay = 10  # seconds before timeout

chromepath = r'C:\Users\Faisal\Desktop\python\chromedriver.exe'
driver = webdriver.Chrome(chromepath)
driver.get(ALI_EXPRESS_LINK)

# Wait for the page to finish loading
# (actually waits for the login input; you can find its id by inspecting the element)
WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, "fm-login-id")))
element = driver.find_element_by_id("fm-login-id")
element.send_keys(YOUR_LOGIN_ID)

# Do the same for the password
element = driver.find_element_by_id("fm-login-password")
element.send_keys(YOUR_PASSWORD)

# Then click the submit button
driver.find_element_by_class_name("password-login").click()
Don't forget to define:

ALI_EXPRESS_LINK
YOUR_LOGIN_ID
YOUR_PASSWORD

:)
Answer 1 (score: 0)
It sounds like you are getting the browser's username/password prompt, which appears before any page content loads. If that is the case, you can navigate to a URI of the form:
http://<username>:<password>@your-url-here.com
For example:
http://foo:bar@example.com
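If the username or password contains characters such as `@` or `:`, they must be percent-encoded before being embedded in the URL, or the authority part becomes ambiguous. A small sketch (the helper name is mine, not from the answer):

```python
from urllib.parse import quote

def basic_auth_url(username, password, host, scheme="http"):
    """Embed credentials in a URL, percent-encoding characters such as
    '@' or ':' that would otherwise break the authority component."""
    return f"{scheme}://{quote(username, safe='')}:{quote(password, safe='')}@{host}"

print(basic_auth_url("foo", "bar", "example.com"))     # http://foo:bar@example.com
print(basic_auth_url("foo", "p@ss:1", "example.com"))  # http://foo:p%40ss%3A1@example.com
```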
Answer 2 (score: 0)
You can automatically load a Chrome profile with stored credentials to avoid logging in manually:

How to open URL through default Chrome profile using Python Selenium Webdriver

You have to add Chrome options to the webdriver:
options = webdriver.ChromeOptions()
# Chrome profile paths on Windows
options.add_argument("user-data-dir=C:/Users/NameUser/AppData/Local/Google/Chrome/User Data")
options.add_argument("profile-directory=Default")
driver = webdriver.Chrome(chromepath, chrome_options=options)
Make sure you have already stored the site's login credentials when using Chrome normally.
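To avoid hard-coding the Windows username in the profile path, the two arguments can be built from the environment. This is a sketch: the helper name `chrome_profile_args` is mine, and it assumes Chrome's standard profile layout on Windows (under %LOCALAPPDATA%).

```python
import os

def chrome_profile_args(profile="Default"):
    """Build the two Chrome arguments for reusing an existing profile.
    Reads %LOCALAPPDATA% so the Windows username is not hard-coded."""
    user_data = os.path.join(
        os.environ.get("LOCALAPPDATA", ""), "Google", "Chrome", "User Data"
    )
    return [f"user-data-dir={user_data}", f"profile-directory={profile}"]

# Usage (assumes selenium is installed and chromedriver is available):
#   options = webdriver.ChromeOptions()
#   for arg in chrome_profile_args():
#       options.add_argument(arg)
```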