Hey, I'm trying to scrape some data from aliexpress, but whenever I visit any URL it asks me to log in before showing the page. I don't know how to log in to a website automatically. Some people apparently use cookies for this, but I don't know how to use cookies. Here is my code:
import csv

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

g = csv.writer(open('aliexpressnew.csv', 'a', newline='', encoding="utf-8"))
# g.writerow(['Product Name', 'Price', 'Category', 'Subcategory'])

links = [
    "https://www.aliexpress.com/category/205838503/iphones.html?spm=2114.search0103.0.0.6ab01fbbfe33Rm&site=glo&g=n&needQuery=n&tag="
]

for i in links:
    getlink = i
    while getlink != 0:
        chromepath = r'C:\Users\Faisal\Desktop\python\chromedriver.exe'
        driver = webdriver.Chrome(chromepath)
        driver.get(getlink)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        # (the code that sets itemsname1, title, price and subcat2 was lost in the paste)
        img_block = itemsname1.find(class_='img-container left-block util-clearfix').find(class_='img')
        # prefer the 'j-p4plog' variant of the link, fall back to the plain one
        pic = img_block.find(class_='picRind j-p4plog') or img_block.find(class_='picRind ')
        img = pic.find('img')
        image = img.get('src') or img.get('image-src')
        image3 = 'http:' + str(image)
        print(title)
        print(price)
        # print(rating2)
        print(image3)
        g.writerow([title, price, subcat2, image])
        next1 = soup.find(class_='ui-pagination-navi util-left')
        if next1.find(class_="page-end ui-pagination-next ui-pagination-disabled"):
            getlink = 0
        else:
            next22 = next1.find(class_='page-next ui-pagination-next')
            getlink = "http:" + next22.get('href')
        driver.close()
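The cookie approach mentioned above can be sketched like this: log in once through Selenium, then hand the session cookies to plain `requests` calls. This is an illustration, not part of the original code; `cookies_to_dict` is a hypothetical helper name, but the `name`/`value` fields are the ones Selenium's `get_cookies()` actually returns.

```python
def cookies_to_dict(selenium_cookies):
    """Convert driver.get_cookies() output (a list of dicts with at least
    'name' and 'value' keys) into a {name: value} mapping that can be
    passed as the `cookies=` argument of requests."""
    return {c["name"]: c["value"] for c in selenium_cookies}

# Usage after logging in through the driver (needs a live browser session):
#   jar = cookies_to_dict(driver.get_cookies())
#   requests.get(getlink, cookies=jar)
```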
Answer 0 (score: 0)
First, you need to authenticate after opening the site through the Selenium driver; you don't actually need cookies for this.

Start by inspecting the page to find an id the driver can locate, then fill in the input with send_keys:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

delay = 10  # seconds before timeout

chromepath = r'C:\Users\Faisal\Desktop\python\chromedriver.exe'
driver = webdriver.Chrome(chromepath)
driver.get(ALI_EXPRESS_LINK)

# Wait for the page to finish loading
# (actually waits for the login input; you can find its id by inspecting the element)
WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, "fm-login-id")))
element = driver.find_element_by_id("fm-login-id")
element.send_keys(YOUR_LOGIN_ID)

# Do the same for the password
element = driver.find_element_by_id("fm-login-password")
element.send_keys(YOUR_PASSWORD)

# Then click the submit button
driver.find_element_by_class_name("password-login").click()
Don't forget to define:

ALI_EXPRESS_LINK
YOUR_LOGIN_ID
YOUR_PASSWORD

:)
Answer 1 (score: 0)
It sounds like you are getting the browser's username/password prompt, which appears before any page content loads. If that is the case, you can navigate to a URI of the form:
http://<username>:<password>@your-url-here.com
For example:
http://foo:bar@example.com
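If the username or password contains characters such as `@` or `:`, they must be percent-encoded before being embedded in the URL, or the authority part becomes ambiguous. A small sketch (the helper name is mine, not from the answer):

```python
from urllib.parse import quote

def basic_auth_url(username, password, host, scheme="http"):
    """Embed credentials in a URL, percent-encoding characters such as
    '@' or ':' that would otherwise break the authority component."""
    return f"{scheme}://{quote(username, safe='')}:{quote(password, safe='')}@{host}"

print(basic_auth_url("foo", "bar", "example.com"))     # http://foo:bar@example.com
print(basic_auth_url("foo", "p@ss:1", "example.com"))  # http://foo:p%40ss%3A1@example.com
```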
Answer 2 (score: 0)
You can automatically load a Chrome profile with stored credentials to avoid logging in manually:

How to open URL through default Chrome profile using Python Selenium Webdriver

You have to add Chrome options to the webdriver:
options = webdriver.ChromeOptions()
# Chrome profile paths on Windows
options.add_argument("user-data-dir=C:/Users/NameUser/AppData/Local/Google/Chrome/User Data")
options.add_argument("profile-directory=Default")
driver = webdriver.Chrome(chromepath, chrome_options=options)
Make sure you have already stored the site's login credentials when using Chrome normally.
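To avoid hard-coding the Windows username in the profile path, the two arguments can be built from the environment. This is a sketch: the helper name `chrome_profile_args` is mine, and it assumes Chrome's standard profile layout on Windows (under %LOCALAPPDATA%).

```python
import os

def chrome_profile_args(profile="Default"):
    """Build the two Chrome arguments for reusing an existing profile.
    Reads %LOCALAPPDATA% so the Windows username is not hard-coded."""
    user_data = os.path.join(
        os.environ.get("LOCALAPPDATA", ""), "Google", "Chrome", "User Data"
    )
    return [f"user-data-dir={user_data}", f"profile-directory={profile}"]

# Usage (assumes selenium is installed and chromedriver is available):
#   options = webdriver.ChromeOptions()
#   for arg in chrome_profile_args():
#       options.add_argument(arg)
```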