Unable to collect profile names from those URLs using requests

Time: 2019-09-12 07:13:02

Tags: python python-3.x selenium web-scraping python-requests

I'm trying to log in to Facebook using selenium and then transfer the cookies to the requests module so that I can collect the profile names from two URLs using requests. The profile names available at those two URLs are not dynamic, but they do require a login.

My script below can log in successfully, but something probably goes wrong when transferring the cookies, which is perhaps why the script throws an AttributeError when it hits this line: name = soup.select_one("#fb-timeline-cover-name > a").text.

This is what I've written so far:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

url = "https://www.facebook.com/"

links = [
    "https://www.facebook.com/hillsendagain?fref=gm&dti=157300781073597&hc_location=group",
    "https://www.facebook.com/mark.porton.9?fref=gm&dti=157300781073597&hc_location=group"
]

# Suppress the notification prompt so it doesn't block the login flow
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=chrome_options)

# Log in through the regular form
driver.get(url)
driver.find_element_by_id("email").send_keys("username")
driver.find_element_by_id("pass").send_keys("password", Keys.RETURN)

# Copy the selenium cookies into a plain dict for requests
driver_cookies = driver.get_cookies()
c = {c['name']: c['value'] for c in driver_cookies}

for link in links:
    res = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'}, cookies=c)
    soup = BeautifulSoup(res.text, "lxml")
    name = soup.select_one("#fb-timeline-cover-name > a").text
    print(name)
driver.quit()

How can I fetch only the profile names using requests?

PS: This is not about grabbing the profile names with selenium alone, as I already know how to do that.

1 Answer:

Answer 0 (score: 1)

Although the content you're interested in isn't dynamic, it is wrapped in HTML comments in the page source. Try the following to get at it:

for link in links:
    content = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'}, cookies=c).text
    # Strip the comment markers so the hidden markup becomes parseable
    comment = content.replace("-->", "").replace("<!--", "")
    soup = BeautifulSoup(comment, "lxml")
    name = soup.select_one("#fb-timeline-cover-name > a").text
    print(name)
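
If stripping the markers with str.replace feels too brittle, an alternative (a sketch of my own, not part of the original answer) is to let BeautifulSoup collect the comment nodes and re-parse only those; it assumes the same selector and the same cookie dict c as above:

from bs4 import BeautifulSoup, Comment
import requests

for link in links:
    content = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'}, cookies=c).text
    soup = BeautifulSoup(content, "lxml")
    # Collect every HTML comment node and re-parse its contents as markup
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        inner = BeautifulSoup(comment, "lxml")
        node = inner.select_one("#fb-timeline-cover-name > a")
        if node:  # only one of the comments actually holds the cover name
            print(node.text)
            break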

That said, I think using a Session is the way to go if you want to make this more robust:

s = requests.Session()

# Transfer the selenium cookies into the requests session
for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'])

for link in links:
    content = s.get(link, headers={'User-Agent': 'Mozilla/5.0'}).text
    comment = content.replace("-->", "").replace("<!--", "")
    soup = BeautifulSoup(comment, "lxml")
    name = soup.select_one("#fb-timeline-cover-name > a").text
    print(name)
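
If some pages still raise the AttributeError, a more defensive variant (again a sketch; the use of create_cookie and the None guard are my additions, not part of the original answer) is to carry the cookie domain and path over as well and to check the result of select_one before reading .text:

from requests.cookies import create_cookie

s = requests.Session()

# Copy domain and path too, so the cookies are sent for every facebook.com URL
for cookie in driver.get_cookies():
    s.cookies.set_cookie(create_cookie(
        name=cookie['name'],
        value=cookie['value'],
        domain=cookie.get('domain', ''),
        path=cookie.get('path', '/'),
    ))

for link in links:
    content = s.get(link, headers={'User-Agent': 'Mozilla/5.0'}).text
    comment = content.replace("-->", "").replace("<!--", "")
    soup = BeautifulSoup(comment, "lxml")
    node = soup.select_one("#fb-timeline-cover-name > a")
    # select_one returns None when the element isn't in the markup,
    # which is what produced the original AttributeError
    print(node.text if node else f"no profile name found in {link}")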