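I'm trying to log in to Facebook with selenium and then transfer the cookies to the requests module, so that I can use requests to collect the profile names from two URLs. The profile names on those two pages are not dynamic, but they do require a login.
My script below logs in successfully, but something probably goes wrong when the cookies are transferred, which may be why the script throws an AttributeError when it reaches this line:
name = soup.select_one("#fb-timeline-cover-name > a").text
This is what I've written so far: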
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
url = "https://www.facebook.com/"
links = [
"https://www.facebook.com/hillsendagain?fref=gm&dti=157300781073597&hc_location=group",
"https://www.facebook.com/mark.porton.9?fref=gm&dti=157300781073597&hc_location=group"
]
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications" : 2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(options=chrome_options)
driver.get(url)
driver.find_element_by_id("email").send_keys("username")
driver.find_element_by_id("pass").send_keys("password",Keys.RETURN)
driver_cookies = driver.get_cookies()
c = {c['name']:c['value'] for c in driver_cookies}
for link in links:
    res = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'}, cookies=c)
    soup = BeautifulSoup(res.text, "lxml")
    name = soup.select_one("#fb-timeline-cover-name > a").text
    print(name)
driver.quit()
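How can I get the profile names using requests only?
PS: I'm not after a solution that fetches the profile names with selenium alone, as I already know how to do that.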
Answer 0 (score: 1)
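Although the content you're interested in is not dynamic, it is commented out in the page source, so the selector finds nothing and .text fails. Remove the comment markers before parsing, as in the following: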
for link in links:
    content = requests.get(link, headers={'User-Agent': 'Mozilla/5.0'}, cookies=c).text
    comment = content.replace("-->", "").replace("<!--", "")
    soup = BeautifulSoup(comment, "lxml")
    name = soup.select_one("#fb-timeline-cover-name > a").text
    print(name)
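To see why removing the comment markers helps, here is a minimal offline sketch. The HTML string is a made-up stand-in for this kind of commented-out markup (an assumption, not Facebook's real page source), and it uses the built-in "html.parser" so it runs without lxml. It also shows an alternative to the string replace above: pulling the comment nodes out with bs4's Comment class and parsing them separately.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: the element we want is wrapped in an HTML comment.
SAMPLE_HTML = (
    '<div class="hidden_elem"><code><!-- '
    '<span id="fb-timeline-cover-name"><a href="/someone">Some Name</a></span>'
    ' --></code></div>'
)

SELECTOR = "#fb-timeline-cover-name > a"

# Parsed as-is, the commented-out markup is one Comment node rather than tags,
# so select_one() returns None and calling .text on it raises AttributeError.
raw_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
raw_match = raw_soup.select_one(SELECTOR)

# Approach 1: strip the comment markers before parsing.
uncommented = SAMPLE_HTML.replace("<!--", "").replace("-->", "")
stripped_match = BeautifulSoup(uncommented, "html.parser").select_one(SELECTOR)

# Approach 2: collect the Comment nodes and parse their contents separately.
comment_matches = []
for node in raw_soup.find_all(string=lambda s: isinstance(s, Comment)):
    found = BeautifulSoup(str(node), "html.parser").select_one(SELECTOR)
    if found is not None:
        comment_matches.append(found.get_text(strip=True))

print(raw_match)                            # None
print(stripped_match.get_text(strip=True))  # Some Name
print(comment_matches)                      # ['Some Name']
```

The second approach leaves the rest of the document untouched, which may matter if the page contains genuine comments you don't want turned into markup.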
If you want to make it more robust, I think a session is the way to go:
s = requests.Session()
for cookie in driver.get_cookies():
    s.cookies.set(cookie['name'], cookie['value'])
for link in links:
    content = s.get(link, headers={'User-Agent': 'Mozilla/5.0'}).text
    comment = content.replace("-->", "").replace("<!--", "")
    soup = BeautifulSoup(comment, "lxml")
    name = soup.select_one("#fb-timeline-cover-name > a").text
    print(name)
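One thing the session code above drops is each cookie's domain and path. Below is a hedged sketch of carrying them over. The name copy_cookies is a hypothetical helper, and the cookie list is invented sample data shaped like the dicts driver.get_cookies() returns, so the snippet runs without a browser or a network connection.

```python
import requests


def copy_cookies(selenium_cookies, session):
    """Copy Selenium-style cookie dicts into a requests session."""
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            # Default to "" rather than None: requests calls .startswith()
            # on the domain when it builds the cookie.
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Invented sample data standing in for driver.get_cookies().
sample_cookies = [
    {"name": "c_user", "value": "12345", "domain": ".facebook.com", "path": "/"},
    {"name": "xs", "value": "abcdef", "domain": ".facebook.com", "path": "/"},
]

session = copy_cookies(sample_cookies, requests.Session())
# Set the header once on the session instead of on every request.
session.headers.update({"User-Agent": "Mozilla/5.0"})

print(session.cookies.get("c_user", domain=".facebook.com"))
```

In the real script you would pass driver.get_cookies() instead of sample_cookies and then call session.get(link) inside the loop.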