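I am trying to scrape information from the Zillow lender profiles listed on the following page: https://www.zillow.com/lender-directory/?sort=Relevance&location=Alabama%20Shores%20Muscle%20Shoals%20AL&language=English&page=1
I know how to scrape the information with Beautiful Soup. I just want to build a list of the clickable links to each profile so that I can iterate over the profiles, scrape the information I need (which I can already do), then return to the start page and move on to the next profile link. There is probably a simple solution, but I have been trying for hours to get a list of clickable links, so I figured it was time to ask, lol.
Thanks.
I have tried several ways of getting the list of clickable links, but I may not have implemented them correctly, so I am open to suggestions that repeat what I tried, as a double check.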
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
import time
#Driver to get website...need to get phantomJS going..
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
driver.get('https://www.zillow.com/lender-directory/?sort=Relevance&location=Alabama%20Shores%20Muscle%20Shoals%20AL&language=English&page=1')
time.sleep(2)
#Get page HTML data
soup = BeautifulSoup(driver.page_source, 'html.parser')
profile_links = []
profile_links = driver.find_elements_by_xpath("//div[@class='zsg-content-item']//a")
for profile in range(len(profile_links)):
    profile_links = driver.find_elements_by_xpath("//div[@class='zsg-content-item']//a")
    profile_links[profile].click()
    time.sleep(2)
    driver.back()
    time.sleep(2)
Answer 0: (score: 0)
You can find all the clickable links with the approach below. It is written in Java; you can write the equivalent in Python.
List<WebElement> Links = driver.findElements(By.xpath("//div[@class='zsg-content-item']//a"));
ArrayList<String> capturedLinks = new ArrayList<>();
for (WebElement link : Links)
{
    // Note: getAttribute("href") usually returns an absolute URL already;
    // if it does, drop the prefix to avoid a doubled domain.
    String myLink = "https://www.zillow.com" + link.getAttribute("href");
    if (!capturedLinks.contains(myLink)) // to avoid duplicates
    {
        capturedLinks.add(myLink);
    }
}
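Since the question uses Python, the same idea could be written roughly as follows. This is only a sketch: the names `unique_absolute_links` and `collect_profile_links` are made up for illustration, and the XPath is copied from the question, so it is assumed to still match the page. `urljoin` is used instead of string concatenation so that an href that is already absolute is left unchanged.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.zillow.com"


def unique_absolute_links(hrefs, base_url=BASE_URL):
    """Resolve hrefs against base_url and drop duplicates, keeping order."""
    seen = set()
    links = []
    for href in hrefs:
        if not href:  # skip anchors that have no href
            continue
        link = urljoin(base_url, href)  # absolute hrefs pass through unchanged
        if link not in seen:
            seen.add(link)
            links.append(link)
    return links


def collect_profile_links(driver):
    """Hypothetical helper: read the profile hrefs from a live Selenium driver."""
    # Imported here so the pure helper above works without Selenium installed.
    from selenium.webdriver.common.by import By

    anchors = driver.find_elements(By.XPATH, "//div[@class='zsg-content-item']//a")
    return unique_absolute_links(a.get_attribute("href") for a in anchors)
```

With the list of strings in hand, each profile can be opened with `driver.get(link)`, which avoids clicking elements and navigating back.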
Answer 1: (score: 0)
The arguments passed to find_elements are wrong; you can try either of the approaches below.
This is how find_elements() is used:
def find_elements(self, by=By.ID, value=None):
    """
    Find elements given a By strategy and locator. Prefer the find_elements_by_* methods when
    possible.

    :Usage:
        elements = driver.find_elements(By.CLASS_NAME, 'foo')

    :rtype: list of WebElement
    """
    if self.w3c:
        if by == By.ID:
            by = By.CSS_SELECTOR
            value = '[id="%s"]' % value
        elif by == By.TAG_NAME:
            by = By.CSS_SELECTOR
        elif by == By.CLASS_NAME:
            by = By.CSS_SELECTOR
            value = ".%s" % value
        elif by == By.NAME:
            by = By.CSS_SELECTOR
            value = '[name="%s"]' % value

    # Return empty list if driver returns null
    # See https://github.com/SeleniumHQ/selenium/issues/4555
    return self.execute(Command.FIND_ELEMENTS, {
        'using': by,
        'value': value})['value'] or []
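To make the W3C branch of the source above easier to follow, here is a standalone sketch of the locator rewriting it performs. The function name `to_w3c_locator` is hypothetical; the string constants mirror the values of Selenium's `By` class, and nothing here talks to a browser.

```python
# String values of the corresponding selenium.webdriver.common.by.By attributes.
ID = "id"
NAME = "name"
TAG_NAME = "tag name"
CLASS_NAME = "class name"
CSS_SELECTOR = "css selector"
XPATH = "xpath"


def to_w3c_locator(by, value):
    """Rewrite id/tag/class/name locators as CSS selectors; leave others alone."""
    if by == ID:
        return CSS_SELECTOR, '[id="%s"]' % value
    if by == TAG_NAME:
        return CSS_SELECTOR, value  # a tag name is already a valid CSS selector
    if by == CLASS_NAME:
        return CSS_SELECTOR, ".%s" % value
    if by == NAME:
        return CSS_SELECTOR, '[name="%s"]' % value
    return by, value  # xpath, css selector, link text, ... are sent unchanged


print(to_w3c_locator(CLASS_NAME, "foo"))
print(to_w3c_locator(XPATH, "//div[@class='zsg-content-item']//a"))
```

The second call shows that an XPath locator is passed through untouched, which is why the strategy and the expression both have to be supplied correctly by the caller.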
Try either of the following options:
profile_links = driver.find_elements_by_xpath("//div[@class='zsg-content-item']//a")
OR
profile_links = driver.find_elements(By.XPATH,"//div[@class='zsg-content-item']//a")
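Using the code above, this is the resulting list. Each URL carries a doubled https://www.zillow.com prefix because the href attribute is already an absolute URL.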
['https://www.zillow.comhttps://www.zillow.com/lender-profile/courtneyhall17/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/SouthPointBank/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/kmcdaniel77/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/jdowney75/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/fredabutler/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/justindorroh/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/aball731/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/1stfedmort/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/tstutts/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/sbeckett0/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/DebiBretherick/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/cking313/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/Gregory%20Angus/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/cbsbankmarketing/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/ajones392/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/sschulte6/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/dreamhomemortgagellc/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/DarleenBrooksHill/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/sjones966/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/BlakeRobbins4/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/zajones5746/', 'https://www.zillow.comhttps://www.zillow.com/lender-profile/adeline%20perkins/']
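Edited
As I said, you need to re-assign the elements inside the loop: after navigating back, the page is reloaded and the previously located elements are likely to be stale.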
profile_links = driver.find_elements_by_xpath("//div[@class='ld-lender-info-column']//h2//a")
for profile in range(len(profile_links)):
    profile_links = driver.find_elements_by_xpath("//div[@class='ld-lender-info-column']//h2//a")
    driver.execute_script("arguments[0].click();", profile_links[profile])
    time.sleep(2)
    driver.back()
    time.sleep(2)
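Answer 2: (score: 0)
I think the following script may do what you want. In short, it parses the profile links from the landing page and then iterates over those links to scrape the name from each target page.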
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.zillow.com/lender-directory/?sort=Relevance&location=Alabama%20Shores%20Muscle%20Shoals%20AL&language=English&page=1'
with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 10)
    driver.get(url)
    items = [item.get_attribute("href") for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "h2 > a[href^='/lender-profile/']")))]
    for profilelink in items:
        driver.get(profilelink)
        name = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "h1.lender-name"))).text
        print(name)