Trying to find out whether certain text exists

Time: 2019-07-08 15:39:28

Tags: python selenium web-scraping beautifulsoup

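I am trying to check whether the text "Nationally registered" exists on the profile page of a site I am scraping. It appears right after the "Licensed to work in:" text. If the page contains that text, I will write the license type to the CSV file as "Nationally registered"; if it does not, I will write a "state" license to the CSV file instead. That is the problem and the logic I am working with.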
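The decision described above can be sketched as follows. This is only a minimal sketch, not the asker's code: `license_type` and `write_license_row` are hypothetical helper names, the page text is passed in as a plain string, the sample strings are invented for illustration, and the CSV is written to an in-memory buffer rather than a real file.

```python
import csv
import io


def license_type(page_text):
    """Return the license label for a profile, based on a substring check."""
    # Substring test: the phrase may be embedded in a longer sentence.
    if 'Nationally registered' in page_text:
        return 'Nationally registered'
    return 'state'


def write_license_row(writer, profile_url, page_text):
    """Write one CSV row: the profile URL and the derived license type."""
    writer.writerow([profile_url, license_type(page_text)])


# Hypothetical page texts, for illustration only.
buffer = io.StringIO()
writer = csv.writer(buffer)
write_license_row(
    writer,
    'https://www.zillow.com/lender-profile/zackdisinger/',
    'Licensed to work in: Nationally registered',
)
write_license_row(writer, 'https://example.com/other-profile/', 'Licensed to work in: IN')
print(buffer.getvalue())
```

Keeping the check in its own function means the same rule can be reused whether the text comes from `driver.page_source`, from BeautifulSoup, or from an API response.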

Here is the link to the profile page I am testing my code against: https://www.zillow.com/lender-profile/zackdisinger/

It keeps printing False. Below is the code I am trying:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
page = driver.get('https://www.zillow.com/lender-profile/zackdisinger/')
time.sleep(2)
show_more_button = driver.find_element_by_class_name('zsg-wrapper-footer').click()
time.sleep(2)
soup = BeautifulSoup(driver.page_source, 'html.parser')


if soup.find(text='Nationally registered'):
    print('Success')
else:
    print('False')

4 answers:

Answer 0 (score: 2)

With bs4 4.7.1 you can use :contains to check for a p tag that contains the string. I print True/False, but it is easy to adapt this to Success/False.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
page = driver.get('https://www.zillow.com/lender-profile/zackdisinger/')
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".zsg-wrapper-footer a"))).click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
data = soup.select_one('p:contains("Nationally registered")')
print(data is not None)

Answer 1 (score: 1)

The data is loaded via AJAX from a different URL:

import re
import requests
import json

url = 'https://www.zillow.com/lender-profile/zackdisinger/'
screen_name = [i for i in url.split('/') if i][-1]
r = requests.get(url).text

url_json = 'https://mortgageapi.zillow.com/getRegisteredLender?partnerId=' + re.search(r'"partnerId":"(.*?)"', r).group(1)
payload = {"fields":["aboutMe","address","cellPhone","contactLenderFormDisclaimer","companyName","employerMemberFDIC","employerScreenName","equalHousingLogo","faxPhone","hideCellPhone","imageId","individualName","languagesSpoken","memberFDIC","nationallyRegistered","nmlsId","nmlsType","officePhone","rating","screenName","stateLicenses","stateSponsorships","title","totalReviews","website"],"lenderRef":{"screenName":screen_name}}
data = requests.post(url_json, json=payload).json()
print(json.dumps(data, indent=4))
print()
print('Is nationally registered =', data['lender']['nationallyRegistered'])

Prints:

{
    "lender": {
        "aboutMe": "From day one I provide the utmost relational-based experience to make you feel comfortable with your home financing decisions.\n\nEmpowerment and integrity is key to successfully making a home loan a smooth process from start to finish. Acquiring a mortgage in today's market takes product knowledge and underwriting know how. Every client has their own story, their own future. I am here to match today's mortgages to clients dreams of home-ownership.\n",
        "address": {
            "address": "10412 Allisonville Rd Suite 50",
            "city": "Fishers",
            "stateAbbreviation": "IN",
            "zipCode": "46038"
        },
        "companyName": "Bank of England Mortgage",
        "employerMemberFDIC": true,
        "employerScreenName": "BoEMortgage",
        "equalHousingLogo": "EqualHousingLender",
        "faxPhone": {
            "areaCode": "317",
            "number": "3754",
            "prefix": "536"
        },
        "id": "ZU101hnzx7ntuyx_8z2sb",
        "imageId": "2910837992a9cc44d31c26bd7532d2dd",
        "individualName": {
            "firstName": "Zachary",
            "lastName": "Disinger"
        },
        "languagesSpoken": [],
        "nationallyRegistered": true,
        "nmlsId": 1053091,
        "nmlsType": "Individual",
        "officePhone": {
            "areaCode": "317",
            "number": "0416",
            "prefix": "252"
        },
        "rating": 5.0,
        "screenName": "zackdisinger",
        "stateLicenses": {},
        "stateSponsorships": {},
        "title": "Mortgage Banker",
        "totalReviews": 120,
        "website": "http://boeindy.com"
    }
}

Is nationally registered = True
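Given a response shaped like the one printed above, the `nationallyRegistered` flag can be mapped straight to the license type the question wants to record. The following is a minimal sketch, assuming the response keeps this shape; `license_type_from_response` is a hypothetical helper, and the sample is a trimmed copy of the output above rather than a live API call.

```python
import json


def license_type_from_response(data):
    """Map the API's boolean flag to the label written to the CSV."""
    # .get() guards against a missing key; a missing flag is treated as 'state'.
    lender = data.get('lender', {})
    return 'Nationally registered' if lender.get('nationallyRegistered') else 'state'


# Trimmed copy of the response shown above (not fetched live).
sample = json.loads(
    '{"lender": {"nationallyRegistered": true, '
    '"nmlsId": 1053091, "screenName": "zackdisinger"}}'
)
print(license_type_from_response(sample))
print(license_type_from_response({'lender': {}}))
```

Reading the boolean avoids depending on the wording or layout of the rendered page, at the cost of depending on the shape of the JSON instead.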

Answer 2 (score: 1)

Use a regular expression (the re module) to check whether the text exists. Here is your code, adapted:

from selenium import webdriver
from bs4 import BeautifulSoup
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import re

#Chrome webdriver filepath...Chromedriver version 74
driver = webdriver.Chrome(r'C:\Users\mfoytlin\Desktop\chromedriver.exe')
page = driver.get('https://www.zillow.com/lender-profile/zackdisinger/')
show_more_button =WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//a[contains(.,'Show')][contains(.,'more')]")))
#driver.execute_script("arguments[0].click();", show_more_button)
show_more_button.click()
time.sleep(2)
soup = BeautifulSoup(driver.page_source, 'html.parser')


if soup.find(text=re.compile('Nationally registered')):
    print('Success')
else:
    print('False')

This prints Success to the console:

Success
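The regex version can succeed where the question's code fails because passing a plain string to `find(text=...)` only matches a text node that equals that string exactly, whereas a compiled pattern is applied as a search and so matches anywhere inside the node. The sketch below illustrates the difference with the standard library's `html.parser` in place of BeautifulSoup, so that it runs without third-party packages. The HTML snippet and the `TextCollector` class are assumptions made for illustration, not Zillow's actual markup.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every non-empty text node in the document."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        if data.strip():
            self.nodes.append(data)


# Hypothetical markup: the phrase is only part of a longer text node.
html = '<div><p>Licensed in: Nationally registered lender</p></div>'

collector = TextCollector()
collector.feed(html)
collector.close()

phrase = 'Nationally registered'
# Exact comparison, like passing a plain string as the text filter.
exact_hit = any(node == phrase for node in collector.nodes)
# Pattern search, like passing re.compile(...) as the text filter.
pattern = re.compile(phrase)
regex_hit = any(pattern.search(node) for node in collector.nodes)

print('exact match:', exact_hit)
print('regex match:', regex_hit)
```

If the phrase on the real page is surrounded by other text or whitespace inside the same node, the exact comparison misses it, which would explain the False in the question.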

Answer 3 (score: 0)

Try a conditional block like this:

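# Assumes `driver` is the Selenium WebDriver from the question and that
# `from selenium.webdriver.common.by import By` has been executed.
# find_elements returns an empty list (instead of raising) when nothing matches.
matches = driver.find_elements(By.XPATH, "//p[contains(text(),'Nationally registered')]")
if matches and matches[0].is_displayed():
    print('Success')
else:
    print('False')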