bs4 parses the page correctly, but find_all returns nothing

Time: 2021-01-18 08:23:44

Tags: python beautifulsoup

I am trying to scrape the website

https://www.automationanywhere.com/resources/customer-stories

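I want to get the href of the a tag inside every div with class storyInfoBox.

If I print the result of the lxml parse, I can see what I am looking for in the output.

However, find_all returns nothing. I have tried all of the following (each one returns an empty list):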

BeautifulSoup(requests.get(
    'https://www.automationanywhere.com/resources/customer-stories').content, 'lxml').find_all('a', text=' Read case Study')

BeautifulSoup(requests.get(
    'https://www.automationanywhere.com/resources/customer-stories').content, 'lxml').find_all('a', {'target', '_self'})

BeautifulSoup(requests.get(
    'https://www.automationanywhere.com/resources/customer-stories').content, 'lxml').find_all('div', {'class': 'storyBoxInfo'})

1 Answer:

Answer 0 (score: 0)

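You say you want storyInfoBox, but your code searches for storyBoxInfo.

Second, that doesn't even matter, because the page is dynamic and rendered through JavaScript, so those elements are not in the initial HTML source that requests receives.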
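One way to check this kind of claim is to look at where a given string occurs in the raw response text: in the regular markup, or only inside a <script> block. The following is a minimal sketch using only the standard library. The helper name marker_locations and the sample_html string are made up for illustration, and the regex is a rough diagnostic, not a real HTML parser.

```python
import re

# Matches <script ...> ... </script> blocks and captures their bodies.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)

def marker_locations(html, marker):
    """Report whether `marker` occurs in the markup and/or inside <script> bodies."""
    script_bodies = SCRIPT_RE.findall(html)
    markup_only = SCRIPT_RE.sub("", html)  # the HTML with script blocks removed
    return {
        "in_markup": marker in markup_only,
        "in_script": any(marker in body for body in script_bodies),
    }

# Hypothetical stand-in for response.text; the real page may look different.
sample_html = """
<html><body>
<div id="stories"></div>
<script>
var customerDetails = [{"customerName": "Example Co"}];
</script>
</body></html>
"""

print(marker_locations(sample_html, "customerDetails"))
print(marker_locations(sample_html, "storyInfoBox"))
```

If a string shows up only under "in_script", find_all on div or a tags cannot reach it, because the elements are built in the browser after the script runs.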

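You either need to use something like Selenium to let the page render first and then grab the HTML, or you can pull the data from the JSON-like object embedded in an HTML <script> tag. It takes some work to turn that into valid JSON, but it can be done: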

import requests
from bs4 import BeautifulSoup
import json
import re

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
d = {'true':'"True"', 'false':'"False"'}

url = 'https://www.automationanywhere.com/resources/customer-stories'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')

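jsonData = []  # stays empty if the expected <script> block is not found
scripts = soup.find_all('script')
for script in scripts:
    if 'var customerDetails = ' in script.text:
        # Keep only the text after the assignment, up to the first ';'
        jsonStr = script.text
        jsonStr = jsonStr.split('var customerDetails = ')[-1].rsplit(';')[0]

        # Replace true/false with the strings "True"/"False"
        jsonStr = replace_all(jsonStr, d)
        # Remove the commented-out "customerName" and "customerQuote" lines
        jsonStr = re.sub(r'//"customerName".*?\n', r'', jsonStr)
        jsonStr = re.sub(r'//"customerQuote".*?\n', r'', jsonStr)
        # Remove the last comma in the text
        old = ','
        new = ''
        maxreplace = 1
        jsonStr = new.join(jsonStr.rsplit(old, maxreplace))

        # Remove the comma that follows each "itemId" value
        result = re.findall(r"\"itemId\": \d{0,9},", jsonStr)
        for each in result:
            jsonStr = jsonStr.replace(each, each[:-1])

        jsonData = json.loads(jsonStr)

# Collect the case study link of every customer entry
links = []
for each in jsonData:
    customerName = each['customerName']
    caseStudy_link = each['caseStudy_link']['link']
    links.append(caseStudy_link)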
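The cleanup idea can also be tried offline on a small, made-up script body. The sketch below is a simplified variant, not the code above: it removes whole-line // comments and any comma that sits directly before a closing } or ], and it leaves true/false alone because JSON already accepts them. The sample data, the function name js_literal_to_json, and the assumption that the variable holds an array literal are all hypothetical. The regexes would misfire on strings that contain "];" or ",}" themselves.

```python
import json
import re

# Made-up stand-in for the contents of the <script> tag.
script_text = """
var customerDetails = [
  {
    "itemId": 1,
    "customerName": "Example Co",
    //"customerQuote": "old quote",
    "featured": true,
    "caseStudy_link": {"link": "/case-study/example-co"},
  },
  {
    "itemId": 2,
    "customerName": "Another Co",
    "featured": false,
    "caseStudy_link": {"link": ""},
  },
];
"""

def js_literal_to_json(text, var_name):
    """Extract `var <var_name> = [...];` from text and parse it as JSON."""
    pattern = r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\])\s*;"
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return []
    literal = match.group(1)
    # Drop lines that consist only of a // comment.
    literal = re.sub(r"^[ \t]*//.*$", "", literal, flags=re.MULTILINE)
    # Drop commas that directly precede a closing brace or bracket.
    literal = re.sub(r",\s*([}\]])", r"\1", literal)
    return json.loads(literal)

data = js_literal_to_json(script_text, "customerDetails")
sample_links = [entry["caseStudy_link"]["link"] for entry in data]
print(sample_links)
```

With this variant the "featured" values come back as Python booleans rather than the strings "True" and "False".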

Output of the scraping script:

print (links)
['https://www.automationanywhere.com/case-study/food-and-beverage-accounts-payable', 'https://www.automationanywhere.com/solutions/financial-services/global-bank-hr', 'https://www.automationanywhere.com/case-study/treasury-one', 'https://www.automationanywhere.com/case-study/synergy-energy-provider', '', 'https://www.automationanywhere.com/case-study/san-diego-county', '/solutions/manufacturing/quad-graphics-manufacturing', '', '/solutions/utilities/company-saves-25k-hours-with-rpa', 'https://www.automationanywhere.com/solutions/healthcare/nhs-lowers-cost-with-rpa', 'https://www.automationanywhere.com/case-study/data-storage-provider', 'https://www.automationanywhere.com/solutions/technology/juniper-networks', '/solutions/financial-services/global-investment-bank', 'https://www.automationanywhere.com/case-study/hgs-insurance', '', 'https://www.automationanywhere.com/case-study/imaging-tools', '/case-study/chain-solution-company', '/solutions/manufacturing/ey-mining-customer', 'https://www.automationanywhere.com/case-study/ey-banking-customer', '/case-study/global-credit-reporting-company', '/solutions/insurance/insurer-reduces-claims-processing-with-zero-errors', '', '', 'https://www.automationanywhere.com/solutions/technology/core-digital-media', 'https://www.automationanywhere.com/case-study/large-commercial-bank', 'https://www.automationanywhere.com/case-study/large-commercial-bank', 'https://www.automationanywhere.com/solutions/manufacturing/global-conglomerate', '/solutions/healthcare/global-medical-technology-company', 'https://www.automationanywhere.com/images/casestudy/Everest_ANZ_practitioner_perspective.pdf', '', '/solutions/insurance/asia-largest-insurance-company', 'https://www.automationanywhere.com/solutions/manufacturing/stant-automotive-supplier', '/solutions/lifesciences/boston-scientific', 'https://www.automationanywhere.com/case-study/zs-professional-services', 'https://www.automationanywhere.com/solutions/energy/cpfl', '', '', '', '', '', 
'/images/casestudy/Genpact_trucking_logistics.pdf', '', '', '/solutions/finance/bancolombia', '', 'https://www.automationanywhere.com/solutions/pharmaceutical/eli-lilly', '', 'https://www.automationanywhere.com/case-study/university-of-melbourne', 'https://www.automationanywhere.com/solutions/technology/symantec-cyber-security', 'https://www.automationanywhere.com/solutions/financeaccounting/cartus', '', '', 'https://www.automationanywhere.com/solutions/media/australia-post', '', 'https://www.automationanywhere.com/solutions/manufacturing/stanley-black-and-decker', '', 'https://www.automationanywhere.com/case-study/bouygues-telecom', '/case-study/hitachi-vantara', '/solutions/financial-services/pggm-invoice-processing', '', 'https://www.automationanywhere.com/case-study/ahold', 'https://www.automationanywhere.com/solutions/financial-services/st-james-place', '', '', '', 'https://www.automationanywhere.com/solutions/insurance/dai-ichi-life', '', '/case-study/tata-sky-content-distribution', '', '', '', 'https://www.automationanywhere.com/case-study/vale', '/solutions/financial-services/pggm-rpa-drives-productivity', 'https://www.automationanywhere.com/case-study/genworth', '', 'https://www.automationanywhere.com/solutions/manufacturing/eastman-chemical-rpa', '/case-study/bpo-nga-hr', '/solutions/bpo/maximus', '/solutions/chemicals/nouryon', '/solutions/telecom/lyse', 'https://www.automationanywhere.com/solutions/financial-services/monte-titoli', '/solutions/manufacturing/signify', '/case-study/law-practice/husch-blackwell', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '/solutions/financial-services/automation-challenge', '/solutions/hospitality/wam-group', '/solutions/manufacturing/nsg-group', 'https://www.automationanywhere.com/solutions/technology/mcafee', '', 'https://www.automationanywhere.com/solutions/it/volkswagen-india', 'https://www.automationanywhere.com/solutions/financialservices/santandar-consumer-bank', 
'/solutions/telecom/canadian-telecom-company', '', '', '', 'https://www.automationanywhere.com/solutions/healthcare/r1-rcm', 'https://www.automationanywhere.com/solutions/manufacturing/eastman-purchase-order-processing', '/solutions/healthcare/healthcare-provider-processes-1-billion', '/solutions/finance-accounting/rpa-processes-university-finance-applications', '/solutions/case-study/dch-digital-transformation-with-rpa', '/solutions/healthcare/rpa-deployed-by-nhs-monitors-oxygen', '/solutions/case-study/university-of-melbourne', '/solutions/finance-accounting/tricor-uses-rpa-for-digital-transformation', 'https://www.automationanywhere.com/solution/covid-19/osv-manages-covid-processes-with-rpa', 'https://www.automationanywhere.com/case-study/sprint', '/solutions/healthcare/rpa-helps-hologic-save-costs', '', 'https://www.automationanywhere.com/solutions/technology/dell-uses-rpa-for-hr-processes', '/solutions/healthcare/physical-therapy-network-saves-costs-with-rpa', 'https://www.automationanywhere.com/solutions/finance-accounting/bae-systems-scales-processes-with-rpa', '/solutions/financial-services/keybank', '/solutions/healthcare/rpa-helps-vapotherm-combat-covid-19', '', '/solutions/financial-services/wustenrot-and-wurttenbergische-gruppe', '/solutions/financial-services/patelco-credit-union', '/solutions/manufacturing/great-lakes-tapes-rpa']
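The list above mixes absolute URLs, site-relative paths, empty strings, and at least one duplicate. A possible post-processing step, sketched here with urllib.parse.urljoin, drops the empty entries, resolves relative paths against the page URL, and removes duplicates while keeping the original order. The function name normalize_links is made up, and it assumes the relative paths belong to the same site as the scraped page.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.automationanywhere.com/resources/customer-stories"

def normalize_links(links, base_url=BASE_URL):
    """Drop empty links, make relative ones absolute, and de-duplicate in order."""
    seen = set()
    cleaned = []
    for link in links:
        if not link:
            continue  # entries without a case study link are empty strings
        absolute = urljoin(base_url, link)
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned

# A few entries copied from the output above.
sample = [
    "https://www.automationanywhere.com/case-study/treasury-one",
    "",
    "/solutions/manufacturing/quad-graphics-manufacturing",
    "https://www.automationanywhere.com/case-study/large-commercial-bank",
    "https://www.automationanywhere.com/case-study/large-commercial-bank",
]

for url in normalize_links(sample):
    print(url)
```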