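I am trying to scrape the website
https://www.automationanywhere.com/resources/customer-stories
I want to get the href of every a tag inside each div with the class storyInfoBox.
If I print the result parsed with lxml, I can see what I am looking for in it.
However, find_all returns nothing. I have tried all of the following (each returns an empty list):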
BeautifulSoup(requests.get(
'https://www.automationanywhere.com/resources/customer-stories').content, 'lxml').find_all('a', text=' Read case Study')
BeautifulSoup(requests.get(
'https://www.automationanywhere.com/resources/customer-stories').content, 'lxml').find_all('a', {'target', '_self'})
BeautifulSoup(requests.get(
'https://www.automationanywhere.com/resources/customer-stories').content, 'lxml').find_all('div', {'class': 'storyBoxInfo'})
Answer 0 (score: 0)
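First, you want storyInfoBox, but your code looks for storyBoxInfo.
Second, that doesn't even matter, because the page is dynamic and rendered through JS, so those elements are not in the initial HTML source at all.
You either need something like selenium to let the page render first and then grab the HTML, or you can read the data, which sits in JSON-like form inside an HTML <script> tag. It takes some work to turn it into valid JSON, but it can be done: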
import requests
from bs4 import BeautifulSoup
import json
import re

def replace_all(text, dic):
    # Apply every old -> new substitution in dic to text
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

# Bare true/false are rewritten as the quoted strings "True"/"False"
d = {'true':'"True"', 'false':'"False"'}

url = 'https://www.automationanywhere.com/resources/customer-stories'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

scripts = soup.find_all('script')
for script in scripts:
    if 'var customerDetails = ' in script.text:
        jsonStr = script.text
        # Keep only the text between the assignment and the first ';'
        jsonStr = jsonStr.split('var customerDetails = ')[-1].rsplit(';')[0]
        jsonStr = replace_all(jsonStr, d)
        # Remove the commented-out customerName / customerQuote lines
        jsonStr = re.sub(r'//"customerName".*?\n', r'', jsonStr)
        jsonStr = re.sub(r'//"customerQuote".*?\n', r'', jsonStr)

        # Remove the last comma in the string
        old = ','
        new = ''
        maxreplace = 1
        jsonStr = new.join(jsonStr.rsplit(old, maxreplace))

        # Remove the comma that follows each itemId value
        result = re.findall(r"\"itemId\": \d{0,9},", jsonStr)
        for each in result:
            jsonStr = jsonStr.replace(each, each[:-1])

        jsonData = json.loads(jsonStr)

links = []
for each in jsonData:
    customerName = each['customerName']
    caseStudy_link = each['caseStudy_link']['link']
    links.append(caseStudy_link)
Output:
print (links)
['https://www.automationanywhere.com/case-study/food-and-beverage-accounts-payable', 'https://www.automationanywhere.com/solutions/financial-services/global-bank-hr', 'https://www.automationanywhere.com/case-study/treasury-one', 'https://www.automationanywhere.com/case-study/synergy-energy-provider', '', 'https://www.automationanywhere.com/case-study/san-diego-county', '/solutions/manufacturing/quad-graphics-manufacturing', '', '/solutions/utilities/company-saves-25k-hours-with-rpa', 'https://www.automationanywhere.com/solutions/healthcare/nhs-lowers-cost-with-rpa', 'https://www.automationanywhere.com/case-study/data-storage-provider', 'https://www.automationanywhere.com/solutions/technology/juniper-networks', '/solutions/financial-services/global-investment-bank', 'https://www.automationanywhere.com/case-study/hgs-insurance', '', 'https://www.automationanywhere.com/case-study/imaging-tools', '/case-study/chain-solution-company', '/solutions/manufacturing/ey-mining-customer', 'https://www.automationanywhere.com/case-study/ey-banking-customer', '/case-study/global-credit-reporting-company', '/solutions/insurance/insurer-reduces-claims-processing-with-zero-errors', '', '', 'https://www.automationanywhere.com/solutions/technology/core-digital-media', 'https://www.automationanywhere.com/case-study/large-commercial-bank', 'https://www.automationanywhere.com/case-study/large-commercial-bank', 'https://www.automationanywhere.com/solutions/manufacturing/global-conglomerate', '/solutions/healthcare/global-medical-technology-company', 'https://www.automationanywhere.com/images/casestudy/Everest_ANZ_practitioner_perspective.pdf', '', '/solutions/insurance/asia-largest-insurance-company', 'https://www.automationanywhere.com/solutions/manufacturing/stant-automotive-supplier', '/solutions/lifesciences/boston-scientific', 'https://www.automationanywhere.com/case-study/zs-professional-services', 'https://www.automationanywhere.com/solutions/energy/cpfl', '', '', '', '', '', 
'/images/casestudy/Genpact_trucking_logistics.pdf', '', '', '/solutions/finance/bancolombia', '', 'https://www.automationanywhere.com/solutions/pharmaceutical/eli-lilly', '', 'https://www.automationanywhere.com/case-study/university-of-melbourne', 'https://www.automationanywhere.com/solutions/technology/symantec-cyber-security', 'https://www.automationanywhere.com/solutions/financeaccounting/cartus', '', '', 'https://www.automationanywhere.com/solutions/media/australia-post', '', 'https://www.automationanywhere.com/solutions/manufacturing/stanley-black-and-decker', '', 'https://www.automationanywhere.com/case-study/bouygues-telecom', '/case-study/hitachi-vantara', '/solutions/financial-services/pggm-invoice-processing', '', 'https://www.automationanywhere.com/case-study/ahold', 'https://www.automationanywhere.com/solutions/financial-services/st-james-place', '', '', '', 'https://www.automationanywhere.com/solutions/insurance/dai-ichi-life', '', '/case-study/tata-sky-content-distribution', '', '', '', 'https://www.automationanywhere.com/case-study/vale', '/solutions/financial-services/pggm-rpa-drives-productivity', 'https://www.automationanywhere.com/case-study/genworth', '', 'https://www.automationanywhere.com/solutions/manufacturing/eastman-chemical-rpa', '/case-study/bpo-nga-hr', '/solutions/bpo/maximus', '/solutions/chemicals/nouryon', '/solutions/telecom/lyse', 'https://www.automationanywhere.com/solutions/financial-services/monte-titoli', '/solutions/manufacturing/signify', '/case-study/law-practice/husch-blackwell', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '/solutions/financial-services/automation-challenge', '/solutions/hospitality/wam-group', '/solutions/manufacturing/nsg-group', 'https://www.automationanywhere.com/solutions/technology/mcafee', '', 'https://www.automationanywhere.com/solutions/it/volkswagen-india', 'https://www.automationanywhere.com/solutions/financialservices/santandar-consumer-bank', 
'/solutions/telecom/canadian-telecom-company', '', '', '', 'https://www.automationanywhere.com/solutions/healthcare/r1-rcm', 'https://www.automationanywhere.com/solutions/manufacturing/eastman-purchase-order-processing', '/solutions/healthcare/healthcare-provider-processes-1-billion', '/solutions/finance-accounting/rpa-processes-university-finance-applications', '/solutions/case-study/dch-digital-transformation-with-rpa', '/solutions/healthcare/rpa-deployed-by-nhs-monitors-oxygen', '/solutions/case-study/university-of-melbourne', '/solutions/finance-accounting/tricor-uses-rpa-for-digital-transformation', 'https://www.automationanywhere.com/solution/covid-19/osv-manages-covid-processes-with-rpa', 'https://www.automationanywhere.com/case-study/sprint', '/solutions/healthcare/rpa-helps-hologic-save-costs', '', 'https://www.automationanywhere.com/solutions/technology/dell-uses-rpa-for-hr-processes', '/solutions/healthcare/physical-therapy-network-saves-costs-with-rpa', 'https://www.automationanywhere.com/solutions/finance-accounting/bae-systems-scales-processes-with-rpa', '/solutions/financial-services/keybank', '/solutions/healthcare/rpa-helps-vapotherm-combat-covid-19', '', '/solutions/financial-services/wustenrot-and-wurttenbergische-gruppe', '/solutions/financial-services/patelco-credit-union', '/solutions/manufacturing/great-lakes-tapes-rpa']
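The list above mixes absolute URLs, site-relative paths, and empty strings, and at least one link appears twice. If a clean list of absolute URLs is wanted, something like the following might work as a post-processing step. This is only a sketch: normalize_links and BASE_URL are hypothetical names, not part of the answer's code, and it assumes every relative path should be resolved against the site root. The sample entries are copied from the output above.

```python
from urllib.parse import urljoin

# Assumption: relative paths are resolved against the site root
BASE_URL = 'https://www.automationanywhere.com'

def normalize_links(links, base=BASE_URL):
    """Drop empty entries, make relative paths absolute, remove duplicates (order kept)."""
    seen = set()
    cleaned = []
    for link in links:
        if not link:
            continue  # skip entries that have no case-study link
        absolute = urljoin(base, link)  # absolute URLs pass through unchanged
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned

# A few entries taken from the output above
sample = [
    'https://www.automationanywhere.com/case-study/treasury-one',
    '',
    '/solutions/manufacturing/quad-graphics-manufacturing',
    'https://www.automationanywhere.com/case-study/large-commercial-bank',
    'https://www.automationanywhere.com/case-study/large-commercial-bank',
]

for url in normalize_links(sample):
    print(url)
```

In the answer's script this would be called as normalize_links(links) after the loop that builds links.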