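I am trying to scrape the prices for the list of courses on this website.
However, I can't find a page that shows the full list of courses together with their prices.
I was able to come up with the following code, which pulls the price for a single course: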
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = "https://www.learningconnection.philips.com/en/course/pinnacle%C2%B3-advanced-planning-education"
html = requests.get(url)
soup = BeautifulSoup(html.text, 'html.parser')
price = soup.select_one("[class*='field-price'] .even").text
print(price)
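Any help or suggestions are appreciated!
Answer 0 (score: 1)
Here is one way to loop through the areas of interest. It requires bs4 4.7.1+ for access to the :contains pseudo-class.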
import requests
from bs4 import BeautifulSoup as bs
base = 'https://www.learningconnection.philips.com'
url = f'{base}/en/catalog/profession/biomedical-engineers'
courses = []
results = []
with requests.Session() as s:
    r = s.get(url)
    soup = bs(r.content, 'lxml')
    links = [base + i['href'] for i in soup.select('h3 a')]
    for link in links:
        r = s.get(link)
        soup = bs(r.content, 'lxml')
        courses += [i['href'] for i in soup.select('.title a')]
    for course in courses:
        r = s.get(course)
        soup = bs(r.content, 'lxml')
        price = soup.select_one('em:contains("Tuition:")')
        if price is None:
            price = 'Not listed'
        else:
            price = price.text.replace('\xa0', ' ')
        result = {'Title': soup.select_one('#page-title').text.replace('\xa0', ' '),
                  'Description': soup.select_one('.field-item p').text.replace('\xa0', ' '),
                  'Price': price,
                  'Url': course}
        results.append(result)
print(results)
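A note on the selector: in newer releases of soupsieve (the selector engine behind bs4's select), :contains is deprecated in favor of :-soup-contains. Below is a minimal sketch of the same lookup on made-up HTML fragments. The markup and the tuition_text helper are assumptions for illustration only, not the site's actual page structure.

```python
from bs4 import BeautifulSoup

# Hypothetical fragments; the real course pages may be structured differently.
WITH_PRICE = '<div><p><em>Tuition:\xa0$5,141.00</em></p></div>'
WITHOUT_PRICE = '<div><p><em>Duration: 3 days</em></p></div>'


def tuition_text(html):
    """Return the text of the first <em> containing 'Tuition:', or 'Not listed'."""
    soup = BeautifulSoup(html, 'html.parser')
    # :-soup-contains is the newer spelling of :contains (needs soupsieve 2.1+).
    node = soup.select_one('em:-soup-contains("Tuition:")')
    if node is None:
        return 'Not listed'
    # Replace non-breaking spaces with regular spaces, as the answer's code does.
    return node.text.replace('\xa0', ' ')


print(tuition_text(WITH_PRICE))
print(tuition_text(WITHOUT_PRICE))
```

If you are pinned to an older bs4/soupsieve, the original :contains spelling should still be the one to use.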
Answer 1 (score: 0)
You can find the price by anchoring the search on the item's parent wrapper:
import requests
from bs4 import BeautifulSoup as soup
d = soup(requests.get('https://www.learningconnection.philips.com/en/course/pinnacle%C2%B3-advanced-planning-education').text, 'html.parser')
prices = [i.find_all('div', {'class':'field-item even'})[2].text for i in d.find_all('fieldset', {'class':' group-overview field-group-fieldset panel panel-default form-wrapper'})]
Output:
['5141.00']
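Answer 2 (score: 0)
Search for the courses by keyword, then collect all the URLs from the search results. Once you have the URLs, loop over them and pull the price from each page.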
from bs4 import BeautifulSoup
import requests
Search_key='pinnacle'
url = "https://www.learningconnection.philips.com/en/search/site/{}".format(Search_key)
html = requests.get(url)
soup = BeautifulSoup(html.text, 'html.parser')
urls=[item['href'] for item in soup.select('h3.title > a')]
price=[]
for url in urls:
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    if soup.select_one("[class*='field-price'] .even"):
        price.append(soup.select_one("[class*='field-price'] .even").text)
print(price)
Output:
['5171.00', '5171.00', '3292.00', '5141.00', '4309.00', '2130.00', '2130.00', '2130.00']
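If the keyword contains spaces or non-ASCII characters (the course URLs on this site use "³", encoded as %C2%B3), it is probably safer to percent-encode it before building the search URL. A small sketch follows. The search_url helper is hypothetical, and I have not verified how the site's search handles encoded keywords.

```python
from urllib.parse import quote

BASE = 'https://www.learningconnection.philips.com'


def search_url(keyword):
    """Build a search URL with the keyword percent-encoded as one path segment."""
    # safe='' also encodes '/', so the keyword cannot add extra path segments.
    return '{}/en/search/site/{}'.format(BASE, quote(keyword, safe=''))


print(search_url('pinnacle'))
print(search_url('pinnacle³ advanced planning'))
```

For a plain ASCII keyword such as 'pinnacle' the result is the same URL as in the code above.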
You can also print the course titles.
from bs4 import BeautifulSoup
import requests
Search_key='pinnacle'
url = "https://www.learningconnection.philips.com/en/search/site/{}".format(Search_key)
html = requests.get(url)
soup = BeautifulSoup(html.text, 'html.parser')
urls=[item['href'] for item in soup.select('h3.title > a')]
price=[]
title=[]
for url in urls:
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    if soup.select_one("[class*='field-price'] .even"):
        title.append(soup.select_one("h1#page-title").text)
        price.append(soup.select_one("[class*='field-price'] .even").text)
print(title)
print(price)
Output:
['Pinnacle³ Auto Segmentation with SPICE', 'Pinnacle³ Dynamic Planning', 'Pinnacle³ Additional Education', 'Pinnacle³ Advanced Planning Education', 'Pinnacle³ Basic Planning Education', 'Pinnacle³ Physics Modeling', 'Pinnacle³ Level I Basic Planning Education', 'Pinnacle³ Level II Education']
['5171.00', '5171.00', '3292.00', '5141.00', '4309.00', '2130.00', '2130.00', '2130.00']
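Since the two lists line up one-to-one, they can be zipped into rows and written out as a table. Here is a rough sketch using only the standard library, with the first three entries of the output above as sample data. The to_rows and to_csv helpers are names I made up for illustration.

```python
import csv
import io

# First three entries from the output above, used as sample data.
titles = ['Pinnacle³ Auto Segmentation with SPICE',
          'Pinnacle³ Dynamic Planning',
          'Pinnacle³ Additional Education']
prices = ['5171.00', '5171.00', '3292.00']


def to_rows(titles, prices):
    """Pair each title with its price, converting the price string to a float."""
    return [{'Title': t, 'Price': float(p)} for t, p in zip(titles, prices)]


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['Title', 'Price'])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = to_rows(titles, prices)
print(to_csv(rows))
```

The same rows could be passed to pandas.DataFrame(rows) if you prefer to work with a DataFrame.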
Edited:
from bs4 import BeautifulSoup
import requests
Search_key='biomed'
url = "https://www.learningconnection.philips.com/en/search/site/{}".format(Search_key)
html = requests.get(url)
soup = BeautifulSoup(html.text, 'html.parser')
urls=[item['href'] for item in soup.select('h3.title > a')]
print(len(urls))
price=[]
title=[]
for url in urls:
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    if soup.select_one("[class*='field-price'] .even"):
        title.append(soup.select_one("h1#page-title").text)
        price.append(soup.select_one("[class*='field-price'] .even").text)
print(title)
print(price)
Output:
28
['NETWORK CONCEPTS (BIOMED)']
['4875.00']
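Note that the search returned 28 URLs but only one page matched the price selector, and the loop silently drops the rest. If you want to see which pages had no price, one option is to keep a record per URL and store None when nothing matches. This is only a sketch: the HTML fragments, the example.com URLs, and the parse_course/collect helpers are made up for illustration, and fetch is a stand-in for whatever downloads the page.

```python
from bs4 import BeautifulSoup

PRICE_SELECTOR = "[class*='field-price'] .even"


def parse_course(html):
    """Extract title and price from one page; either may be None if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.select_one('h1#page-title')
    price = soup.select_one(PRICE_SELECTOR)
    return {
        'Title': title.text.strip() if title else None,
        'Price': price.text.strip() if price else None,
    }


def collect(urls, fetch):
    """Return one record per URL; fetch(url) must return the page HTML."""
    return [dict(parse_course(fetch(url)), Url=url) for url in urls]


# Hypothetical pages standing in for real responses.
FAKE_PAGES = {
    'https://example.com/course/a':
        '<h1 id="page-title">NETWORK CONCEPTS (BIOMED)</h1>'
        '<div class="field field-name-field-price">'
        '<div class="field-item even">4875.00</div></div>',
    'https://example.com/course/b':
        '<h1 id="page-title">Some other course</h1>'
        '<div class="field field-name-body">'
        '<div class="field-item even">No tuition shown</div></div>',
}

records = collect(FAKE_PAGES, FAKE_PAGES.get)
missing = [r['Url'] for r in records if r['Price'] is None]
print(len(records), 'pages,', len(missing), 'without a price')
```

Against the real site, fetch could be something like lambda url: requests.get(url).text, with urls being the list built from the search results.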