Web scraping JSON data

Time: 2021-06-22 17:52:13

Tags: python json web-scraping python-requests

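I am trying to scrape job titles from here.

The first page of this site lists 50 jobs. Using requests, I am trying to scrape the job titles from that page, but I only get 10 of them, not all 50. From Developer Tools > Network, I found that the response content type is JSON.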

from bs4 import BeautifulSoup
import requests
import json

s = requests.Session()

headers = {
    'Connection': 'keep-alive',
    'Pragma': 'no-cache',
    'Cache-Control': 'no-cache',
    'sec-ch-ua': '^\\^',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'sec-ch-ua-mobile': '?0',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36',
    'Origin': 'https://jobs.porsche.com',
    'Sec-Fetch-Site': 'same-site',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://jobs.porsche.com/',
    'Accept-Language': 'en-US,en;q=0.9',
}

r1 = s.get('https://api-jobs.porsche.com/search/?data=^%^7B^%^22LanguageCode^%^22^%^3A^%^22DE^%^22^%^2C^%^22SearchParameters^%^22^%^3A^%^7B^%^22FirstItem^%^22^%^3A1^%^2C^%^22CountItem^%^22^%^3A50^%^2C^%^22Sort^%^22^%^3A^%^5B^%^7B^%^22Criterion^%^22^%^3A^%^22PublicationStartDate^%^22^%^2C^%^22Direction^%^22^%^3A^%^22DESC^%^22^%^7D^%^5D^%^2C^%^22MatchedObjectDescriptor^%^22^%^3A^%^5B^%^22ID^%^22^%^2C^%^22PositionTitle^%^22^%^2C^%^22PositionURI^%^22^%^2C^%^22PositionLocation.CountryName^%^22^%^2C^%^22PositionLocation.CityName^%^22^%^2C^%^22PositionLocation.Longitude^%^22^%^2C^%^22PositionLocation.Latitude^%^22^%^2C^%^22PositionLocation.PostalCode^%^22^%^2C^%^22PositionLocation.StreetName^%^22^%^2C^%^22PositionLocation.BuildingNumber^%^22^%^2C^%^22PositionLocation.Distance^%^22^%^2C^%^22JobCategory.Name^%^22^%^2C^%^22PublicationStartDate^%^22^%^2C^%^22ParentOrganizationName^%^22^%^2C^%^22ParentOrganization^%^22^%^2C^%^22OrganizationShortName^%^22^%^2C^%^22CareerLevel.Name^%^22^%^2C^%^22JobSector.Name^%^22^%^2C^%^22PositionIndustry.Name^%^22^%^2C^%^22PublicationCode^%^22^%^2C^%^22PublicationChannel.Id^%^22^%^5D^%^7D^%^2C^%^22SearchCriteria^%^22^%^3A^%^5B^%^7B^%^22CriterionName^%^22^%^3A^%^22PublicationChannel.Code^%^22^%^2C^%^22CriterionValue^%^22^%^3A^%^5B^%^2212^%^22^%^5D^%^7D^%^2C^%^7B^%^22CriterionName^%^22^%^3A^%^22PublicationChannel.Code^%^22^%^2C^%^22CriterionValue^%^22^%^3A^%^5B^%^2212^%^22^%^5D^%^7D^%^5D^%^7D', headers=headers).json()

data1 = json.dumps(r1)
print(data1)
d1 = json.loads(data1)
#print(d1.keys)
for x in d1.keys():
    print(x)

Any help with this is much appreciated.

Unfortunately, I am currently limited to using requests or other popular Python libraries. Thanks in advance.

1 Answer:

Answer 0: (score: 0)

To get all 50 results, you can use the following example:

import json
import requests

api_url = "https://api-jobs.porsche.com/search/"
query = {
    "data": '{"LanguageCode":"DE","SearchParameters":{"FirstItem":1,"CountItem":50,"Sort":[{"Criterion":"PublicationStartDate","Direction":"DESC"}],"MatchedObjectDescriptor":["ID","PositionTitle","PositionURI","PositionLocation.CountryName","PositionLocation.CityName","PositionLocation.Longitude","PositionLocation.Latitude","PositionLocation.PostalCode","PositionLocation.StreetName","PositionLocation.BuildingNumber","PositionLocation.Distance","JobCategory.Name","PublicationStartDate","ParentOrganizationName","ParentOrganization","OrganizationShortName","CareerLevel.Name","JobSector.Name","PositionIndustry.Name","PublicationCode","PublicationChannel.Id"]},"SearchCriteria":[{"CriterionName":"PublicationChannel.Code","CriterionValue":["12"]},{"CriterionName":"PublicationChannel.Code","CriterionValue":["12"]}]}'
}

data = requests.get(api_url, params=query).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for i, r in enumerate(data["SearchResult"]["SearchResultItems"], 1):
    print(i, r["MatchedObjectDescriptor"]["PositionTitle"])

Prints:

1 Praktikant (m/w/d) Leitung Werk Zuffenhausen
2 Designer (m/w/d) Interieur
3 Assistent Service (m/w/d)
4 Werkstudent (m/w/d) HR Business Partner (operatives Personalwesen)
5 Fahrzeugdisponent (m/w/d)
6 Warehouse Quality Control Specialist
7 Berater Ersatzteile und Zubehör (m/w/d)
8 Consultant Industry 4.0 & Operational Excellence
9 Manager (m/w/d) Automobilindustrie (Zulieferer und OEMs)
10 (Senior-) Informationssicherheitsmanager (m/w/d) zur Absicherung von IT- Systemen
11 Mitarbeiter/in Engineering (m/w/d)
12 IT Product Manager (m/w/d) - Digital Workplace Cloud & Artificial Intelligence
13 Automation Software Test Engineer (f/m/d)
14 Executive Assistant to Board of Management and Special Projects
15 Sicherheitsmanager (m/w/d) – Business Continuity Management für IT
16 Senior Information Security Manager (m/w/d)
17 Head of Public Relations (PR)
18 Data Scientist - Advanced Optimization (m/f/d)
19 Praktikant Projektmanagement Accounting (m/w/d) im Bereich Financial Services
20 Praktikant Accounting Financial Services (m/w/d)
21 Ausbildung zum Industriemechaniker (m/w/d)
22 Mitarbeiter im Marketing (m/w/d) - sachgrundbefristet in Teilzeit
23 HR Professional Administration (f/m/d)
24 Entwicklungsingenieur (m/w/d) Simulation HV-Batterie Zelle
25 Praktikant Digitale Transformation & Projekte (m/w/d) im Bereich Financial Services
26 Bachelorand oder Masterand (m/w/d) Methoden zur Reduzierung von zukünftig neu limitierten Emissionen
27 Praktikant Kreditanalyse - Bonitätsprüfung Retailgeschäft (m/w/d) im Bereich Financial Services
28 Praktikant (m/w/d) Produktionsplanung Neue Fahrzeugprojekte
29 Internship Sales
30 Produktionsmitarbeiter (m/w/d) Karosseriefertigung (befristet auf 6 Monate)
31 Personalreferent m/w/d
32 Java/Kotlin Backend Developer (f/m/d) - New Mobility
33 Werkstudent (m/w/d) im After Sales
34 Werkstudent (m/w/d) Projektmanagement Fahrwerk
35 (Senior) Consultant (m/w/d) Mobility – Future Transportation Strategy
36 Kundenempfang (m/w/d)
37 Human Resource Coordinator
38 Kundenempfang (m/w/d)
39 DevOps / Site Reliability Engineer (f/m/d)
40 Entwicklungsingenieur (m/w/d) Fahrwerkelektronik
41 Masterand (m/w/d): Evaluierung einer robotergeführten Computertomographie-Methode
42 Auszubildende/r Automobilkauffrau/mann (m/w/d)
43 Python Developer for Automotive Systems (f/m/d)
44 Service Techniker (m/w/d)
45 Aushilfe Minijob (m/w/d) in Berlin-City
46 Praktikant (m/w/d) Werkplanung/Fertigungsplanung
47 Praktikant (m/w/d) Leitung Produktion & Logistik
48 Meister (m/w/d) Abfallwirtschaftszentrum
49 Praktikant (m/w/d) Qualitätsmanagement Produktion
50 System Engineer (f/m/d)
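
A possible reason the code in the question returned only 10 jobs (an assumption, the thread does not confirm it): the URL there contains `^` characters before every `%`, which looks like Windows cmd escaping left over from copying the request out of the browser. With the carets in place the `data` parameter is not valid percent-encoded JSON, so the server may ignore it and fall back to a default page size. The sketch below uses a shortened excerpt of that query string, closed early for illustration, to show that removing the carets and decoding yields plain JSON that can be passed through `params` as in the example above.

```python
import json
from urllib.parse import unquote

# Shortened excerpt of the "data" value from the question's URL.
# It is cut after CountItem and closed early, for illustration only.
escaped = (
    "^%^7B^%^22LanguageCode^%^22^%^3A^%^22DE^%^22^%^2C"
    "^%^22SearchParameters^%^22^%^3A^%^7B^%^22FirstItem^%^22^%^3A1^%^2C"
    "^%^22CountItem^%^22^%^3A50^%^7D^%^7D"
)

# Percent-decoding alone does not remove the carets, so the text is not JSON yet.
still_escaped = unquote(escaped)

# Drop the carets first, then percent-decode and parse.
cleaned = unquote(escaped.replace("^", ""))
payload = json.loads(cleaned)

print(cleaned)
print(payload["SearchParameters"]["CountItem"])
```

Letting requests encode the JSON string itself, as the answer does with `params=query`, avoids this kind of hand-copied escaping altogether.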

EDIT: To get data from multiple pages:

import json
import requests

api_url = "https://api-jobs.porsche.com/search/"


for page in range(4):  # <-- increase number of pages here
    query = {
        "data": '{"LanguageCode":"DE","SearchParameters":{"FirstItem":'
        + str((page * 50) + 1)
        + ',"CountItem":50,"Sort":[{"Criterion":"PublicationStartDate","Direction":"DESC"}],"MatchedObjectDescriptor":["ID","PositionTitle","PositionURI","PositionLocation.CountryName","PositionLocation.CityName","PositionLocation.Longitude","PositionLocation.Latitude","PositionLocation.PostalCode","PositionLocation.StreetName","PositionLocation.BuildingNumber","PositionLocation.Distance","JobCategory.Name","PublicationStartDate","ParentOrganizationName","ParentOrganization","OrganizationShortName","CareerLevel.Name","JobSector.Name","PositionIndustry.Name","PublicationCode","PublicationChannel.Id"]},"SearchCriteria":[{"CriterionName":"PublicationChannel.Code","CriterionValue":["12"]},{"CriterionName":"PublicationChannel.Code","CriterionValue":["12"]}]}'
    }

    data = requests.get(api_url, params=query).json()

    for i, r in enumerate(
        data["SearchResult"]["SearchResultItems"], (page * 50) + 1
    ):
        print(i, r["MatchedObjectDescriptor"]["PositionTitle"])

Prints:

1 Entwicklungsingenieur (m/w/d) Erprobung Mittelkonsole / Kinematikteile
2 Projektleiter Vertrieb / OEM Bereich Straße
3 Service Techniker mit Schwerpunkt System- und Hochvolttechnik (m/w/d)
4 Doktorand (m/w/d) zum Thema „Methodik zur Steuerung der Nachhaltigkeits-Ziele in Fahrzeugprojekten“
5 Senior (f/m) Process Management Consultant
6 (Senior) Consultant (m/f/d) Life Sciences
7 Entwicklungsingenieur (m/w/d) Strömungssimulation
8 Wagenpfleger/in 100%
9 (Senior) Consultant (m/w/d) Mobility – Transportation mit Schwerpunkt maritime Industrie
10 Praktikant (m/w/d) Leitung Werk Zuffenhausen
11 Designer (m/w/d) Interieur

...

198 Praktikant (m/w/d) - Projektmanagement in der Porsche IT
199 HR Working Student with German (f/m/d)
200 (Senior) Consultant (m/f/d) After Sales Strategy | Mobility Sector
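
The loop above builds the JSON payload by concatenating strings. As an alternative sketch, the same payload can be built as a Python dict and serialized with `json.dumps`, which keeps the page offset arithmetic in one place. `build_query` and `PAGE_SIZE` are hypothetical names introduced here. The field list is shortened and the duplicated search criterion is listed once; whether the API accepts these reductions is an assumption.

```python
import json

PAGE_SIZE = 50  # items per request, matching CountItem in the answer


def build_query(page, page_size=PAGE_SIZE):
    """Return the `params` dict for one zero-based page of results."""
    payload = {
        "LanguageCode": "DE",
        "SearchParameters": {
            # FirstItem is 1-based in the answer's requests: 1, 51, 101, ...
            "FirstItem": page * page_size + 1,
            "CountItem": page_size,
            "Sort": [{"Criterion": "PublicationStartDate", "Direction": "DESC"}],
            # Shortened field list (assumption: extra fields are optional).
            "MatchedObjectDescriptor": ["ID", "PositionTitle", "PositionURI"],
        },
        "SearchCriteria": [
            {"CriterionName": "PublicationChannel.Code", "CriterionValue": ["12"]}
        ],
    }
    # Compact separators mirror the whitespace-free JSON used in the answer.
    return {"data": json.dumps(payload, separators=(",", ":"))}


for page in range(4):
    query = build_query(page)
    print(json.loads(query["data"])["SearchParameters"]["FirstItem"])
```

Each `query` can then be sent exactly as in the loop above, with `requests.get(api_url, params=query).json()`.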