How do I scrape a dictionary from inside a link?

Asked: 2019-02-19 14:15:10

Tags: python beautifulsoup

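I am practicing web scraping with BS4 for school, and I want to extract the contents of a dictionary from a link anchor. How can I extract the contents of the ctdata dictionary?

Here are the details:

Link: a ct="result_offer_content"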

ctdata = {"ad_id_solr": "1a7d243c3610c62012159b7c9d4e900382bbe446", 
  "ad_id_mongo": "", "ad_segment_id": 1723, "ad_partner": "wizbii.com_premium",  
  "ad_sector": "Ing\u00e9nierie", "ad_subsector": "", 
  "ad_jobtitle": "Ing\u00e9nieur d\u00e9veloppeur", "ad_company": "SII",
  "ad_type": "exact", "ad_position": 1, "ad_locality": "Bordeaux"}

I tried

for offers in soup.find_all("a", {'ct':'result_offer_content'}):
   offre = offers.find('ctdata')
   print(jobtitle)

but the output is "None None ...".

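1 answer:

Answer 0 (score: 1)

Since the data is in a JSON structure, you can read it as JSON. I'm a little confused about what jobtitle refers to, because you didn't provide the full code. Without the full code I can only offer a general solution that you will need to adapt, but it would look something like this: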

import json

json_str = '{"ad_id_solr":"1a7d243c3610c62012159b7c9d4e900382bbe446","ad_id_mongo":"","ad_segment_id":1723,"ad_partner":"wizbii.com_premium","ad_sector":"Ing\u00e9nierie","ad_subsector":"","ad_jobtitle":"Ing\u00e9nieur d\u00e9veloppeur","ad_company":"SII","ad_type":"exact","ad_position":1,"ad_locality":"Bordeaux"}'

json_dict = json.loads(json_str)
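
Once parsed, the result is an ordinary Python dict, so individual fields can be read by key. Here is a minimal sketch using a shortened copy of the ctdata value from the question; the `company` and `subsector` variable names are only illustrative:

```python
import json

# Shortened copy of the ctdata value from the question. The raw string (r'...')
# keeps the \u00e9 escapes intact so that json.loads is the one decoding them.
json_str = r'{"ad_jobtitle": "Ing\u00e9nieur d\u00e9veloppeur", "ad_company": "SII", "ad_position": 1, "ad_subsector": ""}'

json_dict = json.loads(json_str)

# Plain dict access: raises KeyError if the key is missing
jobtitle = json_dict["ad_jobtitle"]
company = json_dict["ad_company"]

# dict.get() returns a default instead of raising when a key is absent
subsector = json_dict.get("ad_subsector", "")

print(jobtitle)
print(company)
```

Using `dict.get()` with a default is probably the safer choice for fields that some offers may omit.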

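Additional

Now that you have provided the URL, I can see the problem. ctdata is an attribute of the a tag, not a child tag, so you want to use .get() instead of .find() for 'ctdata':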

import json
import requests
import bs4


req = requests.get("https://www.jobijoba.com/fr/query/?what=data&where=Bordeaux&where_type=city%22")

soup = bs4.BeautifulSoup(req.text, 'html.parser')

offers = soup.find_all("a", {'ct':'result_offer_content'})

# ctdata is an attribute of each <a> tag, so read it with .get()
for offer in offers:
    offre = offer.get('ctdata')

    # The attribute value is a JSON string; parse it into a dict
    json_dict = json.loads(offre)
    jobtitle = json_dict['ad_jobtitle']
    print(jobtitle)

Output:

Ingénieur développeur

Ingénieur développeur
Data Scientist
Data Scientist

Développeur big data


Data Scientist
Data Scientist

Ingénieur développeur
Data Scientist
Data Scientist
Data Scientist



Ingénieur décisionnel

Architecte
Data Scientist
Data Scientist
Data Scientist

Développeur informatique

Some tags have no job title, so you can skip them (not print them) by checking whether the job title is blank:

import json
import requests
import bs4


req = requests.get("https://www.jobijoba.com/fr/query/?what=data&where=Bordeaux&where_type=city%22")

soup = bs4.BeautifulSoup(req.text, 'html.parser')

offers = soup.find_all("a", {'ct':'result_offer_content'})

# ctdata is an attribute of each <a> tag, so read it with .get()
for offer in offers:
    offre = offer.get('ctdata')

    # The attribute value is a JSON string; parse it into a dict
    json_dict = json.loads(offre)
    jobtitle = json_dict['ad_jobtitle']
    # Skip offers whose job title is empty
    if jobtitle != '':
        print(jobtitle)

Output:

Ingénieur développeur
Ingénieur développeur
Data Scientist
Data Scientist
Développeur big data
Data Scientist
Data Scientist
Ingénieur développeur
Data Scientist
Data Scientist
Data Scientist
Ingénieur décisionnel
Architecte
Data Scientist
Data Scientist
Data Scientist
Développeur informatique
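
To see the difference between .find() and .get() without requesting the live site, here is a small offline sketch. The HTML string is hypothetical: it only assumes, as in the question, that ctdata is an attribute on the a tag holding a JSON string. It also skips links that have no ctdata attribute at all, which would otherwise make json.loads fail.

```python
import json
import bs4

# Hypothetical markup modelled on the question: ctdata is an attribute of <a>.
html = '''
<a ct="result_offer_content" ctdata='{"ad_jobtitle": "Data Scientist", "ad_company": "SII"}'>Offer 1</a>
<a ct="result_offer_content" ctdata='{"ad_jobtitle": "", "ad_company": "SII"}'>Offer 2</a>
<a ct="result_offer_content">Offer 3</a>
'''

soup = bs4.BeautifulSoup(html, "html.parser")

titles = []
for offer in soup.find_all("a", {"ct": "result_offer_content"}):
    # .find() looks for a child *tag* named ctdata; there is none, so this is None.
    child = offer.find("ctdata")

    # .get() reads the attribute value, or returns None if the attribute is absent.
    raw = offer.get("ctdata")
    if raw is None:
        continue  # Offer 3 has no ctdata attribute

    jobtitle = json.loads(raw).get("ad_jobtitle", "")
    if jobtitle:  # Offer 2 has an empty job title
        titles.append(jobtitle)

print(titles)
```

Collecting the titles in a list rather than printing inside the loop is just one option; it makes the result easier to reuse or check afterwards.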