Question

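I am using Selenium + BeautifulSoup.

I need to store the data I find. At first I thought of using arrays, but now I think JSON might be better; however, I don't know how to write the data out from what I have scraped.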

        doc = []
        spec = []
        for i in range(1, 2):
            driver.get('https://local.data/doctors/%d' % i)
            driver.execute_script("$('mark').remove()")
            time.sleep(3)
            html = driver.page_source
            soup = BeautifulSoup(html, 'html.parser')
            for doctors in soup.find_all('a', attrs={"data-ga-label": "profile_name"}):
                doc.append(doctors.text)
            for specialties in soup.find_all('p', attrs={"class": "specialities"}):
                spec.append(specialties.text.strip())
            for cities in soup.find_all('span', class_="city"):
                c = cities.text.split('-')[0].replace(":", "")
                print(c)

Instead of filling arrays, I want to write one JSON entry for all the values found for doctors, specialties and cities.

So it would look like this:

{
 doctor_name: "john hopkins",
 specialty: "surgeon",
 city: "new york"
}

That is, one entry for each set of values I grab with BeautifulSoup.

How can I do that?

Answer 1

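The following code will work. However, it is still not the right way to do what you are asking. It would be better if you shared the HTML structure of the page you are scraping.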

# Collect each field into its own list, in page order.
docs = [doctor.text for doctor in soup.find_all('a', attrs={"data-ga-label": "profile_name"})]
spec = [specialty.text.strip() for specialty in soup.find_all('p', attrs={"class": "specialities"})]
cities = [city.text.split('-')[0].replace(":", "") for city in soup.find_all('span', class_="city")]
doc_profiles = []
# enumerate() yields (index, value) pairs so the three lists can be matched by position.
for index, data in enumerate(docs):
    doc_profile = {'doctor_name': data,
                   'specialty': spec[index],
                   'city': cities[index]}
    doc_profiles.append(doc_profile)
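
To get from a list of dictionaries like this to actual JSON, the standard-library json module should be enough. The sketch below is only an illustration: the sample values are made up and stand in for the scraped docs, spec and cities lists, and it assumes the three lists line up by position.

```python
import json

# Made-up sample values standing in for the scraped lists.
docs = ["john hopkins", "jane doe"]
spec = ["surgeon", "cardiologist"]
cities = ["new york", "boston"]

# zip() pairs the lists by position and stops at the shortest one,
# so a list that comes up short cannot raise an IndexError.
doc_profiles = [
    {"doctor_name": name, "specialty": specialty, "city": city}
    for name, specialty, city in zip(docs, spec, cities)
]

# Serialize to a JSON string; ensure_ascii=False keeps non-ASCII text readable.
json_text = json.dumps(doc_profiles, ensure_ascii=False, indent=2)
print(json_text)
```

To write to a file instead, json.dump(doc_profiles, f) takes the same arguments plus an open file object. Note that zip() silently drops trailing items when the lists differ in length, and if a doctor on the page has no specialty or city the later entries would be paired with the wrong name, which is one reason knowing the real HTML structure matters.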

Posting your question with supporting data will help us help you better.
