Exporting ChainMap data output to CSV

Date: 2018-08-12 00:12:13

Tags: python web-scraping beautifulsoup export-to-csv

I wrote a ChainMap/BeautifulSoup scraper that scrapes doctors' profile information from this website.

x <- c("CAN",  "ON",  "NL", "PE", "NS", "NB", "QC")     
for(i in x){
          y <- treemap(weights[weights$Region == i, ], index=c("Level.0","Level.1"), vSize="X2015", type="index")
          assign(i, y) 
        } 

The code runs without errors; however, the csv output does not show up in my IDE. I think this is because I'm not handling the chainmap variable correctly, but I'm not sure. Does anyone know why this is happening? Thanks in advance!

2 Answers:

Answer 0 (score: 1)

To write dictionaries to a csv file you can use csv.DictWriter (docs here). A ChainMap is just a view over several dictionaries, so DictWriter can consume it directly:

from bs4 import BeautifulSoup
import requests
import csv
from collections import ChainMap

def get_data(soup):
    # defaults fill in any field the profile leaves empty
    default_data = {'name': 'n/a', 'clinic': 'n/a', 'profession': 'n/a',
                    'region': 'n/a', 'city': 'n/a'}
    for doctor in soup.select('.view-practitioners .practitioner'):
        doctor_data = {}
        if doctor.select_one('.practitioner__name').text.strip():
            doctor_data['name'] = doctor.select_one('.practitioner__name').text
        if doctor.select_one('.practitioner__clinic').text.strip():
            doctor_data['clinic'] = doctor.select_one('.practitioner__clinic').text
        if doctor.select_one('.practitioner__profession').text.strip():
            doctor_data['profession'] = doctor.select_one('.practitioner__profession').text
        if doctor.select_one('.practitioner__region').text.strip():
            doctor_data['region'] = doctor.select_one('.practitioner__region').text
        if doctor.select_one('.practitioner__city').text.strip():
            doctor_data['city'] = doctor.select_one('.practitioner__city').text
        # scraped values shadow the defaults
        yield ChainMap(doctor_data, default_data)

url = 'https://sportmedbc.com/practitioners?field_profile_first_name_value=&field_profile_last_name_value=&field_pract_profession_tid=All&city=&taxonomy_vocabulary_5_tid=All&page=%s'

with open('data.csv', 'w', newline='') as csvfile:
    fieldnames = ['name', 'clinic', 'profession', 'region', 'city']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for i in range(5):
        page = requests.get(url % i)
        soup = BeautifulSoup(page.text, 'lxml')
        writer.writerows(get_data(soup))

This will output all the data to the data.csv file. A screenshot from my LibreOffice:

[screenshot of data.csv opened in LibreOffice]
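As an aside on why this works: ChainMap implements the full mapping interface, so csv.DictWriter treats it like a plain dict, with lookups for missing keys falling through to default_data. A minimal standalone sketch (the file and field names here are hypothetical, not from the answer above):

import csv
from collections import ChainMap

defaults = {'name': 'n/a', 'city': 'n/a'}
row = ChainMap({'name': 'Dr. Smith'}, defaults)  # 'city' falls back to 'n/a'

with open('demo.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'city'])
    writer.writeheader()
    writer.writerow(row)  # DictWriter accepts any mapping, not just dict

This writes a demo.csv containing the header row plus "Dr. Smith,n/a".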

Answer 1 (score: 1)

Here is another approach you might consider:

import requests
from bs4 import BeautifulSoup
import csv

def get_data(link):
    # walk the first five result pages
    for pagelink in [link.format(page) for page in range(5)]:
        res = requests.get(pagelink)
        soup = BeautifulSoup(res.text, "lxml")

        data = []
        for doctor in soup.select('.view-practitioners .practitioner'):
            doctor_data = {}

            doctor_data['name'] = doctor.select_one('.practitioner__name').text
            doctor_data['clinic'] = doctor.select_one('.practitioner__clinic').text
            doctor_data['profession'] = doctor.select_one('.practitioner__profession').text
            doctor_data['region'] = doctor.select_one('.practitioner__region').text
            doctor_data['city'] = doctor.select_one('.practitioner__city').text
            data.append(doctor_data)

        # 'writer' is the module-level DictWriter created below
        for item in data:
            writer.writerow(item)

if __name__ == '__main__':
    url = 'https://sportmedbc.com/practitioners?field_profile_first_name_value=&field_profile_last_name_value=&field_pract_profession_tid=All&city=&taxonomy_vocabulary_5_tid=All&page={}'
    with open("doctorsinfo.csv","w",newline="") as infile:
        fieldnames = ['name', 'clinic', 'profession', 'region', 'city']
        writer = csv.DictWriter(infile, fieldnames=fieldnames)
        writer.writeheader()
        get_data(url)
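Note that get_data writes through the module-level writer name, which works only because get_data(url) is called while the with block still holds the file open. A variant that makes that dependency explicit by passing the writer in (an untested sketch, not part of the original answer):

def get_data(link, writer, pages=5):
    # same scraping logic, but the DictWriter travels as a parameter
    for pagelink in (link.format(page) for page in range(pages)):
        soup = BeautifulSoup(requests.get(pagelink).text, "lxml")
        for doctor in soup.select('.view-practitioners .practitioner'):
            writer.writerow({
                'name': doctor.select_one('.practitioner__name').text,
                'clinic': doctor.select_one('.practitioner__clinic').text,
                'profession': doctor.select_one('.practitioner__profession').text,
                'region': doctor.select_one('.practitioner__region').text,
                'city': doctor.select_one('.practitioner__city').text,
            })

It would be called as get_data(url, writer) inside the same with block.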