I have a ChainMap/BeautifulSoup scraper that pulls doctors' profile information from this website.
The code runs without errors; however, the csv output doesn't show up in my IDE. I think this is because I'm not handling the ChainMap variable correctly, but I'm not sure. Does anyone know why this is happening? Thanks in advance!
Answer 0 (score: 1)
To write a csv of dictionaries, you can use csv.DictWriter (docs here; a ChainMap is just a variant of a dictionary):
from bs4 import BeautifulSoup
import requests
import csv
from collections import ChainMap

def get_data(soup):
    # Fallback values for any field that is empty on the page
    default_data = {'name': 'n/a', 'clinic': 'n/a', 'profession': 'n/a', 'region': 'n/a', 'city': 'n/a'}
    for doctor in soup.select('.view-practitioners .practitioner'):
        doctor_data = {}
        if doctor.select_one('.practitioner__name').text.strip():
            doctor_data['name'] = doctor.select_one('.practitioner__name').text
        if doctor.select_one('.practitioner__clinic').text.strip():
            doctor_data['clinic'] = doctor.select_one('.practitioner__clinic').text
        if doctor.select_one('.practitioner__profession').text.strip():
            doctor_data['profession'] = doctor.select_one('.practitioner__profession').text
        if doctor.select_one('.practitioner__region').text.strip():
            doctor_data['region'] = doctor.select_one('.practitioner__region').text
        if doctor.select_one('.practitioner__city').text.strip():
            doctor_data['city'] = doctor.select_one('.practitioner__city').text
        # ChainMap checks doctor_data first, then falls back to default_data
        yield ChainMap(doctor_data, default_data)

url = 'https://sportmedbc.com/practitioners?field_profile_first_name_value=&field_profile_last_name_value=&field_pract_profession_tid=All&city=&taxonomy_vocabulary_5_tid=All&page=%s'

with open('data.csv', 'w', newline='') as csvfile:
    fieldnames = ['name', 'clinic', 'profession', 'region', 'city']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for i in range(5):
        page = requests.get(url % i)
        soup = BeautifulSoup(page.text, 'lxml')
        writer.writerows(get_data(soup))
This will output all of the data to the data.csv file. Here is a screenshot from my LibreOffice:
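For reference, a ChainMap looks up each key in its mappings left to right, so values scraped into doctor_data shadow the defaults, and csv.DictWriter accepts any mapping, including a ChainMap. A minimal standalone sketch with made-up sample values:

from collections import ChainMap
import csv
import io

defaults = {'name': 'n/a', 'city': 'n/a'}
scraped = {'name': 'Dr. Smith'}            # 'city' was empty on the page
row = ChainMap(scraped, defaults)          # scraped value shadows the default

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['name', 'city'])
writer.writeheader()
writer.writerow(row)                       # DictWriter reads row like a plain dict
print(buf.getvalue())
# name,city
# Dr. Smith,n/a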
Answer 1 (score: 1)
Here is another approach you might consider trying:
import requests
from bs4 import BeautifulSoup
import csv

def get_data(link):
    # Visit the first five result pages
    for pagelink in [link.format(page) for page in range(5)]:
        res = requests.get(pagelink)
        soup = BeautifulSoup(res.text, "lxml")
        data = []
        for doctor in soup.select('.view-practitioners .practitioner'):
            doctor_data = {}
            doctor_data['name'] = doctor.select_one('.practitioner__name').text
            doctor_data['clinic'] = doctor.select_one('.practitioner__clinic').text
            doctor_data['profession'] = doctor.select_one('.practitioner__profession').text
            doctor_data['region'] = doctor.select_one('.practitioner__region').text
            doctor_data['city'] = doctor.select_one('.practitioner__city').text
            data.append(doctor_data)
        # writer is the module-level csv.DictWriter created below
        for item in data:
            writer.writerow(item)

if __name__ == '__main__':
    url = 'https://sportmedbc.com/practitioners?field_profile_first_name_value=&field_profile_last_name_value=&field_pract_profession_tid=All&city=&taxonomy_vocabulary_5_tid=All&page={}'
    with open("doctorsinfo.csv", "w", newline="") as infile:
        fieldnames = ['name', 'clinic', 'profession', 'region', 'city']
        writer = csv.DictWriter(infile, fieldnames=fieldnames)
        writer.writeheader()
        get_data(url)
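One caveat for both answers: select_one returns None when a profile is missing a field, and calling .text on None raises AttributeError. A small defensive helper you could drop into either loop (just a sketch; the helper name is my own, not from the answers):

def field_text(doctor, selector, default='n/a'):
    # Return the stripped text of the first matching element, or a default
    node = doctor.select_one(selector)
    return node.text.strip() if node else default

# e.g. inside the loop:
# doctor_data['name'] = field_text(doctor, '.practitioner__name')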