Question

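I'm trying to learn how to scrape websites with Python and BeautifulSoup. I've managed to collect all the names and job titles, and now I'm trying to save them to a CSV file. I think I need some kind of loop or append so that all of them end up in the CSV file. Right now, only the last name and job title are saved.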

#import libraries
import csv
import urllib2
from bs4 import BeautifulSoup

#specify the url
buzzly_page = 'http://buzzlymedia.com/ourwork/'

#query the website and return the html to the variable 'page'
page = urllib2.urlopen(buzzly_page)

#parse the html
soup = BeautifulSoup(page, 'html.parser')

#query to get value of name
for name_box in soup.find_all('strong', attrs={'itemprop': 'name'}):
    name = name_box.text.strip()  # remove leading and trailing whitespace
    print name

#query to get value of job-title
for job_box in soup.find_all('span', attrs={'itemprop': 'jobTitle'}):
    job = job_box.text.strip()  # remove leading and trailing whitespace
    print job

#write into csv-file
with open('buzzly_clients.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name, job])

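In the code above, name and job are reassigned on every loop iteration, and writer.writerow runs only once, after both loops have finished, so it only ever sees the last values. A minimal sketch of one fix, written for Python 3 (the question's code uses Python 2's urllib2 and print statement): collect the values into lists inside the loops (for example names.append(name)), then write one row per pair. The sample lists and the write_pairs helper are hypothetical, and pairing with zip() assumes the two find_all() results line up one-to-one in page order.

```python
import csv
import io

# Hypothetical scraped values; in the question these would be appended inside
# the find_all() loops over itemprop="name" and itemprop="jobTitle".
names = ["Alice Smith", "Bob Jones", "Carol White"]
jobs = ["Art Director", "Producer", "Editor"]


def write_pairs(names, jobs, file_obj):
    """Write one CSV row per (name, job) pair instead of only the last one."""
    writer = csv.writer(file_obj)
    # zip() pairs items by position and stops at the shorter list,
    # so if the lists differ in length, trailing items are silently dropped.
    for name, job in zip(names, jobs):
        writer.writerow([name, job])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_pairs(names, jobs, buffer)
    print(buffer.getvalue())
```

The positional pairing is fragile if any person on the page lacks a job title, which is why iterating over a shared container element is usually safer.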
Answer 1

Find the divs that contain the elements you need, and iterate over them like this:

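A minimal sketch of the approach of finding the divs that contain the needed elements and iterating over them, assuming Python 3 with bs4 installed. The markup in SAMPLE_HTML and its team-member class are hypothetical stand-ins, not the real structure of buzzlymedia.com; inspect the page and substitute the actual container tag and class. The helper names extract_rows and write_rows are also illustrative. Because each name and job title are read from the same container, they stay paired, and every pair is written rather than only the last one.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup; the "team-member" class is an assumption for illustration.
SAMPLE_HTML = """
<div class="team-member">
  <strong itemprop="name"> Alice Smith </strong>
  <span itemprop="jobTitle"> Art Director </span>
</div>
<div class="team-member">
  <strong itemprop="name">Bob Jones</strong>
  <span itemprop="jobTitle">Producer</span>
</div>
"""


def extract_rows(html):
    """Return one (name, job) tuple per container div."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Iterate over each container so a name and its job title stay together.
    for member in soup.find_all("div", class_="team-member"):
        name_box = member.find("strong", attrs={"itemprop": "name"})
        job_box = member.find("span", attrs={"itemprop": "jobTitle"})
        # Skip containers missing either field instead of raising AttributeError.
        if name_box is None or job_box is None:
            continue
        rows.append((name_box.get_text(strip=True), job_box.get_text(strip=True)))
    return rows


def write_rows(rows, file_obj):
    """Write a header row, then one CSV row per (name, job) pair."""
    writer = csv.writer(file_obj)
    writer.writerow(["name", "job_title"])
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_rows(extract_rows(SAMPLE_HTML), buffer)
    print(buffer.getvalue())
```

To write to disk instead of a buffer in Python 3, open the file with open('buzzly_clients.csv', 'w', newline='') and pass it to write_rows. Mode 'w' replaces the file on each run, whereas the question's 'a' mode appends a fresh copy of the rows every time the script runs.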

Saving data from a website to CSV using find_all in BeautifulSoup
