I am trying to extract a list of all the golf courses in the USA through this link. I need to extract the name, address, and phone number of each golf course. My script is supposed to extract all of the data from the website, but it looks like it only prints one row in my CSV file. I noticed that when I print the "name" field it only prints once, even though I am using find_all. All I need is the full data set, not just one field from one of the many listings on the site.
How can I fix my script so that it writes all the needed data into the CSV file?
Here is my script:
import csv
import requests
from bs4 import BeautifulSoup

courses_list = []
for i in range(1):
    url = "http://www.thegolfcourses.net/page/1?ls&location=California&orderby=title&radius=6750#038;location=California&orderby=title&radius=6750" #.format(i)
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    g_data2 = soup.find_all("div", {"class": "list"})

    for item in g_data2:
        try:
            name = item.contents[7].find_all("a", {"class": "entry-title"})[0].text
            print name
        except:
            name = ''
        try:
            phone = item.contents[7].find_all("p", {"class": "listing-phone"})[0].text
        except:
            phone = ''
        try:
            address = item.contents[7].find_all("p", {"class": "listing-address"})[0].text
        except:
            address = ''
        course = [name, phone, address]
        courses_list.append(course)

with open('PGN_Final.csv', 'a') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow([s.encode("utf-8") for s in row])
Answer 0 (score: 0)
Here is a concise implementation of your code. You can use the urllib2 library instead of requests; bs4 works the same way. Your original version most likely prints only one row because find_all("div", {"class": "list"}) matches a single container div, so the loop body runs just once; selecting the h2 and p elements for each field directly and zipping them together sidesteps that problem.
import csv
import urllib2
from BeautifulSoup import *

url = "http://www.thegolfcourses.net/page/1?ls&location=California&orderby=title&radius=6750#038;location=California&orderby=title&radius=6750"
r = urllib2.urlopen(url).read()
soup = BeautifulSoup(r)

courses_list = []
courses_list.append(("Course name", "Phone Number", "Address"))

# Pull each field as its own list, then zip them so every row
# pairs one course's name, phone, and address.
names = soup.findAll('h2', attrs={'class': 'entry-title'})
phones = soup.findAll('p', attrs={'class': 'listing-phone'})
address = soup.findAll('p', attrs={'class': 'listing-address'})
for na, ph, add in zip(names, phones, address):
    courses_list.append((na.text, ph.text, add.text))

with open('PGN_Final.csv', 'a') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow([s.encode("utf-8") for s in row])
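If you would rather keep requests and bs4 as in your original script, the same zip-based approach ports over directly. Here is a minimal sketch, assuming the page keeps the same entry-title/listing-phone/listing-address class names and that every listing has all three fields:

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.thegolfcourses.net/page/1?ls&location=California&orderby=title&radius=6750"
r = requests.get(url)
soup = BeautifulSoup(r.content)

courses_list = [("Course name", "Phone Number", "Address")]

# bs4 spells the method find_all, but it accepts the same arguments.
names = soup.find_all('h2', {'class': 'entry-title'})
phones = soup.find_all('p', {'class': 'listing-phone'})
addresses = soup.find_all('p', {'class': 'listing-address'})

# zip stops at the shortest list, so a listing that is missing a phone
# or address would misalign rows; this sketch assumes none are missing.
for na, ph, add in zip(names, phones, addresses):
    courses_list.append((na.text, ph.text, add.text))

with open('PGN_Final.csv', 'a') as f:
    writer = csv.writer(f)
    for row in courses_list:
        writer.writerow([s.encode("utf-8") for s in row])

Note that the .encode("utf-8") calls assume Python 2, matching the rest of this thread; on Python 3 you would open the file with newline='' and write the strings directly.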