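I'm trying the code below, but I'm lost on how to make the Python code scrape in a loop and save everything so that I can put it all in a .csv. Any help would be greatly appreciated :)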
import requests
from bs4 import BeautifulSoup

url = "http://www.yellowpages.com/search?search_terms=bodyshop&geo_location_terms=Fort+Lauderdale%2C+FL"
r = requests.get(url)  # fetch the first result page
soup = BeautifulSoup(r.content, "html.parser")

# print every link on the page
links = soup.find_all("a")
for link in links:
    print("<a href='%s'>%s</a>" % (link.get("href"), link.text))

# each listing's details live in a <div class="info">
g_data = soup.find_all("div", {"class": "info"})
for item in g_data:
    print(item.contents[0].find_all("a", {"class": "business-name"})[0].text)
    # each field below is optional; skip it when the listing doesn't have it
    try:
        print(item.contents[1].find_all("span", {"itemprop": "streetAddress"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("span", {"itemprop": "addressLocality"})[0].text.replace(',', ''))
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("span", {"itemprop": "addressRegion"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("span", {"itemprop": "postalCode"})[0].text)
    except (IndexError, AttributeError):
        pass
    try:
        print(item.contents[1].find_all("li", {"class": "primary"})[0].text)
    except (IndexError, AttributeError):
        pass
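I know that with this code:
url_page2 = url + '&page=' + str(2) + '&s=relevance'
I can get to the second page, but how do I loop through all of the site's result pages and make the results available in a .csv file?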
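One way to generalize the concatenation above is to treat the page number as just another query parameter. This is a minimal sketch assuming Python 3 and the standard library's urllib.parse.urlencode; page_url and BASE are hypothetical names, and the query values are copied from the question's URL. Because the total number of result pages isn't known in advance, the range(1, 4) in the usage lines is only for illustration.

```python
from urllib.parse import urlencode

BASE = "http://www.yellowpages.com/search"


def page_url(page):
    """Build the search URL for one result page (hypothetical helper)."""
    params = {
        "search_terms": "bodyshop",
        "geo_location_terms": "Fort Lauderdale, FL",
        "page": page,          # urlencode turns the int into its string form
        "s": "relevance",
    }
    # urlencode quotes spaces as '+' and ',' as '%2C', like the question's URL
    return BASE + "?" + urlencode(params)


for n in range(1, 4):
    print(page_url(n))
```

requests can also build the query string itself through the params argument of requests.get, which avoids hand-built URLs entirely.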
Answer (score: 0)
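Use an infinite loop that starts at page 1, increments the page number on each pass, and exits when a page comes back with no results. Define the list of fields you want to extract and rely on the itemprop attributes to get the field values. Collect the items in a list of dictionaries, which you can later write to a csv file: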
from pprint import pprint

import requests
from bs4 import BeautifulSoup

url = "http://www.yellowpages.com/search?search_terms=bodyshop&geo_location_terms=Fort%20Lauderdale%2C%20FL&page={page}&s=relevance"
fields = ["name", "streetAddress", "addressLocality", "addressRegion", "postalCode", "telephone"]

data = []
index = 1
while True:
    # fill the page number into a copy of the template; assigning back to url
    # would lose the {page} placeholder after the first pass
    page_url = url.format(page=index)
    index += 1

    response = requests.get(page_url)
    soup = BeautifulSoup(response.content, "html.parser")
    page_results = soup.select('div.result')

    # exiting the loop if no results
    if not page_results:
        break

    for item in page_results:
        result = dict.fromkeys(fields)  # every field defaults to None
        for field in fields:
            try:
                result[field] = item.find(itemprop=field).get_text(strip=True)
            except AttributeError:  # find() returned None: field missing
                pass
        data.append(result)

    break  # DELETE ME: stops after the first page for this demo; remove it to crawl every page

pprint(data)
For the first page, it prints:
[{'addressLocality': u'Fort Lauderdale,',
'addressRegion': u'FL',
'name': u"Abernathy's Paint And Body Shop",
'postalCode': u'33315',
'streetAddress': u'1927 SW 1st Ave',
'telephone': u'(954) 522-8923'},
...
{'addressLocality': u'Fort Lauderdale,',
'addressRegion': u'FL',
'name': u'Mega Auto Body Shop',
'postalCode': u'33304',
'streetAddress': u'828 NE 4th Ave',
'telephone': u'(954) 523-9331'}]
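The answer stops at pprint(data); the step the question asks about is writing that list of dictionaries to a .csv file. This is a minimal sketch assuming Python 3 and the standard library's csv.DictWriter. write_csv and the file name bodyshops.csv are hypothetical, and the two sample rows are copied from the output above. In the real script you would call write_csv(data, "bodyshops.csv", fields) after the while loop. It also strips the trailing comma the site leaves on addressLocality, as the question's .replace(',', '') did.

```python
import csv

fields = ["name", "streetAddress", "addressLocality", "addressRegion", "postalCode", "telephone"]

# Sample rows copied from the printed output; in practice this is the `data`
# list built by the scraping loop.
data = [
    {"addressLocality": "Fort Lauderdale,", "addressRegion": "FL",
     "name": "Abernathy's Paint And Body Shop", "postalCode": "33315",
     "streetAddress": "1927 SW 1st Ave", "telephone": "(954) 522-8923"},
    {"addressLocality": "Fort Lauderdale,", "addressRegion": "FL",
     "name": "Mega Auto Body Shop", "postalCode": "33304",
     "streetAddress": "828 NE 4th Ave", "telephone": "(954) 523-9331"},
]


def write_csv(rows, path, fieldnames):
    """Write a list of dicts to `path`, one column per field (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            row = dict(row)  # copy so the caller's dicts are left unchanged
            # the site renders the locality as "Fort Lauderdale,"; drop the trailing comma
            if row.get("addressLocality"):
                row["addressLocality"] = row["addressLocality"].rstrip(",")
            # fields left as None by dict.fromkeys() are written as empty cells
            writer.writerow(row)


write_csv(data, "bodyshops.csv", fields)
```

Passing newline="" to open() is what the csv module documentation recommends, so the writer controls line endings itself.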