I can't seem to get this working. My script goes to a site and scrapes the data into my info variable, but when I try to pull the href out of a specific class I get None, or it just doesn't work no matter which combinations I try. Where am I messing up? When I scrape the page into my info variable, each entry contains a class='business-name' and an href.
import requests
from bs4 import BeautifulSoup
count = 0
search_terms = "Bars"
location = "New Orleans, LA"
url = "https://www.yellowpages.com/search"
q = {'search_terms': search_terms, 'geo_location_terms': location}
page = requests.get(url, params=q)
url_link = page.url
page_num = str(count)
searched_page = url_link + '&page=' + str(count)
page = requests.get(searched_page)
soup = BeautifulSoup(page.text, 'html.parser')
info = soup.findAll('div', {'class': 'info'})
for each_business in info:
    # This is the spot that is broken. I can't make it work!
    yp_bus_url = each_business.get('class_','business-name')['href']
    print(yp_bus_url)
Answer 0 (score: 1)

The following code should work for you:
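The code block for this answer was not preserved; a minimal sketch of the kind of fix it would show, finding the business-name anchor inside each info card (the sample HTML below is illustrative, not the real site markup):

```python
from bs4 import BeautifulSoup

# Illustrative HTML mimicking one YellowPages result card (made up for the example)
html = '''
<div class="info">
  <a class="business-name" href="/new-orleans-la/mip/example-bar-123">Example Bar</a>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
urls = []
for each_business in soup.find_all('div', class_='info'):
    # Locate the anchor by class, then read its href attribute
    link = each_business.find('a', class_='business-name')
    if link is not None:
        urls.append(link['href'])
print(urls)
```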
Answer 1 (score: 1)

I think this is what you need:
for each_business in info:
    yp_bus_url = each_business.find('a', {'class': 'business-name'}).get('href')
    print(yp_bus_url)
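Note that find() returns None when a card has no business-name anchor, which would crash this loop with an AttributeError. A defensive variant that skips such cards (the two-card sample HTML is made up to show both cases):

```python
from bs4 import BeautifulSoup

# Sample markup: one card with a business-name link, one without (illustrative only)
html = ('<div class="info"><a class="business-name" href="/mip/a-1">A</a></div>'
        '<div class="info"><span>no link here</span></div>')

soup = BeautifulSoup(html, 'html.parser')
urls = []
for each_business in soup.find_all('div', class_='info'):
    link = each_business.find('a', class_='business-name')
    # Only collect cards that actually carry the anchor and its href
    if link is not None and link.has_attr('href'):
        urls.append(link['href'])
print(urls)
```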
Answer 2 (score: 1)
Make the change here (be sure to assign the list to whatever you want):
import requests
from bs4 import BeautifulSoup
count = 0
search_terms = "Bars"
location = "New Orleans, LA"
url = "https://www.yellowpages.com/search"
q = {'search_terms': search_terms, 'geo_location_terms': location}
page = requests.get(url, params=q)
url_link = page.url
page_num = str(count)
searched_page = url_link + '&page=' + str(count)
page = requests.get(searched_page)
soup = BeautifulSoup(page.text, 'html.parser')
#info = soup.findAll('div', {'class': 'info'})
info = soup.select("[class~=business-name]")
[i.get('href') for i in info]

This returns the list of href values.
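The hrefs scraped this way are site-relative paths; if absolute URLs are needed, urllib.parse.urljoin can resolve them against the search URL (the relative href below is a hypothetical example):

```python
from urllib.parse import urljoin

base = 'https://www.yellowpages.com/search'
relative = '/new-orleans-la/mip/example-bar-123'  # hypothetical scraped href

# urljoin replaces the base path with the site-relative href
absolute = urljoin(base, relative)
print(absolute)  # https://www.yellowpages.com/new-orleans-la/mip/example-bar-123
```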