I am not getting the address here. It gives me 'NA' as every person's address. I want to fetch each person's address; the code retrieves all the other details except the address.

from bs4 import BeautifulSoup
import requests

for count in range(1, 2):
    r = requests.get('https://www.ratemds.com/best-doctors/?country=in&page=' + str(count))
    soup = BeautifulSoup(r.text, 'lxml')
    for links in soup.find_all('a', class_='search-item-doctor-link'):
        link = "https://www.ratemds.com" + links['href']
        r2 = requests.get(link)
        soup2 = BeautifulSoup(r2.text, 'lxml')
        try:
            name = soup2.select_one('h1').text
            print "NAME:" + name
        except:
            print "NAME:NA"
        try:
            speciality = soup2.select_one('.search-item-info a').text
            print "SPECIALITY:" + speciality
        except:
            print "SPECIALITY:NA"
        try:
            gender = soup2.select_one('i + a').text
            print "GENDER:" + gender
        except:
            print "GENDER:NA"
        try:
            speciality1 = soup2.select_one('i ~ [itemprop=name]').text
            print "SPECIALITY1:" + speciality1
        except:
            print "SPECIALITY1:NA"
        try:
            contact = soup2.select_one('[itemprop=telephone]')['content']
            print "CONTACT:" + contact
        except:
            print "CONTACT:NA"
        try:
            website = soup2.select_one('[itemprop=sameAs]')['href']
            print "WEBSITE:" + website
        except:
            print "WEBSITE:NA"
        try:
            add = [item['content'] for item in soup2.select('[itemprop=address] meta')]
            print "ADDRESS:" + add
        except:
            print "ADDRESS:NA"
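For reference, the ADDRESS branch above may fail even when the selector matches: `add` is a list, and concatenating a `str` with a `list` raises a `TypeError` that the bare `except` then silently turns into the "ADDRESS:NA" fallback. A minimal sketch (no network access, with made-up address values) of the failure and one way to join the parts:

```python
# Hypothetical values in the shape the list comprehension produces.
add = ["123 Sample Street", "Mumbai", "MH"]

# Concatenating a str with a list raises TypeError; a bare except
# masks it and prints the NA fallback instead of the real error.
try:
    line = "ADDRESS:" + add
except TypeError:
    line = "ADDRESS:NA"
print(line)  # ADDRESS:NA

# Joining the parts first produces a printable string.
print("ADDRESS:" + ", ".join(add))  # ADDRESS:123 Sample Street, Mumbai, MH
```

Catching a specific exception type (or logging the exception) instead of using a bare `except` would have surfaced this immediately.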
Answer 0 (score: 0)
Assuming you have run pip install lxml and pip install beautifulsoup4, the code you are using works fine.

Working example here (click "Run"): https://repl.it/repls/DarkorangeFinishedSoftwaresuite

If you are getting results different from my working example, the cause may be extra whitespace in the requests.get() URL. In that case, you can copy the code I used and see if it works for you.
Answer 1 (score: 0)
Here is an example with broader selectors for the information (requests 2.88.0).

You could also target the script tags to get a lot of information that can be converted to JSON. Sadly, a decent library for the hex > ascii conversion didn't seem to work, so the replacements on the dict would have to be done by hand.
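As a rough illustration of that script-tag idea (the HTML fragment and field names below are invented for the example, not taken from ratemds.com), the pattern is to pull the embedded JSON out of a `<script>` tag and hand it to the `json` module:

```python
import json
import re

# Hypothetical page fragment: many sites embed structured data
# (e.g. JSON-LD) in a <script> tag rather than in visible markup.
html = '''
<script type="application/ld+json">
{"@type": "Physician", "name": "Dr. Dilip Raja",
 "address": {"addressLocality": "Mumbai", "addressRegion": "MH"}}
</script>
'''

# Extract the tag body, then parse it as JSON.
match = re.search(r'<script type="application/ld\+json">(.*?)</script>',
                  html, re.DOTALL)
data = json.loads(match.group(1))
print(data["name"])                        # Dr. Dilip Raja
print(data["address"]["addressLocality"])  # Mumbai
```

On a real page you would locate the script tag with BeautifulSoup (e.g. `soup.find('script', type='application/ld+json')`) rather than a regex; the regex here just keeps the sketch self-contained.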
from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.ratemds.com/doctor-ratings/dr-dilip-raja-mumbai-mh-in', headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(r.text, 'lxml')

details = {
    'name': soup.select_one('h1').text,
    'speciality': soup.select_one('.search-item-info a').text,
    'rating': soup.select_one('.star-rating')['title'],
    'gender': soup.select_one('i + a').text,
    'specialty_full': soup.select_one('i ~ [itemprop=name]').text,
    'phone': soup.select_one('[itemprop=telephone]')['content'],
    'address': [item['content'] for item in soup.select('[itemprop=address] meta')],
    'website': soup.select_one('[itemprop=sameAs]')['href']
}

print(details)
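Note that the `address` key comes back as a list of meta contents. A small follow-up sketch (with made-up values in the same shape) that flattens it into one display string and pretty-prints the record:

```python
import json

# Hypothetical record in the shape produced by the dict above.
details = {
    'name': 'Dr. Dilip Raja',
    'address': ['123 Sample Street', 'Mumbai', 'MH'],
}

# Flatten the address parts into a single comma-separated string,
# then pretty-print the whole record as JSON.
details['address'] = ', '.join(details['address'])
print(json.dumps(details, indent=2))
```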