I want to get each person's details

Time: 2019-04-06 05:20:32

Tags: python web-scraping beautifulsoup

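I am not getting the address here. It gives me 'NA' as the address for every person. I want to get each person's address. The code returns all the other details except the address.

from bs4 import BeautifulSoup
import requests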
for count in range(1,2):
    # Fetch one page of the doctor listing
    r = requests.get('https://www.ratemds.com/best-doctors/? country=in&page='+str(count))
    soup = BeautifulSoup(r.text,'lxml')
    for links in soup.find_all('a',class_='search-item-doctor-link'):
        # Follow each doctor's link to the profile page
        link = "https://www.ratemds.com"+links['href']
        r2 = requests.get(link)
        soup2 = BeautifulSoup(r2.text,'lxml')
        try:
            name = soup2.select_one('h1').text
            print("NAME:"+name)
        except:
            print("NAME:NA")
        try:
            speciality = soup2.select_one('.search-item-info a').text
            print("SPECIALITY:"+speciality)
        except:
            print("SPECIALITY:NA")
        try:
            gender = soup2.select_one('i + a').text
            print("GENDER:"+gender)
        except:
            print("GENDER:NA")
        try:
            speciality1 = soup2.select_one('i ~ [itemprop=name]').text
            print("SPECIALITY1:"+speciality1)
        except:
            print("SPECIALITY1:NA")
        try:
            contact = soup2.select_one('[itemprop=telephone]')['content']
            print("CONTACT:"+contact)
        except:
            print("CONTACT:NA")
        try:
            website = soup2.select_one('[itemprop=sameAs]')['href']
            print("WEBSITE:"+website)
        except:
            print("WEBSITE:NA")
        try:
            add = [item['content'] for item in soup2.select('[itemprop=address] meta')]
            print("ADDRESS:"+add)
        except:
            print("ADDRESS:NA")

2 answers:

Answer 0 (score: 0)

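Assuming you have run pip install lxml and pip install beautifulsoup4, the code you are using works.

Working example here (click "Run"): https://repl.it/repls/DarkorangeFinishedSoftwaresuite

If you get different results from my working example, the cause may be the extra space in the requests.get() URL. In that case, copy the code I used and see whether it works for you.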
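
One way to rule out stray whitespace in the query string is to build it from a dict rather than typing it by hand. This is only a minimal sketch using the standard library; `BASE_URL` and `page_url` are names chosen here for illustration and do not come from the linked example.

```python
from urllib.parse import urlencode

# Listing URL taken from the question, without the query string
BASE_URL = 'https://www.ratemds.com/best-doctors/'

def page_url(count):
    """Return the listing URL for one results page, with an encoded query."""
    query = urlencode({'country': 'in', 'page': count})
    return BASE_URL + '?' + query

print(page_url(1))
```

Passing the same dict as `params=` to `requests.get(BASE_URL, params=...)` should have a comparable effect, since requests encodes the query string itself.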

Answer 1 (score: 0)

Here is an example of selectors for a wider range of information

request 2.88.0

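You can also target the script tag for a lot of information, which can be converted to JSON. Sadly, a nice library for converting hex to ASCII did not seem to work, so the replacements are done with a dict.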
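
The script-tag idea could look roughly like the sketch below. The sample markup, the `JSON.parse("...")` wrapper, and the names `SAMPLE_HTML`, `HEX_REPLACEMENTS` and `script_json` are all assumptions made for illustration; the real script contents on the site may be shaped differently, and the replacement dict would need an entry for every escape that actually appears.

```python
import json
import re
from bs4 import BeautifulSoup

# Hypothetical page fragment: the real script contents on the site may differ.
SAMPLE_HTML = r'''
<script>window.DATA = JSON.parse("{\x22full_name\x22: \x22Dr. Example\x22, \x22city\x22: \x22Mumbai\x22}");</script>
'''

# Literal hex escape sequences found in the script text -> plain characters
HEX_REPLACEMENTS = {r'\x22': '"', r'\x27': "'", r'\x2D': '-'}

def script_json(html):
    """Return the first JSON payload found in a script tag, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    for script in soup.find_all('script'):
        match = re.search(r'JSON\.parse\("(.*?)"\)', script.string or '', re.S)
        if match is None:
            continue
        payload = match.group(1)
        # Undo the hex escapes with plain string replacement
        for escaped, plain in HEX_REPLACEMENTS.items():
            payload = payload.replace(escaped, plain)
        return json.loads(payload)
    return None

print(script_json(SAMPLE_HTML))
```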

from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.ratemds.com/doctor-ratings/dr-dilip-raja-mumbai-mh-in', headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(r.text,'lxml')

details = {
    'name': soup.select_one('h1').text,
    'speciality': soup.select_one('.search-item-info a').text,
    'rating': soup.select_one('.star-rating')['title'],
    'gender': soup.select_one('i + a').text,
    'specialty_full': soup.select_one('i ~ [itemprop=name]').text,
    'phone': soup.select_one('[itemprop=telephone]')['content'],
    'address': [item['content'] for item in soup.select('[itemprop=address] meta')],
    'website': soup.select_one('[itemprop=sameAs]')['href']
}

print(details)
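
Note that the address selector yields a list, not a string. In the question, that list is concatenated directly onto the label string inside a `try` with a bare `except:`, which raises a `TypeError` and would plausibly explain why the address always prints as NA. A minimal sketch of joining the parts first, run against made-up markup (`SAMPLE_HTML` and `format_address` are illustrative names, and the sample values are invented):

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like what the address selector expects
SAMPLE_HTML = '''
<div itemprop="address">
  <meta itemprop="streetAddress" content="12 Example Road">
  <meta itemprop="addressLocality" content="Mumbai">
  <meta itemprop="addressCountry" content="India">
</div>
'''

def format_address(soup):
    """Join the address parts into one string, or return 'NA' if none exist."""
    parts = [item['content'] for item in soup.select('[itemprop=address] meta')]
    return ', '.join(parts) if parts else 'NA'

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print("ADDRESS:" + format_address(soup))
```

Catching a specific exception instead of using a bare `except:` would also make this kind of error visible rather than silently turning it into NA.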