I am trying to extract data from the link below, but I am not getting it; the code raises an error.
from bs4 import BeautifulSoup
import requests
r =requests.get('http://www.smcasurat.org/Member/DirectorySearch#')
soup = BeautifulSoup(r.text,'lxml')
data = soup.find('section',class_='part_one')
name = data.find('h4')
print name.text
qual = data.find('h5')
print qual.text
contact = data.find('div',class_='media')
contact1 = contact.find('p')
print contact1.text
email = data.find('div',class_='media-body')
email1 = email.find('p')
print email1.text
ERROR:
Traceback (most recent call last):
  File "C:\Python27\smcasurat.py", line 19, in <module>
    name = data.find('h4')
AttributeError: 'NoneType' object has no attribute 'find'
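Answer 0 (score: 0)
It is much easier to access the JSON response that the page renders that data from. The AttributeError means soup.find('section', class_='part_one') returned None: that section is not in the HTML that requests receives, presumably because the page fills it in from this JSON response after loading.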
import requests
url = 'http://www.smcasurat.org/Member/DirecotrySerach'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
payload = {
    'type': 'ALL',
    'value': '',
    'pageindex': '1',
    'pagesize': '9999'}
jsonData = requests.post(url, headers=headers, params=payload).json()
for member in jsonData['Data']:
    name = member['FirstName'] + ' ' + member['LastName']
    qual = member['MemberDegree'].strip()
    email = member['Email1']
    try:
        # Join every non-empty field of the first clinic entry, one per line.
        contact = '\n'.join([v.strip() for k, v in member['Clinicinfo'][0].items() if v != ''])
    except Exception:
        # No clinic entry, or a field that is not a string: fall back to '-'.
        contact = '-'
    print('%s\n%s\n%s\n%s\n' % (name, qual, contact, email))
To view the output:
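for member in jsonData['Data']:
    name = member['FirstName'] + ' ' + member['LastName']
    qual = member['MemberDegree'].strip()
    try:
        # Use only the first phone number of the first clinic entry.
        contact = member['Clinicinfo'][0]['Phone1']
    except (KeyError, IndexError, TypeError):
        # Missing or empty 'Clinicinfo': fall back to '-'.
        contact = '-'
    email = member['Email1']
    print('%s\n%s\n%s\n%s\n' % (name, qual, contact, email))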
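If you would rather avoid the try/except, the same extraction can be sketched as a small helper that uses dict.get with fallbacks. This is only a sketch: summarize_member is a hypothetical name, the sample records are made up to match the field names used above, and printing '-' for an empty email is my choice rather than what the loop above does.

```python
def summarize_member(member):
    """Return (name, qualification, phone, email) for one member record."""
    name = '%s %s' % (member.get('FirstName', ''), member.get('LastName', ''))
    qual = (member.get('MemberDegree') or '').strip()
    clinics = member.get('Clinicinfo') or []
    # Fall back to '-' when there is no clinic entry or no phone number.
    phone = (clinics[0].get('Phone1') or '-') if clinics else '-'
    email = member.get('Email1') or '-'
    return name.strip(), qual, phone, email


# Made-up sample records shaped like the fields used above (not real data).
sample = [
    {'FirstName': 'Jane', 'LastName': 'Doe', 'MemberDegree': ' M.D. ',
     'Email1': 'jane@example.com', 'Clinicinfo': [{'Phone1': '12345'}]},
    {'FirstName': 'John', 'LastName': 'Roe', 'MemberDegree': 'M.S.',
     'Email1': '', 'Clinicinfo': []},
]

for member in sample:
    print('%s\n%s\n%s\n%s\n' % summarize_member(member))
```

With the real response you would loop over jsonData['Data'] instead of sample.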
Or you can use json_normalize to convert it to a DataFrame:
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
df = json_normalize(jsonData['Data'])
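To give a rough idea of what the flattening does to a nested record, here is a plain-Python sketch. It is not how pandas implements it, just an illustration of nested dicts turning into dotted column names while lists stay in a single cell. The flatten helper, the 'Address' field and the sample values are made up for the example.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing their keys.
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as-is under a single key.
            flat[full_key] = value
    return flat


# Made-up record; 'Address' is a hypothetical nested field.
sample = {
    'FirstName': 'Jane',
    'Address': {'City': 'Surat', 'Zip': '395001'},
    'Clinicinfo': [{'Phone1': '12345'}],
}

row = flatten(sample)
for column in sorted(row):
    print('%s = %r' % (column, row[column]))
```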
If you want to browse the data, just dump it to a file like this and open it in Notepad++:
import json
with open('C:/data.json', 'w') as outfile:
    json.dump(jsonData, outfile, indent=4)
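Once the file is saved you can also load it back later instead of hitting the site again. A minimal sketch, assuming a made-up stand-in for jsonData and a path in the temp directory (so it runs anywhere) rather than C:/data.json:

```python
import json
import os
import tempfile

# Made-up stand-in for the real response, just to make the example runnable.
jsonData = {'Data': [{'FirstName': 'Jane', 'LastName': 'Doe'}]}

# Hypothetical location; swap in 'C:/data.json' to match the snippet above.
path = os.path.join(tempfile.gettempdir(), 'data.json')

with open(path, 'w') as outfile:
    json.dump(jsonData, outfile, indent=4)

# Read the saved file back into a Python object.
with open(path) as infile:
    loaded = json.load(infile)

print(len(loaded['Data']))
```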