I am trying to extract the profile links of the members listed on the following page:
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.aapkiawaz.in/about/doctor-hospital-directory-medical-directory-doctors-doctor-hospital-listing-medical-directory-doctors-listing-medical-directory-doctors-doctor-hospital-guide-medical-directory-d/0')
soup = BeautifulSoup(r.text,'lxml')
##for link in soup.find('span',class_='person_name'):
for link1 in soup.find_all('span', class_='person_name'):
    link2 = link1.find('a')
    print(link2['href'])
I expected the members' profile links, but I get the following output instead:
{{project.mainbtnLink}}
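Answer 0 (score: 4)
The page uses JavaScript to update its content, and what you are seeing are essentially the placeholders for those updates. You can mimic the POST request the page makes to its API to fetch this content: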
import requests
data = {
    'type': 'social_data',
    'page': 1,
    'size': 50,
    'assigned_group': 1061,
    'categoryid': 1070
}
r = requests.post('https://www.aapkiawaz.in/api/social_data.php', data = data).json()
links = [item['mainbtnLink'] for item in r['rec']]
print(links)
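The request above fetches a single page of up to 50 records. A minimal sketch of walking through further pages follows. It assumes, which the source does not confirm, that the endpoint returns an empty or missing rec list once the pages run out. The names fetch_page, collect_links and max_pages are hypothetical and introduced only for this illustration.

```python
API_URL = 'https://www.aapkiawaz.in/api/social_data.php'


def fetch_page(page, size=50):
    """POST the same form fields as in the answer for one page; return parsed JSON."""
    # Imported here so collect_links can be exercised with a stub, without network access.
    import requests

    data = {
        'type': 'social_data',
        'page': page,
        'size': size,
        'assigned_group': 1061,
        'categoryid': 1070,
    }
    response = requests.post(API_URL, data=data, timeout=30)
    response.raise_for_status()
    return response.json()


def collect_links(fetch=fetch_page, max_pages=20):
    """Gather mainbtnLink values page by page.

    Assumption: an empty or missing 'rec' means there are no more pages.
    max_pages is a safety cap so the loop always terminates.
    """
    links = []
    for page in range(1, max_pages + 1):
        records = fetch(page).get('rec') or []
        if not records:
            break
        links.extend(item['mainbtnLink'] for item in records)
    return links
```

Calling collect_links() with no arguments would issue real requests. Passing a stub as fetch makes it possible to check the loop logic offline.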
To get the names as well:
info = [(item['data']['person_name'], item['mainbtnLink']) for item in r['rec']]
To unpack the tuples into two separate sequences (zip returns tuples rather than lists):
names, links = zip(*[(item['data']['person_name'], item['mainbtnLink']) for item in r['rec']])
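As an illustration of what the zip(*...) unpacking does, here is a small sketch on made-up records shaped like r['rec']. The sample values are hypothetical, not real API output. It also converts the results to lists and guards against an empty result, where the two-name unpacking would otherwise raise ValueError.

```python
# Hypothetical records shaped like r['rec'] in the answer (values are made up).
records = [
    {'data': {'person_name': 'Person A'}, 'mainbtnLink': 'https://example.com/a'},
    {'data': {'person_name': 'Person B'}, 'mainbtnLink': 'https://example.com/b'},
]

pairs = [(item['data']['person_name'], item['mainbtnLink']) for item in records]

# zip(*pairs) transposes the list of pairs into one tuple of names and one of links.
if pairs:
    names, links = (list(column) for column in zip(*pairs))
else:
    names, links = [], []

print(names)
print(links)
```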
As a DataFrame:
import pandas as pd
info = [(item['data']['person_name'], item['mainbtnLink']) for item in r['rec']]
df = pd.DataFrame(info, columns = ['name' , 'link'])
print(df)