I am trying to scrape information from http://www.emoryhealthcare.org/locations/offices/advanced-digestive-care-1.html.
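I want to scrape the specialties listed in the bottom third of the page, i.e. "Gastroenterology" and "Internal Medicine". When I inspect the element, I see that each one is an li inside <div class="module bordered specialist">, but when I loop through the soup and print each item found, I get results that differ from what I expect.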
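When I open the site in a browser, I see those unexpected values flash briefly before the content switches to the expected result. Is there a way to make it more likely that I scrape the items I actually want?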
Answer 0 (score: 3)
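Just use selenium, wait a few seconds for the page to finish loading, and then parse the page source as you did before. This seemed to work for me.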
from selenium import webdriver
import os
import time
from bs4 import BeautifulSoup
chromedriver = "/Users/Rafael/chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
driver.get('http://www.emoryhealthcare.org/locations/offices/advanced-digestive-care-1.html')
time.sleep(5)
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')
results = soup.find_all("div", { "class" : "module bordered specialist" })
print(results[0].text) #prints GastroenterologyInternal Medicine
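As a side note, results[0].text runs the list items together into one string. If you want each specialty on its own, you can select the li elements instead. A minimal sketch, assuming the rendered markup puts the li elements inside that div; the HTML string below is a made-up stand-in for driver.page_source, not the real page, and it uses the built-in html.parser so it does not depend on lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the rendered page source; the real markup may differ.
html = """
<div class="module bordered specialist">
  <ul>
    <li>Gastroenterology</li>
    <li>Internal Medicine</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <li> inside a div that carries all three classes.
items = soup.select("div.module.bordered.specialist li")
specialties = [li.get_text(strip=True) for li in items]
print(specialties)  # ['Gastroenterology', 'Internal Medicine']
```

With the real page you would pass driver.page_source in place of the sample string.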
Answer 1 (score: 0)
You don't need selenium. The page gets its data from a simple POST request, so all you need to do is mimic that request.
The following does that and prints the specialists of the first location returned:
import requests
# you can change these fields to get different results
data = {"selectFields":["Name","URL","Specialists"],"filters":{},"orderBy":{"Name":-1}}
post = "http://www.emoryhealthcare.org/service/findPhysician/api/locations/retrieve"
# post the data as json and create a dict from the returned json.
js = requests.post(post, json=data).json()
print(js[u'locations'][0][u'Specialists'])
There is a lot of data in the JSON, so pretty much anything you might want should be in there.
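If you want a particular office rather than whatever happens to come first in the list, you can search the returned locations by name. A rough sketch, assuming each entry in locations is a dict with the Name and Specialists fields requested above; the sample data and the helper find_location are made up for illustration, and the real response may be shaped differently:

```python
def find_location(js, name_fragment):
    """Return the first location whose Name contains name_fragment
    (case-insensitive), or None if nothing matches."""
    fragment = name_fragment.lower()
    for location in js.get("locations", []):
        if fragment in (location.get("Name") or "").lower():
            return location
    return None


# Made-up sample standing in for the parsed JSON response.
sample = {
    "locations": [
        {"Name": "Example Clinic", "URL": "/example",
         "Specialists": ["Cardiology"]},
        {"Name": "Advanced Digestive Care", "URL": "/adc",
         "Specialists": ["Gastroenterology", "Internal Medicine"]},
    ]
}

match = find_location(sample, "digestive")
print(match["Specialists"] if match else "not found")
```

With the real data you would pass the js dict from the request in place of sample.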