I'm having some trouble scraping specific content from the following webpage.
http://www.librarything.com/search.php?search=The+Fellowship+of+the+Ring
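The data I need is the ID of the first book under 'Works': http://prntscr.com/hfkiku
I have tried using Beautiful Soup and Selenium, but I could not find a way to get this information.
Any help would be greatly appreciated.
Edit: code added.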
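import urllib3
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def getWebpage(bookName):
    #website = 'http://www.librarything.com/title/' + bookName
    website = 'http://www.librarything.com/search.php?search=The+Fellowship+of+the+Ring'
    #print(website)
    # Plain HTTP fetch, parsed with Beautiful Soup
    http = urllib3.PoolManager()
    request = http.request('GET', website)
    soup = BeautifulSoup(request.data, 'html.parser')
    websiteP = soup.prettify()
    # Same page loaded in a real browser through Selenium
    driver = webdriver.Chrome()
    driver.get(website)
    delay = 5
    try:
        # Wait up to `delay` seconds for the first search result to appear
        WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'p.item')))
        print('Page is Ready!')
        for element in driver.find_elements(By.CSS_SELECTOR, 'p.item'):
            print(element.text)
    except TimeoutException:
        print('couldnt load page')
    finally:
        driver.quit()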
Output:
Page is Ready!
The Fellowship of the Ring: Being the First Part of The Lord of the Rings by J.R.R. Tolkien
The Lord of the Rings: The Fellowship of the Ring [2001 film] by Peter Jackson
The Fellowship of the Ring
The Fellowship of the Ring Journeybook by Matthew Ward
The Fellowship of the ring by J.R.R. Tolkien
The Fellowship of the Ring by J. R. R.
The Fellowship of the Ring Sourcebook by decipherrpg
The Lord of the Rings: The Fellowship of the Ring: Original Motion Picture Soundtrack by Howard Shore
The Fellowship of the Ring by Coleman Charlton
The Fellowship of the Ring {American dramatization} by J.R.R. Tolkien
The Fellowship of the Ring by aa
The Fellowship of the Ring Insiders' Guide (The Lord of the Rings Movie Tie-In) by Brian Sibley
The Lord of the Rings {complete} by J.R.R. Tolkien
The Hobbit and The Lord of the Rings by J.R.R. Tolkien
The Fellowship of the Ring by John Ronald Reuel Tolkien; Alan Lee
J.R.R. Tolkien Reads and Sings The Hobbit and The Fellowship of the Ring by J.R.R. Tolkien
The Fellowship of the Ring - Part One - Ballantine
The Fellowship of the Ring {unspecified}
The Fellowship Of The Ring Isbn 0261102311
The Fellowship of the Ring [Videorecording]
The Fellowship of the Ring Sourcebook (The Lord of the Rings Roleplaying Game) by Decipher RPG
The Fellowship of the Ring Book One
The Lord of the Rings: The Fellowship of the Ring: Piano, Vocal, and Chords by Howard Shore
I tried changing the code, but I couldn't get anywhere.
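Answer 0 (score: 0)
This is one of those cases where driver.page_source does not show the expected HTML, but if you read the innerHTML of the body tag instead, you get the markup you expect.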
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

#driver = webdriver.Firefox()
driver = webdriver.Chrome()
url = "http://www.librarything.com/search.php?search=The+Fellowship+of+the+Ring"
driver.get(url)
# Give the page time to load the search results
time.sleep(5)
# This next line does not show the expected html.
# print(driver.page_source)
# But this finds it.
body = driver.find_element(By.TAG_NAME, "body").get_attribute('innerHTML')
driver.quit()
soup = BeautifulSoup(body, "html.parser")
ps = soup.find_all("p", {"class": "item"})
for p in ps:
    # The ID is the third '/'-separated piece of the first link's href
    print(p.find("a")['href'].split('/')[2])
Output:
3203347
1354927
20066223
4819791
7170476
...
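Since the question asks only for the ID of the first book, the same extraction rule can be wrapped in a small helper that returns just the first match. The following is a minimal sketch, not the method used above: it swaps Beautiful Soup for the standard library's html.parser so that it runs without extra packages. The names ItemLinkParser, work_ids, and first_work_id are hypothetical, and SAMPLE_HTML is made-up markup shaped like what the selectors above imply (links of the form /work/<id>), not a copy of the real page.

```python
from html.parser import HTMLParser


class ItemLinkParser(HTMLParser):
    """Collect the href of the first <a> inside each <p class="item">."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self._in_item = False      # currently inside a <p class="item">
        self._link_taken = False   # first link of this item already stored

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p":
            classes = (attrs.get("class") or "").split()
            self._in_item = "item" in classes
            self._link_taken = False
        elif tag == "a" and self._in_item and not self._link_taken:
            href = attrs.get("href")
            if href:
                self.hrefs.append(href)
                self._link_taken = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_item = False


def work_ids(html):
    """Return the numeric IDs found in the result links, in page order."""
    parser = ItemLinkParser()
    parser.feed(html)
    ids = []
    for href in parser.hrefs:
        # Same rule as above: "/work/3203347" -> ["", "work", "3203347"]
        parts = href.split("/")
        if len(parts) > 2 and parts[2].isdigit():
            ids.append(parts[2])
    return ids


def first_work_id(html):
    """Return the first ID, or None when the markup holds no result."""
    ids = work_ids(html)
    return ids[0] if ids else None


# Hypothetical markup (an assumption, not copied from the site).
SAMPLE_HTML = """
<p class="item"><a href="/work/3203347">The Fellowship of the Ring</a> by J.R.R. Tolkien</p>
<p class="item"><a href="/work/1354927">The Lord of the Rings: The Fellowship of the Ring</a></p>
"""

print(first_work_id(SAMPLE_HTML))
```

With the Selenium code above, the innerHTML string stored in body could be passed straight in, as first_work_id(body). If the real hrefs are shaped differently from the assumed /work/<id> pattern, the helper returns None rather than a wrong value.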
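P.S. Welcome to StackOverflow. One suggestion: please post your code in the question itself so that it is better received. That makes it easier for others to run the code than a screenshot, which cannot easily be copied into an IDE.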