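I want to fetch part of a website's dynamic HTML content. I can see this content in "Inspect Element" but not in "View Source". I tried the BeautifulSoup and Selenium libraries without success, because after the page loads I have to click some on-screen buttons before the content is loaded.
For example, on http://play.typeracer.com I can load the HTML source, but I cannot get the content (the table and the text) that appears after clicking "Practice" on the page.
I hope that is clear. Thanks for your attention.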
Answer 0 (score: 2)
Here is a solution using Selenium and Firefox:
Update
In case you also want to type the text in automatically later, the final block below does that ;)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'http://play.typeracer.com/'
browser = webdriver.Firefox()
browser.get(url)

try:  # wait until the link is loaded
    element = WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.LINK_TEXT, 'Practice')))
finally:  # link loaded -> click it
    element.click()

try:  # wait until the text is loaded
    WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.XPATH, '//span[@unselectable="on"]')))
finally:  # extract the text
    # find_elements(By.XPATH, ...) is used here because the
    # find_elements_by_* helpers were removed in newer Selenium 4 releases
    spans = browser.find_elements(By.XPATH, '//span[@unselectable="on"]')
    if len(spans) == 2:  # first word has only one letter
        text = f'{spans[0].text} {spans[1].text}'
    elif len(spans) == 3:  # first word has more than one letter
        text = f'{spans[0].text}{spans[1].text} {spans[2].text}'
    else:
        text = ' '.join([span.text for span in spans])
        print(f'special case that is not handled yet: {text}')
print(text)
>>> 'Scissors cuts paper. Paper covers rock. Rock crushes lizard. Lizard poisons Spock. Spock smashes scissors. Scissors decapitates lizard. Lizard eats paper. Paper disproves Spock. Spock vaporizes rock. And as it always has, rock crushes scissors.'
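The branching on the number of spans can be easier to follow (and to test without a browser) as a small pure function. This is only a sketch: `join_spans` is a hypothetical helper name, and the assumed span layout (first letter, rest of the first word, rest of the text) is taken from the comments in the code above, not verified against the site.

```python
def join_spans(parts):
    """Rebuild the race text from the text of each matched <span>.

    `parts` is a list of strings, one per span, in document order.
    The layout is an assumption based on the comments in the answer.
    """
    if len(parts) == 2:
        # first word is a single letter: [first word, rest of the text]
        return f'{parts[0]} {parts[1]}'
    if len(parts) == 3:
        # [first letter, rest of the first word, rest of the text]
        return f'{parts[0]}{parts[1]} {parts[2]}'
    # unhandled layout: fall back to joining everything with spaces
    return ' '.join(parts)


print(join_spans(['S', 'cissors', 'cuts paper.']))  # Scissors cuts paper.
print(join_spans(['I', 'am here.']))                # I am here.
```

With Selenium it would be called as `text = join_spans([span.text for span in spans])`.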
try:  # wait until the input field is present
    txt_input = WebDriverWait(browser, 30).until(
        EC.presence_of_element_located((By.XPATH,
            '//input[@class="txtInput" and @autocorrect="off"]')))
finally:  # type the text one character at a time
    for letter in text:
        txt_input.send_keys(letter)
The reason for the try/finally blocks is that we have to wait until the content has loaded, which can sometimes take a long time.
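To illustrate what waiting for content means here, the following is a simplified, Selenium-free sketch of the polling idea behind an explicit wait such as `WebDriverWait(...).until(...)`. The names `wait_until`, `WaitTimeout`, and `content_loaded` are hypothetical and exist only for this example; Selenium's real implementation differs in its details.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=30.0, poll=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f'condition not met within {timeout} s')
        time.sleep(poll)


# Demo with a fake "page" whose content appears on the third check.
checks = {'count': 0}


def content_loaded():
    checks['count'] += 1
    return 'race text' if checks['count'] >= 3 else None


try:
    loaded = wait_until(content_loaded, timeout=2.0, poll=0.01)
    print(loaded)
except WaitTimeout:
    print('gave up waiting')
```

With Selenium itself, a timed-out wait raises `TimeoutException` (from `selenium.common.exceptions`). In a try/finally block the `finally` clause still runs after a timeout, at which point the variable assigned inside `try` does not exist, so catching the timeout with try/except may be the more robust choice.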