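How to get the text inside a div class using Selenium

Time: 2018-12-02 08:15:17

Tags: python python-3.x selenium google-chrome selenium-webdriver

I want to get the text "Life of Pi" from the HTML below. I have tried many different variations of find, but none of them seems to locate the right div or to extract the text I want from the HTML.

I have tried: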

movie_name = browser.find_element_by_class_name("_14Rip")
movie = movie_name.get_attribute('text')

movie_name = browser.find_element('_14Rip').getText()

Neither of these works. In addition, .getText raises an error relating to the 'WebElement' object. How can I extract the text while still using Python and Selenium?

<div id="reactApp"><div data-reactroot=""><!-- react-empty: 2 --><!-- react-empty: 3 --><div class="nr-medium-page main-page-container"><span><div class="_1gHrf"></div><nav class="_2ng5l"><div class="_1nd6r"><span class="_3Jg52 _2Dfpe"><a href="/content/movies/home"><span class="_2RaqC "></span></a></span><span><span class="_3Jg52 _2Dfpe"><span class="_1sHRG"><a href="/content/movies/movieslist">Movies</a></span></span><div class="_34DjI" style="opacity: 0; top: 0px; transform: translateY(-100%); z-index: 1; left: 116.875px;"><div class="_3kwAy nr-m-30"><div class="_2h_ha"><a href="/content/movies/movieslist">VIEW ALL MOVIES</a></div><div><div class="_223L6"></div></div></div></div></span><span><span class="_3Jg52 _2Dfpe"><span class="_1sHRG"><a href="/content/movies/tvlist">TV</a></span></span><div class="_34DjI" style="opacity: 0; top: 0px; transform: translateY(-100%); z-index: 1; left: 180.974px;"><div class="_3kwAy nr-m-30"><div class="_2h_ha"><a href="/content/movies/tvlist">VIEW ALL TV</a></div><div><div class="_223L6"></div></div></div></div></span><span><span class="_3Jg52 _2Dfpe"><span class="_1sHRG"><a href="/content/movies/myvudu">My Vudu</a></span></span><div class="_34DjI" style="opacity: 0; top: 0px; transform: translateY(-100%); z-index: 1; left: 252.224px;"><div class="ZvRao"><a href="/content/movies/mymovies">My Movies</a><a href="/content/movies/mytv">My TV</a><a href="/content/movies/mywishlist">My Wishlist</a><a href="/content/movies/mypreorders">My Pre-orders</a><a href="/content/movies/myoffers">My Offers</a></div></div></span><span><span class="_3Jg52 _2Dfpe"><span class="_1sHRG"><a href="/content/movies/free">Free</a></span></span><div class="_34DjI" style="opacity: 0; top: 0px; transform: translateY(-100%); z-index: 1; left: 349.104px;"><div class="_3kwAy nr-m-30"><div class="_2h_ha"><a href="/content/movies/free">VIEW ALL FREE MOVIES &amp; TV</a></div><div><div class="_223L6"></div></div></div></div></span><span class="_3Jg52 _2Dfpe 
_3uvRn"><span class="kzzv5"><span class="_1sHRG"><span class="_72isQ"></span></span></span></span><span><span class="_3Jg52 _2Dfpe _3uvRn _28Da_ _18Vtj"><span class="_1sHRG"><span><div><div class="_3yhAh"><span class="glyphicon glyphicon-user"></span></div><div class="_3p4_i"><!-- react-text: 71 -->Hi, Zachary!<!-- /react-text --></div></div></span></span></span><div class="_34DjI" style="opacity: 0; top: 0px; transform: translateY(-100%); z-index: 1; left: 1035px;"><div class="ZvRao"><a href="https://www.vudu.com/content/AccountManage.html#accountInfo">Account Settings</a><a href="javascript:void(0);">Payment Info</a><a href="https://www.vudu.com/content/AccountManage.html#balanceHistory">Balance &amp; History</a><a href="https://www.vudu.com/content/MyDevices.html">Manage Devices</a><a href="http://support.vudu.com/?supportPage=home">Support</a><a href="javascript:void(0);">Log Out</a></div></div></span><span><span class="_3Jg52 _2Dfpe _3uvRn _28Da_ _18Vtj"><span class="_1sHRG"><span>Redeem</span></span></span><div class="_34DjI _231pT" style="opacity: 0; top: 0px; transform: translateY(-100%); z-index: 1; left: 509.88px;"><div class="ZvRao"><a href="https://www.vudu.com/content/redeem.html">Digital Copy</a><a href="https://www.vudu.com/vuducodes">Vudu Code</a></div></div></span></div></nav></span><div class="nr-page-body"><div><div class="_2vzt3"></div><div class="_1_90a"><div class="_36fun"><div class="container nr-width-100 nr-mt-20 nr-p-0"><div class="row"><div class="col-xs-12"><div class="_11CIH nr-mr-20"><button class="_3wvTg _1PvrS _2WUnj _2Jfzj"><span class="_2O7IK"><span class="_6D7oD">Filters</span><span class="_29qeU"><span class="glyphicon glyphicon-triangle-bottom"></span></span></span></button></div></div></div></div><div class="page-section nr-mt-10 nr-pt-20 sb-t2"></div><div class="_3oON0"><div class="_1RtHb"><div class="ki1tU" style="width: calc(34% - 0px);">Recently Purchased</div><div style="width: calc(34% - 0px);">A - Z</div><div 
style="width: calc(34% - 0px);">Release Date</div></div></div><!-- react-empty: 150 --></div></div><div class="mLd3t"><div class="_2qGVw "><span>My Movies</span><span><!-- react-text: 155 -->&nbsp;(<!-- /react-text --><!-- react-text: 156 -->124<!-- /react-text --><!-- react-text: 157 -->)<!-- /react-text --></span></div><div><div style="position: fixed; width: 100%; left: 0px; top: 253.333px; z-index: 2;"><div style="overflow: hidden; width: 100%;"><div class="nr-pt-40" style="max-height: 1156px; overflow-y: scroll; width: 100%; height: 324.667px; padding-left: calc(50% - 314px);"><div style="position: relative; min-height: 7347px; width: 628px;"><div class="contentPosterWrapper" style="width: 142px; height: 237px; left: 0px; top: 0px; position: absolute; z-index: 2;"><div class="_1-zjZ"><a href="/content/movies/details/The-Social-Network/182239"><div class="_3YJBG  content-poster"><div class="_1witT"><span class="_2KzEp"><div class="_1LPN- _3YCP4"><img src="https://images2.vudu.com/poster2/182239-142" alt="The Social Network"></div></span></div><div class="_20xkP"></div></div></a></div></div><div class="contentPosterWrapper" style="width: 142px; height: 237px; left: 162px; top: 0px; position: absolute; z-index: 2;"><div class="_1-zjZ"><a href="/content/movies/details/Snatch/21789"><div class="_3YJBG  content-poster"><div class="_1witT"><span class="_2KzEp"><div class="_1LPN- _3YCP4"><img src="https://images2.vudu.com/poster2/21789-142" alt="Snatch"></div></span></div><div class="_20xkP"></div></div></a></div></div><div class="contentPosterWrapper" style="width: 142px; height: 237px; left: 324px; top: 0px; position: absolute; z-index: 2;"><div class="_1-zjZ"><a href="/content/movies/details/Life-of-Pi/391851"><div class="_3YJBG dkbEJ content-poster"><div class="_1witT"><span class="_2KzEp"><div class="_1LPN- _3YCP4"><img src="https://images2.vudu.com/poster2/391851-142" alt="Life of Pi"></div></span></div><div class="_20xkP dkbEJ"><div class="_1KXuV"><div 
class="_2QBtI _1kJwC"><div class="_1yDEt _33qR9 _4mlb1 _25_Pp"><div class="_14Rip"><!-- react-text: 464 -->Life of Pi<!-- /react-text --><!-- react-text: 465 --> <!-- /react-text --></div></div><div class="_33qR9 "><span class="_2kH_h">2012</span><span class="_2kH_h"><span class="_2W2ik ">PG</span></span><span class="_2kH_h"><span class="_35HrQ BE5VD"></span></span><span class="_2kH_h"><span class="Xmp5C gLWM-"></span></span></div>

If I try just .text, I also get this error:

File "C:\Users\\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "C:\Users\\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\\AppData\Local\Programs\Python\Python37\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: invalid argument: 'value' must be a string

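4 Answers:

Answer 0 (score: 1)

You can also combine an attribute=value CSS selector with an element (type) selector. The line below assumes you want the first div element that matches the class name. If there is more than one match and you need a later one, use find_elements instead and index into the returned collection to get the appropriate match. Sharing more of the HTML, or the URL, would help refine this.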

print(browser.find_element_by_css_selector("div[class='_14Rip']").text)
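
If several elements share the class, the find_elements-then-index step described above might look like the sketch below. `text_of_match` is a hypothetical helper name, `browser` is assumed to be an already-running WebDriver, and the string "css selector" is the value behind `By.CSS_SELECTOR`, used here so the snippet needs no import of its own.

```python
def text_of_match(browser, css_selector, index=0):
    """Return the visible text of the index-th element matching css_selector.

    Hypothetical helper: `browser` is assumed to be a live Selenium WebDriver.
    """
    # "css selector" is the string value of By.CSS_SELECTOR.
    matches = browser.find_elements("css selector", css_selector)
    if not matches:
        raise LookupError(f"no element matches {css_selector!r}")
    # Indexing past the end of the list raises IndexError, as for any list.
    return matches[index].text


# Example call (needs a real browser session):
# print(text_of_match(browser, "div[class='_14Rip']", index=0))
```

Unlike find_element, find_elements returns an empty list rather than raising when nothing matches, which is why the helper checks for that case explicitly.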

Answer 1 (score: 1)

First attempt:

.get_attribute('text') is not valid, and element.text only works for visible text. Try the 'textContent' attribute instead:

movie_name = browser.find_element_by_class_name("_14Rip")
movie = movie_name.get_attribute('textContent')
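
The two ways of reading text can be combined into a small fallback, sketched below. `element_text` is a hypothetical helper and `element` is assumed to be a Selenium WebElement; the sketch relies on `.text` coming back empty for an element that is not displayed.

```python
def element_text(element):
    """Return an element's text, even when the element is not visible.

    Hypothetical helper: `element` is assumed to be a Selenium WebElement.
    """
    visible = element.text  # rendered text only; empty for hidden elements
    if visible:
        return visible
    # textContent is a DOM property, so it is readable for hidden nodes too.
    raw = element.get_attribute("textContent") or ""
    # The div in the question holds "Life of Pi" plus a trailing space node.
    return raw.strip()


# Example call (needs a real browser session):
# print(element_text(browser.find_element_by_class_name("_14Rip")))
```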

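Second attempt:

In the HTML above there is only one element with class _14Rip, yet there are 3 movies. Could it be an element that is only added on hover? Either way, let's try an alternative: extract the movie titles from the alt attribute of the poster images, and add a WebDriverWait.
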
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC

.....
.....
# put this after login or after clicking the login button
# wait at most 30 seconds for the first poster image to be present
# (single quotes outside, so the double quotes in the selector are legal)
WebDriverWait(browser, 30).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'span[class="_2KzEp"] img')))
movie_names = browser.find_elements_by_css_selector('span[class="_2KzEp"] img')
for name in movie_names:
    print(name.get_attribute('alt'))

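Answer 2 (score: 0)

getText() is a Java method. In Python you are looking for the text member. You also need to tell the driver how to locate the element, which in your case is by class name: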

movie_name = browser.find_element_by_class_name('_14Rip').text

You can also use the By class to specify the locator:

from selenium.webdriver.common.by import By

# find_element takes the strategy and the value as two separate arguments
browser.find_element(By.CLASS_NAME, '_14Rip')

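Answer 3 (score: 0)

I used most of the answer proposed by user ewwink. Instead of using the string name I was originally trying to get, I had to use the name attached to the image.

I used .find_elements_by_xpath because it correctly gave me the images that carry the text attribute I wanted. The suggested CSS selector also threw an error stating that the element could not be found; the span seemed to be the problem. Thanks again to everyone who helped me find a solution!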
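
The answer does not show the XPath that was actually used. As one possibility, the sketch below assumes the posters are the img elements inside the contentPosterWrapper divs from the HTML in the question. `POSTER_XPATH` and `poster_titles` are hypothetical names, `browser` is assumed to be a live WebDriver, and "xpath" is the string value behind `By.XPATH`.

```python
# Assumed XPath, based on the HTML in the question: every <img> that has an
# alt attribute and sits inside a "contentPosterWrapper" div.
POSTER_XPATH = "//div[contains(@class, 'contentPosterWrapper')]//img[@alt]"


def poster_titles(browser, xpath=POSTER_XPATH):
    """Return the alt text of each poster image, skipping empty values.

    Hypothetical helper: `browser` is assumed to be a live Selenium WebDriver.
    """
    # "xpath" is the string value of By.XPATH.
    images = browser.find_elements("xpath", xpath)
    titles = [img.get_attribute("alt") for img in images]
    return [title for title in titles if title]


# Example call (needs a real browser session):
# for title in poster_titles(browser):
#     print(title)
```

Anchoring on the wrapper div rather than on the generated span class sidesteps the span that the CSS selector had trouble with, though whether it matches the live page is untested here.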
