I have been stuck on this for two days.
I am trying to get the text from spans that appear inside many divs.
All the divs look roughly like this:
<div class="_3_7SH _3DFk6 message-in">
<div class="Tkt2p">
<div class="copyable-text" data-pre-plain-text="[10:26 AM, 5/28/2019] יוסף צדוק: ">
<div class="_3zb-j ZhF0n">
<span dir="rtl" class="XELVh selectable-text invisible-space copyable-text">TEXT TO COPY IS ME</span></div></div>
<div class="_2f-RV"><div class="_1DZAH">
<span class="_1ORuP">
</span><span class="_3EFt_">10:26 AM</span></div></div></div><span></span></div>
This is how I find all the "message-in" elements:
in_mesg_arr = driver.find_elements_by_xpath("//div[contains(@class, 'message-in')]")
The resulting array has length 11.
Then I try to get the text from the span inside each one:
for index in in_mesg_arr:
    last_msg = last_msg + str(index.find_element_by_xpath(
        "//span[contains(@class,'selectable-text invisible-space copyable-text')]").text)
But I get the same text over and over (the same element every time!).
print(last_msg) gives: bla bla bla bla bla bla bla bla bla bla bla bla bla
I would be glad to get some guidance.
Full HTML:
Answer 0 (score: 2)
for index in last_msg:
    last_msg = last_msg + str(in_mesg_arr[index].find_element_by_xpath(
        "//span[contains(@class,'selectable-text invisible-space copyable-text')]").text)
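This code will always return the first element, because an XPath that starts with // searches for the span anywhere in the DOM, not just inside the current element.
The XPath expression inside the loop must start with a dot so that it is evaluated relative to the context element. Use either of the following snippets.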
in_mesg_arr = driver.find_elements_by_xpath("//div[contains(@class, 'message-in')]")
for item in in_mesg_arr:
    # The leading dot restricts the search to descendants of this message div
    spanele = item.find_element_by_xpath(".//span[contains(@class,'selectable-text invisible-space copyable-text')]")
    print(spanele.text)
OR
in_mesg_arr = driver.find_elements_by_xpath("//div[contains(@class, 'message-in')]")
for item in range(len(in_mesg_arr)):
    # Same relative XPath, iterating by index instead of by element
    spanele = in_mesg_arr[item].find_element_by_xpath(".//span[contains(@class,'selectable-text invisible-space copyable-text')]")
    print(spanele.text)
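To see the difference without a browser, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of Selenium. The markup is a simplified, made-up stand-in for the chat HTML, and has_class is a hypothetical helper (ElementTree has no contains(), so the class check is done in Python). It only illustrates the scoping idea: a search started from the document root keeps returning the first span, while a search started from each message returns that message's own span.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the chat markup (kept well-formed
# so that ElementTree can parse it).
PAGE = """
<body>
  <div class="_3_7SH message-in"><div><span class="selectable-text copyable-text">first message</span></div></div>
  <div class="_3_7SH message-in"><div><span class="selectable-text copyable-text">second message</span></div></div>
  <div class="_3_7SH message-out"><div><span class="selectable-text copyable-text">my reply</span></div></div>
</body>
"""

root = ET.fromstring(PAGE)


def has_class(element, name):
    """Hypothetical helper: True if `name` is one of the element's classes."""
    return name in element.get("class", "").split()


incoming = [div for div in root.iter("div") if has_class(div, "message-in")]

# Searching from the document root inside the loop: the same first span every time.
document_wide = [root.find(".//span").text for _ in incoming]

# Searching relative to each message element: one span per message.
relative = [div.find(".//span").text for div in incoming]

print(document_wide)  # ['first message', 'first message']
print(relative)       # ['first message', 'second message']
```

In Selenium the same distinction is expressed by the leading dot: "//span[...]" behaves like the document-wide search, ".//span[...]" like the per-message one.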
Let me know how it goes.
Answer 1 (score: 0)
This can also be done with BeautifulSoup:
from bs4 import BeautifulSoup
# Schematic of the markup being targeted
content = '''
<div class="*something* message-in *something*">
<span class="selectable-text invisible-space copyable-text"></span>
</div>
'''
soup = BeautifulSoup(content, "lxml")
span_text = [x.get_text() for x in soup.find_all('span')]
html_con = '''
<div class="_3_7SH _3DFk6 message-in">
<div class="Tkt2p">
<div class="copyable-text" data-pre-plain-text="[10:26 AM, 5/28/2019] יוסף צדוק: ">
<div class="_3zb-j ZhF0n">
<span dir="rtl" class="XELVh selectable-text invisible-space copyable-text">TEXT TO COPY IS ME</span></div></div>
<div class="_2f-RV"><div class="_1DZAH">
<span class="_1ORuP">
</span><span class="_3EFt_">10:26 AM</span></div></div></div><span></span></div>
<div class="_3_7SH _3DFk6123456 message-in">
<div class="Tkt2p">
<div class="copyable-text" data-pre-plain-text="[10:26 AM, 5/28/2019] יוסף צדוק: ">
<div class="_3zb-j ZhF0n">
<span dir="rtl" class="XELVh selectable-text invisible-space copyable-text">New text</span></div></div>
<div class="_2f-RV"><div class="_1DZAH">
<span class="_1ORuP">
</span><span class="_3EFt_">10:26 AM</span></div></div></div><span></span></div>
'''
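soup = BeautifulSoup(html_con, "lxml")
# Every incoming-message div, then the spans inside each one
content_message_in = soup.find_all('div', {'class': 'message-in'})
span_content = [x.find_all('span') for x in content_message_in]
# The first span in each message holds the message text
span_text = [x[0].get_text() for x in span_content]
print(span_text)
# Output:
# ['TEXT TO COPY IS ME', 'New text']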
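Answer 2 (score: 0)
Could it be that you are using find_element_by_xpath instead of find_elements_by_xpath when getting the spans? find_element_by_xpath only ever returns the first matching element.
See the answers to this question: https://sqa.stackexchange.com/questions/37380/find-elements-by-xpath-issue?answertab=votes#tab-top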
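For what it's worth, the find_element_by_* and find_elements_by_* helpers were removed in Selenium 4.3, so on a current install the same loop would go through find_elements with a locator strategy. Below is a rough sketch, not a definitive implementation: incoming_message_texts is a hypothetical name, driver is assumed to be an already-open WebDriver on the chat page, and the string "xpath" is the value of By.XPATH (used directly here so the snippet has no imports). It uses find_elements inside each message, so every matching span is collected rather than only the first.

```python
def incoming_message_texts(driver):
    """Return the text of every selectable span inside each incoming message.

    `driver` is assumed to be a Selenium 4 WebDriver (hypothetical usage).
    """
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH
    messages = driver.find_elements("xpath", "//div[contains(@class, 'message-in')]")
    texts = []
    for message in messages:
        # The leading dot keeps the search inside this message element.
        spans = message.find_elements(
            "xpath", ".//span[contains(@class, 'selectable-text')]"
        )
        texts.extend(span.text for span in spans)
    return texts
```

Usage would be something like print(incoming_message_texts(driver)) once the page has loaded.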