我只想抓取黑框中包含的必需信息,并删除/删除/排除红框中包含的信息
我这样做是因为两个框中都存在类名“ entry”和“ partial entry”。只有第一个“部分条目”包含我需要的信息,因此我打算删除/删除/排除类名“ mgrRspnInLine”。
我的代码是:
while True:
container = driver.find_elements_by_xpath('.//*[contains(@class,"review-container")]')
for item in container:
try:
element = item.find_element_by_class_name('mgrRspnInline')
driver.execute_script("""var element = document.getElementsByClassName("mgrRspnInline")[0];element.parentNode.removeChild(element);""", element)
WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
element = WebDriverWait(driver, 50).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"taLnk ulBlueLinks")]')))
element.click()
time.sleep(2)
rating = item.find_elements_by_xpath('.//*[contains(@class,"ui_bubble_rating bubble_")]')
for rate in rating:
rate = rate.get_attribute("class")
rate = str(rate)
rate = rate[-2:]
score_list.append(rate)
time.sleep(2)
stay = item.find_elements_by_xpath('.//*[contains(@class,"recommend-titleInline noRatings")]')
for stayed in stay:
stayed = stayed.text
stayed = stayed.split(', ')
stayed.append(stayed[0])
travel_type.append(stayed[1])
WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"noQuotes")]')))
summary = item.find_elements_by_xpath('.//*[contains(@class,"noQuotes")]')
for comment in summary:
comment = comment.text
comments.append(comment)
WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"ratingDate")]')))
rating_date = item.find_elements_by_xpath('.//*[contains(@class,"ratingDate")]')
for date in rating_date:
date = date.get_attribute("title")
date = str(date)
review_date.append(date)
WebDriverWait(driver, 50).until(EC.presence_of_element_located((By.XPATH,'.//*[contains(@class,"partial_entry")]')))
review = item.find_elements_by_xpath('.//*[contains(@class,"partial_entry")]')
for comment in review:
comment = comment.text
print(comment)
reviews.append(comment)
except (NoSuchElementException) as e:
continue
try:
element = WebDriverWait(driver, 100).until(EC.element_to_be_clickable((By.XPATH,'.//*[contains(@class,"nav next taLnk ui_button primary")]')))
element.click()
time.sleep(2)
except (ElementClickInterceptedException,NoSuchElementException) as e:
print(e)
break
基本上,在“评论容器”中,我首先搜索了类名“ mgrRspnInLine”,然后尝试使用execute_script删除它。
但是不幸的是,输出仍然显示“ mgrRspnInLine”中包含的内容。
答案 0 :(得分:2)
如果要避免通过XPath匹配第二个元素,则可以按如下所示修改XPath:
.//*[contains(@class,"partial_entry") and not(ancestor::*[@class="mgrRspnInLine"])]
仅当元素祖先没有类名"partial_entry"
时,此元素才会与类名"mgrRspnInLine"
匹配
答案 1 :(得分:0)
如果要第一次出现,可以使用css类选择器代替:
.partial_entry
并使用find_element_by_css_selector
进行检索:
find_element_by_css_selector(".partial_entry")
答案 2 :(得分:0)
您可以使用以下方法删除所有.mgrRspnInLine
元素:
driver.execute_script("[...document.querySelectorAll('.mgrRspnInLine')].map(el => el.parentNode.removeChild(el))")
答案 3 :(得分:0)
拼接Andersson的评论以及QHarr和pguardiario提供的两个答案。我终于解决了这个问题。
关键是针对容器内的一个容器,所有信息都包含在类名“ ui_column is-9”中,该类名包含在类名“ review-container”中,因此解决了Andersson对多个{{ 1}}。
在嵌套循环中,我使用了pguardianrio的建议来删除现有的多个mgrRspnInLine
,然后在.partial_entry上添加QHarr的答案
mgrRspnInLine