Question

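I am trying to save part of a web page as an HTML file.

I can do this manually like this:

In Chrome or Mozilla I press F12 (developer tools), use the selector to pick the part of the site I want, see the div, and copy its XPath. Then I copy that element's HTML, paste it into the Notepad editor, and save it as HTML.

I have used Selenium IDE before, but I cannot find a way to get the contents of that div from its XPath.

Is there a way to do this by combining Selenium IDE with JavaScript or Python?

Maybe someone can suggest how I could achieve this.

Thanks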

Answer 1

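Do you mean Selenium IDE or Selenium + Python? My answer is for Selenium IDE (no Python needed).

The storeText command does what you need:

The storeText command in the Selenium IDE software testing tool is useful for storing the text value of a page element in a variable for later use. It is therefore the recommended command for web scraping information from HTML text and tables.

Note that for input boxes, select boxes, checkboxes, radio buttons, and text areas, the text you see is technically the field value. So storeText by design does not work for these elements and returns "". Use storeValue instead to extract the text from input elements.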
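For readers who use Selenium with Python rather than Selenium IDE, the same text-versus-value distinction can be sketched roughly as below. This is an illustration, not part of Selenium IDE: `read_visible_text` is a hypothetical helper name, and it only assumes that the object passed in behaves like a Selenium WebElement (`tag_name`, `text`, `get_attribute`).

```python
def read_visible_text(element):
    """Return what a user sees for a WebElement-like object.

    Hypothetical helper: form fields hold their content in the
    "value" attribute, other elements hold it in their text.
    """
    form_tags = {"input", "textarea", "select"}
    if element.tag_name.lower() in form_tags:
        # Counterpart of storeValue: read the field value
        return element.get_attribute("value") or ""
    # Counterpart of storeText: read the rendered text
    return element.text
```

With a real driver, the element would come from something like `driver.find_element("xpath", "...")`, with the XPath left for you to fill in.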

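In general, see web scraping with Selenium IDE for all the available options.

Saving to a file:

This is not possible in the regular Selenium IDE, but the ui.vision Selenium IDE++ has additional commands that can do it:

csvSave: intended for creating a CSV file with data, but of course you can also use it with a single value:

storeText | xpath=... | var1 (extracts the value into var1)
store | ${var1} | !csvLine (adds the value of var1 to the current CSV line)
csvSave | filename (writes the current CSV line to disk)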
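If you are scripting with Python instead of ui.vision, the three steps above (extract a value, put it on a CSV line, write the line to disk) might look roughly like this. It is only a sketch using the standard `csv` module. `append_value_to_csv` is a made-up name, and this is not how csvSave itself is implemented.

```python
import csv
from pathlib import Path


def append_value_to_csv(value, filename):
    """Append one extracted value as a single-column CSV row."""
    # newline="" lets the csv module control line endings itself
    with Path(filename).open("a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([value])
```

The `value` argument would typically be the text read from the element, for example `element.text` in Selenium for Python.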

If you want to save the complete web page, another option is to simulate CTRL+S with XType | ${KEY_CTRL+KEY_S}

Answer 2

This is just a general Selenium example, not a specific answer to your question.

from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
import random

# Fixed seed; only matters for the random choices in the commented-out block below
seed = 1
random.seed(seed)

driver = webdriver.Chrome()
driver.get("https://www.myntra.com/")

# Recent Selenium 4 releases no longer have find_element_by_xpath;
# use find_element(By.XPATH, ...) instead
element = driver.find_element(By.XPATH, "//*[@id='desktop-header-cnt']/div[2]/div[3]/input")

# Type "pantaloons" into the search box and press Enter
element.send_keys("pantaloons")
element.send_keys(Keys.RETURN)

time.sleep(3)  # crude wait for the results page to load
for i in range(1000):
    time.sleep(1)
    # Scroll down by pressing the down arrow 120 times
    for _ in range(120):
        actions = ActionChains(driver)
        actions.send_keys(Keys.ARROW_DOWN)
        actions.perform()
        time.sleep(0.10)

    # Click the link inside the 12th list item, then give the page a moment to update
    element = driver.find_element(By.XPATH, "//*[@id='desktopSearchResults']/div[2]/section/div[2]/ul/li[12]/a")
    element.click()
    time.sleep(1)




#
#
# # Get a list of elements (videos) that get returned by the search
# search_results = driver.find_elements_by_id("video-title")
#
# # Click randomly on one of the first five results
# search_results[random.randint(0,10)].click()
#
# # Go to the end of the page (I don't know if this is necessary
#
# #
# time.sleep(4)
#
# # Get the recommended videos the same way as above. This is where the problem starts, because recommended_videos essentially becomes the same thing as the previous page's search_results, even though the browser is in a new page now.
# while True:
#     recommended_videos = driver.find_elements_by_xpath("//*[@id='dismissable']/div/a")
#     print(recommended_videos)
#     recommended_videos[random.randint(1,4)].click()
#     time.sleep(4)

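You can try dumping the page source and parsing it, or dump only the element's source.

Page source into a pageSource variable (Java):

String pageSource = driver.getPageSource();

Element source into an elementSource variable (Java):

WebElement element = driver.findElement(By.id("id"));
String elementSource = element.getAttribute("innerHTML");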
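
Since the question asks about Python and an XPath, here is a minimal Python sketch of the same idea as the Java snippets above: locate the element, read its HTML, and write it to a file. `save_element_html` is a hypothetical helper name, and the URL and XPath in the usage comment are placeholders. It assumes a Selenium 4 style driver whose `find_element` accepts the locator strategy `"xpath"` (the value of `By.XPATH`).

```python
from pathlib import Path

XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH


def save_element_html(driver, xpath, out_path, inner=False):
    """Write the HTML of the element matched by `xpath` to `out_path`.

    outerHTML includes the element's own tag (e.g. the <div> itself);
    pass inner=True to save only its contents, as in the Java example.
    """
    element = driver.find_element(XPATH, xpath)
    attribute = "innerHTML" if inner else "outerHTML"
    html = element.get_attribute(attribute) or ""
    Path(out_path).write_text(html, encoding="utf-8")
    return html


# Example usage (needs Selenium 4 and a browser driver installed):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://example.com/")          # placeholder URL
#   save_element_html(driver, "//div[@id='content']", "part.html")
#   driver.quit()
```

The saved file contains only that fragment, so stylesheets and scripts referenced by the original page are not included and the result may render differently from the live page.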
