Question

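I have two Python scripts:
Script 1: checks elements on a web page and writes them to a file.
Script 2: reads from this file and uses the contents as arguments to an if statement.
This is the part I'm unsure about.

The text file has at least 500 items, each on its own line, and I want to check whether those items still exist when I revisit the site.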

    def read_input_file(self):
        inFile = open("page_items.txt", "r")
        if inFile == current_content:
            do.stuff

What is the best way to do this?

Answer 1

Scrape the site again with the first script and store the result in a set. Then use .issubset to check whether everything in 'inFile' is contained in current_site.

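    current_site = set(scraped_items)
    # Strip the trailing newline from each line before comparing.
    saved_items = {line.strip() for line in inFile}
    if saved_items.issubset(current_site):
        do.stuff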
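If you also want to know which items disappeared, not just whether all of them are still there, a set difference gives you that. Here is a minimal sketch; normalize_items and missing_items are names I made up for illustration, and the sample lists stand in for the real file and the real scrape.

```python
def normalize_items(lines):
    """Strip whitespace/newlines from each line and drop blank lines."""
    return {line.strip() for line in lines if line.strip()}


def missing_items(saved_lines, scraped_items):
    """Return saved items that no longer appear in the fresh scrape."""
    return normalize_items(saved_lines) - normalize_items(scraped_items)


# Usage sketch: in practice saved_lines would be open("page_items.txt")
# and scraped_items the result of re-running the first script.
saved_lines = ["item one\n", "item two\n", "item three\n"]
scraped_items = ["item one", "item three", "item four"]
print(sorted(missing_items(saved_lines, scraped_items)))  # → ['item two']
```

An empty result means every saved item is still on the page, which is the same condition .issubset checks.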

Answer 2

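It turns out that a set wasn't really what I was looking for, mainly because the scraped content needs to survive a restart. So a text file was the only option I could think of.

I did find a solution, though. Instead of scraping current_site and matching it against infile, I now start from infile and use Selenium to search current_site for each line.

This is what I came up with. It isn't very clean, but maybe it will be useful to someone in the future.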

    import linecache
    import time

    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    # `driver` is assumed to be an already initialised Selenium WebDriver
    # with the target page loaded.
    for i in range(1, 201):  # linecache line numbers start at 1
        scraped_content = linecache.getline('scraped.txt', i).rstrip()
        search_path = "//*[contains(text(),'" + scraped_content + "')]"

        # Page down so that content further down the page gets loaded.
        scroll_down = driver.find_element(By.TAG_NAME, 'a')
        scroll_down.send_keys(Keys.PAGE_DOWN)
        scroll_to_element = None
        while not scroll_to_element:
            try:
                scroll_to_element = driver.find_element(By.XPATH, search_path)
                time.sleep(1)
            except NoSuchElementException:
                print("Searching for Content:", scraped_content)
                break

        if scroll_to_element is not None:
            print(scraped_content, "Found!")
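One thing to watch with the XPath above: it wraps each line in single quotes, so a line that itself contains an apostrophe would probably produce an invalid expression. A possible workaround is to build the string literal with a small helper. This is only a sketch; xpath_literal and contains_text_xpath are hypothetical names, not part of Selenium.

```python
def xpath_literal(text):
    """Return `text` as an XPath 1.0 string literal, handling quotes."""
    if "'" not in text:
        return "'" + text + "'"
    if '"' not in text:
        return '"' + text + '"'
    # Both quote types present: split on ' and glue the parts with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'" + p + "'" for p in parts) + ")"


def contains_text_xpath(text):
    """Build the same contains(text(), ...) search, with safe quoting."""
    return "//*[contains(text(), " + xpath_literal(text) + ")]"


print(contains_text_xpath("Bob's item"))  # → //*[contains(text(), "Bob's item")]
```

The returned string can be passed to the XPath lookup in place of the hand-built one.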

Checking arguments in a text file with Python
