Question

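I'm currently learning web scraping. As an exercise, I'm trying to get the list of usernames of the people who liked a post. So far I've written a script that works for short lists of usernames (fewer than 100), but it's still unreliable, I think because the platform is detecting the automation...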

Here is my code (for the post 'https://www.instagram.com/p/BuT_u-UAKn1/'):

import time

from selenium.webdriver.common.by import By

# `driver` is an already-created WebDriver that has loaded the post URL above.

# Open the likers' list (clicking returns None, so there is nothing to store)
driver.find_element(By.XPATH, '//*[@id="react-root"]/section/main/div/div/article/div[2]/section[2]/div/div/a').click()
time.sleep(2)

# Get the number of likes on the post (the counter text may contain spaces)
likes = driver.find_element(By.XPATH, '/html/body/span/section/main/div/div/article/div[2]/section[2]/div/div/a/span').text
int_likes = int(likes.replace(' ', ''))
print(int_likes)

# Collect the usernames shown in the list
users = []

height = driver.find_element(By.XPATH, '/html/body/div[2]/div/div[2]/div/div').value_of_css_property('padding-top')
match = False
while not match:
    last_height = height

    # Step 1: find the user links currently rendered in the list
    elements = driver.find_elements(By.XPATH, '//*[@id]/div/a')

    # Step 2: record usernames not seen yet (read the attribute once per element)
    for element in elements:
        title = element.get_attribute('title')
        if title not in users:
            users.append(title)

    # Step 3: scroll the last rendered entry into view so more entries load
    driver.execute_script('arguments[0].scrollIntoView();', elements[-1])
    time.sleep(0.1)

    # Step 4: stop once every liker is collected and the list has stopped growing
    height = driver.find_element(By.XPATH, '/html/body/div[2]/div/div[2]/div/div').value_of_css_property('padding-top')
    if int_likes == len(users) and last_height == height:
        match = True

print(users)
print(len(users))
driver.quit()
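The like count is parsed by removing ordinary spaces only, so a counter rendered as "1,234", or one that uses a non-breaking space as the thousands separator, would make int() raise ValueError. A minimal sketch of a more tolerant parser, assuming the counter shows the full number (abbreviated forms such as "12.5k" are not handled); parse_like_count is a hypothetical helper name:

```python
import re


def parse_like_count(text):
    """Convert a rendered like counter such as '1 234' or '1,234' to an int.

    Assumes the full number is displayed; abbreviations like '12.5k'
    would be misread, because every non-digit character is dropped.
    """
    digits = re.sub(r'\D', '', text)  # drop spaces, NBSPs, commas, dots, words
    if not digits:
        raise ValueError(f'no digits in like counter: {text!r}')
    return int(digits)
```

With it, the two parsing lines in the snippet would become int_likes = parse_like_count(likes).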

The problem is that I get this error:

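StaleElementReferenceException: Message: The element reference of  is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed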
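This error usually means that an element returned by find_elements was removed or replaced in the DOM between the lookup and a later get_attribute (or scrollIntoView) call. The changing padding-top that the loop watches suggests the likers' list is virtualized, i.e. rows scrolled out of view are dropped from the DOM, which would explain it; that is an assumption about the page, not something the error itself confirms. A minimal sketch of reading the titles so that a stale row is skipped instead of aborting the whole loop (read_titles is a hypothetical helper name):

```python
try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Stand-in only so this sketch can be exercised without Selenium installed.
    class StaleElementReferenceException(Exception):
        pass


def read_titles(elements):
    """Return the non-empty 'title' attribute of each element, skipping stale ones."""
    titles = []
    for element in elements:
        try:
            title = element.get_attribute('title')  # read once, not twice
        except StaleElementReferenceException:
            # The row's DOM node is gone (presumably recycled by the list); skip it.
            continue
        if title:
            titles.append(title)
    return titles
```

Skipping is reasonably safe here because a row that is still rendered is found again by the next find_elements pass. The scrollIntoView call on elements[-1] can raise the same exception; catching it there and simply re-running the lookup on the next iteration is the analogous fix.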

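So I'd like to add a condition that says: "when the loop stops loading and the length of the user list is not equal to int_likes, break, wait a few seconds, scroll up a bit (e.g. 400 pixels), then restart the process until the requirement is met."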
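That logic can be sketched as a loop that treats "a pass added no new usernames" as the "stopped loading" signal (in place of the padding-top comparison) and, on a stall, waits, scrolls back up by 400 px, then carries on scrolling down. The browser actions are passed in as callbacks so the control flow does not depend on the page's XPaths. collect_likers and its parameters are hypothetical names, and max_rounds is an added safety cap, since nothing guarantees the collected count will ever reach the displayed like count.

```python
import time


def collect_likers(get_titles, scroll_down, scroll_up, expected,
                   max_rounds=200, pause=2.0, back_off_px=400, sleep=time.sleep):
    """Scroll a lazily loaded list until `expected` unique usernames are collected.

    get_titles():  returns the usernames currently rendered
    scroll_down(): makes the list load more entries
    scroll_up(px): scrolls the list back up by `px` pixels
    sleep:         injectable so the loop can be tested without real waiting
    """
    users = []    # unique usernames in first-seen order
    seen = set()  # constant-time membership test instead of `title not in users`
    for _ in range(max_rounds):
        before = len(users)
        for title in get_titles():
            if title not in seen:
                seen.add(title)
                users.append(title)
        if len(users) >= expected:
            break  # requirement met
        if len(users) == before:
            # Nothing new loaded this pass: wait, back off a little, then resume.
            sleep(pause)
            scroll_up(back_off_px)
        scroll_down()
    return users
```

With Selenium, get_titles could wrap steps 1 and 2 of the snippet (find_elements on "//*[@id]/div/a" and reading each title), scroll_down could reuse the scrollIntoView call from step 3, and scroll_up could run driver.execute_script("arguments[0].scrollTop -= arguments[1];", container, px), where container is the dialog's scrollable element (its XPath depends on the current page markup). Testing with >= rather than == also avoids looping forever if new likes arrive while scraping.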

But I'm struggling to turn this logic into actual code... Could you help me with this?

Thanks

Python: scraping with Selenium / nested loops
