Abonnenten = ["https://www.instagram.com/therock/",
"https://www.instagram.com/selenagomez/",
"https://www.instagram.com/wizkhalifa/",
"https://www.instagram.com/kanyewest/",
"https://www.instagram.com/lilmosey/"]
Here I have some Instagram users and their URLs, and then I run a for loop over them.
for i in range(len(Abonnenten)):
    driver.implicitly_wait(5)  # i made a wait so my browser can catch up
    driver.get(Abonnenten[i])  # that is what i thought would be correct
    # get the text from their instagram bio
    wait = WebDriverWait(driver, 10)
    bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text
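If I start the program, it runs until it reaches this part of the code. It takes the first URL, waits 3 seconds or so, then loads the second URL and stays on it. Then I get this error: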
"Traceback (most recent call last):
File "C:/Users/xxx/PycharmProjects/Website_Instagram_Browser_Scrap/main.py", line 58, in <module>
bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text
File "C:\Users\xxx\PycharmProjects\Website_Instagram_Browser_Scrap\venv\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: "
I thought adding a wait above the "bio = ..." line would help, but it changed nothing.
Answer 0 (score: 0)
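Looking at your URLs, the error appears to occur on Wiz Khalifa's Instagram (https://www.instagram.com/wizkhalifa/), because he does not have a bio. That causes the TimeoutException: with no bio, the element we are searching for simply does not exist, so the explicit wait runs out after 10 seconds. We can add a check for whether the user has a bio and, if they do not, just move on to the next URL: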
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# move implicit wait outside of loop, we only need to set it once
driver.implicitly_wait(5)  # i made a wait so my browser can catch up
for i in range(len(Abonnenten)):
    driver.get(Abonnenten[i])  # that is what i thought would be correct
    # get the text from their instagram bio
    try:
        wait = WebDriverWait(driver, 10)
        bio = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='-vDIg']/span"))).text
    # case: the user does not have a bio, so just move on to the next one
    except TimeoutException:
        continue
I also moved your implicitly_wait statement outside the for loop, because we do not need to set it more than once.
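If you want to keep track of which profiles had no bio instead of silently skipping them, the same try/except idea can be pulled into a small helper. This is only a sketch, not part of the original code: collect_bios, read_bio and missing_error are hypothetical names, and the helper is written without any Selenium imports so it can be tried on its own.

```python
from typing import Callable, Dict, Iterable, Optional, Type


def collect_bios(
    urls: Iterable[str],
    read_bio: Callable[[str], str],
    missing_error: Type[Exception],
) -> Dict[str, Optional[str]]:
    """Map each URL to its bio text, or to None when read_bio raises missing_error.

    All three parameter names are hypothetical; read_bio is whatever
    function loads the page and returns the bio text.
    """
    bios: Dict[str, Optional[str]] = {}
    for url in urls:
        try:
            bios[url] = read_bio(url)
        except missing_error:
            # No bio was found for this profile: record the gap and keep going.
            bios[url] = None
    return bios
```

With Selenium, read_bio would presumably be a function that calls driver.get(url) and returns the .text of the wait.until(...) result from the loop above, and missing_error would be TimeoutException. Any other exception still propagates, so real failures are not hidden.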