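I'm trying to scrape data from YouTube using Selenium with Python. The fields I'm scraping are subscribers, location, joined date and views. Even though I'm using the correct XPath, I get this error: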
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="subscriber-count"]"}
(Session info: chrome=90.0.4430.93)
I've tried CSS selectors, the full XPath, the id and the class name, but none of them worked; they all return the same error as above.
This is how I wrote it in my Python script:
youtube_subscribers = browser.find_element_by_xpath('//*[@id="subscriber-count"]').text
youtube_location = browser.find_element_by_xpath('//*[@id="details-container"]/table/tbody/tr[2]/td[2]/yt-formatted-string').text
youtube_joined_on = browser.find_element_by_xpath('//*[@id="right-column"]/yt-formatted-string[2]/span[2]').text
youtube_views = browser.find_element_by_xpath('//*[@id="right-column"]/yt-formatted-string[3]').text
print('Youtube Subscribers:', youtube_subscribers)
print('Youtube Location:', youtube_location)
print('Youtube Joined on:', youtube_joined_on)
print('Youtube views:', youtube_views)
I'm scraping from https://www.youtube.com/c/adidas/about. Where exactly am I going wrong?
Please help!
Edit: here is the complete code.
import time

import numpy as np
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

website = ['https://www.pinterest.com/adidas/', 'https://www.pinterest.com/nike/', 'https://www.pinterest.com/puma/']
options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
browser = webdriver.Chrome(ChromeDriverManager().install(), options=options)
delays = [7, 4, 6, 2, 10, 19]
delay = np.random.choice(delays)
for crawler in website:
    browser.get(crawler)
    time.sleep(2)
    time.sleep(delay)
    pinterest_brand_name = browser.find_element_by_xpath('/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/h1').text
    pinterest_followers = browser.find_element_by_xpath('/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/div[2]/div/span[1]').text
    pinterest_following = browser.find_element_by_xpath('/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/div[2]/div/div[1]/span[1]').text
    youtube_subscribers = browser.find_element_by_xpath('//*[@id="subscriber-count"]').text
    youtube_location = browser.find_element_by_xpath('//*[@id="details-container"]/table/tbody/tr[2]/td[2]/yt-formatted-string').text
    youtube_joined_on = browser.find_element_by_xpath('//*[@id="right-column"]/yt-formatted-string[2]/span[2]').text
    youtube_views = browser.find_element_by_xpath('//*[@id="right-column"]/yt-formatted-string[3]').text
    print('Pinterest Brand Name:', pinterest_brand_name)
    print('Pinterest Followers:', pinterest_followers)
    print('Pinterest Following:', pinterest_following)
    print('Youtube Subscribers:', youtube_subscribers)
    print('Youtube Location:', youtube_location)
    print('Youtube Joined on:', youtube_joined_on)
    print('Youtube views:', youtube_views)
Answer (score: 1)
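I can't see the rest of your code, but assuming all of these elements are on the same page and it's failing on the first one, you may just need to add a wait so that the first element is visible before you read it.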
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# Wait up to 10 seconds for the element to become visible, then read its text
youtube_subscribers = WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="subscriber-count"]'))).text
Assuming all the elements you want are on that page, you should only need to wait for that first one. Play around with it and see what works.
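To make the idea of an explicit wait concrete, here is a simplified pure-Python sketch of the polling loop that `WebDriverWait(browser, 10).until(...)` performs: Selenium re-checks the condition every 0.5 s by default, treats `NoSuchElementException` as "not ready yet", and raises `TimeoutException` once the timeout expires. This is not Selenium's own code. `wait_until`, `WaitTimeout` and the fake finder are hypothetical names, and `LookupError` stands in for `NoSuchElementException` so the sketch runs without a browser.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when the condition never returned a truthy value in time."""


def wait_until(
    condition: Callable[[], T],
    timeout: float = 10.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (LookupError,),
) -> T:
    """Call `condition` repeatedly until it returns something truthy.

    Exceptions listed in `ignored` (standing in for NoSuchElementException)
    mean "not rendered yet" and trigger another poll instead of failing.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # element not in the DOM yet; try again on the next poll
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


if __name__ == "__main__":
    # Simulated page: the element only "appears" on the third lookup.
    calls = {"n": 0}

    def fake_find_subscriber_count() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise LookupError("no such element")
        return "925K subscribers"

    print(wait_until(fake_find_subscriber_count, timeout=5, poll=0.01))
```

A plain `find_element` call is equivalent to a single iteration of this loop, which is why it fails immediately when the page's JavaScript has not rendered the element yet.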
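Updated code
The code below should work. In the code from the question, `website` contains only Pinterest URLs, so the YouTube XPaths were being searched for on Pinterest pages, which is why every selector raised NoSuchElementException. The version below adds the YouTube About URL to the list and splits the loop into two branches, so it never looks for YouTube elements on a Pinterest page, or vice versa...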
from time import sleep

website = ['https://www.pinterest.com/adidas/', 'https://www.pinterest.com/nike/', 'https://www.pinterest.com/puma/',
           'https://www.youtube.com/c/adidas/about']
for crawler in website:
    if "pinterest" in crawler:
        browser.get(crawler)
        sleep(3)
        #time.sleep(delay)
        pinterest_brand_name = browser.find_element_by_xpath('/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/h1').text
        pinterest_followers = browser.find_element_by_xpath('/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/div[2]/div/span[1]').text
        pinterest_following = browser.find_element_by_xpath('/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/div[2]/div/div[1]/span[1]').text
        print('Pinterest Brand Name:', pinterest_brand_name)
        print('Pinterest Followers:', pinterest_followers)
        print('Pinterest Following:', pinterest_following)
    elif "youtube" in crawler:
        browser.get(crawler)
        sleep(3)
        #time.sleep(delay)
        youtube_subscribers = browser.find_element_by_xpath('//*[@id="subscriber-count"]').text
        youtube_location = browser.find_element_by_xpath('//*[@id="details-container"]/table/tbody/tr[2]/td[2]/yt-formatted-string').text
        youtube_joined_on = browser.find_element_by_xpath('//*[@id="right-column"]/yt-formatted-string[2]/span[2]').text
        youtube_views = browser.find_element_by_xpath('//*[@id="right-column"]/yt-formatted-string[3]').text
        print('Youtube Subscribers:', youtube_subscribers)
        print('Youtube Location:', youtube_location)
        print('Youtube Joined on:', youtube_joined_on)
        print('Youtube views:', youtube_views)
Result:
Pinterest Brand Name: adidas
Pinterest Followers: 623,732
Pinterest Following: 9
Pinterest Brand Name: Nike
Pinterest Followers: 765,138
Pinterest Following: 3
Pinterest Brand Name: PUMA
Pinterest Followers: 88,759
Pinterest Following: 280
Youtube Subscribers: 925K subscribers
Youtube Location: United States
Youtube Joined on: Oct 29, 2005
Youtube views: 168,725,975 views
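The if/elif split can also be expressed as a lookup table keyed by site, so adding another site means adding locators rather than another branch. A minimal sketch, assuming a hypothetical `find_text(xpath)` callback; with Selenium it could be `lambda xp: browser.find_element(By.XPATH, xp).text`, called after `browser.get(url)`. `LOCATORS`, `site_of` and `scrape` are illustrative names, and the XPaths are the ones used in the answer's code, which depend on each site's markup at the time and may stop matching when the pages change.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# XPaths copied from the answer's code; they are tied to each site's markup
# at the time of writing and will break if Pinterest or YouTube change layout.
LOCATORS: Dict[str, Dict[str, str]] = {
    "pinterest": {
        "Pinterest Brand Name": '/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/h1',
        "Pinterest Followers": '/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/div[2]/div/span[1]',
        "Pinterest Following": '/html/body/div[1]/div[1]/div/div/div[1]/div[1]/div[2]/div/div/div/div[2]/div/div/div[2]/div/div[1]/span[1]',
    },
    "youtube": {
        "Youtube Subscribers": '//*[@id="subscriber-count"]',
        "Youtube Location": '//*[@id="details-container"]/table/tbody/tr[2]/td[2]/yt-formatted-string',
        "Youtube Joined on": '//*[@id="right-column"]/yt-formatted-string[2]/span[2]',
        "Youtube views": '//*[@id="right-column"]/yt-formatted-string[3]',
    },
}


def site_of(url: str) -> str:
    """Map a URL to a key of LOCATORS using its host name (e.g. www.youtube.com)."""
    host = urlparse(url).hostname or ""
    for site in LOCATORS:
        if host == f"{site}.com" or host.endswith(f".{site}.com"):
            return site
    raise ValueError(f"no locators defined for {url}")


def scrape(url: str, find_text: Callable[[str], str]) -> Dict[str, str]:
    """Read only the fields that belong to the site `url` points at."""
    return {label: find_text(xpath) for label, xpath in LOCATORS[site_of(url)].items()}


if __name__ == "__main__":
    # Fake finder so the sketch runs without a browser: it records which
    # XPaths were requested and returns a placeholder string.
    requested = []

    def fake_find_text(xpath: str) -> str:
        requested.append(xpath)
        return "<text>"

    for url in ["https://www.pinterest.com/adidas/", "https://www.youtube.com/c/adidas/about"]:
        requested.clear()
        fields = scrape(url, fake_find_text)
        print(site_of(url), sorted(fields), len(requested))
```

Because `scrape` only requests the XPaths registered for the URL's own site, a YouTube locator is never searched for on a Pinterest page, which is the failure mode from the question.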