Selenium - How to get information from elements that share the same class name

Date: 2017-03-02 03:50:41

Tags: python python-3.x selenium video youtube

I'm trying to build a Python application that collects the titles of all the videos on a YouTube channel.

I'm currently trying to do this with Selenium.

def getVideoTitles():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(googleYoutubePage())

    titleElement = driver.find_element_by_class_name("yt-lockup-content")
    print(titleElement.text) #it prints out title, + views, hours ago, and "CC"
     #I suck at selenium so lets just store the title and cut everything after it

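yt-lockup-content is the class name of every video entry on a YouTube channel's videos page. With the code above I can get the title of the first video on that page, but I want to iterate over every yt-lockup-content element and store its .text.

What I can't figure out is how to access, say, yt-lockup-content[2], i.e. the second video on the page, which has the same class name.

Here is my full code. Feel free to play around with it.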

'''

'''
import selenium
from selenium import webdriver

def getChannelName():
    print("Please enter the channel that you would like to scrape video titles...")
    channelName = input()
    googleSearch = "https://www.google.ca/search?q=%s+youtube&oq=%s+youtube&aqs=chrome..69i57j0l5.2898j0j4&sourceid=chrome&ie=UTF-8#q=%s+youtube&*" %(channelName, channelName, channelName)
    print(googleSearch)
    return googleSearch

def googleYoutubePage():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(getChannelName())
    element = driver.find_element_by_class_name("s") #this is where the link to the proper youtube page lives
    keys = element.text #this grabs the link to the youtube page + other crap that will be cut

    splitKeys = keys.split(" ") #this needs to be split, because aside from the link it grabs the page description, which we need to truncate
    linkToPage = splitKeys[0] #this is where the link lives

    for index, char in enumerate(linkToPage): #this loops over the link to find where the stuff beside the link begins (which is unecessary)
        if char == "\n":
            extraCrapStartsHere = index #it starts here, we know everything beyond here can be cut


    link = ""
    for i in range(extraCrapStartsHere): #the offical link will be everything in the linkToPage up to where we found suitable to cut
        link = link + linkToPage[i]

    videosPage = link + "/videos"
    print(videosPage)
    return videosPage

def getVideoTitles():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(googleYoutubePage())

    titleElement = driver.find_element_by_class_name("yt-lockup-content")
    print(titleElement.text) #it prints out title, + views, hours ago, and "CC"
                            #I suck at selenium so lets just store the title and cut everything after it


def main():
    getVideoTitles()

main()

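3 answers:

Answer 0 (score: 1)

Instead of driver.find_element_by_class_name, use driver.find_elements_by_class_name, which returns a list of all elements with the specified class name.

From there, you can iterate over the list and get the title of each YouTube video.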
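The approach in this answer can be sketched as follows. This is a minimal sketch, not the answerer's own code: the helper names `first_lines` and `get_video_titles` are hypothetical, and it assumes the title is the first non-empty line of each element's text (the question reports that `.text` prints the title followed by views, age, and "CC").

```python
def first_lines(elements):
    """Return the first non-empty line of each element's .text.

    Assumption: the video title is the first line of the text block.
    """
    titles = []
    for element in elements:
        lines = [line.strip() for line in element.text.splitlines() if line.strip()]
        if lines:
            titles.append(lines[0])
    return titles


def get_video_titles(driver, class_name="yt-lockup-content"):
    """Collect one title per element that carries the given class name."""
    # find_elements_by_class_name is the method named in the answer above
    # (Selenium 3 era API). It returns a plain Python list.
    elements = driver.find_elements_by_class_name(class_name)
    return first_lines(elements)
```

Because the result is an ordinary list, indexing is zero-based: `get_video_titles(driver)[1]` would be the second video on the page, which is what the question calls yt-lockup-content[2].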

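Answer 1 (score: 1)

This seems like an overly complicated way of doing it. You can navigate straight to the videos page with the URL https://www.youtube.com/user/{channel name}/videos, loop over the titles, and print them.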

from selenium import webdriver

print("Please enter the channel that you would like to scrape video titles...")
channelName = input()
videosUrl = "https://www.youtube.com/user/%s/videos" % channelName
driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
driver.get(videosUrl)
for title in driver.find_elements_by_class_name("yt-uix-tile-link"):
    print(title.text)
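Newer Selenium 4 releases no longer ship the `find_elements_by_*` helpers. The equivalent call there is `driver.find_elements(By.CLASS_NAME, ...)`. The following is a hedged sketch of the same idea under that API. The function names `build_videos_url` and `fetch_titles` are hypothetical, the URL pattern and the class name `yt-uix-tile-link` are taken from this 2017 answer and may not match the current site, and `webdriver.Chrome()` without arguments assumes a chromedriver that Selenium can locate on its own.

```python
from urllib.parse import quote


def build_videos_url(channel_name):
    """Build the videos-page URL used in the answer, escaping the user input."""
    return "https://www.youtube.com/user/%s/videos" % quote(channel_name.strip(), safe="")


def fetch_titles(channel_name, class_name="yt-uix-tile-link"):
    """Open the videos page and return the text of every matching element."""
    # Imported lazily so that build_videos_url works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: chromedriver is discoverable
    try:
        driver.get(build_videos_url(channel_name))
        return [link.text for link in driver.find_elements(By.CLASS_NAME, class_name)]
    finally:
        driver.quit()  # always close the browser, even if the lookup fails
```

Escaping the channel name with `quote` is a small addition over the answer's code; it keeps spaces or slashes in the typed name from producing a malformed URL.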

Answer 2 (score: 0)

Have you tried driver.find_elements_by_css_selector(".yt-lockup-content")?
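A CSS selector addresses a class by prefixing its name with a dot, and several classes on the same element can be chained (`.a.b`), which a plain class-name lookup does not accept. The sketch below illustrates that under stated assumptions: `class_selector` and `texts_by_class` are hypothetical helpers, and `find_elements_by_css_selector` is the Selenium 3 method named in this answer.

```python
def class_selector(class_attr):
    """Turn a class attribute value such as "a b" into the CSS selector ".a.b"."""
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    return "".join("." + name for name in names)


def texts_by_class(driver, class_attr):
    """Return the .text of every element matching all of the given classes."""
    selector = class_selector(class_attr)
    # Selenium 3 method, as written in the answer. On Selenium 4 the
    # equivalent is driver.find_elements(By.CSS_SELECTOR, selector).
    return [element.text for element in driver.find_elements_by_css_selector(selector)]
```

For the class in the question, `class_selector("yt-lockup-content")` yields `.yt-lockup-content`, the same selector string the answer passes in directly.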