I'm trying to create a Python application that pulls all the videos of a YouTube channel.
I'm currently trying to do this with Selenium.
def getVideoTitles():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(googleYoutubePage())
    titleElement = driver.find_element_by_class_name("yt-lockup-content")
    print(titleElement.text)  # it prints out title, + views, hours ago, and "CC"
    # I suck at selenium so let's just store the title and cut everything after it
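The class name yt-lockup-content is the class of each video entry on a YouTube channel's /videos page. With the code above I can get the title of the first YouTube video on that page. But I want to go through all of the YouTube titles (in other words, iterate over every yt-lockup-content element) and store each .text.
What I'd like to know is how to access, say, yt-lockup-content[2]. In other words, the second video on that page, which has the same class name.
Here is my full code. Feel free to play around with it.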
'''
'''
import selenium
from selenium import webdriver

def getChannelName():
    print("Please enter the channel that you would like to scrape video titles...")
    channelName = input()
    googleSearch = "https://www.google.ca/search?q=%s+youtube&oq=%s+youtube&aqs=chrome..69i57j0l5.2898j0j4&sourceid=chrome&ie=UTF-8#q=%s+youtube&*" %(channelName, channelName, channelName)
    print(googleSearch)
    return googleSearch

def googleYoutubePage():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(getChannelName())
    element = driver.find_element_by_class_name("s")  # this is where the link to the proper youtube page lives
    keys = element.text  # this grabs the link to the youtube page + other crap that will be cut
    splitKeys = keys.split(" ")  # this needs to be split, because aside from the link it grabs the page description, which we need to truncate
    linkToPage = splitKeys[0]  # this is where the link lives
    for index, char in enumerate(linkToPage):  # this loops over the link to find where the stuff beside the link begins (which is unnecessary)
        if char == "\n":
            extraCrapStartsHere = index  # it starts here, we know everything beyond here can be cut
    link = ""
    for i in range(extraCrapStartsHere):  # the official link will be everything in the linkToPage up to where we found suitable to cut
        link = link + linkToPage[i]
    videosPage = link + "/videos"
    print(videosPage)
    return videosPage

def getVideoTitles():
    driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
    driver.get(googleYoutubePage())
    titleElement = driver.find_element_by_class_name("yt-lockup-content")
    print(titleElement.text)  # it prints out title, + views, hours ago, and "CC"
    # I suck at selenium so let's just store the title and cut everything after it

def main():
    getVideoTitles()

main()
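Answer 0 (score: 1)
Instead of using driver.find_element_by_class_name, you can use driver.find_elements_by_class_name, which returns a list of all elements with the specified class name.
From there, you can iterate over the list and get the title of each YouTube video.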
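A minimal sketch of that loop, for illustration only. The helper names titles_from_elements and get_video_titles are hypothetical, and it assumes the title is the first line of each element's .text (the question only says the text contains the title followed by views, age and "CC"). The stand-in elements are made up so the parsing can be tried without a browser.

```python
from types import SimpleNamespace


def titles_from_elements(elements):
    """Return the first line of each element's text, skipping empty ones.

    Assumption: the title is the first line of the element's .text.
    """
    titles = []
    for element in elements:
        lines = element.text.strip().splitlines()
        if lines:
            titles.append(lines[0].strip())
    return titles


def get_video_titles(driver, class_name="yt-lockup-content"):
    # The plural find_elements_by_class_name returns a list, so the
    # second video is simply index 1 of that list.
    elements = driver.find_elements_by_class_name(class_name)
    return titles_from_elements(elements)


# Made-up stand-ins for Selenium elements (anything with a .text attribute).
fake_elements = [
    SimpleNamespace(text="First video\n1,234 views\n2 hours ago\nCC"),
    SimpleNamespace(text="Second video\n56 views\n1 day ago"),
]
print(titles_from_elements(fake_elements))  # ['First video', 'Second video']
```

With a real driver, get_video_titles(driver)[1] would then be the title of the second video on the page, provided the first-line assumption holds for the page's markup.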
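Answer 1 (score: 1)
This seems like an overly complicated way to do it. You can navigate directly to the videos page using the URL https://www.youtube.com/user/{channel name}/videos, loop over the titles, and print them.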
from selenium import webdriver

print("Please enter the channel that you would like to scrape video titles...")
channelName = input()
videosUrl = "https://www.youtube.com/user/%s/videos" % channelName
driver = webdriver.Chrome("/Users/{username}/PycharmProjects/YoutubeChannelVideos/chromedriver")
driver.get(videosUrl)
# find_elements_* (plural) returns a list of every matching element
for title in driver.find_elements_by_class_name("yt-uix-tile-link"):
    print(title.text)
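If the channel name typed by the user can contain spaces or other special characters, it may be safer to percent-encode it before putting it in the URL. A small sketch, assuming the /user/{channel name}/videos pattern from this answer. build_videos_url is a hypothetical helper name, and whether that URL pattern resolves for a given channel is not something this code checks.

```python
from urllib.parse import quote


def build_videos_url(channel_name):
    """Build the channel's videos URL, percent-encoding the channel name."""
    # safe="" also encodes "/" so the name cannot add extra path segments.
    encoded = quote(channel_name.strip(), safe="")
    return "https://www.youtube.com/user/%s/videos" % encoded


print(build_videos_url("Some Channel"))
# https://www.youtube.com/user/Some%20Channel/videos
```

The result can be passed straight to driver.get(...) in place of videosUrl.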
Answer 2 (score: 0)
Have you tried driver.find_elements_by_css_selector(".yt-lockup-content")?
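One caveat: the find_elements_by_* helpers used throughout this thread were deprecated in Selenium 4 and removed in later 4.x releases, where the equivalent call is driver.find_elements(By.CSS_SELECTOR, ...). The sketch below is a compatibility shim that tries the old helper first and falls back to the newer call. find_all_by_css and print_titles are hypothetical names, and the .yt-lockup-content selector is taken from the question, so it may not match YouTube's current markup.

```python
def find_all_by_css(driver, selector):
    """Return every element matching a CSS selector, on old or new Selenium."""
    legacy = getattr(driver, "find_elements_by_css_selector", None)
    if callable(legacy):
        # Older Selenium: the helper used in this thread still exists.
        return legacy(selector)
    # Newer Selenium: "css selector" is the value of By.CSS_SELECTOR
    # (from selenium.webdriver.common.by import By).
    return driver.find_elements("css selector", selector)


def print_titles(driver):
    # Iterate over every matching element instead of only the first one.
    for element in find_all_by_css(driver, ".yt-lockup-content"):
        print(element.text)
```

In code that only targets a recent Selenium, calling driver.find_elements(By.CSS_SELECTOR, ".yt-lockup-content") directly is the simpler choice.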