Question

I am trying to get the video URL from a link on this page. The video can be seen at https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html (open it in Chrome).

For this I wrote the following code using the Chrome web driver:

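import os

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from pyvirtualdisplay import Display

# Path to the chromedriver binary
chromedriver = '/usr/local/bin/chromedriver'
os.environ['webdriver.chrome.driver'] = chromedriver
# Start a virtual display so Chrome can run on a machine without a screen
display = Display(visible=0, size=(800,600))
display.start()
driver = webdriver.Chrome(chromedriver)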

        # Excerpt from a method: `self` and `item` are defined outside this excerpt
        driver.get('https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html')
        try:
            # Wait up to 20 seconds for the player container to appear
            element = WebDriverWait(driver, 20).until(lambda driver: driver.find_elements_by_class_name('yvp-main'))
            self.yahoo_video_trend = []
            for s in driver.find_elements_by_class_name('yvp-main'):
                print "Processing link  - ", item['link']
                trend = item
                print item['description']
                trend['video_link'] = s.find_element_by_tag_name('video').get_attribute('src')
                print
                print s.find_element_by_tag_name('video').get_attribute('src')
                self.yahoo_video_trend.append(trend)
        except:
            # Any error is silently swallowed here
            return

This works fine on my local system, but when I run it on my Azure server, s.find_element_by_tag_name('video').get_attribute('src') does not return any result.

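I have installed Chrome on my Azure server.

Update:

Note that I have already tried requests and BeautifulSoup, but since Yahoo loads the HTML content dynamically from JSON, I cannot use them to get it.

And yes, the Azure server is a plain Linux system with command-line access only. It has no applications installed.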

Answer 1

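I tried to reproduce your issue using your code. However, I found that there is no tag named video on that page ('https://in.news.yahoo.com/video/jaguar-fighter-aircraft-crashes-near-084300217.html'), tested with both IE and Chrome. I inspected the HTML with the developer tools, as shown in the picture below: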

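[Screenshot: developer tools view of the page's HTML]

It seems that this page uses a Flash player to play the video rather than the HTML5 video control. So I suggest you check whether your code uses the correct tag name. If you have any concerns, please feel free to let me know.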
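One way to run that check on both machines is to save driver.page_source and count the player-related tags in it. The following is only a minimal sketch, assuming Python 3 and its standard html.parser module; count_tags and TagCounter are hypothetical helper names, not part of Selenium, and the sample markup is made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag seen in the HTML fed to it (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag names
        self.counts[tag] += 1


def count_tags(html, names=("video", "object", "embed")):
    """Return how often each tag in `names` occurs in the given HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {name: parser.counts[name] for name in names}


# Made-up sample markup resembling a Flash embed (not taken from the real page)
sample = '<div class="yvp-main"><object data="p.swf"><embed src="p.swf"></object></div>'
result = count_tags(sample)
print(result)
```

In the real script you would pass driver.page_source instead of the sample string. If video is 0 on the server but not locally, the two machines are being served different players.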

Answer 2

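We tried to reproduce the error on our side. I could not get the Chrome driver to work, but I did try the Firefox driver, and it worked fine. It was able to load the page and get the link from the URL.

Could you change your code to print the exception and send it to us, so that we can see where the script fails?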

In your code, change:

except:
    return

to:

except Exception as e:
    print str(e)

Send us the exception and we can take a look.
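If the one-line message is not enough, capturing the full traceback usually shows which call failed. Here is a rough sketch of that idea, assuming Python 3; safe_call and failing_lookup are hypothetical names used only for illustration, and failing_lookup merely stands in for the find_element_by_tag_name(...) lookup.

```python
import traceback


def safe_call(func):
    """Run func() and return (result, None), or (None, traceback text) if it raises."""
    try:
        return func(), None
    except Exception:
        # Keep the whole traceback instead of silently returning
        return None, traceback.format_exc()


def failing_lookup():
    # Stand-in for the Selenium lookup that fails on the server (hypothetical)
    raise ValueError("no video element found")


value, error = safe_call(failing_lookup)
if error is not None:
    print(error)
```

The printed text includes the exception type, the message, and the line that raised it, which is the information requested above.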

Selenium works locally but not on an Azure server
