Downloading a video

Time: 2015-12-17 21:59:25

Tags: python video web-scraping

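I want to get a copy of the video here:

http://www.sirecam.com/sales/Tattersalls_October_Yearling_Sale/2009

Using Inspect Element, I found that the video should be at: http://www.sirecam.com/2009/TOYS/TOYS2009_2.flv

But every time I go to this link I get a 404 error.

I plan to download this video with a small Python script using urlretrieve, but I can't seem to find the link to retrieve it from.

Any help on how to find the video would be a great learning experience.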

1 answer:

Answer 0 (score: 2)

How I found the real URL using Chrome:

  1. Load http://www.sirecam.com/sales/Tattersalls_October_Yearling_Sale/2009 and open Developer Tools.
  2. Select the "Network" tab.
  3. Click play on the video and watch which resources get loaded. Pick out the interesting one and look at the URL it was loaded from.

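Finally, repeat these steps on another video to see whether the video is loaded from the same prefix/domain (http://cdn.sirecam.com/). If it is, just scrape the video path, prepend the prefix, and download it. If it isn't, you'll need to dig further.

Digging further:

In the page source, inside <param name="flashvars"..., you can see the player configuration: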
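A minimal Python sketch of the "same prefix" case, assuming the URLs you copied from the Network tab are plain HTTP URLs. The helper names (`same_host`, `build_video_url`, `download_video`) are hypothetical, and `http://cdn.sirecam.com` is only the prefix observed at the time; it may no longer serve anything.

```python
import urllib.request
from urllib.parse import urlsplit


def same_host(urls):
    """True if every URL shares one scheme://host, i.e. one reusable prefix."""
    origins = {(urlsplit(u).scheme, urlsplit(u).netloc) for u in urls}
    return len(origins) == 1


def build_video_url(prefix, path):
    """Join a prefix such as 'http://cdn.sirecam.com' with a scraped path
    such as '/2009/TOYS/TOYS2009_2.flv', avoiding doubled or missing slashes."""
    return prefix.rstrip("/") + "/" + path.lstrip("/")


def download_video(prefix, path, dest):
    """Download prefix + path to the local file dest with urlretrieve.

    urlretrieve raises urllib.error.HTTPError on a 404 like the one in the question.
    """
    url = build_video_url(prefix, path)
    urllib.request.urlretrieve(url, dest)
    return url
```

Only if `same_host` holds for a few different videos is the fixed-prefix approach worth relying on.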

        config = {
            "key": "#$a13c066f3e6146a6195",
            "clip": {
                "scaling": "orig",
                "autoPlay": true,
                "urlResolvers": "cluster",
                "bufferLength": 6,
                "autoBuffering": true,
                "url": "/2009/TOYS/TOYS2009_2.flv"
            },
            "contextMenu": [{
                "About Sirecam ...": "function()"
            }],
            "canvas": {
                "backgroundImage": "url(images/sirecam/player_bg_sales.png)",
                "backgroundColor": "#ffffff"
            },
            "plugins": {
                "cluster": {
                    "debug": true,
                    "url": "images/flowplayer/flowplayer.cluster-3.1.1.swf",
                    "hosts": ["http://cdn.sirecam.com", "http://d103cgplnnab87.cloudfront.net", "http://s3.sirecam.com", "http://vdo.sirecam.com"],
                    "connectTimeout": 20000,
                    "failureExpiry": 20000
                },
                "controls": {
                    "borderRadius": 0,
                    "timeColor": "rgba(253, 185, 49, 1)",
                    "slowForward": true,
                    "bufferGradient": "none",
                    "backgroundColor": "rgba(120, 120, 120, 1)",
                    "volumeSliderGradient": "none",
                    "slowBackward": false,
                    "timeBorderRadius": 20,
                    "time": true,
                    "progressGradient": "none",
                    "height": 22,
                    "volumeColor": "rgba(0, 51, 153, 1)",
                    "tooltips": {
                        "marginBottom": 5,
                        "scrubber": true,
                        "volume": true,
                        "buttons": false
                    },
                    "opacity": 1,
                    "fastBackward": false,
                    "timeFontSize": 12,
                    "border": "0px",
                    "bufferColor": "rgba(0, 51, 153, 1)",
                    "volumeSliderColor": "rgba(253, 185, 49, 1)",
                    "buttonColor": "rgba(209, 209, 209, 1)",
                    "mute": false,
                    "autoHide": {
                        "enabled": false,
                        "hideDelay": 500,
                        "hideStyle": "move",
                        "mouseOutDelay": 500,
                        "hideDuration": 400,
                        "fullscreenOnly": true
                    },
                    "backgroundGradient": [0.5, 0.2, 0],
                    "width": "100pct",
                    "display": "block",
                    "sliderBorder": "1px solid rgba(128, 128, 128, 0.7)",
                    "buttonOverColor": "#ffffff",
                    "fullscreen": true,
                    "timeBgColor": "rgba(0, 0, 0, 0.55)",
                    "scrubberBarHeightRatio": 0.2,
                    "bottom": 0,
                    "stop": false,
                    "zIndex": 1,
                    "sliderColor": "#000000",
                    "scrubberHeightRatio": 0.6,
                    "tooltipTextColor": "rgba(51, 51, 51, 1)",
                    "spacing": {
                        "time": 6,
                        "volume": 8,
                        "all": 2
                    },
                    "sliderGradient": "none",
                    "timeBgHeightRatio": 0.8,
                    "volumeSliderHeightRatio": 0.6,
                    "timeSeparator": " ",
                    "name": "controls",
                    "volumeBarHeightRatio": 0.2,
                    "left": "50pct",
                    "tooltipColor": "rgba(253, 185, 49, 1)",
                    "playlist": false,
                    "durationColor": "rgba(255, 255, 255, 1)",
                    "play": true,
                    "fastForward": true,
                    "progressColor": "rgba(253, 185, 49, 1)",
                    "timeBorder": "0px solid rgba(0, 0, 0, 0.3)",
                    "volume": true,
                    "scrubber": true,
                    "builtIn": false,
                    "volumeBorder": "1px solid rgba(128, 128, 128, 0.7)",
                    "margins": [2, 6, 2, 12]
                }
            },
            "playerId": "player",
            "playlist": [{
                "scaling": "orig",
                "autoPlay": true,
                "urlResolvers": "cluster",
                "bufferLength": 6,
                "autoBuffering": true,
                "url": "/2009/TOYS/TOYS2009_2.flv"
            }]
        }
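Once you have the flashvars value as a string, the clip path and the host list can be pulled out with the standard `json` module. This sketch assumes the value has the form `config={...}` (or `config = {...}`) with a JSON body like the one above, possibly HTML-escaped (`&quot;`); `parse_flashvars_config` and `clip_path_and_hosts` are hypothetical names.

```python
import html
import json
import re


def parse_flashvars_config(flashvars):
    """Turn a flashvars value such as 'config={...}' into a dict.

    Assumes the text after 'config=' is valid JSON, as in the config shown above.
    """
    text = html.unescape(flashvars).strip()      # decode &quot; etc. if still escaped
    text = re.sub(r"^config\s*=\s*", "", text)   # drop the leading 'config=' part
    return json.loads(text)


def clip_path_and_hosts(config):
    """Return the clip's relative path and the cluster plugin's host list."""
    path = config["clip"]["url"]
    hosts = config.get("plugins", {}).get("cluster", {}).get("hosts", [])
    return path, hosts
```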
        

In that configuration you'll see something like:

        "hosts": ["http://cdn.sirecam.com", "http://d103cgplnnab87.cloudfront.net", "http://s3.sirecam.com", "http://vdo.sirecam.com"],
        

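This lists the hosts that are supposed to serve the video. So for http://www.sirecam.com/sales/Tattersalls_October_Yearling_Sale/2009 the path is /2009/TOYS/TOYS2009_2.flv, and if you try loading that path from each of these hosts, all of them may work.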
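The host-by-host check can be sketched as follows: send a HEAD request to each candidate URL and keep the first that answers with a 2xx status, roughly what the player's cluster plugin appears to do for failover. Assumptions: the servers accept HEAD requests (some do not, in which case a small GET would be needed instead), and the 20-second timeout is only chosen to echo `connectTimeout: 20000` in the config. `head_ok` and `first_working_url` are hypothetical names.

```python
import urllib.request


def head_ok(url, timeout=20):
    """Send a HEAD request; True if the server answers with a 2xx status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # covers HTTPError (e.g. 404), URLError and timeouts
        return False


def first_working_url(path, hosts, probe=head_ok):
    """Try the path on each host in order and return the first URL the
    probe accepts, or None if no host serves it."""
    for host in hosts:
        url = host.rstrip("/") + "/" + path.lstrip("/")
        if probe(url):
            return url
    return None
```

The `probe` parameter lets you swap in a different check (or a fake one for testing) without touching the loop.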

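As you can see, this is a matter of investigation. You can then write a script in your preferred language (Python?) that performs these steps and downloads the video.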
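For the first step of such a script, locating the flashvars value in the page, a sketch using the standard `html.parser` module is below. It assumes the page still embeds the player with `<param name="flashvars" value="...">` as described above; `FlashvarsFinder`, `find_flashvars` and `fetch_page` are hypothetical helpers.

```python
import urllib.request
from html.parser import HTMLParser


class FlashvarsFinder(HTMLParser):
    """Collect the value of every <param name="flashvars" value="..."> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "param" and (attrs.get("name") or "").lower() == "flashvars":
            self.values.append(attrs.get("value") or "")


def find_flashvars(page_html):
    """Return all flashvars values found in an HTML document."""
    finder = FlashvarsFinder()
    finder.feed(page_html)
    finder.close()
    return finder.values


def fetch_page(url):
    """Download a page and decode it using the charset the server declares."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

`HTMLParser` already decodes entities such as `&quot;` inside attribute values, so each returned string is the raw player configuration text.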