清理YouTube下载

时间:2013-02-07 15:40:30

标签: python

我有一位朋友试图清理这样的YouTube下载:

Google I_O 2012 - App Engine Overview(720p_H.264-AAC).mp4
Google I_O 2012 - App Engine Overview(360p_H.264-AAC).mp4
Google I_O 2012 - Android Fireside Chat(720p_H.264-AAC).mp4
Google I_O 2012 - Android Fireside Chat(360p_H.264-AAC).mp4
Google I_O 2012 - Android Developer Sandbox Interviews(720p_H.264-AAC).mp4
Google I_O 2012 - Android Developer Sandbox Interviews(360p_H.264-AAC).mp4
Google I_O 2012 - Android Design for Success(720p_H.264-AAC).mp4
Google I_O 2012 - Android Design for Success(360p_H.264-AAC).mp4
... hundreds of files ...

特别是,他希望保留以多种格式下载的最高视频资源。

我是一个没有任何正规学校教育的初学者,但我想为他制作一个脚本,在几百个视频中做到这一点。该脚本附在下面。在第一次执行它时,它用于杀死错误的文件,但我追踪到执行分辨率“字符串”而不是“int”的字典比较(即1024p低于360p等)。我解决了这个问题并对其内容进行了重大调整。现在,他因为杀死了很多很好的文件,保留了错误的版本而对我很生气。我不知道脚本会出现什么问题?

你能发现“当前”的错误吗?

对初学者的任何其他建议都非常赞赏!我不认为下面的代码是正确的,pythonic,有效或类似的东西..但我很乐意从中学习!

顺便说一下:如何最好地调试这样的代码?在Office中使用VBScript我曾经有一个逐步的调试器来帮助我..(根据原则可以忽略这个问题:每个帖子一个问题)

#!/usr/bin/env python2.7

import sys
import os
import re
import blist
import pprint

N_REQUIRED_ARGUMENTS = 2

YOUTUBE_FILE_REGEX = re.compile(r'^(.*)\((\d+)(p_H.264-AAC\)\.mp4)$')

def show_usage():

  script_name = os.path.basename(sys.argv[0])
  print('Usage: {} directory_to_inspect'.format(script_name))

def main():

  if len(sys.argv) != N_REQUIRED_ARGUMENTS:
    show_usage()
    sys.exit(0)

  directory_to_inspect = sys.argv[1]

  if not os.path.isdir(directory_to_inspect):
    print('Error: path does not appear to be a directory "{}".'.format(
      directory_to_inspect))
    show_usage()
    sys.exit(0)

  directory_files = os.listdir(directory_to_inspect)

  videos_seen = {}

  for filename in directory_files:

    match = YOUTUBE_FILE_REGEX.match(filename)

    if not match:
      continue

    video_title = match.group(1)
    video_resolution = int(match.group(2))

    if not video_title in videos_seen:

      videos_seen[video_title] = {
        'resolutions_seen': blist.sortedset([video_resolution]),
        'to_keep': set([filename]),
        'to_delete': set(),
      }

    else:

      video_info = videos_seen[video_title]

      resolutions_seen = video_info['resolutions_seen']
      to_delete = video_info['to_delete']
      to_keep = video_info['to_keep']

      max_resolution_seen = resolutions_seen[-1]
      if video_resolution < max_resolution_seen:
        to_delete.add(filename)
      else:
        to_delete.add(to_keep.pop())
        to_keep.add(filename)

      resolutions_seen.add(video_resolution)

  for video_group in videos_seen:
        video_info = videos_seen[video_group]

        print(video_group)

for video_group in videos_seen:
  video_info = videos_seen[video_group]

  print(video_group)

  for video_to_keep in video_info['to_keep']:
    print('- Keep: {}'.format(video_to_keep))

  for video_to_delete in video_info['to_delete']:
    print('- Delete: {}'.format(video_to_delete))


if __name__ == '__main__':
  main()

0 个答案:

没有答案