A friend of mine is trying to clean up YouTube downloads like these:
Google I_O 2012 - App Engine Overview(720p_H.264-AAC).mp4
Google I_O 2012 - App Engine Overview(360p_H.264-AAC).mp4
Google I_O 2012 - Android Fireside Chat(720p_H.264-AAC).mp4
Google I_O 2012 - Android Fireside Chat(360p_H.264-AAC).mp4
Google I_O 2012 - Android Developer Sandbox Interviews(720p_H.264-AAC).mp4
Google I_O 2012 - Android Developer Sandbox Interviews(360p_H.264-AAC).mp4
Google I_O 2012 - Android Design for Success(720p_H.264-AAC).mp4
Google I_O 2012 - Android Design for Success(360p_H.264-AAC).mp4
... hundreds of files ...
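Specifically, he wants to keep only the highest-resolution copy of each video that was downloaded in several formats.
I'm a beginner with no formal training, but I wanted to write him a script that does this across a few hundred videos. The script is attached below. The first time it ran, it deleted the wrong files, which I traced to the resolutions being compared lexicographically as strings rather than as ints (so 1024p ranked below 360p, and so on). I fixed that and made substantial changes to the rest of the script. Now he is angry with me because it deleted a lot of perfectly good files and kept the wrong versions. I can't see what is going wrong with the script.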
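For reference, the string-versus-int problem from the first version can be reproduced in a few lines. This is a minimal sketch with made-up resolution values, not the original code:

```python
# Resolutions as they come out of a regex group: strings, not numbers.
resolutions = ['360', '720', '1024']

# String comparison is lexicographic (character by character),
# so '1024' ranks below '360' because '1' < '3'.
max_as_str = max(resolutions)

# Converting to int first compares the numeric values instead.
max_as_int = max(int(r) for r in resolutions)

print(max_as_str)  # 720
print(max_as_int)  # 1024
```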
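Can you spot the "current" bug?
Any other advice for a beginner is much appreciated! I don't expect the code below to be correct, Pythonic, efficient, or anything like that, but I'd be happy to learn from it!
By the way: what is the best way to debug code like this? With VBScript in Office I used to have a step-through debugger to help me. (Feel free to ignore this question on the principle of one question per post.)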
#!/usr/bin/env python2.7
import sys
import os
import re
import blist
import pprint

N_REQUIRED_ARGUMENTS = 2
YOUTUBE_FILE_REGEX = re.compile(r'^(.*)\((\d+)(p_H.264-AAC\)\.mp4)$')


def show_usage():
    script_name = os.path.basename(sys.argv[0])
    print('Usage: {} directory_to_inspect'.format(script_name))


def main():
    if len(sys.argv) != N_REQUIRED_ARGUMENTS:
        show_usage()
        sys.exit(0)
    directory_to_inspect = sys.argv[1]
    if not os.path.isdir(directory_to_inspect):
        print('Error: path does not appear to be a directory "{}".'.format(
            directory_to_inspect))
        show_usage()
        sys.exit(0)
    directory_files = os.listdir(directory_to_inspect)
    videos_seen = {}
    for filename in directory_files:
        match = YOUTUBE_FILE_REGEX.match(filename)
        if not match:
            continue
        video_title = match.group(1)
        video_resolution = int(match.group(2))
        if video_title not in videos_seen:
            videos_seen[video_title] = {
                'resolutions_seen': blist.sortedset([video_resolution]),
                'to_keep': set([filename]),
                'to_delete': set(),
            }
        else:
            video_info = videos_seen[video_title]
            resolutions_seen = video_info['resolutions_seen']
            to_delete = video_info['to_delete']
            to_keep = video_info['to_keep']
            max_resolution_seen = resolutions_seen[-1]
            if video_resolution < max_resolution_seen:
                to_delete.add(filename)
            else:
                to_delete.add(to_keep.pop())
                to_keep.add(filename)
            resolutions_seen.add(video_resolution)
    for video_group in videos_seen:
        video_info = videos_seen[video_group]
        print(video_group)
        for video_to_keep in video_info['to_keep']:
            print('- Keep: {}'.format(video_to_keep))
        for video_to_delete in video_info['to_delete']:
            print('- Delete: {}'.format(video_to_delete))


if __name__ == '__main__':
    main()
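One way to pin down the intended behaviour without touching any files is to isolate the selection step in a pure function and run it on a list of names. The sketch below is an assumption about what the script is meant to do, not a diagnosis of the bug: `split_by_resolution` is a hypothetical helper name, the sample list is made up from the filenames above, and the regex is the script's pattern with the literal dots escaped.

```python
import re

# Same pattern as the script above, with the literal dots escaped.
YOUTUBE_FILE_REGEX = re.compile(r'^(.*)\((\d+)p_H\.264-AAC\)\.mp4$')


def split_by_resolution(filenames):
    """Return (to_keep, to_delete) lists; hypothetical helper for dry runs."""
    best = {}  # title -> (resolution, filename) of the best copy seen so far
    to_delete = []
    for filename in filenames:
        match = YOUTUBE_FILE_REGEX.match(filename)
        if not match:
            continue  # leave unrelated files alone
        title, resolution = match.group(1), int(match.group(2))
        if title not in best:
            best[title] = (resolution, filename)
        elif resolution > best[title][0]:
            # The new file beats the previous best, so the old one goes.
            to_delete.append(best[title][1])
            best[title] = (resolution, filename)
        else:
            to_delete.append(filename)
    to_keep = sorted(name for _, name in best.values())
    return to_keep, sorted(to_delete)


names = [
    'Google I_O 2012 - App Engine Overview(360p_H.264-AAC).mp4',
    'Google I_O 2012 - App Engine Overview(720p_H.264-AAC).mp4',
    'notes.txt',
]
keep, delete = split_by_resolution(names)
print(keep)
print(delete)
```

Because the function only takes and returns lists of names, its output can be compared against the script's "Keep" and "Delete" report for the same directory listing before anything is actually removed.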