正则表达式发布文件扩展名Python 2.7

时间:2016-11-22 03:49:21

标签: python regex file

import re,urllib

[('clown.gif', 'gif'), ('sleeper.jpg', 'jpg'), ('StarWarsReview.docx', 'docx'), ('wargames.jpg', 'jpg'), ('nothingtoseehere.docx', 'docx'), ('starwars.jpg', 'jpg'), ('logo.jpg', 'jpg'), ('certified.jpg', 'jpg'), ('clown.gif', 'gif'), ('essays.gif', 'gif'), ('big.jpg', 'jpg'), ('Doc100.docx', 'docx'), ('FavRomComs.docx', 'docx'), ('python.bmp', 'bmp'), ('dingbat.jpg', 'jpg')]

运行此代码后,我的正则表达式出现问题,因此答案如下:

('clown.gif', 'gif')

我不希望结果如此['clown.gif','sleeper.jpg']所有我想要的就像是$ffmpeg = "/usb/bin/local/ffmpeg"; $videos = "/videos/*.mp4"; $ouput_path = "/videos/thumbnails/"; foreach(glob($videos) as $video_file){ $lfilename = basename($video_file); $filename = basename($video_file, ".mp4"); $thumbnail = $ouput_path.$filename.'.jpg'; if (!file_exists($filename)) { #$thumbnail = str_replace("'", "%27", $thumbnail); exec("/usr/local/bin/ffmpeg -i '$video_file' -an -y -f mjpeg -ss 00:00:30 -vframes 1 '$thumbnail'"); } echo "<a href='$lfilename'>$filename<img src='thumbnails/$filename.jpg' width='350'>"; 等等

反正有吗?并得到红色的元组??

2 个答案:

答案 0 :(得分:1)

您只需将您的群组变为a non-capturing group

def get_files(page):
    a = urllib.urlopen(page)
    b = a.read()
    c = re.findall("([a-zA-Z0-9]+\.{1}(?:jpg|bmp|docx|gif))", b)

答案 1 :(得分:0)

您正在对扩展程序进行双重捕获,请尝试使用下面的正则表达式,?:表示非捕获组

re.findall("([a-zA-Z0-9]+\.{1}(?:jpg|bmp|docx|gif))", b)

我将您的正则表达式简化为以下,{1}似乎是多余的,并使用\w\d作为单词和数字组

re.findall("([\w\d]+\.(?:jpg|bmp|docx|gif))", b)