Python: group file names by exact string match

Time: 2015-03-16 10:43:17

Tags: python

I am trying to group file names by a specific ID. This is what I have so far:

(ccc)khine@dhegdheer:~/Sandboxes/Business/continentalclothing.com$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> cropped = ['ccc-public,assets/cropped/low_res/EP01.jpg', 'ccc-public,assets/cropped/low_res/EP01L.jpg', 'ccc-public,assets/cropped/low_res/EP01_10.jpg', 'ccc-public,assets/cropped/low_res/EP01_20_1.jpg', 'ccc-public,assets/cropped/low_res/EP01_21_1.jpg', 'ccc-public,assets/cropped/low_res/EP02.jpg', 'ccc-public,assets/cropped/low_res/EP03.jpg', 'ccc-public,assets/cropped/low_res/EP03V.jpg']
>>> styles = ['EP01', 'EP01L', 'EP02', 'EP03', 'EP03V']
>>> def get_cropped(style):
...     cropped_images = []
...     matching = [key for key in cropped if style in key.rsplit('/', 1)[1]]
...     for x in matching:
...             cropped_images.append(x)
...     return cropped_images
... 
>>> for style in styles:
...     get_s3_images = get_cropped(style)
...     for x in get_s3_images:
...             print x
... 
assets/cropped/low_res/EP01.jpg
assets/cropped/low_res/EP01L.jpg
assets/cropped/low_res/EP01_10.jpg
assets/cropped/low_res/EP01_20_1.jpg
assets/cropped/low_res/EP01_21_1.jpg
assets/cropped/low_res/EP01L.jpg
assets/cropped/low_res/EP02.jpg
assets/cropped/low_res/EP03.jpg
assets/cropped/low_res/EP03V.jpg
assets/cropped/low_res/EP03V.jpg

What is the correct way to extract only the paths that belong to EP01, so that I get a list like this:
    "/assets/cropped/low_res/EP01.jpg",
    "/assets/cropped/low_res/EP01_10.jpg",
    "/assets/cropped/low_res/EP01_20_1.jpg",
    "/assets/cropped/low_res/EP01_21_1.jpg"

The "/assets/cropped/low_res/EP01L.jpg" entry should not be included. Conversely, EP01L should return only <Key: ccc-public,assets/cropped/low_res/EP01L.jpg>.

Any suggestions are much appreciated.

1 answer:

Answer 0: (score: 1)

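You can use a regular expression that matches only the substring you need, for example r'.*/EP01(?:_|\b)'. The style ID must be followed by either an underscore or a word boundary, so EP01L.jpg is not matched when you search for EP01.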

import re

def get_cropped(style):
    rex = re.compile(r'.*/%s(?:\b|_)' % style)
    cropped_images = [img for img in cropped if rex.match(img)]
    return cropped_images

for style in styles:
    print '\nStyle:', style
    get_s3_images = get_cropped(style)
    for x in get_s3_images:
        print x
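
As a quick check of the pattern, here is a minimal self-contained sketch that runs on both Python 2 and Python 3. The extra `paths` parameter and the `re.escape` call are additions for illustration (an assumed precaution in case a style ID ever contains regex metacharacters); they are not part of the code above.

```python
import re

# Sample data copied from the question.
cropped = [
    'ccc-public,assets/cropped/low_res/EP01.jpg',
    'ccc-public,assets/cropped/low_res/EP01L.jpg',
    'ccc-public,assets/cropped/low_res/EP01_10.jpg',
    'ccc-public,assets/cropped/low_res/EP01_20_1.jpg',
    'ccc-public,assets/cropped/low_res/EP01_21_1.jpg',
    'ccc-public,assets/cropped/low_res/EP02.jpg',
    'ccc-public,assets/cropped/low_res/EP03.jpg',
    'ccc-public,assets/cropped/low_res/EP03V.jpg',
]


def get_cropped(style, paths):
    # The style ID must be followed by "_" or a word boundary,
    # so "EP01" does not match "EP01L".
    rex = re.compile(r'.*/%s(?:\b|_)' % re.escape(style))
    return [p for p in paths if rex.match(p)]


ep01 = get_cropped('EP01', cropped)
ep01l = get_cropped('EP01L', cropped)
print(ep01)
print(ep01l)
```

With the sample list, `ep01` holds the four EP01 paths and `ep01l` holds only the EP01L path.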

Instead of a list comprehension, you can also use the built-in function filter:

def get_cropped(style):
    rex = re.compile(r'.*/%s(?:\b|_)' % style)
    cropped_images = filter(rex.match, cropped)
    return cropped_images
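
One caveat if this is ever run on Python 3: there `filter` returns a lazy iterator rather than a list. A minimal sketch of a version that returns a list on either Python version, assuming a shortened sample list and a hypothetical `paths` parameter added for self-containment:

```python
import re

# Shortened sample data, taken from the question.
cropped = [
    'ccc-public,assets/cropped/low_res/EP03.jpg',
    'ccc-public,assets/cropped/low_res/EP03V.jpg',
    'ccc-public,assets/cropped/low_res/EP02.jpg',
]


def get_cropped(style, paths):
    rex = re.compile(r'.*/%s(?:\b|_)' % re.escape(style))
    # list() turns the Python 3 filter object into a list;
    # on Python 2 it simply copies the list filter already returned.
    return list(filter(rex.match, paths))


ep03 = get_cropped('EP03', cropped)
print(ep03)
```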

Update:

Another possible solution uses the standard library module itertools:

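import re
import itertools

rex = re.compile(r'.*/(.*?)[_.]')

def get_style(path):
    # Style ID: the characters after the last slash, up to the first underscore or dot.
    return rex.match(path).group(1)

# groupby only merges adjacent items with equal keys, so sort by the same key first.
for (style, images) in itertools.groupby(sorted(cropped, key=get_style), get_style):
    print '\nStyle:', style
    for img in images:
        print ' ', img

The get_style function extracts the style from an image path: it returns the characters after the last slash, up to the first underscore or dot. itertools.groupby then uses this key to group all the paths listed in cropped by style. Because groupby starts a new group every time the key changes, the paths are sorted by the same key first; with the list in its original order, EP01L.jpg sits between EP01.jpg and EP01_10.jpg, and EP01 would come out as two separate groups.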
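
Since itertools.groupby only merges adjacent items that share a key, a dictionary-based grouping is an alternative that does not depend on the order of the input. The following is a minimal sketch, not part of the original answer; the name `groups` and the guard for non-matching paths are assumptions added for illustration.

```python
import re
from collections import defaultdict

# Sample data copied from the question.
cropped = [
    'ccc-public,assets/cropped/low_res/EP01.jpg',
    'ccc-public,assets/cropped/low_res/EP01L.jpg',
    'ccc-public,assets/cropped/low_res/EP01_10.jpg',
    'ccc-public,assets/cropped/low_res/EP01_20_1.jpg',
    'ccc-public,assets/cropped/low_res/EP01_21_1.jpg',
    'ccc-public,assets/cropped/low_res/EP02.jpg',
    'ccc-public,assets/cropped/low_res/EP03.jpg',
    'ccc-public,assets/cropped/low_res/EP03V.jpg',
]

# Same pattern as above: capture the text after the last slash,
# up to the first underscore or dot.
rex = re.compile(r'.*/(.*?)[_.]')

groups = defaultdict(list)
for path in cropped:
    m = rex.match(path)
    if m:  # skip paths that do not fit the expected layout
        groups[m.group(1)].append(path)

for style in sorted(groups):
    print(style, groups[style])
```

Looking up `groups['EP01']` then gives the EP01 paths directly, without scanning the list once per style.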