I'm trying to group file names by a particular ID. This is what I have so far:
(ccc)khine@dhegdheer:~/Sandboxes/Business/continentalclothing.com$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> cropped = ['ccc-public,assets/cropped/low_res/EP01.jpg', 'ccc-public,assets/cropped/low_res/EP01L.jpg', 'ccc-public,assets/cropped/low_res/EP01_10.jpg', 'ccc-public,assets/cropped/low_res/EP01_20_1.jpg', 'ccc-public,assets/cropped/low_res/EP01_21_1.jpg', 'ccc-public,assets/cropped/low_res/EP02.jpg', 'ccc-public,assets/cropped/low_res/EP03.jpg', 'ccc-public,assets/cropped/low_res/EP03V.jpg']
>>> styles = ['EP01', 'EP01L', 'EP02', 'EP03', 'EP03V']
>>> def get_cropped(style):
...     cropped_images = []
...     matching = [key for key in cropped if style in key.rsplit('/', 1)[1]]
...     for x in matching:
...         cropped_images.append(x)
...     return cropped_images
...
>>> for style in styles:
...     get_s3_images = get_cropped(style)
...     for x in get_s3_images:
...         print x
...
assets/cropped/low_res/EP01.jpg
assets/cropped/low_res/EP01L.jpg
assets/cropped/low_res/EP01_10.jpg
assets/cropped/low_res/EP01_20_1.jpg
assets/cropped/low_res/EP01_21_1.jpg
assets/cropped/low_res/EP01L.jpg
assets/cropped/low_res/EP02.jpg
assets/cropped/low_res/EP03.jpg
assets/cropped/low_res/EP03V.jpg
assets/cropped/low_res/EP03V.jpg
What is the correct way to extract all the paths for EP01, so that I get a list like this:
"/assets/cropped/low_res/EP01.jpg",
"/assets/cropped/low_res/EP01_10.jpg",
"/assets/cropped/low_res/EP01_20_1.jpg",
"/assets/cropped/low_res/EP01_21_1.jpg"
excluding the "/assets/cropped/low_res/EP01L.jpg" entry, while EP01L should return only <Key: ccc-public,assets/cropped/low_res/EP01L.jpg>?
Any advice is much appreciated.
Answer 0 (score: 1)
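You can use a regular expression to match only the substring you need, for example r'.*/EP01(?:_|\b)'. Here the style must be followed by either an underscore or a word boundary (such as the dot before the extension), which rules out EP01L: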
import re
def get_cropped(style):
    # The style must be followed by a word boundary or an underscore.
    rex = re.compile(r'.*/%s(?:\b|_)' % style)
    cropped_images = [img for img in cropped if rex.match(img)]
    return cropped_images
for style in styles:
    print '\nStyle:', style
    get_s3_images = get_cropped(style)
    for x in get_s3_images:
        print x
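To see why the pattern accepts EP01_10 but rejects EP01L, here is a small sketch that runs it against a few of the file names from the question. The bucket prefix is dropped for brevity and the `results` dictionary is only an illustrative name; the single-argument print calls run on both Python 2 and Python 3.

```python
import re

# Pattern from the answer, specialised to the style EP01.
rex = re.compile(r'.*/EP01(?:_|\b)')

# A few file names taken from the question's list.
candidates = [
    'assets/cropped/low_res/EP01.jpg',
    'assets/cropped/low_res/EP01L.jpg',
    'assets/cropped/low_res/EP01_10.jpg',
    'assets/cropped/low_res/EP02.jpg',
]

# Map each path to True/False depending on whether the pattern matches.
results = dict((path, bool(rex.match(path))) for path in candidates)

for path in candidates:
    print('%-40s %s' % (path, results[path]))
```

The dot in EP01.jpg counts as a word boundary after the digit, whereas the L in EP01L does not, and the underscore in EP01_10 is covered by the explicit alternative.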
Instead of a list comprehension, you could also use the built-in function filter:
def get_cropped(style):
    rex = re.compile(r'.*/%s(?:\b|_)' % style)
    cropped_images = filter(rex.match, cropped)
    return cropped_images
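One caveat if this is ever run on Python 3: there filter returns a lazy iterator rather than a list, so wrapping it in list() keeps the behaviour the same. The sketch below assumes that situation. Passing the paths in as a second argument and calling re.escape on the style are additions of mine, not part of the answer.

```python
import re

# Sample paths copied from the question.
cropped = [
    'ccc-public,assets/cropped/low_res/EP01.jpg',
    'ccc-public,assets/cropped/low_res/EP01L.jpg',
    'ccc-public,assets/cropped/low_res/EP01_10.jpg',
    'ccc-public,assets/cropped/low_res/EP01_20_1.jpg',
    'ccc-public,assets/cropped/low_res/EP01_21_1.jpg',
    'ccc-public,assets/cropped/low_res/EP02.jpg',
    'ccc-public,assets/cropped/low_res/EP03.jpg',
    'ccc-public,assets/cropped/low_res/EP03V.jpg',
]

def get_cropped(style, paths):
    # re.escape is an added precaution (assumption) in case a style
    # ever contains regex metacharacters.
    rex = re.compile(r'.*/%s(?:\b|_)' % re.escape(style))
    # list() forces the result into a list on Python 3 and is harmless on Python 2.
    return list(filter(rex.match, paths))

ep03 = get_cropped('EP03', cropped)
ep01 = get_cropped('EP01', cropped)
print(ep03)
print(len(ep01))
```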
Update
Another possible solution uses the standard module itertools:
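import re
import itertools
rex = re.compile(r'.*/(.*?)[_.]')
# Key function: the style is the text between the last slash and the first underscore or dot.
get_style = lambda s: rex.match(s).group(1)
# groupby only merges consecutive items, so sort by the same key first.
for (style, images) in itertools.groupby(sorted(cropped, key=get_style), get_style):
    print '\nStyle:', style
    for img in images:
        print ' ', img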
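The lambda function extracts the style from the image path: it returns the characters after the last slash, up to the first underscore or dot. itertools.groupby then uses this key to group all the paths listed in cropped by style. Because groupby only merges consecutive items with equal keys, the paths need to be sorted by the same key beforehand.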
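If sorting is undesirable, the same grouping can also be done with a dictionary. This is a sketch of an alternative technique (collections.defaultdict), not something from the answer above; it reuses the same regular expression, and the `groups` name is only illustrative.

```python
import re
from collections import defaultdict

# Sample paths copied from the question.
cropped = [
    'ccc-public,assets/cropped/low_res/EP01.jpg',
    'ccc-public,assets/cropped/low_res/EP01L.jpg',
    'ccc-public,assets/cropped/low_res/EP01_10.jpg',
    'ccc-public,assets/cropped/low_res/EP01_20_1.jpg',
    'ccc-public,assets/cropped/low_res/EP01_21_1.jpg',
    'ccc-public,assets/cropped/low_res/EP02.jpg',
    'ccc-public,assets/cropped/low_res/EP03.jpg',
    'ccc-public,assets/cropped/low_res/EP03V.jpg',
]

# Same pattern as above: capture the text between the last slash
# and the first underscore or dot.
rex = re.compile(r'.*/(.*?)[_.]')

# Collect paths per style; input order does not matter here.
groups = defaultdict(list)
for path in cropped:
    groups[rex.match(path).group(1)].append(path)

for style in sorted(groups):
    print('Style: ' + style)
    for path in groups[style]:
        print('  ' + path)
```

Unlike groupby, this collects every path for a style even when the matching entries are not adjacent in the input, at the cost of holding all groups in memory at once.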