Question

I have several thousand images in a folder. The images are named 0.png, 1.png, 2.png, ...

I wrote the following code to generate an average image for the positive samples, and a similar image for the negative samples.

file_list = glob.glob(trainDir)
n = len(file_list)
label = np.load('labels_v2.dat')
positive = np.empty((300,400,4))
negative = np.empty((300,400,4))
labels = np.empty(n)
count_p = 0
count_n = 0

for i in range(1000):
    img = imread(file_list[i])
    lbl = label[i]
    if (lbl == 1):
        positive +=  img
        count_p += 1
        print file_list[i]

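However, this reads the files in the order 1, 10, 100, 1000, 10000, 10001, ..., whereas my labels are in the order 0, 1, 2, 3, ... How can I make it read the files in the correct order?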

Answer 1

import os
file_list = os.listdir(trainDir)  # bare file names, e.g. "0.png"
file_list.sort(key=lambda s: int(os.path.splitext(s)[0]))  # sort by the integer before the extension

Alternatively, to skip the O(n lg n) sorting cost, do this inside the loop:

img = imread("%d.EXT" % i)

where EXT is the appropriate extension (e.g. jpg).
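Both options can be tried without any images on disk. The following is only a minimal sketch: `numeric_sorted` and `indexed_path` are hypothetical helper names, and the `"train"` directory and `png` extension are assumptions taken from the question. Note that `os.listdir` returns bare file names, so they would need to be joined with the directory before being passed to `imread`.

```python
import os

def numeric_sorted(names):
    """Sort file names such as '10.png' by the integer value of their stem."""
    return sorted(names, key=lambda s: int(os.path.splitext(s)[0]))

def indexed_path(directory, i, ext="png"):
    """Build the path of the i-th image directly, without listing or sorting."""
    return os.path.join(directory, "%d.%s" % (i, ext))

# Hypothetical file names, in an arbitrary order.
names = ["1.png", "10.png", "100.png", "0.png", "2.png"]
print(numeric_sorted(names))

# Path of image number 3 inside an assumed "train" directory.
print(indexed_path("train", 3))
```

Building the path from the index also keeps the image and `label[i]` aligned by construction, since both are driven by the same `i`.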

Answer 2

It looks like you want to sort in numeric order rather than lexicographic order. My first thought was:

import locale
l=["11", "01", "3", "20", "0", "5"]
l.sort(key=locale.strxfrm)    # strcoll would have to repeat the transform
print(l)

However, this only helps if your locale actually sorts numbers this way, and I don't know what you would have to set it to for that.

In the meantime, a workaround is to extract the digits in the sort key function.

def numfromstr(s):
  s2=''.join(c for c in s if c.isdigit())
  return int(s2)
l.sort(key=numfromstr)

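But that alone has the drawback of sorting only on the digits. This can be remedied by splitting at digit boundaries and sorting on the resulting tuples... which is getting more complicated.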

import re
e=re.compile('([0-9]+|[^0-9]+)')
def sorttup(s):
  parts=[]
  for part in e.findall(s):
    try:
      parts.append(int(part))
    except ValueError:
      parts.append(part)
  return tuple(parts)
l.sort(key=sorttup)

Well, that is at least a bit closer, but it is neither pretty nor fast.
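One caveat with the tuple approach: in Python 3, comparing an int with a str raises a TypeError, which can happen when one name starts with digits and another with letters. The sketch below is a variation on the same idea, not the answer's original code. It tags every chunk so that numbers are compared with numbers and text with text. The name `natural_key` and the sample file names are assumptions made for illustration.

```python
import re

# Same pattern as above: runs of digits or runs of non-digits.
_CHUNKS = re.compile(r'([0-9]+|[^0-9]+)')

def natural_key(s):
    """Split s into digit and non-digit runs, tagging each run so that
    integers and strings are never compared with each other directly."""
    key = []
    for part in _CHUNKS.findall(s):
        if '0' <= part[0] <= '9':
            key.append((0, int(part), ""))   # numeric chunk, sorts before text
        else:
            key.append((1, 0, part))         # text chunk, compared as a string
    return tuple(key)

# Hypothetical mix of purely numeric and prefixed file names.
names = ["img10.png", "img2.png", "10.png", "img1.png", "2.png"]
print(sorted(names, key=natural_key))
```

With this key, chunks that are numbers sort ahead of text chunks in the same position, which is an arbitrary choice. Swapping the tags 0 and 1 would put text first instead.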

Reading files by sequence number rather than by name
