Question

I have an image file, and I want to use Python to check whether it is part of an image sequence.

For example, I start with this file:

/projects/image_0001.jpg

and I want to check whether the file is part of a sequence, i.e.

/projects/image_0001.jpg
/projects/image_0002.jpg
/projects/image_0003.jpg
...

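Checking whether a series of images exists looks simple if I can determine whether the filename is part of a sequence, i.e. whether the filename contains a sequence number.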

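My first thought was to ask the user to add #### to the file path where the number should go, and to enter a start and end frame number to replace the hashes with, but this is clearly not very user friendly. Is there a way to check for a sequence of digits in a string with a regular expression or something similar?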
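For reference, the `####` approach described above could be sketched roughly as follows. The function name `expand_hash_pattern` and its parameters are hypothetical, chosen only for illustration; the width of the zero padding is assumed to equal the number of `#` characters.

```python
import re

def expand_hash_pattern(pattern, start, end):
    """Replace the first run of '#' with zero-padded frame numbers (illustrative helper)."""
    match = re.search(r'#+', pattern)
    if match is None:
        raise ValueError('pattern contains no # placeholder')
    width = len(match.group(0))  # assumption: padding width == number of hashes
    head, tail = pattern[:match.start()], pattern[match.end():]
    return [head + str(frame).zfill(width) + tail
            for frame in range(start, end + 1)]

frames = expand_hash_pattern('/projects/image_####.jpg', 1, 3)
print(frames)
```

Each resulting path could then be checked with `os.path.exists`, but the user still has to supply the pattern and the frame range by hand.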

Answer 1

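It is relatively easy to use Python's re module to see whether a string contains a sequence of digits. You could do something like this:

mo = re.findall(r'\d+', filename)

This returns a list of all the digit sequences in filename. If:

there is only one result (i.e. the filename contains only a single sequence of digits), AND
a subsequent filename has a single digit sequence of the same length, AND
the digit sequence from the second is one greater than that from the previous

... then maybe they are part of a sequence.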
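The three conditions above can be sketched as a small check. The helper names `digit_runs` and `looks_consecutive` are made up for this example, and the check is only a heuristic.

```python
import re

def digit_runs(filename):
    """Return every run of digits found in the filename, as strings."""
    return re.findall(r'\d+', filename)

def looks_consecutive(first, second):
    """Apply the three conditions: one digit run each, same length, second = first + 1."""
    a, b = digit_runs(first), digit_runs(second)
    if len(a) != 1 or len(b) != 1:
        return False
    if len(a[0]) != len(b[0]):
        return False
    return int(b[0]) == int(a[0]) + 1

print(looks_consecutive('image_0001.jpg', 'image_0002.jpg'))
```

Note that this sketch does not compare the text around the digits, and it rejects names with more than one digit run (a full path with digits in a directory name, for example), so it is best applied to basenames.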

Answer 2

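I think the problem is more about being able to distinguish sequential files on disk than about knowing any particular information about the filename itself.

If that is the case, what you are looking for is something smart enough to take a list like: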

/path/to/file_1.png
/path/to/file_2.png
/path/to/file_3.png
...
/path/to/file_10.png
/path/to/image_1.png
/path/to/image_2.png
...
/path/to/image_10.png

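and get back a result saying: I have 2 sequences of files, /path/to/file_#.png and /path/to/image_#.png. You will need 2 passes: a first pass to determine the valid expressions for the files, and a second pass to determine which other files meet each of those expressions.

You will also need to know whether you are going to support gaps (i.e. is the sequence required to be contiguous?):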

/path/to/file_1.png
/path/to/file_2.png
/path/to/file_3.png
/path/to/file_5.png
/path/to/file_6.png
/path/to/file_7.png

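Is this 1 sequence (/path/to/file_#.png) or 2 sequences (/path/to/file_1-3.png, /path/to/file_5-7.png)?

Also, how do you want to handle digits that belong to the file name rather than to the sequence number?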
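If gaps should split a sequence, one possible approach is to collapse the frame numbers into contiguous ranges. This is only a sketch; `contiguous_ranges` is a hypothetical helper and assumes the frame numbers have already been extracted as integers.

```python
def contiguous_ranges(frames):
    """Collapse frame numbers into (first, last) tuples of contiguous runs."""
    ranges = []
    for frame in sorted(set(frames)):
        if ranges and frame == ranges[-1][1] + 1:
            # extends the current run
            ranges[-1] = (ranges[-1][0], frame)
        else:
            # a gap (or the very first frame) starts a new run
            ranges.append((frame, frame))
    return ranges

print(contiguous_ranges([1, 2, 3, 5, 6, 7]))  # [(1, 3), (5, 7)]
```

A single resulting range means the files form one unbroken sequence; several ranges correspond to the "2 sequences" reading above.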

/path/to/file2_1.png
/path/to/file2_2.png
/path/to/file2_3.png

etc.

With that in mind, this is how I would do it:
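One convention for names like these, shown here as a sketch rather than the only option, is to treat just the last run of digits as the frame number and everything before it as part of the name. The names `LAST_DIGITS` and `split_on_last_digits` are illustrative.

```python
import re

# lazy prefix, then the final run of digits, then a digit-free suffix
LAST_DIGITS = re.compile(r'^(.*?)(\d+)(\D*)$')

def split_on_last_digits(basename):
    """Return (prefix, digits, suffix), or None if the name has no digits."""
    match = LAST_DIGITS.match(basename)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)

print(split_on_last_digits('file2_1.png'))  # ('file2_', '1', '.png')
```

With this rule, file2_1.png, file2_2.png and file2_3.png all share the prefix `file2_` and the suffix `.png`, so the `2` stays part of the name.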

    import functools
    import os.path
    import projex.sorting
    import re

    def find_sequences(filenames):
        """
        Parse a list of filenames into a dictionary of sequences.  Filenames not
        part of a sequence are returned in the None key

        :param      filenames | [<str>, ..]

        :return     {<str> sequence: [<str> filename, ..], ..}
        """
        local_filenames   = filenames[:]
        sequence_patterns = {}
        sequences         = {None: []}

        # sort the files (by natural order) so we always generate a pattern
        # based on the first potential file in a sequence;
        # projex.sorting.natural is used as a cmp-style function here, so it
        # is wrapped with cmp_to_key for Python 3's key-based sort
        local_filenames.sort(key=functools.cmp_to_key(projex.sorting.natural))

        # create the expression to determine if a sequence is possible
        # we are going to assume that it's always going to be the
        # last set of digits that makes a sequence, i.e.
        #
        #    test2_1.png
        #    test2_2.png
        #
        # test2 will be treated as part of the name
        # 
        #    test1.png
        #    test2.png
        #
        # whereas here the 1 and 2 are part of the sequence
        #
        # more advanced expressions would be needed to support
        # 
        #    test_01_2.png
        #    test_02_2.png
        #    test_03_2.png

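        # the prefix is lazy so that the whole final run of digits is captured
        # (a greedy prefix would leave only the last digit of e.g. test10.png)
        pattern_expr = re.compile(r'^(.*?)(\d+)(\D*)$')

        # process the input files for sequences, in sorted order
        for filename in local_filenames: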
            # first, check to see if this filename matches a sequence
            found = False
            for key, pattern in sequence_patterns.items():
                # patterns are built from basenames, so match the basename
                match = pattern.match(os.path.basename(filename))
                if not match:
                    continue

                sequences[key].append(filename)
                found = True
                break

            # if we've already been matched, then continue on
            if ( found ):
                continue

            # next, see if this filename should start a new sequence
            basename      = os.path.basename(filename)
            pattern_match = pattern_expr.match(basename)
            if ( pattern_match ):
                opts = (pattern_match.group(1), pattern_match.group(3))
                key  = '%s#%s' % opts

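                # create a new pattern based on the filename, escaping the
                # literal parts so that e.g. '.' in '.png' is not a wildcard
                sequence_pattern = re.compile(
                    r'^%s\d+%s$' % tuple(re.escape(part) for part in opts))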

                sequence_patterns[key] = sequence_pattern
                sequences[key] = [filename]
                continue

            # otherwise, add it to the list of non-sequences
            sequences[None].append(filename)

        # now that we have grouped everything, we'll merge back filenames
        # that were potential sequences, but only contain a single file to the
        # non-sequential list
        # (iterate over a copy, since entries are popped inside the loop)
        for key, names in list(sequences.items()):
            if key is None or len(names) > 1:
                continue

            sequences.pop(key)
            sequences[None] += names

        return sequences

An example usage:

>>> test = ['test1.png', 'test2.png', 'test3.png', 'test4.png', 'test2_1.png', 'test2_2.png', 'test2_3.png', 'test2_4.png']
>>> results = find_sequences(test)
>>> list(results.keys())
[None, 'test#.png', 'test2_#.png']

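There is one call in there that refers to natural sorting, which is a separate topic. I just used the natural sort method from my projex library. It is open source, so if you want to use it or look at it, it is here: http://dev.projexsoftware.com/projects/projex

That topic has been covered elsewhere on the forums, though, so I just used the method from the library.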
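If you would rather not depend on projex, a natural sort can be approximated with the standard library alone. The sketch below is not the projex implementation; `natural_key` is a hypothetical helper that splits a name into text and integer chunks so that 10 sorts after 2.

```python
import re

def natural_key(text):
    """Sort key that compares digit runs numerically and other text case-insensitively."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', text)]

names = ['file_10.png', 'file_2.png', 'file_1.png']
print(sorted(names, key=natural_key))
```

Under that assumption, `local_filenames.sort(key=natural_key)` could stand in for the projex call in the function above.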

What is the best way to determine whether an image is part of a sequence

2 answers: