Question

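Suppose I have a list of audio file names (it could be any list of strings containing runs of digits) that follow different naming schemes, but all of them contain the track number somewhere in the file name.

I want to extract the numbers that change from one name to the next.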

Example 1

Fooband 41 - Live - 1. Foo Title
...
Fooband 41 - Live - 11. Another Foo Title

Desired result

List of numbers: 1, 2, 3, ..., 11

Example 2

02. Barband - Foo Title with a 4 in it
05. Barband - Another Foo Title
03. Barband - Bar Title
...
17. Barband - Yet another Foo Title

Desired result

List of numbers: 2, 5, 3, ..., 17

Since the position of the track number is not fixed, I (think) I can't use a regular expression here.

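What I have so far

Find the common prefix and suffix of the strings and remove them
Check whether there is a number at the left/right end of what remains
Use that number as the track index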

But there is a problem: if I take the common prefix in Example 1, it will be "Fooband 41 - Live - 1", so the 1 is lost (the same applies to naming schemes like Song X - 10, Song X - 11, ...).
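One way to keep the prefix/suffix idea without losing that digit is to back the common prefix off past any trailing digits before stripping it (and the common suffix off past any leading digits). Below is a minimal sketch, assuming the changing number sits directly after the common prefix or directly before the common suffix; changing_numbers is a hypothetical helper name. os.path.commonprefix is used because it compares character by character on arbitrary strings, not path components.

```python
import os
import re

DIGITS = "0123456789"


def _common_suffix(strings):
    # Common suffix = reversed common prefix of the reversed strings.
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def changing_numbers(names):
    """Return the number that varies across names, or None if none is found.

    Sketch: strip the common prefix (or suffix), but first back it off past
    any digits so the shared "1" of "10", "11", ... is not swallowed.
    """
    prefix = os.path.commonprefix(names).rstrip(DIGITS)
    # Number directly after the common prefix (Examples 1 and 2).
    lead = [re.match(r"\d+", s[len(prefix):]) for s in names]
    if all(lead):
        return [int(m.group()) for m in lead]

    suffix = _common_suffix(names).lstrip(DIGITS)
    # Number directly before the common suffix, e.g. "Title A (1)".
    trail = [re.search(r"\d+$", s[:len(s) - len(suffix)]) for s in names]
    if all(trail):
        return [int(m.group()) for m in trail]
    return None


print(changing_numbers(["Song X - 10", "Song X - 11", "Song X - 12"]))  # [10, 11, 12]
```

If the number is in the middle and the text on both sides of it changes, neither check succeeds for every name and the function returns None.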

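Question

What is a good way to detect and extract the changing numbers (at similar positions) in a list of strings?

I'm using Python (not that it matters much for this question).

Bonus points if it can also detect Roman numerals, but I suspect that will be considerably harder.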

Answer 1

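import re

# Match runs of Arabic digits, or runs of the Roman-numeral letters I, V, X.
regex = re.compile(r"\d+|[IVX]+")

data = []
with open('data.txt') as f:
    for line in f:
        # All number-like tokens in this line, in order of appearance.
        data.append(regex.findall(line))

print(data)
# Transpose: the k-th tuple holds the k-th number of every line.
# zip() stops at the shortest row, so extra numbers in some titles are dropped.
transposed_data = list(zip(*data))
print(transposed_data)

for atuple in transposed_data:
    val = atuple[0]
    if all(num == val for num in atuple):
        continue  # same value in every line (e.g. "41"), not the track number
    print(atuple)  # first column whose values change
    break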

data.txt:

Fooband 41 - Live - 1. Foo Title
Fooband 41 - Live - 2. Foo Title
Fooband 41 - Live - 3. Foo Title
Fooband 41 - Live - 11. Another Foo Title

Output:

[['41', '1'], ['41', '2'], ['41', '3'], ['41', '11']]
[('41', '41', '41', '41'), ('1', '2', '3', '11')]
('1', '2', '3', '11')

data.txt:

01. Barband - Foo Title with a 4 in it
05. Barband - Another Foo Title
03. Barband - Bar Title
17. Barband - Yet another Foo Title

Output:

[['01', '4'], ['05'], ['03'], ['17']]
[('01', '05', '03', '17')]
('01', '05', '03', '17')

data.txt:

01 Barband - Foo Title with a (I) in it
01 Barband - Another Foo (II) Title
01. Barband - Bar Title (IV)
01. Barband - Yet another (XII) Foo Title

Output:

[['01', 'I'], ['01', 'II'], ['01', 'IV'], ['01', 'XII']]
[('01', '01', '01', '01'), ('I', 'II', 'IV', 'XII')]
('I', 'II', 'IV', 'XII')
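The Roman numerals come back as strings, so they still need converting before they can be compared or sorted as track numbers. A minimal sketch using the usual subtractive rule (a letter followed by a larger one is subtracted); roman_to_int and ROMAN_TOKEN are hypothetical names, and the conversion does not reject malformed numerals such as IIII. The pattern also adds word boundaries, because a bare [IVX]+ matches the capital I at the start of a word like "Intro". A standalone "I" or an all-caps word made only of these letters would still match.

```python
import re

# Standalone Roman numerals only: \b stops a match inside a capitalised word,
# so "Intro (II)" yields ["II"], whereas a bare [IVX]+ would also return the I of "Intro".
ROMAN_TOKEN = re.compile(r"\b[IVXLCDM]+\b")

# Values of the seven Roman-numeral letters.
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert e.g. 'XII' to 12 using the subtractive rule (no validation)."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # A letter followed by a larger one (the I in IV) is subtracted.
        if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


print(ROMAN_TOKEN.findall("Intro (II)"))  # ['II']
print([roman_to_int(n) for n in ("I", "II", "IV", "XII")])  # [1, 2, 4, 12]
```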

Answer 2

If the formats are similar, you can use Python's re module. A short piece of code that extracts the number from one of the strings looks like this:

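import re

# A bare greedy ".*" before the group would leave it only the last digit
# ("11." -> "1"); \b makes the group start at the beginning of a number.
regex = re.compile(r".*\b(\d+)")

number = regex.match("Fooband 41 - Live - 1. Foo Title").group(1)  # '1'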
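The snippet above handles a single string; applying the same idea to a whole list might look like the sketch below. It assumes the track number is the last number in each name, which holds for Example 1 but not for Example 2, where a title can contain a 4. last_numbers is a hypothetical helper name. The \b matters here: with the plain .*([0-9]+).* pattern the greedy .* leaves only the last digit for the group, so 11 would come back as 1.

```python
import re

# Assumption: the track number is the last number in each name (as in Example 1).
# \b makes the group start at the beginning of a number, so "11" is not cut to "1".
LAST_NUMBER = re.compile(r".*\b(\d+)")


def last_numbers(names):
    """Return the last number in each name as an int (None if a name has no digits)."""
    result = []
    for name in names:
        m = LAST_NUMBER.match(name)
        result.append(int(m.group(1)) if m else None)
    return result


names = [
    "Fooband 41 - Live - 1. Foo Title",
    "Fooband 41 - Live - 2. Foo Title",
    "Fooband 41 - Live - 11. Another Foo Title",
]
print(last_numbers(names))  # [1, 2, 11]
```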
