Question

I have a file like this:

301 my name is joe
303 whatsup
306 how are you doing today
308 what happened?
308 going home
309 let's go

I want to convert the labels 301, 303, 306, 308, 308, 309 to 1, 2, 3, 4, 4, 5.

How can I renumber these labels sequentially so that identical labels get the same number?

Answer 1

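Use a dictionary to store the mapping from original labels to new labels, and use setdefault with the dictionary's current len (plus one) as the new number for labels that have not been mapped yet.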

>>> labels = 301, 303, 306, 308, 308, 309
>>> names = {}
>>> for label in labels:
...     _ = names.setdefault(label, len(names) + 1)  # assign to _ so the REPL doesn't echo each return value
...
>>> names
{301: 1, 303: 2, 306: 3, 308: 4, 309: 5}
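Why this works: dict.setdefault(key, default) returns the stored value when key is already present and only inserts default otherwise. The default argument is still evaluated on every call, which is harmless here because len(names) + 1 is cheap. A small sketch of that behavior:

```python
names = {}

# First time a label is seen: names is empty, so 308 is inserted with len(names) + 1 == 1
first = names.setdefault(308, len(names) + 1)

# Second time: len(names) + 1 is computed (it is 2), but the already stored value 1 is returned
again = names.setdefault(308, len(names) + 1)

print(first, again, names)
```
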

A more complete example:

import re

text = """301 my name is joe
303 whatsup
306 how are you doing today
308 what happened?
308 going home
309 let's go""".splitlines()

names = {}  # original label -> new sequential number

def replacer(match):
    # Reuse the label's existing number, or assign the next one (len(names) + 1)
    return str(names.setdefault(match.group(), len(names) + 1))

for line in text:
    replaced = re.sub(r'^\d+', replacer, line)
    print(replaced)

Output:

1 my name is joe
2 whatsup
3 how are you doing today
4 what happened?
4 going home
5 let's go
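The question mentions a file rather than a string. A minimal sketch of the same dictionary approach wrapped in a generator that accepts any iterable of lines, such as an open file object; the function name renumber and the file name labels.txt are hypothetical, chosen only for illustration:

```python
import re

def renumber(lines):
    """Yield each line with its leading numeric label replaced by a sequential number.

    Labels are numbered in order of first appearance; repeated labels reuse their number.
    """
    names = {}  # fresh mapping for every call

    def replacer(match):
        return str(names.setdefault(match.group(), len(names) + 1))

    for line in lines:
        # ^\d+ only matches the digits at the start of the line
        yield re.sub(r'^\d+', replacer, line, count=1)

# Hypothetical usage with a file on disk:
# with open("labels.txt") as f:
#     for line in renumber(f):
#         print(line, end="")  # lines read from a file keep their trailing newline

sample = ["301 my name is joe", "303 whatsup", "308 what happened?", "308 going home"]
result = list(renumber(sample))
print(result)
```

Because the generator processes one line at a time, it does not need to read the whole file into memory first.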

Answer 2

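You can use an index that is incremented only when the label differs from the previous one. This assumes that lines with the same label are adjacent, as they are in the sample: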

data = ["301 my name is joe", "303 whatsup", "306 how are you doing today", "308 what happened?", "308 going home", "309 let's go"]

idx = 0
last_label = None
for line in data:
    label, rest = line.split(" ", 1)
    if label != last_label:  # a new label starts: move on to the next number
        idx += 1
    print(str(idx) + " " + rest)
    last_label = label

Result:

1 my name is joe
2 whatsup
3 how are you doing today
4 what happened?
4 going home
5 let's go
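The increment-on-change idea maps directly onto itertools.groupby, which groups consecutive items that share a key. A sketch, assuming (as this answer does) that lines with the same label are adjacent; label_of is a helper introduced here for illustration:

```python
from itertools import groupby

data = ["301 my name is joe", "303 whatsup", "306 how are you doing today",
        "308 what happened?", "308 going home", "309 let's go"]

def label_of(line):
    # The label is everything before the first space
    return line.split(" ", 1)[0]

output = []
# enumerate(..., 1) numbers each run of consecutive equal labels, starting at 1
for number, (_, group) in enumerate(groupby(data, key=label_of), 1):
    for line in group:
        output.append(f"{number} {line.split(' ', 1)[1]}")

print("\n".join(output))
```

Since groupby only merges consecutive lines, labels 301, 303, 301 would be numbered 1, 2, 3 here, whereas the dictionary-based answers would give 1, 2, 1.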

Answer 3

Use a dict to collect the prefixes, together with a counter for the next new number:

data = """301 my name is joe
303 whatsup
306 how are you doing today
308 what happened?
308 going home
309 let's go"""

prefixes = {}  # original prefix -> new number
i = 1          # next number to hand out

for line in data.split("\n"):
    prefix, rest = line.split(" ", 1)
    pr = int(prefix)
    if pr not in prefixes:
        prefixes[pr] = i
        i += 1
    new_prefix = prefixes[pr]
    print("{} {}".format(new_prefix, rest))

Output:

1 my name is joe
2 whatsup
3 how are you doing today
4 what happened?
4 going home
5 let's go
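The manual counter i can also be folded into the dictionary itself: a collections.defaultdict whose factory is itertools.count(1).__next__ hands out 1, 2, 3, ... the first time each key is looked up. A sketch of that variant, offered as an alternative idiom rather than a change to the answer above:

```python
from collections import defaultdict
from itertools import count

data = """301 my name is joe
303 whatsup
306 how are you doing today
308 what happened?
308 going home
309 let's go"""

# Looking up a missing key calls the factory, which returns the next integer from the counter
prefixes = defaultdict(count(1).__next__)

lines = []
for line in data.split("\n"):
    prefix, rest = line.split(" ", 1)
    lines.append("{} {}".format(prefixes[int(prefix)], rest))

print("\n".join(lines))
```

One caveat of this design: any lookup of a missing key, even an accidental one, consumes a number.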

Answer 4

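def update_text(data):
    lines = data.splitlines()
    # Distinct labels in sorted order; a label's new number is its 1-based position
    labels = sorted({line.split()[0] for line in lines})
    new_number = {label: i for i, label in enumerate(labels, 1)}
    for line in lines:
        label, rest = line.split(None, 1)
        yield str(new_number[label]) + ' ' + rest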

data = '''301 my name is joe
303 whatsup
306 how are you doing today
308 what happened?
308 going home
309 let's go'''

print('\n'.join(update_text(data)))

Output:

1 my name is joe
2 whatsup
3 how are you doing today
4 what happened?
4 going home
5 let's go
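Unlike the other answers, this one numbers labels by sorted order rather than by order of first appearance; the two agree on the sample because the labels already appear in ascending order. Because the labels are strings, sorted compares them lexicographically, which matches numeric order only when all labels have the same number of digits. A small sketch of the difference, with key=int as one way to sort numerically (assuming every label is an integer):

```python
labels = {"301", "1000", "99"}

lexicographic = sorted(labels)       # string comparison, character by character
numeric = sorted(labels, key=int)    # compare the integer values instead

print(lexicographic)  # ['1000', '301', '99']
print(numeric)        # ['99', '301', '1000']
```
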

Another simple solution:

>>> keys = sorted(set([line.split()[0] for line in data.splitlines()]))
>>> for k, v in enumerate(keys):
...     data = data.replace(v, str(k + 1))
... 
>>> print(data)
1 my name is joe
2 whatsup
3 how are you doing today
4 what happened?
4 going home
5 let's go
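Because str.replace rewrites every occurrence of a label anywhere in the text, it can also alter digits inside the message: with a made-up line such as "301 call me at 3015", replacing "301" with "1" would turn 3015 into 15. A sketch of a safer variant with the same sorted-order numbering, which only touches the label at the start of each line by using re.sub with the re.MULTILINE flag so that ^ matches at every line start:

```python
import re

data = """301 my name is joe
303 whatsup
301 call me at 3015"""

keys = sorted({line.split()[0] for line in data.splitlines()})
new_number = {key: str(k + 1) for k, key in enumerate(keys)}

# ^\d+ with re.MULTILINE matches only the leading digits of each line,
# so digits inside the message (like 3015) are left alone
result = re.sub(r"^\d+", lambda m: new_number[m.group()], data, flags=re.MULTILINE)

print(result)
```
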

How do I rename the labels in a list?
