I have a file containing lines like these:
abcd1/klk2reg/lolba3
abcd2/klk34reg/lolba56
wxyz5/klk6reg/temp1
wxyz5/klk99reg
I want output like this:
abcd*/klk*reg/lolba*
wxyz5/klk6reg/temp1
wxyz5/klk99reg
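The words are separated by slashes, and each slash marks a level in a hierarchy. Because they have different depths, wxyz5/klk99reg and wxyz5/klk6reg/temp1 cannot be combined.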
Answer 0 (score: 0)
You can create an object for each input string and then use itertools.groupby:
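import re, itertools

class Path:
    def __init__(self, _path):
        # Keep both the split components and the original string.
        self.path, self._org = _path.split('/'), _path
    def __len__(self):
        return len(self.path)
    def __iter__(self):
        # Yield each component with every run of digits replaced by '*'.
        yield from (re.sub(r'\d+', '*', i) for i in self.path)
    def __eq__(self, _path_obj):
        # Paths match only if they have the same depth and the same normalized components.
        if len(self) != len(_path_obj):
            return False
        return all(a == b for a, b in zip(self, _path_obj))

d = ['abcd1/klk2reg/lolba3', 'abcd2/klk34reg/lolba56', 'wxyz5/klk6reg/temp1', 'wxyz5/klk99reg']
# groupby only merges adjacent equal items, so the input is sorted first.
new_data = [[a, list(b)] for a, b in itertools.groupby(list(map(Path, sorted(d))))]
# Use the starred pattern for groups with more than one member, otherwise the original string.
final_data = ['/'.join(a) if len(b) > 1 else a._org for a, b in new_data]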
Output:
['abcd*/klk*reg/lolba*', 'wxyz5/klk6reg/temp1', 'wxyz5/klk99reg']
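One caveat about the groupby approach above: it merges only adjacent items, and sorting the raw strings does not always put matching paths next to each other (for example, 'a10x/b' sorts between 'a1/b' and 'a2/b'). As a minimal sketch of an alternative, assuming the same rule (digit runs become '*', and depth must match), the paths can be grouped by their normalized pattern in a dict. The function name collapse_paths is hypothetical and not part of the original answer.

```python
import re
from collections import defaultdict

def collapse_paths(paths):
    """Merge paths that differ only in their digit runs."""
    groups = defaultdict(list)  # normalized pattern -> original paths
    for path in paths:
        # Replace every run of digits with '*'. Slashes are left untouched,
        # so paths with different depths can never share a pattern.
        pattern = re.sub(r'\d+', '*', path)
        groups[pattern].append(path)
    # Emit the pattern for merged groups and the original path for singletons.
    # Dicts preserve insertion order, so results follow first appearance.
    return [pattern if len(members) > 1 else members[0]
            for pattern, members in groups.items()]

lines = ['abcd1/klk2reg/lolba3', 'abcd2/klk34reg/lolba56',
         'wxyz5/klk6reg/temp1', 'wxyz5/klk99reg']
print(collapse_paths(lines))
```

To run this on a file, the list could be built with something like [line.strip() for line in open(filename)], where filename is whatever path holds the input. As in the original answer, a path that appears twice verbatim is also collapsed to its starred pattern.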