I have a list of folder names as a 1D array, i.e.:
folderList=['A1_001', 'A1_002', 'A1_003', 'A1_004',
'A2_001', 'A2_002', 'A2_003', 'A2_004',
'A3_001', 'A3_002', 'A3_003', 'A3_004']
and I would like to group the list by the first two characters, i.e. "A1", "A2" and "A3". I think this should be done with groupby, but my code does not work:
from itertools import groupby

sectionName=[] #to get the first two characters of each element into a new list
for file in folderList:
    sectionName.append(file.split('_')[0])
for key, group in groupby(folderList,sectionName):
    print(key)
    for record in group:
        print(record)
I get this error:
for key, group in groupby(folderList,sectionName):
TypeError: 'list' object is not callable
What I would like to get is a result like this:
A1
['A1_001', 'A1_002', 'A1_003', 'A1_004']
A2
['A2_001', 'A2_002', 'A2_003', 'A2_004']
A3
['A3_001', 'A3_002', 'A3_003', 'A3_004']
I think the groupby function needs a key function as its second argument, but so far I have not managed to turn sectionName into a key function.
Thanks in advance for any help.
Answer 0 (score: 0)
In [40]: folderList=['A1_001', 'A1_002', 'A1_003', 'A1_004','A2_001', 'A2_002', 'A2_003', 'A2_004','A3_001', 'A3_002', 'A3_003', 'A3_004','B1_001','B1_002','B1_003','B2_001','B2_002','B2_003']
In [41]: for k, v in groupby(folderList, lambda x:x[:2]):
    ...:     print(k, list(v))
    ...:
A1 ['A1_001', 'A1_002', 'A1_003', 'A1_004']
A2 ['A2_001', 'A2_002', 'A2_003', 'A2_004']
A3 ['A3_001', 'A3_002', 'A3_003', 'A3_004']
B1 ['B1_001', 'B1_002', 'B1_003']
B2 ['B2_001', 'B2_002', 'B2_003']
Or, more simply:
In [42]: result={}
In [43]: for v in folderList:
    ...:     result.setdefault(v[:2],[]).append(v)
    ...:
In [44]: result
Out[44]:
{'A1': ['A1_001', 'A1_002', 'A1_003', 'A1_004'],
'A2': ['A2_001', 'A2_002', 'A2_003', 'A2_004'],
'A3': ['A3_001', 'A3_002', 'A3_003', 'A3_004'],
'B1': ['B1_001', 'B1_002', 'B1_003'],
'B2': ['B2_001', 'B2_002', 'B2_003']}
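As for why the call in the question fails: groupby calls its second argument once per element to compute the key, so it has to be a function, not a precomputed list of keys. Here is a minimal sketch in Python 3 that reproduces the error and then passes a function instead. The names folder_list, section_names and grouped are only for illustration.

```python
from itertools import groupby

folder_list = ['A1_001', 'A1_002', 'A2_001']  # shortened sample list

# A precomputed list of prefixes; this is a list, not a callable.
section_names = [name.split('_')[0] for name in folder_list]

try:
    # groupby tries to call section_names(item) and fails.
    for key, group in groupby(folder_list, section_names):
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing a function that maps one item to its key works.
grouped = [(key, list(group))
           for key, group in groupby(folder_list, key=lambda name: name.split('_')[0])]

print(error_message)
print(grouped)
```

The key function receives one folder name at a time, which is why a lambda or a small def fits here and a list does not.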
Answer 1 (score: 0)
For example:
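import itertools
# groupby only merges adjacent items, so folderList must already be sorted by prefix
grouped = {prefix: list(folders)
           for prefix, folders in itertools.groupby(folderList, lambda x: x[:2])}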
An alternative that does not require folderList to be sorted:
from collections import defaultdict
grouped = defaultdict(list)
for folder in folderList:
    grouped[folder[:2]].append(folder)
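To see why the sorting matters, here is a small Python 3 sketch with a deliberately unsorted list. The list shuffled and the names runs, lossy and grouped are made up for this illustration and are not from the question.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical unsorted input: prefixes alternate instead of being adjacent.
shuffled = ['A1_001', 'A2_001', 'A1_002', 'A2_002']

def prefix(name):
    return name[:2]

# groupby starts a new group every time the key changes,
# so each prefix appears in more than one run.
runs = [(key, list(items)) for key, items in groupby(shuffled, key=prefix)]

# In a dict comprehension a later run overwrites an earlier one with the same key.
lossy = {key: list(items) for key, items in groupby(shuffled, key=prefix)}

# The defaultdict loop collects every item regardless of input order.
grouped = defaultdict(list)
for folder in shuffled:
    grouped[prefix(folder)].append(folder)

print(runs)
print(lossy)
print(dict(grouped))
```

With unsorted input the groupby version silently drops items, so either sort first or use the loop.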
Answer 2 (score: 0)
A simple loop and a defaultdict will do:
from collections import defaultdict
folderList=['A1_001', 'A1_002', 'A1_003', 'A1_004',
            'A2_001', 'A2_002', 'A2_003', 'A2_004',
            'A3_001', 'A3_002', 'A3_003', 'A3_004']
sections = defaultdict(list)  # a missing prefix starts out as an empty list
for folder in folderList:
    sections[folder[:2]].append(folder)
print(list(sections.values()))
On Python 3.7+, where dicts keep insertion order, this prints:
[['A1_001', 'A1_002', 'A1_003', 'A1_004'], ['A2_001', 'A2_002', 'A2_003', 'A2_004'], ['A3_001', 'A3_002', 'A3_003', 'A3_004']]
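The drawbacks of groupby are that the input must be sorted and that it yields iterators rather than lists. In your case it sounds like you want lists, so you would need the extra step of calling list on each group to materialize it. The loop above is a simple way to get what you want.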
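The iterator point can be sketched as follows in Python 3. Each group shares the underlying iterator with groupby, so a group has to be turned into a list before groupby moves on to the next key. The names folders, prefix, stale and materialized are illustrative only.

```python
from itertools import groupby

folders = ['A1_001', 'A1_002', 'A2_001']  # shortened sample list

def prefix(name):
    return name[:2]

# Keeping the group objects and reading them later:
# by then groupby has already advanced past the first group.
stale = [(key, group) for key, group in groupby(folders, key=prefix)]
first_stale_group = list(stale[0][1])

# Calling list() on each group while it is still current keeps its contents.
materialized = [(key, list(group)) for key, group in groupby(folders, key=prefix)]

print(first_stale_group)
print(materialized)
```

The defaultdict loop avoids this issue entirely because it appends to real lists from the start.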
Answer 3 (score: 0)
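from itertools import groupby

folderList.sort()  # groupby only merges adjacent items, so sort first

def sectionName(sec):
    # key function: everything before the first underscore
    return sec.split('_', 1)[0]

for key, lst in groupby(folderList, sectionName):
    print(key)
    for record in lst:
        print(record)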
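Splitting on the underscore and slicing the first two characters give the same keys for the names in the question, but they differ once a prefix is longer than two characters. The sketch below (Python 3) uses made-up names such as 'A10_001' to show the difference. section_name, by_split and by_slice are hypothetical names for this example.

```python
from itertools import groupby

def section_name(folder):
    # Everything before the first underscore.
    return folder.split('_', 1)[0]

def first_two(folder):
    # Fixed-width prefix, as in the slicing answers.
    return folder[:2]

# Hypothetical names with prefixes of different lengths.
folders = ['A10_001', 'A1_001', 'A10_002', 'A1_002']

# Sort and group with the same key function so equal keys end up adjacent.
by_split = {key: list(group)
            for key, group in groupby(sorted(folders, key=section_name), key=section_name)}
by_slice = {key: list(group)
            for key, group in groupby(sorted(folders, key=first_two), key=first_two)}

print(by_split)
print(by_slice)
```

If the prefixes are always exactly two characters, either key works. Otherwise the split version is the safer choice.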