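I am trying to find an efficient way to iterate over the elements of a list and collect the elements that share a common pattern into another list, grouplist.
Example: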
In[]: grouplist = []
In[]: filelist
Out[]:['C:\\West-California-North-10.xlsx',
'C:\\West-California-North-5.xlsx',
'C:\\West-California-East-1.xlsx',
'C:\\West-California-South-1.xlsx',
'C:\\South-California-North-5.xlsx',
'C:\\West-California-South-3.xlsx']
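I want to find the groups of paths that share a common pattern and differ only in the trailing integer. So in this case,
first iteration: grouplist =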
C:\\West-California-North-10.xlsx
C:\\West-California-North-5.xlsx
second iteration: grouplist =
C:\\West-California-East-1.xlsx
third iteration: grouplist =
C:\\West-California-South-1.xlsx
C:\\West-California-South-3.xlsx
Answer 0 (score: 2)
itertools.groupby is your friend:
from itertools import groupby
filelist = [
'C:\\West-California-North-10.xlsx',
'C:\\West-California-North-5.xlsx',
'C:\\West-California-East-1.xlsx',
'C:\\West-California-South-1.xlsx',
'C:\\South-California-North-5.xlsx',
'C:\\West-California-South-3.xlsx']
key_fn = lambda s: s.rsplit('-',1)[0]
# before grouping, list has to be sorted
filelist = sorted(filelist, key=key_fn)
# usually use the same key_fn for grouping as was used for sorting
for key, grouped_file_names in groupby(filelist, key=key_fn):
    # groupby returns an iterator of tuples
    # the first element of the tuple is the grouped key value
    # the second element is an iterator over the items that matched that key
    # (YOU MUST CONSUME THIS ITERATOR BEFORE MOVING ON TO THE NEXT KEY)
    print('\n'.join(grouped_file_names))
    print('')
This prints:
C:\South-California-North-5.xlsx

C:\West-California-East-1.xlsx

C:\West-California-North-10.xlsx
C:\West-California-North-5.xlsx

C:\West-California-South-1.xlsx
C:\West-California-South-3.xlsx
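The question asks for the groups to end up in a list called grouplist rather than being printed. A minimal sketch of that, assuming Python 3 and the same rsplit-based key as above; group_key is a name chosen here for illustration:

```python
from itertools import groupby

filelist = [
    'C:\\West-California-North-10.xlsx',
    'C:\\West-California-North-5.xlsx',
    'C:\\West-California-East-1.xlsx',
    'C:\\West-California-South-1.xlsx',
    'C:\\South-California-North-5.xlsx',
    'C:\\West-California-South-3.xlsx',
]


def group_key(path):
    # Everything before the last hyphen, e.g. 'C:\\West-California-North'
    return path.rsplit('-', 1)[0]


# groupby only merges adjacent items, so sort by the same key first.
# list(group) consumes each group iterator before groupby advances.
grouplist = [
    list(group)
    for _, group in groupby(sorted(filelist, key=group_key), key=group_key)
]

for group in grouplist:
    print(group)
```

Because sorted is stable, paths that share a key keep their original relative order inside each group.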
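Answer 1 (score: 1)
You can use a dictionary to categorize the paths by location name.
To separate the location name from the trailing ID, use str.rsplit(), then call dict.setdefault() with a set() object as the default value, so that each group keeps only unique names: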
>>> lst=['C:\\West-California-North-10.xlsx', 'C:\\West-California-North-5.xlsx','C:\\West-California-East-1.xlsx','C:\\West-California-South-1.xlsx','C:\\South-California-North-5.xlsx','C:\\West-California-South-3.xlsx']
>>> d = {}
>>> new = [path.rsplit('-',1) for path in lst]
>>> for i, j in new:
...     d.setdefault(i, set()).add(i + '-' + j)
...
>>> d.values()
[set(['C:\\West-California-East-1.xlsx']),
set(['C:\\West-California-North-10.xlsx','C:\\West-California-North-5.xlsx']),
set(['C:\\South-California-North-5.xlsx']),
set(['C:\\West-California-South-1.xlsx', 'C:\\West-California-South-3.xlsx'])]
>>>
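If the order of the paths matters, lists can stand in for the sets. The following is only a sketch, assuming Python 3.7 or later, where a plain dict preserves insertion order; the names paths and groups are illustrative, and duplicates are skipped by hand because a list does not do that on its own:

```python
paths = [
    'C:\\West-California-North-10.xlsx',
    'C:\\West-California-North-5.xlsx',
    'C:\\West-California-East-1.xlsx',
    'C:\\West-California-South-1.xlsx',
    'C:\\South-California-North-5.xlsx',
    'C:\\West-California-South-3.xlsx',
]

groups = {}
for path in paths:
    # Location name without the trailing ID, e.g. 'C:\\West-California-North'
    prefix = path.rsplit('-', 1)[0]
    members = groups.setdefault(prefix, [])
    if path not in members:  # keep names unique, as the set did
        members.append(path)

# One list per location, in the order each location was first seen
grouplist = list(groups.values())
print(grouplist)
```

The membership test on a list is linear, so for very large groups a set alongside the list would be the cheaper choice.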
Answer 2 (score: 1)
Using a defaultdict:
from collections import defaultdict
d = defaultdict(set)
# filelist is the list of paths from the question
for fle in filelist:
    k, rest = fle.rsplit("-", 1)
    d[k].add("{}-{}".format(k, rest))
for k, v in d.items():
    print("\n".join(v))
    print("")
Output (sets are unordered, so the order within each group is not guaranteed):
C:\West-California-East-1.xlsx

C:\West-California-North-10.xlsx
C:\West-California-North-5.xlsx

C:\South-California-North-5.xlsx

C:\West-California-South-1.xlsx
C:\West-California-South-3.xlsx
If you want to keep the groups in the order in which their keys were first seen, use an OrderedDict:
from collections import OrderedDict
d = OrderedDict()
# filelist is the list of paths from the question
for fle in filelist:
    k, rest = fle.rsplit("-", 1)
    d.setdefault(k, set()).add("{}-{}".format(k, rest))
for k, v in d.items():
    print("\n".join(v))
    print("")
Output:
C:\West-California-North-10.xlsx
C:\West-California-North-5.xlsx

C:\West-California-East-1.xlsx

C:\West-California-South-1.xlsx
C:\West-California-South-3.xlsx

C:\South-California-North-5.xlsx
If the names themselves contain no digits, you can also use str.translate to strip the digits instead of splitting:
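from collections import defaultdict
d = defaultdict(set)
# filelist is the list of paths from the question
for fle in filelist:
    # Python 2 signature; on Python 3 use fle.translate(str.maketrans('', '', '0123456789'))
    d[fle.translate(None, "0123456789")].add(fle)
for k, v in d.items():
    print("\n".join(v))
    print("")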
Output (the order of the groups is not guaranteed):
C:\West-California-East-1.xlsx

C:\West-California-South-1.xlsx
C:\West-California-South-3.xlsx

C:\South-California-North-5.xlsx

C:\West-California-North-10.xlsx
C:\West-California-North-5.xlsx
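The two-argument form of str.translate shown above exists only in Python 2. A rough Python 3 equivalent, assuming the same filelist as in the question, builds a deletion table with str.maketrans; STRIP_DIGITS and groups are names picked here for illustration:

```python
from collections import defaultdict

filelist = [
    'C:\\West-California-North-10.xlsx',
    'C:\\West-California-North-5.xlsx',
    'C:\\West-California-East-1.xlsx',
    'C:\\West-California-South-1.xlsx',
    'C:\\South-California-North-5.xlsx',
    'C:\\West-California-South-3.xlsx',
]

# The third argument of str.maketrans lists characters to delete
STRIP_DIGITS = str.maketrans('', '', '0123456789')

groups = defaultdict(set)
for path in filelist:
    # The key is the path with every digit removed,
    # e.g. 'C:\\West-California-North-.xlsx'
    groups[path.translate(STRIP_DIGITS)].add(path)

for members in groups.values():
    print('\n'.join(sorted(members)))
    print()
```

As with the original, this only works when the location names contain no digits of their own, since every digit in the path is removed from the key.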
Answer 3 (score: 1)
How about using sorted and a regex? You can modify this and get more control over the ordering by changing the sorter function.
import re
d = ['C:\\West-California-North-10.xlsx',
'C:\\West-California-North-5.xlsx',
'C:\\West-California-East-1.xlsx',
'C:\\West-California-South-3.xlsx',
'C:\\West-California-South-1.xlsx',
'C:\\South-California-North-5.xlsx',
'C:\\West-California-South-3.xlsx']
def sorter(s):
    direction1 = re.findall(r'(\w+)-California-', s)[0]  # first direction, e.g. West/South
    direction2 = re.findall(r'California-(\w+)', s)[0]  # second direction, e.g. North/East/South
    num = int(re.findall(r'California-\w+-(\w+)', s)[0])  # 10 or 5 or 1 or 3
    return direction1, direction2, num
dd = sorted(d, key=sorter)
for t in dd:
    print(t)
Output:
C:\South-California-North-5.xlsx
C:\West-California-East-1.xlsx
C:\West-California-North-5.xlsx
C:\West-California-North-10.xlsx
C:\West-California-South-1.xlsx
C:\West-California-South-3.xlsx
C:\West-California-South-3.xlsx
Example of customizing the sorter function: if you change it as follows, dropping the number from the sort key,
def sorter(s):
    direction1 = re.findall(r'(\w+)-California-', s)[0]  # first direction, e.g. West/South
    direction2 = re.findall(r'California-(\w+)', s)[0]  # second direction, e.g. North/East/South
    return direction1, direction2  # num omitted here
then the output is:
C:\South-California-North-5.xlsx
C:\West-California-East-1.xlsx
C:\West-California-North-10.xlsx
C:\West-California-North-5.xlsx
C:\West-California-South-3.xlsx
C:\West-California-South-1.xlsx
C:\West-California-South-3.xlsx
Finally, you can iterate over them group by group like this:
import re
from collections import defaultdict,OrderedDict
d = ['C:\\West-California-North-10.xlsx',
'C:\\West-California-North-5.xlsx',
'C:\\West-California-East-1.xlsx',
'C:\\West-California-South-3.xlsx',
'C:\\West-California-South-1.xlsx',
'C:\\South-California-North-5.xlsx',
'C:\\West-California-South-3.xlsx']
group_data = defaultdict(list)
def sorter(s):
    direction1 = re.findall(r'(\w+)-California-', s)[0]  # first direction, e.g. West/South
    direction2 = re.findall(r'California-(\w+)', s)[0]  # second direction, e.g. North/East/South
    num = int(re.findall(r'California-\w+-(\w+)', s)[0])  # 10 or 5 or 1 or 3
    return direction1, direction2, num
dd = sorted(d, key=sorter)
for t in dd:
    # the key is everything up to the first digit, e.g. 'C:\\West-California-North-'
    key = re.findall(r'([^\d]+)\d', t)[0]
    group_data[key].append(t)
dt = OrderedDict(sorted(group_data.items(), key=lambda x: x[0]))
for it in dt:
    print('\n'.join(dt[it]) + '\n')
Output:
C:\South-California-North-5.xlsx

C:\West-California-East-1.xlsx

C:\West-California-North-5.xlsx
C:\West-California-North-10.xlsx

C:\West-California-South-1.xlsx
C:\West-California-South-3.xlsx
C:\West-California-South-3.xlsx
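The sort-then-group steps above can also be folded into a single pass. This is a sketch under the assumption that every path has the shape prefix-integer.xlsx; PATTERN and group_and_sort are hypothetical names, not part of the answer:

```python
import re
from collections import defaultdict

# Assumed file-name shape: <prefix>-<integer>.xlsx
PATTERN = re.compile(r'(?P<prefix>.*)-(?P<num>\d+)\.xlsx')


def group_and_sort(paths):
    """Return groups ordered by prefix, each sorted by its trailing integer."""
    groups = defaultdict(list)
    for path in paths:
        match = PATTERN.fullmatch(path)
        if match is None:
            raise ValueError('unexpected file name: %r' % path)
        # Store the integer next to the path so 5 sorts before 10
        groups[match.group('prefix')].append((int(match.group('num')), path))
    return [
        [path for _, path in sorted(items)]
        for _, items in sorted(groups.items())
    ]


d = ['C:\\West-California-North-10.xlsx',
     'C:\\West-California-North-5.xlsx',
     'C:\\West-California-East-1.xlsx',
     'C:\\West-California-South-3.xlsx',
     'C:\\West-California-South-1.xlsx',
     'C:\\South-California-North-5.xlsx',
     'C:\\West-California-South-3.xlsx']

for group in group_and_sort(d):
    print('\n'.join(group) + '\n')
```

Sorting on the integer rather than on the text is what puts North-5 ahead of North-10.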
Answer 4 (score: 1)
Here is another approach using a regular expression and itertools.groupby:
import re
from itertools import groupby
filelist = ['C:\\West-California-North-10.xlsx',
'C:\\West-California-North-5.xlsx',
'C:\\West-California-East-1.xlsx',
'C:\\West-California-South-1.xlsx',
'C:\\South-California-North-5.xlsx',
'C:\\West-California-South-3.xlsx']
keyfunc = lambda x: re.match(r'(.*)-\d+\.xlsx', x).group(1)
# sort with the same key so that equal keys end up adjacent, then reverse the group order
grouplist = [list(v) for k, v in groupby(sorted(filelist, key=keyfunc), key=keyfunc)][::-1]
for group in grouplist:
    print(group)
Output:
['C:\\West-California-South-1.xlsx', 'C:\\West-California-South-3.xlsx']
['C:\\West-California-North-10.xlsx', 'C:\\West-California-North-5.xlsx']
['C:\\West-California-East-1.xlsx']
['C:\\South-California-North-5.xlsx']
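The sort matters because groupby only merges items that sit next to each other. A small Python 3 sketch of what happens when the sort is skipped, using the same regex key; unsorted_groups and sorted_groups are illustrative names:

```python
import re
from itertools import groupby

filelist = ['C:\\West-California-North-10.xlsx',
            'C:\\West-California-North-5.xlsx',
            'C:\\West-California-East-1.xlsx',
            'C:\\West-California-South-1.xlsx',
            'C:\\South-California-North-5.xlsx',
            'C:\\West-California-South-3.xlsx']


def keyfunc(path):
    # Everything before the trailing '-<digits>.xlsx'
    return re.match(r'(.*)-\d+\.xlsx', path).group(1)


# Without sorting, the two West-California-South files are separated by
# another key, so groupby starts a new group for the second one.
unsorted_groups = [list(g) for _, g in groupby(filelist, key=keyfunc)]

# Sorting by the same key first makes equal keys adjacent.
sorted_groups = [list(g) for _, g in groupby(sorted(filelist, key=keyfunc), key=keyfunc)]

print(len(unsorted_groups))  # 5
print(len(sorted_groups))    # 4
```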