I'm trying to split a 2D array into a specific format, but I can't figure out the last step. A sample of my data is structured as follows:
# Original Data
fileListCode = [['Seq3.xls', 'B08524_057'],
['Seq3.xls', 'B08524_053'],
['Seq3.xls', 'B08524_054'],
['Seq98.xls', 'B25034_001'],
['Seq98.xls', 'D25034_002'],
['Seq98.xls', 'B25034_003']]
I'm trying to split it so that it looks like this:
# split into [['Seq3.xls', {'B08524_057': 1, 'B08524_053': 2, 'B08524_054': 3}],
#             ['Seq98.xls', {'B25034_001': 1, 'D25034_002': 2, 'B25034_003': 3}]]
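The dictionary values 1, 2, 3 are based on each entry's original position, counting from the first occurrence of its filename. To do this, I first created an array holding all the unique filenames (anything ending in .xls is a filename):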
tmpFileList = []
tmpCodeList = []
arrayListDict = []
# store the unique filenames in a temporary array:
for i in range( len(fileListCode)):
    if fileListCode[i][0] not in tmpFileList:
        tmpFileList.append( fileListCode[i][0] )
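However, I'm struggling with the next step. I can't find a good way to extract the codes (e.g. B08524_052) and convert them into a dictionary whose values are indices based on their position.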
# make array to store filelist, and codes with dictionary values
for i in range( len(tmpFileList)):
    arrayListDict.append([tmpFileList[i], {}])
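This code only produces [['Seq3.xls', {}], ['Seq98.xls', {}]]; I'm not sure whether I should generate the structure first and then try to add the codes and dictionary values, or whether there is a better way.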
-
Edit: I just changed fileListCode
Answer 0 (score: 4)
This process is simpler with itertools.groupby:
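>>> import itertools, operator
>>> key = operator.itemgetter(0)
>>> # groupby only merges adjacent rows, so sort by filename first; the groups
>>> # are stored in a list so they can be iterated more than once below
>>> grouped = [(name, list(rows)) for name, rows in
...            itertools.groupby(sorted(fileListCode, key=key), key=key)]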
>>> [(i, {k[1]: n for n, k in enumerate(j, 1)}) for i, j in grouped]
[('Seq3.xls', {'B08524_052': 1, 'B08524_053': 2, 'B08524_054': 3}),
('Seq98.xls', {'B25034_001': 1, 'B25034_002': 2, 'B25034_003': 3})]
For older Python versions (before 2.7, which have no dict comprehensions):
>>> [(i, dict((k[1], n) for n, k in enumerate(j, 1))) for i, j in grouped]
[('Seq3.xls', {'B08524_052': 1, 'B08524_053': 2, 'B08524_054': 3}),
('Seq98.xls', {'B25034_001': 1, 'B25034_002': 2, 'B25034_003': 3})]
But I think using a dict would be better:
>>> {i: {k[1]: n for n, k in enumerate(j, 1)} for i, j in grouped}
{'Seq3.xls': {'B08524_052': 1, 'B08524_053': 2, 'B08524_054': 3},
'Seq98.xls': {'B25034_001': 1, 'B25034_002': 2, 'B25034_003': 3}}
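The interactive session above can also be written as a small script. This is only a sketch: the function name group_codes is hypothetical, and the data is the sample from the question. It relies on two facts: itertools.groupby only merges adjacent rows, which is why the input is sorted first, and sorted() is stable, so the codes keep their original relative order within each filename.

```python
import itertools
import operator


def group_codes(rows):
    """Map each filename to {code: 1-based position within that file}.

    group_codes is a hypothetical helper name, not part of any library.
    """
    key = operator.itemgetter(0)
    # Stable sort: rows with the same filename keep their original order.
    ordered = sorted(rows, key=key)
    return {
        name: {code: pos for pos, (_, code) in enumerate(group, 1)}
        for name, group in itertools.groupby(ordered, key=key)
    }


file_list_code = [['Seq3.xls', 'B08524_057'],
                  ['Seq3.xls', 'B08524_053'],
                  ['Seq3.xls', 'B08524_054'],
                  ['Seq98.xls', 'B25034_001'],
                  ['Seq98.xls', 'D25034_002'],
                  ['Seq98.xls', 'B25034_003']]

result = group_codes(file_list_code)
print(result)
```

One consequence of sorting is that the filenames come out in lexicographic order rather than in order of first appearance (for example, 'Seq10.xls' would sort before 'Seq3.xls').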
Answer 1 (score: 2)
You're mixing up lists and dictionaries.
It would make more sense to do something more like this:
file_list_code = [['Seq3.xls', 'B08524_052'],
['Seq3.xls', 'B08524_053'],
['Seq3.xls', 'B08524_054'],
['Seq98.xls', 'B25034_001'],
['Seq98.xls', 'B25034_002'],
['Seq98.xls', 'B25034_003']]
file_codes = {}
for name, code in file_list_code:
    if name not in file_codes:
        file_codes[name] = []
    file_codes[name].append(code)
This produces:
{'Seq3.xls': ['B08524_052', 'B08524_053', 'B08524_054'],
'Seq98.xls': ['B25034_001', 'B25034_002', 'B25034_003']}
This can be simplified further by using defaultdict. It may be overkill for something this simple, but it's good to know about. Here's an example:
import collections
file_list_code = [['Seq3.xls', 'B08524_052'],
['Seq3.xls', 'B08524_053'],
['Seq3.xls', 'B08524_054'],
['Seq98.xls', 'B25034_001'],
['Seq98.xls', 'B25034_002'],
['Seq98.xls', 'B25034_003']]
file_codes = collections.defaultdict(list)
for name, code in file_list_code:
    file_codes[name].append(code)
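The lists built this way do not yet carry the 1-based positions the question asks for. One possible follow-up step, sketched here with the same sample data, turns each list into a {code: position} dictionary with enumerate. The name file_positions is hypothetical.

```python
import collections

file_list_code = [['Seq3.xls', 'B08524_052'],
                  ['Seq3.xls', 'B08524_053'],
                  ['Seq3.xls', 'B08524_054'],
                  ['Seq98.xls', 'B25034_001'],
                  ['Seq98.xls', 'B25034_002'],
                  ['Seq98.xls', 'B25034_003']]

# Group the codes per filename, as in the defaultdict example.
file_codes = collections.defaultdict(list)
for name, code in file_list_code:
    file_codes[name].append(code)

# enumerate(codes, 1) numbers each code from 1 in order of appearance.
# file_positions is a hypothetical name for the converted structure.
file_positions = {
    name: {code: pos for pos, code in enumerate(codes, 1)}
    for name, codes in file_codes.items()
}
print(file_positions)
```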
Answer 2 (score: 1)
fileListCode = [['Seq3.xls', 'B08524_052'],
['Seq3.xls', 'B08524_053'],
['Seq3.xls', 'B08524_054'],
['Seq98.xls', 'B25034_001'],
['Seq98.xls', 'B25034_002'],
['Seq98.xls', 'B25034_003']]
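dico = {}   # filename -> index of its entry in li
li = []     # [[filename, {code: position}], ...] in order of first appearance
for a, b in fileListCode:
    if a in dico:
        # known file: the next position is the current number of codes plus one
        li[dico[a]][1][b] = len(li[dico[a]][1]) + 1
    else:
        # first occurrence: remember where this file's entry lives in li
        dico[a] = len(li)
        li.append([a, {b: 1}])
print('\n'.join(map(str, li)))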
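The same single-pass idea can be sketched without the separate index dictionary, assuming Python 3.7 or later, where a plain dict preserves insertion order. The function name split_by_file is hypothetical. Unlike the loop above, this variant keeps the first position if a code is repeated for the same file, which is an assumption about the desired behaviour rather than something the question specifies.

```python
def split_by_file(rows):
    """Return [[filename, {code: position}], ...] in order of first appearance.

    split_by_file is a hypothetical helper; it assumes Python 3.7+,
    where dicts remember insertion order.
    """
    positions = {}
    for name, code in rows:
        codes = positions.setdefault(name, {})
        # A repeated code keeps its first position (assumed behaviour).
        codes.setdefault(code, len(codes) + 1)
    return [[name, codes] for name, codes in positions.items()]


file_list_code = [['Seq3.xls', 'B08524_057'],
                  ['Seq3.xls', 'B08524_053'],
                  ['Seq3.xls', 'B08524_054'],
                  ['Seq98.xls', 'B25034_001'],
                  ['Seq98.xls', 'D25034_002'],
                  ['Seq98.xls', 'B25034_003']]

split = split_by_file(file_list_code)
print('\n'.join(map(str, split)))
```

Because nothing is sorted, the files appear in the order they are first seen in the input, and rows for the same file do not have to be adjacent.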