With reference to my question here, I managed to build a list structure in the format given below:
(hours,color,type,text)
[('1', '2', 'a', '564'),
('1', '3', 'b', '570'),
('1', '4', 'c', '570'),
('5', '6', 'a', '560'),
('5', '7', 'b', '570'),
('5', '8', 'c', '580'),
('9', '10', 'a', '560'),
('9', '11', 'b', '570'),
('9', '12', 'c', '580')]
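I have also referred to here, but I could not get rid of all the repeated 1s, 5s, and 9s.
What I want: to compare the two files given in the link above, I want to create a dictionary structure as described below and then compare the contents of the dictionaries individually.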
{'1':[2,3,4,a,b,c,564,570,570],
'5':[6,7,8,a,b,c,560,570,580],
'9':[10,11,12,a,b,c,560,570,580]}
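Because the data in both files is large, I cannot simply compare them with loops. So I decided to build a dedicated dictionary for each 'hour' attribute of the 'location' element, containing all of its 'feature' entries. I have wanted to do this for a long time but could not get started. Can you help me?
To keep this post easy to read, I have not pasted the original XML from the link above.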
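Answer 0 (score: 3)
You can create the dictionary in two steps. The first step groups the tuples by the first value in each tuple; the second step flattens the now-grouped items into a single list per key. The code is written for tuples containing two or more items, but the exact number does not matter.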
from collections import defaultdict
from itertools import chain
from pprint import pprint
tuples = [('1', '2', 'a', '564'),
('1', '3', 'b', '570'),
('1', '4', 'c', '570'),
('5', '6', 'a', '560'),
('5', '7', 'b', '570'),
('5', '8', 'c', '580'),
('9', '10', 'a', '560'),
('9', '11', 'b', '570'),
('9', '12', 'c', '580')]
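# Step 1: group the remaining fields of each tuple under its first value.
d = defaultdict(list)
for row in tuples:
    d[row[0]].append(row[1:])
# Step 2: transpose each group with zip(*v) and flatten it into one list.
for k, v in d.items():
    d[k] = list(chain.from_iterable(zip(*v)))
# Convert to a plain dict so the printed form matches the output below.
pprint(dict(d))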
Output:
{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'],
 '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580'],
 '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580']}
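Answer 1 (score: 1)
First, loop over the list of tuples, building a dictionary that maps element 0 of each tuple to its remaining elements. This produces a dictionary whose keys are element 0 of each tuple and whose values are lists of tuples, each tuple representing one row that shares that element 0. Then flatten each list column-wise using itertools.chain and itertools.izip.
Python 2.7 solution: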
#!/usr/bin/env python
from __future__ import print_function
from itertools import chain, izip
data = [
('1', '2', 'a', '564'),
('1', '3', 'b', '570'),
('1', '4', 'c', '570'),
('5', '6', 'a', '560'),
('5', '7', 'b', '570'),
('5', '8', 'c', '580'),
('9', '10', 'a', '560'),
('9', '11', 'b', '570'),
('9', '12', 'c', '580')
]
# First, sort the values in rows into lists by their first element.
step1 = {}
for row in data:
    step1.setdefault(row[0], [])
    step1[row[0]].append(row[1:])
print("Step 1:")
print(repr(step1))
# Now to flatten a sequence-of-sequences column-wise,
# use list(itertools.chain(*itertools.izip(*seq)))
step2 = dict((k, list(chain(*izip(*v))))
             for k, v in step1.iteritems())
print("Step 2:")
print(repr(step2))
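The code above depends on Python 2 only names (`izip`, `iteritems`). As a rough sketch of how the same two steps might look on Python 3, assuming the same `data` layout, the built-in `zip` and `dict.items()` can stand in for them. The names `grouped` and `flattened` are illustrative and not part of the original answer.

```python
from itertools import chain

data = [
    ('1', '2', 'a', '564'),
    ('1', '3', 'b', '570'),
    ('1', '4', 'c', '570'),
    ('5', '6', 'a', '560'),
    ('5', '7', 'b', '570'),
    ('5', '8', 'c', '580'),
    ('9', '10', 'a', '560'),
    ('9', '11', 'b', '570'),
    ('9', '12', 'c', '580'),
]

# Step 1: group the trailing fields of each row under the row's first element.
grouped = {}
for row in data:
    grouped.setdefault(row[0], []).append(row[1:])

# Step 2: zip(*rows) transposes rows into columns, and chain.from_iterable
# concatenates those columns into one flat list per key.
flattened = {key: list(chain.from_iterable(zip(*rows)))
             for key, rows in grouped.items()}

print(flattened)
```

On Python 3.7 and later, dictionaries keep insertion order, so the keys come out in the order they first appear in `data`.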
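Answer 2 (score: 1)
You can use itertools.groupby to group the tuples by the element at index 0, then loop over the groups to build the dictionary. Note that groupby only merges consecutive items with the same key, so this relies on the list already being ordered by that element, as it is here.
Example -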
>>> from itertools import groupby
>>> l = [('1', '2', 'a', '564'),
... ('1', '3', 'b', '570'),
... ('1', '4', 'c', '570'),
... ('5', '6', 'a', '560'),
... ('5', '7', 'b', '570'),
... ('5', '8', 'c', '580'),
... ('9', '10', 'a', '560'),
... ('9', '11', 'b', '570'),
... ('9', '12', 'c', '580')]
>>> x = groupby(l, key = lambda x: x[0])
>>> d = {}
>>> for y, z in x:
...     l1 = []
...     l2 = []
...     l3 = []
...     for a in z:
...         l1.append(a[1])
...         l2.append(a[2])
...         l3.append(a[3])
...     l1.extend(l2)
...     l1.extend(l3)
...     d[y] = l1
...
>>> d
{'5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580'], '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580'], '1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570']}
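If the rows are not already ordered by their first element, `groupby` would yield the same key more than once and later groups would overwrite earlier ones in the dictionary. A minimal sketch of one way around that, assuming it is acceptable to sort a copy of the list first, is shown below. The shortened, deliberately shuffled `rows` list and the names `first_field` and `result` are made up for illustration.

```python
from itertools import chain, groupby
from operator import itemgetter

# Hypothetical input: same row layout as above, but not ordered by hour.
rows = [
    ('5', '6', 'a', '560'),
    ('1', '2', 'a', '564'),
    ('5', '7', 'b', '570'),
    ('1', '3', 'b', '570'),
]

first_field = itemgetter(0)
result = {}

# sorted() is stable, so rows sharing a key keep their original relative order.
for key, group in groupby(sorted(rows, key=first_field), key=first_field):
    # Drop the key from each row, then transpose the rows into columns.
    columns = zip(*(row[1:] for row in group))
    result[key] = list(chain.from_iterable(columns))

print(result)
```

The keys here are strings, so the sort is lexicographic; that does not matter for grouping, since the only requirement is that equal keys end up next to each other.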
Answer 3 (score: 1)
Here is another approach:
import itertools
data = [('1', '2', 'a', '564'),
('1', '3', 'b', '570'),
('1', '4', 'c', '570'),
('5', '6', 'a', '560'),
('5', '7', 'b', '570'),
('5', '8', 'c', '580'),
('9', '10', 'a', '560'),
('9', '11', 'b', '570'),
('9', '12', 'c', '580')]
ddata = {}
for hour, color, type, text in data:
    lcontent = ddata.setdefault(hour, [[],[],[]])
    lcontent[0].append(color)
    lcontent[1].append(type)
    lcontent[2].append(text)
ddata = {hour: list(itertools.chain.from_iterable(content)) for (hour, content) in ddata.iteritems()}
print ddata
After the for loop, the dictionary has the following form, which may actually be more useful than the format you asked for:
{'1': [['2', '3', '4'], ['a', 'b', 'c'], ['564', '570', '570']], '9': [['10', '11', '12'], ['a', 'b', 'c'], ['560', '570', '580']], '5': [['6', '7', '8'], ['a', 'b', 'c'], ['560', '570', '580']]}
I then apply a dictionary comprehension to flatten the list entries into the format you specified:
{'1': ['2', '3', '4', 'a', 'b', 'c', '564', '570', '570'], '9': ['10', '11', '12', 'a', 'b', 'c', '560', '570', '580'], '5': ['6', '7', '8', 'a', 'b', 'c', '560', '570', '580']}
This is a Python 2.7 solution.
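Since the stated goal in the question is to compare the dictionaries built from two files, here is a minimal sketch of a key-by-key comparison, assuming each file has already been turned into a dictionary of the nested form shown above. The function name `diff_by_hour` and the sample dictionaries `file_a` and `file_b` are hypothetical and only serve the illustration.

```python
def diff_by_hour(first, second):
    """Return {hour: (value_in_first, value_in_second)} for hours that differ.

    An hour missing from one of the dictionaries is reported with None
    on that side.
    """
    differences = {}
    for hour in sorted(set(first) | set(second)):
        a = first.get(hour)
        b = second.get(hour)
        if a != b:
            differences[hour] = (a, b)
    return differences


# Hypothetical sample data in the nested per-hour format.
file_a = {
    '1': [['2', '3', '4'], ['a', 'b', 'c'], ['564', '570', '570']],
    '5': [['6', '7', '8'], ['a', 'b', 'c'], ['560', '570', '580']],
}
file_b = {
    '1': [['2', '3', '4'], ['a', 'b', 'c'], ['564', '570', '571']],
    '5': [['6', '7', '8'], ['a', 'b', 'c'], ['560', '570', '580']],
    '9': [['10', '11', '12'], ['a', 'b', 'c'], ['560', '570', '580']],
}

changes = diff_by_hour(file_a, file_b)
for hour, (a, b) in changes.items():
    print(hour, a, b)
```

Dictionary lookups are constant time on average, so this touches each hour once rather than comparing every row of one file against every row of the other.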