I am trying to read an Excel spreadsheet that contains some columns named in the following format:
column1__
column1__AccountName
column1__SomeOtherFeature
column2__blabla
column2__SecondFeat
I have saved the values of one row as a list of tuples in the variable x, where each tuple is (column_name, column_value).
Now I want to group it like this:
result = {
    'column1': [list of (k, v) tuples whose keys start with 'column1'],
    'column2': [list of (k, v) tuples whose keys start with 'column2']
}
But it does not give the expected result:
>>> from itertools import groupby
>>> x
[(u'My field one__AccountName', u'Lorem ipsum bla bla bla'),
(u'My field one__AccountNumber', u'1111111222255555'),
(u'My field two__Num', u'Num: 612312345'),
(u'My field two', u'asdasdafassda'),
(u'My field three__Beneficiary International Bank Account Number IBAN',
u'IE111111111111111111111'),
(u'My field one__BIC', u'BLEAHBLA1'),
(u'My field three__company name', u'Company XYZ'),
(u'My field two__BIC', u'ASDF333')]
>>> groups = groupby(x ,lambda (field, val): field.split('__')[0])
>>> grouped_fields = {key: list(val) for key, val in groups}
>>> grouped_fields
{u'My field one': [(u'My field one__BIC', u'BLEAHBLA1')],
u'My field three': [(u'My field three__company name', u'Company XYZ')],
u'My field two': [(u'My field two__BIC', u'ASDF333')]}
>>> x[0]
(u'My field one__AccountName', u'Lorem ipsum bla bla bla')
>>> x[1]
(u'My field one__AccountNumber', u'1111111222255555')
>>> x[0][0].split('__')[0] == x[1][0].split('__')[0]
True
However, it seems to work with a different input list:
>>> y = [(u'x__2', 3), (u'x__', 1), (u'x__1', 2), (u'y__1', 1), (u'y__2', 2)]
>>> y
[(u'x__2', 3), (u'x__', 1), (u'x__1', 2), (u'y__1', 1), (u'y__2', 2)]
>>> groupes_y = groupby(y, lambda (k,v): k.split('__')[0])
>>> grouped_y = {key:list(val) for key, val in groupes_y}
>>> grouped_y
{u'x': [(u'x__2', 3), (u'x__', 1), (u'x__1', 2)],
u'y': [(u'y__1', 1), (u'y__2', 2)]}
I don't know what I am doing wrong.
Answer 0 (score: 14)
As the docs say, you should apply groupby to a list that has already been sorted with the same key that groupby itself uses:
key = lambda fv: fv[0].split('__')[0]
groups = groupby(sorted(x, key=key), key=key)
Then grouped_fields is:
{u'My field one': [(u'My field one__AccountName', u'Lorem ipsum bla bla bla'),
(u'My field one__AccountNumber', u'1111111222255555'),
(u'My field one__BIC', u'BLEAHBLA1')],
u'My field three': [(u'My field three__Beneficiary International Bank Account Number IBAN',
u'IE111111111111111111111'),
(u'My field three__company name', u'Company XYZ')],
u'My field two': [(u'My field two__Num', u'Num: 612312345'),
(u'My field two', u'asdasdafassda'),
(u'My field two__BIC', u'ASDF333')]}
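To see why the unsorted call dropped items, here is a minimal sketch in Python 3 syntax (the question's session is Python 2). The list rows and the helper prefix are hypothetical, shortened stand-ins for x and the key lambda. groupby starts a new group every time the key changes, so a key that recurs later produces a second group, and the dict comprehension then keeps only the last group for each key.

```python
from itertools import groupby

# Hypothetical, shortened stand-in for the question's list x.
rows = [
    ("one__AccountName", "a"),
    ("two__Num", "b"),
    ("one__BIC", "c"),
    ("two__BIC", "d"),
]

def prefix(item):
    # Group key: the part of the column name before the first "__".
    return item[0].split("__")[0]

# On unsorted input, groupby emits one group per consecutive run of equal keys,
# so "one" and "two" each show up twice here.
runs = [(k, list(g)) for k, g in groupby(rows, key=prefix)]

# Building a dict from those runs overwrites earlier runs with later ones.
lossy = {k: v for k, v in runs}

# Sorting by the same key first makes equal keys adjacent, so nothing is lost.
grouped = {k: list(g) for k, g in groupby(sorted(rows, key=prefix), key=prefix)}
```

Because sorted is stable, the items inside each group keep their original relative order.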
In the second example, y is already sorted:
>>> y == sorted(y, key=key)
True
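If sorting is not wanted, one alternative (not part of the answer above, just a sketch in Python 3 syntax) is to accumulate the tuples in a collections.defaultdict, which does not depend on input order. The function name group_by_prefix and its sep parameter are made up for illustration.

```python
from collections import defaultdict

def group_by_prefix(pairs, sep="__"):
    """Group (name, value) tuples by the part of name before the first sep."""
    grouped = defaultdict(list)
    for name, value in pairs:
        # Append to the list for this prefix, creating it on first use.
        grouped[name.split(sep)[0]].append((name, value))
    return dict(grouped)

# Hypothetical unsorted input, similar in shape to the question's x.
example = group_by_prefix([
    ("one__AccountName", "a"),
    ("two", "b"),
    ("one__BIC", "c"),
])
```

This makes a single pass over the list and keeps the tuples of each group in their original order.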