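I have a list of tuples that I need to merge, based on the strings they contain, so that the result fits a structure. In this case, tuples that share the same 'date' and 'id' should be combined to fit the 'fields' structure.
Fields: ['date', 'id', 'impressions', 'clicks']
Before: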
[('2015-11-01', 'id123', 'impressions', '8'), ('2015-11-01', 'id123',
'clicks', '4'), ('2015-11-01', 'id456', 'impressions', '14'),
('2015-11-01', 'id456', 'clicks', '9')]
After:
[('2015-11-01', 'id123', '8', '4'), ('2015-11-01', 'id456', '14', '9')]
Answer 0 (score: 1)
>>> L = [('2015-11-01', 'id123', 'impressions', '8'), ('2015-11-01', 'id123',
... 'clicks', '4'), ('2015-11-01', 'id456', 'impressions', '14'),
... ('2015-11-01', 'id456', 'clicks', '9')]
>>> from collections import defaultdict
>>> D = defaultdict(list)
>>> for a, b, c, d in L:
...     D[a, b].append(d)
...
>>> [k + tuple(D[k]) for k in D]
[('2015-11-01', 'id456', '14', '9'), ('2015-11-01', 'id123', '8', '4')]
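The `defaultdict(list)` version simply appends values in the order they arrive, so it silently swaps the columns if a clicks row comes before its impressions row. A small sketch that makes this visible, using the same technique wrapped in a hypothetical helper `merge_by_arrival` and a made-up input `swapped` (output order assumes Python 3.7+, where dicts keep insertion order):

```python
from collections import defaultdict


def merge_by_arrival(rows):
    # Same technique as above: collect values per (date, id) in arrival order.
    grouped = defaultdict(list)
    for date, id_, typ, value in rows:
        grouped[date, id_].append(value)
    return [key + tuple(values) for key, values in grouped.items()]


# Hypothetical input where id456 lists clicks before impressions.
swapped = [('2015-11-01', 'id123', 'impressions', '8'),
           ('2015-11-01', 'id123', 'clicks', '4'),
           ('2015-11-01', 'id456', 'clicks', '9'),
           ('2015-11-01', 'id456', 'impressions', '14')]
print(merge_by_arrival(swapped))
# [('2015-11-01', 'id123', '8', '4'), ('2015-11-01', 'id456', '9', '14')]
```

The second tuple comes out as clicks-then-impressions, which is exactly the case the next variant handles.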
If impressions and clicks do not always appear in the same order:
>>> L = [('2015-11-01', 'id123', 'impressions', '8'), ('2015-11-01', 'id123', 'clicks', '4'), ('2015-11-01', 'id456', 'clicks', '9'), ('2015-11-01', 'id456', 'impressions', '14')]
>>> from collections import defaultdict
>>> D = defaultdict(lambda: [None, None])
>>> for a, b, c, d in L:
...     D[a, b][c == 'clicks'] = d
...
>>> [k + tuple(D[k]) for k in D]
[('2015-11-01', 'id456', '14', '9'), ('2015-11-01', 'id123', '8', '4')]
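`D[a, b][c == 'clicks']` works because `bool` is a subclass of `int`: `False` indexes slot 0 (impressions) and `True` indexes slot 1 (clicks). The catch is that any type other than 'clicks' also lands in slot 0. A sketch of a more explicit variant, assuming the question's `fields` list is available and its first two entries are the key columns, derives each metric's position from `fields`, so an unexpected type raises `KeyError` instead of silently overwriting impressions (`merge_by_fields` is a hypothetical name):

```python
from collections import defaultdict

fields = ['date', 'id', 'impressions', 'clicks']


def merge_by_fields(rows, fields):
    metrics = fields[2:]  # everything after the (date, id) key columns
    slot = {name: i for i, name in enumerate(metrics)}
    # Missing metrics stay None in the output tuple.
    grouped = defaultdict(lambda: [None] * len(metrics))
    for date, id_, typ, value in rows:
        grouped[date, id_][slot[typ]] = value  # KeyError on unknown types
    return [key + tuple(values) for key, values in grouped.items()]


rows = [('2015-11-01', 'id123', 'impressions', '8'),
        ('2015-11-01', 'id123', 'clicks', '4'),
        ('2015-11-01', 'id456', 'clicks', '9'),
        ('2015-11-01', 'id456', 'impressions', '14')]
print(merge_by_fields(rows, fields))
# [('2015-11-01', 'id123', '8', '4'), ('2015-11-01', 'id456', '14', '9')]
```

Adding another metric then only requires extending `fields`, not editing the loop.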
Answer 1 (score: 0)
itertools.groupby works well here, especially if the real data matches the sample data (already sorted, so all rows for a given date/id pair are adjacent):
import itertools
from operator import itemgetter
outlist = []
for (date, ID), grp in itertools.groupby(inlist, key=itemgetter(0, 1)):
    grp = list(grp)  # Iterating twice, so convert to sequence
    impressioncnt = sum(int(cnt) for _, _, typ, cnt in grp if typ == 'impressions')
    clickcnt = sum(int(cnt) for _, _, typ, cnt in grp if typ == 'clicks')
    outlist.append((date, ID, str(impressioncnt), str(clickcnt)))
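`itertools.groupby` only merges *consecutive* items with equal keys, which is why the sortedness of the input matters. A small sketch on made-up unsorted data shows the same (date, id) pair producing two separate groups, and sorting by the same key fixing it:

```python
import itertools
from operator import itemgetter

unsorted_rows = [('2015-11-01', 'id123', 'impressions', '8'),
                 ('2015-11-01', 'id456', 'impressions', '14'),
                 ('2015-11-01', 'id123', 'clicks', '4')]

# groupby starts a new group whenever the key changes from the previous row.
keys_unsorted = [key for key, _ in itertools.groupby(unsorted_rows, key=itemgetter(0, 1))]
print(keys_unsorted)
# [('2015-11-01', 'id123'), ('2015-11-01', 'id456'), ('2015-11-01', 'id123')]

# Sorting by the same key first makes each pair contiguous.
sorted_rows = sorted(unsorted_rows, key=itemgetter(0, 1))
keys_sorted = [key for key, _ in itertools.groupby(sorted_rows, key=itemgetter(0, 1))]
print(keys_sorted)
# [('2015-11-01', 'id123'), ('2015-11-01', 'id456')]
```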
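If the data isn't already sorted by date and ID, you'll need to sort inlist first:
inlist.sort(key=itemgetter(0, 1))
Sorting costs O(n log n), which can be expensive if the list is large; in that case, you might consider using collections.defaultdict instead, which groups everything in a single pass without sorting: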
import collections
dateID_cnts = collections.defaultdict({'impressions': 0, 'clicks': 0}.copy)
for date, ID, typ, cnt in inlist:
    dateID_cnts[date, ID][typ] += int(cnt)
# Convert from defaultdict to desired list of tuples
outlist = [(date, ID, str(v['impressions']), str(v['clicks'])) for (date, ID), v in dateID_cnts.items()]
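Because this version accumulates with `+=`, repeated rows for the same (date, id, type) are summed rather than producing extra columns, and no sorting is needed. A self-contained run of the same approach (reading the counts with `v['clicks']`) on hypothetical input that is unsorted and contains a repeated impressions row; `sorted()` is added only to make the output order deterministic:

```python
import collections

inlist = [('2015-11-01', 'id456', 'impressions', '14'),
          ('2015-11-01', 'id123', 'impressions', '8'),
          ('2015-11-01', 'id123', 'clicks', '4'),
          ('2015-11-01', 'id456', 'clicks', '9'),
          ('2015-11-01', 'id123', 'impressions', '2')]  # repeated type for id123

# The bound .copy method is the factory: each new key gets a fresh zeroed dict.
dateID_cnts = collections.defaultdict({'impressions': 0, 'clicks': 0}.copy)
for date, ID, typ, cnt in inlist:
    dateID_cnts[date, ID][typ] += int(cnt)

# Keys are unique, so sorting items never has to compare the dict values.
outlist = [(date, ID, str(v['impressions']), str(v['clicks']))
           for (date, ID), v in sorted(dateID_cnts.items())]
print(outlist)
# [('2015-11-01', 'id123', '10', '4'), ('2015-11-01', 'id456', '14', '9')]
```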
Answer 2 (score: 0)
Another way:
data=[('2015-11-01', 'id123', 'impressions', '8'),
('2015-11-01', 'id123','clicks', '4'),
('2015-11-01', 'id456', 'impressions', '14'),
('2015-11-01', 'id456', 'clicks', '9')]
ddict={}
for t in data:
    key = (t[0], t[1])
    ddict.setdefault(key, []).append(t[2:])
LoT=[]
for d, id in ddict:
    impressions, clicks = max(ddict[(d, id)])[1], min(ddict[(d, id)])[1]
    LoT.append((d, id, impressions, clicks))
>>> LoT
[('2015-11-01', 'id123', '8', '4'), ('2015-11-01', 'id456', '14', '9')]
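If you can assume that impressions and clicks already appear in order (impressions first, then clicks), you can drop max and min and replace that line with: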
impressions, clicks=ddict[(d, id)][0][1], ddict[(d, id)][1][1]
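Both versions rely on something incidental: `max`/`min` pick the right pairs only because the string 'impressions' sorts after 'clicks', and the indexed version depends on row order. A sketch of an alternative that turns each group of (type, value) pairs into a dict and looks the metrics up by name, reusing the `setdefault` grouping from this answer (`merge_by_name` is a hypothetical helper; if a type repeats within a group, the last value wins):

```python
def merge_by_name(data):
    # Group (type, value) pairs by (date, id), as in the answer above.
    ddict = {}
    for t in data:
        ddict.setdefault((t[0], t[1]), []).append(t[2:])
    result = []
    for (d, id_), pairs in ddict.items():
        metrics = dict(pairs)  # e.g. {'clicks': '4', 'impressions': '8'}
        result.append((d, id_, metrics['impressions'], metrics['clicks']))
    return result


# Made-up input with id123's rows in reverse order.
data = [('2015-11-01', 'id123', 'clicks', '4'),
        ('2015-11-01', 'id123', 'impressions', '8'),
        ('2015-11-01', 'id456', 'impressions', '14'),
        ('2015-11-01', 'id456', 'clicks', '9')]
print(merge_by_name(data))
# [('2015-11-01', 'id123', '8', '4'), ('2015-11-01', 'id456', '14', '9')]
```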