I have a list of lists, and each inner list has the following items:
site, count, time
sample data: site1, 15, 20
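I'm trying to figure out the best way to approach this problem. I want to add up the count and time for each site.
I was thinking of converting each list into a dictionary as I loop over them, but I'm not sure what that would get me.
for site, count, time in lists:
    # create a dictionary, then what?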
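The end result I want is a list or dictionary (I can work with any kind of data structure) in which the count and time for each site are added up into a "totals" entry for that site.
For example: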
site, total_count, total_time
sample data:
site1, 50, 100 #all data for site1 added up
site2, 40, 300 #all data for site2 added up
I'm not looking for a coded answer, just the right approach and a pointer in the right direction.
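Answer 0 (score: 1)
You said any kind of data structure would do, so perhaps construct a DataFrame from the lists you have, then use groupby followed by sum to get what you need.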
Example:
import pandas as pd

data = [['site1', 15, 20], ['site1', 35, 80], ['site2', 15, 20]]
# Column names follow the record order from the question: site, count, time
df = pd.DataFrame(data, columns=['site', 'count', 'time'])
print(df.groupby('site').sum())
Output:
       count  time
site
site1     50   100
site2     15    20
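Alternatively:
data = [['site1', 15, 20], ['site1', 35, 80], ['site2', 15, 20]]
data_d = {}  # maps site -> [total_count, total_time]
for rec in data:
    if rec[0] in data_d:
        # Site already seen: add this record to its running totals
        data_d[rec[0]][0] += rec[1]
        data_d[rec[0]][1] += rec[2]
    else:
        # First record for this site: the slice copies [count, time]
        data_d[rec[0]] = rec[1:]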
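Answer 1 (score: 0)
You can iterate over the list of lists (preferably make it a list of tuples instead) and add each count and time to the running total count and total time in an output dictionary keyed by site: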
lists = [
    ('site1', 15, 20),
    ('site2', 10, 30),
    ('site1', 5, 25),
    ('site1', 30, 55),
    ('site2', 30, 270)
]
result = {}
for site, count, time in lists:
    # Start from (0, 0) the first time a site is seen
    total_count, total_time = result.get(site, (0, 0))
    result[site] = (total_count + count, total_time + time)
result
gives:
{'site1': (50, 100), 'site2': (40, 300)}
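As a variation on the loop above, here is a minimal sketch that uses collections.defaultdict so each site's totals are created on first access. Storing the totals as a mutable two-element list is an assumption made for this sketch, not something the answer prescribes.

```python
from collections import defaultdict

# Same sample records as above, in (site, count, time) order
lists = [
    ('site1', 15, 20),
    ('site2', 10, 30),
    ('site1', 5, 25),
    ('site1', 30, 55),
    ('site2', 30, 270),
]

# Every new site starts with [total_count, total_time] == [0, 0]
result = defaultdict(lambda: [0, 0])
for site, count, time in lists:
    result[site][0] += count
    result[site][1] += time

print(dict(result))
```

The trade-off is that the totals are mutable lists rather than tuples, which avoids rebuilding a tuple on every record.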
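Answer 2 (score: 0)
The question is still a bit ambiguous, but you could, for example, build a class that uses a dictionary of dictionaries. It aggregates the data incrementally as you feed records to it: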
>>> class SiteAggregator:
...     def __init__(self):
...         self.sites = {}
...     def __call__(self, data):
...         site_name, site_counts, site_time = data
...         if site_name not in self.sites:
...             self.sites[site_name] = {'counts': 0, 'time': 0}
...         self.sites[site_name]['counts'] += site_counts
...         self.sites[site_name]['time'] += site_time
...
>>> site_agg = SiteAggregator()
>>> site_agg(['a', 20, 22])
>>> site_agg(['b', 10, 13])
>>> site_agg.sites['a']
{'counts': 20, 'time': 22}
>>> site_agg(['a', 10, 12])
>>> site_agg.sites['a']
{'counts': 30, 'time': 34}
>>> sites = [['a', 20, 10], ['b', 30, 15], ['c', 18, 22], ['a', 15, 22], ['b', 10, 2]]
>>> for site in sites:
...     site_agg(site)
...
>>> site_agg.sites['a']
{'counts': 65, 'time': 66}
Answer 3 (score: 0)
In my opinion, the following is the right way to approach the problem.
import json  # For pretty-printing the dictionary

# List of lists where each sublist contains site, count, time, in that order
data_list = [
    ["mysite1.com", 11, 88],
    ["mysite1.com", 7, 6],
    ["google.com", 6, 23],
    ["mysite2.com", 9, 12],
    ["google.com", 4, 7],
    ["mysite1.com", 9, 12],
    ["mysite2.com", 13, 4]
]

d = {}
for l in data_list:
    site, count, time = l  # Unpacking
    if site in d:
        # APPEND/UPDATE VALUES
        d[site]["count"].append(count)
        d[site]["time"].append(time)
    else:
        # CREATE NEW KEYS WITH DATA
        d[site] = {
            "count": [count],
            "time": [time]
        }
    # Refresh the totals for this site after every record
    d[site]["total_count"] = sum(d[site]["count"])
    d[site]["total_time"] = sum(d[site]["time"])

print(json.dumps(d, indent=4))
# {
#     "mysite1.com": {
#         "count": [
#             11,
#             7,
#             9
#         ],
#         "time": [
#             88,
#             6,
#             12
#         ],
#         "total_count": 27,
#         "total_time": 106
#     },
#     "google.com": {
#         "count": [
#             6,
#             4
#         ],
#         "time": [
#             23,
#             7
#         ],
#         "total_count": 10,
#         "total_time": 30
#     },
#     "mysite2.com": {
#         "count": [
#             9,
#             13
#         ],
#         "time": [
#             12,
#             4
#         ],
#         "total_count": 22,
#         "total_time": 16
#     }
# }
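Answer 4 (score: 0)
Here's a hacky approach (inspired by electrical engineering): use a Counter whose values are complex numbers; the real part is the time and the imaginary part is the count. ;-)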
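The idea can be sketched roughly as follows. This is only an illustration under a few assumptions: the sample records are borrowed from the question's (site, count, time) layout, the names records, totals and summary are made up for this sketch, and the inputs are integers small enough to be represented exactly as floats.

```python
from collections import Counter

# Sample records in (site, count, time) order (illustrative data)
records = [
    ("site1", 15, 20),
    ("site2", 10, 30),
    ("site1", 5, 25),
    ("site1", 30, 55),
    ("site2", 30, 270),
]

totals = Counter()
for site, count, time in records:
    # Real part accumulates time, imaginary part accumulates count.
    # A missing key in a Counter reads as 0, so += works for new sites.
    totals[site] += complex(time, count)

# .real and .imag are floats, so convert back to int for integer inputs
summary = {site: (int(z.imag), int(z.real)) for site, z in totals.items()}
print(summary)
```

Note that complex numbers cannot be ordered, so Counter methods that compare values, such as most_common(), should not be expected to work with totals stored this way.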