After processing my data, I have a batch of rows in the following format:
(u'378491520468_sale', {'price': 2100000, 'built': 3815})
(u'378491119.1537520468_sale', {'price': 2100000, 'built': 3815})
(u'1306084076.1535728358_rent', {'price': 1400, 'built': 1109})
(u'1303342766.1548320090_sale', {'price': 550, 'built': 1200})
(u'1890530682.1515660872_sale', {'price': 130000, 'built': 759})
(u'8212134.1548317851_rent', {'price': 2900, 'built': 1220})
(u'1170655463.1513653914_sale', {'price': 430000, 'built': 1142})
(u'58676746.1548308550_sale', {'price': 1700000, 'built': 3000})
(u'1162578480.1474216313_sale', {'price': 10000000, 'built': 3})
(u'1860145003.1546594155_rent', {'price': 4200, 'built': 839})
(u'1640943061.1489124089_sale', {'price': 710000, 'built': 1600})
(u'1008351255.1547539066_rent', {'price': 15000, 'built': 8400})
(u'903442891.1547795833_sale', {'price': 148000, 'built': 786})
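where the first element of each tuple is the unique ID.
I understand the basic CombineFn classes, which can group (key, value) pairs and compute the minimum, maximum, and average over a fixed window. With a dictionary as the value, though, I need some guidance on how to compute them in the following format: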
("the_unique_id", {
    "price": {
        "min": 0,
        "max": 0,
        "average": 0
    },
    "built": {
        "min": 0,
        "max": 0,
        "average": 0
    }
}), ...
Answer 0 (score: 0)
If you can get your data into a table like the one below, here is one way to compute the aggregates:
import pandas as pd

data = {'ID': [u'378491520468_sale', u'378491119.1537520468_sale', u'1306084076.1535728358_rent'],
        'price': [2100000, 2100000, 1400],
        'built': [3815, 3815, 1109]}
df = pd.DataFrame(data)

# Statistics to compute for each column, per ID.
aggregates = {
    'price': ['min', 'max', 'mean'],
    'built': ['min', 'max', 'mean'],
}
# The result has one row per ID and (column, statistic) MultiIndex columns.
df = df.groupby('ID').agg(aggregates)

# Convert each aggregated row into an (ID, nested dict) tuple.
res = []
for i in range(len(df)):
    row = df.iloc[i]
    res.append((row.name,
                {'price': {'min': row['price']['min'],
                           'max': row['price']['max'],
                           'average': row['price']['mean']},
                 'built': {'min': row['built']['min'],
                           'max': row['built']['max'],
                           'average': row['built']['mean']}}))
print(res)
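If the data has to stay inside the pipeline rather than go through pandas, the same aggregation could probably be written as a combiner. The sketch below is only an illustration: the class name DictStatsCombiner and the FIELDS tuple are made up for this example, and it deliberately does not import apache_beam so that it runs on its own. It mirrors the four methods of the CombineFn interface (create_accumulator, add_input, merge_accumulators, extract_output); in a real pipeline it would presumably subclass beam.CombineFn and be applied with beam.CombinePerKey after windowing.

```python
class DictStatsCombiner(object):
    """Hypothetical combiner mirroring the CombineFn method interface."""

    # Assumed field names, taken from the sample rows in the question.
    FIELDS = ('price', 'built')

    def create_accumulator(self):
        # One [min, max, sum, count] slot per field.
        return {f: [float('inf'), float('-inf'), 0.0, 0] for f in self.FIELDS}

    def add_input(self, acc, element):
        # element is the dict part of a (key, dict) pair.
        for f in self.FIELDS:
            value = element[f]
            slot = acc[f]
            slot[0] = min(slot[0], value)
            slot[1] = max(slot[1], value)
            slot[2] += value
            slot[3] += 1
        return acc

    def merge_accumulators(self, accumulators):
        # Combine partial results computed on different workers.
        merged = self.create_accumulator()
        for acc in accumulators:
            for f in self.FIELDS:
                merged[f][0] = min(merged[f][0], acc[f][0])
                merged[f][1] = max(merged[f][1], acc[f][1])
                merged[f][2] += acc[f][2]
                merged[f][3] += acc[f][3]
        return merged

    def extract_output(self, acc):
        out = {}
        for f in self.FIELDS:
            lo, hi, total, count = acc[f]
            if count == 0:
                # No inputs seen for this key: leave the statistics undefined.
                out[f] = {'min': None, 'max': None, 'average': None}
            else:
                out[f] = {'min': lo, 'max': hi, 'average': total / count}
        return out


# Stand-alone check that drives the combiner by hand, without a pipeline.
rows = [{'price': 2100000, 'built': 3815}, {'price': 1400, 'built': 1109}]
combiner = DictStatsCombiner()
acc = combiner.create_accumulator()
for r in rows:
    acc = combiner.add_input(acc, r)
result = combiner.extract_output(acc)
print(result)
```

Keeping sum and count in the accumulator, rather than a running average, is what lets merge_accumulators combine partial results correctly.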