I have a dataset with a history of city temperatures and weather, like this:
{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Pekin", "date": "2016-10-16", "temperature": "19", "weather": "cloudy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "sunny"}
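I want to create a top 5, ranked by best average temperature. In the result I also want to know the number of days of each weather type (sunny, cloudy, rainy).
Example: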
Rank - City - Average Temperature - Cloudy days - Sunny days - Rainy Days
1 - Barcelona - 20 - 93 - 298 - 29
How can I do this in Python?
Thanks,
Matt
Answer 0 (score: 0)
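I believe you need pandas:
First, use read_json to build a DataFrame from the json file.
Then groupby the city column, aggregate with mean, and take the top 5 cities with nlargest.
Filter the rows for those cities with boolean indexing, count rows per city and weather with size, and reshape with unstack.
Use reindex with the index of s to get the correct ordering.
Finally, use insert and map to add the average temperature as a new column, and insert with range to add the rank.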
import pandas as pd
df = pd.read_json('a.json', lines=True)
print (df)
        city       date  temperature weather
0  Barcelona 2016-10-16           13  cloudy
1     Berlin 2016-10-16           -1   sunny
2      Pekin 2016-10-16           19  cloudy
3      Paris 2016-10-16           -8   sunny
s = df.groupby(['city'])['temperature'].mean().nlargest(5)
print (s)
city
Pekin        19
Barcelona    13
Berlin       -1
Paris        -8
Name: temperature, dtype: int64
df2 = (df[df['city'].isin(s.index)]
         .groupby(['city', 'weather'])['temperature']
         .size()
         .unstack(fill_value=0)
         .add_suffix(' days')
         .reindex(s.index)
         .reset_index()
         .rename_axis(None, axis=1))
df2.insert(1, 'temp avg', df2['city'].map(s))
df2.insert(0, 'rank', range(1, len(df2) + 1))
print (df2)
   rank       city  temp avg  cloudy days  sunny days
0     1      Pekin        19            1           0
1     2  Barcelona        13            1           0
2     3     Berlin        -1            0           1
3     4      Paris        -8            0           1
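
The output above has no rainy days column because no rainy row exists in the four sample records. As a minimal sketch of one way to handle that, the same steps can be wrapped in a function that always emits a fixed set of weather columns. The names `top_cities`, `SAMPLE` and `WEATHER_TYPES` are hypothetical, the extra sample rows are made up for illustration, and the list of weather labels is an assumption taken from the question.

```python
import io

import pandas as pd

# Hypothetical sample in the same JSON-lines layout as the question.
SAMPLE = """\
{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Barcelona", "date": "2016-10-17", "temperature": "17", "weather": "sunny"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Pekin", "date": "2016-10-16", "temperature": "19", "weather": "cloudy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "rainy"}
"""

# Assumed full set of weather labels, taken from the question.
WEATHER_TYPES = ["cloudy", "sunny", "rainy"]


def top_cities(df, n=5):
    """Rank cities by mean temperature and count days per weather type."""
    df = df.copy()
    # Temperatures are stored as strings in the file; force them to numbers.
    df["temperature"] = pd.to_numeric(df["temperature"])
    # Average temperature per city, keeping only the n warmest cities.
    avg = df.groupby("city")["temperature"].mean().nlargest(n)
    counts = (
        df[df["city"].isin(avg.index)]
        .groupby(["city", "weather"])
        .size()
        .unstack(fill_value=0)
        # Order rows by rank and guarantee one column per weather type.
        .reindex(index=avg.index, columns=WEATHER_TYPES, fill_value=0)
        .add_suffix(" days")
    )
    out = counts.reset_index().rename_axis(None, axis=1)
    out.insert(1, "temp avg", out["city"].map(avg))
    out.insert(0, "rank", range(1, len(out) + 1))
    return out


result = top_cities(pd.read_json(io.StringIO(SAMPLE), lines=True))
print(result)
```

Passing `columns=WEATHER_TYPES` to reindex means a weather type with no rows still shows up as a column of zeros, so the table shape stays the same whatever the data contains.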