Python: ordering a weather dataset

Time: 2017-12-17 10:29:05

Tags: python dataset ranking

I have a dataset with the temperature/weather history of several cities, like this:

{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Pekin", "date": "2016-10-16", "temperature": "19", "weather": "cloudy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "sunny"}

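I want to build a top 5 of cities, ordered by best average temperature. In the result I also want to know the number of days for each weather type (sunny, cloudy, rainy).

Example: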

Rank - City -      Average Temperature - Cloudy days - Sunny days - Rainy Days
1 -    Barcelona -           20 -           93 -        298 -       29 

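How can I do this in Python?

Thanks,

Matt

1 answer:

Answer 0: (score: 0)

I believe you need pandas: take the five largest per-city mean temperatures first, then count the rows per city and weather type and reorder those counts to match the ranking.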

import pandas as pd

df = pd.read_json('a.json', lines=True)
print (df)
        city       date  temperature weather
0  Barcelona 2016-10-16           13  cloudy
1     Berlin 2016-10-16           -1   sunny
2      Pekin 2016-10-16           19  cloudy
3      Paris 2016-10-16           -8   sunny

s = df.groupby(['city'])['temperature'].mean().nlargest(5)
print (s)
city
Pekin        19
Barcelona    13
Berlin       -1
Paris        -8
Name: temperature, dtype: int64

df2 = (df[df['city'].isin(s.index)]
               .groupby(['city', 'weather'])['temperature']
               .size()
               .unstack(fill_value=0)
               .add_suffix(' days')
               .reindex(s.index)
               .reset_index()
               .rename_axis(None, axis=1))

df2.insert(1, 'temp avg', df2['city'].map(s))
df2.insert(0, 'rank', range(1, len(df2) + 1))
print (df2)
   rank       city  temp avg  cloudy days  sunny days
0     1      Pekin        19            1           0
1     2  Barcelona        13            1           0
2     3     Berlin        -1            0           1
3     4      Paris        -8            0           1
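
For a version that runs without an a.json file, here is a minimal sketch of the same idea. It is not part of the original answer: the SAMPLE records and the top_cities helper are made up for illustration, and pd.crosstab is swapped in for the groupby/size/unstack chain as one possible way to count days per weather type. It assumes one record per city per day.

```python
import io

import pandas as pd

# Hypothetical sample records in the same JSON-lines shape as the question.
SAMPLE = """\
{"city": "Barcelona", "date": "2016-10-16", "temperature": "13", "weather": "cloudy"}
{"city": "Barcelona", "date": "2016-10-17", "temperature": "17", "weather": "sunny"}
{"city": "Berlin", "date": "2016-10-16", "temperature": "-1", "weather": "sunny"}
{"city": "Berlin", "date": "2016-10-17", "temperature": "2", "weather": "rainy"}
{"city": "Paris", "date": "2016-10-16", "temperature": "-8", "weather": "sunny"}
"""


def top_cities(df, n=5):
    """Rank cities by mean temperature and count days per weather type."""
    # Temperatures are quoted in the JSON, so make sure they are numeric.
    df = df.assign(temperature=pd.to_numeric(df["temperature"]))

    # The n cities with the highest average temperature, best first.
    avg = df.groupby("city")["temperature"].mean().nlargest(n)

    # Rows per (city, weather) pair; missing combinations become 0.
    counts = pd.crosstab(df["city"], df["weather"]).add_suffix(" days")

    # Keep only the ranked cities, in ranking order.
    out = counts.reindex(avg.index).reset_index().rename_axis(None, axis=1)
    out.insert(1, "temp avg", avg.to_numpy())
    out.insert(0, "rank", range(1, len(out) + 1))
    return out


result = top_cities(pd.read_json(io.StringIO(SAMPLE), lines=True))
print(result)
```

Weather types that never occur in the data get no column at all, so a "rainy days" column only appears once at least one rainy record exists. If a fixed set of columns is needed, reindexing the columns with fill_value=0 is one option.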