Monthly temperature bar chart

Time: 2017-08-07 15:22:31

Tags: python pandas dataframe bar-chart

I have the following table in a pandas DataFrame:

0    2017/06/04 00:00:00  31.900000  26.700000
1    2017/06/04 00:30:00  31.600000  25.000000
2    2017/06/04 01:00:00  31.400000  24.300000
3    2017/06/04 01:30:00  31.200000  24.100000
4    2017/06/04 02:00:00  30.800000  26.000000
5    2017/06/04 02:30:00  30.500000  27.000000
6    2017/06/04 03:00:00  30.300000  27.300000
7    2017/06/04 03:30:00  30.100000  27.600000
8    2017/06/04 04:00:00  29.900000  27.800000
9    2017/06/04 04:30:00  29.600000  27.900000
10   2017/06/04 05:00:00  29.200000  27.900000
11   2017/06/04 05:30:00  28.900000  27.900000
12   2017/06/04 06:00:00  30.800000  27.900000
13   2017/06/04 06:30:00  35.700000  27.900000
14   2017/06/04 07:00:00  38.300000  26.100000
15   2017/06/04 07:30:00  37.500000  25.100000

The table is read from an Excel file with the following code:

import numpy as np
import pandas as pd
df = pd.read_excel(r"\temperature.xlsx")

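I have analyzed and classified the data, and I tried to group the readings by specific temperature ranges, but I don't know how to create groups with the ranges I need (for example <= 5C, 10 to 20C, >= 30C).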

1 Answer:

Answer 0: (score: 0)

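It looks like the cut() function in pandas may fit your use case. In the future, giving us code that reproduces the DataFrame is much better than dumping the DataFrame. Something like: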

df = pd.DataFrame([
["2017/06/04 00:00:00",31.900000,26.700000],
["2017/06/04 00:30:00",31.600000,25.000000],
["2017/06/04 01:00:00",31.400000,24.300000],
["2017/06/04 01:30:00",31.200000,24.100000],
["2017/06/04 02:00:00",30.800000,26.000000],
["2017/06/04 02:30:00",30.500000,27.000000],
["2017/06/04 03:00:00",30.300000,27.300000],
["2017/06/04 03:30:00",30.100000,27.600000],
["2017/06/04 04:00:00",29.900000,27.800000],
["2017/06/04 04:30:00",29.600000,27.900000],
["2017/06/04 05:00:00",29.200000,27.900000],
["2017/06/04 05:30:00",28.900000,27.900000],
["2017/06/04 06:00:00",30.800000,27.900000],
["2017/06/04 06:30:00",35.700000,27.900000],
["2017/06/04 07:00:00",38.300000,26.100000],
["2017/06/04 07:30:00",37.500000,25.100000]],
columns = ['time','t_high','t_low'])

If you want to analyze a single temperature column, give it a name and define the temperature boundaries you care about:

temps = df['t_low']
bins = [23,25,27,30]

Now you are ready to apply pandas' cut() function to group the data by the bins you defined and look at some statistics.

temps.groupby(pd.cut(temps,bins)).describe()

          count       mean       std   min    25%   50%    75%   max
t_low                                                               
(23, 25]    3.0  24.466667  0.472582  24.1  24.20  24.3  24.65  25.0
(25, 27]    5.0  26.180000  0.732803  25.1  26.00  26.1  26.70  27.0
(27, 30]    8.0  27.775000  0.218763  27.3  27.75  27.9  27.90  27.9
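
The ranges in the question are open-ended (<= 5C, >= 30C), which the bins above do not cover. One possible way to handle that, sketched here under assumptions, is to use -inf and +inf as the outer edges and pass explicit labels to cut(). The edges and label strings below are illustrative choices, not taken from the question. Note that cut() closes each interval on the right by default, so a reading of exactly 30.0 would land in "20-30" rather than ">30".

```python
import numpy as np
import pandas as pd

# A subset of the t_high readings from the table in the question.
t_high = pd.Series([31.9, 31.6, 30.8, 29.9, 28.9, 35.7, 38.3, 37.5], name="t_high")

# Assumed edges: contiguous bins, open-ended at both ends.
edges = [-np.inf, 5, 10, 20, 30, np.inf]
# Assumed labels, one per interval; intervals are right-closed by default.
labels = ["<=5", "5-10", "10-20", "20-30", ">30"]

# Assign each reading to its labelled range.
groups = pd.cut(t_high, bins=edges, labels=labels)

# Count readings per range, keeping empty ranges in the result.
counts = t_high.groupby(groups, observed=False).size()
print(counts)
```

If matplotlib is installed, calling counts.plot.bar() on the result should draw these counts as a bar chart.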