How to get data by the hour

Time: 2018-04-13 05:51:56

Tags: python-3.x pandas

I have a CSV file with the following contents:

2018-02-28 09:48:18.884392+05:30,,
2018-03-04 10:50:34.833787+05:30,,
2018-03-05 13:04:23.634013+05:30,,
2018-03-14 05:30:14.51227+05:30,28.84,27.58
2018-03-14 05:45:14.51227+05:30,12.54,17.47
2018-03-14 06:30:14.466206+05:30,25.1,23.58
2018-03-14 06:40:14.466206+05:30,11.2,14.44
2018-03-14 07:18:14.493826+05:30,21.96,21.54
2018-03-14 08:30:14.593973+05:30,20.48,26.86
2018-03-14 09:30:14.481426+05:30,22.92,15.3
2018-03-14 10:31:20.307558+05:30,7.46,0
2018-03-14 11:30:14.556135+05:30,21,16.5
2018-03-14 12:30:14.569207+05:30,14.14,19.14
2018-03-14 13:11:14.470991+05:30,8.84,6.98
2018-03-14 14:20:14.500747+05:30,8.94,4.5
2018-03-14 15:30:14.487262+05:30,5.92,3.86
2018-03-14 16:30:14.454833+05:30,6.58,10.88
2018-03-14 17:30:14.482084+05:30,7.32,3.36
2018-03-14 18:27:14.559508+05:30,5.52,3.6
2018-03-14 19:30:14.611782+05:30,2.74,3.14
2018-03-14 20:30:14.461808+05:30,4.34,3.2
2018-03-14 21:30:14.533157+05:30,3.8,3.22
2018-03-14 22:15:14.451542+05:30,4.44,3.06
2018-03-14 23:30:14.5494+05:30,3.04,2.92
2018-03-15 00:30:14.477848+05:30,4.68,7.82

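Here the first column is the date, the second column is the upload speed, and the last column is the download speed.

I need to display the data hour by hour for every hour between two specific dates, 2018-03-05 and 2018-03-14, so that however many entries (upload and download speeds) fall within a given hour, I can take their average and show that average for the hour.

Below is my code.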

import pandas as pd
import numpy as np


df = pd.read_csv("file.csv", header=None,
                 names=["date", "upload", "download"], parse_dates=["date"])
df.set_index("date", inplace=True)
df.fillna(0, inplace=True)
df.index = df.index.tz_localize('UTC').tz_convert('Asia/Kolkata')
# get data for the specified dates
df2 = df.loc['2018-03-05': '2018-03-14']
# add hourly frequency
print(df2.resample('1H').last())

Below is the output I get:

                             upload  download
date                                       
2018-03-05 13:00:00+05:30    0.00      0.00
2018-03-05 14:00:00+05:30     NaN       NaN
2018-03-05 15:00:00+05:30     NaN       NaN
2018-03-05 16:00:00+05:30     NaN       NaN
2018-03-05 17:00:00+05:30     NaN       NaN
2018-03-05 18:00:00+05:30     NaN       NaN
2018-03-05 19:00:00+05:30     NaN       NaN
2018-03-05 20:00:00+05:30     NaN       NaN
2018-03-05 21:00:00+05:30     NaN       NaN
2018-03-05 22:00:00+05:30     NaN       NaN
2018-03-05 23:00:00+05:30     NaN       NaN
2018-03-06 00:00:00+05:30     NaN       NaN
2018-03-06 01:00:00+05:30     NaN       NaN
2018-03-06 02:00:00+05:30     NaN       NaN
2018-03-06 03:00:00+05:30     NaN       NaN
2018-03-06 04:00:00+05:30     NaN       NaN
2018-03-06 05:00:00+05:30     NaN       NaN
2018-03-06 06:00:00+05:30     NaN       NaN
2018-03-06 07:00:00+05:30     NaN       NaN
2018-03-06 08:00:00+05:30     NaN       NaN
2018-03-06 09:00:00+05:30     NaN       NaN
2018-03-06 10:00:00+05:30     NaN       NaN
2018-03-06 11:00:00+05:30     NaN       NaN
2018-03-06 12:00:00+05:30     NaN       NaN
2018-03-06 13:00:00+05:30     NaN       NaN
2018-03-06 14:00:00+05:30     NaN       NaN
2018-03-06 15:00:00+05:30     NaN       NaN
2018-03-06 16:00:00+05:30     NaN       NaN
2018-03-06 17:00:00+05:30     NaN       NaN
2018-03-06 18:00:00+05:30     NaN       NaN
...                           ...       ...
2018-03-13 18:00:00+05:30     NaN       NaN
2018-03-13 19:00:00+05:30     NaN       NaN
2018-03-13 20:00:00+05:30     NaN       NaN
2018-03-13 21:00:00+05:30     NaN       NaN
2018-03-13 22:00:00+05:30     NaN       NaN
2018-03-13 23:00:00+05:30     NaN       NaN
2018-03-14 00:00:00+05:30     NaN       NaN
2018-03-14 01:00:00+05:30     NaN       NaN
2018-03-14 02:00:00+05:30     NaN       NaN
2018-03-14 03:00:00+05:30     NaN       NaN
2018-03-14 04:00:00+05:30     NaN       NaN
2018-03-14 05:00:00+05:30   12.54     17.47
2018-03-14 06:00:00+05:30   11.20     14.44
2018-03-14 07:00:00+05:30   21.96     21.54
2018-03-14 08:00:00+05:30   20.48     26.86
2018-03-14 09:00:00+05:30   22.92     15.30
2018-03-14 10:00:00+05:30    7.46      0.00
2018-03-14 11:00:00+05:30   21.00     16.50
2018-03-14 12:00:00+05:30   14.14     19.14
2018-03-14 13:00:00+05:30    8.84      6.98
2018-03-14 14:00:00+05:30    8.94      4.50
2018-03-14 15:00:00+05:30    5.92      3.86
2018-03-14 16:00:00+05:30    6.58     10.88
2018-03-14 17:00:00+05:30    7.32      3.36
2018-03-14 18:00:00+05:30    5.52      3.60
2018-03-14 19:00:00+05:30    2.74      3.14
2018-03-14 20:00:00+05:30    4.34      3.20
2018-03-14 21:00:00+05:30    3.80      3.22
2018-03-14 22:00:00+05:30    4.44      3.06
2018-03-14 23:00:00+05:30    3.04      2.92

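I do get the data by the hour, but it seems wrong. If you look closely at the date 2018-03-14, the original data has readings of 28.84 and 27.58 at 5:30, and readings of 12.54 and 17.47 at 5:45. The resampled output, however, shows 12.54 and 17.47 at 5:00. This appears to be the last entry within that hour, and the same is true for the other hours.

How can I display hourly data for all hours between two specified dates, where each hour shows the average of the entries in that hour, or 0 if there are no entries?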
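The behaviour described here can be reproduced in isolation. The following is a minimal sketch, not the asker's code: it builds a two-row frame (the name `sample` is made up for illustration) from the 05:30 and 05:45 readings and resamples it. Both rows fall into the bin labelled with the start of the hour, 05:00, so `last()` keeps only the 05:45 row while `mean()` averages the two. The alias `"60min"` is used in place of `'1H'` only because it is accepted without warnings across pandas versions.

```python
import pandas as pd

# The two readings from the question that fall inside the 05:00-06:00 bin.
idx = pd.to_datetime(
    ["2018-03-14 05:30:14", "2018-03-14 05:45:14"]
).tz_localize("Asia/Kolkata")
sample = pd.DataFrame(
    {"upload": [28.84, 12.54], "download": [27.58, 17.47]}, index=idx
)

hourly = sample.resample("60min")  # one bin per hour, labelled by its start
last_per_hour = hourly.last()      # keeps only the 05:45 row
mean_per_hour = hourly.mean()      # averages both rows in the bin

print(last_per_hour)
print(mean_per_hour)
```

Both results have a single row labelled 2018-03-14 05:00:00+05:30, which is why the output in the question shows the 05:45 values at 05:00.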

1 Answer:

Answer 0: (score: 1)

You are using last(), which returns the last value in each hour. Use mean() instead.

df.resample('H').mean().fillna(0)
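
For completeness, here is a hedged end-to-end sketch of that suggestion applied to the question's date window. It is an illustration under assumptions rather than the answerer's code: a few rows of the CSV are inlined through `io.StringIO` so the block runs standalone, timestamps are parsed with `utc=True` and then converted to Asia/Kolkata so that the result is timezone-aware regardless of pandas version, and `"60min"` stands in for `'H'`. It also skips the question's `fillna(0)` before resampling, since zeros filled in at that stage would be averaged together with real readings that share the same hour.

```python
import io
import pandas as pd

# A few rows copied from the question's CSV, inlined for a standalone run.
csv_text = """\
2018-03-05 13:04:23.634013+05:30,,
2018-03-14 05:30:14.51227+05:30,28.84,27.58
2018-03-14 05:45:14.51227+05:30,12.54,17.47
2018-03-14 06:30:14.466206+05:30,25.1,23.58
2018-03-14 06:40:14.466206+05:30,11.2,14.44
2018-03-14 07:18:14.493826+05:30,21.96,21.54
2018-03-15 00:30:14.477848+05:30,4.68,7.82
"""

df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["date", "upload", "download"])
# Parse to UTC first, then convert to local time (tz-aware in both steps).
df["date"] = pd.to_datetime(df["date"], utc=True).dt.tz_convert("Asia/Kolkata")
df = df.set_index("date")

# Same window as the question; the end label covers all of 2018-03-14.
window = df.loc["2018-03-05":"2018-03-14"]

# Average all entries in each hour; hours without entries are NaN, then 0.
hourly_mean = window.resample("60min").mean().fillna(0)

print(hourly_mean.loc["2018-03-14 05:00":"2018-03-14 07:00"])
```

With these rows, the 05:00 bin holds the average of the 05:30 and 05:45 readings (20.69 upload, 22.525 download) instead of the 05:45 values alone, and every hour without readings shows 0.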