我有一个csv文件,其内容低于
2018-02-28 09:48:18.884392+05:30,,
2018-03-04 10:50:34.833787+05:30,,
2018-03-05 13:04:23.634013+05:30,,
2018-03-14 05:30:14.51227+05:30,28.84,27.58
2018-03-14 05:45:14.51227+05:30,12.54,17.47
2018-03-14 06:30:14.466206+05:30,25.1,23.58
2018-03-14 06:40:14.466206+05:30,11.2,14.44
2018-03-14 07:18:14.493826+05:30,21.96,21.54
2018-03-14 08:30:14.593973+05:30,20.48,26.86
2018-03-14 09:30:14.481426+05:30,22.92,15.3
2018-03-14 10:31:20.307558+05:30,7.46,0
2018-03-14 11:30:14.556135+05:30,21,16.5
2018-03-14 12:30:14.569207+05:30,14.14,19.14
2018-03-14 13:11:14.470991+05:30,8.84,6.98
2018-03-14 14:20:14.500747+05:30,8.94,4.5
2018-03-14 15:30:14.487262+05:30,5.92,3.86
2018-03-14 16:30:14.454833+05:30,6.58,10.88
2018-03-14 17:30:14.482084+05:30,7.32,3.36
2018-03-14 18:27:14.559508+05:30,5.52,3.6
2018-03-14 19:30:14.611782+05:30,2.74,3.14
2018-03-14 20:30:14.461808+05:30,4.34,3.2
2018-03-14 21:30:14.533157+05:30,3.8,3.22
2018-03-14 22:15:14.451542+05:30,4.44,3.06
2018-03-14 23:30:14.5494+05:30,3.04,2.92
2018-03-15 00:30:14.477848+05:30,4.68,7.82
这里第一列是日期,第二列是上传速度的条目,最后一列是下载速度。
我需要按小时显示两个特定日期2018-03-05
和2018-03-14
之间所有小时的数据,以便在特定时间内完成任意数量的条目(上传和下载速度),我可以得到那些的平均值并显示特定小时的平均值。
以下是我的代码。
import pandas as pd
import numpy as np
df = pd.read_csv("file.csv", header=None,
names=["date", "upload", "download"], parse_dates=["date"])
df.set_index("date", inplace=True)
df.fillna(0, inplace=True)
df.index = df.index.tz_localize('UTC').tz_convert('Asia/Kolkata')
# get data for the specified dates
df2 = df.loc['2018-03-05': '2018-03-14']
# add hourly frequency
print(df2.resample('1H').last())
以下是我获得的格式
upload download
date
2018-03-05 13:00:00+05:30 0.00 0.00
2018-03-05 14:00:00+05:30 NaN NaN
2018-03-05 15:00:00+05:30 NaN NaN
2018-03-05 16:00:00+05:30 NaN NaN
2018-03-05 17:00:00+05:30 NaN NaN
2018-03-05 18:00:00+05:30 NaN NaN
2018-03-05 19:00:00+05:30 NaN NaN
2018-03-05 20:00:00+05:30 NaN NaN
2018-03-05 21:00:00+05:30 NaN NaN
2018-03-05 22:00:00+05:30 NaN NaN
2018-03-05 23:00:00+05:30 NaN NaN
2018-03-06 00:00:00+05:30 NaN NaN
2018-03-06 01:00:00+05:30 NaN NaN
2018-03-06 02:00:00+05:30 NaN NaN
2018-03-06 03:00:00+05:30 NaN NaN
2018-03-06 04:00:00+05:30 NaN NaN
2018-03-06 05:00:00+05:30 NaN NaN
2018-03-06 06:00:00+05:30 NaN NaN
2018-03-06 07:00:00+05:30 NaN NaN
2018-03-06 08:00:00+05:30 NaN NaN
2018-03-06 09:00:00+05:30 NaN NaN
2018-03-06 10:00:00+05:30 NaN NaN
2018-03-06 11:00:00+05:30 NaN NaN
2018-03-06 12:00:00+05:30 NaN NaN
2018-03-06 13:00:00+05:30 NaN NaN
2018-03-06 14:00:00+05:30 NaN NaN
2018-03-06 15:00:00+05:30 NaN NaN
2018-03-06 16:00:00+05:30 NaN NaN
2018-03-06 17:00:00+05:30 NaN NaN
2018-03-06 18:00:00+05:30 NaN NaN
... ... ...
2018-03-13 18:00:00+05:30 NaN NaN
2018-03-13 19:00:00+05:30 NaN NaN
2018-03-13 20:00:00+05:30 NaN NaN
2018-03-13 21:00:00+05:30 NaN NaN
2018-03-13 22:00:00+05:30 NaN NaN
2018-03-13 23:00:00+05:30 NaN NaN
2018-03-14 00:00:00+05:30 NaN NaN
2018-03-14 01:00:00+05:30 NaN NaN
2018-03-14 02:00:00+05:30 NaN NaN
2018-03-14 03:00:00+05:30 NaN NaN
2018-03-14 04:00:00+05:30 NaN NaN
2018-03-14 05:00:00+05:30 12.54 17.47
2018-03-14 06:00:00+05:30 11.20 14.44
2018-03-14 07:00:00+05:30 21.96 21.54
2018-03-14 08:00:00+05:30 20.48 26.86
2018-03-14 09:00:00+05:30 22.92 15.30
2018-03-14 10:00:00+05:30 7.46 0.00
2018-03-14 11:00:00+05:30 21.00 16.50
2018-03-14 12:00:00+05:30 14.14 19.14
2018-03-14 13:00:00+05:30 8.84 6.98
2018-03-14 14:00:00+05:30 8.94 4.50
2018-03-14 15:00:00+05:30 5.92 3.86
2018-03-14 16:00:00+05:30 6.58 10.88
2018-03-14 17:00:00+05:30 7.32 3.36
2018-03-14 18:00:00+05:30 5.52 3.60
2018-03-14 19:00:00+05:30 2.74 3.14
2018-03-14 20:00:00+05:30 4.34 3.20
2018-03-14 21:00:00+05:30 3.80 3.22
2018-03-14 22:00:00+05:30 4.44 3.06
2018-03-14 23:00:00+05:30 3.04 2.92
我确实按小时获取数据,但似乎有误。如果你仔细观察,对于日期2018-03-14
,原始数据在5:30
说,我的阅读是
28.84 和 27.58 分别在5:45
,我的读数 12.54 且 17.47 。但是格式化数据显示,在5:00
,读数 12.54 且 17.47 。这似乎是特定时刻的最新条目。对其他人来说也是如此。持续时间也是如此。
如何按小时显示两个指定日期之间所有小时的数据,其中包含特定小时的条目的平均值,如果没有条目,则为0?
答案 0 :(得分:1)
您正在使用的last()
提供最后一个值,而不是使用mean()
:
df.resample('H').mean().fillna(0)