How to select the dates that have all 24 hours in pandas?

Time: 2018-04-25 18:28:30

Tags: python-3.x pandas

I have a CSV file like the following:

13,2018-02-11 11:40:13.553385+00:00,CDA,10.4.100.1,KDE,2.0,3.0,4.49,0.0,,,,,,,,
14,2018-02-11 12:00:13.586360+00:00,CDA,10.4.100.1,KDE,2.0,3.0,12.16,0.0,,,,,,,,
15,2018-02-11 12:00:28.452263+00:00,CKD,100.100.100.1,LMF,0.0,19.0,0.12,0.0,,,,,,,,
16,2018-02-11 12:00:33.123310+00:00,CKD,100.100.100.1,LMF,0.0,19.0,1.28,0.0,,,,,,,,
17,2018-02-11 13:00:37.793817+00:00,CVC,100.100.100.1,KDE,0.0,19.0,2.5,0.0,,,,,,,,
18,2018-02-11 13:05:42.461774+00:00,CDA,100.100.100.1,KDE,0.0,19.0,2.79,0.0,,,,,,,,

19,2018-02-12 00:20:33.553385+00:00,CVC,10.4.100.1,KDA,2.0,3.0,4.49,0.0,,,,,,,,
20,2018-02-12 00:30:13.586360+00:00,CVC,10.4.100.1,KDA,2.0,3.0,12.16,0.0,,,,,,,,
21,2018-02-12 01:10:28.452263+00:00,CKD,100.100.100.1,LMF,0.0,19.0,0.12,0.0,,,,,,,,
22,2018-02-12 02:00:33.123310+00:00,KDE,100.100.100.1,CKD,0.0,19.0,1.28,0.0,,,,,,,,
23,2018-02-12 03:00:31.793817+00:00,LMF,100.100.100.1,CDA,0.0,19.0,2.5,0.0,,,,,,,,
24,2018-02-12 03:05:22.461774+00:00,LMF,100.100.100.1,CDA,0.0,19.0,2.79,0.0,,,,,,,,
...........................................................
..........................................................
44,2018-02-12 23:05:22.461774+00:00,CVC,100.100.100.1,KDE,0.0,19.0,2.79,0.0,,,,,,,,
44,2018-02-12 23:55:22.461774+00:00,CVC,100.100.100.1,KDE,0.0,19.0,2.79,0.0,,,,,,,,

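Notice that for the date 2018-02-11 the entries only fall between hours 11 and 14, whereas for the date 2018-02-12 there are entries for every hour, correctly from 00 to 23.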
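To see the gap concretely, here is a minimal sketch that loads a few of the sample rows and lists the distinct hours seen on each date. The sample has no header row, so the column names `id` and `date` used here are assumptions (only the first two fields are kept); the real file may name them differently.

```python
import io

import pandas as pd

# Five rows copied from the sample above; the data has no header row
SAMPLE = """\
13,2018-02-11 11:40:13.553385+00:00,CDA,10.4.100.1,KDE,2.0,3.0,4.49,0.0,,,,,,,,
14,2018-02-11 12:00:13.586360+00:00,CDA,10.4.100.1,KDE,2.0,3.0,12.16,0.0,,,,,,,,
17,2018-02-11 13:00:37.793817+00:00,CVC,100.100.100.1,KDE,0.0,19.0,2.5,0.0,,,,,,,,
19,2018-02-12 00:20:33.553385+00:00,CVC,10.4.100.1,KDA,2.0,3.0,4.49,0.0,,,,,,,,
22,2018-02-12 02:00:33.123310+00:00,KDE,100.100.100.1,CKD,0.0,19.0,1.28,0.0,,,,,,,,
"""

raw = pd.read_csv(io.StringIO(SAMPLE), header=None)

# Hypothetical names: keep only the row id (field 0) and the timestamp (field 1);
# the "+00:00" offsets make the parsed column timezone-aware
df = pd.DataFrame({"id": raw[0], "date": pd.to_datetime(raw[1])})

# Sorted distinct hours observed on each calendar date
hours_by_date = df["date"].dt.hour.groupby(df["date"].dt.date).unique().apply(sorted)
print(hours_by_date)
```

Here 2018-02-11 shows only hours 11 to 13, while the 2018-02-12 rows (a partial excerpt) show hours 0 and 2.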

How can I check in pandas whether a date, such as 2018-02-12, contains all 24 hours?

I know how to add the missing hours so that a date spans all 24 hours; I would do something like this:

import pandas as pd

df = pd.read_csv("metrics_copy.csv", parse_dates=["date"])
df.set_index("date", inplace=True)

# hourly mean of "cpu"; hours without any rows become NaN and are dropped
a = df.resample('h')["cpu"].mean().dropna()
# create all possible hours, from 00:00 of the first day to 23:00 of the last day
rng = pd.date_range(a.index.min().floor('d'),
      a.index.max().floor('d') + pd.Timedelta(23, unit='h'), freq='h')
# get all missing index values - missing hours
diff_idx = rng.difference(a.index)

# append empty rows for the missing hours to the original data, then sort for correct ordering
df = pd.concat([df, pd.DataFrame(index=diff_idx, columns=df.columns)]).sort_index()
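The missing-hour index `diff_idx` computed above can also answer the check directly: a date is complete exactly when none of its hours are missing. A minimal sketch of that idea, using a small hypothetical hourly series in place of `a` (the real `a` comes from the CSV):

```python
import pandas as pd

# Hypothetical hourly means standing in for `a`: 2018-02-11 has hours 11-13 only,
# 2018-02-12 has all 24 hours
hours = pd.date_range("2018-02-11 11:00", periods=3, freq="h").append(
    pd.date_range("2018-02-12 00:00", periods=24, freq="h")
)
a = pd.Series(1.0, index=hours)

# Every hour from 00:00 of the first day to 23:00 of the last day, as in the question
rng = pd.date_range(
    a.index.min().floor("D"),
    a.index.max().floor("D") + pd.Timedelta(hours=23),
    freq="h",
)
diff_idx = rng.difference(a.index)

# Number of missing hours per calendar date; 0 means the date has all 24 hours
missing_per_date = (
    pd.Series(diff_idx.date)
    .value_counts()
    .reindex(pd.Index(rng.date).unique(), fill_value=0)
)
complete_dates = missing_per_date[missing_per_date == 0].index.tolist()
print(missing_per_date)
print(complete_dates)
```

With this data, 2018-02-11 is missing 21 hours (00 to 10 and 14 to 23), and only 2018-02-12 is reported as complete.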

But what I need is to check whether a date has all 24 hours. How can I do that?

1 Answer:

Answer 0 (score: 1)

Use the .dt accessor:

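# "date" must be a datetime column (not the index): count distinct hours per calendar date, keep dates with all 24
df["date"].dt.hour.groupby(df["date"].dt.date).nunique().reset_index(name="count").query("count == 24")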
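A minimal sketch of the answer's approach on hypothetical timestamps, including a variant for when "date" has already been moved into the index with set_index("date"), as in the question. It uses nunique() and a boolean mask, which should be equivalent to the unique().apply(len) plus query("count==24") chain.

```python
import pandas as pd

# Hypothetical timestamps: 2018-02-11 only covers hours 11-13,
# while 2018-02-12 has one row in every hour from 00 to 23
stamps = pd.date_range("2018-02-11 11:40", periods=3, freq="h", tz="UTC").append(
    pd.date_range("2018-02-12 00:20", periods=24, freq="h", tz="UTC")
)
df = pd.DataFrame({"date": stamps})

# The answer's approach with "date" as a column: distinct hours per calendar date
hours_per_day = df["date"].dt.hour.groupby(df["date"].dt.date).nunique()
full_days = hours_per_day[hours_per_day == 24].index.tolist()

# Same check when "date" is the DatetimeIndex, as after set_index("date") in the question
idx = df.set_index("date").index
hours_per_day_idx = pd.Series(idx.hour).groupby(idx.date).nunique()

print(full_days)
print(hours_per_day_idx)
```

Counting distinct hours (rather than rows) matters here, because a date can have many rows within the same hour.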