Question

我是Python的初学者。我正在处理一个包含数年数据的日期集。这是数据集的样本。

此处，小时（LT）表示时间，而DN（LT）表示一年中的天数。

我已经尝试在Python 3.0大熊猫Anaconda中使用此数据集。我的最终目标是找出每日，每周，每月和每年的平均值，因此我更喜欢将其转换为pandas.DatetimeIndex（通过重新采样，我想很容易！）

我提供了到目前为止所写的代码。

import pandas as pd
import numpy as np

df = pd.read_csv('test_file.txt', sep=' ', delimiter=' ')

#convert the year, month, day int columns into datetime format 
year_month = pd.to_datetime(10000 * df.Year +100 * df.Month +df.Day, format='%Y%m%d')

#convert Year, Month, Day, Hour(LT) into DayTimeHour format
year_hour_convert = pd.DataFrame({
                                        'Day': np.array(year_month, dtype=np.datetime64), 
                                        'Hour': np.array(df['Hour(LT)'], dtype=np.int64)
                                })

#merge into "year-month-day-hour" format
year_hour = pd.to_datetime(year_hour_convert.Day) + pd.to_timedelta(year_hour_convert.Hour, unit='h')

#Define a new column for Time Series
df['DateTime'] = year_hour

#Drop unnecessary columns
df = df.drop(['Year', 'Month', 'Day', 'Hour(LT)', 'DN(LT)'], axis=1)

#Set YYYYMMDD HHMMSS as index
df = df.set_index('DateTime')

#Choose the data for 9 a.m. to 3 p.m.
df = df.between_time('09:00:00', '15:00:00')

我已将数据集转换为这种格式。我最终放弃了'Year', 'Month', 'Day', 'Hour(LT)', 'DN(LT)'列。我提供了这种格式的图片。enter image description here

现在，如果某天有一定数量的数据可用，我想过滤数据。例如，如果2016年1月4日以及2016年1月4日的数据量大于4，我将采用当天的数据。否则，我将删除当天的数据。

我如何在熊猫上做到这一点？

如果pandas中存在特定日期的特定数量的数据，则选择数据。DatetimeIndex

0 个答案: