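I have the function below, which reads a CSV file and chunks the data into days, weeks, and months.
My problem is that the days cover the normal 24 hours starting at 12 AM. The data should instead be chunked from 4 PM to 4 PM (running into the next day).
Also, because the data starts on a Monday, dt.week starts each week on Monday. I would like a week to default to Sunday 4 PM through Friday 4 PM. I could do this naively by indexing, but I am wondering whether there is a more elegant solution.
Goal: I want to build lists of dataframes that chunk this 5-minute data (see df.head()) by day, week, and month. For days, each day should start at 4 PM and run until 4 PM the next day. For weeks, I want the week to start on Sunday; the problem is that because the data starts on a Monday, the weeks get split starting on Monday.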
import pandas as pd


def read_in_files(file_names):
    """
    1. Read the csv file into a pandas dataframe with pd.read_csv
    2. Separate the df into year, month, and date objects
    3. It also chunks the data by single day, week, and month
    """
    # Combine the 'Date' and 'Time' columns into one 'Date_Time' timestamp column
    df = pd.read_csv(file_names, parse_dates=[['Date', 'Time']])
    # Week is defined as Sunday 4pm to Friday 4pm -- not working correctly
    # Date_Time holds timestamp objects
    df['year'], df['month'] = df['Date_Time'].dt.year, df['Date_Time'].dt.month
    df['date'] = df['Date_Time'].dt.day
    # Note: .dt.week was removed in pandas 2.0; use .dt.isocalendar().week there
    df['week'] = df['Date_Time'].dt.week
    # The three loops below chunk the data by day, week, and month
    df_single_day = []
    # Group by calendar date (midnight to midnight), not by the full timestamp
    for group in df.groupby(df['Date_Time'].dt.date, sort=False):
        df_single_day.append(group[1])
    df_single_week = []
    for group in df.groupby(['week', 'year'], sort=False):
        df_single_week.append(group[1])
    df_single_month = []
    for group in df.groupby(['month', 'year'], sort=False):
        df_single_month.append(group[1])
    return df, df_single_day, df_single_week, df_single_month
Sample output
df_single_day[0].tail(5)
Out[11]:
Unnamed: 0 Symbol Date_Time Open High Low Close \
90 91 ABCDEF 2008-05-06 23:35 0.9480 0.9483 0.9477 0.9480
91 92 ABCDEF 2008-05-06 23:40 0.9479 0.9482 0.9476 0.9479
92 93 ABCDEF 2008-05-06 23:45 0.9478 0.9481 0.9474 0.9477
93 94 ABCDEF 2008-05-06 23:50 0.9477 0.9481 0.9472 0.9478
94 95 ABCDEF 2008-05-06 23:55 0.9479 0.9481 0.9475 0.9478
year month date week
90 2008 5 6 19
91 2008 5 6 19
92 2008 5 6 19
93 2008 5 6 19
94 2008 5 6 19
df_single_day[1].head(5)
Out[14]:
Unnamed: 0 Symbol Date_Time Open High Low Close \
95 96 ABCDEF 2008-05-07 00:00 0.9478 0.9483 0.9475 0.9481
96 97 ABCDEF 2008-05-07 00:05 0.9481 0.9484 0.9479 0.9484
97 98 ABCDEF 2008-05-07 00:10 0.9482 0.9485 0.9480 0.9482
98 99 ABCDEF 2008-05-07 00:15 0.9482 0.9485 0.9478 0.9483
99 100 ABCDEF 2008-05-07 00:20 0.9483 0.9485 0.9480 0.9484
year month date week
95 2008 5 7 19
96 2008 5 7 19
97 2008 5 7 19
98 2008 5 7 19
99 2008 5 7 19
The function starts each chunk in the list at 00:00; I want each chunk to run from 16:00 on one day to 15:55 on the next.
Answer 0 (score: 0)
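# Assumes df was read without parse_dates, so 'Date' and 'Time' are still separate string columns
df['temp'] = df['Date'].astype(str) + ' ' + df['Time']
df['temp'] = pd.to_datetime(df['temp'], infer_datetime_format=True)
# Shift by 8 hours so that 16:00 lands on midnight of the next calendar day
df['temp'] = df['temp'] + pd.offsets.Hour(8)
# Group on the shifted date: each group now spans 16:00 to 15:55
g = df.groupby(df['temp'].dt.normalize())
df_single_day = []
for group in g:
    # Skip the single-row group left by the lone 16:00 bar at the end of the week
    if len(group[1]) > 1:
        df_single_day.append(group[1])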
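The code above produces the correct answer. I have one minor (but unimportant) problem: the 16:00 row at the end of the week ends up in a group by itself, so I just drop those groups with the if statement.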
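To make the shift easier to check, here is a minimal self-contained sketch of the same idea on synthetic data. The dates, the Close column, and the assumption that the week's last bar is stamped Friday 16:00 are all made up for illustration; they are not taken from the real file. It shows that each chunk runs 16:00 to 15:55, and that the final 16:00 bar is what ends up alone in its own group.

```python
import pandas as pd

# Hypothetical 5-minute bars for one trading week:
# Sunday 2008-05-04 16:00 through Friday 2008-05-09 16:00 (inclusive).
idx = pd.date_range("2008-05-04 16:00", "2008-05-09 16:00", freq="5min")
df = pd.DataFrame({"Date_Time": idx, "Close": range(len(idx))})

# Adding 8 hours moves 16:00 onto midnight of the next calendar day,
# so the normalized shifted timestamp labels each 16:00-to-15:55 session.
session = (df["Date_Time"] + pd.Timedelta(hours=8)).dt.normalize()

# One dataframe per session, in order of appearance.
df_single_day = [chunk for _, chunk in df.groupby(session, sort=False)]

first = df_single_day[0]
last = df_single_day[-1]
print(first["Date_Time"].iloc[0], first["Date_Time"].iloc[-1])
print(len(last), last["Date_Time"].iloc[0])
```

In this sketch the last group holds only the Friday 16:00 row, because that timestamp shifts to Saturday 00:00. Filtering on the group length, as in the code above, is one way to drop it; another would be to exclude that closing bar before grouping.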
I would still like to know how to get a week like dt.week, but one that runs Sun-Sun, since my data starts on Monday and dt.week runs Mon-Mon...
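One possible approach to the weekly case, offered as a sketch rather than a tested solution for the real file: reuse the same 8-hour shift so that Sunday 16:00 lands on Monday 00:00, then label each row with a weekly period via dt.to_period("W-SUN"), where "W-SUN" means weeks ending on Sunday (Monday through Sunday). After the shift, each period covers Sunday 16:00 through the following Sunday 15:55 in the original timestamps. The two weeks of data below are hypothetical.

```python
import pandas as pd

# Hypothetical data: two trading weeks of 5-minute bars,
# each running Sunday 16:00 through Friday 15:55.
week1 = pd.date_range("2008-05-04 16:00", "2008-05-09 15:55", freq="5min")
week2 = pd.date_range("2008-05-11 16:00", "2008-05-16 15:55", freq="5min")
df = pd.DataFrame({"Date_Time": week1.append(week2)})

# Shift by 8 hours so Sunday 16:00 becomes Monday 00:00, then take the
# Monday-to-Sunday weekly period that the shifted timestamp falls in.
shifted = df["Date_Time"] + pd.Timedelta(hours=8)
trading_week = shifted.dt.to_period("W-SUN")

# One dataframe per trading week, in order of appearance.
df_single_week = [chunk for _, chunk in df.groupby(trading_week, sort=False)]

for chunk in df_single_week:
    print(chunk["Date_Time"].iloc[0], chunk["Date_Time"].iloc[-1], len(chunk))
```

Because each period carries its own start and end dates, grouping on it alone should be enough; there is no need to group on a separate year column as with ['week', 'year'].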