Question

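I have the function below, which reads a csv file and chunks the data into days, weeks and months.

My problem is that the days run for the usual 24 hours starting at 12am. The data should instead be chunked from 4pm to 4pm (running into the next day).

Also, because the data starts on a Monday, dt.week starts each week on Monday. I would like a week to default to Sunday 4pm through Friday 4pm. I could do this naively by indexing, but I am wondering whether there is a more elegant solution.

Goal: I want to build a list of dataframes that chunk this 5-minute data (see df.head()) into days, weeks and months. For days, each day should start at 4pm and run until 4pm the next day. For weeks, I want the week to start on Sunday; the problem is that because the data starts on a Monday, it wants to split the weeks starting from Monday.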

def read_in_files(file_names):
    """
    1. Read the csv file into a pandas dataframe with pd.read_csv
    2. Separate the timestamp into year, month, date and week columns
    3. Chunk the data by single day, week and month
    """
    import pandas as pd

    # parse_dates=[['Date', 'Time']] merges the two columns into 'Date_Time'
    df = pd.read_csv(file_names, parse_dates=[['Date', 'Time']])

    # Week is defined as Sunday 4pm to Friday 4pm -- not working correctly
    # Date_Time holds timestamp objects
    df['year'], df['month'] = df['Date_Time'].dt.year, df['Date_Time'].dt.month
    df['date'] = df['Date_Time'].dt.day
    # dt.week was removed in pandas 2.0; dt.isocalendar().week replaces it
    df['week'] = df['Date_Time'].dt.week

    # The three loops below chunk the data by day, week and month.
    # Grouping on the calendar date gives midnight-to-midnight chunks.
    df_single_day = []
    for group in df.groupby(df['Date_Time'].dt.date, sort=False):
        df_single_day.append(group[1])

    df_single_week = []
    for group in df.groupby(['week', 'year'], sort=False):
        df_single_week.append(group[1])

    df_single_month = []
    for group in df.groupby(['month', 'year'], sort=False):
        df_single_month.append(group[1])

    return df, df_single_day, df_single_week, df_single_month

Sample output

df_single_day[0].tail(5)

Out[11]:

    Unnamed: 0  Symbol     Date_Time     Open    High     Low   Close  \
90          91  ABCDEF 2008-05-06 23:35  0.9480  0.9483  0.9477  0.9480   
91          92  ABCDEF 2008-05-06 23:40  0.9479  0.9482  0.9476  0.9479   
92          93  ABCDEF 2008-05-06 23:45  0.9478  0.9481  0.9474  0.9477   
93          94  ABCDEF 2008-05-06 23:50  0.9477  0.9481  0.9472  0.9478   
94          95  ABCDEF 2008-05-06 23:55  0.9479  0.9481  0.9475  0.9478   
    year  month  date  week  
90  2008      5     6    19  
91  2008      5     6    19  
92  2008      5     6    19  
93  2008      5     6    19  
94  2008      5     6    19

df_single_day[1].head(5)

Out[14]:

    Unnamed: 0  Symbol     Date_Time     Open    High     Low   Close  \
95          96  ABCDEF 2008-05-07 00:00  0.9478  0.9483  0.9475  0.9481   
96          97  ABCDEF 2008-05-07 00:05  0.9481  0.9484  0.9479  0.9484   
97          98  ABCDEF 2008-05-07 00:10  0.9482  0.9485  0.9480  0.9482   
98          99  ABCDEF 2008-05-07 00:15  0.9482  0.9485  0.9478  0.9483   
99         100  ABCDEF 2008-05-07 00:20  0.9483  0.9485  0.9480  0.9484   
    year  month  date  week  
95  2008      5     7    19  
96  2008      5     7    19  
97  2008      5     7    19  
98  2008      5     7    19  
99  2008      5     7    19

The function starts each chunk in the list at 00:00; I want each chunk to run from 16:00 on one day to 15:55 on the next.

Answer 1

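import pandas as pd

# Build a single timestamp column from the separate Date and Time columns
df['temp'] = df['Date'].astype(str) + ' ' + df['Time']
df['temp'] = pd.to_datetime(df['temp'], infer_datetime_format=True)
# Shift by 8 hours so that 16:00 lands on midnight of the next day
df['temp'] = df['temp'] + pd.offsets.Hour(8)

# Group on the shifted date: each group now spans 16:00 to 15:55
g = df.groupby(df['temp'].dt.normalize())
df_single_day = []
for group in g:
    # Skip groups holding a single row (the lone 16:00 bar at the week's end)
    if len(group[1]) > 1:
        df_single_day.append(group[1])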

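The code above produces the correct answer. I have one minor (but unimportant) issue: the 16:00 row at the end of each week ends up in a group on its own, so I simply drop those groups with the if statement.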
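To see why the 8-hour shift works, here is a minimal sketch on synthetic data. The `demo` frame, its `Close` column and the date range are made up for illustration and are not the real csv. Adding 8 hours moves 16:00 onto midnight of the following day, so normalizing the shifted timestamp gives one label per 16:00 to 15:55 session.

```python
import pandas as pd

# Hypothetical 5-minute bars standing in for the real csv data.
idx = pd.date_range("2008-05-05 00:00", "2008-05-07 23:55", freq="5min")
demo = pd.DataFrame({"Date_Time": idx, "Close": range(len(idx))})

# 16:00 + 8h = 00:00 of the next day, so the normalized (midnight) value
# of the shifted timestamp labels each 16:00-to-15:55 session.
session = (demo["Date_Time"] + pd.Timedelta(hours=8)).dt.normalize()

# One dataframe per session, in chronological order.
sessions = [chunk for _, chunk in demo.groupby(session)]

# A full session holds 24 * 12 = 288 five-minute bars; the first and last
# chunks here are partial because the sample starts and ends at midnight.
full_sessions = [c for c in sessions if len(c) == 288]
```

Filtering on the expected row count, as in `full_sessions`, is one possible alternative to the `len(group[1]) > 1` check when partial sessions should be dropped as well.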

I would still like to know how to get something like dt.week where the week runs Sun-Sun, since my data starts on a Monday and dt.week runs Mon-Mon...
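One possible approach, offered as a sketch rather than a confirmed solution, is to reuse the same 8-hour shift for weeks: after the shift, Sunday 16:00 becomes Monday 00:00, so a Monday-based week on the shifted timestamps corresponds to a Sunday 16:00 week on the original ones. The `demo` frame and the `week_start` column name below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical 5-minute bars from Sunday 2008-05-04 16:00 to Friday 2008-05-16 15:55.
idx = pd.date_range("2008-05-04 16:00", "2008-05-16 15:55", freq="5min")
demo = pd.DataFrame({"Date_Time": idx, "Close": range(len(idx))})

# Shift so that Sunday 16:00 lands on Monday 00:00.
shifted = demo["Date_Time"] + pd.Timedelta(hours=8)

# Step back to the Monday midnight of the shifted week (dayofweek: Monday == 0).
monday = shifted.dt.normalize() - pd.to_timedelta(shifted.dt.dayofweek, unit="D")

# Undo the shift: each row is labelled with the Sunday 16:00 that opens its week.
demo["week_start"] = monday - pd.Timedelta(hours=8)

# One dataframe per Sunday-16:00 week, in chronological order.
weeks = [chunk for _, chunk in demo.groupby("week_start")]
```

Because the label is a full timestamp rather than a week number, there is no need to group on the year as well. Any weekend rows, if present, fall into the week that opened on the preceding Sunday.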

Pandas Resample DateTime spanning 2 days, with the week starting on Sunday

1 answer: