Question

How can I group the data frame

import numpy as np
import pandas as pd

np.random.seed(42)
days = pd.date_range(start='1/1/2018', end='12/31/2019')
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'col1': days, 'col2': data})
print(df.head())
        col1  col2
0 2018-01-01    52
1 2018-01-02    93
2 2018-01-03    15
3 2018-01-04    72
4 2018-01-05    61

by day of the year, so that the resulting data frame looks like

         min   
01-01    ...   
01-02    ...   
01-03    ...   
01-04    ...   
01-05    ...   
...      ...

that is, it contains the minimum of col2 for each date, where the index represents the month and day, e.g. January 2 is 01-02?

Answer 1

I think you need Series.dt.strftime, with %m for the month and %j for the day of the year:

# Use a new name so the original df is not overwritten
out = df.groupby(df['col1'].dt.strftime('%m-%j'))['col2'].min()
print(out)
col1
01-001    30
01-002    93
01-003    15
01-004     6
01-005    61
          ..
12-361    18
12-362    47
12-363    17
12-364    14
12-365    15
Name: col2, Length: 365, dtype: int32
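One point worth checking with %j: it is the day number within the year, so the same calendar date gets a different key in a leap year. The sample data (2018 and 2019) contains no leap year, so this does not affect the output above. A minimal sketch, using two hand-picked dates that are not part of the original data:

```python
import pandas as pd

# Hypothetical dates: March 1 in a non-leap year and in a leap year
dates = pd.Series(pd.to_datetime(['2019-03-01', '2020-03-01']))

# %j counts days from January 1, so Feb 29 shifts every later date by one
keys_j = dates.dt.strftime('%m-%j').tolist()
# %d is the day of the month, so the key is the same in both years
keys_d = dates.dt.strftime('%m-%d').tolist()

print(keys_j)  # ['03-060', '03-061']
print(keys_d)  # ['03-01', '03-01']
```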

Or %d for the day of the month:

# Use a new name so the original df is not overwritten
out = df.groupby(df['col1'].dt.strftime('%m-%d'))['col2'].min()
print(out)
col1
01-01    30
01-02    93
01-03    15
01-04     6
01-05    61
         ..
12-27    18
12-28    47
12-29    17
12-30    14
12-31    15
Name: col2, Length: 365, dtype: int32
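The result above is a Series, while the question asks for a data frame with a column named min. One way to get that shape, sketched here on a small hand-made frame (the four rows and the name by_day are illustrative assumptions, not the original data), is to pass a list of aggregations to agg:

```python
import pandas as pd

# Hypothetical sample: the same two calendar dates in two different years
df = pd.DataFrame({
    'col1': pd.to_datetime(['2018-01-01', '2018-01-02',
                            '2019-01-01', '2019-01-02']),
    'col2': [52, 93, 30, 95],
})

# Key each row by zero-padded month-day, then aggregate with a list
# so the result is a DataFrame whose column is named 'min'
by_day = df.groupby(df['col1'].dt.strftime('%m-%d'))['col2'].agg(['min'])
print(by_day)
```

Passing a list to agg also makes it easy to add further columns later, for example ['min', 'max'].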
