From Python

Time: 2017-10-17 05:52:41

Tags: python pandas dataframe pandas-groupby

Suppose we have a time series indexed by minute, as follows:

df =

Time (HH:MM)     Value
01/01/2014 00:00  1 
01/01/2014 00:01  2
01/01/2014 00:02  3
01/01/2014 00:03  4
...
01/08/2014 00:00  5000
...

I would like to "group" the dataset by week, as follows:

df2 =

Week  Val1 Val2 Val3 Val4 ...
1     1    2    3    4    ...
2     5000 ...
3
4
...

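In other words, each 1-minute observation in week 1 (01/01/2014 to 01/08/2014) becomes a column in df2. (Each week should have 10,080 minutes/columns.)

I have tried a few functions, including groupby(), but most of them seem to aggregate the data rather than split it into the individual columns I am looking for.

Edit: It does not necessarily have to be in DataFrame format, but I will feed it to a function whose input is one week at a time, something like creating a histogram of the values for each week.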

2 Answers:

Answer 0: (score: 1)

You need weekofyear + cumcount to build the new column names, then reshape with set_index and unstack:

1. Solution if df is a DataFrame with a Time (HH:MM) column:

print (type(df))
<class 'pandas.core.frame.DataFrame'>

print (df.columns)
Index(['Time (HH:MM)', 'Value'], dtype='object')

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
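
The snippet above relies on the `.dt.weekofyear` accessor, which was deprecated in pandas 1.1 and removed in pandas 2.0. A minimal sketch of the same reshape for recent versions follows, assuming `.dt.isocalendar().week` as the stand-in (it returns ISO week numbers). The sample data, the explicit `format` string, and the name `wide` are assumptions added for illustration.

```python
import pandas as pd

# Sample data mirroring the question: four minutes in week 1, one in week 2.
df = pd.DataFrame({
    'Time (HH:MM)': ['01/01/2014 00:00', '01/01/2014 00:01',
                     '01/01/2014 00:02', '01/01/2014 00:03',
                     '01/08/2014 00:00'],
    'Value': [1, 2, 3, 4, 5000],
})

# Assumption: the strings are month/day/year, as in the question.
times = pd.to_datetime(df['Time (HH:MM)'], format='%m/%d/%Y %H:%M')

# ISO week number, used here in place of the older .dt.weekofyear accessor.
weeks = times.dt.isocalendar().week.astype(int).rename('Week')

# Position of each row within its week, starting at 1.
countweeks = df.groupby(weeks).cumcount() + 1

# One row per week, one column per observation within that week.
wide = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print(wide)
```

Weeks with fewer observations than the longest week are padded with NaN, which is why the values come back as floats.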

Another solution with pivot:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value'])
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN

If you need to replace NaN with 0, add the parameter fill_value=0 to unstack:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack(fill_value=0).add_prefix('Val')
print (df)
      Val1  Val2  Val3  Val4
Week                        
1        1     2     3     4
2     5000     0     0     0
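
The two printed results differ in more than the fill value: with NaN padding the columns are floats, while `fill_value=0` keeps them as integers. The sketch below shows this on a hand-built Series; the names `with_nan` and `with_zero` are illustrative only. Note also that a padded 0 cannot be told apart from a real observation of 0, which may matter if the rows are later fed to a histogram.

```python
import pandas as pd

# (week, position-within-week) pairs for the five sample observations.
idx = pd.MultiIndex.from_arrays(
    [[1, 1, 1, 1, 2], [1, 2, 3, 4, 1]], names=['Week', None]
)
s = pd.Series([1, 2, 3, 4, 5000], index=idx, name='Value')

with_nan = s.unstack()               # missing cells become NaN
with_zero = s.unstack(fill_value=0)  # missing cells become 0

print(with_nan.dtypes.unique())
print(with_zero.dtypes.unique())
```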

For the pivot solution, use fillna:

weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value']).fillna(0)
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   0.0   0.0   0.0

2. Solution if s is a Series with Time (HH:MM) as its index:

print (s)

Time (HH:MM)
01/01/2014 00:00       1
01/01/2014 00:01       2
01/01/2014 00:02       3
01/01/2014 00:03       4
01/08/2014 00:00    5000
Name: Value, dtype: int64

print (type(s))
<class 'pandas.core.series.Series'>

print (s.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
       '01/01/2014 00:03', '01/08/2014 00:00'],
      dtype='object', name='Time (HH:MM)')

weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount() + 1
df = s.to_frame().set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
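
On recent pandas versions the index-based variant needs one extra step. This is a sketch under the assumption that `DatetimeIndex.isocalendar()` replaces `weekofyear`. That method returns a DataFrame indexed by the parsed timestamps, so the week numbers are converted to a plain array first to line up positionally with the string-indexed Series. The `format` string and the name `wide` are assumptions for illustration.

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3, 4, 5000],
    index=pd.Index(['01/01/2014 00:00', '01/01/2014 00:01',
                    '01/01/2014 00:02', '01/01/2014 00:03',
                    '01/08/2014 00:00'], name='Time (HH:MM)'),
    name='Value',
)

# Assumption: the index strings are month/day/year.
times = pd.to_datetime(s.index, format='%m/%d/%Y %H:%M')

# Take the raw week numbers so they match s by position, not by label.
weeks = pd.Index(times.isocalendar().week.to_numpy(dtype='int64'), name='Week')

# Position of each observation within its week, starting at 1.
countweeks = s.groupby(weeks).cumcount() + 1

wide = s.to_frame().set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print(wide)
```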

Second solution, with pivot:

weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=s)
print (df)
        Val1  Val2  Val3  Val4
Week                          
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN

3. Solution if df is a DataFrame with Time (HH:MM) as its index:

print (df)
                  Value
Time (HH:MM)           
01/01/2014 00:00      1
01/01/2014 00:01      2
01/01/2014 00:02      3
01/01/2014 00:03      4
01/08/2014 00:00   5000

print (type(df))
<class 'pandas.core.frame.DataFrame'>

print (df.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
       '01/01/2014 00:03', '01/08/2014 00:00'],
      dtype='object', name='Time (HH:MM)')

weeks = pd.to_datetime(df.index).weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
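
One caveat applies to all three variants: a week number alone repeats every year, so data spanning more than one year would be merged into the same row. A minimal sketch of one way around this, assuming it is acceptable to key the rows by ISO year and ISO week together, is shown below. The two-year sample data and the names `merged` and `wide` are made up for illustration.

```python
import pandas as pd

# Two observations in ISO week 1 of 2014 and two in ISO week 1 of 2015.
df = pd.DataFrame(
    {'Value': [1, 2, 10, 20]},
    index=pd.to_datetime(['2014-01-01 00:00', '2014-01-01 00:01',
                          '2015-01-01 00:00', '2015-01-01 00:01']),
)

iso = df.index.isocalendar()  # columns: year, week, day; same index as df
year = iso['year'].astype(int).rename('Year')
week = iso['week'].astype(int).rename('Week')

# Grouping by week alone lumps both years together.
merged = df.groupby(week).size()

# Grouping by (year, week) keeps them apart.
pos = df.groupby([year, week]).cumcount() + 1
wide = df.set_index([year, week, pos])['Value'].unstack().add_prefix('Val')
print(wide)
```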

Answer 1: (score: 0)

You can use pivot_table, like this:

In [3192]: df['Week'] = df['Time (HH:MM)'].dt.weekofyear

In [3193]: df['ValCount'] = 'Val' + df.groupby('Week').cumcount().add(1).astype(str)

In [3194]: df.pivot_table(index='Week', columns='ValCount', values='Value').reset_index()
Out[3194]:
ValCount  Week    Val1  Val2  Val3  Val4
0            1     1.0   2.0   3.0   4.0
1            2  5000.0   NaN   NaN   NaN

With Week in the index:

In [3198]: df.pivot_table(index='Week', columns='ValCount',
                          values='Value').rename_axis(None, 1)
Out[3198]:
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
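
With only four columns the ordering looks natural, but pivot_table sorts column labels, and string labels such as Val10 sort before Val2. With 10,080 columns per week that would scramble the time order. The sketch below, using made-up data with twelve observations in one week, compares string labels with integer labels that get their prefix afterwards; `by_text` and `by_number` are illustrative names.

```python
import pandas as pd

# Twelve observations, all in the same (hypothetical) week.
df = pd.DataFrame({'Week': [1] * 12, 'Value': range(12)})
pos = df.groupby('Week').cumcount().add(1)

# String labels: columns come back in text order.
by_text = (df.assign(ValCount='Val' + pos.astype(str))
             .pivot_table(index='Week', columns='ValCount', values='Value'))

# Integer labels: columns come back in numeric order; prefix added last.
by_number = (df.assign(ValCount=pos)
               .pivot_table(index='Week', columns='ValCount', values='Value')
               .add_prefix('Val'))

print(list(by_text.columns)[:4])    # ['Val1', 'Val10', 'Val11', 'Val12']
print(list(by_number.columns)[:4])  # ['Val1', 'Val2', 'Val3', 'Val4']
```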

Details:

In [3202]: df
Out[3202]:
         Time (HH:MM)  Value
0 2014-01-01 00:00:00      1
1 2014-01-01 00:01:00      2
2 2014-01-01 00:02:00      3
3 2014-01-01 00:03:00      4
4 2014-01-08 00:00:00   5000

In [3203]: df.dtypes
Out[3203]:
Time (HH:MM)    datetime64[ns]
Value                    int64
dtype: object
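
For the per-week histogram mentioned in the question's edit, the wide table may not be needed at all. The following is a rough sketch, assuming a datetime column as in the frame above and assuming hypothetical bin edges in `edges`; it computes one histogram per week directly from the groups.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Time (HH:MM)': pd.to_datetime(['2014-01-01 00:00', '2014-01-01 00:01',
                                    '2014-01-01 00:02', '2014-01-01 00:03',
                                    '2014-01-08 00:00']),
    'Value': [1, 2, 3, 4, 5000],
})

# ISO week number of each observation.
week = df['Time (HH:MM)'].dt.isocalendar().week.astype(int).rename('Week')

# Hypothetical bin edges: [0, 2), [2, 4), [4, 10000].
edges = [0, 2, 4, 10000]

# One array of bin counts per week; no padding with NaN or 0 is involved.
hist = {wk: np.histogram(vals, bins=edges)[0]
        for wk, vals in df.groupby(week)['Value']}
print(hist)
```

Because each group carries only the observations that exist, short weeks do not introduce NaN or artificial zeros into the counts.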