Suppose we have a time series indexed by minute, as follows:
df =
Time (HH:MM) Value
01/01/2014 00:00 1
01/01/2014 00:01 2
01/01/2014 00:02 3
01/01/2014 00:03 4
...
01/08/2014 00:00 5000
...
I would like to "group" the dataset by week, like this:
df2 =
Week Val1 Val2 Val3 Val4 ...
1 1 2 3 4 ...
2 5000 ...
3
4
...
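In other words, each 1-minute observation in week 1 (01/01/2014 to 01/08/2014) becomes a column in df2. (Each week should have 10,080 minutes/columns.)
I have tried a few functions, including groupby(), but most of them seem to aggregate the data rather than split it into the individual columns I am looking for.
Edit: It does not necessarily have to be a DataFrame, but I am using it as input to a function that takes a week at a time. Think of something like building a histogram of the values for each week.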
Answer 0 (score: 1)
You need weekofyear + cumcount to compute the new column names, then reshape with set_index + unstack:
1. If df is a DataFrame and Time (HH:MM) is a column, the solution is:
print (type(df))
<class 'pandas.core.frame.DataFrame'>
print (df.columns)
Index(['Time (HH:MM)', 'Value'], dtype='object')
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
Another solution with pivot:
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value'])
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
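The snippets above were written for an older pandas. As far as I know, weekofyear was deprecated in pandas 1.1 and removed in 2.0, and pd.pivot now expects a DataFrame as its first argument. Below is a minimal sketch of the same set_index + unstack idea on a recent pandas. It rebuilds the sample data inline, and the date format string is my assumption (month first), based on the question treating 01/08/2014 as one week after 01/01/2014.

```python
import pandas as pd

# Sample data from the question, rebuilt inline so the sketch is self-contained.
df = pd.DataFrame({
    "Time (HH:MM)": ["01/01/2014 00:00", "01/01/2014 00:01", "01/01/2014 00:02",
                     "01/01/2014 00:03", "01/08/2014 00:00"],
    "Value": [1, 2, 3, 4, 5000],
})

# Assumption: timestamps are month/day/year.
times = pd.to_datetime(df["Time (HH:MM)"], format="%m/%d/%Y %H:%M")

# ISO week number; isocalendar() returns UInt32, so cast to a plain integer dtype.
weeks = times.dt.isocalendar().week.astype("int64").rename("Week")

# Running position of each observation within its week: 1, 2, 3, ...
countweeks = df.groupby(weeks).cumcount() + 1

# One row per week, one column per position within the week.
wide = df.set_index([weeks, countweeks])["Value"].unstack().add_prefix("Val")
print(wide)
```

Keeping the position as an integer until after the reshape means the columns stay in numeric order (Val1, Val2, ..., Val10). If the 'Val' strings are built first, the reshape sorts them as text, which puts Val10 before Val2 once a week has ten or more observations.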
If you need to replace NaN with 0, add the parameter fill_value=0 to unstack; in the second solution, use fillna:
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack(fill_value=0).add_prefix('Val')
print (df)
      Val1  Val2  Val3  Val4
Week
1        1     2     3     4
2     5000     0     0     0
weeks = pd.to_datetime(df['Time (HH:MM)']).dt.weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=df['Value']).fillna(0)
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   0.0   0.0   0.0
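The two outputs above differ in dtype: unstack(fill_value=0) keeps the integers, while reshaping first and calling fillna(0) afterwards leaves floats, because the NaN placeholders force a float column. Here is a small sketch of that difference. The MultiIndex is built by hand purely for illustration, and the level name "n" is made up.

```python
import pandas as pd

# Hand-built (Week, position) index standing in for set_index([weeks, countweeks]).
s = pd.Series(
    [1, 2, 3, 4, 5000],
    index=pd.MultiIndex.from_tuples(
        [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1)], names=["Week", "n"]
    ),
)

# Filling while reshaping: no NaN is ever created, so int64 is preserved.
filled_at_unstack = s.unstack(fill_value=0)

# Filling afterwards: NaN appears first, so the columns become float64.
filled_after = s.unstack().fillna(0)

# Cast back if integers are wanted after fillna.
restored = filled_after.astype("int64")

print(filled_at_unstack.dtypes)
print(filled_after.dtypes)
```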
2. If s is a Series and Time (HH:MM) is the index, the solution is:
print (s)
Time (HH:MM)
01/01/2014 00:00       1
01/01/2014 00:01       2
01/01/2014 00:02       3
01/01/2014 00:03       4
01/08/2014 00:00    5000
Name: Value, dtype: int64
print (type(s))
<class 'pandas.core.series.Series'>
print (s.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
       '01/01/2014 00:03', '01/08/2014 00:00'],
      dtype='object', name='Time (HH:MM)')
weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount() + 1
df = s.to_frame().set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
Second solution:
weeks = pd.to_datetime(s.index).weekofyear.rename('Week')
countweeks = s.groupby(weeks).cumcount().add(1).astype(str).radd('Val')
df = pd.pivot(index=weeks, columns=countweeks, values=s)
print (df)
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
3. If df is a DataFrame and Time (HH:MM) is the index, the solution is:
print (df)
                  Value
Time (HH:MM)
01/01/2014 00:00      1
01/01/2014 00:01      2
01/01/2014 00:02      3
01/01/2014 00:03      4
01/08/2014 00:00   5000
print (type(df))
<class 'pandas.core.frame.DataFrame'>
print (df.index)
Index(['01/01/2014 00:00', '01/01/2014 00:01', '01/01/2014 00:02',
'01/01/2014 00:03', '01/08/2014 00:00'],
dtype='object', name='Time (HH:MM)')
weeks = pd.to_datetime(df.index).weekofyear.rename('Week')
countweeks = df.groupby(weeks).cumcount() + 1
df = df.set_index([weeks, countweeks])['Value'].unstack().add_prefix('Val')
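One caveat worth checking against your data: week-of-year numbers follow the ISO calendar, so weeks start on Monday (2014-01-06 already falls in week 2) and the numbers repeat every year. The question instead describes week 1 as the seven days starting at the first timestamp. Below is a sketch under that reading, counting consecutive 7-day blocks from the first observation. The data is synthetic, and elapsed, week and minute_of_week are names I made up for illustration.

```python
import pandas as pd

# Synthetic minute-level series: one full week (10,080 minutes) plus 3 minutes.
idx = pd.date_range("2014-01-01 00:00", periods=10_080 + 3, freq="min")
s = pd.Series(range(1, len(idx) + 1), index=idx, name="Value")

# Time elapsed since the first observation.
elapsed = s.index - s.index[0]

# 1-based number of the 7-day block each observation falls into.
week = (elapsed // pd.Timedelta(days=7) + 1).to_numpy()

# 1-based minute position inside that block (1 .. 10080).
minute_of_week = (
    (elapsed % pd.Timedelta(days=7)) // pd.Timedelta(minutes=1) + 1
).to_numpy()

wide = (
    pd.DataFrame({"Week": week, "Minute": minute_of_week, "Value": s.to_numpy()})
    .pivot(index="Week", columns="Minute", values="Value")
    .add_prefix("Val")
)
print(wide.shape)
```

Deriving the column from the minute offset rather than from cumcount also keeps the columns aligned when some minutes are missing: a gap shows up as NaN in its own column instead of shifting every later value to the left.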
Answer 1 (score: 0)
You can use pivot_table, like this:
In [3192]: df['Week'] = df['Time (HH:MM)'].dt.weekofyear
In [3193]: df['ValCount'] = 'Val' + df.groupby('Week').cumcount().add(1).astype(str)
In [3194]: df.pivot_table(index='Week', columns='ValCount', values='Value').reset_index()
Out[3194]:
ValCount  Week    Val1  Val2  Val3  Val4
0            1     1.0   2.0   3.0   4.0
1            2  5000.0   NaN   NaN   NaN
Or, with Week in the index:
In [3198]: df.pivot_table(index='Week', columns='ValCount',
values='Value').rename_axis(None, 1)
Out[3198]:
        Val1  Val2  Val3  Val4
Week
1        1.0   2.0   3.0   4.0
2     5000.0   NaN   NaN   NaN
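On a recent pandas the session above needs two small changes, as far as I can tell: the week number comes from dt.isocalendar().week, and rename_axis takes the axis as a keyword rather than positionally. Here is a sketch of the same pivot_table approach with those changes, using the sample frame from this answer rebuilt inline.

```python
import pandas as pd

# Same sample frame as in this answer, with a real datetime column.
df = pd.DataFrame({
    "Time (HH:MM)": pd.to_datetime(
        ["2014-01-01 00:00", "2014-01-01 00:01", "2014-01-01 00:02",
         "2014-01-01 00:03", "2014-01-08 00:00"]
    ),
    "Value": [1, 2, 3, 4, 5000],
})

# ISO week number as a plain integer column.
df["Week"] = df["Time (HH:MM)"].dt.isocalendar().week.astype("int64")

# Integer position within the week; the 'Val' prefix is added after reshaping.
df["ValCount"] = df.groupby("Week").cumcount().add(1)

out = (
    df.pivot_table(index="Week", columns="ValCount", values="Value")
      .add_prefix("Val")
      .rename_axis(None, axis=1)  # drop the 'ValCount' label above the columns
)
print(out)
```

Note that pivot_table aggregates with the mean by default. That is harmless here because every (Week, ValCount) pair occurs exactly once, but it would silently average values if the pairs were not unique.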
Details:
In [3202]: df
Out[3202]:
         Time (HH:MM)  Value
0 2014-01-01 00:00:00      1
1 2014-01-01 00:01:00      2
2 2014-01-01 00:02:00      3
3 2014-01-01 00:03:00      4
4 2014-01-08 00:00:00   5000
In [3203]: df.dtypes
Out[3203]:
Time (HH:MM)    datetime64[ns]
Value                    int64
dtype: object
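Since the question's edit says a DataFrame is not strictly required and the end goal is a histogram per week, it may be simpler to skip the wide reshape and iterate over the weekly groups directly. Here is a rough sketch using the frame above; the bin edges are made up for illustration, and hist_per_week is just a name I picked.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time (HH:MM)": pd.to_datetime(
        ["2014-01-01 00:00", "2014-01-01 00:01", "2014-01-01 00:02",
         "2014-01-01 00:03", "2014-01-08 00:00"]
    ),
    "Value": [1, 2, 3, 4, 5000],
})

weeks = df["Time (HH:MM)"].dt.isocalendar().week.astype("int64")

# Hypothetical bin edges; choose ones that suit the real value range.
bins = np.array([0, 10, 100, 1000, 10000])

# One array of bin counts per week, keyed by week number.
hist_per_week = {
    week: np.histogram(values.to_numpy(), bins=bins)[0]
    for week, values in df.groupby(weeks)["Value"]
}
print(hist_per_week)
```

Each group is a plain Series of that week's values, so any function that takes one week of data can be called inside the loop in place of np.histogram.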