Upsampling an int series within groups in a Pandas DataFrame

Time: 2018-01-07 23:30:15

Tags: python pandas resampling

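My question is how to upsample within each of several 'groupings' in my dataframe (in my case, for each 'Team' and 'LeadWeek' grouping).

I have seen built-in functions and many examples for upsampling a time series, but none for upsampling integers. For various reasons that I can't go into now, I want to use integers rather than a time series.

In my case I have 'Team' and 'LeadWeek', and I want to upsample 'ConversionWeek' to [0, 1, 2, 3, 4] for every 'Team' and 'LeadWeek' combination.

I figured this could be done with a multi-index / groupby + resample(), but I'm not smart enough to work it out after a few hours of tinkering. Turning to the wise for help here...

So here is the sample dataframe: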

import pandas as pd

# pd.Timestamp replaces the deprecated pd.datetime alias
df = pd.DataFrame(
    [
        ['Team A', pd.Timestamp(2017, 12, 1), 0, 2],
        ['Team A', pd.Timestamp(2017, 12, 1), 2, 1],
        ['Team A', pd.Timestamp(2017, 12, 1), 4, 1],
        ['Team A', pd.Timestamp(2017, 12, 8), 3, 2],
        ['Team B', pd.Timestamp(2017, 12, 1), 0, 1],
        ['Team B', pd.Timestamp(2017, 12, 1), 2, 3],
        ['Team B', pd.Timestamp(2017, 12, 8), 1, 3],
        ['Team B', pd.Timestamp(2017, 12, 8), 3, 2],
    ],
    columns=['Team', 'LeadWeek', 'ConversionWeek', 'Conversions'],
)

The output I want is below, with 5 'ConversionWeek' rows, numbered 0 to 4, for every Team / LeadWeek grouping:

       Team     LeadWeek     ConversionWeek     Conversions
0      Team A     2017-12-01     0     2.0
1      Team A     2017-12-01     1     0.0
2      Team A     2017-12-01     2     1.0
3      Team A     2017-12-01     3     0.0
4      Team A     2017-12-01     4     1.0
5      Team A     2017-12-08     0     0.0
6      Team A     2017-12-08     1     0.0
7      Team A     2017-12-08     2     0.0
8      Team A     2017-12-08     3     2.0
9      Team A     2017-12-08     4     0.0
10     Team B     2017-12-01     0     1.0
11     Team B     2017-12-01     1     0.0
12     Team B     2017-12-01     2     3.0
13     Team B     2017-12-01     3     0.0
14     Team B     2017-12-01     4     0.0
15     Team B     2017-12-08     0     0.0
16     Team B     2017-12-08     1     3.0
17     Team B     2017-12-08     2     0.0
18     Team B     2017-12-08     3     2.0
19     Team B     2017-12-08     4     0.0

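I do have a solution, but it isn't very pythonic. It is the same way I would solve it in SQL: create a 'scaffold' from the Cartesian product of all the distinct elements, then join my actual conversions onto it. In Python, this approach uses itertools.product().

My solution is: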

import pandas as pd
import numpy as np
import itertools as it

# pd.Timestamp replaces the deprecated pd.datetime alias
df = pd.DataFrame(
    [
        ['Team A', pd.Timestamp(2017, 12, 1), 0, 2],
        ['Team A', pd.Timestamp(2017, 12, 1), 2, 1],
        ['Team A', pd.Timestamp(2017, 12, 1), 4, 1],
        ['Team A', pd.Timestamp(2017, 12, 8), 3, 2],
        ['Team B', pd.Timestamp(2017, 12, 1), 0, 1],
        ['Team B', pd.Timestamp(2017, 12, 1), 2, 3],
        ['Team B', pd.Timestamp(2017, 12, 8), 1, 3],
        ['Team B', pd.Timestamp(2017, 12, 8), 3, 2],
    ],
    columns=['Team', 'LeadWeek', 'ConversionWeek', 'Conversions'],
)

# The conversion weeks every Team / LeadWeek grouping should contain: 0..4
ConversionWeek = np.arange(5)

Team = df['Team'].unique()

LeadWeek = df['LeadWeek'].unique()

# Scaffold: Cartesian product of every Team, LeadWeek and ConversionWeek
scaffold_raw = list(it.product(Team, LeadWeek, ConversionWeek))

scaffold = pd.DataFrame(scaffold_raw, columns=['Team', 'LeadWeek', 'ConversionWeek'])

# Left-join the actual conversions onto the scaffold; unmatched rows get NaN
new_frame = scaffold.merge(df, how='left')

new_frame = new_frame.sort_values(by=['Team', 'LeadWeek', 'ConversionWeek']).reset_index(drop=True)

# Fill the gaps with 0; assigning back avoids a chained inplace fillna
new_frame['Conversions'] = new_frame['Conversions'].fillna(0)

Any help with a more elegant solution is appreciated.

1 Answer:

Answer 0: (score: 1)

Use reindex.
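
The answer's own code is not shown here, so the following is only a minimal sketch of one way reindex could be applied to the question's data, not necessarily what the answerer wrote. It assumes the conversion weeks always run from 0 to 4, that each (Team, LeadWeek, ConversionWeek) combination appears at most once (reindex raises on a duplicated index), and that every Team should be paired with every LeadWeek, as in the question's scaffold. The names keys, full_index and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Team A", pd.Timestamp(2017, 12, 1), 0, 2],
        ["Team A", pd.Timestamp(2017, 12, 1), 2, 1],
        ["Team A", pd.Timestamp(2017, 12, 1), 4, 1],
        ["Team A", pd.Timestamp(2017, 12, 8), 3, 2],
        ["Team B", pd.Timestamp(2017, 12, 1), 0, 1],
        ["Team B", pd.Timestamp(2017, 12, 1), 2, 3],
        ["Team B", pd.Timestamp(2017, 12, 8), 1, 3],
        ["Team B", pd.Timestamp(2017, 12, 8), 3, 2],
    ],
    columns=["Team", "LeadWeek", "ConversionWeek", "Conversions"],
)

keys = ["Team", "LeadWeek", "ConversionWeek"]

# Target index: every Team x LeadWeek x ConversionWeek combination,
# with the conversion weeks fixed to 0..4 (an assumption from the question).
full_index = pd.MultiIndex.from_product(
    [df["Team"].unique(), df["LeadWeek"].unique(), range(5)],
    names=keys,
)

# Move the key columns into the index, reindex onto the full grid,
# and fill the combinations that were missing with 0.
result = (
    df.set_index(keys)
      .reindex(full_index, fill_value=0)
      .reset_index()
)

print(result)
```

Because fill_value=0 is supplied during the reindex, no NaN values are introduced and Conversions stays an integer column, whereas the desired output in the question shows floats (2.0, 0.0, ...). If only the Team / LeadWeek pairs that actually occur in the data should be expanded, the target index would have to be built from those pairs instead of from the full product.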