How do I apply a function to a DataFrame column per month and show the months as columns?

Asked: 2016-11-21 16:08:59

Tags: python pandas dataframe ipython

I have some input data in a DataFrame like the following.

id priority owner goal changed_time delta_time 
1   P1      bob    40   2016-11-02   20
2   P2      bob    20   2016-11-02   10
3   P3      bob    30   2016-11-02   50
4   P1      alice  20   2016-10-02   70
5   P1      bob    40   2016-10-02   05
6   P1      bob    40   2016-10-02   24
7   P3      alice  40   2016-09-02   34
8   P1      bob    40   2016-09-02   20
9   P2      ross   40   2016-09-02   10
10  P1      bob    40   2016-11-02   20
11  P2      sec    40   2016-09-02   34
12  P3      bob    30   2016-11-02   90

I want output like the one shown further down. I get the grouping on the left-hand side with:

df[['owner','priority','goal','delta_time']].groupby(['owner','priority']).mean()

but the output I want looks like this:

                 (Average of delta_time on a monthly basis)
owner priority goal  2016-11  2016-10  2016-09

bob    p1      40     
       p2      20
       p3      30
alice  p1      20
       p3      40
ross   p2      40
sec    p2      40

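So how can I apply the mean function to the delta_time column, grouped by the month of changed_time, and show each month as its own column as in the table above? I got the left-hand side by grouping with:

df[['owner','priority','goal','delta_time']].groupby([df.changed_time.dt.month,'owner','priority']).mean()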

2 answers:

Answer 0: (score: 0)

I would do something like this:

df.groupby(['priority','owner',pd.PeriodIndex(data=df.changed_time, freq='M')]).mean().unstack()

[Image: One groupby]
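As a runnable illustration of the single groupby above, here is a minimal sketch on the sample data from the question. It assumes changed_time is parsed as a datetime column, uses `dt.to_period("M")` in place of `pd.PeriodIndex`, and averages only delta_time; the names `text`, `month` and `monthly` are illustrative and not from the original post.

```python
import io
import pandas as pd

# Sample data copied from the question.
text = """id priority owner goal changed_time delta_time
1   P1      bob    40   2016-11-02   20
2   P2      bob    20   2016-11-02   10
3   P3      bob    30   2016-11-02   50
4   P1      alice  20   2016-10-02   70
5   P1      bob    40   2016-10-02   05
6   P1      bob    40   2016-10-02   24
7   P3      alice  40   2016-09-02   34
8   P1      bob    40   2016-09-02   20
9   P2      ross   40   2016-09-02   10
10  P1      bob    40   2016-11-02   20
11  P2      sec    40   2016-09-02   34
12  P3      bob    30   2016-11-02   90"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+", parse_dates=["changed_time"])

# Calendar month of each row, e.g. Period('2016-11', 'M').
month = df["changed_time"].dt.to_period("M")

# Average delta_time per owner, priority and month,
# then move the month level from the index into the columns.
monthly = (
    df.groupby(["owner", "priority", month])["delta_time"]
      .mean()
      .unstack()
)
print(monthly)
```

Combinations with no rows in a given month come out as NaN, since `unstack` fills missing cells with NaN by default.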

To get the desired format, I think you need to do two separate groupbys and then concat:

df1 = df.groupby(['priority','owner'])['goal'].mean()
# You can also do: pd.DatetimeIndex(data=df.changed_time).month
df2 = df.groupby(['priority','owner',pd.PeriodIndex(data=df.changed_time, freq='M')])['delta_time'].mean().unstack()
pd.concat([df1, df2], axis=1)

[Image: Two groupbys]
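The two-groupby approach can be tried end to end on the question's sample data. This is only a sketch under a few assumptions: changed_time is parsed as a datetime, the month labels are converted to plain strings so they sit next to the goal column in one ordinary column index, and the names `goal`, `monthly` and `result` are illustrative.

```python
import io
import pandas as pd

# Sample data copied from the question.
text = """id priority owner goal changed_time delta_time
1   P1      bob    40   2016-11-02   20
2   P2      bob    20   2016-11-02   10
3   P3      bob    30   2016-11-02   50
4   P1      alice  20   2016-10-02   70
5   P1      bob    40   2016-10-02   05
6   P1      bob    40   2016-10-02   24
7   P3      alice  40   2016-09-02   34
8   P1      bob    40   2016-09-02   20
9   P2      ross   40   2016-09-02   10
10  P1      bob    40   2016-11-02   20
11  P2      sec    40   2016-09-02   34
12  P3      bob    30   2016-11-02   90"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+", parse_dates=["changed_time"])

keys = ["owner", "priority"]
month = df["changed_time"].dt.to_period("M")

# First groupby: one goal value per (owner, priority).
goal = df.groupby(keys)["goal"].mean()

# Second groupby: mean delta_time per (owner, priority, month), months as columns.
monthly = df.groupby(keys + [month])["delta_time"].mean().unstack()
monthly.columns = monthly.columns.astype(str)  # '2016-09', '2016-10', '2016-11'

# Both pieces share the same (owner, priority) index, so concat aligns them row by row.
result = pd.concat([goal, monthly], axis=1)
print(result)
```

Note that goal is averaged here, as in the answer; that matches the question's table only because goal is constant within each owner/priority pair in the sample data.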

Answer 1: (score: 0)

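Since the question carries tags for more than one language, it is not clear which language an answer is supposed to be in. In general, questions need to be focused and deal with only one language. That may be why many people have reacted to this question the way they did.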

Anyway, if you want an answer in R, here are some alternatives:

1) reshape2: Add a year/month column ("time") and reshape from long to wide using mean:

library(reshape2)

df2 <- transform(df, time = substr(changed_time, 1, 7))
dcast(df2, owner + priority + goal ~ time, mean, value.var = "delta_time", fill = NA_real_)

giving:

  owner priority goal 2016-09 2016-10 2016-11
1 alice       P1   20      NA    70.0      NA
2 alice       P3   40      34      NA      NA
3   bob       P1   40      20    14.5      20
4   bob       P2   20      NA      NA      10
5   bob       P3   30      NA      NA      70
6  ross       P2   40      10      NA      NA
7   sec       P2   40      34      NA      NA
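For readers working in pandas (the question's own tags), a rough counterpart to the `dcast` call is `DataFrame.pivot_table`. The sketch below is an assumption-laden illustration rather than part of the original answer: it keeps changed_time as a string and takes its first seven characters as the year/month label, mirroring `substr(changed_time, 1, 7)`; the names `text` and `wide` are illustrative.

```python
import io
import pandas as pd

# Sample data copied from the question.
text = """id priority owner goal changed_time delta_time
1   P1      bob    40   2016-11-02   20
2   P2      bob    20   2016-11-02   10
3   P3      bob    30   2016-11-02   50
4   P1      alice  20   2016-10-02   70
5   P1      bob    40   2016-10-02   05
6   P1      bob    40   2016-10-02   24
7   P3      alice  40   2016-09-02   34
8   P1      bob    40   2016-09-02   20
9   P2      ross   40   2016-09-02   10
10  P1      bob    40   2016-11-02   20
11  P2      sec    40   2016-09-02   34
12  P3      bob    30   2016-11-02   90"""
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Year/month label such as '2016-11', taken from the date string.
df["time"] = df["changed_time"].str[:7]

# owner + priority + goal identify a row; each distinct time becomes a column
# holding the mean delta_time for that cell (NaN where there is no data).
wide = df.pivot_table(
    index=["owner", "priority", "goal"],
    columns="time",
    values="delta_time",
    aggfunc="mean",
).reset_index()
print(wide)
```

Putting goal into the index, as the R formula does, keeps it as a plain identifier column instead of averaging it.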

2) dplyr / tidyr: Add a year/month "time" column and compute the mean delta_time by owner, priority, goal and time. Finally convert from long to wide and sort.

library(dplyr)
library(tidyr)
df %>%
   mutate(time = substr(changed_time, 1, 7)) %>%
   group_by(owner, priority, goal, time) %>%
   summarize(delta_time = mean(delta_time)) %>%
   ungroup() %>%
   spread(time, delta_time) %>%
   arrange(owner, priority)

giving:

# A tibble: 7 x 6
   owner priority  goal 2016-09 2016-10 2016-11
  <fctr>   <fctr> <int>   <dbl>   <dbl>   <dbl>
1  alice       P1    20      NA    70.0      NA
2  alice       P3    40      34      NA      NA
3    bob       P1    40      20    14.5      20
4    bob       P2    20      NA      NA      10
5    bob       P3    30      NA      NA      70
6   ross       P2    40      10      NA      NA
7    sec       P2    40      34      NA      NA

3) No packages: Add a year/month column ("time"), compute the means using aggregate, then convert from long to wide using reshape and sort:

df2 <- transform(df, time = substr(changed_time, 1, 7))
ag <- aggregate(delta_time ~ owner + priority + goal + time, df2, mean) 
nms <- unique(as.character(sort(ag$time)))
r <- reshape(ag, dir = "wide", idvar = c("owner", "priority", "goal"), varying = list(nms))
o <- order(r$owner, r$priority)
r[o, ]

giving:

  owner priority goal 2016-09 2016-10 2016-11
5 alice       P1   20      NA    70.0      NA
4 alice       P3   40      34      NA      NA
1   bob       P1   40      20    14.5      20
7   bob       P2   20      NA      NA      10
8   bob       P3   30      NA      NA      70
2  ross       P2   40      10      NA      NA
3   sec       P2   40      34      NA      NA

Note: The input data frame df in reproducible form is:

Lines <- "id priority owner goal changed_time delta_time 
1   P1      bob    40   2016-11-02   20
2   P2      bob    20   2016-11-02   10
3   P3      bob    30   2016-11-02   50
4   P1      alice  20   2016-10-02   70
5   P1      bob    40   2016-10-02   05
6   P1      bob    40   2016-10-02   24
7   P3      alice  40   2016-09-02   34
8   P1      bob    40   2016-09-02   20
9   P2      ross   40   2016-09-02   10
10  P1      bob    40   2016-11-02   20
11  P2      sec    40   2016-09-02   34
12  P3      bob    30   2016-11-02   90"
df <- read.table(text = Lines, header = TRUE)