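I am trying to create a new data_frame based on the order date column and the email. So if I have a duplicated email (e.g. cheers@web.com in the example below), I want to merge the emails and put the order_date values next to it in new columns. I want to do this for the complete data frame. This will introduce many NAs, but I will deal with that later.
My data frame looks as follows: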
Source: local data frame [6 x 4]
Groups: email [5]
email order_date `sum(price_excl_vat_euro)` `sum(total_qty)`
<chr> <date> <dbl> <int>
1 whatis@web.com 2016-09-05 140.48 2
2 myemail@web.com 2016-11-01 41.31 1
3 whereto@web.com 2016-09-18 61.98 1
4 cheers@web.com 2016-08-01 61.98 1
5 cheers@web.com 2016-08-02 61.98 1
6 hello@web.com 2016-08-02 140.49 1
What I would like to obtain is (I do not care about the other columns for now):
email order_date1 order_date2
whatis@web.com 2016-09-05 NA
myemail@web.com 2016-11-01 NA
whereto@web.com 2016-09-18 NA
cheers@web.com 2016-08-01 2016-08-02
hello@web.com 2016-08-02 NA
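It is important to know that the number of orders per email can vary between 1 and 10 (on average). I tried the spread function from the tidyr package, but could not get it to work. Any hints are much appreciated!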
Answer 0 (score: 3)
For example:
df <- read.table(row.names=1, stringsAsFactors = F, text="
1 whatis@web.com 2016-09-05 140.48 2
2 myemail@web.com 2016-11-01 41.31 1
3 whereto@web.com 2016-09-18 61.98 1
4 cheers@web.com 2016-08-01 61.98 1
5 cheers@web.com 2016-08-02 61.98 1
6 hello@web.com 2016-08-02 140.49 1")
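# row.names=1 consumed the index column, so df[,1] is the email and df[,2] the order date
df <- df[order(df[,1], df[,2]), ]
# one vector of order dates per email
lst <- split(df[,2], df[,1])
# pad every vector with NA up to the longest one, then stack them into a matrix
do.call(rbind, lapply(lst, "length<-", max(lengths(lst))))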
# [,1] [,2]
# cheers@web.com "2016-08-01" "2016-08-02"
# hello@web.com "2016-08-02" NA
# myemail@web.com "2016-11-01" NA
# whatis@web.com "2016-09-05" NA
# whereto@web.com "2016-09-18" NA
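The result above is a character matrix with the emails as row names and no column names. If a data frame in the shape shown in the question is wanted, it could be converted along these lines. This is only a sketch: the column names `email` and `order_date` are taken from the question, while `m` and `wide` are names made up here for illustration.

```r
# Sample data from the question, reduced to the two relevant columns.
df <- data.frame(
  email = c("whatis@web.com", "myemail@web.com", "whereto@web.com",
            "cheers@web.com", "cheers@web.com", "hello@web.com"),
  order_date = c("2016-09-05", "2016-11-01", "2016-09-18",
                 "2016-08-01", "2016-08-02", "2016-08-02"),
  stringsAsFactors = FALSE
)

# Same steps as above: sort, split by email, pad with NA, stack.
df <- df[order(df$email, df$order_date), ]
lst <- split(df$order_date, df$email)
m <- do.call(rbind, lapply(lst, "length<-", max(lengths(lst))))

# Name the columns order_date1, order_date2, ... and move the emails
# from the row names into a regular column.
colnames(m) <- paste0("order_date", seq_len(ncol(m)))
wide <- data.frame(email = rownames(m), m, row.names = NULL,
                   stringsAsFactors = FALSE)

# Convert the padded character columns back to Date; NA stays NA.
wide[-1] <- lapply(wide[-1], as.Date)
```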
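Or:
library(tidyverse)
# V2 is the email and V3 the order date (default column names from read.table)
df %>%
  arrange(V2, V3) %>%
  group_by(V2) %>%
  # number the orders within each email: date1, date2, ...
  transmute(V3, date=paste0("date", 1:n())) %>%
  # one column per order number, filled with NA where an email has fewer orders
  spread(date, V3)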
# Source: local data frame [5 x 3]
# Groups: V2 [5]
#
# V2 date1 date2
# * <chr> <chr> <chr>
# 1 cheers@web.com 2016-08-01 2016-08-02
# 2 hello@web.com 2016-08-02 <NA>
# 3 myemail@web.com 2016-11-01 <NA>
# 4 whatis@web.com 2016-09-05 <NA>
# 5 whereto@web.com 2016-09-18 <NA>
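In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The same idea could be sketched as follows, assuming the columns are called `email` and `order_date` as in the question and that `order_date` is already a Date. The names `orders`, `idx` and `wide` are made up here for illustration.

```r
library(dplyr)
library(tidyr)  # pivot_wider() needs tidyr >= 1.0.0

# Sample data from the question, reduced to the two relevant columns.
orders <- tibble(
  email = c("whatis@web.com", "myemail@web.com", "whereto@web.com",
            "cheers@web.com", "cheers@web.com", "hello@web.com"),
  order_date = as.Date(c("2016-09-05", "2016-11-01", "2016-09-18",
                         "2016-08-01", "2016-08-02", "2016-08-02"))
)

wide <- orders %>%
  arrange(email, order_date) %>%
  group_by(email) %>%
  # number the orders within each email: order_date1, order_date2, ...
  mutate(idx = paste0("order_date", row_number())) %>%
  ungroup() %>%
  # one column per order number; emails with fewer orders get NA
  pivot_wider(names_from = idx, values_from = order_date)
```

Because the dates are never pasted into strings, the resulting columns keep their Date class, which is convenient if the NAs are to be handled later.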