I'm having trouble with some data wrangling in R. I have a data frame like this:
CardID Date Amount ItemNumber ItemCode
1 C0100000111 2001-07-19 449.00 1 I0000000808
2 C0100000111 2001-02-20 9.99 1 I0000000622
3 C0100000111 2001-04-27 49.99 1 I0000000284
4 C0100000111 2001-02-20 69.00 1 I0000000488
5 C0100000111 2001-05-17 299.00 1 I0000000595
6 C0100000111 2001-05-19 5.99 1 I0000000078
7 C0100000199 2001-08-20 229.00 1 I0000000783
8 C0100000199 2001-12-29 229.00 1 I0000000783
9 C0100000199 2001-06-28 139.00 1 I0000000537
10 C0100000343 2001-09-07 99.00 1 I0000000532
I want to transform it into a structure like this:
CardID,FirstPurchaseDate,LastPurchaseDate,NumberOrders,NumberSKUs,TotalAmounts
Each CardID should appear in exactly one row of the new table. How can I do this?
Based on the table above, I expect output like this:
> Ex
CardID FirstPurchaseDate LastPurchaseDate NumberOrders NumberSKUs TotalAmounts
1 C0100000111 2001-02-20 2001-07-19 6 6 882.97
2 C0100000199 2001-06-28 2001-12-29 3 2 597.00
3 C0100000343 2001-09-07 2001-09-07 1 1 99.00
Answer 0 (score: 2)
We can use summarise from dplyr after grouping by CardID:
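library(dplyr)
df1 %>%
  mutate(Date = as.Date(Date)) %>%  # make sure dates compare chronologically
  group_by(CardID) %>%
  summarise(FirstPurchaseDate = min(Date),  # min/max, because the rows are not sorted by date
            LastPurchaseDate = max(Date),
            NumberOrders = n(),
            NumberSKUs = n_distinct(ItemCode),
            TotalAmounts = sum(Amount))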
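When computing the first and last purchase dates, note that dplyr's first() and last() pick values by row position within each group, whereas min() and max() pick by value. On data that is not sorted by Date, such as the sample in the question, the two give different answers. A minimal sketch of the difference, assuming the data frame is called df1 as above and that Date arrives as a character column (the column names ByRowOrder, Earliest and Latest are made up for illustration):

```r
library(dplyr)

# A few rows from the question, deliberately left unsorted by date
df1 <- data.frame(
  CardID = c("C0100000111", "C0100000111", "C0100000111", "C0100000199"),
  Date = c("2001-07-19", "2001-02-20", "2001-04-27", "2001-08-20"),
  Amount = c(449, 9.99, 49.99, 229),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  mutate(Date = as.Date(Date)) %>%   # assumption: Date is not yet of class Date
  group_by(CardID) %>%
  summarise(
    ByRowOrder = first(Date),        # first row of the group, whatever its date
    Earliest   = min(Date),          # chronologically earliest date
    Latest     = max(Date)           # chronologically latest date
  )

print(res)
```

An alternative with the same effect would be to call arrange(Date) before group_by(), after which first() and last() line up with the earliest and latest dates.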
Answer 1 (score: 1)
Here is a data.table version:
library(data.table)
dt <- data.frame(
CardID = c("C0100000111", "C0100000111", "C0100000111", "C0100000111", "C0100000111", "C0100000111", "C0100000199", "C0100000199", "C0100000199", "C0100000343"),
Date = as.Date(c("2001-07-19", "2001-02-20", "2001-04-27", "2001-02-20", "2001-05-17", "2001-05-19", "2001-08-20", "2001-12-29", "2001-06-28", "2001-09-07")),
Amount = c(449, 9.99, 49.99, 69, 299, 5.99, 229, 229, 139, 99),
ItemNumber = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
ItemCode = c("I0000000808", "I0000000622", "I0000000284", "I0000000488", "I0000000595", "I0000000078", "I0000000783", "I0000000783", "I0000000537", "I0000000532")
)
# Convert to data.table
setDT(dt)
dt[, .(
FirstPurchaseDate = min(Date),
LastPurchaseDate = max(Date),
NumberOrders = .N,
NumberSKUs = uniqueN(ItemCode),
TotalAmounts = sum(Amount)
), by = CardID]
Result:
CardID FirstPurchaseDate LastPurchaseDate NumberOrders NumberSKUs TotalAmounts
1: C0100000111 2001-02-20 2001-07-19 6 6 882.97
2: C0100000199 2001-06-28 2001-12-29 3 2 597.00
3: C0100000343 2001-09-07 2001-09-07 1 1 99.00
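For anyone who wants to cross-check this result without extra packages, the same per-card aggregation can be sketched in base R. This is only an illustration under the same sample data; the names df, summary_rows and result are made up here and do not come from either answer:

```r
# Sample data from the question
df <- data.frame(
  CardID = c(rep("C0100000111", 6), rep("C0100000199", 3), "C0100000343"),
  Date = as.Date(c("2001-07-19", "2001-02-20", "2001-04-27", "2001-02-20",
                   "2001-05-17", "2001-05-19", "2001-08-20", "2001-12-29",
                   "2001-06-28", "2001-09-07")),
  Amount = c(449, 9.99, 49.99, 69, 299, 5.99, 229, 229, 139, 99),
  ItemCode = c("I0000000808", "I0000000622", "I0000000284", "I0000000488",
               "I0000000595", "I0000000078", "I0000000783", "I0000000783",
               "I0000000537", "I0000000532"),
  stringsAsFactors = FALSE
)

# Split the rows by card, then build one summary row per card
summary_rows <- lapply(split(df, df$CardID), function(g) {
  data.frame(
    CardID = g$CardID[1],
    FirstPurchaseDate = min(g$Date),
    LastPurchaseDate = max(g$Date),
    NumberOrders = nrow(g),
    NumberSKUs = length(unique(g$ItemCode)),
    TotalAmounts = sum(g$Amount),
    stringsAsFactors = FALSE
  )
})

# Stack the per-card rows into a single data frame
result <- do.call(rbind, summary_rows)
rownames(result) <- NULL
print(result)
```

On larger data the data.table or dplyr versions will usually be faster and more readable; the base R form is mainly useful as a dependency-free sanity check.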
Edit: Akrun was first, so go with his answer! I'm leaving this here only as a data.table reference. I should start using dplyr more...