Question

I have the following data frame:

> dput(head(testFrame, 10))
structure(list(`df$data.founded_at` = structure(c(15492, 15639, 
15065, 15340, 15257, 13514, 14610, 14975, 15340, 11323), class = "Date")), .Names = "df$data.founded_at", row.names = c("Entertainment", 
"Publishing", "Electronics", "Software", "Software.1", "Curated Web", 
"Software.2", "Analytics", "E-Commerce", "E-Commerce.1"), class = "data.frame")

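I want to count how often the same category occurs on each day and add that value in a new column. For example, take 1.1.2000: if the category Software appears 5 times in the data set on 1.1.2000, then 5 should be added in the last column.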

Here is another example, shown as a table:

date            category         freq
1.1.2011        E-Commerce       2
3.3.2013        Software         2
1.1.2011        E-Commerce       2
2.5.2014        Analytics        1
2.5.2014        Search           1
3.3.2013        Software         2

Any suggestions on how to achieve this?

Thanks in advance for your input!

Answer 1

Try data.table (the data you provided doesn't contain any duplicates, so I used the data from your desired output, without the freq column):

library(data.table)
setDT(testFrame)[, freq := .N, by = list(date, category)]
testFrame
#        date   category freq
# 1: 1.1.2011 E-Commerce    2
# 2: 3.3.2013   Software    2
# 3: 1.1.2011 E-Commerce    2
# 4: 2.5.2014  Analytics    1
# 5: 2.5.2014     Search    1
# 6: 3.3.2013   Software    2
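
To make the snippet above reproducible, here is a minimal sketch that first rebuilds the six example rows from the table in the question. The object name `exampleDT` is my own choice, not something from the question. In data.table, `.N` is the number of rows in the current group and `:=` adds the column by reference, so no reassignment is needed.

```r
library(data.table)

# Example rows copied from the table in the question.
# `exampleDT` is a hypothetical name used only for this sketch.
exampleDT <- data.table(
  date     = c("1.1.2011", "3.3.2013", "1.1.2011", "2.5.2014", "2.5.2014", "3.3.2013"),
  category = c("E-Commerce", "Software", "E-Commerce", "Analytics", "Search", "Software")
)

# .N is the row count of each (date, category) group;
# := writes it into a new column `freq` by reference.
exampleDT[, freq := .N, by = .(date, category)]

print(exampleDT)
```

If you only want one row per date/category pair rather than a count repeated on every row, `exampleDT[, .N, by = .(date, category)]` should give the aggregated version.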

You could also do this in base R, quite efficiently, with the transform and ave functions:

transform(testFrame, freq = ave(seq_len(nrow(testFrame)), list(date, category), FUN = length))

#       date   category freq
# 1 1.1.2011 E-Commerce    2
# 2 3.3.2013   Software    2
# 3 1.1.2011 E-Commerce    2
# 4 2.5.2014  Analytics    1
# 5 2.5.2014     Search    1
# 6 3.3.2013   Software    2

Calculating how often a category occurs on a given day
