Question

dput(x)

structure(list(State = structure(c(1L, 1L, 2L, 3L, 2L, 4L, 2L, 
5L, 5L, 2L), .Label = c("Illinois", "Texas", "California", "Louisiana", 
"Michigan"), class = "factor"), Lat = structure(1:10, .Label = c("41.627", 
"41.85", "32.9588", "33.767", "33.0856", "30.4298", "29.7633", 
"42.4687", "43.0841", "29.6919"), class = "factor"), 
 Long = structure(1:10, .Label = c("-88.204", 
"-87.65", "-96.9812", "-118.1892", "-96.6115", "-90.8999", "-95.3633", 
"-83.5235", "-82.4905", "-95.6512"), class = "factor")), .Names = c("State", 
"Lat", "Long"), row.names = c(NA, 10L), class = "data.frame")

I need another column that gives the total, i.e. the number of rows for each state. I can do this by first creating a helper column Total:

x$Total<-1

and then

library(data.table)
x<-data.table(x)
x<-x[,total:=sum(Total),by=State]

Is there a better/shorter/more efficient way to count how often each factor level occurs in a data frame?

Answer 1

You can use dplyr like this (no need to create the Total column):

(Edit: thanks to @beginneR for pointing out n(), which makes this a bit more concise.)

library('dplyr')
mutate(group_by(x, State), total = n())

@beginneR's group_by(x, State) %>% mutate(total = n()) solution is also nice, especially if you need to keep chaining further operations on the data. Likewise,

x %>%
  group_by(State) %>%
  mutate(total = n())

will also work.
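One detail that may be worth noting: mutate() after group_by() leaves the result grouped, so ungroup() is often added once the grouping is no longer needed. A minimal sketch, assuming a hypothetical data frame that copies only the State column from the question:

```r
library(dplyr)

# Hypothetical sample: only the State values from the question's dput()
x <- data.frame(State = c("Illinois", "Illinois", "Texas", "California", "Texas",
                          "Louisiana", "Texas", "Michigan", "Michigan", "Texas"))

res <- x %>%
  group_by(State) %>%
  mutate(total = n()) %>%  # n() is the number of rows in the current group
  ungroup()                # drop the grouping so later verbs act on all rows
```

Every row keeps its original position; only the total column is added.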

Answer 2

You can also use base R's aggregate:

aggregate(. ~ State, FUN = length, data = x)
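If the count should end up as a column in the original data frame rather than in a per-state summary, base R's ave() is one possible route. This is only a sketch on a hypothetical data frame that copies the State column from the question; the column names total and total2 are chosen here for illustration.

```r
# Hypothetical sample: only the State values from the question's dput()
x <- data.frame(State = c("Illinois", "Illinois", "Texas", "California", "Texas",
                          "Louisiana", "Texas", "Michigan", "Michigan", "Texas"))

# ave() applies FUN within each State group and returns a vector
# of the same length as its first argument, so it fits straight into x
x$total <- ave(seq_along(x$State), x$State, FUN = length)

# An equivalent lookup: build a frequency table, then index it by state name
counts <- table(x$State)
x$total2 <- as.integer(counts[as.character(x$State)])
```

Both columns hold the same per-state row counts; ave() avoids the intermediate table.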
