dput(x)
structure(list(State = structure(c(1L, 1L, 2L, 3L, 2L, 4L, 2L,
5L, 5L, 2L), .Label = c("Illinois", "Texas", "California", "Louisiana",
"Michigan"), class = "factor"), Lat = structure(1:10, .Label = c("41.627",
"41.85", "32.9588", "33.767", "33.0856", "30.4298", "29.7633",
"42.4687", "43.0841", "29.6919"), class = "factor"),
Long = structure(1:10, .Label = c("-88.204",
"-87.65", "-96.9812", "-118.1892", "-96.6115", "-90.8999", "-95.3633",
"-83.5235", "-82.4905", "-95.6512"), class = "factor")), .Names = c("State",
"Lat", "Long"), row.names = c(NA, 10L), class = "data.frame")
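I need another column giving the total, i.e. the number of rows for each state. I can do this by first creating a helper column Total:
x$Total <- 1
and then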
library(data.table)
x<-data.table(x)
x<-x[,total:=sum(Total),by=State]
Is there a better/shorter/more efficient way to count the occurrences of a factor in a data frame?
Answer 0 (score: 1)
You can use dplyr like this (there is no need to create the Total column):
(Edit: thanks to @beginneR for pointing out n(), which makes this a bit more concise.)
library('dplyr')
mutate(group_by(x, State), total = n())
@beginneR's group_by(x, State) %>% mutate(total = n()) solution is also nice, especially if you need to keep chaining further operations on the data. Likewise,
x %>%
group_by(State) %>%
mutate(total = n())
will also work.
Answer 1 (score: 0)
You can also use aggregate from base R:
aggregate(. ~ State, FUN = length, data = x)
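Note that aggregate() returns one row per state rather than adding a column to the original rows. If the per-row total from the question is what is wanted, a base R sketch using ave() might look like the following. The small data frame here is only a stand-in that reproduces the State column from the dput() output; it is not the full data set.

```r
# Stand-in for the question's data frame (State column only, same order as the dput output)
x <- data.frame(State = factor(c("Illinois", "Illinois", "Texas", "California",
                                 "Texas", "Louisiana", "Texas", "Michigan",
                                 "Michigan", "Texas")))

# ave() applies FUN within each State group and returns a vector
# aligned with the original rows, so no helper column of 1s is needed
x$total <- ave(seq_along(x$State), x$State, FUN = length)

print(x)
# total column: 2 2 4 1 4 1 4 2 2 4
```

With data.table, which the question already loads, the same result should be available without the helper column via x[, total := .N, by = State].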