I have a fairly large dataset and want to count how often each value occurs in a particular column.
Example:
A Home Away
D Lisa Jill
D Jack Andre
C Jack Kirk
C Jane Jill
I want to add a new column (called Count) that holds, for each row, the number of times that row's name appears in Home.
A Home Away Count
D Lisa Jill 1
D Jack Andre 2
C Jack Kirk 2
C Jane Jill 1
Thanks!
Answer 0 (score: 4)
Or use plyr (assuming your data is in df). Note that the new column is named Freq rather than Count:
library(plyr)
join(df, as.data.frame(table(Home=df$Home)))
# A Home Away Freq
# 1 D Lisa Jill 1
# 2 D Jack Andre 2
# 3 C Jack Kirk 2
# 4 C Jane Jill 1
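If the column should be called Count rather than Freq, the same frequency-table idea can be sketched in base R without plyr. This is not the answerer's code, just an illustration under stated assumptions: the sample data is rebuilt by hand from the question, and the helper name counts is hypothetical.

```r
# Sample data copied from the question (df follows the naming used above)
df <- data.frame(
  A    = c("D", "D", "C", "C"),
  Home = c("Lisa", "Jack", "Jack", "Jane"),
  Away = c("Jill", "Andre", "Kirk", "Jill"),
  stringsAsFactors = FALSE
)

# Frequency table of Home; responseName names the count column Count
# instead of the default Freq
counts <- as.data.frame(table(Home = df$Home),
                        responseName = "Count",
                        stringsAsFactors = FALSE)

# Look each row's Home value up in the table; match() keeps the
# original row order, which merge() would not guarantee
df$Count <- counts$Count[match(df$Home, counts$Home)]

print(df)
```

The lookup via match() is used here because it leaves the rows in their original order.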
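Answer 1 (score: 2)
We can use dplyr. After grouping by 'Home', get the number of rows in each group with n() and store it in a new column with mutate. The %<>% operator from magrittr assigns the result back to df1.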
library(dplyr)
library(magrittr)
df1 %<>%
group_by(Home) %>%
mutate(Count = n())
# A Home Away Count
# <chr> <chr> <chr> <int>
#1 D Lisa Jill 1
#2 D Jack Andre 2
#3 C Jack Kirk 2
#4 C Jane Jill 1
Or with data.table, where .N is the number of rows in each group:
library(data.table)
setDT(df1)[, Count := .N, by = Home]
Or with ave from base R:
df1$Count <- with(df1, ave(seq_along(Home), Home, FUN = length))
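To try the base R line end to end, here is a minimal self-contained sketch. It assumes the question's sample data is typed in by hand as df1; the real dataset would be loaded however it is normally read.

```r
# Sample data copied from the question (assumed to be stored as df1)
df1 <- data.frame(
  A    = c("D", "D", "C", "C"),
  Home = c("Lisa", "Jack", "Jack", "Jane"),
  Away = c("Jill", "Andre", "Kirk", "Jill"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each group of Home and returns a vector
# aligned with the original rows, so each row gets its group's size
df1$Count <- with(df1, ave(seq_along(Home), Home, FUN = length))

print(df1)
```

Because ave() returns one value per input row, no join or reordering step is needed.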