Let this be my data:
my.data<-data.frame(name=c("a","b","b","c","c","c"))
What I need is a variable that gives, for each row, the relative frequency of that row's name in the dataset. The result should look like this:
name target
1 a 0.1666667
2 b 0.3333333
3 b 0.3333333
4 c 0.5000000
5 c 0.5000000
6 c 0.5000000
I tried computing a dummy variable for each name and then taking the mean of each dummy, which gives the relative frequency of that name in the dataset. See below:
temp_dummies<-data.frame(spatstat::dummify(my.data$name))
my.data<-cbind.data.frame(my.data, temp_dummies)
rm(temp_dummies)
my.data %>%
dplyr::mutate(a_per=mean(a),
b_per=mean(b),
c_per=mean(c)) -> my.data
Now I need to pick the relative frequency that matches each row's name and collect those values into my target variable. I guess I should do something like the code below, but I don't know what to put inside mutate.
my.data %>%
dplyr::group_by(name) %>%
dplyr::mutate(...) -> my.data
Answer 0 (score: 2)
With base R, you can use the following one-liner:
my.data$target <- (table(my.data$name)/nrow(my.data))[ my.data$name ]
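Explanation, with the code split over a few lines:
We use the table function to get the number of occurrences of each name and divide it by the number of rows in the data frame, obtained with nrow. We can then look up the name of each row in that table. The value found is stored in the corresponding row of the new column.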
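# relative frequency of each distinct name (named freq to avoid masking base::t)
freq <- table(my.data$name)/nrow(my.data)
# look up each row's name in the frequency table
my.data$target <- freq[ my.data$name ]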
my.data
name target
1 a 0.1666667
2 b 0.3333333
3 b 0.3333333
4 c 0.5000000
5 c 0.5000000
6 c 0.5000000
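The same lookup can also be written with prop.table, which normalises the counts directly. This is only a minimal sketch: the variable name freq is illustrative, and the as.character and as.numeric conversions are additions (not part of the answer above) so that the lookup is done by label whether name is a character or a factor column, and so that the new column is a plain numeric vector rather than a table object.

```r
my.data <- data.frame(name = c("a", "b", "b", "c", "c", "c"))

# Relative frequency of each distinct name (the proportions sum to 1)
freq <- prop.table(table(my.data$name))

# Look up each row's name by label; as.numeric() drops the table class and names
my.data$target <- as.numeric(freq[as.character(my.data$name)])

my.data
```

Indexing by the character label rather than by the factor itself avoids relying on the factor codes happening to line up with the order of the table.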
Answer 1 (score: 1)
We can use add_count to get the count of each name and then divide it by the number of rows using n().
library(dplyr)
my.data %>%
add_count(name) %>%
mutate(n = n/n())
# name n
# <fct> <dbl>
#1 a 0.167
#2 b 0.333
#3 b 0.333
#4 c 0.5
#5 c 0.5
#6 c 0.5
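For the group_by and mutate skeleton from the question, one possible way to fill in the mutate step is sketched below. It assumes dplyr is installed; the names total, result and target are illustrative choices, not something either answer prescribes. The total row count is taken before grouping, because inside a grouped mutate n() returns the size of the current group rather than of the whole data set.

```r
library(dplyr)

my.data <- data.frame(name = c("a", "b", "b", "c", "c", "c"))

# Row count of the whole data set, taken before grouping
total <- nrow(my.data)

result <- my.data %>%
  group_by(name) %>%
  mutate(target = n() / total) %>%  # n() is the size of the current group
  ungroup()                         # drop the grouping so later steps are not affected

result
```

Compared with the add_count version, this writes the result straight into a column called target and never creates an intermediate count column.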