I have a data frame in the following format:
user <- c(1,1,2,2,2,2,3,3,3)
answer_num <- c(1,2,3,3,4,4,5,5,6)
df <- data.frame(user,answer_num)
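I'm trying to gather statistics on the number of instances of each answer within each user. For example, I can get the mean number of instances per answer like this: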
library(dplyr)
df %>% group_by(user) %>% summarise(inst_per_answer = n()/length(unique(answer_num)))
which gives me:
user inst_per_answer
1 1 1.0
2 2 2.0
3 3 1.5
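How can I get the standard deviation of the number of instances per answer?
Clarification:
I'm looking for the standard deviation of the per-answer instance counts within each user. For example, user 1 has 1 instance of answer 1 and 1 instance of answer 2, so the standard deviation is 0, i.e. sd(c(1,1)). User 3 has 2 instances of answer 5 and 1 instance of answer 6, so the sd is about 0.7, i.e. sd(c(2,1)).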
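To make the target concrete, here is a minimal base R sketch that spells out the per-answer counts for each user before taking their standard deviation. The names counts_by_user and sd_by_user are only illustrative and are not part of the original question.

```r
user <- c(1,1,2,2,2,2,3,3,3)
answer_num <- c(1,2,3,3,4,4,5,5,6)
df <- data.frame(user, answer_num)

# For each user, tabulate how many times each answer occurs
counts_by_user <- lapply(split(df$answer_num, df$user), table)

# Standard deviation of those counts, one value per user
sd_by_user <- sapply(counts_by_user, function(counts) sd(as.vector(counts)))
sd_by_user
```

For user 3 the intermediate counts are 2 and 1, so the last step reduces to sd(c(2,1)), as in the clarification above.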
Answer 0 (score: 3)
Try this:
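library(dplyr)
df %>%
  count(user, answer_num) %>%  # one row per (user, answer) pair, with its count in n
  group_by(user) %>%           # needed in current dplyr, where count() returns ungrouped output
  summarise(sd_per_user = sd(n))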
# Source: local data frame [3 x 2]
#
# user sd_per_user
# 1 1 0.0000000
# 2 2 0.0000000
# 3 3 0.7071068
Or a shorter data.table version (using @Thelas's base R idea):
library(data.table)
setDT(df)[, .(sd_per_user = sd(table(answer_num))), by = user]
# user sd_per_user
# 1: 1 0.0000000
# 2: 2 0.0000000
# 3: 3 0.7071068
Answer 1 (score: 1)
For those interested in sqldf, there are two options.
With RSQLite, use STDEV:
library(sqldf)
sqldf("SELECT user, STDEV(n) AS sd
FROM (SELECT user, answer_num, count(answer_num) AS n
FROM df GROUP BY user,answer_num)
GROUP BY user")
With RH2, use STDDEV or STDDEV_SAMP:
library(RH2)
sqldf("SELECT user, STDDEV(n) AS sd
FROM (SELECT user, answer_num, COUNT(answer_num) AS n
FROM df GROUP BY user,answer_num)
GROUP BY user")
Output:
user sd
1 1 0.0000000
2 2 0.0000000
3 3 0.7071068
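For comparison, the same per-user figure can probably be obtained without any extra packages. The following is only a sketch built on the sd(table(...)) idea, using base R's aggregate; the result name res and the renamed column sd are assumptions chosen to mirror the output above.

```r
user <- c(1,1,2,2,2,2,3,3,3)
answer_num <- c(1,2,3,3,4,4,5,5,6)
df <- data.frame(user, answer_num)

# For each user, count the occurrences of each answer and take the sd of those counts
res <- aggregate(answer_num ~ user, data = df,
                 FUN = function(x) sd(table(x)))

# aggregate() keeps the original column name, so rename it for readability
names(res)[2] <- "sd"
res
```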