I have a data frame:
x userId bookId rating
1 1 412 6
2 1 454 5
3 2 412 4
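and so on.
Basically, each user has rated many books, and each book has many ratings.
I need to extract some descriptive statistics grouped by userId: the average number of ratings given, the average rating given, and so on.
Can anyone point me in the right direction?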
Answer 0 (score: 5)
You can use data.table to perform these calculations.
If your data.frame is called books:
require(data.table)
setDT(books)
# average rating by user
books[, mean(rating), by=userId]
#    userId  V1
# 1:      1 5.5
# 2:      2 4.0
# average number of ratings given per user:
books[, .N, by=userId][, mean(N)]
#[1] 1.5
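If several per-user statistics are needed at once, they can be computed in a single grouped call. The following is only a minimal sketch built on the sample rows from the question; the names user_stats, n_ratings, mean_rating, min_rating, max_rating and avg_n_ratings are made up for illustration and are not part of the original answer.

```r
library(data.table)

# Sample data from the question (rebuilt here so the sketch is self-contained)
books <- data.table(
  userId = c(1, 1, 2),
  bookId = c(412, 454, 412),
  rating = c(6, 5, 4)
)

# One row per user: number of ratings, plus mean, min and max rating.
# The column names on the left-hand side are illustrative choices.
user_stats <- books[, .(n_ratings   = .N,
                        mean_rating = mean(rating),
                        min_rating  = min(rating),
                        max_rating  = max(rating)),
                    by = userId]
print(user_stats)

# Average number of ratings given per user
avg_n_ratings <- mean(user_stats$n_ratings)
print(avg_n_ratings)
```

Keeping the per-user table around makes it easy to add further columns later (for example sd(rating) or median(rating)) without re-grouping by hand.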
Answer 1 (score: 5)
I'm not sure I understand your exact question/task, but the following may provide some insight:
data = read.table(header = TRUE, stringsAsFactors = FALSE, text = "x userId bookId rating
1 1 412 6
2 1 454 5
3 2 412 4")
# Number of ratings per user
userFreq = data.frame(table(data$userId))
#   Var1 Freq
# 1    1    2
# 2    2    1
# mean rating per userID
meanRatingPerUser = aggregate(data$rating, by=list(data$userId), FUN = mean )
#   Group.1   x
# 1       1 5.5
# 2       2 4.0
# mean rating per book
meanRatingPerBook = aggregate(data$rating, by=list(data$bookId), FUN = mean )
#   Group.1 x
# 1     412 5
# 2     454 5
# "Summary" function, applied per bookID
moreStats = aggregate(data$rating, by=list(data$bookId), FUN = summary )
#   Group.1 x.Min. x.1st Qu. x.Median x.Mean x.3rd Qu. x.Max.
# 1     412    4.0       4.5      5.0    5.0       5.5    6.0
# 2     454    5.0       5.0      5.0    5.0       5.0    5.0
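The question also asks for the average number of ratings given per user, which the snippets above stop just short of. One possible way to get there in base R is sketched below; it is an illustration only, and the names ratings, n_per_user, mean_per_user, per_user and avg_n_ratings are hypothetical.

```r
# Sample data from the question (rebuilt here so the sketch is self-contained)
ratings <- data.frame(
  userId = c(1, 1, 2),
  bookId = c(412, 454, 412),
  rating = c(6, 5, 4)
)

# Count and mean of ratings for each user
n_per_user    <- tapply(ratings$rating, ratings$userId, length)
mean_per_user <- tapply(ratings$rating, ratings$userId, mean)

# Combine both statistics into one table, one row per user
per_user <- data.frame(
  userId      = as.integer(names(n_per_user)),
  n_ratings   = as.vector(n_per_user),
  mean_rating = as.vector(mean_per_user)
)
print(per_user)

# Average number of ratings given per user
avg_n_ratings <- mean(per_user$n_ratings)
print(avg_n_ratings)
```

This uses tapply rather than aggregate only because it returns plain named vectors that are easy to combine; aggregate would work just as well.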