R: how to apply a function to multiple subsets

Date: 2014-03-22 12:08:46

Tags: r apply subset

My dataset looks like this:

movie.id unknown Action Adventure rating 
1        0       0      0         3.831461 
2        0       1      1         3.416667 
3        0       0      0         3.945946
4        0       1      0         2.894737
5        1       0      0         4.358491

I want to compute the mean rating for each genre. I could subset each genre by hand, but I would like to do it more automatically.

Update 1: Each movie can belong to several genres. For each genre there is a column that holds 1 if the movie belongs to that genre and 0 if it does not.

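Update 2: So I want the mean rating of every movie that has a 1 in the Adventure column, then the same for every movie with a 1 in the Action column, then the unknown column (unknown is also a genre), and so on.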

2 answers:

Answer 0: (score: 1)

I believe this works as well:

genres = names(DF)[2:4]
ret = lapply(genres, function(x) mean(DF[["rating"]][as.logical(DF[[x]])]))
cbind.data.frame(genres, means = unlist(ret)) # or any other formatting you prefer
#     genres    means
#1   unknown 4.358491
#2    Action 3.155702
#3 Adventure 3.416667

where DF is:

DF = structure(list(movie.id = 1:5, unknown = c(0L, 0L, 0L, 0L, 1L
), Action = c(0L, 1L, 0L, 1L, 0L), Adventure = c(0L, 1L, 0L, 
0L, 0L), rating = c(3.831461, 3.416667, 3.945946, 2.894737, 4.358491
)), .Names = c("movie.id", "unknown", "Action", "Adventure", 
"rating"), class = "data.frame", row.names = c(NA, -5L))
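As an alternative sketch (not part of the original answer), the same per-genre means can be obtained without an explicit loop by treating the 0/1 columns as weights. The names `flags` and `genre_means` are introduced here for illustration, and DF is rebuilt so the snippet runs on its own.

```r
# Same data as DF above, rebuilt so the snippet is self-contained.
DF <- data.frame(
  movie.id  = 1:5,
  unknown   = c(0L, 0L, 0L, 0L, 1L),
  Action    = c(0L, 1L, 0L, 1L, 0L),
  Adventure = c(0L, 1L, 0L, 0L, 0L),
  rating    = c(3.831461, 3.416667, 3.945946, 2.894737, 4.358491)
)

genres <- c("unknown", "Action", "Adventure")

# Multiplying each 0/1 column by rating zeroes out the movies outside that genre.
# Dividing the column sums by the number of movies per genre gives the mean.
flags <- as.matrix(DF[genres])
genre_means <- colSums(flags * DF$rating) / colSums(flags)
genre_means
```

A genre with no movies yields NaN here (0 divided by 0), which matches what mean() returns for an empty subset in the lapply version.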

Answer 1: (score: 0)

Using the reshape2 and dplyr packages:

First install and load them:

> install.packages("reshape2")
> install.packages("dplyr")
> require(reshape2)
> require(dplyr)

Then, given a data frame m:

> m
  id unknown Action Adventure     rating
1  1       0      0         0 0.51391395
2  2       0      1         1 0.02915435
3  3       0      0         0 0.88752693
4  4       0      1         0 0.57660751
5  5       1      0         0 0.59169393

Then it is a one-liner:

> melt(m,measure=c("Action","Adventure","unknown")) %.% filter(value==1) %.% group_by(variable) %.% summarize(meanRating = mean(rating))
Source: local data frame [3 x 2]

   variable meanRating
1    Action 0.30288093
2 Adventure 0.02915435
3   unknown 0.59169393
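The `%.%` operator used above was dplyr's chaining operator when this answer was written. Later dplyr releases deprecated it in favor of `%>%`, so it may not be available in a current installation. A minimal sketch of the same pipeline with `%>%`, assuming reshape2 and dplyr are installed. Here `m` is rebuilt from the values printed above, and `genre_means` is a name introduced for illustration.

```r
library(reshape2)  # melt()
library(dplyr)     # filter(), group_by(), summarize(), %>%

# The example data frame, rebuilt from the printed values.
m <- data.frame(
  id        = 1:5,
  unknown   = c(0, 0, 0, 0, 1),
  Action    = c(0, 1, 0, 1, 0),
  Adventure = c(0, 1, 0, 0, 0),
  rating    = c(0.51391395, 0.02915435, 0.88752693, 0.57660751, 0.59169393)
)

# Reshape to one row per (movie, genre) pair, keep the pairs flagged 1,
# then average the rating within each genre.
genre_means <- melt(m, measure.vars = c("Action", "Adventure", "unknown")) %>%
  filter(value == 1) %>%
  group_by(variable) %>%
  summarize(meanRating = mean(rating))

genre_means
```

The printed header will differ from the output shown above, depending on the dplyr version, but the per-genre means should be the same.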

As a check, the only non-trivial case (Action, the one genre with more than one movie) is:

> mean(m$rating[m$Action==1])
[1] 0.3028809

If you have more genres, set the measure= argument to the names of your genre columns.

To give the variable column a better name:
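With many genres, typing every column name by hand becomes tedious. A minimal base-R sketch of deriving the names instead, assuming every column other than id and rating is a 0/1 genre flag. That assumption, and the name `genre_cols`, are introduced here and do not come from the original answer.

```r
# The example data frame, rebuilt from the printed values.
m <- data.frame(
  id        = 1:5,
  unknown   = c(0, 0, 0, 0, 1),
  Action    = c(0, 1, 0, 1, 0),
  Adventure = c(0, 1, 0, 0, 0),
  rating    = c(0.51391395, 0.02915435, 0.88752693, 0.57660751, 0.59169393)
)

# Assumption: everything except the id and rating columns is a genre flag.
genre_cols <- setdiff(names(m), c("id", "rating"))
genre_cols

# Base-R cross-check of the per-genre means using the derived names.
sapply(genre_cols, function(g) mean(m$rating[m[[g]] == 1]))
```

The resulting character vector can then be supplied as the measure= argument of melt in place of the hand-written names.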

> melt(m,measure=c("Action","Adventure","unknown"),variable.name="genre") %.% filter(value==1) %.% group_by(genre) %.% summarize(meanRating = mean(rating))
Source: local data frame [3 x 2]

      genre meanRating
1    Action 0.30288093
2 Adventure 0.02915435
3   unknown 0.59169393