Getting the average rating for certain metropolitan areas

Time: 2016-11-13 18:37:18

Tags: r

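I have data on several metropolitan areas, with other data attached to them, and one of the columns is the rating for the area. The only problem I am running into is the NA values in that column.

The data looks like this: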

"ID", "Name", "Type", "Amount", "Rating", "Date"
1,"Location A", "SomeType", 8000, 9.2, "2015-04-10"
2,"Location B", "SomeType", 2300, 7.4, "2015-04-10"
3,"Location C", "SomeType", 5400, NA, "2015-04-10"
4,"Location A", "SomeType", 4300, 8.5, "2015-04-10"
5,"Location B", "SomeType", 8670, 6.9, "2015-04-10"
6,"Location A", "SomeType", 7600, NA, "2015-04-10"
7,"Location A", "SomeType", 3400, 8.2, "2015-04-10"
8,"Location B", "SomeType", 6500, NA, "2015-04-10"
9,"Location C", "SomeType", 7800, 9.2, "2015-04-10"

Ultimately I want something like this:

Name         Average Rating
Location A   {average rating}
Location B   {average rating}
Location C   {average rating}

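Obviously with the actual average rating for each location, but I keep getting stuck on the NA values. The data is read directly from a CSV. How can I get the average rating for each location, excluding the NA values?

I have tried it using plyr, but it currently returns NULL: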

mean_ratings = ddply(data, .(Name), summarize, Rating=mean(Rating))

1 Answer:

Answer 0: (score: 1)

library(data.table)
dt = data.table("Name"=c("Location A","Location B","Location C","Location A","Location B",
                     "Location A","Location A","Location B","Location C"), 
            "Rating"=c(9.2, 7.4, NA, 8.5,6.9,NA,8.2,NA,9.2))

> dt
         Name Rating
1: Location A    9.2
2: Location B    7.4
3: Location C     NA
4: Location A    8.5
5: Location B    6.9
6: Location A     NA
7: Location A    8.2
8: Location B     NA
9: Location C    9.2

dt[, mean(Rating, na.rm = TRUE), by = "Name"]
        Name       V1
1: Location A 8.633333
2: Location B 7.150000
3: Location C 9.200000

plyr solution:

library(plyr)
ddply(dt, "Name", function(x) mean(x$Rating, na.rm = TRUE))
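
For comparison, here is a minimal base R sketch of the same idea that needs no extra packages. The names `ratings` and `mean_ratings` are placeholders chosen for illustration, and the data frame simply mirrors the Name and Rating columns from the question. It relies on the formula interface of `aggregate()`, whose default `na.action = na.omit` drops rows with a missing Rating before the mean is taken:

```r
# Sample data mirroring the Name and Rating columns from the question
# (the name `ratings` is a placeholder for illustration)
ratings <- data.frame(
  Name = c("Location A", "Location B", "Location C", "Location A", "Location B",
           "Location A", "Location A", "Location B", "Location C"),
  Rating = c(9.2, 7.4, NA, 8.5, 6.9, NA, 8.2, NA, 9.2)
)

# mean() propagates NA unless na.rm = TRUE is passed
mean(ratings$Rating)                # NA
mean(ratings$Rating, na.rm = TRUE)  # overall mean of the non-missing ratings

# Per-location mean; the formula method drops rows with NA in Rating
# because its default na.action is na.omit
mean_ratings <- aggregate(Rating ~ Name, data = ratings, FUN = mean)
print(mean_ratings)
```

One design note: with this approach a location whose ratings are all NA would disappear from the result entirely, whereas `mean(..., na.rm = TRUE)` per group would report NaN for it. Which behavior is preferable depends on the data.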