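I have data on several metropolitan areas, along with other data that applies to them; one of the columns is the rating for that area. The only problem I'm running into is the NA values in that column.
The data looks like this: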
"ID", "Name", "Type", "Amount", "Rating", "Date"
1,"Location A", "SomeType", 8000, 9.2, "2015-04-10"
2,"Location B", "SomeType", 2300, 7.4, "2015-04-10"
3,"Location C", "SomeType", 5400, NA, "2015-04-10"
4,"Location A", "SomeType", 4300, 8.5, "2015-04-10"
5,"Location B", "SomeType", 8670, 6.9, "2015-04-10"
6,"Location A", "SomeType", 7600, NA, "2015-04-10"
7,"Location A", "SomeType", 3400, 8.2, "2015-04-10"
8,"Location B", "SomeType", 6500, NA, "2015-04-10"
9,"Location C", "SomeType", 7800, 9.2, "2015-04-10"
In the end I want something like this:
Name Average Rating
Location A {average rating}
Location B {average rating}
Location C {average rating}
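With the actual average rating for each location, of course, but the calculation keeps getting stuck on the NA values. The data is read directly from a CSV. How can I get the average rating for each location while ignoring the NA values?
I have tried it with plyr, but it currently returns NULL: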
mean_ratings = ddply(data, .(Name), summarize, Rating=mean(Rating))
Answer 0 (score: 1)
library(data.table)
dt = data.table("Name"=c("Location A","Location B","Location C","Location A","Location B",
"Location A","Location A","Location B","Location C"),
"Rating"=c(9.2, 7.4, NA, 8.5,6.9,NA,8.2,NA,9.2))
> dt
Name Rating
1: Location A 9.2
2: Location B 7.4
3: Location C NA
4: Location A 8.5
5: Location B 6.9
6: Location A NA
7: Location A 8.2
8: Location B NA
9: Location C 9.2
> dt[, mean(Rating, na.rm = TRUE), by = "Name"]
Name V1
1: Location A 8.633333
2: Location B 7.150000
3: Location C 9.200000
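The key piece is `na.rm = TRUE`: without it, `mean()` returns NA as soon as a group contains a single NA. As a rough sketch of the same idea without any extra packages, base R's `aggregate()` can pass `na.rm = TRUE` through to `mean()`. The data frame name `ratings` and the result name `avg` are made up for this illustration and are not from the question.

```r
# Same sample values as above, as a plain data.frame (the name `ratings` is hypothetical)
ratings <- data.frame(
  Name = c("Location A", "Location B", "Location C", "Location A", "Location B",
           "Location A", "Location A", "Location B", "Location C"),
  Rating = c(9.2, 7.4, NA, 8.5, 6.9, NA, 8.2, NA, 9.2)
)

# A single NA makes the plain mean NA
overall_with_na <- mean(ratings$Rating)

# Extra arguments to aggregate() are forwarded to FUN, so na.rm = TRUE reaches mean()
avg <- aggregate(ratings["Rating"], by = list(Name = ratings$Name),
                 FUN = mean, na.rm = TRUE)
print(avg)
```

The result has one row per `Name` with the mean of the non-missing ratings, which should match the data.table output above.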
plyr solution:
library(plyr)
ddply(dt, "Name", function(x) mean(x$Rating, na.rm = TRUE))
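If you would rather keep the `summarize` form from the question, it should be enough to add `na.rm = TRUE` inside `mean()`. A minimal sketch, assuming plyr is installed and using a hypothetical data frame named `ratings` in place of the CSV data:

```r
library(plyr)

# Stand-in for the data read from the CSV (the name `ratings` is hypothetical)
ratings <- data.frame(
  Name = c("Location A", "Location B", "Location C", "Location A", "Location B",
           "Location A", "Location A", "Location B", "Location C"),
  Rating = c(9.2, 7.4, NA, 8.5, 6.9, NA, 8.2, NA, 9.2)
)

# Same call shape as in the question, with NA values dropped before averaging
mean_ratings <- ddply(ratings, .(Name), summarize,
                      Rating = mean(Rating, na.rm = TRUE))
print(mean_ratings)
```

This keeps the result column named `Rating` instead of the default `V1`.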