Mean of a variable by factor with missing values

Time: 2016-03-25 01:32:57

Tags: r average missing-data factors

I am working with a large dataset and want to calculate the mean of a variable by factor. A simple example dataset is shown below.

+------+-----+------+--------+------+------+------+------+
| year | mon | site | region |  rf  | avg1 | avg2 | avg3 |
+------+-----+------+--------+------+------+------+------+
| 2000 | jan | A    | high   | 28.2 |      |      |      |
| 2000 | feb | A    | high   | 26.6 |      |      |      |
| 2000 | mar | A    | high   | 30.3 |      |      |      |
| 2000 | apr | A    | high   | 33.2 |      |      |      |
| 2000 | may | A    | high   |      |      |      |      |
| 2000 | jun | A    | high   | 28.3 |      |      |      |
| 2000 | jul | A    | high   | 28.6 |      |      |      |
| 2000 | aug | A    | high   | 28.9 |      |      |      |
| 2000 | sep | A    | high   | 28.1 |      |      |      |
| 2000 | oct | A    | high   | 28.8 |      |      |      |
| 2000 | nov | A    | high   | 31.6 |      |      |      |
| 2000 | dec | A    | high   | 26.9 |      |      |      |
| 2001 | jan | A    | high   | 28.6 |      |      |      |
| 2001 | feb | A    | high   | 29.6 |      |      |      |
| 2002 | jan | B    | mid    | 21.4 |      |      |      |
| 2002 | feb | B    | mid    | 24.5 |      |      |      |
| 2002 | mar | B    | mid    | 24.2 |      |      |      |
+------+-----+------+--------+------+------+------+------+

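However, the main variable (rf) has some missing values, and I want to calculate the averages (avg1, avg2, avg3) with the missing values removed. My dataset can be reproduced with the following dput code.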

df <- structure(list(year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2001L, 2001L, 2002L, 
2002L, 2002L), mon = structure(c(5L, 4L, 8L, 1L, 9L, 7L, 6L, 
2L, 12L, 11L, 10L, 3L, 5L, 4L, 5L, 4L, 8L), .Label = c("apr", 
"aug", "dec", "feb", "jan", "jul", "jun", "mar", "may", "nov", 
"oct", "sep"), class = "factor"), site = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), region = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("high", 
"mid"), class = "factor"), rf = c(28.2, 26.6, 30.3, 33.2, NA, 
28.3, 28.6, 28.9, 28.1, 28.8, 31.6, 26.9, 28.6, 29.6, 21.4, 24.5, 
24.2), avg_rf_site_allyears = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA), avg_mon_rf_all_site = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
), avg_rf_year_ele = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA)), .Names = c("year", "mon", "site", 
"region", "rf", "avg_rf_site_allyears", "avg_mon_rf_all_site", 
"avg_rf_year_ele"), class = "data.frame", row.names = c(NA, -17L
))
avg1 is the average rainfall of each site across all years (the mean of the monthly values over 15 years).

avg2 is the monthly average rainfall across all sites and all years.

avg3 is the average rainfall by region for each year.

I am using the following code, but it does not work for sites that have missing values.

avg1

df$avg.1 <- with(df, ave(rf, site)) # mean rf by site across all years; this returns NA for any site with even one missing value

avg2

df$avg2 <- with(df, ave(rf, mon)) # this works in this example, but with my big dataset it gives all NAs

It would be great if someone could explain the underlying cause of this problem.
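A likely cause, sketched below: ave() applies FUN = mean by default, and mean() uses na.rm = FALSE, so a single NA anywhere in a group turns that whole group's result into NA. In a large dataset covering many sites over 15 years, it is plausible that every month has at least one missing rf value somewhere, which would explain why avg2 came back as all NA. This minimal sketch rebuilds the example table (leaving out the empty avg columns); the column name avg.1_default is hypothetical and only used for comparison.

```r
# Example data from the table in the question (empty avg columns omitted).
df <- data.frame(
  year   = c(rep(2000L, 12), 2001L, 2001L, 2002L, 2002L, 2002L),
  mon    = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug",
             "sep", "oct", "nov", "dec", "jan", "feb", "jan", "feb", "mar"),
  site   = c(rep("A", 14), rep("B", 3)),
  region = c(rep("high", 14), rep("mid", 3)),
  rf     = c(28.2, 26.6, 30.3, 33.2, NA, 28.3, 28.6, 28.9, 28.1, 28.8,
             31.6, 26.9, 28.6, 29.6, 21.4, 24.5, 24.2),
  stringsAsFactors = TRUE
)

# Count missing rf values per site to see which groups are affected.
na_per_site <- tapply(is.na(df$rf), df$site, sum)
print(na_per_site)

# Default ave(): FUN = mean with na.rm = FALSE, so every site-A row becomes NA.
# (avg.1_default is a hypothetical column name used only for comparison.)
df$avg.1_default <- with(df, ave(rf, site))

# Same grouping, but the NA is dropped before averaging.
df$avg.1 <- with(df, ave(rf, site, FUN = function(x) mean(x, na.rm = TRUE)))
print(df[, c("site", "rf", "avg.1_default", "avg.1")])
```

Site B has no missing values, so its result is the same either way; only the group containing the NA changes.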

avg3: I need to calculate the mean by region for each year, but I cannot find a way to do it.

Any help with the above is greatly appreciated.

1 answer:

Answer 0 (score: 0)

We can specify the FUN argument in ave. By default, i.e. when FUN is not specified, ave applies mean, and mean uses na.rm=FALSE, so a group containing even one NA gets NA. Passing a function that calls mean with na.rm=TRUE fixes this, and any other function, such as min or max, can be used through FUN in the same way. For avg.1 (the same call with mon in place of site gives avg.2):
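Any summary function can go through FUN in the same way. A minimal sketch on the example data, computing per-site minimum and maximum with na.rm = TRUE; the column names rf_min and rf_max are hypothetical.

```r
# Only the columns needed here, taken from the example table.
df <- data.frame(
  site = c(rep("A", 14), rep("B", 3)),
  rf   = c(28.2, 26.6, 30.3, 33.2, NA, 28.3, 28.6, 28.9, 28.1, 28.8,
           31.6, 26.9, 28.6, 29.6, 21.4, 24.5, 24.2),
  stringsAsFactors = TRUE
)

# Per-site minimum and maximum, ignoring the missing value in site A.
# rf_min / rf_max are illustrative column names.
df$rf_min <- with(df, ave(rf, site, FUN = function(x) min(x, na.rm = TRUE)))
df$rf_max <- with(df, ave(rf, site, FUN = function(x) max(x, na.rm = TRUE)))
print(df)
```

If a group were entirely NA, min() and max() with na.rm = TRUE would return Inf and -Inf with a warning, so it is worth checking for such groups first.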

df$avg.1 <- with(df, ave(rf, site, 
        FUN= function(x) mean(x, na.rm=TRUE)))
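For avg.2 the call is grouped by mon instead of site. One caveat shows up in the example data: the only May observation is missing, so after na.rm = TRUE nothing is left to average and mean() returns NaN for that month. The sketch below adds a guard that returns NA instead; that guard and the column name avg.2_na are hypothetical additions, not part of the answer.

```r
# Only the columns needed here, taken from the example table.
df <- data.frame(
  mon = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug",
          "sep", "oct", "nov", "dec", "jan", "feb", "jan", "feb", "mar"),
  rf  = c(28.2, 26.6, 30.3, 33.2, NA, 28.3, 28.6, 28.9, 28.1, 28.8,
          31.6, 26.9, 28.6, 29.6, 21.4, 24.5, 24.2),
  stringsAsFactors = TRUE
)

# Direct translation of the answer: NaN for months with no observed rf.
df$avg.2 <- with(df, ave(rf, mon, FUN = function(x) mean(x, na.rm = TRUE)))

# Hypothetical variant: report NA rather than NaN when a month is all NA.
df$avg.2_na <- with(df, ave(rf, mon, FUN = function(x)
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)))
print(df)
```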

For the third case (avg.3), group by both region and year:

df$avg.3 <- with(df, ave(rf, region, year, FUN = function(x) mean(x, na.rm = TRUE)))
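ave() accepts several grouping variables through its ... argument and averages within each region/year combination. A sketch on the example data, with a one-row-per-group summary from aggregate() as a cross-check. aggregate() is a swapped-in base R alternative, not part of the answer; its formula method drops rows with NA by default (na.action = na.omit). The name avg3_table is hypothetical.

```r
# Only the columns needed here, taken from the example table.
df <- data.frame(
  year   = c(rep(2000L, 12), 2001L, 2001L, 2002L, 2002L, 2002L),
  region = c(rep("high", 14), rep("mid", 3)),
  rf     = c(28.2, 26.6, 30.3, 33.2, NA, 28.3, 28.6, 28.9, 28.1, 28.8,
             31.6, 26.9, 28.6, 29.6, 21.4, 24.5, 24.2),
  stringsAsFactors = TRUE
)

# avg.3: mean rf for every region/year combination, repeated on each row.
df$avg.3 <- with(df, ave(rf, region, year,
                         FUN = function(x) mean(x, na.rm = TRUE)))

# Compact summary, one row per region/year; the formula method's default
# na.action = na.omit removes the missing rf row before averaging.
avg3_table <- aggregate(rf ~ region + year, data = df, FUN = mean)
print(avg3_table)
```

ave() keeps the original row count (handy for adding a column), while aggregate() collapses to one row per group (handy for a summary table).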

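If we are using dplyr, the same grouped means can be computed with group_by() followed by mutate(), again using mean(rf, na.rm = TRUE).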