Writing a function that imports data, computes summary statistics for variable conditions, and writes the results to output files

Date: 2015-10-12 05:29:08

Tags: r function import statistics

Initially I have the following object:

> head(gs)
  year disturbance lek_id  complex tot_male
1 2006           N     3T  Diamond        3
2 2007           N     3T  Diamond       17
3 1981           N   bare 3corners        4
4 1982           N   bare 3corners        7
5 1983           N   bare 3corners        2
6 1985           N   bare 3corners        5

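I computed general statistics for tot_male (n, min, max, mean, and sd) by year within complex. I then combined these by-year, within-complex results into a single dataset using the following: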

gsnew <- gs %>% group_by(year, complex) %>%
summarise(n = length(tot_male), male_min = min(tot_male), male_max = max(tot_male), male_mean = mean(tot_male), male_sd = sd(tot_male))

This results in:

> gsnew
Source: local data frame [119 x 7]
Groups: year [?]

    year  complex     n male_min male_max male_mean   male_sd
   (int)   (fctr) (int)    (int)    (int)     (dbl)     (dbl)
1   1967  Diamond     2       33      101 67.000000 48.083261
2   1969  Diamond     2       29       69 49.000000 28.284271
3   1970 3corners     1       26       26 26.000000        NA
4   1970  Diamond     4        3       51 26.250000 21.093048
5   1971 3corners     3        6       22 12.333333  8.504901

How can I write a general function in the following format

FunctionName=function(Argument1,...,ArgumentN) {Statement1,...,StatementN}
• Argument1-N are any variables from the object(s)
• Statement1-N are any valid R statements

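that lets me:
• import the data
• select the specified year(s) for which statistics are needed
• compute the mean, 2SD, n, and 90% confidence interval by lek complex for the specified year(s)
• write the output for each year as a separate *.csv file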

year complex     mean     st.dev2  n   lo90ci    hi90ci 
2007 3corners    26.28571 52.04760 7  -393.50827 446.07970 
2007 Blue        18.87500 20.15476 8  -40.00856  77.75856 
2007 book_cliffs 4.50000  13.19091 6  -24.62443  33.62443 
2007 Diamond     13.25000 48.83431 20 -205.38461 231.88461

2 Answers:

Answer 0 (score: 0)

Well, I think you're close. It might look something like this:

library(dplyr)

# Read a CSV, keep one year, summarise tot_male, and write the result
read_write = function(file_name, this_year) {
  file_name %>%
  read.csv %>%
  filter(year == this_year) %>%
  summarise(n = length(tot_male), 
            male_min = min(tot_male), 
            male_max = max(tot_male), 
            male_mean = mean(tot_male), 
            male_sd = sd(tot_male),
            male_2sd = 2*male_sd,
            male_upper_bound = male_mean + 1.645*male_sd,
            male_lower_bound = male_mean - 1.645*male_sd) %>%
  # the output name is the input name prefixed with "out_"
  write.csv("out_" %>% paste0(file_name), row.names = FALSE)
  }
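
A note on the bounds: mean ± 1.645*sd describes the spread of individual counts under a normal assumption, not a confidence interval for the mean. If an interval for the mean is what is wanted, one common choice (an assumption here, since the question does not say which interval is meant) is the t-based interval, mean ± t(0.95, n-1) * sd / sqrt(n). A minimal base-R sketch, where the helper name ci90 is hypothetical and the input values are illustrative only:

```r
# Hypothetical helper (not from the thread): t-based 90% CI for the mean of x.
ci90 <- function(x) {
  x <- x[!is.na(x)]              # drop missing counts
  n <- length(x)
  m <- mean(x)
  if (n < 2) {
    # sd() is undefined for a single observation, so no interval is returned
    return(c(n = n, mean = m, lo90ci = NA_real_, hi90ci = NA_real_))
  }
  se <- sd(x) / sqrt(n)          # standard error of the mean
  t_crit <- qt(0.95, df = n - 1) # two-sided 90% leaves 5% in each tail
  c(n = n, mean = m, lo90ci = m - t_crit * se, hi90ci = m + t_crit * se)
}

# Illustrative counts only
ci90(c(3, 17, 4, 7, 2, 5))
```

Inside summarise() the same arithmetic can be written directly from n, male_mean, and male_sd; groups with a single lek will get NA bounds either way.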

Answer 1 (score: 0)

Thanks, @bramtayl

Here is the final code:

> library(dplyr)
> annualleksummary = function(x1) {
+   x1 %>%
+   read.csv %>% 
+   filter(year == 2007) %>% group_by(year, complex) %>%
+   summarise(n = length(tot_male), 
+             male_min = min(tot_male), 
+             male_max = max(tot_male), 
+             male_mean = mean(tot_male), 
+             male_sd = sd(tot_male),
+             male_2sd = 2*male_sd,
+             male_upper_bound = male_mean + 1.645*male_sd,
+             male_lower_bound = male_mean - 1.645*male_sd) %>%
+   write.csv("2007_" %>% paste0(x1), row.names = FALSE) 
+   }
> annualleksummary("gsg_leks.csv")
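
The function above hard-codes 2007. A minimal sketch of how the year could become an argument, writing one CSV per requested year, might look like the following. The name write_year_summaries and the out_dir argument are hypothetical, and the input is assumed to have the year, complex, and tot_male columns shown in the question. The bounds keep the mean ± 1.645*sd form used in this thread.

```r
library(dplyr)

# Hypothetical helper: writes one summary CSV per requested year.
# Assumes the input file has columns year, complex, and tot_male.
write_year_summaries <- function(file_name, years, out_dir = ".") {
  leks <- read.csv(file_name, stringsAsFactors = FALSE)
  out_files <- character(0)
  for (yr in years) {
    year_summary <- leks %>%
      filter(year == yr) %>%
      group_by(year, complex) %>%
      summarise(n = n(),
                male_mean = mean(tot_male),
                male_sd = sd(tot_male),
                male_2sd = 2 * male_sd,
                male_lower_bound = male_mean - 1.645 * male_sd,
                male_upper_bound = male_mean + 1.645 * male_sd) %>%
      ungroup()
    # Output name is "<year>_<input file name>", placed in out_dir
    out_file <- file.path(out_dir, paste0(yr, "_", basename(file_name)))
    write.csv(as.data.frame(year_summary), out_file, row.names = FALSE)
    out_files <- c(out_files, out_file)
  }
  invisible(out_files)  # paths of the files written
}
```

With the file from this answer, a call such as write_year_summaries("gsg_leks.csv", c(2006, 2007)) would be expected to write 2006_gsg_leks.csv and 2007_gsg_leks.csv to the working directory. A year with no rows produces a file containing only the header.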