Using one function for multiple datasets in R

Time: 2016-05-04 20:58:51

Tags: r function

I wrote the following function to check one variable of dataset 'a':

d <- function(x)
{
  a <- sort(levels(as.factor(x)),decreasing=T)[1:3]
  for (i in 1:length(a))
  {
    if (any(table(x[i])==a[i])<600)
    {
      returnlist <- paste(" Month(s) having less data is/are ", x[i])
      return(returnlist)
    }
    else {
      return(print(" All the recent three months have good enough data "))
         }
  }
}

d(a$YEARMONTH)

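Now I have three more datasets to check. How can I write a function that takes all 4 datasets at once and gives the result for each? Do I have to pass the 4 datasets as arguments? Please also suggest how to write the return statement so that each dataset's name appears as a heading, with that dataset's result below it.

The variable I pass to the function looks like this: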

Apr-2014 
Apr-2015
Apr-2016
Aug-2013
Aug-2014
Aug-2015
Dec-2013
Dec-2014 
Dec-2015 
Feb-2014....

These are the months, together with the year, in which respondents took the survey. So each month has many respondents.

@Frank, thank you for the lapply call above. It worked, but I am getting only the first record of each dataset.
My output currently looks like this:
1  Month(s) having less data is/are  201604
2  Month(s) having less data is/are  201604
3  Month(s) having less data is/are  201604
4  Month(s) having less data is/are  201604

For example, suppose my A, B, C, D datasets have these counts per yearmonth value:

A$yearmonth
201604 201603 201602
34  652 643

B$yearmonth

201604 201603 201602
678 78  98

C$yearmonth
201604 201603 201602
675 897 678

D$yearmonth
201604 201603 201602
566 788 90

So here my function should give output for the counts < 600 of each dataset:
A$yearmonth
201604
34
B$yearmonth
201603 201602
78 98
D$yearmonth
201602
90

I don't think my function is checking all three values of 'a' for each argument. How can I fix that?
Also, how can I get the counts displayed in the output? And how can I get the argument name into the return statement so that I can relate each result to its dataset?

2 answers:

Answer 0: (score: 1)

Expanding on Richard Scriven's comment, using your d function:

lapply(list(A$yearmonth, B$yearmonth, C$yearmonth, D$yearmonth), d)
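One thing to watch when reusing the original d as is: sort() orders the labels as strings, so values such as Apr-2014 (the format shown in the question) would not come out in chronological order, whereas values such as 201604 would. A minimal sketch of a sortable key, where month_key is a hypothetical helper and the "Mon-YYYY" layout is assumed from the sample in the question:

```r
# month_key() is a hypothetical helper: turns "Apr-2014" into the integer 201404.
# month.abb is base R's vector of English month abbreviations, so this does not
# depend on the session locale.
month_key <- function(x) {
  x <- trimws(as.character(x))
  month <- match(substr(x, 1, 3), month.abb)
  year <- as.integer(substr(x, 5, 8))
  year * 100L + month
}

labels <- c("Apr-2014", "Apr-2015", "Apr-2016", "Aug-2013", "Dec-2015", "Feb-2014")
keys <- month_key(labels)

# Three most recent months, newest first.
recent <- labels[order(keys, decreasing = TRUE)][1:3]
print(recent)
```

Sorting on the key rather than on the label is what makes "the recent three months" mean what it says.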

Going a step further, here is another way to build the d function to produce the output you have in mind:

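d <- function(df)
{
  ym <- as.character(df$yearmonth)
  # three most recent months; assumes values like 201604 that sort chronologically
  a <- head(sort(unique(ym), decreasing = TRUE), 3)
  # rows per month, restricted to those three months
  b <- table(ym[ym %in% a])
  # months with fewer than 600 rows
  low <- names(b)[b < 600]
  if (length(low) > 0) {
    print(paste("Month(s) having less data is/are", paste(low, collapse = ", ")))
  } else {
    print(" All the recent three months have good enough data ")
  }
}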

lapply(list(A, B, C, D), d)
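The question also asks for the counts themselves and for the dataset name in the output. A minimal sketch of one way to get both: pass a named list to lapply, so each result is labelled with its dataset. The toy data frames are rebuilt from the counts in the question, and make_df and low_months are hypothetical helper names.

```r
# Toy data standing in for the real A, B, C, D (counts taken from the question).
make_df <- function(counts) {
  data.frame(yearmonth = rep(names(counts), times = counts))
}
A <- make_df(c("201604" = 34,  "201603" = 652, "201602" = 643))
B <- make_df(c("201604" = 678, "201603" = 78,  "201602" = 98))
C <- make_df(c("201604" = 675, "201603" = 897, "201602" = 678))
D <- make_df(c("201604" = 566, "201603" = 788, "201602" = 90))

# low_months() is a hypothetical helper: counts for the n most recent months
# that fall below `threshold`, returned as a named integer vector.
low_months <- function(df, threshold = 600, n = 3) {
  ym <- as.character(df$yearmonth)
  recent <- head(sort(unique(ym), decreasing = TRUE), n)
  counts <- table(ym[ym %in% recent])
  low <- counts[counts < threshold]
  setNames(as.integer(low), names(low))
}

# Naming the list elements is what carries the dataset names into the output.
datasets <- list(A = A, B = B, C = C, D = D)
results <- lapply(datasets, low_months)
results <- Filter(length, results)   # drop datasets with nothing to report
print(results)
```

With these counts C is dropped because none of its months is below 600, and D reports 201604 as well as 201602, since 566 is also under the threshold.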

Answer 1: (score: 0)

As @Richard Scriven suggested...

Load the data frames into the workspace and run

# create a named list of all the data frames in the workspace
# (Filter drops everything that is not a data frame, so no NULL entries remain)
mydf.list <- Filter(is.data.frame, mget(ls()))

# apply your function on the list of dataframe, this will return list
my.results <- lapply(mydf.list,d) 

# to get back the results as data frame
data.frame(Reduce(rbind, my.results))
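If the goal is a single data frame that also records which dataset each row came from, one possible sketch is below. It assumes the per-dataset function returns a small data frame rather than printing a message; count_low is an illustrative helper, and A and B are made-up stand-ins for data frames in the workspace.

```r
# Hypothetical stand-ins for data frames already loaded in the workspace.
A <- data.frame(yearmonth = rep(c("201604", "201603"), times = c(34, 652)))
B <- data.frame(yearmonth = rep(c("201604", "201603"), times = c(678, 78)))

# Collect the data frames by name; listing the names explicitly avoids
# picking up unrelated data frames that ls() would also return.
dfs <- mget(c("A", "B"), envir = environment())

# count_low() is an illustrative helper: one row per month whose
# row count is below `threshold`.
count_low <- function(df, threshold = 600) {
  counts <- table(as.character(df$yearmonth))
  low <- counts[counts < threshold]
  data.frame(yearmonth = as.character(names(low)), n = as.integer(low),
             stringsAsFactors = FALSE)
}

per_dataset <- lapply(dfs, count_low)

# Add the dataset name as a column, then stack everything into one data frame.
labelled <- Map(function(name, res) {
  if (nrow(res) == 0) return(NULL)   # nothing below the threshold
  data.frame(dataset = name, res, stringsAsFactors = FALSE)
}, names(per_dataset), per_dataset)
summary_df <- do.call(rbind, labelled)
rownames(summary_df) <- NULL
print(summary_df)
```

Keeping the dataset name as an ordinary column makes the combined result easy to filter or sort afterwards.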