在R中使用apply和用户定义的函数

时间:2014-06-29 17:13:07

标签: r user-defined-functions apply

我在r中定义了以下函数:

#A function that compares color and dates to determine if there is a match
getTagColor <- function(color, date){
    for (i in (1:nrow(TwistTieFix))){
        if ((color == TwistTieFix$color_match[i]) & 
            (date > TwistTieFix$color_match[i]) &       
            (date <= TwistTieFix$julian_cut_off_date[i])) {
          Data$color_code <- TwistTieFix$color_code[i]
          print(Data$color_code)
        }
    }
}

然后我使用apply()试图将函数应用于每一行。

#Apply the above function to the data set
testData <- apply(Data, 1, getTagColor(Data$tag_color,Data$julian_date))`

代码的目标是在Data中使用两个变量,并找到另一个值放入Data(color_code)中的新列,该列将基于TwistTieFix中的信息。当我运行代码时,我得到一个警告列表

In if ((color == TwistTieFix$color_match[i]) & (date >  ... :
  the condition has length > 1 and only the first element will be used 

我无法确定为什么函数不使用每一行中的日期和颜色并在函数中使用它(至少这是我认为这里出错的地方)。谢谢!

以下是正在使用的数据框的示例:

TwistTieFix

color_name   date          color_code     cut_off_date      color_match       julian_start      julian_cut_off_date
yellow       2013-08-12    y1             2001-07-02        yellow            75                389
blue         2000-09-28    b1             2001-08-12        blue              112               430

数据

coll_date      julian_date    tag_color
2013-08-13     76             yellow
2013-08-14     76             yellow
2000-09-29     112            blue

数据包含更多不同变量的列,但我不允许包含所有列。但是,我在函数中引用了Data中的列。使用read.csv将数据集加载到r中,并且来自Excel csv文件。

1 个答案:

答案 0 :(得分:1)

对我而言,您似乎想要加入tag_color=color_matchjulian_start <= julian_date <= julian_cut_off_date的数据和TwistTieFix。以下是dput表单

中的示例data.sets
TwistTieFix <- structure(list(color_name = structure(c(2L, 1L), .Label = c("blue", 
"yellow"), class = "factor"), date = structure(c(2L, 1L), .Label = c("2000-09-28", 
"2013-08-12"), class = "factor"), color_code = structure(c(2L, 
1L), .Label = c("b1", "y1"), class = "factor"), cut_off_date = structure(1:2, .Label = c("2001-07-02", 
"2001-08-12"), class = "factor"), color_match = structure(c(2L, 
1L), .Label = c("blue", "yellow"), class = "factor"), julian_start = c(75L, 
112L), julian_cut_off_date = c(389L, 430L)), .Names = c("color_name", 
"date", "color_code", "cut_off_date", "color_match", "julian_start", 
"julian_cut_off_date"), class = "data.frame", row.names = c(NA, 
-2L))

Data <- structure(list(coll_date = structure(c(2L, 3L, 1L), .Label = c("2000-09-29", 
"2013-08-13", "2013-08-14"), class = "factor"), julian_date = c(76L, 
76L, 112L), tag_color = structure(c(2L, 2L, 1L), .Label = c("blue", 
"yellow"), class = "factor")), .Names = c("coll_date", "julian_date", 
"tag_color"), class = "data.frame", row.names = c(NA, -3L))

执行此合并的简单方法是使用data.table库。你可以做到

#convert to data.table and set keys
ttf<-setDT(TwistTieFix)
setkey(ttf, color_match, julian_start)

dt<-setDT(Data)
setkey(dt, tag_color, julian_date)

#merge and extract columns
ttf[dt, roll=T][julian_start<julian_cut_off_date,list(coll_date, 
    julian_date=julian_start, tag_color=color_match, color_code)]

获取

    coll_date julian_date tag_color color_code
1: 2000-09-29         112      blue         b1
2: 2013-08-13          76    yellow         y1
3: 2013-08-14          76    yellow         y1