我在r中定义了以下函数:
#A function that compares color and dates to determine if there is a match
getTagColor <- function(color, date){
for (i in (1:nrow(TwistTieFix))){
if ((color == TwistTieFix$color_match[i]) &
(date > TwistTieFix$color_match[i]) &
(date <= TwistTieFix$julian_cut_off_date[i])) {
Data$color_code <- TwistTieFix$color_code[i]
print(Data$color_code)
}
}
}
然后我使用apply()试图将函数应用于每一行。
#Apply the above function to the data set
testData <- apply(Data, 1, getTagColor(Data$tag_color,Data$julian_date))`
代码的目标是在Data中使用两个变量,并找到另一个值放入Data(color_code)中的新列,该列将基于TwistTieFix中的信息。当我运行代码时,我得到一个警告列表
In if ((color == TwistTieFix$color_match[i]) & (date > ... :
the condition has length > 1 and only the first element will be used
我无法确定为什么函数不使用每一行中的日期和颜色并在函数中使用它(至少这是我认为这里出错的地方)。谢谢!
以下是正在使用的数据框的示例:
TwistTieFix
color_name date color_code cut_off_date color_match julian_start julian_cut_off_date
yellow 2013-08-12 y1 2001-07-02 yellow 75 389
blue 2000-09-28 b1 2001-08-12 blue 112 430
数据
coll_date julian_date tag_color
2013-08-13 76 yellow
2013-08-14 76 yellow
2000-09-29 112 blue
数据包含更多不同变量的列,但我不允许包含所有列。但是,我在函数中引用了Data中的列。使用read.csv将数据集加载到r中,并且来自Excel csv文件。
答案 0 :(得分:1)
对我而言,您似乎想要加入tag_color=color_match
和julian_start <= julian_date <= julian_cut_off_date
的数据和TwistTieFix。以下是dput
表单
TwistTieFix <- structure(list(color_name = structure(c(2L, 1L), .Label = c("blue",
"yellow"), class = "factor"), date = structure(c(2L, 1L), .Label = c("2000-09-28",
"2013-08-12"), class = "factor"), color_code = structure(c(2L,
1L), .Label = c("b1", "y1"), class = "factor"), cut_off_date = structure(1:2, .Label = c("2001-07-02",
"2001-08-12"), class = "factor"), color_match = structure(c(2L,
1L), .Label = c("blue", "yellow"), class = "factor"), julian_start = c(75L,
112L), julian_cut_off_date = c(389L, 430L)), .Names = c("color_name",
"date", "color_code", "cut_off_date", "color_match", "julian_start",
"julian_cut_off_date"), class = "data.frame", row.names = c(NA,
-2L))
Data <- structure(list(coll_date = structure(c(2L, 3L, 1L), .Label = c("2000-09-29",
"2013-08-13", "2013-08-14"), class = "factor"), julian_date = c(76L,
76L, 112L), tag_color = structure(c(2L, 2L, 1L), .Label = c("blue",
"yellow"), class = "factor")), .Names = c("coll_date", "julian_date",
"tag_color"), class = "data.frame", row.names = c(NA, -3L))
执行此合并的简单方法是使用data.table
库。你可以做到
#convert to data.table and set keys
ttf<-setDT(TwistTieFix)
setkey(ttf, color_match, julian_start)
dt<-setDT(Data)
setkey(dt, tag_color, julian_date)
#merge and extract columns
ttf[dt, roll=T][julian_start<julian_cut_off_date,list(coll_date,
julian_date=julian_start, tag_color=color_match, color_code)]
获取
coll_date julian_date tag_color color_code
1: 2000-09-29 112 blue b1
2: 2013-08-13 76 yellow y1
3: 2013-08-14 76 yellow y1