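Is there a way to make large-scale value matching more programmatic? Basically, I want to add a bunch of value-lookup columns to a data frame, but I don't want to write out the match() indexing every time. This seems like a use case for mapply, but I can't figure out how to use it here. Any suggestions?
Here is the data: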
data <- data.frame(
  region = sample(c("northeast","midwest","west"), 50, replace = TRUE),
  climate = sample(c("dry","cold","arid"), 50, replace = TRUE),
  industry = sample(c("tech","energy","manuf"), 50, replace = TRUE))
And the corresponding lookup table:
lookups <- data.frame(
orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3")
)
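So what I want to do is this: first, add a column to "data" called "reg_lookups" that matches each region to its corresponding value in "lookups". Then do the same for "climate_lookups", and so on.
Right now, I have this mess: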
data$reg_lookup <- lookups$look_val[match(data$region, lookups$orig_val)]
data$clim_lookup <- lookups$look_val[match(data$climate, lookups$orig_val)]
data$indus_lookup <- lookups$look_val[match(data$industry, lookups$orig_val)]
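I tried writing a function to do this, but the function doesn't seem to work, so using it with mapply is a no-go (plus I'm confused about how the mapply syntax would work here):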
match_fun <- function(df, newval, df_look, lookup_val, var, ref_val) {
df$newval <- df_look$lookup_val[match(df$var, df_look$ref_val)]
return(df)
}
data2 <- match_fun(data, reg_2, lookups, look_val, region, orig_val)
Answer 0: (score: 0)
I think you just want to do this:
data <- merge(data,lookups[1:3,],by.x = "region",by.y = "orig_val",all.x = TRUE)
data <- merge(data,lookups[4:6,],by.x = "climate",by.y = "orig_val",all.x = TRUE)
data <- merge(data,lookups[7:9,],by.x = "industry",by.y = "orig_val",all.x = TRUE)
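One thing to watch with the three merges above: every slice of lookups carries the same look_val column name, so merge() disambiguates with suffixes and you end up with columns named look_val.x, look_val.y, and look_val. A minimal sketch of one way around that, assuming each slice is renamed with setNames() before merging. The new column names are taken from the question; set.seed() and the variable merged are added here only for illustration. Note that merge() also reorders the rows by the key column by default.

```r
set.seed(1)  # illustration only, makes the sample reproducible
data <- data.frame(
  region = sample(c("northeast","midwest","west"), 50, replace = TRUE),
  climate = sample(c("dry","cold","arid"), 50, replace = TRUE),
  industry = sample(c("tech","energy","manuf"), 50, replace = TRUE))
lookups <- data.frame(
  orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
  look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3"))

# Rename each slice so the key matches the data column and the value
# column already has its final name; no suffixes are needed then.
merged <- merge(data, setNames(lookups[1:3, ], c("region", "reg_lookup")),
                by = "region", all.x = TRUE)
merged <- merge(merged, setNames(lookups[4:6, ], c("climate", "clim_lookup")),
                by = "climate", all.x = TRUE)
merged <- merge(merged, setNames(lookups[7:9, ], c("industry", "indus_lookup")),
                by = "industry", all.x = TRUE)

names(merged)
```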
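But it would be much better to store the lookups in separate data frames. That way you can control the names of the new columns more easily. It also lets you do something like this: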
lookups1 <- split(lookups,rep(1:3,each = 3))
colnames(lookups1[[1]]) <- c('region','reg_lookup')
colnames(lookups1[[2]]) <- c('climate','clim_lookup')
colnames(lookups1[[3]]) <- c('industry','indus_lookup')
do.call(cbind,mapply(merge,
    x = list(data[,1,drop = FALSE],data[,2,drop = FALSE],data[,3,drop = FALSE]),
    y = lookups1,
    MoreArgs = list(all.x = TRUE),  # the argument name is case-sensitive: MoreArgs, not moreArgs
    SIMPLIFY = FALSE))
And you should be able to wrap the do.call bit in a function.
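As a rough sketch of what such a wrapper might look like: merge_lookups and the temporary .row column below are hypothetical names, not part of the original answer. The sketch assumes the lookup list is in the same order as the columns of the data frame, and that each lookup shares exactly one column name with its data column. Because merge() sorts its result by the key by default, each piece would otherwise come back in a different row order, so the sketch tags the rows with an index and restores the original order before binding the pieces together.

```r
set.seed(1)  # illustration only
data <- data.frame(
  region = sample(c("northeast","midwest","west"), 50, replace = TRUE),
  climate = sample(c("dry","cold","arid"), 50, replace = TRUE),
  industry = sample(c("tech","energy","manuf"), 50, replace = TRUE))
lookups <- data.frame(
  orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
  look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3"))
lookups1 <- split(lookups, rep(1:3, each = 3))
colnames(lookups1[[1]]) <- c("region", "reg_lookup")
colnames(lookups1[[2]]) <- c("climate", "clim_lookup")
colnames(lookups1[[3]]) <- c("industry", "indus_lookup")

# Hypothetical wrapper: merge column i of df with lookup_list[[i]].
merge_lookups <- function(df, lookup_list) {
  cols <- lapply(seq_along(df), function(i) df[, i, drop = FALSE])
  pieces <- mapply(
    function(x, y) {
      x$.row <- seq_len(nrow(x))       # remember the original row order
      m <- merge(x, y, all.x = TRUE)   # joins on the one shared column name
      m <- m[order(m$.row), names(m) != ".row", drop = FALSE]
      rownames(m) <- NULL
      m
    },
    x = cols, y = lookup_list, SIMPLIFY = FALSE)
  do.call(cbind, unname(pieces))
}

result <- merge_lookups(data, lookups1)
head(result)
```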
I used data[,1,drop = FALSE] to keep each piece as a one-column data frame.
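The way the mapply call is built is that the named arguments are passed as lists (the x = and y = parts). I wanted to make sure all rows of data are kept, so I pass all.x = TRUE via MoreArgs, which means it is supplied on every call to merge. Finally, I need to stitch the pieces together myself, so I turned off SIMPLIFY.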
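For comparison, the match() approach from the question can also be made generic. The match_fun attempt does not work because $ takes the name after it literally: df$var looks for a column actually called "var" rather than the column named by the argument. A minimal sketch, assuming the column names are passed as strings and accessed with [[ ]]; add_lookups and its parameter names are hypothetical. Unlike merge(), this keeps the rows of data in their original order.

```r
set.seed(1)  # illustration only
data <- data.frame(
  region = sample(c("northeast","midwest","west"), 50, replace = TRUE),
  climate = sample(c("dry","cold","arid"), 50, replace = TRUE),
  industry = sample(c("tech","energy","manuf"), 50, replace = TRUE))
lookups <- data.frame(
  orig_val = c("northeast","midwest","west","dry","cold","arid","tech","energy","manuf"),
  look_val = c("dir1","dir2","dir3","temp1","temp2","temp3","job1","job2","job3"))

# Hypothetical helper: for each column in vars, add a lookup column
# named by the corresponding entry in new_names.
add_lookups <- function(df, df_look, vars, new_names,
                        ref_col = "orig_val", val_col = "look_val") {
  for (i in seq_along(vars)) {
    idx <- match(df[[vars[i]]], df_look[[ref_col]])  # row of each value in the lookup table
    df[[new_names[i]]] <- df_look[[val_col]][idx]
  }
  df
}

data2 <- add_lookups(data, lookups,
                     vars = c("region", "climate", "industry"),
                     new_names = c("reg_lookup", "clim_lookup", "indus_lookup"))
head(data2)
```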