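I am trying to extract a value from one group of columns based on the value in another column. Taking the first row as an example:
- Take the value of CodeToMatch, which is 1.
- Search the columns Code.1, Code.2 and Code.3 for the value 1. Here it is found in the third column (Code.3), so return the value from the third of pCode.1, pCode.2 and pCode.3, which is "p4".
The expected_output column in my example df below shows what I am after.
Any help is much appreciated!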
c1 <- c("1","2","3")
c2 <- c("8","1","3")
c3 <- c("4","2","4")
c4 <- c("1","3","5")
c5 <- c("p1","p2","p3")
c6 <- c("p8","p1","p3")
c7 <- c("p4","p2","p4")
c8 <- c("p4","p1","p3")
df <- data.frame(c1,c2,c3,c4,c5,c6,c7,c8)
colnames(df)[c(1:8)] <- c("CodeToMatch","Code.1","Code.2","Code.3","pCode.1","pCode.2","pCode.3","expected_output")
Answer 0 (score: 3)
A data.table solution
Sample data
df <- structure(list(CodeToMatch = structure(1:3, .Label = c("1", "2",
"3"), class = "factor"), Code.1 = structure(c(3L, 1L, 2L), .Label = c("1",
"3", "8"), class = "factor"), Code.2 = structure(c(2L, 1L, 2L
), .Label = c("2", "4"), class = "factor"), Code.3 = structure(1:3, .Label = c("1",
"3", "5"), class = "factor"), pCode.1 = structure(1:3, .Label = c("p1",
"p2", "p3"), class = "factor"), pCode.2 = structure(c(3L, 1L,
2L), .Label = c("p1", "p3", "p8"), class = "factor"), pCode.3 = structure(c(2L,
1L, 2L), .Label = c("p2", "p4"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
Code
library(data.table)
#first, melt wide table to long format
df.melt <- melt( setDT(df), id.vars="CodeToMatch", measure.vars = patterns(Code="^Code\\..*", pCode="^pCode.*"))
#now finding everything is easy...
df.melt[ Code == CodeToMatch, .(CodeToMatch, pCode)]
Output
#    CodeToMatch pCode
# 1:           3    p3
# 2:           2    p1
# 3:           1    p4
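Note that the output above follows the order of the melted table (Code.1 matches first, then Code.2, then Code.3), not the order of the original rows. A minimal sketch of one way to restore the original order, assuming the columns are stored as character rather than factor; the row_id column and the names dt, long and res are hypothetical helpers, not part of the answer:

```r
library(data.table)

# The question's example data, stored as character columns (an assumption).
dt <- data.table(
  CodeToMatch = c("1", "2", "3"),
  Code.1 = c("8", "1", "3"),
  Code.2 = c("4", "2", "4"),
  Code.3 = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4")
)

# Remember each row's original position before reshaping.
dt[, row_id := .I]

# Melt the Code.* and pCode.* columns into two parallel long columns.
long <- melt(dt,
             id.vars = c("row_id", "CodeToMatch"),
             measure.vars = patterns(Code = "^Code\\.", pCode = "^pCode\\."))

# Keep the matching rows, then sort them back into the original row order.
res <- long[Code == CodeToMatch][order(row_id), .(row_id, CodeToMatch, pCode)]
res$pCode
```

Sorting by row_id also makes it straightforward to assign res$pCode back to the original table when every row has exactly one match.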
Answer 1 (score: 0)
I don't know how well this generalizes, but here is one option:
nCode <- 3
df$expected_output <- apply(df, 1, function(x) x[nCode + 1 + which(x[2:(nCode + 1)] == x[1])])
df$expected_output
#[1] "p4" "p1" "p3"
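Note that the number of "Code" columns is hard-coded. In your case there are 3 "Code" columns with matching "pCode" columns; adjust as needed. This also assumes that the first column always contains the code to match, followed by the "Code" columns and then the "pCode" columns.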
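The hard-coded count could instead be derived from the column names. A minimal sketch, assuming the columns are named Code.1, Code.2, ... and keep the same layout as in the question (CodeToMatch first, then the Code columns, then the pCode columns); the result variable is a hypothetical name:

```r
# The question's example data, stored as character columns.
df <- data.frame(
  CodeToMatch = c("1", "2", "3"),
  Code.1 = c("8", "1", "3"),
  Code.2 = c("4", "2", "4"),
  Code.3 = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4"),
  stringsAsFactors = FALSE
)

# Count the "Code.N" columns instead of hard-coding 3.
nCode <- sum(grepl("^Code\\.[0-9]+$", names(df)))

# Same row-wise lookup as above, now driven by the derived count.
result <- apply(df, 1, function(x) {
  hit <- which(x[2:(nCode + 1)] == x[1])  # position of the matching Code column
  x[nCode + 1 + hit]                      # value of the pCode column at that position
})
result <- unname(result)
result
#[1] "p4" "p1" "p3"
```

This still relies on column positions, so it only removes the hard-coded count, not the layout assumption.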
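Answer 2 (score: 0)
Separate the code and pCode columns based on the pattern in their names. Then use mapply to find, for each row, the index in code_columns that matches CodeToMatch, and extract the corresponding value from pcode_columns.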
code_columns <- grep("^Code\\.[0-9]+", names(df))
pcode_columns <- grep("^pCode", names(df))
mapply(function(x, y) df[x, pcode_columns][df[x, code_columns]==y],
1:nrow(df), df$CodeToMatch)
#[1] "p4" "p1" "p3"
I ran
df[1:4] <- lapply(df[1:4], function(x) as.numeric(as.character(x)))
so that the numeric columns are stored as numbers rather than factors.
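The same lookup can also be written without a per-row function call over the data frame, using matrix indexing. This is an alternative technique, not the answer's method. A minimal sketch, assuming character columns and the naming pattern from the question; codes, pcodes, hit and result are hypothetical names:

```r
# The question's example data, stored as character columns.
df <- data.frame(
  CodeToMatch = c("1", "2", "3"),
  Code.1 = c("8", "1", "3"),
  Code.2 = c("4", "2", "4"),
  Code.3 = c("1", "3", "5"),
  pCode.1 = c("p1", "p2", "p3"),
  pCode.2 = c("p8", "p1", "p3"),
  pCode.3 = c("p4", "p2", "p4"),
  stringsAsFactors = FALSE
)

code_columns <- grep("^Code\\.[0-9]+$", names(df))
pcode_columns <- grep("^pCode\\.[0-9]+$", names(df))

codes <- as.matrix(df[code_columns])
pcodes <- as.matrix(df[pcode_columns])

# codes == df$CodeToMatch compares every cell in row i with CodeToMatch[i],
# because the vector is recycled down the columns of the matrix.
# For each row, take the position of the first match (NA when there is none).
hit <- apply(codes == df$CodeToMatch, 1, function(r) which(r)[1])

# Index the pCode matrix with (row, column) pairs.
result <- pcodes[cbind(seq_len(nrow(df)), hit)]
result
#[1] "p4" "p1" "p3"
```

Rows with no matching code yield NA here instead of a zero-length result, which keeps the output the same length as the data frame.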