I have a data frame with hundreds of columns, and I want to set the values of some of those columns based on another column.
project <- c(1,2,3)
team <- c('john,bob', 'bob,gary', 'larry')
john <- c('john','john','john')
bob <- c('bob','bob','bob')
gary <- c('gary','gary','gary')
larry <- c('larry','larry','larry')
df <- data.frame(project,team,john,bob,gary,larry)
  project     team john bob gary larry
1       1 john,bob john bob gary larry
2       2 bob,gary john bob gary larry
3       3    larry john bob gary larry
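I want to apply a function to the columns df[,3:ncol(df)].
The function should compare the value in each name column against the team column and set the value to 1 if it matches (and 0 otherwise):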
project team john bob gary larry ...
1 john,bob 1 1 0 0 ...
2 bob,gary 0 1 1 0 ...
3 larry 0 0 0 1 ...
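I can apply a function to the right columns, but I am not sure how to pass the values of my team column to that function.
df[,3:ncol(df)] <- sapply(df[,3:ncol(df)], function(x) ifelse(grepl(x, df$team),1,0))
This throws the following message:
There were 50 or more warnings (use warnings() to see the first 50)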
Answer 0: (score: 2)
Use mapply and overwrite the columns:
df[-(1:2)] <- mapply(grepl, pattern=names(df)[-(1:2)], x=list(df$team))+0
df
# project team john bob gary larry
#1 1 john,bob 1 1 0 0
#2 2 bob,gary 0 1 1 0
#3 3 larry 0 0 0 1
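Wrapping the column as list(df$team) ensures you don't run into problems with the lengths of the two arguments, because the whole x= vector is then searched for each pattern= value, that is, for each column name.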
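To make the role of list() concrete, here is a minimal sketch that rebuilds only the team vector and the name patterns from the question (the variable names wrapped and unwrapped are illustrative, not part of the original answer). It contrasts the list-wrapped call with an unwrapped one.

```r
team <- c("john,bob", "bob,gary", "larry")
patterns <- c("john", "bob", "gary", "larry")

# Wrapped in list(): x has length 1, so the whole team vector is recycled
# for every pattern. Each call is grepl(pattern_i, team) and yields one
# logical per row, giving a 3 x 4 matrix (rows = projects, cols = names).
wrapped <- mapply(grepl, pattern = patterns, x = list(team))

# Adding 0 coerces the logical matrix to numeric 0/1.
wrapped + 0

# Not wrapped: mapply pairs pattern i with team element i (recycling the
# shorter argument), so only one row is tested per name. R may also warn
# here about mismatched lengths, hence suppressWarnings().
unwrapped <- suppressWarnings(mapply(grepl, pattern = patterns, x = team))
length(unwrapped)
```

The wrapped form is the one that produces a full column per name, which is why it can be assigned straight back into df[-(1:2)].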
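Answer 1: (score: 1)
Since the columns from the 3rd onward each hold only a single value, we can simply split the 'team' column, get the counts with mtabulate, and replace the columns in 'df' with the new result: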
library(qdapTools)
d1 <- mtabulate(strsplit(as.character(df$team), ","))
df[names(df)[-(1:2)]] <- d1[names(df)[-(1:2)]]
df
# project team john bob gary larry
#1 1 john,bob 1 1 0 0
#2 2 bob,gary 0 1 1 0
#3 3 larry 0 0 0 1
We can also use table from base R instead of mtabulate:
d1 <- as.data.frame.matrix(
table(stack(setNames(strsplit(as.character(df$team), ","),
df$project))[2:1]))
Then replace the columns in 'df' as above.
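Put together, the base R route might look like the sketch below. It rebuilds the example data frame from the question; the helper names members and name_cols are introduced here purely for illustration.

```r
# Rebuild the example data from the question.
df <- data.frame(
  project = c(1, 2, 3),
  team = c("john,bob", "bob,gary", "larry"),
  john = "john", bob = "bob", gary = "gary", larry = "larry",
  stringsAsFactors = FALSE
)

# Split each team string into its members, keyed by project.
members <- setNames(strsplit(as.character(df$team), ","), df$project)

# stack() gives one row per (member, project) pair; table() counts them,
# with projects as rows and member names as columns.
d1 <- as.data.frame.matrix(table(stack(members)[2:1]))

# Overwrite the name columns, selecting by name so the column order of d1
# (alphabetical) does not matter.
name_cols <- names(df)[-(1:2)]
df[name_cols] <- d1[name_cols]
df
```

One thing to watch for: selecting d1[name_cols] assumes every name column appears in at least one team string. A name that never occurs would be missing from d1, and the selection would fail.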
Answer 2: (score: 0)
This should be quite simple with grepl, since the name vectors are constant within each column:
df[ ,3:ncol(df)] <- lapply(df[ ,3:ncol(df)] , function(x) as.numeric(grepl(x, df$team) ) )
Warning messages:
1: In grepl(x, df$team) :
argument 'pattern' has length > 1 and only the first element will be used
2: In grepl(x, df$team) :
argument 'pattern' has length > 1 and only the first element will be used
3: In grepl(x, df$team) :
argument 'pattern' has length > 1 and only the first element will be used
4: In grepl(x, df$team) :
argument 'pattern' has length > 1 and only the first element will be used
# the warnings didn't affect the outcome adversely.
> df
project team john bob gary larry
1 1 john,bob 1 1 0 0
2 2 bob,gary 0 1 1 0
3 3 larry 0 0 0 1
If you want the warnings to go away, just use the "top" value in each column for the logical match:
df[ ,3:ncol(df)] <- lapply(df[ ,3:ncol(df)] ,
function(x) as.numeric(grepl(x[1], df$team) ) )
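One caveat with any grepl-based approach is that it matches substrings, so a name that is contained in a longer name would also count as a match. The sketch below uses hypothetical data (a team member called "bobby", which is not in the question) and an illustrative helper is_member to show an exact-match alternative built on strsplit and %in%.

```r
# Hypothetical data: "bobby" contains "bob" as a substring.
team <- c("john,bob", "bobby,gary", "larry")

# Substring matching flags the second row as well.
grepl("bob", team)

# Exact matching: split each team string into names first, then test
# whether the name is one of the elements.
members <- strsplit(team, ",", fixed = TRUE)
is_member <- function(name) {
  vapply(members, function(m) name %in% m, logical(1))
}

is_member("bob") + 0
```

For the data in the question both approaches give the same result, so this only matters if some names overlap.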