data.frame: apply a function to a subset of columns while passing multiple arguments to the function

Time: 2016-01-15 02:44:59

Tags: r dataframe arguments

I have a data frame with several hundred columns, and I want to set the values of some of those columns based on another column.

project <- c(1,2,3)
team <- c('john,bob', 'bob,gary', 'larry')
john <- c('john','john','john')
bob <- c('bob','bob','bob')
gary <- c('gary','gary','gary')
larry <- c('larry','larry','larry')
df <- data.frame(project,team,john,bob,gary,larry)

  project     team john bob gary larry
1       1 john,bob john bob gary larry
2       2 bob,gary john bob gary larry
3       3    larry john bob gary larry

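I want to apply a function to the columns df[,3:ncol(df)]. The function should compare the value of each name column against the team column and set the value to 1 if it matches, and to 0 otherwise: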

project      team  john  bob  gary  larry  ...
      1  john,bob     1    1     0      0  ... 
      2  bob,gary     0    1     1      0  ...
      3     larry     0    0     0      1  ...

I can apply a function to the correct columns, but I am not sure how to pass the values of my team column to the function.

df[,3:ncol(df)] <- sapply(df[,3:ncol(df)],function(x) ifelse(grepl(x, df$team),1,0))

This throws the following message:

There were 50 or more warnings (use warnings() to see the first 50)

3 Answers:

Answer 0: (score: 2)

Use mapply and overwrite:

df[-(1:2)] <- mapply(grepl, pattern=names(df)[-(1:2)], x=list(df$team))+0
df
#  project     team john bob gary larry
#1       1 john,bob    1   1    0     0
#2       2 bob,gary    0   1    1     0
#3       3    larry    0   0    0     1

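Wrapping the column as list(df$team) ensures you don't run into problems with the lengths of the two arguments: the list has length one, so the whole team vector is passed as x= and searched once for each column name supplied in pattern=.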
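The same idea can be sketched with the MoreArgs argument of mapply, which hands one fixed object to every call instead of relying on a length-one list being recycled. This is only an illustrative variant, not the answerer's code; the variable names name_cols and hits are made up for the example, and the data frame is rebuilt with stringsAsFactors = FALSE as an assumption.

```r
# Rebuild the example data from the question (length-one values are recycled)
df <- data.frame(
  project = c(1, 2, 3),
  team    = c("john,bob", "bob,gary", "larry"),
  john = "john", bob = "bob", gary = "gary", larry = "larry",
  stringsAsFactors = FALSE
)

# Hypothetical helper variable: the name columns to overwrite
name_cols <- names(df)[-(1:2)]

# MoreArgs passes the whole team vector unchanged to every grepl() call,
# while pattern= is iterated over one column name at a time
hits <- mapply(grepl, pattern = name_cols, MoreArgs = list(x = df$team))

# Adding 0 converts the logical matrix to 0/1 before overwriting the columns
df[name_cols] <- hits + 0
df
```

Here hits is a logical matrix with one row per project and one column per name, which is why it can be assigned over the name columns in a single step.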

Answer 1: (score: 1)

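Since each column from the third onward holds only a single repeated value, we can simply split the 'team' column, use mtabulate to get the counts, and replace the columns in 'df' with the new result: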

library(qdapTools)
d1 <- mtabulate(strsplit(as.character(df$team), ","))
df[names(df)[-(1:2)]] <- d1[names(df)[-(1:2)]]
df
#   project     team john bob gary larry
#1       1 john,bob    1   1    0     0
#2       2 bob,gary    0   1    1     0
#3       3    larry    0   0    0     1

We can also use table from base R instead of mtabulate:

 d1 <- as.data.frame.matrix(
         table(stack(setNames(strsplit(as.character(df$team), ","), 
                  df$project))[2:1]))

Then replace the columns in 'df' as above.
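Put together, the base R route might look like the sketch below. It assumes the sample data from the question, rebuilt here with stringsAsFactors = FALSE; the names members and name_cols are introduced only for illustration. It also assumes every name column appears at least once in 'team', since a name that never occurs would have no column in the table.

```r
# Rebuild the example data from the question (length-one values are recycled)
df <- data.frame(
  project = c(1, 2, 3),
  team    = c("john,bob", "bob,gary", "larry"),
  john = "john", bob = "bob", gary = "gary", larry = "larry",
  stringsAsFactors = FALSE
)

# Hypothetical helper variable: the name columns to overwrite
name_cols <- names(df)[-(1:2)]

# One list element per project, holding that project's team members
members <- setNames(strsplit(as.character(df$team), ","), df$project)

# stack() returns the columns (values, ind); [2:1] puts the project first,
# so table() has projects as rows and member names as columns
d1 <- as.data.frame.matrix(table(stack(members)[2:1]))

# Select by name so the column order matches 'df', then overwrite
df[name_cols] <- d1[name_cols]
df
```

Selecting d1 by name matters because table orders its columns alphabetically, which need not match the column order in 'df'.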

Answer 2: (score: 0)

This should be quite simple with grepl, because the name is the same all the way down each column:

df[ ,3:ncol(df)] <- lapply(df[ ,3:ncol(df)] , function(x) as.numeric(grepl(x, df$team) )  )
Warning messages:
1: In grepl(x, df$team) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In grepl(x, df$team) :
  argument 'pattern' has length > 1 and only the first element will be used
3: In grepl(x, df$team) :
  argument 'pattern' has length > 1 and only the first element will be used
4: In grepl(x, df$team) :
  argument 'pattern' has length > 1 and only the first element will be used

# the warnings didn't affect the outcome adversely.
> df
  project     team john bob gary larry
1       1 john,bob    1   1    0     0
2       2 bob,gary    0   1    1     0
3       3    larry    0   0    0     1

If you want the warnings to go away, just use the "top" value in each column for the logical matching:

df[ ,3:ncol(df)] <- lapply(df[ ,3:ncol(df)] , 
                           function(x) as.numeric(grepl(x[1], df$team) )  )
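The function above picks up df$team from the enclosing environment. If the team column should instead be passed in as a real argument, as the question asks, lapply forwards any extra named arguments to the function it calls. A minimal sketch, assuming a hypothetical helper match_team with a second parameter team (neither name comes from the original post), and the sample data rebuilt with stringsAsFactors = FALSE:

```r
# Rebuild the example data from the question (length-one values are recycled)
df <- data.frame(
  project = c(1, 2, 3),
  team    = c("john,bob", "bob,gary", "larry"),
  john = "john", bob = "bob", gary = "gary", larry = "larry",
  stringsAsFactors = FALSE
)

# Hypothetical helper variable: the name columns to overwrite
name_cols <- names(df)[-(1:2)]

# Hypothetical helper function: 'x' is one name column, 'team' is passed
# explicitly; only the first value of 'x' is used as the pattern
match_team <- function(x, team) as.numeric(grepl(x[1], team))

# Arguments given after the function are forwarded unchanged to every call
df[name_cols] <- lapply(df[name_cols], match_team, team = df$team)
df
```

The same forwarding works with sapply, so the original attempt could take team = df$team in the same way.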