ifelse() and conditionally pasting column names in a data frame

Time: 2017-12-05 21:16:56

Tags: r

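I have a data frame (df) with nine categorical variables: the first is called student, followed by the names of eight school subjects.

I want to create a new variable called overall that summarises the subjects each student studies (dfgoal).

The problem is that what I have doesn't work. Also, I don't know how best to skip the first column (student). Should I use a list of the variables I want (the eight subjects)?

Any help is much appreciated.

Starting point (df):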

df <- data.frame(
  student = c(1, 2, 3, 4, 5),
  maths = c("y", "n", "n", "n", "n"),
  English = c("n", "y", "n", "n", "n"),
  geography = c("y", "n", "n", "n", "n"),
  history = c("n", "n", "n", "n", "n"),
  art = c("n", "n", "n", "n", "n"),
  Spanish = c("n", "n", "n", "n", "n"),
  physics = c("n", "n", "n", "n", "y"),
  chemistry = c("n", "n", "n", "n", "y"),
  stringsAsFactors = TRUE
)

Desired result (dfgoal):

dfgoal <- data.frame(
  student = c(1, 2, 3, 4, 5),
  maths = c("y", "n", "n", "n", "n"),
  English = c("n", "y", "n", "n", "n"),
  geography = c("y", "n", "n", "n", "n"),
  history = c("n", "n", "n", "n", "n"),
  art = c("n", "n", "n", "n", "n"),
  Spanish = c("n", "n", "n", "n", "n"),
  physics = c("n", "n", "n", "n", "y"),
  chemistry = c("n", "n", "n", "n", "y"),
  overall = c("maths, geography,", "English", "n", "n", "physics,chemistry,"),
  stringsAsFactors = TRUE
)

Current code:

sapply(df, function(x)
  df$overall <- ifelse(df$x == y, paste0(names(df$x), ","), "n"))

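4 answers:

Answer 0: (score: 0)

You are doing a few things wrong in the sapply. First, the value compared in ifelse should be "y" rather than y, because it is a string and not a variable. Second, paste0 should be replaced with paste(..., collapse = ","). Third, sapply(df, ...) will not work because sapply runs over the columns, not over the rows as you want.

This is how I would do it: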

overall <- sapply(1:nrow(df), function(x) ifelse(length(colnames(df)[which(df[x,] == "y")]) != 0, paste(colnames(df)[which(df[x,] == "y")], collapse = ","), "n"))
cbind(df, overall)
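
To see the three points above concretely, here is a minimal sketch on a cut-down data frame. The names toy, subjects and hit are made up for illustration and do not come from the question. The last step also shows one possible way to skip the student column, by naming the subject columns explicitly.

```r
# Cut-down stand-in for the question's df (assumed shape, fewer columns).
toy <- data.frame(
  student = c(1, 2, 3),
  maths   = c("y", "n", "n"),
  English = c("n", "y", "n"),
  stringsAsFactors = TRUE
)

# sapply() over a data frame visits columns, so there is one result per column.
col_classes <- sapply(toy, class)
names(col_classes)

# toy$x looks for a column literally called "x", whatever the variable x holds.
x <- "maths"
is.null(toy$x)   # TRUE: there is no column named "x"
toy[[x]]         # this is how to select a column by a name stored in a variable

# Row-wise summary restricted to the subject columns (student is skipped).
subjects <- setdiff(names(toy), "student")
overall <- apply(toy[subjects] == "y", 1, function(hit) {
  if (any(hit)) paste(subjects[hit], collapse = ",") else "n"
})
cbind(toy, overall)
```

Comparing only toy[subjects] against "y" keeps the student id out of the test entirely, which matters if an id could ever coincide with a flag value.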

Answer 1: (score: 0)

A one-liner:

dfgoal <- cbind.data.frame(
    df,
    overall = apply(df, 1, function(x)
        paste(colnames(df[-1])[x[2:length(x)] == "y"], collapse = ", ")))
dfgoal;
#  student maths English geography history art Spanish physics chemistry
#1       1     y       n         y       n   n       n       n         n
#2       2     n       y         n       n   n       n       n         n
#3       3     n       n         n       n   n       n       n         n
#4       4     n       n         n       n   n       n       n         n
#5       5     n       n         n       n   n       n       y         y
#             overall
#1   maths, geography
#2            English
#3
#4
#5 physics, chemistry        

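If you also want to replace the empty strings with "n", you can do the following (this relies on overall being a factor, which is what cbind.data.frame produced by default before R 4.0.0):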

levels(dfgoal$overall)[levels(dfgoal$overall) == ""] <- "n";
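
On R 4.0.0 or later, strings are no longer converted to factors by default, so overall will typically be a character vector and the levels() assignment has nothing to act on. A minimal sketch of both cases, using stand-alone vectors (overall_chr and overall_fct are illustrative names, not part of the answer):

```r
# The values the one-liner produces for the five students.
overall_chr <- c("maths, geography", "English", "", "", "physics, chemistry")

# Factor case: rename the empty level, as in the answer.
overall_fct <- factor(overall_chr)
levels(overall_fct)[levels(overall_fct) == ""] <- "n"

# Character case: replace the empty elements directly.
overall_chr[overall_chr == ""] <- "n"

as.character(overall_fct)
overall_chr
```

Both routes should end with the same five strings; which one applies depends only on the class of the column.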

Answer 2: (score: 0)

library(data.table)
setDT(df)

merge(df, 
      melt(df, "student")[value == "y"][, .(overall = paste(variable, collapse = ", ")), by = student],
      by = "student",
      all.x = TRUE)
#    student maths English geography history art Spanish physics chemistry            overall
# 1:       1     y       n         y       n   n       n       n         n   maths, geography
# 2:       2     n       y         n       n   n       n       n         n            English
# 3:       3     n       n         n       n   n       n       n         n                 NA
# 4:       4     n       n         n       n   n       n       n         n                 NA
# 5:       5     n       n         n       n   n       n       y         y physics, chemistry
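
Note that students with no "y" end up with NA here rather than "n". If "n" is wanted, one possible follow-up is an update by reference. The sketch below repeats the melt, filter and merge steps on a cut-down table; toy, long, picked and res are made-up names, and it assumes the data.table package is installed.

```r
library(data.table)

# Cut-down stand-in for the question's df (assumed shape, fewer columns).
toy <- data.table(
  student = c(1, 2, 3),
  maths   = c("y", "n", "n"),
  English = c("n", "y", "n")
)

# Long format: one row per student/subject pair.
long <- melt(toy, id.vars = "student",
             variable.name = "subject", value.name = "flag")

# Keep the "y" rows and paste the subject names per student.
picked <- long[flag == "y",
               .(overall = paste(subject, collapse = ", ")),
               by = student]

# Left join back, then fill the students that had no match.
res <- merge(toy, picked, by = "student", all.x = TRUE)
res[is.na(overall), overall := "n"]
res
```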

Answer 3: (score: 0)

TL;DR

> overall <- apply(df, 1, function(r) paste0(names(r)[r == 'y'], collapse = ', '))
> dfgoal <- cbind(df, overall)

Let's try to derive it:

Pick the first row to play with:

r <- df[1,]
r
  student maths English geography history art Spanish physics chemistry
1       1     y       n         y       n   n       n       n         n

Now let's first generate a logical vector in which TRUE marks the positions where y occurs and FALSE everything else, which is trivial:

> r == 'y'
  student maths English geography history   art Spanish physics chemistry
   FALSE  TRUE   FALSE      TRUE   FALSE FALSE   FALSE   FALSE     FALSE

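Now you should see where this is going: we use that last result as the subsetting argument to pick the required elements from the names(r) vector, which holds the actual name of each position: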

> names(r)[r == 'y']
[1] "maths"     "geography"

Now we just need to concatenate them and iterate over the whole data frame:

> overall <- apply(df, 1, function(r) paste0(names(r)[r == 'y'], collapse = ', '))
> dfgoal <- cbind(df, overall)
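
One detail the walkthrough glosses over: r above is a one-row data frame, whereas inside apply() each r is a named character vector, because apply() first converts the data frame to a matrix. A minimal sketch on a cut-down data frame (df_small and m are made-up names) that checks this and also fills the empty results with "n", as the question asked:

```r
# Cut-down stand-in for the question's df (assumed shape, fewer columns).
df_small <- data.frame(
  student = c(1, 2),
  maths   = c("y", "n"),
  physics = c("y", "n"),
  stringsAsFactors = TRUE
)

# apply() works on as.matrix(df_small); mixed column types give a character matrix.
m <- as.matrix(df_small)
is.character(m)

# One row of that matrix is a named character vector, so names(r) still works.
r <- m[1, ]
names(r)[r == "y"]

# Full pass over the rows; rows without any "y" come back as "".
overall <- apply(df_small, 1, function(r) paste0(names(r)[r == "y"], collapse = ", "))
overall[overall == ""] <- "n"
cbind(df_small, overall)
```

Because every value is compared as a string, the student column takes part in the test too; it only drops out because an id such as 1 never equals "y".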