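I have a data frame (df) with nine categorical variables: the first is called student, and the other eight are the names of school subjects.
I want to create a new variable called overall that summarises which subjects each student studies (see dfgoal).
The problem is that my attempt does not work. I am also unsure how best to skip the first column (student). Should I use a list of the variables I want to include (the eight subjects)?
Any help is much appreciated.
Starting point (df):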
df <-
data.frame(
student = c(1, 2, 3, 4, 5),
maths = c("y", "n", "n", "n", "n"),
English = c("n", "y", "n", "n", "n"),
geography = c("y", "n", "n", "n", "n"),
history = c("n", "n", "n", "n", "n"),
art = c("n", "n", "n", "n", "n"),
Spanish = c("n", "n", "n", "n", "n"),
physics = c("n", "n", "n", "n", "y"),
chemistry = c("n", "n", "n", "n", "y"),
stringsAsFactors = TRUE
)
Desired result (dfgoal):
dfgoal <-
data.frame(
student = c(1, 2, 3, 4, 5),
maths = c("y", "n", "n", "n", "n"),
English = c("n", "y", "n", "n", "n"),
geography = c("y", "n", "n", "n", "n"),
history = c("n", "n", "n", "n", "n"),
art = c("n", "n", "n", "n", "n"),
Spanish = c("n", "n", "n", "n", "n"),
physics = c("n", "n", "n", "n", "y"),
chemistry = c("n", "n", "n", "n", "y"),
overall = c("maths, geography,", "English", "n", "n", "physics,chemistry,"),
stringsAsFactors = TRUE )
Current code:
sapply(df, function(x)
df$overall <- ifelse(df$x == y, paste0(names(df$x), ","), "n"))
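Answer 0 (score: 0)
You made a few mistakes in the sapply call. First, the comparison in ifelse should be against "y" rather than y, because it is a string, not a variable. Second, paste0 should be replaced with paste and collapse = ",". Third, sapply(df, ...) will not work here, because sapply iterates over the columns, not over the rows as you want.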
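To make the third point concrete, here is a minimal sketch on a made-up two-column data frame (toy is a hypothetical name, not part of the question). It contrasts calling sapply on the data frame itself with calling it on the row indices:

```r
# Hypothetical toy data, only for illustration
toy <- data.frame(a = c("y", "n"), b = c("n", "n"))

# sapply() on a data frame calls the function once per column
per_column <- sapply(toy, function(col) sum(col == "y"))

# Iterating over the row indices calls the function once per row
per_row <- sapply(seq_len(nrow(toy)), function(i) sum(toy[i, ] == "y"))
```

per_column has one element per column (a and b), while per_row has one element per row, which is the shape needed for a new column.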
I would do it like this:
overall <- sapply(seq_len(nrow(df)), function(i) {
  # names of the columns whose value in row i is "y"
  taken <- colnames(df)[which(df[i, ] == "y")]
  if (length(taken) != 0) paste(taken, collapse = ",") else "n"
})
cbind(df, overall)
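The question also asked how to skip the student column. One possible approach, sketched here under the assumption that every column other than student is a subject, is to build the vector of subject names first and index with it. The names df_small, subjects and is_y are introduced only for this example, and the data is a reduced version of the question's df:

```r
# Reduced stand-in for the question's df (three subjects instead of eight)
df_small <- data.frame(
  student = c(1, 2, 3),
  maths = c("y", "n", "n"),
  English = c("n", "y", "n"),
  geography = c("y", "n", "n"),
  stringsAsFactors = TRUE
)

# Assumption: everything except `student` is a subject column
subjects <- setdiff(names(df_small), "student")

# Logical matrix: TRUE where a student takes the subject
is_y <- df_small[subjects] == "y"

# Per row: join the names of the TRUE columns, or fall back to "n"
df_small$overall <- apply(is_y, 1, function(row) {
  if (any(row)) paste(subjects[row], collapse = ", ") else "n"
})
```

Because the student column never enters is_y, there is no need for positional tricks such as dropping the first element of each row.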
Answer 1 (score: 0)
A one-liner:
dfgoal <- cbind.data.frame(
df,
overall = apply(df, 1, function(x)
paste(colnames(df[-1])[x[2:length(x)] == "y"], collapse = ", ")))
dfgoal
#   student maths English geography history art Spanish physics chemistry            overall
# 1       1     y       n         y       n   n       n       n         n   maths, geography
# 2       2     n       y         n       n   n       n       n         n            English
# 3       3     n       n         n       n   n       n       n         n
# 4       4     n       n         n       n   n       n       n         n
# 5       5     n       n         n       n   n       n       y         y physics, chemistry
If you also want to replace the empty strings with "n", you can do:
levels(dfgoal$overall)[levels(dfgoal$overall) == ""] <- "n"
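Note that levels() only applies to factors. Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so overall may come out as a character column instead. The following sketch handles both cases; it uses made-up vectors rather than the real dfgoal, and blank_to_n is a hypothetical helper name:

```r
# Made-up stand-ins for dfgoal$overall, once as factor and once as character
overall_fct <- factor(c("maths, geography", "English", "", ""))
overall_chr <- c("maths, geography", "English", "", "")

# Hypothetical helper: replace "" with "n" for either type
blank_to_n <- function(x) {
  if (is.factor(x)) {
    # For a factor, rename the empty level
    levels(x)[levels(x) == ""] <- "n"
  } else {
    # For a character vector, replace the matching elements
    x[x == ""] <- "n"
  }
  x
}

blank_to_n(overall_fct)
blank_to_n(overall_chr)
```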
Answer 2 (score: 0)
library(data.table)
setDT(df)
merge(df,
melt(df, "student")[value == "y"][, .(overall = paste(variable, collapse = ", ")), by = student],
by = "student",
all.x = TRUE)
# student maths English geography history art Spanish physics chemistry overall
# 1: 1 y n y n n n n n maths, geography
# 2: 2 n y n n n n n n English
# 3: 3 n n n n n n n n NA
# 4: 4 n n n n n n n n NA
# 5: 5 n n n n n n y y physics, chemistry
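The left join leaves NA in overall for students without any subject. If "n" is preferred, as in dfgoal, one option is to fill those in by reference afterwards. A minimal sketch, using a small hand-made table (res, a name chosen for this example) in place of the merged result:

```r
library(data.table)

# Hand-made stand-in for the merged result
res <- data.table(
  student = 1:3,
  overall = c("maths, geography", NA, "English")
)

# Update by reference: set overall to "n" where it is NA
res[is.na(overall), overall := "n"]
```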
Answer 3 (score: 0)
TL;DR
> overall <- apply(df, 1, function(r) paste0(names(r)[r == 'y'], collapse = ', '))
> dfgoal <- cbind(df, overall)
Let's try to derive it:
Pick the first row to play with:
> r <- df[1, ]
> r
student maths English geography history art Spanish physics chemistry
1 y n y n n n n n
First, build a logical vector that is TRUE wherever y appears and FALSE otherwise, which is trivial:
> r == 'y'
student maths English geography history art Spanish physics chemistry
FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
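Now you can probably see where this is going: we use that result as the subsetting index into the names(r) vector, which holds the actual name for each position: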
> names(r)[r == 'y']
[1] "maths" "geography"
Now we just need to concatenate these names and iterate over the whole data frame:
> overall <- apply(df, 1, function(r) paste0(names(r)[r == 'y'], collapse = ', '))
> dfgoal <- cbind(df, overall)
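Two details are worth checking with this approach. apply() converts the data frame to a matrix first, so with mixed column types every value, including student, is compared as a string. Also, rows without any 'y' yield an empty string rather than "n". A small sketch on a reduced data frame (df_small is made up for the example):

```r
# Reduced stand-in for the question's df
df_small <- data.frame(
  student = c(1, 2, 3),
  maths = c("y", "n", "n"),
  English = c("n", "y", "n"),
  stringsAsFactors = TRUE
)

# apply() works on as.matrix(df_small), which is a character matrix here
typeof(as.matrix(df_small))

overall <- apply(df_small, 1, function(r) paste0(names(r)[r == "y"], collapse = ", "))

# Rows with no "y" give "", which can be replaced if "n" is wanted
overall[overall == ""] <- "n"
```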