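I am trying to write a function that creates a new variable whose value depends on a condition. I have a survey dataset with 100+ columns that will be collapsed in this way. I read this, but it did not help.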
'data.frame': 117 obs. of 7 variables:
$ fin_partner: Factor w/ 4 levels "","9","No","Yes": 2 2 4 3 2 2 2 2 4 4 ...
$ fin_parent : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
$ fin_kids : Factor w/ 4 levels "","9","No","Yes": 4 2 2 2 2 2 2 2 2 2 ...
$ fin_othkids: Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 3 2 2 2 ...
$ fin_fam : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
$ fin_friend : Factor w/ 4 levels "","9","No","Yes": 2 2 3 3 2 2 2 2 4 2 ...
$ fin_oth : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 2 2 4 2 ...
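I would like to subset the dataset by columns and then pass the subset to the function. Right now the values are "Yes", "No" and "999" (missing).
My goal is to flag, for each row, whether any column contains "Yes"; if so, the new column should be filled with "Yes". I am sure there is a simpler way than the code below, so I am open to suggestions.
My code so far: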
trial <- df[, 23:29]
trial.test <- as.data.frame(trial)
composite_score <- function(x){
  # Convert to numeric values
  change_to_number <- function(j) {
    for (i in 1:length(j)){
      if(i == "Yes"){
        i <- 1
      }
      else{
        i <- 0
      }
    }
  }
  x <- change_to_number(x)
  new_col_var <- function(k){
    if(rowSums(k) > 0){
      k$newvar <- 1
    }
    else {
      k$newvar <- 0
    }
  }
  x <- new_col_var(x)
}
composite_score(trial.test)
The code produces the following error:
Error in rowSums(k) : 'x' must be an array of at least two dimensions
Data:
> dput(head(trial.test))
structure(list(fin_partner = structure(c(2L, 2L, 4L, 3L, 2L,
2L), .Label = c("", "9", "No", "Yes"), class = "factor"), fin_parent = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"),
fin_kids = structure(c(4L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"9", "No", "Yes"), class = "factor"), fin_othkids = structure(c(2L,
2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"),
fin_fam = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"9", "No", "Yes"), class = "factor"), fin_friend = structure(c(2L,
2L, 3L, 3L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"),
fin_oth = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"9", "No", "Yes"), class = "factor")), .Names = c("fin_partner",
"fin_parent", "fin_kids", "fin_othkids", "fin_fam", "fin_friend",
"fin_oth"), row.names = c(NA, 6L), class = "data.frame")
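Answer 0 (score: 1)
Your change_to_number function is badly broken. It only sets the loop variable i to 1 or 0, which has no effect on the input, and because a for loop evaluates to NULL the function returns NULL. That NULL is what rowSums later complains about. You could change it to: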
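change_to_number <- function(j){
  # note the capital "Y": the factor level in the data is "Yes"
  sapply(j, function(x) +(x == "Yes"))
}
Or, change the whole function to:
composite_score <- function(x){
  # 1 if the row contains at least one "Yes", 0 otherwise
  +(apply(x, 1, function(z) ("Yes" %in% z)))
}
Then run your function:
dat$newcol <- composite_score(dat)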
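As a quick check of the diagnosis, the sketch below reruns the original loop on a hypothetical two-element vector. The names change_to_number_orig, res and msg are made up for illustration. The loop's value is NULL, so the function returns NULL, and handing that to rowSums() is what should raise the error from the question.

```r
# Copy of the loop from the question, renamed so it does not
# overwrite the fixed version (hypothetical name)
change_to_number_orig <- function(j) {
  for (i in 1:length(j)) {
    # i is the index (1, 2, ...), never the string "Yes"
    if (i == "Yes") {
      i <- 1
    } else {
      i <- 0
    }
  }
}

res <- change_to_number_orig(c("Yes", "No"))
is.null(res)   # TRUE: nothing useful is returned

# Capture the error message instead of stopping the script
msg <- tryCatch(rowSums(res), error = function(e) conditionMessage(e))
print(msg)
```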
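Explanation: you want to know whether each row contains a "Yes". To check that, you could run the following for every row (as.matrix() turns the factor columns into character strings, which is also what apply does internally):
"Yes" %in% as.matrix(trial.test)[1, ]
"Yes" %in% as.matrix(trial.test)[2, ]....
To do this for all rows at once you can use apply as below. We apply the function "Yes" %in% z across rows (1), with each row passed to the function as z:
tempdata <- apply(trial.test, 1, function(z) ("Yes" %in% z))
tempdata
You should get TRUE or FALSE for each row. Now we can use a trick in which R converts TRUE to 1 and FALSE to 0:
as.numeric(tempdata)
+(tempdata) #same, less typing
Putting it all together gives you the new column:
+(apply(trial.test, 1, function(z) ("Yes" %in% z)))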
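Since the original attempt was built around rowSums, here is a minimal sketch of a fully vectorized variant that stays close to that idea. It is an alternative to the apply version, not a replacement for it. The toy data frame and the names lv, toy and has_yes are hypothetical and only mimic the structure of the posted data.

```r
# Hypothetical sample data with the same factor levels as in the question
lv <- c("", "9", "No", "Yes")
toy <- data.frame(
  fin_partner = factor(c("9", "9", "Yes", "No"), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9", "9"), levels = lv),
  fin_friend  = factor(c("9", "9", "No", "No"), levels = lv)
)

# toy == "Yes" compares every cell and returns a logical matrix;
# rowSums() then counts the TRUE values in each row
has_yes <- rowSums(toy == "Yes") > 0

# unary plus turns TRUE/FALSE into 1/0, as in the trick above
toy$newcol <- +(has_yes)
toy
```

Rows 1 and 3 of the toy data contain a "Yes", so newcol is 1 there and 0 elsewhere.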
Answer 1 (score: 1)
Thanks for posting the data, it makes it possible to actually check what I wrote!
# Loading your data
trial.test <- structure(list(fin_partner = [... redacted ...], class = "data.frame")
# computing the new variable
# MARGIN=1 specifies that we are working on the rows
# the applied function just looks for a "Yes" in the row
# and returns "Yes" if... yes, "No" otherwise.
myvar <- apply(trial.test, MARGIN=1, FUN=function(row)
  if ("Yes" %in% row) "Yes" else "No")
# converting it to factor
myvar <- factor(myvar)
# putting it in trial.test just for illustration
cbind(trial.test, summary=myvar)
This gives:
  fin_partner fin_parent fin_kids fin_othkids fin_fam fin_friend fin_oth summary
1           9          9      Yes           9       9          9       9     Yes
2           9          9        9           9       9          9       9      No
3         Yes          9        9           9       9         No       9     Yes
4          No          9        9           9       9         No       9      No
5           9          9        9           9       9          9       9      No
6           9          9        9           9       9          9       9      No
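Because the full survey has 100+ columns that need to be collapsed group by group, it may help to wrap the same apply call in a small helper that picks the columns by name. The following is only a sketch: the helper any_yes, its prefix argument and the toy data are hypothetical, and it assumes every group of columns shares a common name prefix such as fin_.

```r
# Hypothetical helper: "Yes" if any column whose name starts with
# `prefix` is "Yes" in that row, "No" otherwise
any_yes <- function(df, prefix) {
  cols <- grep(paste0("^", prefix), names(df), value = TRUE)
  # drop = FALSE keeps a data frame even if only one column matches
  hit <- apply(df[, cols, drop = FALSE], 1, function(row) "Yes" %in% row)
  factor(ifelse(hit, "Yes", "No"), levels = c("No", "Yes"))
}

# Hypothetical sample data: an id column plus two fin_ columns
lv <- c("", "9", "No", "Yes")
toy <- data.frame(
  id          = 1:3,
  fin_partner = factor(c("9", "9", "Yes"), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9"), levels = lv)
)

toy$fin_any <- any_yes(toy, "fin_")
toy
```

Calling the helper once per prefix avoids hard-coding column positions like 23:29.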
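Answer 2 (score: 0)
library(tidyr)
library(dplyr)
library(magrittr)  # for the %<>% assignment pipe
# add a row id so the long data can be joined back
trial.test %<>% mutate(row_number = 1:n())
answer =
  trial.test %>%
  gather(variable, value, -row_number) %>%  # long format: one row per cell
  filter(value == "Yes") %>%                # keep only the "Yes" cells
  select(-variable) %>%
  distinct %>%                              # one row per row_number with a "Yes"
  right_join(trial.test, by = "row_number") # value is NA where no "Yes" was found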
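With a recent dplyr the reshape and join can probably be skipped. The sketch below assumes dplyr 1.0.4 or later, where if_any() is available, and uses hypothetical toy data; the names lv, toy, flagged and any_yes are made up for illustration.

```r
library(dplyr)

# Hypothetical sample data with the same factor levels as in the question
lv <- c("", "9", "No", "Yes")
toy <- data.frame(
  fin_partner = factor(c("9", "Yes", "No"), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9"), levels = lv)
)

# if_any() is TRUE for a row when the condition holds in at least
# one of the selected columns
flagged <- toy %>%
  mutate(any_yes = if_else(if_any(starts_with("fin_"), ~ .x == "Yes"),
                           "Yes", "No"))
flagged
```

Unlike the join above, this fills rows without a "Yes" with "No" rather than NA.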