Question

I have some character variable columns in a data frame. Below is a two-row example of the columns I am interested in:

a <- rep('Agree', 20)
b <- rep(c('Disagree', 'Agree'), 10)
dat <- data.frame(rbind(a,b), stringsAsFactors = FALSE)

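I want to identify all rows in which every column has the same value. For example, using dplyr's mutate, I would like to create a new variable called 'allSame' whose value is 'yes' for the first row of 'dat' and 'no' for the second row.

I would also like to index the columns by number rather than by name, because some of the variables have very long names and there are several sets of columns in the data frame that I want to do this for.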

Answer 1

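Here is one way to check whether a row contains identical answers (i.e., all Agree or all Disagree). I created a minimal sample and did the following. You want to check whether each row contains only "Agree" or only "Disagree", and you can do that with a logical comparison. mydf == "Agree" returns a matrix of TRUE and FALSE values. With rowSums() you can count the number of TRUEs in each row. If the result equals ncol(mydf), which is 3 in this case, the row contains only "Agree". If the result is 0, the row contains only "Disagree" (given that these are the only two values in the data). I assume you want both of these cases to count as yes. TRUE in allSame means yes.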

mydf <- data.frame(col1 = c("Agree", "Agree", "Disagree"),
                   col2 = c("Agree", "Disagree", "Disagree"),
                   col3 = c("Agree", "Disagree", "Disagree"),
                   stringsAsFactors = FALSE)

#      col1     col2     col3
#1    Agree    Agree    Agree
#2    Agree Disagree Disagree
#3 Disagree Disagree Disagree

library(dplyr)

mydf %>%
  mutate(allSame = (rowSums(mydf == "Agree") == 0 |
                    rowSums(mydf == "Agree") == ncol(mydf)))

#      col1     col2     col3 allSame
#1    Agree    Agree    Agree    TRUE
#2    Agree Disagree Disagree   FALSE
#3 Disagree Disagree Disagree    TRUE
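
To see why the comparison works, the intermediate values can be inspected one step at a time. This is only an illustrative sketch on the same sample data; the names `agree` and `n_agree` are introduced here and are not part of the answer above.

```r
mydf <- data.frame(col1 = c("Agree", "Agree", "Disagree"),
                   col2 = c("Agree", "Disagree", "Disagree"),
                   col3 = c("Agree", "Disagree", "Disagree"),
                   stringsAsFactors = FALSE)

# Logical matrix: TRUE wherever a cell equals "Agree"
agree <- mydf == "Agree"

# Number of "Agree" cells in each row: 3, 1 and 0 for this sample
n_agree <- rowSums(agree)

# A row is uniform when the count is either 0 or the number of columns
all_same <- n_agree == 0 | n_agree == ncol(mydf)
```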

Given the above, you would do the following with your data:

dat %>%
  mutate(allSame = (rowSums(dat == "Agree") == 0 |
                    rowSums(dat == "Agree") == ncol(dat)))
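
The question also asked for 'yes'/'no' labels and for columns selected by position. The following is a minimal sketch of one way to get there, not the answerer's method: it assumes the columns of interest are columns 1 to 20 of dat (the `cols` vector is a hypothetical name introduced here) and compares every selected column with the first selected one, so it does not depend on the values being "Agree" or "Disagree".

```r
library(dplyr)

a <- rep('Agree', 20)
b <- rep(c('Disagree', 'Agree'), 10)
dat <- data.frame(rbind(a, b), stringsAsFactors = FALSE)

# Hypothetical: positions of the columns that should be compared
cols <- 1:20

# Character matrix of the selected columns
m <- as.matrix(dat[cols])

# Compare every column with the first one; a row is uniform
# when all of its cells match the first cell
all_same <- unname(rowSums(m == m[, 1]) == ncol(m))

result <- dat %>%
  mutate(allSame = ifelse(all_same, 'yes', 'no'))

result$allSame
```

For another group of columns, only `cols` needs to change, for example `cols <- 5:12`. Rows containing NA would yield NA rather than 'yes' or 'no' in this sketch.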

Answer 2

If you want to iterate over each row independently, use sapply. It may also be worth reading up on functionals.

library(dplyr)

a <- rep('Agree', 20)
b <- rep(c('Disagree', 'Agree'), 10)
df <- data.frame(a, b, stringsAsFactors = FALSE)

# Compare the two columns row by row
df <- mutate(df, same = sapply(seq_len(nrow(df)), function(i) {
  if (a[i] == b[i]) 'yes' else 'no'
}))
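
Because == is already vectorized in R, the same column can also be built without an explicit loop. This is an alternative sketch under the same two-column setup, not part of the original answer:

```r
library(dplyr)

a <- rep('Agree', 20)
b <- rep(c('Disagree', 'Agree'), 10)
df <- data.frame(a, b, stringsAsFactors = FALSE)

# Element-wise comparison of the two columns, mapped to 'yes'/'no'
df <- mutate(df, same = ifelse(a == b, 'yes', 'no'))

head(df$same, 4)
```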

To rename the columns, use names:

names(df) <- paste0('index_', seq_along(names(df)))

Using dplyr to test whether values are the same across multiple columns
