Question

I'm trying to create a new variable in R (3.3.2) by checking whether the levels of a factor are the same across multiple columns.

id<-c(1:5)
X1<-c("species1", "species1", NA, "species1", "species1")
X2<-c(NA, "species2", NA, "species2", "species2")
X3<-c("species1", "species2", "species2", "species3", "species3")

It should look like this, checking whether X1:X3 are all the same (ignoring NAs):

     id  X1         X2         X3         same   
[1,] 1   "species1" NA         "species1" TRUE 
[2,] 2   "species1" "species2" "species2" FALSE
[3,] 3   NA         NA         "species2" TRUE
[4,] 4   "species1" "species2" "species3" FALSE
[5,] 5   "species1" "species2" "species3" FALSE

Edit: here is my actual data, along with the code I used from @Mike's answer:

s$same <- apply(s[,c(2:11)], 1, function(x) length(unique((x[!is.na(x)]))) == 1)

dput(droplevels(head(s)))

structure(list(rowid = structure(c(5L, 6L, 4L, 3L, 2L, 1L), .Label = c("-68975029755346725", 
"-6985608891139937154", "-7064257681237955764", "-716653329714258929", 
"-7190954401213249258", "-7190954401427629087"), class = "factor"), 
    species1 = structure(c(3L, NA, 3L, 1L, 2L, NA), .Label = c("Mycobacterium avium complex", 
    "Mycobacterium fortuitum", "Mycobacterium kansasii"), class = "factor"), 
    species2 = structure(c(NA, NA, 4L, 2L, 3L, 1L), .Label = c(" Mycobacterium fortuitum", 
    "Mycobacterium avium complex", "Mycobacterium fortuitum", 
    "Mycobacterium kansasii"), class = "factor"), species3 = structure(c(4L, 
    NA, 3L, 1L, 2L, NA), .Label = c(" Mycobacterium avium complex", 
    " Mycobacterium fortuitum", " Mycobacterium kansasii", "Mycobacterium kansasii"
    ), class = "factor"), species4 = structure(c(NA, NA, NA, 
    NA, NA, 1L), .Label = " Mycobacterium fortuitum", class = "factor"), 
    species5 = structure(c(1L, NA, NA, NA, NA, NA), .Label = "Mycobacterium kansasii", class = "factor"), 
    species6 = structure(c(NA, NA, NA, NA, NA, 1L), .Label = " Mycobacterium fortuitum", class = "factor"), 
    species7 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    species8 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    species9 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    species10 = structure(c(NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_), .Label = character(0), class = "factor"), 
    same = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)), .Names = c("rowid", 
"species1", "species2", "species3", "species4", "species5", "species6", 
"species7", "species8", "species9", "species10", "same"), row.names = c(NA, 
6L), class = "data.frame")

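Rows 1 and 6 are correct, but every row in this set should be TRUE.

I have tried every combination of apply, ifelse, all and identical with duplicated and unique that I could think of, but either I can't use an na.rm argument or I get a matrix as output instead of a new variable. There seem to be plenty of questions that do this with numeric variables, but I'm struggling to find what I need for factor or string variables. Thanks in advance for your help!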

Answer 1

How about using length and unique to check that there is only one unique value?

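df <- data.frame(id = id, X1 = X1, X2 = X2, X3 = X3)
# TRUE when the trimmed non-NA values share one unique value;
# the second condition also accepts rows that are entirely NA
df$same <- apply(df[, c("X1", "X2", "X3")], 1, function(x)
                length(unique(trimws(x[!is.na(x)]))) == 1 || length(unique(trimws(x))) == 1)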

df
#  id       X1       X2       X3  same
# 1  1 species1     <NA> species1  TRUE
# 2  2 species1 species2 species2 FALSE
# 3  3     <NA>     <NA> species2  TRUE
# 4  4 species1 species2 species3 FALSE
# 5  5 species1 species2 species3 FALSE

I added trimws() to strip leading/trailing whitespace, plus a condition for the case where everything is NA.
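
To see why both additions matter, here is a minimal sketch on made-up rows that imitate the two problems in the real data: a label that differs only by a leading space, and a row that is entirely NA. The helper name all_same and the data frame demo are hypothetical, introduced only for illustration; this is one possible way to write the check, not the code from the answer above.

```r
# Hypothetical helper (illustration only): TRUE when the non-NA values in a
# row collapse to at most one distinct string after trimming whitespace.
all_same <- function(x) {
  vals <- unique(trimws(x[!is.na(x)]))
  length(vals) <= 1  # 0 distinct values means the row was entirely NA
}

# Made-up rows: a leading-space mismatch, an all-NA row, a real mismatch
demo <- data.frame(
  a = c("Mycobacterium kansasii", NA, "Mycobacterium avium complex"),
  b = c(" Mycobacterium kansasii", NA, "Mycobacterium fortuitum"),
  stringsAsFactors = FALSE
)

# apply() walks the rows (MARGIN = 1) and returns one logical per row
demo$same <- apply(demo[, c("a", "b")], 1, all_same)
demo$same
# expected: TRUE TRUE FALSE
```

Without trimws() the first row would come out FALSE, and without the allowance for zero remaining values the all-NA row would come out FALSE as well.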
