Question

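I want to check whether a set of columns is consistent for each ID (the values should be constant, but there are some doubts about the data, so I want to double-check).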

For example, given the following data frame:

test <- data.frame(ID = c("one","two","three"), 
a = c(1,1,1), 
b = c(1,1,1), 
c = c(NA,1,1), 
d = c(2,4,1))

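I want to check whether columns a, b, c and d are all identical within each row, ignoring missing values. I thought I could do this by counting the unique values across the relevant columns, so that I can then select only the rows where the number of unique values is greater than 1. This may not be the best approach, but it is the only one I could think of with my limited knowledge.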

I found this question, which seems similar to what I want to do: Find unique values across a row of a data frame

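But I am struggling to apply the answer to my data. I tried the following, which does nothing (I have never used a for loop before, so I may be doing it wrong), although when I run the lines inside the function on their own for a single row, they do exactly what I want: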

yeartest <- function(x){
  temp <- test[x,2:5]
  temp <- as.numeric(temp)
  veclength <- length(unique(temp[!is.na(temp)]))
  temp2 <- c(temp,veclength)
  test[,"thing"] <- NA
  test[x,2:6] <- temp2
}

for(i in 1:nrow(test)){
  yeartest(i)
}

Then I tried the approach from the accepted answer, using apply:

x <- test
# dups <- function(x) x[!duplicated(x)]
yeartest <- function(x){
  #   x <- 1
  temp <- test[x,2:5]
  temp <- as.numeric(temp)
  veclength <- length(unique(temp[!is.na(temp)]))
  temp2 <- c(temp,veclength)
  test[,"thing"] <- NA
  test[x,2:6] <- temp2
}

new.df <- t(apply(x, 1, function(x) yeartest(x)))

This gives an error, so clearly I have made a mistake in adapting the answer to my data.

Apologies, this must be a very obvious mistake on my part. I would really appreciate your help.

Solution (thanks for your help!):

test$new <- apply(test[,2:5],1,function(r) length(unique(na.omit(r))))
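
As a follow-up to the goal stated in the question (selecting only the rows whose values disagree), here is a minimal sketch. It rebuilds the example data frame with the third value column named c, as in the prose; the name `inconsistent` is just an illustrative choice and not part of the original solution.

```r
# Example data from the question (third value column assumed to be named "c").
test <- data.frame(ID = c("one", "two", "three"),
                   a = c(1, 1, 1),
                   b = c(1, 1, 1),
                   c = c(NA, 1, 1),
                   d = c(2, 4, 1))

# Count distinct non-missing values across columns 2 to 5 of each row.
test$new <- apply(test[, 2:5], 1, function(r) length(unique(na.omit(r))))

# Keep only the rows where the columns disagree (more than one distinct value).
inconsistent <- test[test$new > 1, ]
```

With this data, rows "one" and "two" are flagged because column d differs from the others, while row "three" is constant once the NA is ignored.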

Answer 1

> df <- data.frame(
    a=sample(2,10,replace=TRUE),
    b=sample(2,10,replace=TRUE),
    c=sample(c("a","b"),10,replace=TRUE),
    d=sample(c("a","b"),10,replace=TRUE))

> df[c(3,6,8),1] <- NA

> df
    a b c d
1   1 2 a b
2   1 2 a b
3  NA 2 a a
4   2 2 a b
5   1 2 a a
6  NA 1 a b
7   2 1 b b
8  NA 1 a a
9   1 1 b b
10  2 2 b b

> apply(df,1,function(r) length(unique(na.omit(r))))
 [1] 3 3 2 4 3 2 4 2 3 3
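
One thing to keep in mind when applying this to a data frame with mixed column types: `apply` converts the data frame to a matrix first, so if any column is character, every value is compared as a string. The sketch below restricts the count to the numeric columns to avoid that. The data frame `df2` and the helper name `count_unique` are made up for illustration.

```r
# Hypothetical helper: number of distinct non-missing values in one row.
count_unique <- function(r) length(unique(na.omit(r)))

# Small illustrative data frame with numeric and character columns.
df2 <- data.frame(
  a = c(1, NA, 2),
  b = c(1, 2, 3),
  c = c("a", "a", "b"),
  stringsAsFactors = FALSE
)

# as.matrix() on mixed columns yields a character matrix,
# which is what apply() would iterate over.
mixed_is_character <- is.character(as.matrix(df2))

# Select the numeric columns only, so apply() sees a numeric matrix.
num_cols <- vapply(df2, is.numeric, logical(1))
counts <- apply(df2[, num_cols, drop = FALSE], 1, count_unique)
```

Here `counts` holds one value per row for columns a and b only; `drop = FALSE` keeps the selection a data frame even if a single column is numeric.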
