I have the following reproducible dataset called temp:
temp=as.data.frame(cbind(c("x3","x2","x1",NA),c("x5","x2","x1",NA),c("x2","x3","x1",NA),c("x3","x2","x1","x4"),c("x1","x2",NA,NA)))
I want to count how many columns of temp contain c("x3","x2","x1") or any permutation of it (e.g. c("x1","x2","x3")). So it should return 2.
sum(sapply(temp, function(x) all(x[!is.na(x)] %in% c("x1","x2","x3"))))
Unfortunately, this does not give the correct result.
How can I count the number of columns that contain certain values, in any order?
Answer 0: (score: 1)
Your reprex:
temp <- as.data.frame(
  cbind(
    c("x3", "x2", "x1", NA),
    c("x5", "x2", "x1", NA),
    c("x2", "x3", "x1", NA),
    c("x3", "x2", "x1", "x4"),
    c("x1", "x2", NA, NA)
  )
)
target <- c("x3", "x2", "x1")
Then, if you want to check that a column contains exactly those three levels:
sum(sapply(temp, function(x) setequal(target, levels(x))))
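setequal() checks whether two sets are equal (regardless of order). levels() (available because you did not set stringsAsFactors = FALSE) tells you all the distinct values in the column. Note that this relies on the columns being factors: since R 4.0.0, stringsAsFactors defaults to FALSE, so levels(x) returns NULL on the resulting character columns and this count comes out as 0 unless you pass stringsAsFactors = TRUE to as.data.frame().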
This does the same thing, and works for both factor and character columns:
sum(sapply(temp, function(x) setequal(target, na.omit(x))))
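To see why the original all(... %in% ...) attempt overcounts, here is a minimal sketch that compares it with setequal() column by column. It assumes the data frame is rebuilt with stringsAsFactors = FALSE so that it behaves the same in any R version; the names temp_chr, subset_check and set_check are only illustrative and not part of the question.

```r
# Rebuild the data with character columns
# (assumption: stringsAsFactors = FALSE, to be independent of the R version)
temp_chr <- as.data.frame(
  cbind(
    c("x3", "x2", "x1", NA),
    c("x5", "x2", "x1", NA),
    c("x2", "x3", "x1", NA),
    c("x3", "x2", "x1", "x4"),
    c("x1", "x2", NA, NA)
  ),
  stringsAsFactors = FALSE
)
target <- c("x3", "x2", "x1")

# One-directional check: every non-NA value is in target,
# so a column holding only a subset of target also passes
subset_check <- sapply(temp_chr, function(x) all(x[!is.na(x)] %in% target))

# Two-directional check: the non-NA values are exactly the target set
set_check <- sapply(temp_chr, function(x) setequal(target, x[!is.na(x)]))

sum(subset_check)  # 3: column V5 (only x1 and x2) passes as a subset
sum(set_check)     # 2
```

The %in% test only asks whether the column's values are contained in target, not whether target is fully covered, which is why the column with just x1 and x2 is counted.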
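If you also want to check that each element appears the same number of times, try identical() together with as.character(), which converts the vector to character: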
sum(sapply(temp, function(x) {
  identical(sort(target), sort(as.character(na.omit(x))))
}))
(Or just set stringsAsFactors = FALSE in the original dataset, and you won't need as.character() here.)
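The difference between the two checks only shows up when a value is repeated. As a small illustration (the vector dup below is made up for this example and is not in the question's data):

```r
target <- c("x3", "x2", "x1")

# Hypothetical column in which "x1" appears twice
dup <- c("x1", "x1", "x2", "x3", NA)

# Set comparison ignores duplicates, so this is TRUE
as_set <- setequal(target, na.omit(dup))

# Sorted element-wise comparison is sensitive to counts, so this is FALSE
as_multiset <- identical(sort(target), sort(as.character(na.omit(dup))))

as_set
as_multiset
```

So use setequal() if repeated values should still count as a match, and the identical(sort(...)) form if the column must be an exact permutation of target.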
Answer 1: (score: 0)
This should work. It checks whether the unique values of each column are the same as the pattern:
data <- as.data.frame(cbind(c("x3","x2","x1",NA),c("x5","x2","x1",NA),c("x2","x3","x1",NA),c("x3","x2","x1","x4"),c("x1","x2",NA,NA)))
vector_pattern <- c("x3","x2","x1")
nvect <- length(vector_pattern)
cont <- 0
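for (i in seq_len(ncol(data))) {
  # Distinct non-NA values in column i
  aa <- unique(data[, i])
  aa <- aa[!is.na(aa)]
  # Count the column if every value is in the pattern and the sizes match
  if (all(!is.na(match(aa, vector_pattern))) && length(aa) == nvect) {
    cont <- cont + 1
  }
}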
print(cont)