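I'm new to R programming. I have sets of character values stored in the variables x1 to x25. For example, x1 holds "v21", "v345", "v212" and so on, and x2 to x25 hold similar character values, e.g. "v45", "v67", "v556", "v21", "v44". The vectors (x1 to x25) all differ in length. These are all results of an analysis.
I would like to write a function that compares the character values in x1 to x25 and outputs the values that occur five or more times across x1 to x25. So, for example, I would expect to see a result like:
"v21", "v67", "v556", "v45", "v44", "v212"
if these are the values that occur that often in x1 to x25. So far I have been inspecting the vectors visually and writing down the results, but that takes far too much time.
If this is possible (and I'm sure it is), could someone help me, so that I can also learn from it?
Thanks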
Answer 0 (score: 3)
First, some example data:
x1 <- c("v21", "v67", "v556", "v45", "v44", "v212")
x2 <- c("v21", "v67", "v556", "v45", "v44", "v212")
x3 <- c("v21", "v67", "v556", "v45", "v44", "v212")
x4 <- c("v21", "v67", "v556", "v45", "v44", "v212")
x5 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x6 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x7 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x8 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x9 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x10 <- c("v556")
x11 <- c("v12","v345","v55")
x12 <- c("v12","v345","v55")
x13 <- c("v12","v345","v55")
x14 <- c("v12","v345","v55")
x15 <- c("v1", "v51", "v43", "v43")
x16 <- c("v1", "v51", "v43", "v43")
x17 <- c("v1", "v51", "v43", "v43")
x18 <- c("v1", "v51", "v43", "v43")
x19 <- c("v200")
x20 <- c("v200")
x21 <- c("v200")
x22 <- c("v39","v556","v41")
x23 <- c("v39","v556","v41")
x24 <- c("v39","v556","v41")
x25 <- c("v39","v556","v41")
Storing the 25 variables separately makes them hard to work with. To handle them together, collect them into a list:
vars <- paste0("x",1:25)
corpus <- mget(vars)
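corpus is then a list containing all of the data. To find what you want (every "v###" that occurs at least 5 times), build a table of counts and apply a Boolean test to each element. Then extract the names of the elements that pass the test to get the "v###" strings.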
valTable <- table(unlist(corpus))
keepers <- names(valTable[valTable >= 5])
keepers
# [1] "v20" "v22" "v3" "v4" "v43" "v556" "v56" "v61"
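Since the question asks for a function, the steps above can be wrapped into one. The following is only a minimal sketch, not part of the original answer: the name frequent_values and the arguments min_count and per_vector are made up for illustration. With per_vector = TRUE each value is counted at most once per vector, so a value repeated inside a single vector is not counted twice.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer).
# lst:        a list of character vectors, e.g. the result of mget()
# min_count:  minimum number of occurrences required to keep a value
# per_vector: if TRUE, count each value at most once per vector
frequent_values <- function(lst, min_count = 5, per_vector = FALSE) {
  if (per_vector) {
    lst <- lapply(lst, unique)   # drop duplicates inside each vector
  }
  counts <- table(unlist(lst))   # tally every value across all vectors
  names(counts)[counts >= min_count]
}

# Small made-up example: "a" occurs in three vectors, "b" twice in one vector
demo <- list(c("a", "b", "b"), c("a", "c"), c("a"))
frequent_values(demo, min_count = 2)                     # "a" "b"
frequent_values(demo, min_count = 2, per_vector = TRUE)  # "a"
```

With the sample data, frequent_values(corpus) should give the same values as keepers. With per_vector = TRUE, "v43" would no longer qualify, because it appears in only four of the sample vectors (twice in each).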
Answer 1 (score: 1)
Here is an answer that assumes your x's are in a list. If they are not, make one first:
my.vars <- mget(paste0("x", 1:25))  # runnable form of list(x1, x2, ..., x25)
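corpus <- unique(unlist(my.vars))  # every distinct value across all vectors
# For each value k, count how many of the vectors contain it
occurences <- sapply(X = corpus,
                     FUN = function(k) {
                       sum(sapply(my.vars, function(l) k %in% l))
                     })
names(occurences) <- corpus
# Keep the values that are found in at least 5 vectors
i.want <- occurences[occurences >= 5]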
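Note that this approach counts the number of vectors that contain each value, whereas table(unlist(...)) counts every occurrence, including repeats inside a single vector. The following sketch on made-up data (the names toy, vals, by_vector and by_total are illustrative only) shows the difference. It uses vapply instead of sapply so that the return type is checked.

```r
# Made-up data: "v43" appears twice within each of two vectors
toy <- list(c("v1", "v43", "v43"), c("v1", "v43", "v43"), c("v1"))
vals <- unique(unlist(toy))

# Number of vectors containing each value (the approach of this answer)
by_vector <- vapply(vals, function(k) {
  sum(vapply(toy, function(l) k %in% l, logical(1)))
}, integer(1))

# Total number of occurrences, repeats included
by_total <- table(unlist(toy))

by_vector[["v43"]]  # 2
by_total[["v43"]]   # 4
```

Which of the two counts is wanted depends on whether a value repeated within one vector should count more than once.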