Comparing strings in R

Time: 2013-07-15 19:09:42

Tags: r comparison character

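I am new to R programming. I have sets of character values stored in the variables x1 to x25. For example, x1 holds "v21", "v345", "v212", and x2 to x25 contain similar character values such as "v45", "v67", "v556", "v21", "v44". The vectors x1 to x25 vary in length. They are all results of an analysis. I want to write a function that compares the character values in x1 to x25 and outputs those that occur five or more times across x1 to x25. So, for example, I would like to see a result like this: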

"v21", "v67", "v556", "v45", "v44", "v212"

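if those are the values that occur across x1 to x25. I have been checking visually and writing the results down, but that takes far too much time.

If this is possible (and I believe it is), could someone help me, so that I can also learn from it?

Thanks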

2 answers:

Answer 0 (score: 3)

First, an example setup:

x1 <- c("v21", "v67", "v556", "v45", "v44", "v212")
x2 <- c("v21", "v67", "v556", "v45", "v44", "v212")
x3 <- c("v21", "v67", "v556", "v45", "v44", "v212")
x4 <- c("v21", "v67", "v556", "v45", "v44", "v212")
x5 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x6 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x7 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x8 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x9 <- c("v22", "v61", "v56", "v3", "v4", "v20")
x10 <- c("v556")
x11 <- c("v12","v345","v55")
x12 <- c("v12","v345","v55")
x13 <- c("v12","v345","v55")
x14 <- c("v12","v345","v55")
x15 <- c("v1", "v51", "v43", "v43")
x16 <- c("v1", "v51", "v43", "v43")
x17 <- c("v1", "v51", "v43", "v43")
x18 <- c("v1", "v51", "v43", "v43")
x19 <- c("v200")
x20 <- c("v200")
x21 <- c("v200")
x22 <- c("v39","v556","v41")
x23 <- c("v39","v556","v41")
x24 <- c("v39","v556","v41")
x25 <- c("v39","v556","v41")

Storing the data in 25 separate variables makes them hard to work with. To use them together:

vars <- paste0("x",1:25)
corpus <- mget(vars)

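corpus is then a list containing all the data. To find what you want (every "v###" that appears at least 5 times), build a table of counts and apply a Boolean test to each element. Extract the names of the elements that pass the test to get the "v###" values.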

valTable <- table(unlist(corpus))
keepers <- names(valTable[valTable >= 5])
keepers
# [1] "v20"  "v22"  "v3"   "v4"   "v43"  "v556" "v56"  "v61" 
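
Note that `table(unlist(corpus))` counts every occurrence, so a value repeated inside a single vector (such as "v43", which appears twice in each of x15 to x18) is counted more than once. If the goal is instead to count how many vectors contain a value, one possible variation is to de-duplicate each vector first. A minimal sketch, using a small made-up list called `demo` (the names `demo`, `total_counts` and `vector_counts` are illustrative, not from the question):

```r
# Hypothetical toy data, not the asker's real results
demo <- list(
  a = c("v1", "v43", "v43"),
  b = c("v1", "v43", "v43"),
  c = c("v1", "v9")
)

# Counts every occurrence: "v43" is counted twice per vector
total_counts <- table(unlist(demo))

# Counts each value at most once per vector
vector_counts <- table(unlist(lapply(demo, unique)))

as.integer(total_counts["v43"])   # 4
as.integer(vector_counts["v43"])  # 2
```

Which of the two counts is appropriate depends on what "occurs five or more times" is meant to cover; the question does not say.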

Answer 1 (score: 1)

Here is an answer that assumes your x vectors are stored in a list. If they are not, create one first:

my.vars <- mget(paste0("x", 1:25))  # same as listing x1, x2, ..., x25 by hand

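# All distinct values across the vectors
corpus <- unique(unlist(my.vars))
# For each value, count how many of the vectors contain it
occurences <- sapply(X=corpus,
                     FUN=function (k) {
                       occurences <- sapply(my.vars, function (l) k %in% l)
                       sum(occurences)
                     })
names(occurences) <- corpus

# Keep the values found in at least 5 vectors
i.want <- occurences[occurences >= 5]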
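
Since the question asks for a reusable function, the approach in this answer could be wrapped up roughly as follows. This is only a sketch and not part of the original answer: the function name `frequent_values` and its arguments `vectors` and `min_count` are made up for illustration. Like the `%in%` approach above, it counts the number of vectors containing each value, not the total number of occurrences.

```r
# Hypothetical helper: returns the values present in at least `min_count` vectors
frequent_values <- function(vectors, min_count = 5) {
  candidates <- unique(unlist(vectors, use.names = FALSE))
  # For each candidate, count how many vectors contain it
  counts <- vapply(
    candidates,
    function(k) sum(vapply(vectors, function(v) k %in% v, logical(1))),
    integer(1)
  )
  names(counts)[counts >= min_count]
}

# Toy input (made up): "v1" is in three vectors, "v2" in two, "v3" in one
toy <- list(c("v1", "v2"), c("v1", "v2", "v2"), c("v1", "v3"))
frequent_values(toy, min_count = 2)
# [1] "v1" "v2"
```

With the 25 vectors collected in a list, the call would look like `frequent_values(my.vars)`, which uses the default threshold of 5.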