I have a list of character vectors.
mylist <- list(c("apple", "banana", "cat", "dog", "elephant", "fish"),
c("apple", "banana", "camel", "doll", "egg"),
c("apple", "bag", "cat", "donkey", "elephant", "frog", "gun"),
c("apple", "ball", "cage", "dolphin", "doggy", "fishy"),
c("apple", "baggy", "catty", "doggy", "eggie", "gun_powder"))
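I want to use R's grep() function to match each element of the list exactly against the elements of every vector, but I am getting partial matches instead.
Here is the code I wrote: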
matched <- vector("list", length(mylist))
for (i in seq_along(mylist)) {
  indexx <- vector("list", length(mylist[[i]]))
  for (j in seq_along(mylist[[i]])) {
    dummy <- NULL
    for (k in seq_along(mylist)) {
      # fixed = TRUE looks for the word anywhere inside each string,
      # so "cat" also hits "catty" (this is the partial match)
      hits <- grep(mylist[[i]][j], mylist[[k]], value = TRUE, fixed = TRUE)
      ind <- c(dummy, hits)
      dummy <- ind
    }
    indexx[[j]] <- ind
  }
  matched[[i]] <- indexx
}
Any help would be appreciated.
Answer 0 (score: 2)
Unlist the list:
ulist = unlist(mylist)
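For each element of ulist, find all exact matches in ulist. Do this with the equality operator == rather than grep(), applying the comparison to each element in turn with lapply():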
matches0 = lapply(ulist, function(elt) ulist[ulist == elt])
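To make the difference between the two approaches concrete, here is a small sketch on a made-up vector (the values are illustrative only, not the question's data). It contrasts the substring behaviour of grep(fixed = TRUE) with the whole-string comparison done by ==:

```r
ulist <- c("cat", "catty", "dog", "doggy", "cat")

# grep() with fixed = TRUE finds "cat" anywhere inside a string,
# so "catty" is returned as well
partial <- grep("cat", ulist, value = TRUE, fixed = TRUE)
print(partial)

# == compares whole strings, so each word only collects its exact copies
matches0 <- lapply(ulist, function(elt) ulist[ulist == elt])
print(matches0[[1]])  # the two copies of "cat"
print(matches0[[2]])  # "catty" on its own
```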
Finally, relist the matches back into the original structure (the "geometry") of mylist:
relist(matches0, mylist)
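As a rough illustration of what relist() does here, the sketch below uses a hypothetical two-element list `mylist_small` in place of the real data. relist() treats the second argument as a skeleton and hands out consecutive pieces of the flat result to each component, according to how many words that component held:

```r
mylist_small <- list(c("apple", "cat"), c("apple", "dog", "catty"))
ulist <- unlist(mylist_small)
matches0 <- lapply(ulist, function(elt) ulist[ulist == elt])

# The first 2 results go into component 1, the next 3 into component 2
res <- relist(matches0, mylist_small)
print(lengths(res))   # 2 3
print(res[[2]][[1]])  # matches for the second list's "apple"
```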
Summarizing the results this way seems odd; perhaps instead count the number of times each word occurs:
tbl = table(ulist)
and use those counts as the entries:
relist(tbl[ulist], mylist)
A little tidying is to remove the name that table() attaches to its dimnames:
names(dimnames(tbl)) = NULL
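A small sketch of the counting idea on a toy vector (made up for illustration). The table holds one count per distinct word, and indexing it by the original words looks that count up for every position, in the original order:

```r
ulist <- c("apple", "cat", "apple", "dog", "apple")

# One count per distinct word
tbl <- table(ulist)

# Character indexing looks up the count for each original position
counts <- tbl[ulist]
print(as.vector(counts))  # 3 1 3 1 3

# table() stores the argument name "ulist" as the name of its dimnames;
# removing it stops that header line from being printed
names(dimnames(tbl)) <- NULL
print(tbl)
```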
Answer 1 (score: 0)
If I understand correctly what you are trying to achieve:
mylist <- list(c("apple", "banana", "cat", "dog", "elephant", "fish"),
c("apple", "banana", "camel", "doll", "egg"),
c("apple", "bag", "cat", "donkey", "elephant", "frog", "gun"),
c("apple", "ball", "cage", "dolphin", "doggy", "fishy"),
c("apple", "baggy", "catty", "doggy", "eggie", "gun_powder"))
ulist <- unique(unlist(mylist))
matched <- vector("list", length(ulist))
names(matched) <- ulist
### Count how many times each word occurs
countList = function(ls, container) {
  sapply(ls, function(elem) {
    isEmpty = is.null(container[[elem]])
    # <<- updates countList's own `container` argument, not the global `matched`
    container[[elem]] <<- ifelse(isEmpty, 1, container[[elem]] + 1)
  })
  container
}
counted = countList(unlist(mylist), matched)
lapply(names(counted), function(lab) rep(lab, counted[[lab]]))
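For comparison, a possibly shorter route to the same grouping swaps in base R's split() instead of the counting helper; this is an alternative, not the answer's method, and it is shown on a hypothetical reduced list. Using the unique words as factor levels keeps the groups in order of first appearance rather than alphabetical order:

```r
mylist <- list(c("apple", "banana", "cat"), c("apple", "catty", "cat"))
words <- unlist(mylist)

# Group identical words; levels = unique(words) preserves first-seen order
groups <- split(words, factor(words, levels = unique(words)))
print(groups)
```

Unlike the answer's output, split() returns a named list; unname(groups) gives the unnamed form.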
The output looks like this:
[[1]]
[1] "apple" "apple" "apple" "apple" "apple"
[[2]]
[1] "banana" "banana"
[[3]]
[1] "cat" "cat"
[[4]]
[1] "dog"
[[5]]
[1] "elephant" "elephant"
[[6]]
[1] "fish"
[[7]]
[1] "camel"
[[8]]
[1] "doll"
[[9]]
[1] "egg"
[[10]]
[1] "bag"
[[11]]
[1] "donkey"
[[12]]
[1] "frog"
[[13]]
[1] "gun"
[[14]]
[1] "ball"
[[15]]
[1] "cage"
[[16]]
[1] "dolphin"
[[17]]
[1] "doggy" "doggy"
[[18]]
[1] "fishy"
[[19]]
[1] "baggy"
[[20]]
[1] "catty"
[[21]]
[1] "eggie"
[[22]]
[1] "gun_powder"
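Answer 2 (score: 0)
You should read a tutorial on regular expressions, such as this one.
They are not easy, but they are very useful when working with strings. Here is the code using a regexp: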
matched <- vector("list", length(mylist))
for (i in seq_along(mylist)) {
  indexx <- vector("list", length(mylist[[i]]))
  for (j in seq_along(mylist[[i]])) {
    dummy <- NULL
    for (k in seq_along(mylist)) {
      # ^ and $ anchor the pattern to the whole string, so only exact matches are returned
      hits <- grep(paste0("^", mylist[[i]][j], "$"), mylist[[k]], perl = TRUE, value = TRUE)
      ind <- c(dummy, hits)
      dummy <- ind
    }
    indexx[[j]] <- ind
  }
  matched[[i]] <- indexx
}
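The ^ symbol marks the start of the string and $ marks the end, so the pattern only matches a string that consists of the word and nothing else, i.e. an exact match.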
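A minimal sketch of the anchoring effect on a made-up vector (the words are illustrative, not from the question):

```r
words <- c("cat", "catty", "bobcat")

unanchored <- grep("cat", words, value = TRUE)   # substring match: all three
anchored <- grep("^cat$", words, value = TRUE)   # whole-string match: "cat" only

# Building the anchored pattern from a variable, as the loop does
target <- "cat"
built <- grep(paste0("^", target, "$"), words, value = TRUE)
print(built)
```

If the words could contain regex metacharacters such as . or +, they would need escaping before being pasted into a pattern; the plain == comparison from Answer 0 sidesteps that.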