I need to translate the values in a vector according to a key-value mapping:
vector <- c("dog","ant","eagle","ant","eagle","parrot")
"dog" "ant" "eagle" "ant" "eagle" "parrot"
mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),value=c("mammal","mammal","mammal","insect","bird","bird"))
key value
dog mammal
cat mammal
elephant mammal
ant insect
parrot bird
eagle bird
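The desired output is:
output <- c("mammal", "insect", "bird", "insect", "bird", "bird")
In the real data set I have to translate ~10,000 input vectors with an average length of ~15. The mapping data frame has on the order of one million keys, with about 100,000 unique classes on the value side.
The problem itself seems basic to me, but the bottleneck is runtime. In other programming languages you would probably use a HashMap for the mapping and then loop over the vector. So far, every solution I have tried in R has been orders of magnitude slower than a simple HashMap-based one in Java or Python (see the comments below).
Is there a more efficient data structure than a data frame for storing the mapping?
What is the most runtime-efficient solution to this problem in R?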
Answer 0 (score: 3)
There is a package called hashmap that is well suited to this:
library(hashmap)
hash_lookup = hashmap(mapping$key, mapping$value)
output = hash_lookup[[vector]]
Result:
> hash_lookup
## (character) => (character)
## [cat] => [mammal]
## [elephant] => [mammal]
## [ant] => [insect]
## [dog] => [mammal]
## [eagle] => [bird]
## [parrot] => [bird]
> output
[1] "mammal" "insect" "bird" "insect" "bird" "bird"
Data:
vector <- c("dog","ant","eagle","ant","eagle","parrot")
mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),
value=c("mammal","mammal","mammal","insect","bird","bird"),
stringsAsFactors = FALSE)
Note:
I have not tested this on a larger data set, but the approach should be very fast because the package is implemented internally with Rcpp.
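Since the note above leaves the large-scale test open, here is a minimal benchmarking sketch at roughly the sizes given in the question. Everything in it is an assumption made for illustration: the synthetic key and class names, the fixed vector length of 15, and the names mapping_big, inputs and outputs. It uses base R's match() as a baseline rather than hashmap, so it runs without extra packages; the hashmap lookup could be timed in the same harness.

```r
# Synthetic data at roughly the scale described in the question.
# All sizes and names below are illustrative assumptions.
set.seed(42)
n_keys    <- 1e6   # distinct keys in the mapping
n_classes <- 1e5   # distinct values (classes)
n_vectors <- 1e4   # number of input vectors
vec_len   <- 15    # length of each input vector

mapping_big <- data.frame(
  key   = paste0("key", seq_len(n_keys)),
  value = paste0("class", sample.int(n_classes, n_keys, replace = TRUE)),
  stringsAsFactors = FALSE
)

# Draw keys with replacement and cut them into n_vectors input vectors.
drawn  <- mapping_big$key[sample.int(n_keys, n_vectors * vec_len, replace = TRUE)]
inputs <- unname(split(drawn, rep(seq_len(n_vectors), each = vec_len)))

# Baseline: flatten all inputs, do a single match() against the keys,
# then split the translated values back into the original grouping.
timing <- system.time({
  flat       <- unlist(inputs, use.names = FALSE)
  translated <- mapping_big$value[match(flat, mapping_big$key)]
  outputs    <- unname(split(translated, rep(seq_along(inputs), lengths(inputs))))
})
print(timing)
```

Flattening first means the one-million-key table is searched in a single vectorised call instead of once per input vector, which is usually where a naive per-vector loop loses most of its time.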
Answer 1 (score: 0)
How about a list? Start with:
FamLst <- list(mammal = c("elephant", "dog"), bird = c("parrot", "eagle"))
You can then add to the list piece by piece. For example, FamLst$mammal shows all the mammals. To test whether "dog" is a mammal, use "dog" %in% FamLst$mammal.
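A class-to-members list answers "is X a mammal?", but the question asks for the opposite direction (member to class). One way to get there, sketched below, is to invert the list into a named character vector. The "insect" entry and "cat" are additions made here only so that every key in the example is covered; lookup is a hypothetical name.

```r
# Class -> members list in the style of the answer above. The "insect"
# entry and "cat" are assumptions added so all example keys are present.
FamLst <- list(
  mammal = c("elephant", "dog", "cat"),
  bird   = c("parrot", "eagle"),
  insect = c("ant")
)

# Invert it: a character vector of class names, named by member.
lookup <- setNames(rep(names(FamLst), lengths(FamLst)),
                   unlist(FamLst, use.names = FALSE))

vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")

# Index by name to translate the whole vector at once.
# Keys that are not in the list would come back as NA.
output <- unname(lookup[vector])
print(output)
```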
Answer 2 (score: 0)
One option is to convert the vector to a factor and replace its levels.
library(data.table)
mapping = data.table(mapping)
setkey(mapping, key)
vector = factor(vector)
levels(vector) = mapping[levels(vector), value]
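The same factor idea can be sketched in base R, which may make the mechanics easier to see: only the distinct values (the levels) are looked up, and the per-element work is done by the factor's integer codes. This is a variant written for illustration, not the answer's own code; it swaps the data.table keyed join for match(), and f is a hypothetical name.

```r
vector  <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# Each distinct input value becomes one level.
f <- factor(vector)

# Translate the levels only. Levels that map to the same value
# (here eagle and parrot, both "bird") are merged by levels<-.
levels(f) <- mapping$value[match(levels(f), mapping$key)]

# Convert back to a plain character vector.
output <- as.character(f)
print(output)
```

With inputs that repeat the same keys many times, this reduces the number of lookups to the number of distinct keys.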
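Answer 3 (score: 0)
You can store the key-value pairs in a list, then use a combination of lapply and unlist to map the vector of animals onto that list. See the example below.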
animals = c('dog', 'ant', 'eagle', 'ant', 'eagle', 'parrot')
key_value = list('dog' = 'mammal',
'cat' = 'mammal',
'elephant' = 'mammal',
'ant' = 'insect',
'parrot' = 'bird',
'eagle' = 'bird')
unlist(lapply(animals, FUN = function(x){key_value[[x]]}))
> unlist(lapply(animals, FUN = function(x){key_value[[x]]}))
[1] "mammal" "insect" "bird" "insect" "bird" "bird"
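Two variations on the list approach above may be worth timing against it; both are sketches that assume every key is present in the mapping. The first indexes the list with the whole character vector instead of calling a function per element. The second copies the list into a hashed environment with list2env, which is the closest base R analogue to the HashMap mentioned in the question. The names out_list, env and out_env are hypothetical.

```r
animals <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
key_value <- list(dog = "mammal", cat = "mammal", elephant = "mammal",
                  ant = "insect", parrot = "bird", eagle = "bird")

# Variation 1: index the list by the full character vector in one step.
# Assumes every key exists; a missing key would yield NULL and be
# dropped by unlist(), shortening the result.
out_list <- unlist(key_value[animals], use.names = FALSE)

# Variation 2: a hashed environment used as a hash map.
# mget() fetches many keys at once and errors on a missing key
# unless its ifnotfound argument is supplied.
env     <- list2env(key_value, hash = TRUE)
out_env <- unlist(mget(animals, envir = env), use.names = FALSE)

print(out_list)
print(out_env)
```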