Transforming a vector of values with a key-value mapping (HashMap equivalent) in R

Time: 2015-08-04 12:05:00

Tags: r hashmap dataframe mapping

I need to translate the values in a vector according to a mapping of key-value pairs:

vector <- c("dog","ant","eagle","ant","eagle","parrot") 

  "dog"  "ant"  "eagle"  "ant"  "eagle"  "parrot"


mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),value=c("mammal","mammal","mammal","insect","bird","bird"))

  key      value
  dog      mammal
  cat      mammal
  elephant mammal
  ant      insect
  parrot   bird
  eagle    bird

The desired output is:

output <- c("mammal", "insect", "bird", "insect", "bird", "bird")

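In the real data set I have to translate ~10,000 input vectors with an average length of ~15. The mapping data frame has on the order of one million keys, with about 100,000 unique classes on the value side.

The problem itself seems basic to me, but the bottleneck is runtime. In other programming languages you would probably use a HashMap for the mapping and then loop over the vector. So far, every solution I have tried in R is orders of magnitude slower than a simple HashMap-based one in Java or Python (see the comments below).

Is there a data structure for storing the mapping that is more efficient than a data frame?

What is the most runtime-efficient solution to this problem in R?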

4 answers:

Answer 0 (score: 3)

There is a package called hashmap that is well suited to this:

library(hashmap)

hash_lookup = hashmap(mapping$key, mapping$value)

output = hash_lookup[[vector]]

Result:

> hash_lookup
## (character) => (character)
##       [cat] => [mammal]   
##  [elephant] => [mammal]   
##       [ant] => [insect]   
##       [dog] => [mammal]   
##     [eagle] => [bird]     
##    [parrot] => [bird]     

> output
[1] "mammal" "insect" "bird"   "insect" "bird"   "bird"

Data:

vector <- c("dog","ant","eagle","ant","eagle","parrot")

mapping <- data.frame(key=c("dog","cat","elephant","ant","parrot","eagle"),
                      value=c("mammal","mammal","mammal","insect","bird","bird"),
                      stringsAsFactors = FALSE)

Note:

This still has to be tested on a larger data set, but the approach should be very fast because it is implemented internally with Rcpp.
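
For comparison, here is a minimal base-R sketch of the same lookup that needs no extra package. It assumes the toy data above; lookup_env is a name introduced here for illustration only. Environments created with new.env(hash = TRUE) are hash-backed, which makes them the closest built-in analogue of a HashMap. This has not been benchmarked against the hashmap package or at the scale described in the question.

```r
# Toy data from the question
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# Build the hash-backed environment once: one binding per key.
# lookup_env is a hypothetical name, not part of any package.
lookup_env <- list2env(
  setNames(as.list(mapping$value), mapping$key),
  envir = new.env(hash = TRUE)
)

# mget() fetches many keys in one call; keys that are missing
# from the environment come back as NA instead of raising an error.
output <- unlist(
  mget(vector, envir = lookup_env, ifnotfound = NA_character_),
  use.names = FALSE
)
print(output)
```

The environment is built once and can then be reused for every input vector, so the cost of hashing the keys is not paid again per lookup.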

Answer 1 (score: 0)

How about a list? Start with:

FamLst <- list(mammal = c("elephant", "dog"), bird = c("parrot", "eagle"))

You can then add to the list piece by piece. For example, FamLst$mammal shows all the mammals. To test whether "dog" is a mammal, use "dog" %in% FamLst$mammal
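
As a sketch of how this list relates to the question's data, the list can be derived from the mapping data frame with split(), and turned back into a key-to-value lookup when a whole vector has to be translated. The name inverse is introduced here for illustration, and the toy data is assumed.

```r
# Toy data from the question
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# Group the keys by class: one list element per class
FamLst <- split(mapping$key, mapping$value)

# Membership test, as described in the answer
is_mammal <- "dog" %in% FamLst$mammal

# Invert the list into a named vector (key -> class) for translation;
# `inverse` is a hypothetical helper name
inverse <- setNames(rep(names(FamLst), lengths(FamLst)),
                    unlist(FamLst, use.names = FALSE))
output <- unname(inverse[vector])
print(output)
```

The class-to-keys list answers "which animals are mammals?" directly, while translating a vector needs the opposite direction, which is why the sketch inverts it first.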

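Answer 2 (score: 0)

One option is to convert the vector to a factor and change its levels. Note that the result is a factor; wrap it in as.character() if a character vector is needed.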

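library(data.table)

# Key the mapping on `key` so rows can be looked up by key value
mapping = data.table(mapping)

setkey(mapping, key)

# Look up each distinct value once, then relabel the factor levels
vector = factor(vector)

levels(vector) = mapping[levels(vector),value]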
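
The same relabelling idea can be sketched in base R, assuming the toy data from the question and using match() in place of the keyed data.table lookup. Only the distinct values of the vector are looked up, which may help when the same keys repeat often; this is an illustration, not a benchmarked result.

```r
# Toy data from the question
vector <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")
mapping <- data.frame(
  key   = c("dog", "cat", "elephant", "ant", "parrot", "eagle"),
  value = c("mammal", "mammal", "mammal", "insect", "bird", "bird"),
  stringsAsFactors = FALSE
)

# Encode the vector as a factor: its levels are the distinct keys
f <- factor(vector)

# Translate each level once; levels that map to the same class are merged
levels(f) <- mapping$value[match(levels(f), mapping$key)]

# Convert back to a plain character vector
output <- as.character(f)
print(output)
```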

Answer 3 (score: 0)

You can store the key-value pairs in a list and then map the animals vector onto it with a combination of lapply and unlist. See the example below.

animals = c('dog', 'ant', 'eagle', 'ant', 'eagle', 'parrot')

key_value = list('dog' = 'mammal',
                 'cat' = 'mammal',
                 'elephant' = 'mammal',
                 'ant' = 'insect',
                 'parrot' = 'bird',
                 'eagle' = 'bird')

unlist(lapply(animals, FUN = function(x){key_value[[x]]}))

> unlist(lapply(animals, FUN = function(x){key_value[[x]]}))
[1] "mammal" "insect" "bird"   "insect" "bird"   "bird"
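
A possible variation, sketched under the assumption that every value is a single string: flatten the list into a named character vector and index it with the whole animals vector at once, which avoids the explicit lapply loop. The name lookup is introduced here for illustration.

```r
animals <- c("dog", "ant", "eagle", "ant", "eagle", "parrot")

key_value <- list("dog" = "mammal",
                  "cat" = "mammal",
                  "elephant" = "mammal",
                  "ant" = "insect",
                  "parrot" = "bird",
                  "eagle" = "bird")

# Flatten to a named character vector: names are keys, elements are values
lookup <- unlist(key_value)

# Index by name for all animals in one vectorized step;
# unname() drops the keys from the result
output <- unname(lookup[animals])
print(output)

# A key that is not in the mapping yields NA rather than an error
missing_result <- unname(lookup["unicorn"])
```

Unlike key_value[[x]], which returns NULL for an unknown key and so silently shortens the unlist() result, the named-vector lookup keeps one element per input.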