Map multiple elements in a data frame cell to another data frame in R

Date: 2017-06-27 17:28:53

Tags: r

I have a table with only one column:

df <- data.frame(Interest = c("a,b,c,d,e","a,b,d","e,c,b","d,f"))

Interest  
----
a,b,c,d,e  
a,b,d  
e,c,b  
d,f

and another data frame:

df1 <- data.frame(Key = c("a","b","c","d","e","f"), Value = c("1","2","3","4","5","6"))

Key  | Value  
----  
a  |  1  
b  |  2  
c  |  3  
d  |  4  
e  |  5  
f  |  6  

The expected output is:

df <- data.frame(Interest = c("a,b,c,d,e","a,b,d","e,c,b","d,f"), Mapped = c("1,2,3,4,5","1,2,4","5,3,2","4,6"))

Interest  |  Mapped  
----
a,b,c,d,e  |  1,2,3,4,5  
a,b,d  |  1,2,4  
e,c,b  |  5,3,2  
d,f  |  4,6

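A one-to-one mapping would be fairly simple, but here each cell holds a comma-separated list of keys, and every key in it needs to be mapped. I'd really appreciate some help.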

3 answers:

Answer 0 (score: 3)

A simple sapply over strsplit should work:

df$Mapped <- sapply(strsplit(as.character(df$Interest), split = ","), 
                    function(x) paste0(df1[match(x, df1$Key), "Value"], collapse = ","))

df
#   Interest    Mapped
#1 a,b,c,d,e 1,2,3,4,5
#2     a,b,d     1,2,4
#3     e,c,b     5,3,2
#4       d,f       4,6
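A variant of the same idea, offered as a sketch rather than as part of the original answer: turning df1 into a named character vector (called `lookup` here, a name chosen only for illustration) lets each split cell be indexed directly by key instead of going through match(). It assumes the Key column contains no duplicates; a key that is missing from df1 comes out as the string "NA".

```r
# Data from the question; stringsAsFactors = FALSE keeps the columns as
# character on R versions older than 4.0 as well.
df <- data.frame(Interest = c("a,b,c,d,e", "a,b,d", "e,c,b", "d,f"),
                 stringsAsFactors = FALSE)
df1 <- data.frame(Key = c("a", "b", "c", "d", "e", "f"),
                  Value = c("1", "2", "3", "4", "5", "6"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: a named vector whose names are the keys
lookup <- setNames(df1$Value, df1$Key)

# Split each cell on ",", index the lookup by name, and paste the values back
# together; vapply guarantees exactly one string per row.
df$Mapped <- vapply(strsplit(df$Interest, ",", fixed = TRUE),
                    function(keys) paste(lookup[keys], collapse = ","),
                    character(1))
df
```

Indexing by name keeps the order of the keys inside each cell, so "e,c,b" maps to "5,3,2".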

Answer 1 (score: 2)

I'm not sure why you need this output format, but this code gives you what you want.

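library(tidyr)
library(dplyr)

# Give each row an id so the pieces can be reassembled afterwards
df$Id <- seq_len(nrow(df))
# One row per key: split each cell on "," and unnest the resulting list column
long <- df %>%
  mutate(Interest = strsplit(as.character(Interest), ",")) %>%
  unnest(cols = Interest)
# Look up each key in df1; merge() sorts by key, so keys within a cell come back sorted
long <- merge(long, df1, by.x = "Interest", by.y = "Key", all.x = TRUE)
long %>%
  group_by(Id) %>%
  dplyr::summarise(Interest = paste(Interest, collapse = ","),
                   Mapped = paste(Value, collapse = ","))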

# A tibble: 4 × 3
     Id  Interest    Mapped
  <chr>     <chr>     <chr>
1     1 a,b,c,d,e 1,2,3,4,5
2     2     a,b,d     1,2,4
3     3     b,c,e     2,3,5
4     4       d,f       4,6
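Row 3 above comes back as b,c,e rather than e,c,b because merge() sorts its result by the key column by default. A minimal sketch that keeps each cell's original key order, assuming dplyr >= 1.0 and tidyr >= 1.0 and swapping merge() for dplyr's left_join() (which keeps the row order of its left-hand table), could look like this:

```r
library(dplyr)
library(tidyr)

df <- data.frame(Interest = c("a,b,c,d,e", "a,b,d", "e,c,b", "d,f"),
                 stringsAsFactors = FALSE)
df1 <- data.frame(Key = c("a", "b", "c", "d", "e", "f"),
                  Value = c("1", "2", "3", "4", "5", "6"),
                  stringsAsFactors = FALSE)

result <- df %>%
  # Row id plus a list column holding the split keys of each cell
  mutate(Id = row_number(), Key = strsplit(Interest, ",", fixed = TRUE)) %>%
  unnest(cols = Key) %>%            # one row per (Id, Key)
  left_join(df1, by = "Key") %>%    # lookup without reordering the rows
  group_by(Id, Interest) %>%
  summarise(Mapped = paste(Value, collapse = ","), .groups = "drop")
result
```

Grouping by both Id and Interest carries the original cell text through the summary, so it does not need to be re-pasted from the unnested keys.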

Answer 2 (score: 1)

My small dataset, because I'm lazy:


interest <- data.frame(interest = c('a,b,c', 'a,c'))
keyvalue <- data.frame(kv = c('a|1', 'b|2', 'c|3'))

dplyr and tidyr can do some of the "heavy lifting".

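Specifically, we take the key-value pairs and use tidyr's separate() to create a two-column data.frame. Then we use qdap::mgsub() with the resulting vectors of patterns (keys) and replacements (values).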

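library(dplyr)
library(tidyr)
library(qdap)

# Split each "key|value" string into a two-column data frame (Interest, Value)
keyv <- keyvalue %>% separate(kv, into = c('Interest', 'Value'), sep = '\\|')

# Replace every key with its value, then append the mapped string after a "|"
interest$interest <- paste0(interest$interest,
                            '|',
                            mgsub(keyv$Interest, keyv$Value, interest$interest))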
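If qdap is not installed, the same "interest|mapped" strings can be built in base R. The sketch below is an assumption-labelled alternative, not the answer's method: strsplit() stands in for tidyr::separate(), and instead of mgsub()'s substring substitution it matches whole comma-separated tokens against the key column with match().

```r
# The answer's small example data
interest <- data.frame(interest = c("a,b,c", "a,c"), stringsAsFactors = FALSE)
keyvalue <- data.frame(kv = c("a|1", "b|2", "c|3"), stringsAsFactors = FALSE)

# Split "key|value" into a two-column lookup table; fixed = TRUE treats "|"
# literally instead of as a regex alternation.
parts <- strsplit(keyvalue$kv, "|", fixed = TRUE)
keyv <- data.frame(Interest = vapply(parts, `[`, character(1), 1),
                   Value    = vapply(parts, `[`, character(1), 2),
                   stringsAsFactors = FALSE)

# Map each token by exact match, then append the result after a "|"
mapped <- vapply(strsplit(interest$interest, ",", fixed = TRUE),
                 function(keys) paste(keyv$Value[match(keys, keyv$Interest)],
                                      collapse = ","),
                 character(1))
interest$interest <- paste0(interest$interest, "|", mapped)
interest
```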