I have a table with a single column:
df <- data.frame(Interest = c("a,b,c,d,e","a,b,d","e,c,b","d,f"))
Interest
----
a,b,c,d,e
a,b,d
e,c,b
d,f
And a second data frame that holds the key-to-value mapping:
df1 <- data.frame(Key = c("a","b","c","d","e","f"), Value = c("1","2","3","4","5","6"))
Key | Value
----
a | 1
b | 2
c | 3
d | 4
e | 5
f | 6
The expected output is:
df <- data.frame(Interest = c("a,b,c,d,e","a,b,d","e,c,b","d,f"), Mapped = c("1,2,3,4,5","1,2,4","5,3,2","4,6"))
Interest | Mapped
----
a,b,c,d,e | 1,2,3,4,5
a,b,d | 1,2,4
e,c,b | 5,3,2
d,f | 4,6
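A one-to-one mapping is fairly straightforward. In this case, though, each cell holds a comma-separated list of keys that needs to be mapped. I would really appreciate some help.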
Answer 0 (score: 3)
A simple sapply over the result of strsplit should work:
df$Mapped <- sapply(strsplit(as.character(df$Interest), split = ","),
                    function(x) paste0(df1[match(x, df1$Key), "Value"], collapse = ","))
df
# Interest Mapped
#1 a,b,c,d,e 1,2,3,4,5
#2 a,b,d 1,2,4
#3 e,c,b 5,3,2
#4 d,f 4,6
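The same lookup can also be sketched with a named vector instead of match(). This is only an illustration, assuming the df and df1 from the question; map_keys is a hypothetical helper name, not something from the original answer.

```r
df  <- data.frame(Interest = c("a,b,c,d,e", "a,b,d", "e,c,b", "d,f"),
                  stringsAsFactors = FALSE)
df1 <- data.frame(Key = c("a", "b", "c", "d", "e", "f"),
                  Value = c("1", "2", "3", "4", "5", "6"),
                  stringsAsFactors = FALSE)

# Named character vector: names are the keys, elements are the values.
lookup <- setNames(as.character(df1$Value), as.character(df1$Key))

# Hypothetical helper: split each cell on commas, look every key up,
# and glue the values back together in the original order.
map_keys <- function(x, lookup) {
  parts <- strsplit(as.character(x), ",", fixed = TRUE)
  vapply(parts,
         function(k) paste(unname(lookup[k]), collapse = ","),
         character(1))
}

df$Mapped <- map_keys(df$Interest, lookup)
df
```

As with the match() version, a key that is missing from df1 shows up as the text NA in the result, so it may be worth checking for that before relying on the output.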
Answer 1 (score: 2)
I'm not sure why you need the output in this format, but this code will give you what you want.
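library(tidyr)
library(dplyr)

# Row id, so the split keys can be grouped back together later.
df$Id <- seq_len(nrow(df))

# One row per key: split each cell on commas and unnest the list column.
long <- df %>%
  mutate(input = strsplit(as.character(Interest), ",")) %>%
  unnest(input)

# Left join against the lookup table df1. Note that merge() sorts by the key,
# so the keys in each row come back in alphabetical order.
long <- merge(long, df1, by.x = "input", by.y = "Key", all.x = TRUE)

long %>%
  group_by(Id) %>%
  summarise(Interest = paste(input, collapse = ","),
            Mapped = paste(Value, collapse = ","))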
# A tibble: 4 × 3
#      Id Interest  Mapped
#   <int> <chr>     <chr>
# 1     1 a,b,c,d,e 1,2,3,4,5
# 2     2 a,b,d     1,2,4
# 3     3 b,c,e     2,3,5
# 4     4 d,f       4,6
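Row 3 in the output above reads b,c,e rather than e,c,b, because merge() sorts on the join key. If the original order matters, one possible workaround is to record each key's position before merging and restore it afterwards. The following is a base R sketch under that assumption; the names long, pos and merged are illustrative and not from the original answer.

```r
df  <- data.frame(Interest = c("a,b,c,d,e", "a,b,d", "e,c,b", "d,f"),
                  stringsAsFactors = FALSE)
df1 <- data.frame(Key = c("a", "b", "c", "d", "e", "f"),
                  Value = c("1", "2", "3", "4", "5", "6"),
                  stringsAsFactors = FALSE)

parts <- strsplit(df$Interest, ",", fixed = TRUE)
n     <- lengths(parts)

# Long format: one row per key, with the row id and the key's position.
long <- data.frame(Id    = rep(seq_along(parts), n),
                   pos   = sequence(n),
                   input = unlist(parts),
                   stringsAsFactors = FALSE)

# Left join against the lookup table, then restore the original order.
merged <- merge(long, df1, by.x = "input", by.y = "Key", all.x = TRUE)
merged <- merged[order(merged$Id, merged$pos), ]

# Collapse the values back to one comma-separated string per row id.
df$Mapped <- as.vector(tapply(merged$Value, merged$Id, paste, collapse = ","))
df
```

With the position restored, the third row maps e,c,b to 5,3,2, which matches the expected output in the question.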
Answer 2 (score: 1)
My small dataset, because I'm lazy:
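interest = data.frame(interest = c('a,b,c', 'a,c'))
keyvalue = data.frame(kv = c('a|1', 'b|2', 'c|3'))
You can use qdap::mgsub to do some of the heavy lifting.
Specifically, we take the key-value pairs and use tidyr's separate to create a two-column data.frame. Then we pass the resulting vectors of patterns and replacements to qdap::mgsub: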
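library(dplyr)
library(tidyr)
library(qdap)

# Split each 'key|value' string into two columns.
keyv <- keyvalue %>% separate(kv, into = c('Interest', 'Value'), sep = '\\|')

# Replace every key with its value, then append the result after a '|'.
interest$interest <- paste0(interest$interest,
                            '|',
                            mgsub(keyv$Interest, keyv$Value, interest$interest))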
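If qdap is not available, the multiple substitution can be approximated in base R by applying gsub() once per key. This is a rough sketch, not qdap's implementation; multi_gsub is a hypothetical helper name, and the data repeat the small example from this answer.

```r
interest <- data.frame(interest = c("a,b,c", "a,c"), stringsAsFactors = FALSE)
keys     <- c("a", "b", "c")
values   <- c("1", "2", "3")

# Hypothetical helper: apply each literal pattern/replacement pair in turn.
multi_gsub <- function(pattern, replacement, x) {
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, fixed = TRUE)
  }
  x
}

mapped <- multi_gsub(keys, values, interest$interest)
paste0(interest$interest, "|", mapped)
```

Because the replacements run one after another, a value that contains a later key would be replaced again. That is harmless here, where the keys are letters and the values are digits, but it is a limitation to keep in mind for other data.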