ca.df
id Category
1 Noun
2 Negative
3 Positive
4 adj
5 word
Each term is assigned to several categories, so it has more than one id. In terms.df, all of a term's ids sit in a single column.
terms.df
Terms id
Love 1 4 5 3
Hate 2 4 5
ice 1 5
The ids correspond to the categories in ca.df. I would like output like this:
x.df
Category terms
Noun ice Love
Negative Hate
Positive Love
adj Hate Love
word ice Hate Love
How can I do this?
Answer 0: (score: 5)
Here is a possible data.table / splitstackshape package solution:
library(splitstackshape) ## loads `data.table` package too
terms.df <- cSplit(terms.df, "id", sep = " ", direction = "long")
setkey(terms.df, id)[ca.df, .(Category , Terms = toString(Terms)), by = .EACHI]
# id Category Terms
# 1: 1 Noun Love, ice
# 2: 2 Negative Hate
# 3: 3 Positive Love
# 4: 4 adj Love, Hate
# 5: 5 word Love, Hate, ice
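Some explanations:
- cSplit splits the space-separated id column of terms.df into long format, giving one row per Terms/id pair.
- setkey followed by [ca.df, ...] performs a binary left join between the two data sets on the id column.
- The Terms column is concatenated with toString, and by = .EACHI lets us perform that operation per matched group while joining.
Answer 1: (score: 2)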
A solution using tidyr and dplyr.
library(tidyr)
library(dplyr)
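# separate() produces character ids, so make the join key in ca.df character too
ca.df$id <- as.character(ca.df$id)
terms.df %>%
  # one column per id; V1..V3 assumes at most three ids per term
  separate(id, into = paste0("V", 1:3), sep = " ", extra = "merge") %>%
  # stack V1..V3 into a single id column (long format)
  gather(var, id, -Terms) %>%
  # drop the padding rows for terms with fewer than three ids
  filter(!is.na(id)) %>%
  # look up the Category of each id
  left_join(ca.df, by = "id") %>%
  select(-var, -id) %>%
  # one row per Category, its terms pasted together
  group_by(Category) %>%
  summarize(Terms = paste(Terms, collapse = " "))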
Output:
Source: local data frame [4 x 2]
Category Terms
1 Negative Hate
2 Noun Love ice
3 adj Love Hate
4 word ice Love Hate
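The separate() call above fixes the number of id columns at three. As a minimal sketch of the same split-to-long step without that limit, base R's strsplit can be used; the names id.list and terms.long are made up for illustration, and the data is typed in from the question (where Love carries four ids) rather than taken from this answer.

```r
# Data as in the question; Love has four ids here.
terms.df <- data.frame(
  Terms = c("Love", "Hate", "ice"),
  id    = c("1 4 5 3", "2 4 5", "1 5"),
  stringsAsFactors = FALSE
)

# Split each id string on spaces: a list with one character vector per term.
id.list <- strsplit(terms.df$id, " ", fixed = TRUE)

# Repeat each term once per id it carries, then flatten the ids.
terms.long <- data.frame(
  Terms = rep(terms.df$Terms, times = lengths(id.list)),
  id    = as.integer(unlist(id.list)),
  stringsAsFactors = FALSE
)

terms.long
```

The result has one row per Terms/id pair, so it can be joined to ca.df on id in the same way as the pipeline above.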
Data:
ca.df <- read.table(text =
"id Category
1 Noun
2 Negative
3 Positive
4 adj
5 word",head=TRUE,stringsAsFactors=FALSE)
terms.df <- read.table(text =
"Terms id
Love '1 4 5'
Hate '2 4 5'
ice '1 5'
",head=TRUE,stringsAsFactors=FALSE)
Answer 2: (score: 1)
You can use merge on id:
ca.df <- data.frame(id=1:5, Category=c("Noun", "Negative", "Positive", "adj", "word"))
terms.df <- data.frame(Terms=c(rep("Love", 3), rep("Hate", 3), rep("ice", 2)),
id = c(1,4,5,2,4,5,1,5))
x.df <- merge(ca.df, terms.df, by="id")
x.df
id Category Terms
1 1 Noun Love
2 1 Noun ice
3 2 Negative Hate
4 4 adj Love
5 4 adj Hate
6 5 word Love
7 5 word Hate
8 5 word ice
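The merged table is still in long form, with one row per term. One way to get closer to the one-row-per-Category layout asked for in the question is to collapse it with aggregate. This is only a sketch built on the data from this answer; res is a made-up name, and Positive does not appear because this terms.df contains no id 3.

```r
ca.df <- data.frame(id = 1:5,
                    Category = c("Noun", "Negative", "Positive", "adj", "word"),
                    stringsAsFactors = FALSE)
terms.df <- data.frame(Terms = c(rep("Love", 3), rep("Hate", 3), rep("ice", 2)),
                       id = c(1, 4, 5, 2, 4, 5, 1, 5),
                       stringsAsFactors = FALSE)

# Inner join on id, as in the answer above.
x.df <- merge(ca.df, terms.df, by = "id")

# Collapse to one row per Category, pasting that category's terms together.
# The extra argument collapse = " " is passed through to paste().
res <- aggregate(Terms ~ Category, data = x.df, FUN = paste, collapse = " ")
res
```

Passing all.x = TRUE to merge would keep categories without any term in x.df, but the formula interface of aggregate drops rows with NA by default, so those categories would need separate handling.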