I have a table like this:
+------+---------+---------+---------+----------+---------+
| Code | Display | Synonym | Synonym | Synonym | Synonym |
+------+---------+---------+---------+----------+---------+
| 1 | A | Cat | Dog | Lion | |
| 2 | B | Horse | Penguin | | |
| 3 | C | Donkey | Giraffe | Mongoose | Rabbit |
+------+---------+---------+---------+----------+---------+
I would like to output a table like this:
+------+---------+----------+
| Code | Display | Synonym |
+------+---------+----------+
| 1 | A | Cat |
| 1 | A | Dog |
| 1 | A | Lion |
| 2 | B | Horse |
| 2 | B | Penguin |
| 3 | C | Donkey |
| 3 | C | Giraffe |
| 3 | C | Mongoose |
| 3 | C | Rabbit |
+------+---------+----------+
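In other words, I want to pair Code and Display with each synonym that is present, where each code can have from one to several synonyms. I have seen examples of reshape used in other situations, but I have not been able to work out how to apply it here.
Answer 0 (score: 2)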
You can use standard reshaping on a ragged array: melt() from reshape2. You can use the na.rm argument to remove the NA values, or otherwise remove them afterwards:
library(reshape2)
dat.m <- melt(dat, id.vars = c("Code", "Display"), value.name = "Synonym", na.rm = TRUE)
# Code Display variable Synonym
#1 1 A Synonym Cat
#2 2 B Synonym Horse
#3 3 C Synonym Donkey
#4 1 A Synonym.1 Dog
#5 2 B Synonym.1 Penguin
#6 3 C Synonym.1 Giraffe
#7 1 A Synonym.2 Lion
#9 3 C Synonym.2 Mongoose
#12 3 C Synonym.3 Rabbit
If you like, you can drop the variable column:
dat.m$variable <- NULL
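Note that na.rm = TRUE only drops cells that are actually NA. If the blanks were read in as empty strings, they would need converting first. A minimal sketch in base R, assuming a hand-built copy of the question's table (the data frame below is an illustration, not the asker's real data):

```r
# Hand-built copy of the question's table (an assumption, for illustration).
# data.frame() makes the repeated "Synonym" names unique:
# Synonym, Synonym.1, Synonym.2, Synonym.3
dat <- data.frame(
  Code = 1:3,
  Display = c("A", "B", "C"),
  Synonym = c("Cat", "Horse", "Donkey"),
  Synonym = c("Dog", "Penguin", "Giraffe"),
  Synonym = c("Lion", "", "Mongoose"),
  Synonym = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Turn empty strings into NA, in the synonym columns only
syn <- grep("^Synonym", names(dat))
dat[syn] <- lapply(dat[syn], function(x) replace(x, x == "", NA))

# Number of NA cells per synonym column
colSums(is.na(dat[syn]))
```

With the blanks stored as NA, the melt() call above has three cells to drop, which leaves the 9 rows shown in its output.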
Answer 1 (score: 1)
Here are two base R approaches.
stack
cbind(mydf[1:2], stack(lapply(mydf[-c(1:2)], as.character)))
# Code Display values ind
# 1 1 A Cat Synonym
# 2 2 B Horse Synonym
# 3 3 C Donkey Synonym
# 4 1 A Dog Synonym.1
# 5 2 B Penguin Synonym.1
# 6 3 C Giraffe Synonym.1
# 7 1 A Lion Synonym.2
# 8 2 B Synonym.2
# 9 3 C Mongoose Synonym.2
# 10 1 A Synonym.3
# 11 2 B Synonym.3
# 12 3 C Rabbit Synonym.3
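The stacked result still contains the blank rows and an ind column, so it needs a little tidying to match the table in the question. A minimal sketch, assuming mydf is a hand-built copy of the question's table with blanks stored as empty strings (the name long and the cleanup steps are illustrative, not part of the original answer):

```r
# Hand-built copy of the question's table (an assumption, for illustration)
mydf <- data.frame(
  Code = 1:3,
  Display = c("A", "B", "C"),
  Synonym = c("Cat", "Horse", "Donkey"),
  Synonym = c("Dog", "Penguin", "Giraffe"),
  Synonym = c("Lion", "", "Mongoose"),
  Synonym = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Same stack() call as above; Code and Display are recycled to 12 rows
long <- cbind(mydf[1:2], stack(lapply(mydf[-c(1:2)], as.character)))

# Drop blank (or NA) synonyms, keep only the three wanted columns
keep <- !is.na(long$values) & nzchar(long$values)
long <- long[keep, c("Code", "Display", "values")]
names(long)[3] <- "Synonym"

# Sort by Code; ties keep their original (column) order
long <- long[order(long$Code), ]
rownames(long) <- NULL
long
```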
reshape
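First, rename the columns to a pattern like "Synonym_1", "Synonym_2", and so on, to make life easier. (Strictly speaking, R would prefer "Synonym.1", "Synonym.2", and so on, because "." is the default separator in reshape().)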
A <- grep("Synonym", names(mydf))
names(mydf)[A] <- paste0("Synonym_", seq_along(A))
Now, reshape:
reshape(mydf, direction = "long", varying = A, sep = "_")
# Code Display time Synonym id
# 1.1 1 A 1 Cat 1
# 2.1 2 B 1 Horse 2
# 3.1 3 C 1 Donkey 3
# 1.2 1 A 2 Dog 1
# 2.2 2 B 2 Penguin 2
# 3.2 3 C 2 Giraffe 3
# 1.3 1 A 3 Lion 1
# 2.3 2 B 3 2
# 3.3 3 C 3 Mongoose 3
# 1.4 1 A 4 1
# 2.4 2 B 4 2
# 3.4 3 C 4 Rabbit 3
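As a variation on the steps above, naming the columns with a dot lets reshape() use its default separator, and a short cleanup then yields the table from the question. A minimal sketch, assuming mydf is a hand-built copy of the question's table (the names syn, long, and keep are illustrative):

```r
# Hand-built copy of the question's table (an assumption, for illustration)
mydf <- data.frame(
  Code = 1:3,
  Display = c("A", "B", "C"),
  Synonym = c("Cat", "Horse", "Donkey"),
  Synonym = c("Dog", "Penguin", "Giraffe"),
  Synonym = c("Lion", "", "Mongoose"),
  Synonym = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Rename to Synonym.1 ... Synonym.4 so the default sep = "." applies
syn <- grep("Synonym", names(mydf))
names(mydf)[syn] <- paste0("Synonym.", seq_along(syn))

long <- reshape(mydf, direction = "long", varying = syn)

# Order by Code, then by synonym position, and drop blank or NA synonyms
long <- long[order(long$Code, long$time), ]
keep <- !is.na(long$Synonym) & nzchar(long$Synonym)
long <- long[keep, c("Code", "Display", "Synonym")]
rownames(long) <- NULL
long
```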
Answer 2 (score: 0)
After asking the question, I came up with a possibly roundabout way to do this:
allergies_output <- reshape(allergies_input,varying=list(grep('Synonym',names(allergies_input),value=TRUE)),direction='long',idvar=c('Code','Display'),v.names='Synonym',names(allergies_input))
This produces somewhat wonky results, but nothing that cannot be fixed by removing a few columns.
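One likely source of the odd output is the trailing unnamed names(allergies_input) argument, which appears to be matched positionally to timevar. A minimal sketch of the same call without it, assuming allergies_input looks like the table in the question (the data below is a hand-built stand-in, and syn_cols is an illustrative name):

```r
# Hand-built stand-in for the asker's data (an assumption, for illustration)
allergies_input <- data.frame(
  Code = 1:3,
  Display = c("A", "B", "C"),
  Synonym = c("Cat", "Horse", "Donkey"),
  Synonym = c("Dog", "Penguin", "Giraffe"),
  Synonym = c("Lion", "", "Mongoose"),
  Synonym = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

syn_cols <- grep("Synonym", names(allergies_input), value = TRUE)

# Same idea as the call above, with every argument named
allergies_output <- reshape(
  allergies_input,
  direction = "long",
  varying = list(syn_cols),
  v.names = "Synonym",
  idvar = c("Code", "Display")
)

# Drop blank or NA synonyms and the helper "time" column
keep <- !is.na(allergies_output$Synonym) & nzchar(allergies_output$Synonym)
allergies_output <- allergies_output[keep, c("Code", "Display", "Synonym")]
allergies_output
```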