Reshape a ragged wide array with repeated variables in R

Posted: 2014-04-16 12:38:57

Tags: r text reshape tabular

I have a table like this:

+------+---------+---------+---------+----------+---------+
| Code | Display | Synonym | Synonym | Synonym  | Synonym |
+------+---------+---------+---------+----------+---------+
|    1 | A       | Cat     | Dog     | Lion     |         |
|    2 | B       | Horse   | Penguin |          |         |
|    3 | C       | Donkey  | Giraffe | Mongoose | Rabbit  |
+------+---------+---------+---------+----------+---------+

I want to output a table like this:

+------+---------+----------+
| Code | Display | Synonym  |
+------+---------+----------+
|    1 | A       | Cat      |
|    1 | A       | Dog      |
|    1 | A       | Lion     |
|    2 | B       | Horse    |
|    2 | B       | Penguin  |
|    3 | C       | Donkey   |
|    3 | C       | Giraffe  |
|    3 | C       | Mongoose |
|    3 | C       | Rabbit   |
+------+---------+----------+

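In other words, I want to pair the Code and Display with every synonym that is present, and each code can have anywhere from one to several synonyms. I have seen reshape used in other situations, but I haven't been able to work out how to apply it here.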
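To make the answers reproducible, here is one way to build the example table as an R data frame (a hypothetical reconstruction, with blank cells stored as NA). The column names are an assumption based on how R handles repeated headers: read.table() and data.frame() de-duplicate them with make.names(unique = TRUE), which yields Synonym, Synonym.1, Synonym.2, Synonym.3, the same names that appear in the variable/ind columns of the answers.

```r
# Hypothetical reconstruction of the input table; empty cells are NA
dat <- data.frame(
  Code      = 1:3,
  Display   = c("A", "B", "C"),
  Synonym   = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", NA, "Mongoose"),
  Synonym.3 = c(NA, NA, "Rabbit"),
  stringsAsFactors = FALSE
)

# Repeated headers are de-duplicated the same way read.table() does it:
# "Code" "Display" "Synonym" "Synonym.1" "Synonym.2" "Synonym.3"
make.names(c("Code", "Display", rep("Synonym", 4)), unique = TRUE)
```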

3 answers:

Answer 0 (score: 2)

You can use standard reshaping on a ragged array: melt() from reshape2. Pass na.rm = TRUE to drop the NA cells, or remove them afterwards:

library(reshape2)
# Keep Code and Display as identifiers; stack every other column into "Synonym"
dat.m <- melt(dat, id.vars = c("Code", "Display"), value.name = "Synonym", na.rm = TRUE)
#   Code Display  variable  Synonym
#1     1       A   Synonym      Cat
#2     2       B   Synonym    Horse
#3     3       C   Synonym   Donkey
#4     1       A Synonym.1      Dog
#5     2       B Synonym.1  Penguin
#6     3       C Synonym.1  Giraffe
#7     1       A Synonym.2     Lion
#9     3       C Synonym.2 Mongoose
#12    3       C Synonym.3   Rabbit

If you like, you can drop the variable column:

dat.m$variable <- NULL
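One caveat on na.rm = TRUE: it only drops real NA values. When a file is read with read.table(), blank cells in character columns come in as empty strings "" by default (only numeric, integer, logical and complex fields treat blanks as missing), so melt() would keep them. A minimal base-R sketch of converting them to NA before melting, assuming a dat shaped like the question's table (the data below is a hypothetical reconstruction with "" for the blanks):

```r
# Hypothetical reconstruction of the question's table, blanks stored as ""
dat <- data.frame(
  Code      = 1:3,
  Display   = c("A", "B", "C"),
  Synonym   = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", "", "Mongoose"),
  Synonym.3 = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Replace every empty string with NA so that melt(..., na.rm = TRUE) can drop it
dat[dat == ""] <- NA

# Count of missing cells per column after the conversion
colSums(is.na(dat))
```

Alternatively, reading the file with na.strings = "" has the same effect at import time.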

Answer 1 (score: 1)

Here are two base R approaches.

stack

# Stack the synonym columns (as character) and bind Code/Display back on; cbind() recycles them
cbind(mydf[1:2], stack(lapply(mydf[-c(1:2)], as.character)))
#    Code Display   values       ind
# 1     1       A      Cat   Synonym
# 2     2       B    Horse   Synonym
# 3     3       C   Donkey   Synonym
# 4     1       A      Dog Synonym.1
# 5     2       B  Penguin Synonym.1
# 6     3       C  Giraffe Synonym.1
# 7     1       A     Lion Synonym.2
# 8     2       B          Synonym.2
# 9     3       C Mongoose Synonym.2
# 10    1       A          Synonym.3
# 11    2       B          Synonym.3
# 12    3       C   Rabbit Synonym.3
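The stack() result above still contains the empty cells and is ordered column by column. A short base-R sketch of dropping the blanks and sorting by Code to get exactly the requested three-column table, assuming mydf stores blank cells as empty strings (as the output above suggests); the data frame here is a hypothetical reconstruction:

```r
# Hypothetical reconstruction of the question's table, blanks stored as ""
mydf <- data.frame(
  Code      = 1:3,
  Display   = c("A", "B", "C"),
  Synonym   = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", "", "Mongoose"),
  Synonym.3 = c("", "", "Rabbit"),
  stringsAsFactors = FALSE
)

# Same stacking step as above
long <- cbind(mydf[1:2], stack(lapply(mydf[-c(1:2)], as.character)))

# Drop the empty cells and the ind column
long <- long[long$values != "", c("Code", "Display", "values")]

# Group rows by Code; order() is stable, so left-to-right order is kept
long <- long[order(long$Code), ]
names(long)[3] <- "Synonym"
rownames(long) <- NULL
long
```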

reshape

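First, make life easier by renaming the columns to a pattern like "Synonym_1", "Synonym_2", and so on. (reshape() would also understand "Synonym.1", "Synonym.2", ..., since its default separator is sep = ".", but the first column here is a bare "Synonym", so it needs renaming either way.)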

# Find the synonym columns and rename them Synonym_1, Synonym_2, ...
A <- grep("Synonym", names(mydf))
names(mydf)[A] <- paste0("Synonym_", seq_along(A))

Now, reshape:

reshape(mydf, direction = "long", varying = A, sep = "_")
#     Code Display time  Synonym id
# 1.1    1       A    1      Cat  1
# 2.1    2       B    1    Horse  2
# 3.1    3       C    1   Donkey  3
# 1.2    1       A    2      Dog  1
# 2.2    2       B    2  Penguin  2
# 3.2    3       C    2  Giraffe  3
# 1.3    1       A    3     Lion  1
# 2.3    2       B    3           2
# 3.3    3       C    3 Mongoose  3
# 1.4    1       A    4           1
# 2.4    2       B    4           2
# 3.4    3       C    4   Rabbit  3

Answer 2 (score: 0)

After asking the question, I came up with a possibly roundabout way of doing it:

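# Note: the final unnamed argument is matched positionally to reshape()'s timevar
allergies_output <- reshape(allergies_input,
                            varying = list(grep('Synonym', names(allergies_input), value = TRUE)),
                            direction = 'long',
                            idvar = c('Code', 'Display'),
                            v.names = 'Synonym',
                            names(allergies_input))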

This gives some wonky results, but nothing that can't be fixed by removing a few of the columns.
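For comparison, a tidied sketch of the same reshape() call. It leaves out the trailing unnamed names(allergies_input) argument (R matches it positionally to timevar, which is a likely source of the odd output), then drops the NA cells and sorts by Code. The input data frame is a hypothetical reconstruction of the question's table with blanks as NA; syn_cols is an illustrative helper name.

```r
# Hypothetical reconstruction of the question's table; empty cells are NA
allergies_input <- data.frame(
  Code      = 1:3,
  Display   = c("A", "B", "C"),
  Synonym   = c("Cat", "Horse", "Donkey"),
  Synonym.1 = c("Dog", "Penguin", "Giraffe"),
  Synonym.2 = c("Lion", NA, "Mongoose"),
  Synonym.3 = c(NA, NA, "Rabbit"),
  stringsAsFactors = FALSE
)

# All synonym columns form one group of varying columns
syn_cols <- grep("Synonym", names(allergies_input), value = TRUE)

allergies_output <- reshape(allergies_input,
                            varying   = list(syn_cols),
                            v.names   = "Synonym",
                            idvar     = c("Code", "Display"),
                            direction = "long")

# Drop empty cells, group by Code, keep only the requested columns
allergies_output <- allergies_output[!is.na(allergies_output$Synonym), ]
allergies_output <- allergies_output[order(allergies_output$Code),
                                     c("Code", "Display", "Synonym")]
rownames(allergies_output) <- NULL
allergies_output
```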