Reshape data to produce only one row

Time: 2016-03-31 09:59:42

Tags: r reshape reshape2

I have a data frame (df) in long/tall format, like this:

Input:

ID  entity_id  type
A1  1001       husband
A1  1002       wife
A1  1003       brother
A1  1004       son
A2  2005       husband
A2  2006       son

I want this in wide format, so I did the following.

Because reshape cannot handle duplicates (it defaults to counting them), I added a dummy column:

df$dummy <- seq_len(nrow(df))

df_wide <- dcast(df, dummy + ID ~ type, value.var="entity_id")

This is what I get:

dummy ID  husband wife  brother son
1     A1  1001    NA    NA      NA
2     A1  NA      1002  NA      NA
3     A1  NA      NA    1003    NA
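
The one-row-per-record output above is what one would expect when dummy sits on the left-hand side of the formula: every value of dummy is unique, so each input record becomes its own output row. A minimal sketch of that effect, assuming reshape2 is installed and rebuilding the example data by hand (the names with_dummy and without_dummy are only for illustration):

```r
library(reshape2)

# Rebuild the example data from the question.
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1004, 2005, 2006),
  type      = c("husband", "wife", "brother", "son", "husband", "son"),
  stringsAsFactors = FALSE
)

# A unique id per row, as in the question.
df$dummy <- seq_len(nrow(df))

# dummy is part of the row key, so no two records can share an output row.
with_dummy <- dcast(df, dummy + ID ~ type, value.var = "entity_id")

# Without dummy, the row key is ID alone and the records collapse per ID.
without_dummy <- dcast(df, ID ~ type, value.var = "entity_id")

nrow(with_dummy)     # one row per input record
nrow(without_dummy)  # one row per distinct ID
```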

What I want:

dummy ID  husband wife brother son
1     A1  1001    1002 1003    1004
2     A2  2005    NA   NA      2006  

EDIT 1: sessionInfo()

R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_0.4.1    reshape2_1.4.1 dplyr_0.4.3    RMySQL_0.10.8  DBI_0.3.1     

loaded via a namespace (and not attached):
[1] plyr_1.8.3     magrittr_1.5   R6_2.1.2       assertthat_0.1 parallel_3.2.4 tools_3.2.4    Rcpp_0.12.4    stringi_1.0-1  stringr_1.0.0 

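2 answers:

Answer 0 (score: 1)

I'm not sure I fully understand why you add the dummy column (I assume you meant to write df$dummy rather than df_dummy). But the following seems to give the result you are looking for: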

library(reshape2)

df <- read.delim(text="ID  entity_id  type
                 A1  1001       husband
                 A1  1002       wife
                 A1  1003       brother
                 A1  1004       son
                 A2  2005       husband
                 A2  2006       son", sep="")

dcast(df, ID ~ type, value.var="entity_id")
  ID brother husband  son wife
1 A1    1003    1001 1004 1002
2 A2      NA    2005 2006   NA
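
Since the session info in the question lists tidyr as attached, the same cast could probably also be written with tidyr::spread(). This is only a sketch under that assumption, not part of the original answer; in newer tidyr releases spread() is superseded by pivot_wider().

```r
library(tidyr)

# Same example data as in the question, built by hand.
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1004, 2005, 2006),
  type      = c("husband", "wife", "brother", "son", "husband", "son"),
  stringsAsFactors = FALSE
)

# Each distinct value of type becomes a column holding entity_id;
# ID/type combinations that do not occur are filled with NA.
wide <- spread(df, key = type, value = entity_id)
wide
```

Like the plain dcast() call, this only works while every ID/type pair occurs at most once.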

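Edit: Based on your revised data, which contains multiple brothers and sons, I suggest the following (assuming you still want everything in one row):

Solution 1: put everything into a single cell: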

df <- read.delim(text="ID  entity_id  type
A1  1001       husband
A1  1002       wife
A1  1003       brother
A1  1005       brother
A1  1004       son
A1  1006       son
A2  2005       husband
A2  2006       son", sep="")

dcast(df, ID ~ type, value.var="entity_id", 
      fun.aggregate = function(...) paste0(..., collapse = "_"))
  ID   brother husband       son wife
1 A1 1003_1005    1001 1004_1006 1002
2 A2              2005      2006     

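Here I aggregate multiple instances by pasting the IDs together. I don't know what you want to do afterwards, so I can't tell whether this is useful to you; I just want to point out one possibility. Needless to say, you can change the aggregation function to suit your needs. For example, you could put them in a list instead of pasting them together: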

dcast(df, ID ~ type, value.var="entity_id", fun.aggregate = list)
  ID    brother husband        son wife
1 A1 1003, 1005    1001 1004, 1006 1002
2 A2               2005       2006     
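
As one more illustration of swapping the aggregation function, the sketch below passes length, which counts the entries per ID/type pair. This is the same function dcast falls back to when duplicates are present and no fun.aggregate is given. The data are rebuilt by hand and the name counts is only for illustration.

```r
library(reshape2)

# Revised data with two brothers and two sons for A1.
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1005, 1004, 1006, 2005, 2006),
  type      = c("husband", "wife", "brother", "brother",
                "son", "son", "husband", "son"),
  stringsAsFactors = FALSE
)

# Count how many entities of each type belong to each ID.
counts <- dcast(df, ID ~ type, value.var = "entity_id",
                fun.aggregate = length)
counts
```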

Solution 2: add columns:

library(dplyr)
new.df <- df %>% group_by(ID, type) %>% 
                 mutate(type_num = paste(type, 1:n(), sep="_"))   
dcast(new.df, ID ~ type_num, value.var="entity_id")
  ID brother_1 brother_2 husband_1 son_1 son_2 wife_1
1 A1      1003      1005      1001  1004  1006   1002
2 A2        NA        NA      2005  2006    NA     NA
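
If dplyr is not available, the numbering step of Solution 2 could probably be done in base R with ave(). The following is a sketch of that variant rather than the answer's own code; it assumes reshape2 is installed and rebuilds the data by hand.

```r
library(reshape2)

# Revised data with two brothers and two sons for A1.
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1005, 1004, 1006, 2005, 2006),
  type      = c("husband", "wife", "brother", "brother",
                "son", "son", "husband", "son"),
  stringsAsFactors = FALSE
)

# Number the rows within each ID/type group: 1, 2, ...
within_group <- ave(seq_along(df$type), df$ID, df$type, FUN = seq_along)

# Build unique column labels such as brother_1, brother_2.
df$type_num <- paste(df$type, within_group, sep = "_")

# Every ID/type_num pair is now unique, so no aggregation is needed.
wide <- dcast(df, ID ~ type_num, value.var = "entity_id")
wide
```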

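Answer 1 (score: 0)

This was a major oversight on my part, but I am posting it for the benefit of people like me in the future.

The problem above only occurs when there are multiple entries of the same type. Unlike the example above, my actual data looks like this: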

ID  entity_id  type
A1  1001       husband
A1  1002       wife
A1  1003       brother
A1  1005       brother
A1  1004       son
A1  1006       son
A2  2005       husband
A2  2006       son

Note that there are two sons and two brothers.

Since 'dcast' can't figure out how to resolve this, it ends up creating another row.
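
One way to spot this situation before casting is to count how often each ID/type pair occurs. The sketch below uses only base R; the names key_counts and dupes are hypothetical and the data are rebuilt by hand.

```r
# Actual data with two brothers and two sons for A1.
df <- data.frame(
  ID        = c("A1", "A1", "A1", "A1", "A1", "A1", "A2", "A2"),
  entity_id = c(1001, 1002, 1003, 1005, 1004, 1006, 2005, 2006),
  type      = c("husband", "wife", "brother", "brother",
                "son", "son", "husband", "son"),
  stringsAsFactors = FALSE
)

# Cross-tabulate ID against type and flatten the table to one row per pair.
key_counts <- as.data.frame(table(ID = df$ID, type = df$type),
                            stringsAsFactors = FALSE)

# Pairs that occur more than once cannot be cast without aggregation.
dupes <- key_counts[key_counts$Freq > 1, ]
dupes
```

If dupes has no rows, a plain dcast(df, ID ~ type, value.var = "entity_id") should give one row per ID without any aggregation.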