I have a data frame (df) in long (tall) format, like this:
Input:
ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1004 son
A2 2005 husband
A2 2006 son
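I want it in wide format, so I did the following.
Because reshape2's dcast can't handle duplicates (it falls back to counting them), I added a dummy column: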
library(reshape2)
df$dummy <- seq_len(nrow(df))
df_wide <- dcast(df, dummy + ID ~ type, value.var="entity_id")
This is what I get:
dummy ID husband wife brother son
1 A1 1001 NA NA NA
2 A1 NA 1002 NA NA
3 A1 NA NA 1003 NA
What I want:
dummy ID husband wife brother son
1 A1 1001 1002 1003 1004
2 A2 2005 NA NA 2006
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidyr_0.4.1 reshape2_1.4.1 dplyr_0.4.3 RMySQL_0.10.8 DBI_0.3.1
loaded via a namespace (and not attached):
[1] plyr_1.8.3 magrittr_1.5 R6_2.1.2 assertthat_0.1 parallel_3.2.4 tools_3.2.4 Rcpp_0.12.4 stringi_1.0-1 stringr_1.0.0
Answer 0 (score: 1)
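I'm not sure I fully understand why you add a dummy column (I assume you meant to write df$dummy rather than df_dummy). But the following seems to give the result you're looking for: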
library(reshape2)
df <- read.delim(text="ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1004 son
A2 2005 husband
A2 2006 son", sep="")
dcast(df, ID ~ type, value.var="entity_id")
ID brother husband son wife
1 A1 1003 1001 1004 1002
2 A2 NA 2005 2006 NA
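Edit: Based on your updated data, which has several brothers and sons, I'd suggest the following (assuming you still want everything for one ID on a single row):
Solution 1: put everything into one cell: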
df <- read.delim(text="ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son", sep="")
dcast(df, ID ~ type, value.var="entity_id",
fun.aggregate = function(...) paste0(..., collapse = "_"))
  ID   brother husband       son wife
1 A1 1003_1005    1001 1004_1006 1002
2 A2              2005      2006
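Here I aggregate multiple instances by pasting the IDs together. I don't know what you want to do with the result afterwards, so I can't tell whether this is useful to you; I just want to point out one possibility. You can, of course, change the aggregation function to suit your needs. For example, instead of pasting the IDs together, you can put them into a list: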
dcast(df, ID ~ type, value.var="entity_id", fun.aggregate = list)
  ID    brother husband        son wife
1 A1 1003, 1005    1001 1004, 1006 1002
2 A2               2005       2006
Solution 2: number the instances so that each one gets its own column:
library(dplyr)
new.df <- df %>% group_by(ID, type) %>%
mutate(type_num = paste(type, 1:n(), sep="_"))
dcast(new.df, ID ~ type_num, value.var="entity_id")
ID brother_1 brother_2 husband_1 son_1 son_2 wife_1
1 A1 1003 1005 1001 1004 1006 1002
2 A2 NA NA 2005 2006 NA NA
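If you'd rather not load dplyr just to build the counter, the same per-(ID, type) numbering can be sketched in base R with ave(). This is an alternative to the dplyr step above, not part of the original answer, and it assumes the same updated data with two brothers and two sons.

```r
library(reshape2)

# Updated data: A1 has two brothers and two sons
df <- read.table(text = "ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son", header = TRUE, stringsAsFactors = FALSE)

# ave() applies seq_along within each (ID, type) group, giving 1, 2, ...
# for repeated types; paste() turns that into names such as "brother_2"
df$type_num <- paste(df$type,
                     ave(seq_along(df$type), df$ID, df$type, FUN = seq_along),
                     sep = "_")

# Every (ID, type_num) pair is now unique, so dcast needs no aggregation
wide <- dcast(df, ID ~ type_num, value.var = "entity_id")
print(wide)
```

Because each numbered type occurs at most once per ID, missing combinations are simply filled with NA, just as in the dplyr version.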
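Answer 1 (score: 0)
A major oversight on my part, but for the benefit of future people like me:
The problem above only occurs when there are multiple entries of the same type. My example above didn't show this; my actual data looks like this: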
ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son
Note that there are two sons and two brothers.
Since dcast can't figure out how to resolve this, it ends up creating another row.
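To see what dcast does with the duplicated types when no dummy column is used, here is a minimal sketch (assuming reshape2 is installed; the data is the updated example above). With no fun.aggregate, dcast cannot put two IDs into one cell, so it prints the message "Aggregation function missing: defaulting to length" and fills the table with counts instead of IDs. A per-row dummy column avoids that aggregation only by making every input row its own output row.

```r
library(reshape2)

# Updated data: A1 has two brothers and two sons
df <- read.table(text = "ID entity_id type
A1 1001 husband
A1 1002 wife
A1 1003 brother
A1 1005 brother
A1 1004 son
A1 1006 son
A2 2005 husband
A2 2006 son", header = TRUE, stringsAsFactors = FALSE)

# No fun.aggregate: duplicate (ID, type) pairs make dcast fall back to length(),
# so each cell holds the number of entities rather than their IDs
counts <- dcast(df, ID ~ type, value.var = "entity_id")
print(counts)
```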