Question

I have a dataset like this:

Pt    EVENT
123    GGG
123    Nor
123    tre
144    GGG
1667   tre
1667   Nor
1667   tre

I am trying to prepare data for a Sankey diagram, and to do that I need to reshape the data into the following form:

Pt    
123   GGG      Nor   tre
144   GGG
1667  tre      Nor   tre

From there I ultimately want to end up with a source, target, value format like this:

source    target   value
 GGG        Nor       1
 GGG                  1
 tre        tre       1
 Nor        tre       2

The part I don't understand is how to get from the original dataset to the second one. I thought I could do it with dplyr, but no joy:

  Sankey<-EndoSubset %>%
      group_by(Pt) %>% 
      select(t(EVENT))

Answer 1

This can be done with reshape() by synthesizing a time column that numbers the rows within each Pt:

reshape(cbind(df, time=ave(seq_len(nrow(df)), df$Pt, FUN=seq_along)), direction='wide', idvar='Pt')
##     Pt EVENT.1 EVENT.2 EVENT.3
## 1  123     GGG     Nor     tre
## 4  144     GGG    <NA>    <NA>
## 5 1667     tre     Nor     tre

Data

df <- data.frame(Pt=c(123L,123L,123L,144L,1667L,1667L,1667L), EVENT=c('GGG','Nor','tre','GGG','tre','Nor','tre'), stringsAsFactors=FALSE)
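To make the one-liner above easier to follow, here is a sketch that splits it into two steps: first build the within-group counter, then reshape. The intermediate names `time` and `wide` are only illustrative; the data is the sample from the question.

```r
# Sample data copied from the question
df <- data.frame(
  Pt = c(123L, 123L, 123L, 144L, 1667L, 1667L, 1667L),
  EVENT = c("GGG", "Nor", "tre", "GGG", "tre", "Nor", "tre"),
  stringsAsFactors = FALSE
)

# Step 1: number the rows within each Pt.
# ave() applies seq_along() separately to each group of row indices.
df$time <- ave(seq_len(nrow(df)), df$Pt, FUN = seq_along)

# Step 2: reshape() uses `time` to decide which wide column
# (EVENT.1, EVENT.2, ...) each EVENT value lands in.
wide <- reshape(df, direction = "wide", idvar = "Pt", timevar = "time")
print(wide)
```

Groups with fewer events than the longest group are padded with NA, which is why Pt 144 has only EVENT.1 filled in.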

Answer 2

We can use data.table:

 library(data.table)
 dcast(setDT(df1), Pt~rowid(Pt), value.var="EVENT")
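For readers who want to run this directly, the following is a self-contained sketch of the same idea using the sample data from the question. It assumes a data.table version that provides `rowid()`; the names `dt` and `wide` are illustrative.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  Pt = c(123L, 123L, 123L, 144L, 1667L, 1667L, 1667L),
  EVENT = c("GGG", "Nor", "tre", "GGG", "tre", "Nor", "tre")
)

# rowid(Pt) numbers the rows within each Pt value,
# playing the same role as a synthesized "time" column.
print(rowid(dt$Pt))

# Cast to wide: one row per Pt, one column per within-group position.
wide <- dcast(dt, Pt ~ rowid(Pt), value.var = "EVENT")
print(wide)
```

The wide columns are named after the position numbers, so they may need renaming before further processing.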

Answer 3

Here is a dplyr and tidyr solution:

library(dplyr)
library(tidyr)

data %>%
     group_by(Pt) %>%
     mutate(rn = 1:n()) %>%
     ungroup %>%
     spread(rn, EVENT)
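In current tidyr releases `spread()` is superseded by `pivot_wider()`. A sketch of the same approach with the newer function, assuming tidyr 1.0.0 or later and using the sample data from the question (the `EVENT_` prefix and the name `wide` are illustrative choices):

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  Pt = c(123L, 123L, 123L, 144L, 1667L, 1667L, 1667L),
  EVENT = c("GGG", "Nor", "tre", "GGG", "tre", "Nor", "tre"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(Pt) %>%
  mutate(rn = row_number()) %>%   # position of each event within its Pt
  ungroup() %>%
  pivot_wider(names_from = rn, values_from = EVENT, names_prefix = "EVENT_")

print(wide)
```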

Answer 4

Another option:

library(data.table)
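# lapply() always returns a list; sapply() would simplify to a matrix
# when every Pt has the same number of events, which breaks rbindlist()
l <- lapply(unique(df$Pt), function(x) data.frame(rbind(c(x, df[df$Pt == x, ]$EVENT))))
rbindlist(l, fill = TRUE)

#      X1  X2  X3  X4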
# 1:  123 GGG Nor tre
# 2:  144 GGG  NA  NA
# 3: 1667 tre Nor tre

Data

df <- structure(list(Pt = c(123L, 123L, 123L, 144L, 1667L, 1667L, 1667L ), EVENT = c("GGG", "Nor", "tre", "GGG", "tre", "Nor", "tre")), .Names = c("Pt", "EVENT"), row.names = c(NA, -7L), class = "data.frame")

How to transpose a column and group at the same time