How to merge rows of different lengths by an ID variable in R while preserving the sequence

Time: 2017-06-30 03:42:31

Tags: r

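I am working from a CSV that describes different events (labeled "A", "B", "C" below). I am interested in the order of behaviors within an event ("x", "y", "z" below), but a single event may be split across more than one row. For the question I am interested in, I want the complete sequence of behaviors for each event arranged in a single row. I am having trouble figuring out how to do this in R.

This is what my data looks like: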

  | Behavior 1 | Behavior 2 | Behavior 3 | Behavior 4 | Behavior 5
A | x          | x          |            |            |
A | y          |            |            |            |
B | y          | x          |            |            |
C | y          | z          | x          |            |
C | x          |            |            |            |

This is what I want the data to look like:

  | Behavior 1 | Behavior 2 | Behavior 3 | Behavior 4 | Behavior 5
A | x          | x          | y          |            |
B | y          | x          |            |            |
C | y          | z          | x          | x          |

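Thanks in advance for your help!

2 Answers:

Answer 0 (score: 1)

I suggest getting familiar with the dplyr and tidyr packages in R, as they are relatively easy to use. Grab the data wrangling cheat sheet at https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf, and then here you go: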

event <- c("A", "A", "B", "C", "C")
behavior1 <- c("x", "y", "y", "y", "x")
behavior2 <- c("x", "", "x", "z", "")
behavior3 <- c("", "", "", "x", "")
behavior4 <- c("", "", "", "", "")
behavior5 <- c("", "", "", "", "")
df <- data.frame(event, behavior1, behavior2, behavior3, behavior4, behavior5, stringsAsFactors = FALSE)
df

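library(tidyr)
library(dplyr)
# remember each row's original position; gather() stacks whole columns,
# so without it the within-event order (row first, then column) is lost
df1 <- df %>% mutate(row = row_number())
# make the table long: one row per (original row, behavior column) pair
df2 <- gather(df1, behavior, outcome, -event, -row)
df2
# remove empty cells, then order by event, original row, and behavior column
df3 <- df2 %>% filter(outcome != "") %>% arrange(event, row, behavior)
df3
# number the behaviors within each event
df4 <- df3 %>% group_by(event) %>% mutate(t = row_number(), behavior_new = paste("Behavior", t)) %>% ungroup()
df4

# drop the old behavior column and the helper columns
df5 <- df4 %>% select(-behavior, -t, -row)
df5

# spread the behaviors out into columns again
spread(df5, behavior_new, outcome)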
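
For readers on a newer tidyr, the same reshaping can be sketched with pivot_longer() and pivot_wider(), which supersede gather() and spread(). This is only a sketch and assumes tidyr 1.0.0 or later; the object name wide and the trimmed three-column example data are illustrative rather than part of the original answer. pivot_longer() emits the cells of each original row together, so the within-event order comes out row first, then column.

```r
library(dplyr)
library(tidyr)

# Illustrative data: same events as above, trimmed to three behavior columns
df <- data.frame(
  event     = c("A", "A", "B", "C", "C"),
  behavior1 = c("x", "y", "y", "y", "x"),
  behavior2 = c("x", "",  "x", "z", ""),
  behavior3 = c("",  "",  "",  "x", ""),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # long format, one row per cell, ordered by original row and then column
  pivot_longer(-event, names_to = "behavior", values_to = "outcome") %>%
  # drop the empty cells
  filter(outcome != "") %>%
  # renumber the remaining behaviors within each event
  group_by(event) %>%
  mutate(behavior_new = paste("Behavior", row_number())) %>%
  ungroup() %>%
  select(-behavior) %>%
  # back to one row per event; missing positions become NA
  pivot_wider(names_from = behavior_new, values_from = outcome)

wide
```

Events with shorter sequences are padded with NA rather than empty strings; replace them afterwards if blanks are preferred.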

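PS: For your next question, please have a look at the question How to make a great R reproducible example? and its first answer, to learn how to ask in a better way.

Answer 1 (score: 1)

Alternatively, if for any reason you would rather avoid additional packages, you can do it like this: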

beh <- matrix( c("A", "A", "B", "C", "C",
    "x", "y", "y", "y", "x",
    "x", NA, "x", "z", NA,
    NA, NA, NA, "x", NA,
    NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA),
    ncol=6)

ret.list <- list()
events <- unique(beh[,1])

# collect each event's behaviors row by row, dropping the NAs
for(evt in events)
{   
    sel <- beh[,1] == evt
    # drop=FALSE keeps a matrix even when the event has a single row
    evt.seq <- na.omit(as.vector(t(beh[sel, -1, drop=FALSE])))
    ret.list[[evt]] <- as.vector(evt.seq)
}

# if you want a matrix instead:
max.beh <- max(lengths(ret.list))

ret.mat <- matrix(NA, nrow=length(events), ncol=max.beh)
for(i in seq_along(events))
{
    evt.beh <- ret.list[[events[i]]]
    # seq_along() is safe even if an event has no behaviors at all
    ret.mat[i, seq_along(evt.beh)] <- evt.beh
}

rownames(ret.mat) <- events
colnames(ret.mat) <- paste("Behavior", 1:max.beh)

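This simply loops over the events and, for each one, collects the behavior labels from its rows into a list item, dropping the NAs. If you want a matrix, its dimensions are determined by the longest item in ret.list and the number of unique events. The list items are then written into the appropriate rows.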
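
The same idea can also be sketched without explicit loops by starting from a data frame, for example one read from the CSV, and using split() and lapply(). This is a minimal sketch under stated assumptions: the names df, seqs, n and out and the trimmed example data are illustrative and not from the answer above, and blank cells are assumed to be either empty strings or NA.

```r
# Illustrative data frame: event ID first, then the behavior columns
df <- data.frame(
  event     = c("A", "A", "B", "C", "C"),
  behavior1 = c("x", "y", "y", "y", "x"),
  behavior2 = c("x", NA,  "x", "z", ""),
  behavior3 = c(NA,  NA,  NA,  "x", NA),
  stringsAsFactors = FALSE
)

# One sequence per event: read each event's rows left to right,
# top to bottom, and drop blank cells (NA or empty string)
seqs <- lapply(split(df[-1], df$event), function(d) {
  v <- as.vector(t(as.matrix(d)))
  v[!is.na(v) & v != ""]
})

# Pad every sequence with NA to the longest length and stack them as rows
n <- max(lengths(seqs))
out <- do.call(rbind, lapply(seqs, function(v) c(v, rep(NA, n - length(v)))))
colnames(out) <- paste("Behavior", seq_len(n))

out
```

do.call(rbind, ...) is used instead of sapply() so that the result is a matrix with one row per event even when every sequence has length one.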