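I am working from a CSV that describes different events (labeled "A", "B", "C" below). I am interested in the order of behaviors within each event ("x", "y", "z" below), but an event may be split across more than one row. For the question I am interested in, I want the complete sequence of behaviors for each event arranged in a single row. I am having trouble figuring out how to do this in R.
This is what my data looks like: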
Event | Behavior 1 | Behavior 2 | Behavior 3 | Behavior 4 | Behavior 5
A | x | x | | |
A | y | | | |
B | y | x | | |
C | y | z | x | |
C | x | | | |
This is what I want the data to look like:
Event | Behavior 1 | Behavior 2 | Behavior 3 | Behavior 4 | Behavior 5
A | x | x | y | |
B | y | x | | |
C | y | z | x | x |
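Thanks in advance for your help!
Answer 0: (score: 1)
I suggest you get familiar with the dplyr and tidyr packages in R, as they are relatively easy to use. Grab the data wrangling cheat sheet at https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf, and then here you go: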
event <- c("A", "A", "B", "C", "C")
behavior1 <- c("x", "y", "y", "y", "x")
behavior2 <- c("x", "", "x", "z", "")
behavior3 <- c("", "", "", "x", "")
behavior4 <- c("", "", "", "", "")
behavior5 <- c("", "", "", "", "")
df <- data.frame(event, behavior1, behavior2, behavior3, behavior4, behavior5, stringsAsFactors = FALSE)
df
library(tidyr)
library(dplyr)
# remember the original row order, then reshape the table to long format
df2 <- df %>% mutate(row = row_number()) %>% gather(behavior, outcome, -event, -row)
df2
# remove empty cells and sort so the behaviors keep their original order within each event
df3 <- df2 %>% filter(outcome != "") %>% arrange(event, row, behavior)
df3
# number the behaviors within each event
df4 <- df3 %>% group_by(event) %>% mutate(t = row_number(), behavior_new = paste("Behavior", t))
df4
# drop the old behavior column and the helper columns
df5 <- df4 %>% select(-behavior, -t, -row)
df5
# spread the behaviors out into columns again
spread(df5, behavior_new, outcome)
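For what it's worth, the same reshaping can probably be written more compactly with the newer tidyr verbs pivot_longer() and pivot_wider(). This is only a sketch, assuming tidyr 1.0.0 or later is installed; the data frame below is a trimmed copy of the example data (the two all-empty columns are left out), and the name `wide` is just an illustration. Because pivot_longer() emits the cells of each input row together, the behaviors stay in their original order within each event.

```r
library(dplyr)
library(tidyr)

# trimmed copy of the example data (assumption: empty cells are "")
df <- data.frame(
  event     = c("A", "A", "B", "C", "C"),
  behavior1 = c("x", "y", "y", "y", "x"),
  behavior2 = c("x", "",  "x", "z", ""),
  behavior3 = c("",  "",  "",  "x", ""),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # long format: one row per cell, input rows kept together
  pivot_longer(-event, names_to = "behavior", values_to = "outcome") %>%
  # drop the empty cells
  filter(outcome != "") %>%
  # number the remaining behaviors within each event
  group_by(event) %>%
  mutate(behavior_new = paste("Behavior", row_number())) %>%
  ungroup() %>%
  select(-behavior) %>%
  # back to one row per event; missing positions become NA
  pivot_wider(names_from = behavior_new, values_from = outcome)

wide
```

Events with fewer behaviors than the longest one end up with NA in the trailing columns rather than empty strings.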
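PS: For your next question, please take a look at the question "How to make a great R reproducible example?" and its first answer, to ask in a better way.
Answer 1: (score: 1)
Alternatively, if for any reason you want to avoid using additional packages, you can do it like this: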
# events in column 1, behaviors in columns 2-6 (the matrix is filled column by column)
beh <- matrix(c("A", "A", "B", "C", "C",
                "x", "y", "y", "y", "x",
                "x", NA, "x", "z", NA,
                NA, NA, NA, "x", NA,
                NA, NA, NA, NA, NA,
                NA, NA, NA, NA, NA),
              ncol = 6)
ret.list <- list()
events <- unique(beh[, 1])
for (evt in events)
{
  sel <- beh[, 1] == evt
  # transpose so the behaviors are read row by row, then drop the NAs
  row <- na.omit(as.vector(t(beh[sel, -1, drop = FALSE])))
  ret.list[[evt]] <- as.vector(row)
}
# if you want a matrix instead:
max.beh <- max(lengths(ret.list))
ret.mat <- matrix(NA_character_, nrow = length(events), ncol = max.beh)
for (i in seq_along(events))
{
  evt.beh <- ret.list[[events[i]]]
  # seq_along() also copes with an event that has no behaviors at all
  ret.mat[i, seq_along(evt.beh)] <- evt.beh
}
rownames(ret.mat) <- events
colnames(ret.mat) <- paste("Behavior", seq_len(max.beh))
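This simply iterates over the events and pastes the labels from each event's rows into a list item, dropping the NAs. If you want a matrix, its dimensions are determined by the longest item in ret.list and the number of unique events. The list items are then pasted into the appropriate rows.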
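If the explicit loops feel heavy, the same idea (collect the labels per event, then pad to the longest sequence) might be sketched with split() and lapply(), still in base R only. The names `seqs` and `max.len` are made up for this illustration, and the matrix is a trimmed copy of the example data. Note that split() orders the events alphabetically rather than by first appearance.

```r
# trimmed copy of the example data: events in column 1, behaviors in columns 2-4
beh <- matrix(c("A", "A", "B", "C", "C",
                "x", "y", "y", "y", "x",
                "x", NA,  "x", "z", NA,
                NA,  NA,  NA,  "x", NA),
              ncol = 4)

# one vector of behaviors per event, read row by row, NAs dropped
seqs <- lapply(split(seq_len(nrow(beh)), beh[, 1]), function(rows) {
  vals <- as.vector(t(beh[rows, -1, drop = FALSE]))
  vals[!is.na(vals)]
})

# indexing past the end of a vector yields NA, which pads the short sequences
max.len <- max(lengths(seqs))
ret.mat <- do.call(rbind, lapply(seqs, function(s) s[seq_len(max.len)]))
colnames(ret.mat) <- paste("Behavior", seq_len(max.len))

ret.mat
```

The row names of ret.mat come from the names of the list, so no separate rownames() call is needed here.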