I need to count multiple variables in a dataset in one go using a pipe.
I used the following code:
#R
NonComp_Strat <- Minor_Behaviours %>%
  filter(Categories == "Non compliant with routine") %>%
  group_by(Strategies) %>%
  summarise(frequency = n())
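However, some cells in my data frame contain multiple entries separated by commas.
For example, it treats the behaviour entries "Disruptive" and "Disruptive, Off Task" as two different values.
Both entries in the data frame contain the variable I am looking for, but I don't know how to wrap the grep or grepl function into the pipe to count all the individual variables. There are more than 20 of them, and running 20+ separate grep calls sounds awful. Any help is greatly appreciated.
Thanks
Dan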
Answer 0 (score: 1)
You first have to split the comma-separated values and create a new row for each one. Then you can group_by as before:
library(splitstackshape)
df <- data.frame(id = c(1:4), Strategies = c("Disruptive", "Disruptive, Off Task", "Off Task", "Off Task, Interview"))
df
#   id           Strategies
# 1  1           Disruptive
# 2  2 Disruptive, Off Task
# 3  3             Off Task
# 4  4  Off Task, Interview
df <- cSplit(df, "Strategies", ",", "long")
df
#    id Strategies
# 1:  1 Disruptive
# 2:  2 Disruptive
# 3:  2   Off Task
# 4:  3   Off Task
# 5:  4   Off Task
# 6:  4  Interview
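To finish the count, the long data can be grouped the same way as in the question. Below is a minimal sketch on the same toy df. Note that it swaps in tidyr's separate_rows() for cSplit() so that everything stays in a single pipe; that substitution is an assumption of this sketch, not part of the answer above. The names counts and frequency are illustrative.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the answer above
df <- data.frame(
  id = 1:4,
  Strategies = c("Disruptive", "Disruptive, Off Task", "Off Task", "Off Task, Interview"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  # one row per comma-separated entry; the regex also swallows the space after the comma
  separate_rows(Strategies, sep = ",\\s*") %>%
  # equivalent to group_by(Strategies) %>% summarise(frequency = n())
  count(Strategies, name = "frequency")

counts
```

On this toy data the result has three rows: Disruptive 2, Interview 1, Off Task 3. In the original pipeline the separate_rows() step would presumably go right after the filter() call.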
Answer 1 (score: 0)
In a single dplyr and tidyr workflow:
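library(dplyr)
library(tidyr)

df %>%
  # split into at most 5 columns; pad short rows with NA, drop any pieces beyond the fifth
  separate(Strategies, paste("Strategies", 1:5, sep = "_"), sep = ",", extra = "drop", fill = "right") %>%
  # stack the 5 columns into a single Strategies column
  gather(Stacked, Strategies, Strategies_1:Strategies_5) %>%
  select(-Stacked) %>%
  na.omit() %>%
  # remove the space left behind after each comma
  mutate(Strategies = as.factor(trimws(Strategies))) %>%
  group_by(Strategies) %>%
  summarise(count = n())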
#   Strategies     count
#   <fct>          <int>
# 1 Brief Time Out     1
# 2 Detention          2
# 3 Disruptive         2
# 4 Interview          1
# 5 Off Task           1
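For comparison, the grepl route mentioned in the question does not need 20 separate calls either: the search terms can be looped over. This is only a sketch under assumptions: the vector terms is hypothetical and would have to list every strategy by hand, and the toy df is the one from Answer 0.

```r
# Toy data from Answer 0
df <- data.frame(
  id = 1:4,
  Strategies = c("Disruptive", "Disruptive, Off Task", "Off Task", "Off Task, Interview"),
  stringsAsFactors = FALSE
)

# Hypothetical list of the strategies to count
terms <- c("Disruptive", "Off Task", "Interview")

# For each term, count the rows whose Strategies cell contains it
term_counts <- vapply(
  terms,
  function(term) sum(grepl(term, df$Strategies, fixed = TRUE)),
  integer(1)
)

term_counts
```

This matches substrings, so a term such as "Task" would also be counted inside "Off Task". Splitting the cells first, as the answers here do, avoids that problem and does not require knowing the terms in advance.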
Answer 2 (score: 0)
More generally, we can write a split function that produces data reshape can work with.
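spltCol <- function(x) {
  # split each entry on a comma followed by an optional space
  l <- strsplit(as.character(x), ", ?")
  # pad every piece vector with NA up to the length of the longest one
  l <- lapply(l, function(y) c(y, rep(NA, max(lengths(l)) - length(y))))
  # bind the padded vectors row-wise: one column (V1, V2, ...) per position
  return(as.data.frame(do.call(rbind, l)))
}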
Example
df1
# id x z
# 1 1 alpha, beta, gamma 0.7281856
# 2 2 alpha, beta -0.3149730
# 3 3 alpha -2.6412875
# 4 4 <NA> 0.6412990
df12 <- data.frame(append(df1[-2], spltCol(df1$x)))
# id z V1 V2 V3
# 1 1 0.7281856 alpha beta gamma
# 2 2 -0.3149730 alpha beta <NA>
# 3 3 -2.6412875 alpha <NA> <NA>
# 4 4 0.6412990 <NA> <NA> <NA>
reshape(df12, direction="long", varying=cbind("V1", "V2", "V3"), v.names=names(df1)[2])
# id z time x
# 1.1 1 0.7281856 1 alpha
# 2.1 2 -0.3149730 1 alpha
# 3.1 3 -2.6412875 1 alpha
# 4.1 4 0.6412990 1 <NA>
# 1.2 1 0.7281856 2 beta
# 2.2 2 -0.3149730 2 beta
# 3.2 3 -2.6412875 2 <NA>
# 4.2 4 0.6412990 2 <NA>
# 1.3 1 0.7281856 3 gamma
# 2.3 2 -0.3149730 3 <NA>
# 3.3 3 -2.6412875 3 <NA>
# 4.3 4 0.6412990 3 <NA>
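If only the counts are needed, the reshape step can be skipped. The sketch below uses the same split pattern and tabulates the pieces directly in base R. The data frame is a simplified copy of df1 (character column instead of factor, z omitted), and the names pieces, counts and count_df are illustrative.

```r
# Simplified copy of df1: only the id and the comma-separated column
df1 <- data.frame(
  id = 1:4,
  x = c("alpha, beta, gamma", "alpha, beta", "alpha", NA),
  stringsAsFactors = FALSE
)

# Split every cell on a comma plus optional space and flatten to one vector
pieces <- unlist(strsplit(as.character(df1$x), ", ?"))

# table() ignores NA by default, so the empty row does not produce a count
counts <- table(trimws(pieces))

# Optional: convert to a data frame with a value column (Var1) and a count column
count_df <- as.data.frame(counts, responseName = "count", stringsAsFactors = FALSE)
count_df
```

Here alpha is counted 3 times, beta 2 times and gamma once.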
Data
df1 <- structure(list(id = 1:4, x = structure(c(3L, 2L, 1L, NA), .Label = c("alpha",
"alpha, beta", "alpha, beta, gamma"), class = "factor"), z = c(0.72818559355044,
-0.314973049072542, -2.64128753187138, 0.641298995312115)), class = "data.frame", row.names = c(NA,
-4L))