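I have a small data set of operator timings. Operators 1-6 are timed on their responses. I need to create a frequency table that summarizes their response times in 2-second intervals.
The data looks like this: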
Operator 1 24.5
Operator 1 26.3
Operator 1 32.9
Operator 1 33.4
Operator 1 40.5
Operator 1 47.7
The desired output looks like this:
Seconds Operator 1 Operator 2 Operator 3
0-2 0 2 5
3-4 1 5 3
5-6 5 0 4
Answer 0 (score: 0)
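I simulated some data that looks like yours to show you how to do it. You will need to install the tibble, magrittr, and dplyr packages for the pipe %>% and the functions to work.
Start with this: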
library(tibble)
library(magrittr)
library(dplyr)
# simulate data
ops <- sample(c("Operator 1","Operator 2","Operator 3"),100,replace=TRUE)
tms <- rnorm(100,mean=20,sd=4)
# as_tibble() replaces the deprecated as.tibble(); cbind() coerces both columns to character
df <- as_tibble(cbind(ops,tms))
df$ops <- as.factor(df$ops)
df$tms <- as.numeric(df$tms)
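Then sort the rows of df into the bins you define (change the values after breaks to get the intervals you want, based on the characteristics of your timing data):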
> results <- df %>% group_by(ops) %>%
mutate(category=cut(tms, breaks=c(-Inf,0,10,20,30,Inf),
labels=c("-Inf-0 sec","0-10 sec","10-20 sec","20-30 sec","30-Inf sec")))
> results
# A tibble: 100 x 3
# Groups: ops [3]
ops tms category
<fct> <dbl> <fct>
1 Operator 1 16.6 10-20 sec
2 Operator 2 25.1 20-30 sec
3 Operator 3 20.4 20-30 sec
4 Operator 1 19.7 10-20 sec
5 Operator 3 23.6 20-30 sec
6 Operator 3 22.6 20-30 sec
7 Operator 1 14.6 10-20 sec
8 Operator 3 19.6 10-20 sec
9 Operator 3 22.3 20-30 sec
10 Operator 2 18.1 10-20 sec
# ... with 90 more rows
You can view the data in the format you specified above like this:
> table(results$ops,results$category)
-Inf-0 sec 0-10 sec 10-20 sec 20-30 sec 30-Inf sec
Operator 1 0 0 24 13 1
Operator 2 0 0 13 13 0
Operator 3 0 0 12 24 0
or
> table(results$category,results$ops)
Operator 1 Operator 2 Operator 3
-Inf-0 sec 0 0 0
0-10 sec 0 0 0
10-20 sec 23 22 18
20-30 sec 12 13 12
30-Inf sec 0 0 0
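The breaks in the example above are typed by hand in 10-second steps. For the 2-second intervals the question asks for, they can be generated with seq() instead. This is a minimal sketch using the Operator 1 values from the question; the 0 and 60 limits and the right = FALSE choice are my assumptions, so adjust them to your data:

```r
# Response times for Operator 1, copied from the question
tms <- c(24.5, 26.3, 32.9, 33.4, 40.5, 47.7)
ops <- rep("Operator 1", length(tms))

# 2-second break points; the 0 and 60 limits are assumptions for this example
breaks <- seq(0, 60, by = 2)

# right = FALSE makes each bin closed on the left: [24,26), [26,28), ...
category <- cut(tms, breaks = breaks, right = FALSE)

# cross-tabulate interval by operator, dropping intervals with no observations
tab <- table(droplevels(category), ops)
print(tab)
```

Any value outside the chosen limits becomes NA and is silently dropped by table(), so it is worth checking that sum(tab) equals the number of rows.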
Answer 1 (score: 0)
Using tidyverse and cutr::smart_cut, and borrowing @mysteRious's data:
Data
set.seed(1)
ops <- sample(c("Operator 1","Operator 2","Operator 3"),100,replace=TRUE)
tms <- rnorm(100,mean=20,sd=4)
# as_tibble() replaces the deprecated as.tibble(); it needs the tibble package loaded
df <- as_tibble(cbind(ops,tms))
df$ops <- as.factor(df$ops)
df$tms <- as.numeric(df$tms)
Solution:
library(tidyverse)
# devtools::install_github("moodymudskipper/cutr")
library(cutr)
df %>%
mutate(Seconds = smart_cut(
tms,list(2,0), "width", labels = ~paste0(.y[1], "-", .y[2]-1), open_end=TRUE)) %>%
count(ops, Seconds) %>%
spread(ops, n)
# # A tibble: 9 x 4
# Seconds `Operator 1` `Operator 2` `Operator 3`
# <ord> <int> <int> <int>
# 1 12-13 4 2 1
# 2 14-15 2 1 4
# 3 16-17 6 7 6
# 4 18-19 7 7 8
# 5 20-21 3 10 6
# 6 22-23 1 5 4
# 7 24-25 2 3 4
# 8 26-27 1 2 1
# 9 28-29 1 1 1
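If you would rather not install a package from GitHub, labels in the same "12-13" style can be built in base R with floor(). This is only a sketch: the data frame below is made up for illustration, and it assumes left-closed bins such as [12, 14), which may not match how smart_cut treats values that fall exactly on a boundary.

```r
# Made-up timings for two operators (illustrative values, not from the question)
df <- data.frame(
  ops = c("Operator 1", "Operator 1", "Operator 1", "Operator 2", "Operator 2"),
  tms = c(12.4, 13.9, 14.0, 12.1, 15.7)
)

width <- 2
# lower edge of the 2-second bin that contains each value: [12,14), [14,16), ...
lower <- floor(df$tms / width) * width
# label each bin by its integer range, e.g. "12-13"
df$Seconds <- paste0(lower, "-", lower + width - 1)

# one row per interval, one column per operator
wide <- table(Seconds = df$Seconds, Operator = df$ops)
print(wide)
```

Because the labels are character strings, they sort alphabetically ("10-11" before "8-9"); converting Seconds to a factor with levels ordered by the lower edge avoids that when the data spans one- and two-digit values.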
Answer 2 (score: 0)
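Here is a solution that uses base R's cut() function to create the intervals and the dcast() function from the reshape2 package to reshape from long to wide format while aggregating (counting):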
# create sample dataset
set.seed(123L)
n_row <- 100L
df <- data.frame(
ops = sample(c("Operator 1", "Operator 2", "Operator 3"), n_row, replace = TRUE),
tms = rnorm(n_row, mean = 20, sd = 4))
# define parameter
intval <- 2
# create pretty breaks depending on range of response times
breaks <- with(df,
  seq(floor(min(tms) / intval) * intval, max(tms) + intval, intval))
# reshape from long to wide format and aggregate by interval
library(reshape2)
dcast(df, cut(tms, breaks) ~ ops, length, value.var = "tms")
   cut(tms, breaks) Operator 1 Operator 2 Operator 3
1           (10,12]          1          0          1
2           (12,14]          1          4          1
3           (14,16]          2          4          3
4           (16,18]          5          7          3
5           (18,20]          9          3          9
6           (20,22]          5          9          7
7           (22,24]          5          2          4
8           (24,26]          3          2          3
9           (26,28]          1          2          1
10          (28,30]          1          1          1
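To see what the breaks expression produces, it can be worked through on the Operator 1 values from the question. The sketch below also shows one edge case that I am adding as a hypothetical: because cut() builds right-closed intervals by default, a response time exactly equal to the lowest break would fall outside every interval unless include.lowest = TRUE is set.

```r
# Operator 1 response times from the question
tms <- c(24.5, 26.3, 32.9, 33.4, 40.5, 47.7)
intval <- 2

# floor(24.5 / 2) * 2 = 24 is the first break; the sequence cannot exceed
# max(tms) + intval = 49.7, so the last break is 48
breaks <- seq(floor(min(tms) / intval) * intval, max(tms) + intval, intval)
print(breaks)

# count observations per interval; intervals are right-closed by default
counts <- table(cut(tms, breaks))
print(counts)

# hypothetical edge case: a value equal to the lowest break is left out
# unless include.lowest = TRUE is set
edge_default  <- cut(24, breaks)                         # NA
edge_included <- cut(24, breaks, include.lowest = TRUE)  # [24,26]
```

If your timings can land exactly on a multiple of intval, passing include.lowest = TRUE to cut() inside the dcast() formula is one way to keep that observation in the table.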