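My data looks like this:
I want to calculate, in R, the total number of passes for each position and, separately, the total number of failures for each position. How can I do this?
Here is the output of dput(head(data1)). PS: fail == fail lens_bubble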
Answer 0 (score: 1)
This kind of analysis is much easier when the data is tidy. You can read more about what that means here.
First, let's create an example dataset.
library(tidyverse)
set.seed(327)
dta <- tibble(
tool_number = 200:212,
Err = sample(c("pass", "fail"), size = 13, replace = TRUE),
pos01 = sample(c(0, 1), size = 13, replace = TRUE),
pos02 = sample(c(0, 1), size = 13, replace = TRUE),
pos03 = sample(c(0, 1), size = 13, replace = TRUE),
pos04 = sample(c(0, 1), size = 13, replace = TRUE),
pos05 = sample(c(0, 1), size = 13, replace = TRUE),
pos06 = sample(c(0, 1), size = 13, replace = TRUE),
pos07 = sample(c(0, 1), size = 13, replace = TRUE),
pos08 = sample(c(0, 1), size = 13, replace = TRUE),
pos09 = sample(c(0, 1), size = 13, replace = TRUE),
pos10 = sample(c(0, 1), size = 13, replace = TRUE),
date = sample(seq(
as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day"
), 13)
)
Next, use the gather() function to change the layout of the data so that it is tidy.
dta <- gather(dta, key = position, value = value, pos01:pos10)
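In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), which performs the same reshape. Below is a minimal sketch on a small made-up tibble (the names wide and long and all values are illustrative, not the asker's data). Note that pivot_longer() orders rows tool by tool rather than position by position, which does not affect the grouped sums computed next.

```r
library(tibble)
library(tidyr)

# Made-up data in the same wide layout as dta: one row per tool,
# an Err column, and one 0/1 column per position (illustrative only)
wide <- tibble(
  tool_number = 200:203,
  Err   = c("pass", "fail", "pass", "fail"),
  pos01 = c(1, 0, 1, 1),
  pos02 = c(0, 1, 1, 0)
)

# Stack the pos* columns into position/value pairs, equivalent to
# gather(wide, key = position, value = value, pos01:pos02)
long <- pivot_longer(wide, cols = pos01:pos02,
                     names_to = "position", values_to = "value")
long
```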
Now you can use the group_by() and summarise() functions to find the number of passes and failures for each position.
dta %>%
group_by(Err, position) %>%
summarise(count = sum(value))
# # A tibble: 20 x 3
# # Groups: Err [?]
# Err position count
# <chr> <chr> <dbl>
# 1 fail pos01 2
# 2 fail pos02 1
# 3 fail pos03 3
# 4 fail pos04 0
If you want the result to look more like the data you started with, you can spread() it.
dta %>%
group_by(Err, position) %>%
summarise(count = sum(value)) %>%
spread(key = Err, value = count)
# # A tibble: 10 x 3
# position fail pass
# * <chr> <dbl> <dbl>
# 1 pos01 2 4
# 2 pos02 1 2
# 3 pos03 3 4
# 4 pos04 0 5
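With current dplyr (1.0.0 or later) and tidyr, the whole answer can be written as one pipeline, using pivot_wider() in place of spread() and summarise(.groups = "drop") so the result comes back ungrouped. A sketch on the same kind of made-up data (wide and totals are illustrative names, and the position columns are assumed to start with "pos"):

```r
library(tibble)
library(dplyr)
library(tidyr)

# Made-up example data (illustrative only)
wide <- tibble(
  tool_number = 200:203,
  Err   = c("pass", "fail", "pass", "fail"),
  pos01 = c(1, 0, 1, 1),
  pos02 = c(0, 1, 1, 0)
)

totals <- wide %>%
  # long format: one row per tool and position
  pivot_longer(starts_with("pos"), names_to = "position", values_to = "value") %>%
  # sum the 0/1 values separately for pass and fail rows
  group_by(Err, position) %>%
  summarise(count = sum(value), .groups = "drop") %>%
  # one row per position, one column per Err value
  pivot_wider(names_from = Err, values_from = count)
totals
```

For this toy data the pass totals are 2 (pos01) and 1 (pos02), and the fail totals are 1 and 1.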
Answer 1 (score: 0)
I agree with Andrew about tidy data, but a base R solution would be
sapply(data1[, 3:12], function(x) sum(x[data1$Err == "pass"]))
and
sapply(data1[, 3:12], function(x) sum(x[data1$Err == "fail"]))
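Base R can also produce both totals in a single call with rowsum(), which sums every column within each level of a grouping vector. A minimal sketch, assuming the position columns are named pos01, pos02, … as in the first answer's example (with the asker's data, the 3:12 column indexing above would work as well). The data1 here and the helper name pos_cols are made up for illustration.

```r
# Made-up data in the question's layout (illustrative only)
data1 <- data.frame(
  tool_number = 200:203,
  Err   = c("pass", "fail", "pass", "fail"),
  pos01 = c(1, 0, 1, 1),
  pos02 = c(0, 1, 1, 0)
)

# Hypothetical helper: indices of the columns whose names start with "pos"
pos_cols <- grep("^pos", names(data1))

# Sum each position column within each Err value; the result has one row
# per Err value (sorted, so "fail" before "pass") and one column per position
totals <- rowsum(data1[, pos_cols], group = data1$Err)
totals
```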