Question

I have the following data table:

PIECE   SAMPLE  QC_CODE
1       1       1
2       1       NA  
3       2       2
4       2       4
5       2       NA
6       3       6
7       3       3
8       3       NA
9       4       6
10      4       NA

I want to count the number of QC_CODE values in each sample and return output like this:

SAMPLE    SAMPLE_SIZE    QC_CODE_COUNT
1         2              1
2         3              2
3         3              2
4         2              1

SAMPLE_SIZE is the number of pieces in each sample, and QC_CODE_COUNT is the count of all QC_CODE values that are not NA.

How would I solve this in R?

Answer 1

You can try

library(dplyr)
 df1 %>%
     group_by(SAMPLE) %>% 
     summarise(SAMPLE_SIZE=n(), QC_CODE_UNIT= sum(!is.na(QC_CODE)))

 #   SAMPLE SAMPLE_SIZE QC_CODE_UNIT
 #1      1           2            1
 #2      2           3            2
 #3      3           3            2
 #4      4           2            1
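
To see why `sum(!is.na(QC_CODE))` gives the count of non-NA codes, here is a minimal sketch on the QC_CODE values of SAMPLE 2 from the question. The vector name `x` is only for illustration and is not part of the answer above.

```r
# QC_CODE values for SAMPLE 2 in the question's table
x <- c(2L, 4L, NA)

n_pieces <- length(x)        # counts every piece, NA included: 3
not_na   <- !is.na(x)        # TRUE TRUE FALSE
n_codes  <- sum(not_na)      # TRUE is summed as 1, so this is 2

n_pieces
n_codes
```

Within `summarise()`, the same expression is evaluated once per SAMPLE group, while `n()` plays the role of `length(x)`.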

Or

library(data.table)
setDT(df1)[,list(SAMPLE_SIZE=.N, QC_CODE_UNIT=sum(!is.na(QC_CODE))), by=SAMPLE]

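Or using aggregate from base R (na.action=NULL keeps the rows where QC_CODE is NA, so they still count toward SAMPLE_SIZE)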

do.call(data.frame,aggregate(QC_CODE~SAMPLE, df1, na.action=NULL,
  FUN=function(x) c(SAMPLE_SIZE=length(x), QC_CODE_UNIT= sum(!is.na(x)))))
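
If the matrix-valued column returned by `aggregate` feels awkward, another base R route is to split the column by group first. This is only a sketch and not part of the original answer: the names `groups` and `res` are hypothetical, and the count column is named QC_CODE_COUNT as in the question rather than QC_CODE_UNIT.

```r
# Same data as in the question, rebuilt here so the block runs on its own
df1 <- data.frame(
  PIECE   = 1:10,
  SAMPLE  = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L),
  QC_CODE = c(1L, NA, 2L, 4L, NA, 6L, 3L, NA, 6L, NA)
)

# Split QC_CODE into one vector per SAMPLE; NA values stay in each group
groups <- split(df1$QC_CODE, df1$SAMPLE)

res <- data.frame(
  SAMPLE        = as.integer(names(groups)),
  SAMPLE_SIZE   = lengths(groups),   # pieces per sample, NA included
  QC_CODE_COUNT = vapply(groups, function(x) sum(!is.na(x)), integer(1)),
  row.names     = NULL
)
res
```

Because `split` never drops NA values from the data being split, there is no need for an `na.action` argument here.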

Data

df1 <- structure(list(PIECE = 1:10, SAMPLE = c(1L, 1L, 2L, 2L, 2L, 3L, 
 3L, 3L, 4L, 4L), QC_CODE = c(1L, NA, 2L, 4L, NA, 6L, 3L, NA, 
6L, NA)), .Names = c("PIECE", "SAMPLE", "QC_CODE"), class = "data.frame", 
row.names = c(NA, -10L))
