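Summary statistics of a variable based on another variable

Time: 2020-10-01 17:38:43

Tags: r structure summary

I am trying to find out how many X values each ID has, where some values are repeated, and then use that result to find the overall minimum, maximum, IQR and median: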

ID <- c("ID004", "ID004", "ID004", "ID004", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID009", "ID009", "ID009", "ID009", "ID009", "ID009", "ID020", "ID020")
D <- c("CMP-001", "CMP-001","CMP-001","CMP-001","CMP-001", "CMP-001","CMP-002", "CMP-002", "CMP-002", "CMP-003", "CMP-003", "CMP-003", "CMP-004", "CMP-004", "CMP-004", "CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-002", "CMP-002", "CMP-001", "CMP-001")
X <- c(3,3,3,3,1,1,3,3,3,1,1,1,4,4,4,4,4,4,4,2,2,2,2)
data <- data.frame(ID, D, X)

First we find how many X values each ID has:

ID       No. of X values
ID004          1
ID006          4
ID009          2
ID020          1

Then, based on this result, we should get the following:

                          Min     Median     Max     IQR
Number of X per ID        1       1.5        4       3-1

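I think we need to create a new variable that holds the number of X values for each ID, and then compute summary statistics of that new variable.

Thanks for your help.

1 answer:

Answer 0 (score: 0)

I hope this answers it: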

> library(dplyr)
> data %>% group_by(ID) %>% summarise(Min = min(X), Median = median(X), Max = max(X), IQR = IQR(X), No_of_X_values = length(rle(X)[[1]]))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 6
  ID      Min Median   Max   IQR No_of_X_values
  <chr> <dbl>  <dbl> <dbl> <dbl>          <int>
1 ID004     3      3     3   0                1
2 ID006     1      3     4   2.5              4
3 ID009     2      4     4   1.5              2
4 ID020     2      2     2   0                1
> 
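Note that `length(rle(X)[[1]])` counts runs of consecutive equal values, not distinct values. That is what produces 4 for ID006, whose X values form the runs 1, 3, 1, 4. The following base R sketch (it uses no dplyr; the names `runs` and `distinct` are only illustrative) compares the two ways of counting on the question's data:

```r
# Same ID and X vectors as in the question, written compactly with rep()
ID <- c(rep("ID004", 4), rep("ID006", 11), rep("ID009", 6), rep("ID020", 2))
X  <- c(3,3,3,3, 1,1,3,3,3,1,1,1,4,4,4, 4,4,4,4,2,2, 2,2)

# Number of runs of consecutive equal X values per ID (what rle() measures)
runs <- tapply(X, ID, function(x) length(rle(x)$lengths))

# Number of distinct X values per ID, ignoring order
distinct <- tapply(X, ID, function(x) length(unique(x)))

print(rbind(runs, distinct))
#          ID004 ID006 ID009 ID020
# runs         1     4     2     1
# distinct     1     3     2     1
```

The two counts differ only for ID006, where the value 1 occurs in two separate runs. Which one is appropriate depends on whether a repeated value under a different D should count again; the expected table in the question (4 for ID006) matches the run count.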

The IDs and the number of X values can be stored in a new data frame, from which summary statistics of the counts are obtained:

> x_values <- data %>% group_by(ID) %>% summarise(No_of_X_values = length(rle(X)[[1]]))
`summarise()` ungrouping output (override with `.groups` argument)
> x_values
# A tibble: 4 x 2
  ID    No_of_X_values
  <chr>          <int>
1 ID004              1
2 ID006              4
3 ID009              2
4 ID020              1
> summary(x_values$No_of_X_values)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0     1.0     1.5     2.0     2.5     4.0
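To get the single-row layout asked for in the question (Min, Median, Max, IQR), the counts can be summarised directly. This is a minimal base R sketch that hard-codes the counts from the output above; `n_runs`, `stats` and `q2` are illustrative names. With R's default quantile method (type 7) the quartiles are 1 and 2.5, so `IQR()` returns 1.5. The "3-1" in the question appears to correspond to a different quantile definition such as `type = 2`, which is an assumption about what the asker intended.

```r
# Number of X runs per ID, copied from the x_values output above
n_runs <- c(ID004 = 1, ID006 = 4, ID009 = 2, ID020 = 1)

# One-row summary in the layout requested in the question
stats <- data.frame(
  Min    = min(n_runs),
  Median = median(n_runs),
  Max    = max(n_runs),
  IQR    = IQR(n_runs),  # default type 7: 2.5 - 1 = 1.5
  row.names = "Number of X per ID"
)
print(stats)

# Quartiles under quantile type 2 (averaging at discontinuities)
q2 <- unname(quantile(n_runs, c(0.25, 0.75), type = 2))
print(q2)  # 1 3
```

`IQR()` also accepts a `type` argument, so `IQR(n_runs, type = 2)` gives the width of that interval as a single number.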