Question

I have a large dataset similar to this one:

   StudentID SectorID ClassID
1          A   Team_1 Class_1
2          A   Team_1 Class_1
3          B   Team_1 Class_1
4          B   Team_2 Class_1
5          B   Team_2 Class_1
6          A   Team_2 Class_1
7          A   Team_3 Class_1
8          C   Team_3 Class_2
9          C   Team_3 Class_2
10         C   Team_3 Class_2
11         C   Team_3 Class_2
12         C   Team_1 Class_2
13         D   Team_1 Class_2
14         D   Team_1 Class_2

It can be generated with:

stg <- data.frame(StudentID = c(rep("A", 2), rep("B", 3), rep("A", 2), rep("C", 5), rep("D", 2)),
                  SectorID  = c(rep("Team_1", 3), rep("Team_2", 3), rep("Team_3", 5), rep("Team_1", 3)),
                  ClassID   = c(rep("Class_1", 7), rep("Class_2", 7))
)

Then I tried to find the frequency of each StudentID, grouped first by sector and then by class.

library(plyr)  # count() here is plyr::count, which returns columns x and freq
stg.a <- aggregate(stg$StudentID, by = list(SectorID = stg$SectorID, ClassID = stg$ClassID), count)

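But here count returns some kind of complicated list. If you inspect stg.a, you get strange or plainly misleading output. So I converted it to a data frame by way of a matrix,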

stg.a.f <- as.data.frame(as.matrix(stg.a))

which looks like this:

  SectorID ClassID  x.x x.freq
1   Team_1 Class_1 1, 2   2, 1
2   Team_2 Class_1 1, 2   1, 2
3   Team_3 Class_1    1      1
4   Team_1 Class_2 3, 4   1, 2
5   Team_3 Class_2    3      4

The first row says that in Team_1, Class_1, student number 1 (ID: A) appears 2 times and student number 2 (ID: B) appears once.
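As a quick check of that reading, here is a minimal sketch that counts the StudentID values in that one group directly. It assumes the stg data frame defined above (repeated here so the snippet runs on its own); first_group and first_counts are names introduced only for this illustration.

```r
# Rebuild the example data so the snippet is self-contained
stg <- data.frame(
  StudentID = c(rep("A", 2), rep("B", 3), rep("A", 2), rep("C", 5), rep("D", 2)),
  SectorID  = c(rep("Team_1", 3), rep("Team_2", 3), rep("Team_3", 5), rep("Team_1", 3)),
  ClassID   = c(rep("Class_1", 7), rep("Class_2", 7))
)

# Keep only the rows that belong to Team_1 within Class_1
first_group <- subset(stg, SectorID == "Team_1" & ClassID == "Class_1")

# Count how often each StudentID occurs in that group
first_counts <- table(first_group$StudentID)
print(first_counts)
```

A appears twice and B once in that group, which matches the first row of the table above.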

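Now I want to display this as a plot, mainly a boxplot: on the Y axis I want to see the frequency (if possible, separated by color per StudentID, i.e. x.x), grouped by some factor (for example Team or Class).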

Answer 1

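I can see that treating the frequencies as a complex structure did not give the expected result. I suggest building a simple data.frame instead, by enumerating the combinations and recording the frequency of each. This can be done with the table function, as follows:

stg.a <- as.data.frame(table(stg$StudentID, stg$SectorID, stg$ClassID))
names(stg.a) <- c(colnames(stg), 'Freq')

Some combinations may have a frequency of zero. Depending on the analysis you need, it may or may not be worth keeping them. If removing the zero values is the best option, drop the rows where Freq == 0.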
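A minimal sketch of dropping the zero-frequency combinations, assuming the stg example data from the question (repeated here so the snippet runs on its own). The name stg.nz is only illustrative, and the subsetting shown is one common base R way to filter rows, not necessarily the exact command intended above.

```r
# Example data from the question
stg <- data.frame(
  StudentID = c(rep("A", 2), rep("B", 3), rep("A", 2), rep("C", 5), rep("D", 2)),
  SectorID  = c(rep("Team_1", 3), rep("Team_2", 3), rep("Team_3", 5), rep("Team_1", 3)),
  ClassID   = c(rep("Class_1", 7), rep("Class_2", 7))
)

# One row per StudentID x SectorID x ClassID combination, with its frequency
stg.a <- as.data.frame(table(stg$StudentID, stg$SectorID, stg$ClassID))
names(stg.a) <- c(colnames(stg), "Freq")

# Keep only the combinations that actually occur in the data
stg.nz <- stg.a[stg.a$Freq != 0, ]
print(stg.nz)
```

Whether to keep the zeros matters for a boxplot: leaving them in pulls every box toward zero, because most student/team/class combinations never occur in this data.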

This should give you a simpler basis for building the plots.

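If you need help with the plots, let me know. I would need a clearer description than the OP gives of what you want to show and how you want it displayed.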
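Pending that clarification, here is one possible reading as a rough sketch, not a definitive answer. It assumes the zero-frequency rows are dropped, that Freq goes on the Y axis with one box per SectorID, and that individual students are overlaid as colored points. It uses base graphics only; the names stg.nz, student_cols and bp, and the colors themselves, are arbitrary choices made for this illustration.

```r
# Example data from the question
stg <- data.frame(
  StudentID = c(rep("A", 2), rep("B", 3), rep("A", 2), rep("C", 5), rep("D", 2)),
  SectorID  = c(rep("Team_1", 3), rep("Team_2", 3), rep("Team_3", 5), rep("Team_1", 3)),
  ClassID   = c(rep("Class_1", 7), rep("Class_2", 7))
)

# Frequency table as a plain data frame, without the zero rows
stg.a <- as.data.frame(table(stg$StudentID, stg$SectorID, stg$ClassID))
names(stg.a) <- c(colnames(stg), "Freq")
stg.nz <- stg.a[stg.a$Freq != 0, ]

# One box per team, frequency on the Y axis
bp <- boxplot(Freq ~ SectorID, data = stg.nz,
              xlab = "SectorID", ylab = "Frequency")

# Overlay the individual frequencies, colored by student (arbitrary colors)
student_cols <- c(A = "red", B = "blue", C = "darkgreen", D = "orange")
points(jitter(as.integer(stg.nz$SectorID)), stg.nz$Freq,
       col = student_cols[as.character(stg.nz$StudentID)], pch = 19)
legend("topleft", legend = names(student_cols),
       col = student_cols, pch = 19, title = "StudentID")
```

To group by class as well, the formula can be changed to Freq ~ SectorID + ClassID, which draws one box per team/class combination.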

How do I draw a boxplot from frequencies grouped by multiple columns in R?
