I have a fairly large data frame and I am trying to split it into several smaller data frames. Suppose I have a data frame called df:
   Patient   Status    cancer
1        1  treated  melanoma
2        2 deceased  melanoma
3        3 deceased carcinoma
4        4  treated  lymphoma
5        5 deceased  melanoma
6        6  treated carcinoma
7        7 deceased  lymphoma
8        8 deceased carcinoma
9        9  treated  melanoma
10      10  treated  melanoma
I want to subset the data frame based on the "cancer" column and store each subset in its own object, like this:
  Patient   Status    cancer
1       3 deceased carcinoma
2       6  treated carcinoma
3       8 deceased carcinoma

  Patient   Status   cancer
1       1  treated melanoma
2       2 deceased melanoma
3       5 deceased melanoma
4       9  treated melanoma
5      10  treated melanoma

  Patient   Status   cancer
1       4  treated lymphoma
2       7 deceased lymphoma
I wrote the code below using dplyr's filter function. It does the job, but because my initial data frame is very large, the loop bogs down my computer:
library(dplyr)

# Create one data frame per level of the cancer factor in the global environment
factors = levels(df[, "cancer"])
for (i in factors) {
  assign(i, filter(df, cancer == i), envir = .GlobalEnv)
}
I would be grateful if anyone could suggest a more efficient alternative.
Best regards.
Answer 0: (score: 0)
If operations on your data frame are slow in general, consider switching to the data.table package. The performance gain may surprise you.
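As a rough sketch of what that could look like (not a benchmarked solution): data.table provides a split() method with a by argument that returns a named list holding one table per value of the column, so no explicit loop or assign() call is needed. The small df below just rebuilds the example from the question, and the name by_cancer is made up for illustration.

```r
library(data.table)

# Rebuild the example data from the question
df <- data.frame(
  Patient = 1:10,
  Status  = c("treated", "deceased", "deceased", "treated", "deceased",
              "treated", "deceased", "deceased", "treated", "treated"),
  cancer  = c("melanoma", "melanoma", "carcinoma", "lymphoma", "melanoma",
              "carcinoma", "lymphoma", "carcinoma", "melanoma", "melanoma")
)

# Convert once, then split into a named list with one data.table per cancer type
dt <- as.data.table(df)
by_cancer <- split(dt, by = "cancer")

# Each subset is reached by name, e.g. the carcinoma patients
by_cancer$carcinoma
```

Keeping the subsets in a list is usually easier to work with than separate global objects. If separate objects really are required, list2env(by_cancer, envir = .GlobalEnv) should create them in one call.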