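I would like to count the number of distinct "Type" values for each "Name" in the following data frame.
So far I have used a loop, which is probably bad practice in R. Do you know how I could improve the code?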
library(data.table) # for function 'as.data.table'
library(dplyr) # for function 'n_distinct'
original = data.frame(Name = c(rep(1,10),rep(2,10),rep(3,10)),
Type = c(1,2,1,3,1,2,1,2,3,1,4,5,4,5,4,5,4,5,4,5,6,7,8,9,6,7,8,9,6,9))
I need this data frame containing only the names so that I can fill it with all the relevant information derived from the data.
# creates a data table containing only one row per Name
onerow <- as.data.table(original) # from library 'data.table'
onerow <- unique(onerow, by = "Name")
# now transform 'onerow' to data frame and retain the column of interest ("Name")
onerow <- as.data.frame(onerow)
onerow <- as.data.frame(onerow[, 1])
names(onerow) <- "Name"
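For comparison, the names-only data frame built above can likely be produced in one line of base R, without converting to a data.table and back. This is a minimal sketch, assuming `original` is defined as in the question; `unique()` keeps the names in order of first appearance, which should match `unique(..., by = "Name")` here.

```r
# Example data from the question
original <- data.frame(Name = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                       Type = c(1,2,1,3,1,2,1,2,3,1,4,5,4,5,4,5,4,5,4,5,6,7,8,9,6,7,8,9,6,9))

# One row per Name, keeping only the Name column (no data.table round trip)
onerow <- data.frame(Name = unique(original$Name))
onerow
```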
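The loop is meant to count the number of distinct types for each name. My real data set will have more than 60 individuals (about 300 rows each, where each row is a recorded type), and the number of distinct types per individual ranges from 5 to 13.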
# ugly loop to determine for each "Name" the count of different "Type"
for (i in 1:max(original$Name)){
ssp <- assign(paste("SSP_", i, sep = ""), original[original$Name == i, ])
# 'n_distinct' is from library 'dplyr', equivalent to length(unique(ssp$Type)), but faster
cou <- assign(paste("count_", i, sep = ""), n_distinct(ssp$Type))
onerow[i, 2] <- cou
}
names(onerow) <- c("Name","Count")
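A loop-free alternative is sketched below in base R, using the formula interface of `aggregate()` to apply a "count distinct values" function to `Type` within each `Name` group. This is one possible approach, not necessarily the fastest for the full data set; renaming the result column to `Count` mirrors the question's output. For the example data it should give counts of 3, 2 and 4 for names 1, 2 and 3.

```r
# Example data from the question
original <- data.frame(Name = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                       Type = c(1,2,1,3,1,2,1,2,3,1,4,5,4,5,4,5,4,5,4,5,6,7,8,9,6,7,8,9,6,9))

# Count distinct Type values within each Name group
counts <- aggregate(Type ~ Name, data = original,
                    FUN = function(x) length(unique(x)))

# Rename the aggregated column to match the question's output
names(counts) <- c("Name", "Count")
counts
```

Since dplyr and data.table are already loaded in the question, equivalent one-liners would be `original %>% group_by(Name) %>% summarise(Count = n_distinct(Type))` with dplyr, or `as.data.table(original)[, .(Count = uniqueN(Type)), by = Name]` with data.table.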
Additional question: how can I avoid creating the variables 'count_i', 'cou' and 'ssp' in the global environment?
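One way to keep temporary variables out of the global environment is to do the work inside a function: objects created with `<-` inside a function body are local to that call and are discarded when it returns, so `assign()` is not needed. The sketch below keeps the loop structure of the question for comparison; `count_types_loop` is a hypothetical name. It also iterates over the actual `Name` values rather than `1:max(original$Name)`, since the latter assumes the names are consecutive integers starting at 1.

```r
# Example data from the question
original <- data.frame(Name = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                       Type = c(1,2,1,3,1,2,1,2,3,1,4,5,4,5,4,5,4,5,4,5,6,7,8,9,6,7,8,9,6,9))

# Hypothetical helper: count distinct Type values per Name without
# creating any temporary variables in the global environment
count_types_loop <- function(df) {
  name_values <- unique(df$Name)              # one entry per Name, in order of appearance
  counts <- integer(length(name_values))      # preallocated result vector
  for (i in seq_along(name_values)) {
    ssp <- df[df$Name == name_values[i], ]    # local to this function call
    counts[i] <- length(unique(ssp$Type))     # same result as dplyr::n_distinct(ssp$Type)
  }
  data.frame(Name = name_values, Count = counts)
}

onerow <- count_types_loop(original)
onerow
```

After this runs, only `original`, `count_types_loop` and `onerow` are left in the global environment.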