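Is there a vectorized solution to the following for loop? It runs over a large dataset containing admissions data for a healthcare institution.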
library(timeDate)  # provides timeFirstDayInMonth() and timeLastDayInMonth()
dateSeq <- as.Date(c("2015-01-01", "2015-02-01"))
admissionDate <- as.Date(c("2015-01-03", "2015-01-06", "2015-01-10", "2015-01-05", "2015-01-07", "2015-02-03", "2015-02-06"))
Dfactor <- c("elective", "acute", "elective", "acute", "acute", "elective", "acute")
Dfactor <- factor(Dfactor)
df <- data.frame(admissionDate, Dfactor)
# loop through large dataset collecting tabulated data from a factorised vector for each month (admissions date) based on 'dateSeq'
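Dfactorsums <- numeric(length(dateSeq))
for (i in seq_along(dateSeq)) {
  # keep only the admissions that fall within the month of dateSeq[i]
  monthStart <- as.Date(timeFirstDayInMonth(dateSeq[i]))
  monthEnd <- as.Date(timeLastDayInMonth(dateSeq[i]))
  monthSub <- df[df$admissionDate >= monthStart & df$admissionDate <= monthEnd, ]
  x <- table(monthSub$Dfactor)
  # x[1] is the count for the first factor level, "acute" (levels sort alphabetically)
  Dfactorsums[i] <- as.numeric(x[1])
}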
print(Dfactorsums)
# Outcome = [1] 3 1
# Question: is there a 'vectorized' solution that avoids the for loop?
Answer 0 (score: 1)
This isn't technically "vectorized", but it should do what you're after, and it should be fast.
library( data.table )
setDT( df )
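# the column is named admissionDate in the question's data frame
df[ , month := format( admissionDate, "%m" ) ]
# [1] picks the first factor level ("acute"), matching x[1] in the question's loop
df[ , table( Dfactor )[1], by = month ]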
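We add a month column so that it is easier to subset by month, then extract the value you want for each month. This should output a two-column data.table whose second column equals your Dfactorsums output vector.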
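For comparison, here is a minimal base R sketch of the same per-month count without an explicit loop. It is not the approach from the answer above, just one possible alternative; the names `ym`, `counts`, and `acuteByMonth` are made up for illustration. It assumes the months of interest are the ones listed in `dateSeq`, and it keys on year and month together so that the same month in different years is not pooled.

```r
# Rebuild the question's example data so the sketch runs on its own
dateSeq <- as.Date(c("2015-01-01", "2015-02-01"))
admissionDate <- as.Date(c("2015-01-03", "2015-01-06", "2015-01-10", "2015-01-05",
                           "2015-01-07", "2015-02-03", "2015-02-06"))
Dfactor <- factor(c("elective", "acute", "elective", "acute",
                    "acute", "elective", "acute"))
df <- data.frame(admissionDate, Dfactor)

# Year-month key per admission; the levels come from dateSeq, so the rows of the
# table follow dateSeq's order and months with no admissions still get a 0 count.
# Admissions outside those months become NA and are dropped by table().
ym <- factor(format(df$admissionDate, "%Y-%m"),
             levels = format(dateSeq, "%Y-%m"))

# One cross-tabulation replaces the loop: rows are months, columns are factor levels
counts <- table(ym, df$Dfactor)

# Selecting the column by name avoids relying on the level order
acuteByMonth <- as.numeric(counts[, "acute"])
print(acuteByMonth)
```

On the example data this gives the same values as the loop in the question (3 and 1). The full `counts` table also holds the "elective" counts, so no second pass is needed if both are wanted.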