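Vectorized solution to a for loop

Time: 2017-06-04 08:01:35

Tags: r

Is there a vectorized solution for the following for loop? It runs over a large dataset of admission data from a medical institution.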

EDITED

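library(lubridate)
library(timeDate)  # timeFirstDayInMonth() and timeLastDayInMonth() come from timeDate, not lubridate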

dateSeq  <- as.Date(c("2015-01-01", "2015-02-01"))

admissionDate  <- as.Date(c("2015-01-03", "2015-01-06", "2015-01-10", "2015-01-05", "2015-01-07", "2015-02-03", "2015-02-06"))
Dfactor  <- c("elective", "acute", "elective", "acute", "acute", "elective", "acute")
Dfactor  <- factor(Dfactor)
df  <- data.frame(admissionDate, Dfactor)
# loop through large dataset collecting tabulated data from a factorised vector for each month (admissions date) based on 'dateSeq'


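Dfactorsums  <- numeric(length(dateSeq))

for (i in seq_along(dateSeq)) {
    # first and last day of the month that contains dateSeq[i]
    monthStart  <- as.Date(timeFirstDayInMonth(dateSeq[i]))
    monthEnd  <- as.Date(timeLastDayInMonth(dateSeq[i]))
    monthSub  <- df[df$admissionDate >= monthStart & df$admissionDate <= monthEnd, ]
    x  <- table(monthSub$Dfactor)
    # x[1] is the count for the first factor level, "acute"
    Dfactorsums[i]  <- as.numeric(x[1])
}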

print(Dfactorsums)
# Outcome = [1] 3 1
# Question: rather than using a for loop, is there a 'vectorized' solution?

1 answer:

Answer 0 (score: 1)

This is not technically "vectorized", but it should do what you are after, and it should be fast.

library( data.table )
setDT( df )

df[ , month := format( admissionDate, "%m" ) ]  # the column is named admissionDate in the question
df[ , table( Dfactor )[1], by = month ]  # [1] is the first level ("acute"), matching x[1] in the question

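We add a month column so that it is easier to subset by month, then extract the desired value for each month. This should output a two-column data.table whose second column equals your Dfactorsums output vector. Note that "%m" keeps only the month number, so the same month in different years lands in one group; use "%Y-%m" if the data spans more than one year.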
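If you would rather stay in base R, the same grouping idea can be sketched with a single call to table(). This is only an illustration built on the sample data from the question, not the answer's original method; the names yearMonth and counts are made up for the example, and it assumes every entry of dateSeq stands for one calendar month.

```r
# Sample data copied from the question
dateSeq <- as.Date(c("2015-01-01", "2015-02-01"))
admissionDate <- as.Date(c("2015-01-03", "2015-01-06", "2015-01-10", "2015-01-05",
                           "2015-01-07", "2015-02-03", "2015-02-06"))
Dfactor <- factor(c("elective", "acute", "elective", "acute", "acute", "elective", "acute"))
df <- data.frame(admissionDate, Dfactor)

# Year-month key for every admission. Taking the levels from dateSeq keeps
# months without admissions (count 0) and drops dates outside dateSeq.
yearMonth <- factor(format(df$admissionDate, "%Y-%m"),
                    levels = format(dateSeq, "%Y-%m"))

# Cross-tabulate month by admission type in one call, with no explicit loop
counts <- table(yearMonth, df$Dfactor)

# The "acute" column corresponds to x[1] in the loop (first factor level)
Dfactorsums <- as.numeric(counts[, "acute"])
print(Dfactorsums)
# [1] 3 1
```

The counts table holds every level of Dfactor per month, so the other admission types are available from the same object (for example counts[, "elective"]) without another pass over the data.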