R - lapply() function not producing readable output

Posted: 2018-06-21 20:31:59

Tags: r dataframe time-series lapply

I have a large table with a lot of data in it, but the only relevant columns are serialNumber and date.

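My goal is to create a new table that gives me, for each serial number, the start date and end date of every run of consecutive days. Something like this: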

serialNumber,    minDate,      maxDate
1111,            2009-02-15,   2011-07-01
1111,            2014-09-01,   2015-04-12
1111,            2017-12-11,   NA
2222,            2016-07-11,   2018-07-01
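A minimal sketch of producing this layout in one pass, assuming the serialNumber and date columns can be pulled into R with a single query (for example `SELECT serialNumber, date FROM myTable`) and that date has already been converted with as.Date(). The sample data below is made up for illustration. It uses the same run-detection idea as the script further down (a new run starts wherever the day-to-day difference is not 1), but lets dplyr handle the per-serial grouping. It does not reproduce the NA in the last row of the example, because the rule for an open-ended run is not stated; maxDate is simply the last date observed in each run.

```r
library(dplyr)

# Hypothetical sample data standing in for the serialNumber/date columns of myTable
timeData <- data.frame(
  serialNumber = c(1111, 1111, 1111, 1111, 2222, 2222),
  date = as.Date(c("2009-02-15", "2009-02-16", "2009-02-16",
                   "2009-02-18", "2016-07-11", "2016-07-12"))
)

runs <- timeData %>%
  distinct(serialNumber, date) %>%       # drop duplicate rows for the same day
  arrange(serialNumber, date) %>%        # run detection needs dates in order
  group_by(serialNumber) %>%
  # run id increases whenever the gap to the previous date is not exactly 1 day
  mutate(run = cumsum(c(TRUE, diff(date) != 1))) %>%
  group_by(serialNumber, run) %>%
  summarise(minDate = min(date), maxDate = max(date), .groups = "drop") %>%
  select(-run)

print(runs)
```

With the sample data this gives three rows: 1111 from 2009-02-15 to 2009-02-16, 1111 on 2009-02-18 alone, and 2222 from 2016-07-11 to 2016-07-12. Fetching everything once also avoids issuing one query per serial number.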

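By running the snippet below I can get the data I need for one serial number at a time, but I'm stuck on getting the script to output the data for all serial numbers in the format above.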

Here is my script:

library(RMySQL)
library(dplyr)

# Connect to the database
db <- dbConnect(MySQL(), user=username, password=password,
                dbname='database', host='host')
# Empty results table with the desired columns
results = data.frame(serialNumber = numeric(), minDate = as.Date(numeric(), origin="1970-01-01"), maxDate = as.Date(numeric(), origin="1970-01-01"))

# Fetch the distinct serial numbers (returned as a one-column data frame)
queryUniqueSerialNumbers <- "SELECT DISTINCT(serialNumber) FROM myTable"
uniqueSerialNumbers <- dbGetQuery(db, queryUniqueSerialNumbers)

# For one serial number: fetch its rows and split its dates into runs of consecutive days
getTimeDataForGivenSerialNumber <- function(serialNumber) {
  # ORDER BY date so the run detection below sees the dates in sequence
  queryTimeData <- paste0("SELECT * FROM myTable WHERE serialNumber = ", serialNumber, " ORDER BY date")
  timeData <- dbGetQuery(db, queryTimeData)
  # Collapse consecutive duplicate dates into a single entry
  dateRanges <- as.vector(rle(timeData$date)$values)
  # Start a new group wherever the gap to the previous date is not exactly one day
  unbrokenRuns <- split(as.Date(dateRanges), cumsum(c(TRUE, diff(as.Date(dateRanges)) != 1L)))
  record <- createRecordOfTimeSpan(unbrokenRuns)
  serialNumbers <- as.list(rep(serialNumber, length(results)))
  results <- cbind(serialNumbers, record)
  return(results)
}

# Take the first and last date of every run
createRecordOfTimeSpan <- function(unbrokenRuns) {
  mins <- lapply(unbrokenRuns, min)
  maxs <- lapply(unbrokenRuns, max)
  record <- data.frame(minDate = mins, maxDate = maxs)
  return(record)
}

results <- as.data.frame(lapply(uniqueSerialNumbers, getTimeDataForGivenSerialNumber))
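A few things in the script likely explain the hard-to-read result. dbGetQuery() returns a data frame, and lapply() over a data frame iterates over its columns, so the function receives the whole serialNumber column in one call instead of one serial number at a time. data.frame(minDate = mins, ...) with a list argument turns each run into a separate column, and as.data.frame() on a list of data frames combines them column-wise instead of stacking them as rows. Below is a minimal sketch that keeps the one-serial-at-a-time structure: each call returns a small data frame with one row per run, lapply() iterates over a plain vector of serial numbers, and do.call(rbind, ...) stacks the pieces. To keep it runnable without a database, a hypothetical in-memory myTable replaces the MySQL table, and getRunsForSerialNumber is a hypothetical name; in the real script the filtering line would be the per-serial SQL query.

```r
# Hypothetical in-memory stand-in for myTable (the real script queries MySQL)
myTable <- data.frame(
  serialNumber = c(1111, 1111, 1111, 2222, 2222),
  date = c("2014-09-01", "2014-09-02", "2014-09-05", "2016-07-11", "2016-07-12"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per unbroken run of days for a single serial number
getRunsForSerialNumber <- function(serialNumber, timeData) {
  # Sorted, de-duplicated dates for this serial number
  dates <- sort(unique(as.Date(timeData$date[timeData$serialNumber == serialNumber])))
  # Run id increases wherever the gap to the previous date is not exactly one day
  runId <- cumsum(c(TRUE, diff(dates) != 1))
  data.frame(
    serialNumber = serialNumber,
    minDate = dates[!duplicated(runId)],                  # first date of each run
    maxDate = dates[!duplicated(runId, fromLast = TRUE)]  # last date of each run
  )
}

# Iterate over a plain vector, not the one-column data frame, then stack as rows
uniqueSerialNumbers <- unique(myTable$serialNumber)
results <- do.call(rbind, lapply(uniqueSerialNumbers, getRunsForSerialNumber,
                                 timeData = myTable))
print(results)
```

Because the dates are sorted and the run ids never decrease, the first and last occurrence of each run id are that run's minimum and maximum, which avoids building lists of Date objects and converting them back. With the sample data the result has three rows: 1111 from 2014-09-01 to 2014-09-02, 1111 on 2014-09-05 alone, and 2222 from 2016-07-11 to 2016-07-12.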

0 answers