I start by creating an empty data frame:
results <- data.frame(ID=numeric(0), StartDate=numeric(0), term_type=character(0), EndDate=numeric(0), stringsAsFactors = FALSE)
Then I have a list of unique ID numbers: uniqueIds <- c(1234, 4566, 7838)
I have a function, getDataForGivenId, which produces a data frame in the following format:
ID, StartDate, term_type, EndDate
I need code that calls getDataForGivenId for each ID and appends the resulting data frame to the empty data frame results.
I have tried:
library(dplyr)
results <- bind_rows(results, (lapply(uniqueIds, getDataForGivenId)))
and
do.call("rbind", lapply(uniqueIds, getDataForGivenId))
and
for (Id in uniqueIds) {
  Y <- getDataForGivenId(Id)
  results <- rbind(results, Y)
}
Each time I end up with an empty results data frame.
Note that if I do nothing else and just run:
Y <- getDataForGivenId(1234)
results <- rbind(results, Y)
I get the expected output.
Does anyone know what I am doing wrong?
Edit: my full script is below.
library(dplyr)
library(lubridate)
enVariables <- Sys.getenv()
username <- enVariables[["DB_USERNAME"]]
password <- enVariables[["DB_PASSWORD"]]
results <- data.frame(ID=numeric(0), StartDate=numeric(0), term_type=character(0), EndDate=numeric(0), stringsAsFactors = FALSE)
getConnection <- function(){
  require(RMySQL)
  username <- username
  password <- password
  con <- dbConnect(
    MySQL(), user=username, password=password,
    dbname='database', host='host', port=port
  )
  return(con)
}
queryuniqueIds <- "SELECT DISTINCT(id) FROM table LIMIT 5"
con <- getConnection()
uniqueIds <- dbGetQuery(con, queryuniqueIds)
dbDisconnect(con)
getDataForGivenID <- function(idNumber) {
  queryData <- paste0(
    "SELECT ",
    "Id, bill_date, bill_hour ",
    "FROM table ",
    "WHERE id = ", idNumber
  )
  con <- getConnection()
  Data <- dbGetQuery(con, queryData)
  dbDisconnect(con)
  X <- Data %>%
    select(ID, bill_date, bill_hour) %>%
    mutate(
      bill_date_x = ymd_hms(bill_date)
    ) %>%
    arrange(ID, bill_date, bill_hour)
  hour(X$bill_date_x) <- X$bill_hour
  X <- X %>%
    mutate(
      lag_x = lag(bill_date_x, 1),
      lag_diff = difftime(bill_date_x, lag_x, units = "hours") %>% as.integer(),
      lead_x = lead(bill_date_x, 1),
      lead_diff = difftime(lead_x, bill_date_x, units = "hours") %>% as.integer()
    )
  Y <- X %>%
    filter(
      is.na(lag_diff) |
        is.na(lead_diff) |
        !(lag_diff == 1 & lead_diff == 1),
      is.na(lag_diff) |
        is.na(lead_diff) |
        !(lag_diff == 0 | lead_diff == 0)
    ) %>%
    mutate(
      term_type = "N",
      term_type = replace(term_type, lead_diff == 1, "S"),
      term_type = replace(term_type, lag_diff == 1, "E")
    )
  Y <- Y %>%
    select(ID, bill_date_x, term_type) %>%
    mutate(
      lead_date = lead(bill_date_x, 1)
    ) %>%
    filter(term_type == "S")
  colnames(Y) <- c("ID", "StartDate", "term_type", "EndDate")
  return(Y)
}
do.call("rbind", lapply(uniqueIds, getDataForGivenID))
View(results)
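Answer 0 (score: 1)
I finally figured out my problem.
The list uniqueIds has length 1. dbGetQuery returns a data frame, which is a list with one element per column, so uniqueIds held a single element: the id column. lapply therefore passed the whole column to the function in one call, and the SQL statement returned data for only the first id.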
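The behaviour described above can be reproduced without a database. The following is a minimal sketch, assuming a hand-built one-column data frame as a stand-in for the result of dbGetQuery (the table name in the query string is a placeholder taken from the question):

```r
# Hypothetical stand-in for the result of dbGetQuery(): a one-column data frame.
uniqueIds <- data.frame(id = c(1234, 4566, 7838))

# lapply() over a data frame iterates over its columns, so the function
# is called once and receives the entire id vector.
calls_df <- lapply(uniqueIds, function(x) length(x))

# lapply() over the column itself calls the function once per id.
calls_vec <- lapply(uniqueIds$id, function(x) length(x))

length(calls_df)   # number of calls when iterating over the data frame
length(calls_vec)  # number of calls when iterating over the id column

# paste0() is vectorised, so a vector of ids yields one query string per id
# rather than a single query.
queries <- paste0("SELECT Id FROM table WHERE id = ", uniqueIds$id)
length(queries)
```

Iterating over the data frame gives a single call that sees all three ids, while iterating over uniqueIds$id gives three separate calls.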
I changed
uniqueIds <- dbGetQuery(con, queryuniqueIds)
to
uniqueIds <- as.data.frame(dbGetQuery(con, queryuniqueIds))
and
do.call("rbind", lapply(uniqueIds, getDataForGivenId))
to
results <- do.call("rbind", lapply(uniqueIds$id, getDataForGivenId))
Now everything works. Thanks to those who helped.
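For readers who want to try the corrected pattern without database access, here is a rough sketch. getDataStub is a hypothetical placeholder for getDataForGivenId, and the dates it returns are made up purely for illustration:

```r
# Hypothetical stub standing in for getDataForGivenId(); the real function
# queries the database. The dates below are illustrative only.
getDataStub <- function(idNumber) {
  data.frame(
    ID = idNumber,
    StartDate = as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
    term_type = "S",
    EndDate = as.POSIXct("2020-01-01 05:00:00", tz = "UTC"),
    stringsAsFactors = FALSE
  )
}

# Stand-in for the data frame returned by dbGetQuery().
uniqueIds <- data.frame(id = c(1234, 4566, 7838))

# Call the function once per id, then combine the pieces.
pieces <- lapply(uniqueIds$id, getDataStub)

# Assign the combined data frame; without the assignment, results is unchanged.
results <- do.call(rbind, pieces)

nrow(results)
```

The assignment matters as much as iterating over the column: in the original script the do.call result was never stored, so View(results) would still show the empty data frame.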