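Apologies in advance if this has already been answered, but I've looked through every question I could find on ddply, sapply, and apply, and I can't for the life of me figure this one out...
I wrote a function, countMonths, that takes the day, the month, and the total number of days in a billing cycle as arguments and returns the number of calendar months the billing cycle spans: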
countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  if (month < 1 | month > 12 | floor(month) != month) {
    cat("Invalid month value, must be an integer from 1 to 12")
  } else if (day < 1 | day > month.days[month]) {
    cat("Invalid day value, must be between 1 and month.days[month]")
  } else if (cycle.days < 0) {
    cat("Invalid cycle.days value, must be >= 0")
  } else {
    nmonths <- 1
    day.ct <- cycle.days - day
    while (day.ct > 0) {
      nmonths <- nmonths + 1
      month <- ifelse(month == 1, 12, month - 1) # sets to previous month
      day.ct <- day.ct - month.days[month] # subtracts days of previous month
    }
    nmonths
  }
}
I want to apply this function to every row of a data.frame containing customer billing records, for example:
> head(cons2[-1],10)
   kwh cycle.days  read.date row.index year month day kwh.per.day
1  381         29 2010-09-02         1 2010     9   2   13.137931
2  280         32 2010-10-04         2 2010    10   4    8.750000
3  282         29 2010-11-02         3 2010    11   2    9.724138
4  330         34 2010-12-06         4 2010    12   6    9.705882
5  371         30 2011-01-05         5 2011     1   5   12.366667
6  405         30 2011-02-04         6 2011     2   4   13.500000
7  441         32 2011-03-08         7 2011     3   8   13.781250
8  290         29 2011-04-06         8 2011     4   6   10.000000
9  296         29 2011-05-05         9 2011     5   5   10.206897
10 378         32 2011-06-06        10 2011     6   6   11.812500
> dput(head(cons2[-1],10))
structure(list(kwh = c(381L, 280L, 282L, 330L, 371L, 405L, 441L,
290L, 296L, 378L), cycle.days = c(29L, 32L, 29L, 34L, 30L, 30L,
32L, 29L, 29L, 32L), read.date = structure(c(1283385600, 1286150400,
1288656000, 1291593600, 1294185600, 1296777600, 1299542400, 1302048000,
1304553600, 1307318400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
row.index = 1:10, year = c(2010, 2010, 2010, 2010, 2011,
2011, 2011, 2011, 2011, 2011), month = c(9, 10, 11, 12, 1,
2, 3, 4, 5, 6), day = c(2L, 4L, 2L, 6L, 5L, 4L, 8L, 6L, 5L,
6L), kwh.per.day = c(13.1379310344828, 8.75, 9.72413793103448,
9.70588235294118, 12.3666666666667, 13.5, 13.78125, 10, 10.2068965517241,
11.8125)), .Names = c("kwh", "cycle.days", "read.date", "row.index",
"year", "month", "day", "kwh.per.day"), row.names = c(NA, 10L
), class = "data.frame")
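I've tried several options, but none of them worked. Specifically, I need to pass the value of a given variable for each row of the data frame as a scalar (or a length-1 vector), but the variables always get passed as whole vectors: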
> cons2$tot.months <- countMonths(cons2$day, cons2$month, cons2$cycle.days)
Warning messages:
1: In if (month < 1 | month > 12 | floor(month) != month) { :
the condition has length > 1 and only the first element will be used
2: In if (day < 1 | day > month.days[month]) { :
the condition has length > 1 and only the first element will be used
3: In if (cycle.days < 0) { :
the condition has length > 1 and only the first element will be used
4: In while (day.ct > 0) { :
the condition has length > 1 and only the first element will be used
5: In while (day.ct > 0) { :
the condition has length > 1 and only the first element will be used
I was finally able to get the correct result with ddply by treating each row as its own group, but it takes a long time:
cons2 <- ddply(cons2, .(account, year, month, day), transform,
  tot.months = countMonths(day, month, cycle.days)
)
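Is there a better way to apply this function to each row of my data frame? Or, as a related question, how can I pass the variables in a data frame as scalar arguments (the value from a given row) rather than as a vector of all the values of that variable? I'd especially appreciate it if someone could point out where my thinking went wrong.
Answer 0 (score: 1)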
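To make your function work as written, you can use mapply, which applies the function in turn to each element of all the vectors passed to it. So you could do this: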
mapply(countMonths,cons2$day,cons2$month,cons2$cycle.days)
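To keep the result next to the data, the output of mapply can be assigned straight to a new column. The following is only a minimal sketch: it assumes the countMonths logic from the question with the validation left out for brevity, and `bills` is a made-up three-row stand-in for cons2 (only the first row comes from the sample data).

```r
# Scalar helper from the question, validation omitted for brevity
countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  nmonths <- 1
  day.ct <- cycle.days - day
  while (day.ct > 0) {
    nmonths <- nmonths + 1
    month <- if (month == 1) 12 else month - 1  # step back one month
    day.ct <- day.ct - month.days[month]        # subtract that month's days
  }
  nmonths
}

# Hypothetical stand-in for cons2; rows 2 and 3 are invented for illustration
bills <- data.frame(
  day        = c(2, 20, 5),
  month      = c(9, 3, 3),
  cycle.days = c(29, 15, 40)
)

# mapply calls countMonths once per row, passing one scalar from each vector
bills$tot.months <- mapply(countMonths, bills$day, bills$month, bills$cycle.days)

# Vectorize() wraps the same mapply pattern into a reusable function
countMonthsV <- Vectorize(countMonths)
bills$tot.months.v <- countMonthsV(bills$day, bills$month, bills$cycle.days)

print(bills)
```

Both new columns should hold the same values. This is still a loop under the hood, so it is about convenience rather than speed.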
As I mentioned in the comments, there are simpler ways to do this. For example, I think this would work:
cons2$read.date=as.Date(cons2$read.date)
monnb <- function(d){ lt <- as.POSIXlt(as.Date(d, origin="1900-01-01")); lt$year*12 + lt$mon }
mondf <- function(d1, d2) monnb(d2) - monnb(d1)
mondf(cons2$read.date-cons2$cycle.days,cons2$read.date) + 1
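Here is a small worked sketch of the same date arithmetic, under the assumption that a cycle starts cycle.days before the read date. The first two rows come from the sample data; the third (a 40-day cycle) is invented so that the result is not always 2.

```r
# Month index of each date: as.POSIXlt stores year as years since 1900
# and mon as 0-11, so year * 12 + mon increases by 1 per calendar month
monnb <- function(d) {
  lt <- as.POSIXlt(as.Date(d))
  lt$year * 12 + lt$mon
}

# Number of month boundaries crossed between d1 and d2
mondf <- function(d1, d2) monnb(d2) - monnb(d1)

read.date  <- as.Date(c("2010-09-02", "2011-01-05", "2011-03-05"))
cycle.days <- c(29, 30, 40)  # last value is a made-up long cycle

# Subtracting a number from a Date moves it back by that many days
start.date <- read.date - cycle.days

# Whole vectors go in, one month count per row comes out
tot.months <- mondf(start.date, read.date) + 1
print(data.frame(start.date, read.date, tot.months))
```

The two approaches do not agree in every case. When cycle.days equals the day of the month, countMonths returns 1, while the date version lands on the last day of the previous month and returns 2. The date version also accounts for leap years, which the fixed month.days table does not. Decide which convention fits your billing cycles before swapping one for the other.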
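Also, I noticed that you're trying to catch all the conditions under which your function won't work, which is great! There's a very handy function called stopifnot that serves exactly this purpose: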
countMonths <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  # one condition per argument, so month is validated before it is used as an index
  stopifnot(
    month >= 1, month <= 12, floor(month) == month,
    cycle.days >= 0,
    day >= 1, day <= month.days[month]
  )
  nmonths <- 1
  day.ct <- cycle.days - day
  while (day.ct > 0) {
    nmonths <- nmonths + 1
    month <- ifelse(month == 1, 12, month - 1) # sets to previous month
    day.ct <- day.ct - month.days[month] # subtracts days of previous month
  }
  nmonths
}
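To see how stopifnot behaves on bad input, here is a short sketch. checkArgs is a hypothetical helper that contains only the validation, with each condition passed as its own argument so the error can point at the one that failed. Unlike cat, a failed check raises an error, so the caller cannot silently carry on with a missing result.

```r
# Hypothetical helper: validation only, one condition per argument
checkArgs <- function(day, month, cycle.days) {
  month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
  stopifnot(
    month >= 1, month <= 12, floor(month) == month,
    cycle.days >= 0,
    day >= 1, day <= month.days[month]
  )
  TRUE
}

# Valid input passes straight through
ok <- checkArgs(2, 9, 29)

# Invalid input raises an error; tryCatch captures its message as a string
msg <- tryCatch(
  checkArgs(2, 13, 29),
  error = function(e) conditionMessage(e)
)
print(ok)
print(msg)
```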
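As for your function itself: I think it works, but it doesn't take advantage of R's vectorized operations. The function I borrowed from another answer is pretty slick, because it lets you pass in a whole vector of dates instead of looping over them one at a time.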