I am trying to apply a function to each row of a data frame. A simple example of such a data frame is:
> accts
ACCOUNT DATE
1 2008-03-01
2 2009-06-17
3 2008-07-02
4 2009-03-15
What I need to do is look at each row of this data frame and then look up that account in a larger data frame like this one:
> trans
ACCOUNT_NUM TRAN_DATE
1 2008-02-02
2 2008-04-02
3 2008-03-16
3 2009-08-22
3 2008-05-05
6 2010-11-03
7 2008-09-18
4 2009-10-14
4 2009-01-15
10 2011-07-06
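For each row in the 'accts' data frame, I need to get the record in the 'trans' data frame that belongs to that account and whose 'TRAN_DATE' is the closest to 'DATE' while still occurring before it. I tried using the apply function: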
tranDateVector <- apply(accts, 2, getTranDate)
getTranDate <- function(x)
{
tranDate <- subset(trans$TRAN_DATE, with(trans, ACCOUNT_NUM == x[1] & TRAN_DATE < x[2]))
dataDiff <- x[2] - tranDate
tranDate <- unique(date[which(dateDiff == min(dateDiff))])
return(tranDate)
}
accts <- cbind(accts, tranDateVector)
When I run my mini example, I get the following error:
Error in charToDate(x) :
character string is not in a standard unambiguous format
However, when I run my full version I get a different error, which I have traced to this line:
subset(trans$TRAN_DATE, with(trans, ACCOUNT_NUM == x[1] & TRAN_DATE < x[2]))
If I set x to the third row of the 'accts' data frame, so that:
x
ACCOUNT DATE
3 3 2008-07-02
and run the 'subset' line, I get the following error, which matches the one I get in my regular code:
> subset(trans$TRAN_DATE, with(trans, ACCOUNT_NUM == x[1] & TRAN_DATE < x[2]))
Error in eval(expr, envir, enclos) :
dims [product 1] do not match the length of object [10]
In addition: Warning message:
In eval(expr, envir, enclos) :
Incompatible methods ("Ops.Date", "Ops.data.frame") for "<"
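Thank you for your help.
(The following information was added after the answer was provided, because I realized there was a complication.)
I just realized that the function has to take additional constraints into account, which makes the problem a bit more complicated. In the 'accts' data frame there are two different statuses: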
> accts <- data.frame(
+ ACCOUNT = 1:4,
+ DATE = as.Date(c("2008-03-01", "2009-06-17",
+ "2008-07-02", "2009-03-15")),
+ STATUS = c("new", "old", "new", "old"))
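In the 'accts' frame, a record is classified as either "old" or "new". If the account is "new", then in addition to meeting the conditions specified earlier, it must be matched only against records in 'trans' that are marked "revised". Likewise, "old" accounts may only be compared against the "orig" records in 'trans':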
> trans <- data.frame(
+ ACCOUNT_NUM = c(1,2,3,3,3,6,7,4,4,10),
+ TRAN_DATE = as.Date(c("2008-02-02", "2008-04-02",
+ "2008-03-16", "2009-08-22",
+ "2008-05-05", "2010-11-03",
+ "2008-09-18", "2009-10-14",
+ "2009-01-15", "2011-07-06")),
+ BALANCE = c("orig", "orig", "orig", "orig", "revised", "orig", "revised", "revised", "revised", "orig"))
I tried to adapt your code to this situation as follows:
library(plyr)
adply(accts, 1, transform,
TRAN_DATE = {
if(STATUS == "old")
{
data <- subset(trans, ACCOUNT_NUM == ACCOUNT &
TRAN_DATE < DATE & BALANCE == "orig")
}else{
data <- subset(trans, ACCOUNT_NUM == ACCOUNT &
TRAN_DATE < DATE & BALANCE == "revised")
}
tail(data$TRAN_DATE, 1) })
I get the following error from this code:
Error in data.frame(list(ACCOUNT = 1L, DATE = 13939, STATUS = 1L), BALANCE = list( :
arguments imply differing number of rows: 1, 0
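The "1, 0" in that message is consistent with tail() returning a zero-length vector for an account that has no qualifying transaction (account 1 is "new", but its only transaction is "orig"), so transform cannot fit it into a one-row data frame. A minimal base R sketch of one way around this, assuming the accts and trans data shown above; the helper name latest_tran and the choice to return NA when nothing matches are assumptions, not part of the original post:

```r
accts <- data.frame(
  ACCOUNT = 1:4,
  DATE = as.Date(c("2008-03-01", "2009-06-17", "2008-07-02", "2009-03-15")),
  STATUS = c("new", "old", "new", "old"),
  stringsAsFactors = FALSE)

trans <- data.frame(
  ACCOUNT_NUM = c(1, 2, 3, 3, 3, 6, 7, 4, 4, 10),
  TRAN_DATE = as.Date(c("2008-02-02", "2008-04-02", "2008-03-16", "2009-08-22",
                        "2008-05-05", "2010-11-03", "2008-09-18", "2009-10-14",
                        "2009-01-15", "2011-07-06")),
  BALANCE = c("orig", "orig", "orig", "orig", "revised",
              "orig", "revised", "revised", "revised", "orig"),
  stringsAsFactors = FALSE)

# Hypothetical helper: latest transaction date strictly before `date`
# for one account, restricted to the BALANCE type implied by `status`.
latest_tran <- function(account, date, status) {
  wanted <- if (status == "old") "orig" else "revised"
  hits <- trans$TRAN_DATE[trans$ACCOUNT_NUM == account &
                          trans$TRAN_DATE < date &
                          trans$BALANCE == wanted]
  # Assumption: report NA instead of failing when nothing matches
  if (length(hits) == 0) as.Date(NA) else max(hits)
}

# Go row by row; c() on Date objects keeps the Date class
accts$TRAN_DATE <- do.call(c, lapply(seq_len(nrow(accts)), function(i) {
  latest_tran(accts$ACCOUNT[i], accts$DATE[i], accts$STATUS[i])
}))
```

Using max() rather than tail(..., 1) avoids depending on the row order of trans.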
I apologize for not specifying this requirement in my original post; I did not realize it would cause a problem.
Answer 0 (score: 4)
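Because your data has mixed types (numeric, Date), I would not use apply, as it will coerce your data to a single type. Instead, I recommend the adply function from plyr, which preserves all types because each row is processed as a data.frame. It also has the advantage that fields can still be accessed by column name, which usually makes for more readable code, as I will let you judge.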
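The coercion can be seen directly with a small sketch, assuming the accts data frame from the question. apply converts a data frame with as.matrix before looping, so every value reaches the function as a character string. Note also that MARGIN = 2 in the question iterates over columns; rows are MARGIN = 1.

```r
accts <- data.frame(
  ACCOUNT = 1:4,
  DATE = as.Date(c("2008-03-01", "2009-06-17", "2008-07-02", "2009-03-15")))

# apply() works on as.matrix(accts); mixing integer and Date columns
# leaves character as the only common type
m <- as.matrix(accts)
typeof(m)  # "character"

# Each row therefore arrives in the function as a character vector,
# so x[2] is a string rather than a Date
row_classes <- apply(accts, 1, function(x) class(x))
```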
Your data:
accts <- data.frame(
  ACCOUNT = 1:4,
  DATE = as.Date(c("2008-03-01", "2009-06-17",
                   "2008-07-02", "2009-03-15")))
trans <- data.frame(
  ACCOUNT_NUM = c(1,2,3,3,3,6,7,4,4,10),
  TRAN_DATE = as.Date(c("2008-02-02", "2008-04-02",
                        "2008-03-16", "2009-08-22",
                        "2008-05-05", "2010-11-03",
                        "2008-09-18", "2009-10-14",
                        "2009-01-15", "2011-07-06")))
A solution using adply:
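A minimal sketch of how such an adply call could look, assuming plyr is installed and using the accts and trans data defined above. It is not necessarily the answerer's exact code: the helper name latest_before is hypothetical, and returning NA for an account with no earlier transaction is an added assumption.

```r
library(plyr)

accts <- data.frame(
  ACCOUNT = 1:4,
  DATE = as.Date(c("2008-03-01", "2009-06-17", "2008-07-02", "2009-03-15")))

trans <- data.frame(
  ACCOUNT_NUM = c(1, 2, 3, 3, 3, 6, 7, 4, 4, 10),
  TRAN_DATE = as.Date(c("2008-02-02", "2008-04-02", "2008-03-16", "2009-08-22",
                        "2008-05-05", "2010-11-03", "2008-09-18", "2009-10-14",
                        "2009-01-15", "2011-07-06")))

# Hypothetical helper: receives one row of accts as a one-row data.frame,
# so row$DATE is still a Date and can be compared with trans$TRAN_DATE.
latest_before <- function(row) {
  hits <- trans$TRAN_DATE[trans$ACCOUNT_NUM == row$ACCOUNT &
                          trans$TRAN_DATE < row$DATE]
  # Assumption: NA when the account has no transaction before DATE
  data.frame(TRAN_DATE = if (length(hits) > 0) max(hits) else as.Date(NA))
}

# .margins = 1 splits accts by row; the helper's column is added to each row
result <- adply(accts, 1, latest_before)
```

Taking max() of the matching dates picks the one closest to DATE regardless of how the rows of trans are ordered.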