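Similar to my question yesterday about reshaping matrices in R, I am now trying to reshape data frames so that I can vectorize my function. In the code below, the main function is scorecard. It takes two inputs named subset.loans and subset.collateral. I would like to know whether I can reshape the two frames, loans and collaterals, which look like this: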
LOANS             COLLATERAL
id | value        id | value   type
----------        -------------------
1     200         1     600    a
2     4390        1     899    b
2     860         2     190    d
2     9750        3     4930   e
3     600         3     300    a
:     :           :     :      :
into this:
id | loans              collateral
------------------------------------------------------------------
1    c(200)             data.frame(a=c(600,899), b=c('a','b'))
2    c(4390,860,9750)   data.frame(a=c(190), b=c('d'))
3    c(600)             data.frame(a=c(4930,300), b=c('e','a'))
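My hope is that, if I do this, I can use one of the *apply functions (or something from the plyr toolbox) to simply apply the scorecard function to the whole thing. If there is a better or simpler way, please mention it! The code as I am using it now (with an unfortunate for loop) is below: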
# An Nx2 data frame of loans (ID, amount)
loans <- read.table(...)
# An Mx4 data frame of collaterals to loans (ID, type, value, lien)
collateral <- read.table(...)

# One person (ID) can have >1 loan and >1 collateral, so first just
# find all unique IDs (the ID column is called scorecard_id below)
loans.ID.unique <- unique(loans$scorecard_id)

# Pre-allocate the output: one row per ID, holding the ID and two scores
scores <- matrix(NA, nrow = length(loans.ID.unique), ncol = 3)

# Run an analysis on each ID grouping:
for (n in seq_along(loans.ID.unique)) {
  # ...all loans for that ID...
  subset.loans <- loans$loans[
    which(loans$scorecard_id == loans.ID.unique[n])]
  # ...all collateral for that ID...
  subset.collateral <- collateral[
    which(collateral$scorecard_id == loans.ID.unique[n]),
    c('type', 'value', 'lien')]
  # Output scores for each ID
  scores[n, 1] <- loans.ID.unique[n]
  scores[n, c(2, 3)] <- scorecard(loans = subset.loans,
                                  collateral = subset.collateral)
}
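Thanks!
Answer 0 (score: 3)
1) No data structure. It would be unusual in R to create such a structure. I suggest you just grab what you need as you need it. Here Loans and Collateral are your two input data frames, and loans and collateral are the portions belonging to the id currently being processed. Replace the line of the function below that is marked with a double hash (##) with your own code: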
ids <- union(Loans$id, Collateral$id)
do.call("rbind", lapply(ids, function(id) {
  loans <- Loans[Loans$id == id, "value"]
  collateral <- Collateral[Collateral$id == id, -1]
  c(id = id, score = sum(loans) - sum(collateral$value)) ##
}))
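To make approach 1 concrete, here is a minimal runnable sketch using the sample rows from the question. The scorecard function below is a hypothetical stand-in (total loans minus total collateral value), not the asker's real function, and it is assumed to return a single number.

# Sample data copied from the tables in the question
Loans <- data.frame(id = c(1, 2, 2, 2, 3),
                    value = c(200, 4390, 860, 9750, 600))
Collateral <- data.frame(id = c(1, 1, 2, 3, 3),
                         value = c(600, 899, 190, 4930, 300),
                         type = c("a", "b", "d", "e", "a"))

# Hypothetical stand-in for the real scorecard function (an assumption)
scorecard <- function(loans, collateral) {
  sum(loans) - sum(collateral$value)
}

# One pass per id: pick out that id's loans and collateral, then score them
ids <- union(Loans$id, Collateral$id)
scores <- do.call("rbind", lapply(ids, function(id) {
  loans <- Loans[Loans$id == id, "value"]
  collateral <- Collateral[Collateral$id == id, -1]
  c(id = id, score = scorecard(loans, collateral))
}))

The result, scores, is a numeric matrix with one row per id and the columns id and score, so no intermediate nested structure is needed.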
Added:
2) Matrix. On the other hand, if we really do want to create such a structure, it can be done like this:
ids <- union(Loans$id, Collateral$id)
m <- cbind(id = ids,
  loans = lapply(ids, function(id) Loans[Loans$id == id, "value"]),
  collateral = lapply(ids, function(id) Collateral[Collateral$id == id, -1])
)
do.call("rbind", lapply(1:nrow(m), function(i) with(m[i,],
  c(id = id, score = sum(loans) - sum(collateral$value))
)))
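3) Data frame. Alternatively, we could represent the structure as a data frame, either with d <- as.data.frame(m) or with the following, which is nearly the same: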
d <- data.frame(id = ids,
  loans = I(lapply(ids, function(id) Loans[Loans$id == id, "value"])),
  collateral = I(lapply(ids, function(id) Collateral[Collateral$id == id, -1]))
)
do.call("rbind", lapply(1:nrow(d), function(i) with(d,
  c(id = id[[i]], score = sum(loans[[i]]) - sum(collateral[[i]]$value))
)))
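As a rough illustration of what the data frame representation holds, the sketch below rebuilds d from the question's sample rows and adds the score as an ordinary column. The helper score_one is hypothetical and stands in for the real scorecard function; it is assumed to return one number per id.

# Sample data copied from the tables in the question
Loans <- data.frame(id = c(1, 2, 2, 2, 3),
                    value = c(200, 4390, 860, 9750, 600))
Collateral <- data.frame(id = c(1, 1, 2, 3, 3),
                         value = c(600, 899, 190, 4930, 300),
                         type = c("a", "b", "d", "e", "a"))

ids <- union(Loans$id, Collateral$id)

# I() keeps each list as a single list column instead of expanding it
d <- data.frame(id = ids,
  loans = I(lapply(ids, function(id) Loans[Loans$id == id, "value"])),
  collateral = I(lapply(ids, function(id) Collateral[Collateral$id == id, -1]))
)

# Hypothetical scoring helper (an assumption, not the asker's function)
score_one <- function(loans, collateral) sum(loans) - sum(collateral$value)

# Each cell of a list column is reached with [[i]]
d$score <- vapply(seq_len(nrow(d)), function(i) {
  score_one(d$loans[[i]], d$collateral[[i]])
}, numeric(1))

Here d$loans[[2]] is the numeric vector of the three loans for id 2, and d$collateral[[2]] is a one-row data frame, which matches the layout sketched in the question.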
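EDIT: Simplified the code that builds m.
ADDED: The data frame representation.
Answer 1 (score: 0)
You do not need to transform your data at all. In fact, the transformation you are looking for is not possible as such, because a data.frame cannot hold another data.frame inside an ordinary column. Instead, just use lapply with your scorecard function.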
# Read in data
loans=data.frame(id=c(1,2,2,2,3),value=c(200,4390,860,9750,600))
col=data.frame(id=c(1,1,2,3,3),value=c(600,899,190,4930,300),type=c('a','b','d','e','a'))
# Load in scorecard function
scorecard = function(subset.loans,subset.collateral) {
  # Do something other than this
  list(subset.loans,subset.collateral)
}
# Use lapply
lapply(unique(loans$id),
  function (x) scorecard( loans[loans$id==x,] , col[col$id==x,c('type','value')])
)
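A variation on the same idea, sketched with base R's split() and mapply() rather than subsetting inside lapply. This is a swapped-in technique, not what the answer above uses, and the scorecard body here is a hypothetical placeholder that returns two totals.

# Same sample data as above
loans <- data.frame(id = c(1, 2, 2, 2, 3),
                    value = c(200, 4390, 860, 9750, 600))
col <- data.frame(id = c(1, 1, 2, 3, 3),
                  value = c(600, 899, 190, 4930, 300),
                  type = c("a", "b", "d", "e", "a"))

# Hypothetical scorecard returning two numbers per id (an assumption)
scorecard <- function(subset.loans, subset.collateral) {
  c(total.loans = sum(subset.loans$value),
    total.collateral = sum(subset.collateral$value))
}

# split() cuts each data frame into a named list with one piece per id
loans.by.id <- split(loans, loans$id)
col.by.id <- split(col[, c("type", "value")], col$id)

# Only score ids that appear in both tables
ids <- intersect(names(loans.by.id), names(col.by.id))

# mapply() walks both lists in parallel; t() puts one id on each row
result <- t(mapply(scorecard, loans.by.id[ids], col.by.id[ids]))

The rows of result are labelled by id and the columns by the names that scorecard returns, so the output is already in table form.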
If you do want to transform your data the way you described, you can do it like this:
loans.agg=aggregate(loans$value,by=list(loans$id),c)
names(loans.agg)=c('id','loans')
col.agg.val=aggregate(col$value,by=list(col$id),c)
names(col.agg.val)=c('id','collateral')
col.agg.type=aggregate(col$type,by=list(col$id),c)
names(col.agg.type)=c('id','type')
# What you probably want
merge(merge(loans.agg,col.agg.val),col.agg.type)
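One caveat worth checking: aggregate() tries to simplify its result by default, so the aggregated column is a list only when the groups differ in size. The sketch below passes simplify = FALSE so that the column is always a list; whether that matters for your data is an assumption on my part.

loans <- data.frame(id = c(1, 2, 2, 2, 3),
                    value = c(200, 4390, 860, 9750, 600))

# simplify = FALSE keeps one list element per id, whatever the group sizes
loans.agg <- aggregate(loans$value, by = list(id = loans$id),
                       FUN = c, simplify = FALSE)
names(loans.agg) <- c("id", "loans")

# Inspect the list column: all loans for the second id, and counts per id
second.id.loans <- loans.agg$loans[[2]]
loans.per.id <- lengths(loans.agg$loans)

Each element of loans.agg$loans can then be passed to scorecard one id at a time, for example with lapply or mapply.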