Vectorizing operations on a list of lists in R

Time: 2012-02-09 17:01:19

Tags: r vectorization

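Similar to my question yesterday about reshaping matrices in R, I am now trying to reshape data frames so that I can vectorize my function. In the code below, the main function is scorecard. For each ID it receives that ID's loans (subset.loans) and a data frame of that ID's collateral (subset.collateral). I would like to know whether I can reshape the two frames, loans and collateral, which look like this: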

  LOANS              COLLATERAL           
id | value       id | value   type             
----------       -------------------             
 1     200        1     600      a
 2    4390        1     899      b               
 2     860        2     190      d               
 2    9750        3    4930      e               
 3     600        3     300      a               
 :       :        :       :      :

into this:

id | loans             collateral
-----------------------------
 1   c(200)            data.frame(a=c(600,899), b=c('a','b'))
 2   c(4390,860,9750)  data.frame(a=c(190), b=c('d'))
 3   c(600)            data.frame(a=c(4930,300), b=c('e','a'))

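My hope is that, once the data is in this shape, I can use one of the *apply functions (or something from the plyr toolbox) to simply apply the scorecard function to the whole thing. If there is a better or simpler way, please mention it! The code I am using now (with a dreaded for loop) is below: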

# An Nx2 data frame of loans (ID, amount)
loans <- read.table(...)

# An Mx4 data frame of collaterals to loans (ID, type, value, lien)
collateral <- read.table(...)

# One person (ID) can have >1 loan and >1 collateral, so first just
# find all unique IDs
loans.ID.unique = unique(loans$ID)

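# Preallocate the result: one row per ID (the ID plus two scores)
scores <- matrix(NA, nrow = length(loans.ID.unique), ncol = 3)

# Run an analysis on each ID grouping:
for(n in seq_along(loans.ID.unique)) {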

  # ...all loans for that ID...
  subset.loans      <- loans$loans[
                         which(
                           loans$scorecard_id == loans.ID.unique[n])]

  # ...all collateral for that ID...
  subset.collateral <- collateral[
                         which(
                           collateral$scorecard_id == loans.ID.unique[n]),
                         c('type','value','lien')]

  # Output scores for each ID
  scores[n,1]   <- loans.ID.unique[n]
  scores[n,c(2,3)] <- scorecard(loans=subset.loans,
                                collateral=subset.collateral)
}

Thanks!

2 answers:

Answer 0 (score: 3)

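1) No data structure. It is unusual to create such a structure in R. I suggest you simply grab what you need as you go. Here Loans and Collateral are your two input data frames, and loans and collateral are the parts belonging to the id currently being processed. Replace the line marked with a double hash (##) in the function below with your own code: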

ids <- union(Loans$id, Collateral$id)
do.call("rbind", lapply(ids, function(id) {
    loans <- Loans[Loans$id == id, "value"]
    collateral <- Collateral[Collateral$id == id, -1]
    c(id = id, score = sum(loans) - sum(collateral$value)) ##
}))
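
To try approach 1 end to end, here is a minimal sketch that fills in sample data copied from the tables in the question. The helper score_one is a hypothetical stand-in for the real scorecard function (the answer does not define one); it just reuses the placeholder rule from the ## line.

```r
# Sample data copied from the LOANS and COLLATERAL tables in the question
Loans <- data.frame(id = c(1, 2, 2, 2, 3),
                    value = c(200, 4390, 860, 9750, 600))
Collateral <- data.frame(id = c(1, 1, 2, 3, 3),
                         value = c(600, 899, 190, 4930, 300),
                         type = c("a", "b", "d", "e", "a"))

# Hypothetical scoring rule (an assumption): total loans minus total collateral
score_one <- function(loans, collateral) {
  sum(loans) - sum(collateral$value)
}

ids <- union(Loans$id, Collateral$id)

# One row per id, bound into a numeric matrix with columns "id" and "score"
scores <- do.call("rbind", lapply(ids, function(id) {
  loans <- Loans[Loans$id == id, "value"]            # numeric vector of loans
  collateral <- Collateral[Collateral$id == id, -1]  # data frame without id
  c(id = id, score = score_one(loans, collateral))
}))

scores
```

With this sample data the result has three rows, one per id. Swapping score_one for the real scorecard should only require that it returns a fixed number of values per id so that rbind can stack them.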

ADDED:

2) Matrix. On the other hand, if we really do want to create such a structure, it can be done like this:

ids <- union(Loans$id, Collateral$id) 
m <- cbind(id = ids,
    loans = lapply(ids, function(id)  Loans[Loans$id == id, "value"]),
    collateral = lapply(ids, function(id)  Collateral[Collateral$id == id, -1])
)

do.call("rbind", lapply(1:nrow(m), function(i) with(m[i,],
   c(id = id, score = sum(loans) - sum(collateral$value))
)))

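3) Data frame. Alternatively, we can represent the structure as a data frame, either with d <- as.data.frame(m) or with the following, which is nearly the same: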

d <- data.frame(id = ids,
  loans = I(lapply(ids, function(id)  Loans[Loans$id == id, "value"])),
  collateral = I(lapply(ids, function(id)  Collateral[Collateral$id == id, -1]))
)
do.call("rbind", lapply(1:nrow(d), function(i) with(d, 
   c(id = id[[i]], score = sum(loans[[i]]) - sum(collateral[[i]]$value))
)))
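
To see what the list columns in approach 3 actually hold, the following sketch rebuilds d from sample data (copied from the question's tables, so the values are an assumption about the real input) and pulls out individual cells with [[ ]].

```r
# Sample data copied from the tables in the question
Loans <- data.frame(id = c(1, 2, 2, 2, 3),
                    value = c(200, 4390, 860, 9750, 600))
Collateral <- data.frame(id = c(1, 1, 2, 3, 3),
                         value = c(600, 899, 190, 4930, 300),
                         type = c("a", "b", "d", "e", "a"))

ids <- union(Loans$id, Collateral$id)

# I() keeps each list as a single column instead of letting data.frame()
# try to expand it into several columns
d <- data.frame(id = ids,
  loans = I(lapply(ids, function(id) Loans[Loans$id == id, "value"])),
  collateral = I(lapply(ids, function(id) Collateral[Collateral$id == id, -1]))
)

loans.id2 <- d$loans[[2]]                         # all loan values for id 2
types.id3 <- as.character(d$collateral[[3]]$type) # collateral types for id 3
```

Each cell of d$loans is a numeric vector and each cell of d$collateral is a small data frame, which is the layout sketched in the question.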

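EDIT: Simplified the code that builds m.

ADDED: The data frame representation.

Answer 1 (score: 0)

You do not need to transform your data at all. In fact, the transformation you are looking for is not possible, because a data.frame cannot contain another data.frame. Instead, just try using lapply with your scorecard function: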

# Read in data
loans=data.frame(id=c(1,2,2,2,3),value=c(200,4390,860,9750,600))
col=data.frame(id=c(1,1,2,3,3),value=c(600,899,190,4930,300),type=c('a','b','d','e','a'))

# Load in scorecard function
scorecard = function(subset.loans, subset.collateral) {
    # Do something other than this
    list(subset.loans, subset.collateral)
}

# Use lapply
lapply(unique(loans$id), function(x)
    scorecard(loans[loans$id == x, ], col[col$id == x, c('type', 'value')])
)
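
The same per-ID grouping can also be sketched with base R's split() and mapply(). This is a swapped-in variant, not the code from this answer, and the scorecard body below is a hypothetical placeholder (total loans minus total collateral) used only so the example returns a number.

```r
# Same sample data as above
loans <- data.frame(id = c(1, 2, 2, 2, 3),
                    value = c(200, 4390, 860, 9750, 600))
col <- data.frame(id = c(1, 1, 2, 3, 3),
                  value = c(600, 899, 190, 4930, 300),
                  type = c("a", "b", "d", "e", "a"))

# Hypothetical scoring rule (an assumption; substitute the real one)
scorecard <- function(subset.loans, subset.collateral) {
  sum(subset.loans) - sum(subset.collateral$value)
}

# split() groups once up front: a named list of loan vectors and a
# named list of collateral data frames, both keyed by id
loans.by.id <- split(loans$value, loans$id)
col.by.id <- split(col[, c("type", "value")], col$id)

# Keep only ids present in both lists so the pairs line up
ids <- intersect(names(loans.by.id), names(col.by.id))

# mapply() walks the two lists in parallel, one id at a time
scores <- mapply(scorecard, loans.by.id[ids], col.by.id[ids])
scores
```

Splitting once avoids re-scanning both data frames for every id, which may matter when there are many ids; for small inputs the lapply version above is just as reasonable.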

If you do want to transform your data the way you described, you can do it like this:

loans.agg=aggregate(loans$value,by=list(loans$id),c)
names(loans.agg)=c('id','loans')

col.agg.val=aggregate(col$value,by=list(col$id),c)
names(col.agg.val)=c('id','collateral')

col.agg.type=aggregate(col$type,by=list(col$id),c)
names(col.agg.type)=c('id','type')

# What you probably want
merge(merge(loans.agg,col.agg.val),col.agg.type)