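I have the following data frame, and I am replacing the NAs in each row with the first product_id from the top products that does not already appear in that row. To give some context, these are product recommendations.
Although I have some experience with plyr and sapply, I am struggling to find the right way to accomplish this.
I think the code below speaks for itself.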
> head(recs_with_na)
      V1   V2   V3   V4
148 1227 1213 <NA> <NA>
249 1169 1221 <NA> <NA>
553 1227 1162 <NA> <NA>
732 1227 1162 <NA> <NA>
765 1227 1162 <NA> <NA>
776 1227 1162 <NA> <NA>
> top_products
   product_id count
21       1162  7917
65       1213  4839
19       1160  4799
11       1152  3543
34       1175  3423
75       1227  2719
2        1143  2396
13       1154  2168
> fill_nas_with_top <- function(data, top_products) {
+     top_products_copy <- top_products
+     mydata <- data
+     #mydata <- as.data.frame(data)
+     for (i in 1:4) {
+         if (is.na(mydata[,i])) {
+             mydata[,i] <- top_products_copy[1,1]
+             top_products_copy <- top_products_copy[-1,]
+
+         }
+         else {
+             top_products_copy <- top_products_copy[top_products_copy[,1] != mydata[,i],]
+         }
+     }
+     return(mydata)
+ }
> sapply(recs_with_na, fill_nas_with_top, top_products)
Error in `[.default`(mydata, , i) : incorrect number of dimensions
Answer 0 (score: 1)
R uses pass-by-value semantics. Your function gets its own copies of data and top_products each time it is called, so there is no need to make defensive copies.
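To make that concrete, here is a minimal sketch. The helper drop_first is a hypothetical name used only for illustration. It modifies its argument, yet the caller's vector stays the same. (R actually delays the copy until the argument is modified, but the visible behavior is the same.)

```r
# Hypothetical helper, for illustration only: it modifies its argument.
drop_first <- function(v) {
  v <- v[-1]  # changes only the function's local copy
  v
}

ids <- c(1162, 1213, 1160)
shorter <- drop_first(ids)

length(shorter)  # 2
length(ids)      # 3: the caller's vector is unchanged
```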
Because pass-by-value means creating copies (and for many other reasons too), it is good practice to give your functions the smallest amount of information they need to accomplish their task. In this case, you don't need to pass the whole top_products data frame. A vector of product_ids will do.
fill_nas_with_top <- function(data, top) {
    # data is a single row (a vector); top is a vector of product_ids
    for (i in seq_along(data)) {
        d <- data[i]
        if (is.na(d)) {
            ## Find the first candidate that is not already in the row
            for (candidate in top) {
                top <- top[-1]  # consume it so it is not offered again
                if (!candidate %in% data) {
                    data[i] <- candidate
                    break
                }
            }
        } else {
            # This no longer assumes that product_ids in top are ordered as in data
            top <- top[top != d]
        }
    }
    return(data)
}
Called like this (observe that we call it with a vector of product_ids in top_products):
as.data.frame(t(apply(recs_with_na, 1, fill_nas_with_top, top_products[,1])))
will produce:
    V1   V2   V3   V4
1 1227 1213 1162 1160
2 1169 1221 1162 1213
3 1227 1162 1213 1160
4 1227 1162 1213 1160
5 1227 1162 1213 1160
6 1227 1162 1213 1160
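As for the original error: sapply on a data frame iterates over its columns, so the function received a plain vector, and indexing it with mydata[,i] fails with "incorrect number of dimensions". apply with MARGIN = 1 hands over one row at a time instead. Below is a self-contained sketch of a more compact, vectorized variant of the same row-wise fill. It is not the code above, just an alternative. The name fill_row is hypothetical, and the sample data is rebuilt under the assumption that the columns are numeric (the <NA> display in the question suggests they may really be factors or character).

```r
# Rebuild the sample data from the question (values copied from the post;
# numeric columns are an assumption).
recs_with_na <- data.frame(
  V1 = c(1227, 1169, 1227, 1227, 1227, 1227),
  V2 = c(1213, 1221, 1162, 1162, 1162, 1162),
  V3 = NA_real_,
  V4 = NA_real_,
  row.names = c("148", "249", "553", "732", "765", "776")
)
top_ids <- c(1162, 1213, 1160, 1152, 1175, 1227, 1143, 1154)

# Hypothetical name; fills a single row (a vector), not a whole data frame.
fill_row <- function(row, top) {
  na_idx <- which(is.na(row))
  candidates <- top[!top %in% row]  # top ids not yet in the row, order kept
  n <- min(length(na_idx), length(candidates))
  row[na_idx[seq_len(n)]] <- candidates[seq_len(n)]
  row
}

# apply(..., 1, ...) passes one row at a time; t() turns the result back
# into one row per recommendation.
filled <- as.data.frame(t(apply(recs_with_na, 1, fill_row, top = top_ids)))
filled
```

On this sample data it should give the same values as the loop version. If a row has more NAs than there are unused top products, the remaining NAs are left in place.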