My data looks like this:
visitor_id timestamp distance_value guest_reviews price rating_value
--kkVxJWTRGDccUSZG9u2g 8/16/2016 14:03 7.441392 355 199 4.3
--kkVxJWTRGDccUSZG9u2g 8/16/2016 14:03 7.351424 359 110.67 4.4
--kkVxJWTRGDccUSZG9u2g 8/16/2016 14:03 17.556168 204 79.34 3.9
--kkVxJWTRGDccUSZG9u2g 8/16/2016 14:03 2.469943 429 159 4.2
-1IIpqtwRqeAV1P7yh0upw 8/10/2016 2:33 21.654525 142 58.79 4.1
-1IIpqtwRqeAV1P7yh0upw 8/10/2016 2:33 0.567264 436 83.29 4.4
-1IIpqtwRqeAV1P7yh0upw 8/10/2016 2:33 10.063784 195 56.95 4.2
I am trying to standardize the last four columns by applying the scale function to them with lapply:
cols <- c("distance_value", "min_avg_nightly_before_tax", "rating_value", "guest_reviews")
norm_cols <- c("norm_distance_value", "norm_min_avg_nightly_before_tax", "norm_rating_value", "norm_guest_reviews")
myframe1[, (norm_cols) := lapply(.SD, scale), by = list(visitor_id, timestamp), .SDcols = cols]
However, this gives me the following error:
Error in `[.data.table`(myframe1, , `:=`((norm_cols), lapply(.SD, scale)), :
All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
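For reference, here is a minimal sketch of the per-group standardization I am after. It rests on an assumption I have not confirmed: that the error comes from scale() returning a one-column matrix rather than a plain vector, so each result is wrapped in as.vector(). The toy table copies the sample rows shown above, and it assumes the "price" column there corresponds to min_avg_nightly_before_tax.

```r
library(data.table)

# Toy table built from the sample rows above.
# Assumption: the "price" column in the printout is min_avg_nightly_before_tax.
myframe1 <- data.table(
  visitor_id = c(rep("--kkVxJWTRGDccUSZG9u2g", 4), rep("-1IIpqtwRqeAV1P7yh0upw", 3)),
  timestamp = c(rep("8/16/2016 14:03", 4), rep("8/10/2016 2:33", 3)),
  distance_value = c(7.441392, 7.351424, 17.556168, 2.469943,
                     21.654525, 0.567264, 10.063784),
  guest_reviews = c(355L, 359L, 204L, 429L, 142L, 436L, 195L),
  min_avg_nightly_before_tax = c(199, 110.67, 79.34, 159, 58.79, 83.29, 56.95),
  rating_value = c(4.3, 4.4, 3.9, 4.2, 4.1, 4.4, 4.2)
)

cols <- c("distance_value", "min_avg_nightly_before_tax",
          "rating_value", "guest_reviews")
norm_cols <- paste0("norm_", cols)

# scale() returns a one-column matrix; as.vector() reduces it to a plain
# numeric vector before it is assigned as a new column within each group.
myframe1[, (norm_cols) := lapply(.SD, function(x) as.vector(scale(x))),
         by = list(visitor_id, timestamp), .SDcols = cols]
```

If that assumption holds, each norm_ column should have mean 0 and standard deviation 1 within every (visitor_id, timestamp) group.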
Here is part of the output of dput(head(myframe2)) for the sample data:
"Zzy8RX7KSniWaNLyHwUXvQ", "zZYJITYrSHmYJv8MUDw6Cw", "zzZ6OXcvROe88wD4JjTtnA", "zzZBq1DsQe6PYM7AQNmbUQ", "zzzi__t-SRW9OZOKBdDfwg", "ZZZTTaS6RD6bQ2KzdaiSVA"), class = "factor"), timestamp = structure(c(1471381408.339, 1471381408.339, 1471381408.339, 1471381408.339, 1471381408.339, 1471381408.339), class = c("POSIXct", "POSIXt")), distance_value = c(7.4413922836545, 7.35142425353227, 17.5561677012408, 2.46994294033727, 24.8529546453572, 21.8463254946658), rating_value = c(4.3, 4.4, 3.9, 4.2, 3.1, 4.4), guest_reviews = c(355L, 359L, 204L, 429L, 305L, 633L), min_avg_nightly_before_tax = c(199, 110.67, 79.34, 159, 77.62, 101.37)), .Names = c("visitor_id", "timestamp", "distance_value", "rating_value", "guest_reviews", "min_avg_nightly_before_tax"), row.names = c(NA, 6L), class = "data.frame")