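I am using the JSON training file from Kaggle's https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/data
to analyze the features and the data, and to try other algorithms to see whether the accuracy can be improved.
For example, I have a column named features.
Example: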
l <- structure(list(`4` = c("Dining Room", "Pre-War", "Laundry in Building",
"Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"
), `6` = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
"Hardwood Floors", "No Fee"), `9` = c("Doorman", "Elevator",
"Laundry in Building", "Laundry in Unit", "Dishwasher", "Hardwood Floors"
), `10` = list(), `15` = c("Doorman", "Elevator", "Fitness Center",
"Laundry in Building")), .Names = c("4", "6", "9", "10", "15"
))
I would like to build a data frame that looks like this:
name nested list
4 <list = list(c("Dining Room", "Pre-War", "Laundry in Building",
"Dishwasher", "Hardwood Floors", "Dogs Allowed", "Cats Allowed"))>
6 <list = list(c("Doorman", "Elevator", "Laundry in Building", "Dishwasher", "Hardwood Floors", "No Fee"))>
9 <list = list(c("Doorman", "Elevator",
"Laundry in Building", "Laundry in Unit", "Dishwasher", "Hardwood Floors"))>
10 <list = list(c())>
15 <list = list(c("Doorman", "Elevator", "Fitness Center",
"Laundry in Building"))>
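Please advise how to do this.
I am a bit confused about how to do the conversion.
My end goal is a data frame that combines all of these features and one-hot encodes them, so that each of 4, 6, 10, 15 ... has its own 1s and 0s for the features it has.
Please advise.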
Answer 0 (score: 1)
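One way is to use the data.table::rbindlist() function with the argument fill = TRUE. This lets you bind data frames that have different numbers of columns. In your case, however, the trick is to make the empty element show up as well. To do that, we add an if statement that substitutes a one-column data frame of NA, data.frame(V1 = NA), for any empty list element: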
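library(data.table)
rbindlist(lapply(l, function(i) {
  # turn each feature vector into a one-row data frame
  d <- as.data.frame(t(i))
  # an empty element has no columns; replace it with a single NA
  if (!ncol(d)) d <- data.frame(V1 = NA)
  d
}), fill = TRUE)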
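The question also asks for a nested (list-column) data frame and a one-hot encoding. The following is a minimal base R sketch of both, not part of the original answer. It assumes the sample list l from the question; the names feats, nested, all_feats and one_hot are chosen here purely for illustration.

```r
# Sample data from the question (the list names are the listing ids).
l <- list(
  `4`  = c("Dining Room", "Pre-War", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "Dogs Allowed", "Cats Allowed"),
  `6`  = c("Doorman", "Elevator", "Laundry in Building", "Dishwasher",
           "Hardwood Floors", "No Fee"),
  `9`  = c("Doorman", "Elevator", "Laundry in Building", "Laundry in Unit",
           "Dishwasher", "Hardwood Floors"),
  `10` = list(),
  `15` = c("Doorman", "Elevator", "Fitness Center", "Laundry in Building")
)

# Normalise every element to a character vector;
# the empty list() becomes character(0).
feats <- lapply(l, function(x) as.character(unlist(x)))

# Nested data frame: one row per listing, features kept in a list-column.
nested <- data.frame(name = names(feats), stringsAsFactors = FALSE)
nested$features <- unname(feats)

# One-hot encoding: one 0/1 column per distinct feature.
all_feats <- sort(unique(unlist(feats)))
one_hot <- t(vapply(feats,
                    function(x) as.integer(all_feats %in% x),
                    integer(length(all_feats))))
colnames(one_hot) <- all_feats

# check.names = FALSE keeps feature names with spaces as column names.
one_hot <- data.frame(name = rownames(one_hot), one_hot,
                      check.names = FALSE, stringsAsFactors = FALSE)
```

In this sketch the listing with no features (10) simply gets a row of zeros, so no special case for empty elements is needed. On the full training file the number of distinct features may be large, in which case a sparse representation may be a better fit than a dense data frame.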