R list to wide (sparse) data frame

Time: 2015-11-23 09:54:31

Tags: r list dataframe

This is my first time here, so I hope I don't break anything... I have a list of lists (each element is a character vector):

Browse[2]> head(str(mylist))
List of 33
 $ : chr [1:33] "0001" "space" "28" "night_club" ...
 $ : chr [1:33] "0002" "concert" "28" "night_club" ...
 $ : chr [1:31] "0003" "night_club" "24" "martial_arts" ...
 $ : chr [1:31] "0004" "stage" "24" "basketball" ...
 $ : chr [1:43] "0005" "night_club" "16" "concert" ...
 $ : chr [1:43] "0006" "night_club" "16" "concert" ...
 $ : chr [1:39] "0007" "night_club" "22" "concert" ...
 $ : chr [1:39] "0008" "night_club" "22" "concert" ...
 $ : chr [1:31] "0009" "night_club" "46" "martial_arts" ...
 $ : chr [1:31] "0010" "night_club" "46" "martial_arts" ...
 $ : chr [1:41] "0011" "night_club" "17" "martial_arts" ...
 $ : chr [1:41] "0012" "night_club" "17" "martial_arts" ...
 $ : chr [1:29] "0013" "concert" "23" "night_club" ...
 $ : chr [1:29] "0014" "concert" "23" "night_club" ...
 $ : chr [1:25] "0015" "night_club" "26" "concert" ...
 $ : chr [1:31] "0016" "night_club" "42" "concert" ...
 $ : chr [1:31] "0017" "night_club" "42" "concert" ...
 $ : chr [1:31] "0018" "night_club" "25" "wrestling" ...
 $ : chr [1:31] "0019" "night_club" "25" "wrestling" ...
 $ : chr [1:33] "0020" "night_club" "46" "wrestling" ...
 $ : chr [1:33] "0021" "night_club" "46" "wrestling" ...
 $ : chr [1:41] "0022" "concert" "21" "stage" ...
 $ : chr [1:41] "0023" "concert" "21" "stage" ...
 $ : chr [1:55] "0024" "basketball" "8" "concert" ...
 $ : chr [1:55] "0025" "basketball" "8" "concert" ...
 $ : chr [1:37] "0026" "bald_person" "26" "martial_arts" ...
 $ : chr [1:37] "0027" "bald_person" "26" "martial_arts" ...
 $ : chr [1:37] "0028" "night_club" "32" "business_meeting" ...
 $ : chr [1:37] "0029" "night_club" "32" "business_meeting" ...
 $ : chr [1:15] "0030" "night_club" "59" "stage" ...
 $ : chr [1:37] "0031" "stage" "12" "night_club" ...
 $ : chr [1:37] "0032" "stage" "12" "night_club" ...
 $ : chr [1:33] "0033" "night_club" "23" "portrait" ...

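I would like to convert this list into a wide-format data frame. The first column should hold the first element of each inner vector (i.e. "0001", "0002", and so on), followed by one column for every category that can occur anywhere in the file: "space", "night_club", "concert", "martial_arts", "wrestling", and so on. In other words, I will end up with a very wide data frame: each row starts with an id (0001, 0002, 0003, ...), the column names are all the categories in the file ("space", "night_club", "concert", "martial_arts", "wrestling", ...), and wherever a category is present for that id, the cell is filled with the value that follows the category in the list (for example, "space" -> 28 in the first row).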
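Each inner vector alternates category names and values after the id, so a single record can be read by splitting the odd and even positions that follow the first element. A minimal sketch, assuming every vector is an id followed by complete category/value pairs; `vec_to_named` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: turn one inner vector of the form
# c(id, category1, value1, category2, value2, ...) into a named numeric vector.
vec_to_named <- function(x) {
  pairs <- x[-1]                                    # drop the frame id
  categories <- pairs[c(TRUE, FALSE)]               # odd positions: category names
  values <- as.numeric(pairs[c(FALSE, TRUE)])       # even positions: values
  stopifnot(length(categories) == length(values))   # expect complete pairs
  setNames(values, categories)
}

# Named numeric vector with space = 28 and night_club = 25
vec_to_named(c("0001", "space", "28", "night_club", "25"))
```

The logical index `c(TRUE, FALSE)` is recycled over the whole vector, which is a compact way to pick every other element without computing positions by hand.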

I tried building a normalized (long) data frame with a loop and then converting it to wide format, but as the data grows this will be a bad idea:

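long_rows <- list()  # one long-format data frame per line of input
for (file in files) {  # iterate over files in folder
    mylist <- strsplit(readLines(file), ":")
    for (elem in mylist) {
        pairs <- elem[-1]  # drop the frame id; the rest alternates category, value
        long_rows[[length(long_rows) + 1]] <- data.frame(
            frameid  = elem[[1]],
            category = pairs[c(TRUE, FALSE)],             # odd positions: categories
            value    = as.numeric(pairs[c(FALSE, TRUE)]), # even positions: values
            stringsAsFactors = FALSE
        )
    }
}
long_df <- do.call(rbind, long_rows)  # normalized (long) data frame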
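A vectorized take on the same long-then-wide plan, sketched under the assumption that every inner vector is an id followed by complete category/value pairs: build the long table once from `unlist()` instead of creating a `data.frame` per element, then pivot it with base R's `reshape()`. The `cat_` prefix and the zero fill follow the expected output in the question; the variable names are illustrative.

```r
mylist <- list(c("0001", "space", "28", "night_club", "25"),
               c("0002", "concert", "28", "night_club", "26"),
               c("0003", "night_club", "24", "martial_arts", "27"),
               c("0004", "stage", "24", "basketball", "30"))

ids   <- vapply(mylist, `[`, character(1), 1)   # first element of each vector
pairs <- lapply(mylist, `[`, -1)                # remaining category/value pairs
stopifnot(all(lengths(pairs) %% 2 == 0))        # assume complete pairs only
n     <- lengths(pairs) / 2                     # number of pairs per id
flat  <- unlist(pairs, use.names = FALSE)

# Long (normalized) table built in one pass
long_df <- data.frame(
  frameid  = rep(ids, n),
  category = flat[c(TRUE, FALSE)],
  value    = as.numeric(flat[c(FALSE, TRUE)]),
  stringsAsFactors = FALSE
)

# Pivot to one column per category; reshape() names them value.<category>
wide_df <- reshape(long_df, idvar = "frameid", timevar = "category",
                   direction = "wide")
names(wide_df) <- sub("^value\\.", "cat_", names(wide_df))
wide_df[is.na(wide_df)] <- 0                    # absent categories become 0
rownames(wide_df) <- NULL
wide_df
```

Rows and columns come out in order of first appearance, which for the sample input matches the expected output. Categories missing for an id come back from `reshape()` as `NA`, hence the final replacement with 0.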

Reproducible input/output example. Input:

 list(c("0001", "space", "28", "night_club", "25"), c("0002", 
"concert", "28", "night_club", "26"), c("0003", "night_club", 
"24", "martial_arts", "27"), c("0004", "stage", "24", "basketball", 
"30"))

Expected output:

Dataframe
frameid, cat_space, cat_night_club, cat_concert, cat_martial_arts, cat_stage, cat_basketball
0001, 28, 25, 0, 0, 0, 0
0002, 0, 26, 28, 0, 0, 0
0003, 0, 24, 0, 27, 0, 0
0004, 0, 0, 0, 0, 24, 30
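Another sketch that skips the long table entirely: allocate a zero matrix with one row per id and one column per category, then assign every value at once with two-column matrix indexing (`m[cbind(row, col)] <- vals`). It assumes the same record layout (id, then complete category/value pairs). This is a plain base-R alternative; for very wide, mostly empty data a sparse matrix class (for example from the Matrix package) may use far less memory.

```r
mylist <- list(c("0001", "space", "28", "night_club", "25"),
               c("0002", "concert", "28", "night_club", "26"),
               c("0003", "night_club", "24", "martial_arts", "27"),
               c("0004", "stage", "24", "basketball", "30"))

stopifnot(all(lengths(mylist) %% 2 == 1))        # id + complete pairs
ids   <- vapply(mylist, `[`, character(1), 1)    # frame ids
flat  <- unlist(lapply(mylist, `[`, -1), use.names = FALSE)
cats  <- flat[c(TRUE, FALSE)]                    # category of each pair
vals  <- as.numeric(flat[c(FALSE, TRUE)])        # value of each pair
row_i <- rep(seq_along(ids), (lengths(mylist) - 1) / 2)  # row index of each pair

all_cats <- unique(cats)                         # column order: first appearance
m <- matrix(0, nrow = length(ids), ncol = length(all_cats),
            dimnames = list(NULL, paste0("cat_", all_cats)))
m[cbind(row_i, match(cats, all_cats))] <- vals   # fill (row, column) cells

result <- data.frame(frameid = ids, m, stringsAsFactors = FALSE)
result
```

If a category repeats within one record, the later value overwrites the earlier one. Cells never assigned keep the initial 0, which gives the zero fill shown in the expected output.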
