Is it possible to calculate the size of each data frame column in R?

Time: 2018-12-06 03:15:03

Tags: r dataframe size

In R, you can get the size of an entire object:

> object.size(dplyr::starwars)
50632 bytes

If you inspect the data frame, you can see that the columns do not all hold the same kind of content:

> head(dplyr::starwars)
# A tibble: 6 x 13
  name   height  mass hair_color skin_color eye_color birth_year gender homeworld species films vehicles
  <chr>   <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr>     <chr>   <lis> <list>  
1 Luke …    172   77. blond      fair       blue            19.0 male   Tatooine  Human   <chr… <chr [2…
2 C-3PO     167   75. NA         gold       yellow         112.  NA     Tatooine  Droid   <chr… <chr [0…
3 R2-D2      96   32. NA         white, bl… red             33.0 NA     Naboo     Droid   <chr… <chr [0…
4 Darth…    202  136. none       white      yellow          41.9 male   Tatooine  Human   <chr… <chr [0…
5 Leia …    150   49. brown      light      brown           19.0 female Alderaan  Human   <chr… <chr [1…
6 Owen …    178  120. brown, gr… light      blue            52.0 male   Tatooine  Human   <chr… <chr [0…
# ... with 1 more variable: starships <list>

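Clearly, height takes up less space than hair_color. Is there a way to find the largest columns? With a larger data frame, for example, you may want to check whether some columns take up a disproportionate amount of space.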

1 answer:

Answer 0 (score: 1)

Just use lapply / sapply to iterate over all columns and call object.size on each one:

library(dplyr)

sapply(starwars, object.size)

# name     height       mass hair_color skin_color  eye_color birth_year     gender 
# 5576        392        736       1336       2400       1480        736        936 

# homeworld    species      films   vehicles  starships 
#      3216       2648      17920       5136       6496 
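To judge whether a column is disproportionately large, it can help to express each size as a share of the total. The following is a minimal base R sketch on a made-up data frame (the columns `id`, `value`, and `label` are hypothetical and not part of `starwars`). Note that `object.size` is only an estimate, and the per-column sizes need not add up exactly to the size of the whole data frame.

```r
# Hypothetical example data; any data frame can be used the same way.
df <- data.frame(
  id    = 1:1000,                    # integer column
  value = as.numeric(1:1000),        # double column
  label = paste0("item_", 1:1000),   # character column
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so sapply() visits one column at a time.
# as.numeric() drops the "object_size" class and leaves a plain byte count.
col_bytes <- sapply(df, function(col) as.numeric(object.size(col)))

# One row per column: size in bytes and share of the summed column sizes.
sizes <- data.frame(
  column = names(col_bytes),
  bytes  = unname(col_bytes),
  share  = round(unname(col_bytes) / sum(col_bytes), 3),
  stringsAsFactors = FALSE
)

# Largest columns first.
sizes <- sizes[order(-sizes$bytes), ]
print(sizes)
```

Here the character column should come out on top, since every distinct string is stored as a separate object in addition to the vector itself.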

If you are only interested in the few largest columns, you can do

sapply(starwars, object.size) %>%
  data.frame() %>%
  as_tibble(rownames = "rowname") %>%  # add_rownames() is deprecated
  top_n(5)                             # selects by the last column, here `.`


#  rowname       .
#  <chr>     <dbl>
#1 name       5576
#2 homeworld  3216
#3 films     17920
#4 vehicles   5136
#5 starships  6496

Alternatively, using only base R functions:

tail(sort(sapply(starwars, object.size)), 5)

#homeworld  vehicles      name starships     films 
#     3216      5136      5576      6496     17920
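If this check is needed repeatedly, the same idea can be wrapped in a small function. The sketch below is one possible way to do it in base R. The helper name `largest_columns` and the example data are hypothetical. Sizes are also formatted with `format(..., units = "auto")`, which is the formatting method base R provides for `object.size` results.

```r
# Hypothetical helper: return the n largest columns of a data frame,
# largest first, with sizes in bytes and in human-readable units.
largest_columns <- function(df, n = 5) {
  sizes <- lapply(df, object.size)                # keeps the "object_size" class
  bytes <- vapply(sizes, as.numeric, numeric(1))  # plain byte counts for sorting
  keep  <- head(order(bytes, decreasing = TRUE), n)
  data.frame(
    column = names(df)[keep],
    bytes  = unname(bytes[keep]),
    size   = vapply(sizes[keep], format, character(1), units = "auto"),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Example data with a list column (made up for illustration only).
df <- data.frame(id = 1:500, x = as.numeric(1:500))
df$tags <- replicate(500, c("a", "b", "c"), simplify = FALSE)

res <- largest_columns(df, n = 2)
print(res)
```

As with `films`, `vehicles`, and `starships` in `starwars`, the list column is expected to be the largest here, because each of its elements is a separate vector with its own overhead.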