In R, you can get the size of an entire object:
> object.size(dplyr::starwars)
50632 bytes
If you inspect the data frame, you'll see that the columns do not all hold similar content:
> head(dplyr::starwars)
# A tibble: 6 x 13
name height mass hair_color skin_color eye_color birth_year gender homeworld species films vehicles
<chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <lis> <list>
1 Luke … 172 77. blond fair blue 19.0 male Tatooine Human <chr… <chr [2…
2 C-3PO 167 75. NA gold yellow 112. NA Tatooine Droid <chr… <chr [0…
3 R2-D2 96 32. NA white, bl… red 33.0 NA Naboo Droid <chr… <chr [0…
4 Darth… 202 136. none white yellow 41.9 male Tatooine Human <chr… <chr [0…
5 Leia … 150 49. brown light brown 19.0 female Alderaan Human <chr… <chr [1…
6 Owen … 178 120. brown, gr… light blue 52.0 male Tatooine Human <chr… <chr [0…
# ... with 1 more variable: starships <list>
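Clearly, height will take up less space than hair_color. Is there a way to check which columns are the largest? For example, with a larger data frame you might want to see whether certain columns take up a disproportionate amount of space.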
Answer 0 (score: 1)
Just use lapply / sapply to loop over all the columns:
library(dplyr)
sapply(starwars, object.size)
# name height mass hair_color skin_color eye_color birth_year gender
# 5576 392 736 1336 2400 1480 736 936
# homeworld species films vehicles starships
# 3216 2648 17920 5136 6496
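The raw byte counts can be hard to scan on wide data frames. As a minimal sketch, assuming the built-in `iris` data set as a stand-in so the example runs without dplyr, the `format()` method for `object.size` results can render each column's size in kilobytes (the variable name `readable` is only illustrative):

```r
# Per-column sizes as human-readable strings.
# `iris` is a stand-in data set; substitute your own data frame.
readable <- vapply(
  iris,
  function(col) format(object.size(col), units = "Kb"),
  character(1)
)
print(readable)
```

Exact values depend on the platform and R version, so no expected output is shown here.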
If you are interested in the few largest columns, you can do
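# Note: add_rownames() is deprecated in current dplyr (tibble::rownames_to_column() replaces it),
# and top_n() is superseded by slice_max(); both still run but may emit warnings.
sapply(starwars, object.size) %>%
data.frame() %>%
add_rownames() %>%
top_n(5)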
# rowname .
# <chr> <dbl>
#1 name 5576
#2 homeworld 3216
#3 films 17920
#4 vehicles 5136
#5 starships 6496
or
tail(sort(sapply(starwars, object.size)), 5)
#homeworld vehicles name starships films
# 3216 5136 5576 6496 17920
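To judge whether a column is disproportionately large, it can help to see each column's share next to its size. The following is a rough sketch in base R, not part of the original answer: `column_sizes` is a hypothetical helper name, and `iris` is used as a stand-in data set so the example runs without dplyr. The share is computed against the sum of the per-column sizes, which will generally differ from `object.size()` of the whole data frame because of the data frame's own overhead.

```r
# Hypothetical helper: per-column sizes in bytes, largest first,
# with each column's percentage of the summed column sizes.
column_sizes <- function(df) {
  bytes <- vapply(df, function(col) as.numeric(object.size(col)), numeric(1))
  out <- data.frame(
    column = names(bytes),
    bytes = unname(bytes),
    share = round(100 * unname(bytes) / sum(bytes), 1),
    stringsAsFactors = FALSE
  )
  # Sort so the largest columns appear at the top
  out[order(out$bytes, decreasing = TRUE), ]
}

res <- column_sizes(iris)
print(res)
```

With the data from the question, `head(column_sizes(dplyr::starwars), 5)` would list the five largest columns.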