Replacing data.frame with data_frame() and cbind with bind_cols() in dplyr

Posted: 2015-01-14 12:53:37

Tags: r dplyr sp

I'm trying to rewrite part of my code using dplyr's latest features, replacing data.frame() with data_frame() and cbind() with bind_cols():

library(rgeos)
library(dplyr)

mc <- montreal %>%
  gCentroid(byid=TRUE) %>%
  data.frame %>%
  cbind(., name = montreal[["NOM"]])

When I try to replace data.frame with data_frame, I get:

Error: data_frames can only contain 1d atomic vectors and lists

When I try to replace cbind with bind_cols, I get:

Error: object at index 2 not a data.frame

Is there a way to make this work?

Here, montreal is a SpatialPolygonsDataFrame:

GeoJSON file: http://elm.bi/limadmin.json

library(rgdal)  # provides readOGR()
montreal <- readOGR("data/limadmin.json", "OGRGeoJSON")

1 answer:

Answer 0 (score: 4)

I ended up running microbenchmark on both approaches, because the following felt a little odd to use:

mc <- montreal %>% 
    gCentroid(byid=TRUE) %>% 
    data.frame %>% 
    bind_cols(., data_frame(name=montreal[["NOM"]]))
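
The same pattern can be sketched on toy data, with no spatial packages needed. This is only an illustration: `coords` and `nom` are hypothetical stand-ins for the centroid data frame and for `montreal[["NOM"]]`, and `tibble()` is used on the assumption of a recent dplyr/tibble release, where it is the current name for `data_frame()`.

```r
library(dplyr)

# Hypothetical stand-in for the data frame of centroid coordinates.
coords <- data.frame(x = c(1.5, 2.5), y = c(10, 20))

# Hypothetical stand-in for the name column, montreal[["NOM"]].
nom <- c("A", "B")

# Wrap the vector in a one-column data frame so that every
# argument passed to bind_cols() is itself a data frame.
combined <- bind_cols(coords, tibble(name = nom))

print(combined)
```

The names column is wrapped so that both inputs have the same number of rows and are bound side by side, which is what cbind() did with the bare vector.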

I tried it on two different datasets:

library(microbenchmark)
world <- readOGR("data/world.json", "OGRGeoJSON")

wmbm = microbenchmark(
  base = world %>% 
    gCentroid(byid=TRUE) %>% 
    data.frame %>% 
    cbind(., name=world[["name"]]),
  dplyr = world %>% 
    gCentroid(byid=TRUE) %>% 
    data.frame %>% 
    bind_cols(., data_frame(name=world[["name"]])),
  times=100
)

Microbenchmark results:

Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
  base 13.78396 14.08301 14.21357 14.12023 14.16435 20.04362   100
 dplyr 13.87098 14.10680 14.25245 14.14330 14.18020 17.63248   100

montreal <- readOGR("data/limadmin.json", "OGRGeoJSON")

lmbm = microbenchmark(
  base = montreal %>% 
    gCentroid(byid=TRUE) %>% 
    data.frame %>% 
    cbind(., name=montreal[["NOM"]]),
  dplyr = montreal %>% 
    gCentroid(byid=TRUE) %>% 
    data.frame %>% 
    bind_cols(., data_frame(name=montreal[["NOM"]])),
  times=100
)

Microbenchmark results:

Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
  base 1.597957 1.628723 1.736709 1.651747 1.686554 3.091738   100
 dplyr 1.621092 1.642678 1.756978 1.659041 1.739707 3.751866   100

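There's no real conclusion here. The dplyr version has a marginally higher median in both runs (14.14 vs 14.12 ms for world, 1.659 vs 1.652 ms for montreal), but I'll stick with the dplyr-esque solution for the sake of consistency.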