Question

It seems that dplyr::pull() and dplyr::select() do the same thing. Is there any difference besides dplyr::pull() selecting only one variable?

Answer 1

First, it helps to look at the class of the object each function returns.

library(dplyr)

mtcars %>% pull(cyl) %>% class()
#> [1] "numeric"

mtcars %>% select(cyl) %>% class()
#> [1] "data.frame"

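So pull() creates a vector (numeric in this case), whereas select() creates a data frame.

Basically, pull() is the equivalent of writing mtcars$cyl or mtcars[, "cyl"], whereas select() removes all columns except cyl but keeps the data frame structure.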
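The equivalence described above can be checked directly. This is a minimal sketch using the built-in mtcars data, which is a plain data.frame; with a tibble, `[, "cyl"]` would keep the data frame structure rather than drop to a vector, so that particular comparison is an assumption about the input type.

```r
library(dplyr)

v <- mtcars %>% pull(cyl)

# pull() returns the same bare vector as $ and [[ ]]
identical(v, mtcars$cyl)
identical(v, mtcars[["cyl"]])

# For a base data.frame, [, "cyl"] also drops to a vector
identical(v, mtcars[, "cyl"])

# select() keeps the data frame structure, with a single column
s <- mtcars %>% select(cyl)
is.data.frame(s)
ncol(s)
```

If the result is going to be passed to a function that expects a vector (mean(), for instance), pull() is the natural choice; if it is going into further dplyr verbs, select() keeps it in data frame form.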

Answer 2

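You can think of select as an analogue of [ or magrittr::extract, and of pull as an analogue of [[ (or $) or magrittr::extract2 for data frames (the analogue of [[ for lists would be purrr::pluck).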

library(dplyr)
library(magrittr)  # provides extract() and extract2()

df <- iris %>% head()

All of these give the same output:

df %>% pull(Sepal.Length)
df %>% pull("Sepal.Length")
a <- "Sepal.Length"; df %>% pull(!!quo(a))
df %>% extract2("Sepal.Length")
df %>% `[[`("Sepal.Length")
df[["Sepal.Length"]]

# all of them:
# [1] 5.1 4.9 4.7 4.6 5.0 5.4

And all of these give the same output as each other:

df %>% select(Sepal.Length)
a <- "Sepal.Length"; df %>% select(!!quo(a))
df %>% select("Sepal.Length")
df %>% extract("Sepal.Length")
df %>% `[`("Sepal.Length")
df["Sepal.Length"]
# all of them:
#   Sepal.Length
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4

pull and select accept literal (unquoted) column names as well as character or numeric indices, whereas the others accept only character or numeric indices.
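To illustrate the three kinds of index, here is a small sketch on the same head(iris) data. The variable names (p_name, s_chr, and so on) are only for illustration and are not from the original answer.

```r
library(dplyr)

df <- head(iris)

# pull(): unquoted name, string, or position all return the same vector
p_name <- df %>% pull(Sepal.Length)
p_chr  <- df %>% pull("Sepal.Length")
p_num  <- df %>% pull(1)

# select(): the same three forms all return the same one-column data frame
s_name <- df %>% select(Sepal.Length)
s_chr  <- df %>% select("Sepal.Length")
s_num  <- df %>% select(1)

identical(p_name, p_chr) && identical(p_chr, p_num)
identical(s_name, s_chr) && identical(s_chr, s_num)

# Base [[ and [ have no unquoted-name form: df[[Sepal.Length]] looks up
# an ordinary variable named Sepal.Length and fails if none exists
```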

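One important difference is how they handle negative indices.

For select, negative indices mean columns to drop.

For pull, they mean counting from the last column.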

df %>% pull(-Sepal.Length)
df %>% pull(-1)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica

A strange-looking result, but Sepal.Length is converted to its position, 1, and column -1 is Species (the last column).
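The counting-from-the-end behaviour can be verified against positional [[ indexing. This is a minimal sketch; n is just a helper variable introduced here. It also relies on the documented default of pull(), var = -1, which is why calling pull() with no column returns the last one.

```r
library(dplyr)

df <- head(iris)
n <- ncol(df)

# pull(-1) is the last column, pull(-2) the second-to-last
identical(df %>% pull(-1), df[[n]])
identical(df %>% pull(-2), df[[n - 1]])

# With no argument, pull() defaults to var = -1, i.e. the last column
identical(df %>% pull(), df[[n]])

# A bare name is replaced by its position, so -Sepal.Length is -1
identical(df %>% pull(-Sepal.Length), df %>% pull(-1))
```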

[[ and extract2 do not support this feature:

df %>% `[[`(-1)
df %>% extract2(-1)
df[[-1]]
# Error in .subset2(x, i, exact = exact) : 
#   attempt to select more than one element in get1index <real>

[ and extract, however, do support negative indices for dropping columns, just like select.

df %>% select(-Sepal.Length)
df %>% select(-1)
df %>% `[`(-1)
df[-1]

#   Sepal.Width Petal.Length Petal.Width Species
# 1         3.5          1.4         0.2  setosa
# 2         3.0          1.4         0.2  setosa
# 3         3.2          1.3         0.2  setosa
# 4         3.1          1.5         0.2  setosa
# 5         3.6          1.4         0.2  setosa
# 6         3.9          1.7         0.4  setosa
