Custom function returns the same value for all rows in dplyr's mutate

Time: 2016-07-16 11:18:28

Tags: r dplyr

I have the following data:

                                                 Name
1                             Braund, Mr. Owen Harris
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3                              Heikkinen, Miss. Laina
4        Futrelle, Mrs. Jacques Heath (Lily May Peel)
5                            Allen, Mr. William Henry

The data can be loaded with:

data <- structure(list(Name = c("Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)", 
"Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath (Lily May Peel)", 
"Allen, Mr. William Henry")), .Names = "Name", row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

My expected output is:

                                                 Name    Title
1                             Braund, Mr. Owen Harris       Mr
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)      Mrs
3                              Heikkinen, Miss. Laina     Miss
4        Futrelle, Mrs. Jacques Heath (Lily May Peel)      Mrs
5                            Allen, Mr. William Henry       Mr

The problem is that the following code sets every Title to "Mr". I am calling a custom function inside dplyr's mutate.

library('stringr')
library('dplyr')

extractTitle <- function(name) {
  str_match(name, '(\\b[a-zA-z]+)\\.')[2]
}

data <- data %>% 
          mutate(Title = extractTitle(Name))

Oddly, if I change extractTitle to return its argument unchanged, it works as expected. For example:

extractTitle <- function(name) {
  name
}

data <- data %>% 
          mutate(Title = extractTitle(Name))

The code above returns:

                                                 Name    Title
1                             Braund, Mr. Owen Harris   Braund, Mr. Owen Harris
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)   Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3                              Heikkinen, Miss. Laina   Heikkinen, Miss. Laina
4        Futrelle, Mrs. Jacques Heath (Lily May Peel)   Futrelle, Mrs. Jacques Heath (Lily May Peel)
5                            Allen, Mr. William Henry   Allen, Mr. William Henry

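This is the behavior I expect, and it differs from the behavior of the code I am having trouble with.

What am I missing here, or is this a bug?

P.S. I am using dplyr version 0.5.0.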

1 answer:

Answer 0: (score: 2)

library(dplyr)
library(stringr)    
data %>%
      mutate(title = str_extract(string = Name, pattern = "(Mr|Miss|Mrs)\\.")) %>%
      select(Name, title)

Returns:

# A tibble: 6 x 2
                                                 Name title
                                                <chr> <chr>
1                             Braund, Mr. Owen Harris   Mr.
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)  Mrs.
3                              Heikkinen, Miss. Laina Miss.
4        Futrelle, Mrs. Jacques Heath (Lily May Peel)  Mrs.
5                            Allen, Mr. William Henry   Mr.
6                                    Moran, Mr. James   Mr.
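
As for why the original extractTitle gives every row the same value: str_match() returns a character matrix with one row per input string (column 1 is the full match, column 2 the first capture group), so indexing it with [2] picks out a single element, and mutate() recycles that one value across all rows. A minimal sketch of a vectorized version, assuming the intent is to keep the capture group for every row, follows. The data frame is rebuilt here with data.frame() only so the snippet is self-contained, the name result is just for illustration, and the character class is tightened from [a-zA-z] to [a-zA-Z], which is presumably what was meant.

```r
library(dplyr)
library(stringr)

# Sample data from the question, rebuilt as a plain data frame.
data <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
           "Allen, Mr. William Henry"),
  stringsAsFactors = FALSE
)

# str_match() returns a matrix: one row per name,
# column 1 = full match, column 2 = first capture group.
# [, 2] keeps the whole capture-group column (one value per row);
# [2] would keep only a single matrix element.
extractTitle <- function(name) {
  str_match(name, "(\\b[a-zA-Z]+)\\.")[, 2]
}

result <- data %>%
  mutate(Title = extractTitle(Name))

result$Title
# "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"
```

Compared with the str_extract() approach above, this keeps the original regular expression and drops the trailing period, at the cost of matching any word followed by a period rather than a fixed list of titles.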