I have the following data:
Name
1 Braund, Mr. Owen Harris
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3 Heikkinen, Miss. Laina
4 Futrelle, Mrs. Jacques Heath (Lily May Peel)
5 Allen, Mr. William Henry
The data can be loaded with:
data <- structure(list(Name = c("Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
"Heikkinen, Miss. Laina", "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
"Allen, Mr. William Henry")), .Names = "Name", row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
My expected output is:
Name Title
1 Braund, Mr. Owen Harris Mr
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Mrs
3 Heikkinen, Miss. Laina Miss
4 Futrelle, Mrs. Jacques Heath (Lily May Peel) Mrs
5 Allen, Mr. William Henry Mr
The problem is that the following code sets every Title to "Mr". I am calling a custom function inside dplyr's mutate.
library('stringr')
library('dplyr')
extractTitle <- function(name) {
str_match(name, '(\\b[a-zA-Z]+)\\.')[2]
}
data <- data %>%
mutate(Title = extractTitle(Name))
Strangely, if I change extractTitle to return its argument unchanged, it works as expected. For example:
extractTitle <- function(name) {
name
}
data <- data %>%
mutate(Title = extractTitle(Name))
The code above returns:
Name Title
1 Braund, Mr. Owen Harris Braund, Mr. Owen Harris
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3 Heikkinen, Miss. Laina Heikkinen, Miss. Laina
4 Futrelle, Mrs. Jacques Heath (Lily May Peel) Futrelle, Mrs. Jacques Heath (Lily May Peel)
5 Allen, Mr. William Henry Allen, Mr. William Henry
This is the behavior I expect, and it differs from what the problematic code does.
What am I missing here, or is this a bug?
P.S. I am using dplyr version 0.5.0.
Answer:
library(dplyr)
library(stringr)
data %>%
mutate(title = str_extract(string = Name, pattern = "(Mr|Miss|Mrs)\\.")) %>%
select(Name, title)
Returns:
# A tibble: 6 x 2
Name title
<chr> <chr>
1 Braund, Mr. Owen Harris Mr.
2 Cumings, Mrs. John Bradley (Florence Briggs Thayer) Mrs.
3 Heikkinen, Miss. Laina Miss.
4 Futrelle, Mrs. Jacques Heath (Lily May Peel) Mrs.
5 Allen, Mr. William Henry Mr.
6 Moran, Mr. James Mr.
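As for why the original extractTitle collapses to one value: a likely explanation, sketched below under the assumption of stringr 1.x, is that str_match() returns a character matrix (one row per input string; column 1 is the full match, column 2 the first capture group). Indexing it with [2] treats the matrix as a plain vector and returns a single element, which mutate() then recycles to every row. Taking the whole capture-group column with [, 2] gives one title per row, without the trailing dot. The names m and extractTitleOriginal are introduced here only for illustration. The character class is also written [a-zA-Z], since the range A-z additionally matches the ASCII characters [ \ ] ^ _ and the backtick.

```r
library(stringr)
library(dplyr)

# The five example rows from the question, as a plain data.frame
data <- data.frame(
  Name = c("Braund, Mr. Owen Harris",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Heikkinen, Miss. Laina",
           "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
           "Allen, Mr. William Henry"),
  stringsAsFactors = FALSE
)

# str_match() returns a character matrix with one row per input string:
# column 1 holds the full match (e.g. "Mr."), column 2 the capture group ("Mr").
m <- str_match(data$Name, "(\\b[a-zA-Z]+)\\.")
dim(m)  # 5 2

# The question's version (renamed for illustration): [2] indexes the matrix
# as a vector and returns a single element, which mutate() recycles.
extractTitleOriginal <- function(name) {
  str_match(name, "(\\b[a-zA-Z]+)\\.")[2]
}
length(extractTitleOriginal(data$Name))  # 1

# Fixed version: keep the whole capture-group column, one value per row.
extractTitle <- function(name) {
  str_match(name, "(\\b[a-zA-Z]+)\\.")[, 2]
}

data <- data %>%
  mutate(Title = extractTitle(Name))
data$Title  # "Mr" "Mrs" "Miss" "Mrs" "Mr"
```

The identity version of extractTitle in the question works for the same reason: it returns a vector as long as Name, so nothing has to be recycled.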