Dplyr pipes (%>%) inside mutate()?

Time: 2016-09-02 14:37:19

Tags: r dplyr

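Pipes in dplyr are cool, and sometimes I want to clean up a single column by applying several commands to it. Is there a way to use the pipe inside a mutate() call? I notice this most when working with regular expressions, but it comes up in other contexts too. In the example below I can clearly see the different manipulations I am applying to the column "Clean", and I am curious whether there is a way to do something that mimics %>% inside mutate().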

library(dplyr)
phone <- data.frame(Numbers = c("1234567890", "555-3456789", "222-222-2222",   
                                "5131831249", "123.321.1234","(333)444-5555",
                                "+1 123-223-3234", "555-666-7777 x100"), 
                                stringsAsFactors = FALSE)

phone2 <- phone %>%
          mutate(Clean = gsub("[A-Za-z].*", "", Numbers), #remove extensions
                 Clean = gsub("[^0-9]", "", Clean),       #remove parentheses, dashes, etc
                 Clean = substr(Clean, nchar(Clean)-9, nchar(Clean)), #grab the right 10 characters
                 Clean = gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", Clean)) #format

phone2

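I know there may be a better gsub() command, but for the purposes of this question I would like to know whether these gsub() steps can be piped together, so that I neither have to keep writing Clean = gsub(...) nor have to nest the calls inside one another.

An answer that uses a simpler example is fine.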
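For comparison, the nested form that the question wants to avoid might look like the sketch below. It uses base R only; `numbers`, `digits`, and `nested` are names introduced here for illustration. Because substr() needs its input three times, the inner result has to be either repeated or stored in a temporary, which is part of what makes nesting awkward to read.

```r
# Sample input taken from the question's data.
numbers <- c("1234567890", "(333)444-5555", "555-666-7777 x100")

# Steps 1 and 2 nested: drop extension text, then drop every non-digit.
digits <- gsub("[^0-9]", "", gsub("[A-Za-z].*", "", numbers))

# Steps 3 and 4 nested: keep the last 10 digits, then format them.
nested <- gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3",
               substr(digits, nchar(digits) - 9, nchar(digits)))

print(nested)
```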

3 answers:

Answer 0: (score: 15)

Don't fall into the trap of endless pipes. For readability and efficiency, do the right thing: write a function.

phone %>% mutate(Clean = cleanPhone(Numbers))
#             Numbers         Clean
# 1        1234567890 (123)456-7890
# 2       555-3456789 (555)345-6789
# 3      222-222-2222 (222)222-2222
# 4        5131831249 (513)183-1249
# 5      123.321.1234 (123)321-1234
# 6     (333)444-5555 (333)444-5555
# 7   +1 123-223-3234 (123)223-3234
# 8 555-666-7777 x100 (555)666-7777

The custom function:

cleanPhone <- function(x) {
  x1 <- gsub("[A-Za-z].*", "", x)             # remove extensions such as " x100"
  x2 <- gsub("[^0-9]", "", x1)                # keep digits only
  x3 <- substr(x2, nchar(x2) - 9, nchar(x2))  # keep the last 10 digits
  gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", x3)  # format as (xxx)xxx-xxxx
}

Answer 1: (score: 6)

I think you need:

phone %>% 
     mutate(Clean = gsub("[A-Za-z].*", "", Numbers) %>%
                    gsub("[^0-9]", "", .) %>%
                    substr(., nchar(.)-9, nchar(.)) %>% 
                    gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", .))
#            Numbers         Clean
#1        1234567890 (123)456-7890
#2       555-3456789 (555)345-6789
#3      222-222-2222 (222)222-2222
#4        5131831249 (513)183-1249
#5      123.321.1234 (123)321-1234
#6     (333)444-5555 (333)444-5555
#7   +1 123-223-3234 (123)223-3234
#8 555-666-7777 x100 (555)666-7777
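The substr() step in the chain above relies on how magrittr treats the `.` placeholder. The sketch below isolates that step on a bare vector, assuming only that magrittr is installed; `x` and `last10` are illustrative names, not part of the answer.

```r
library(magrittr)

# Two digit strings: one already 10 characters long, one with a leading "1".
x <- c("5131831249", "11232233234")

# `.` appears as a direct argument of substr(), so magrittr does not also
# insert the left-hand side as an extra first argument. The nested nchar(.)
# calls see the same piped value.
last10 <- x %>% substr(., nchar(.) - 9, nchar(.))

# The piped call matches the plain call written out by hand.
stopifnot(identical(last10, substr(x, nchar(x) - 9, nchar(x))))
print(last10)
```

Since mutate() evaluates its right-hand side as an ordinary expression over the column vector, the same rule applies unchanged inside mutate().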

Answer 2: (score: 0)

Even though the question has already been answered, consider using magrittr on its own instead of going through dplyr:

library(magrittr)
phone <- data.frame(Numbers = c("1234567890", "555-3456789", "222-222-2222",   
                                "5131831249", "123.321.1234","(333)444-5555",
                                "+1 123-223-3234", "555-666-7777 x100"), 
                                stringsAsFactors = FALSE)
phone

cleanchain <- phone$Numbers %>%
  gsub("[A-Za-z].*", "", .) %>%
  gsub("[^0-9]", "", .) %>%
  substr(., nchar(.) - 9, nchar(.)) %>%
  gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", .)

cleanchain  

data.frame(old=phone$Numbers,new=cleanchain, stringsAsFactors = F)
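One way to combine the pipe style with the "write a function" advice is a magrittr functional sequence: a pipeline that starts with `.` is not executed but returned as a reusable function. The sketch below assumes only magrittr; `clean_phone` and `result` are hypothetical names chosen for this example.

```r
library(magrittr)

# Starting the chain with `.` makes magrittr build a function
# instead of running the steps immediately.
clean_phone <- . %>%
  gsub("[A-Za-z].*", "", .) %>%           # remove extensions
  gsub("[^0-9]", "", .) %>%               # keep digits only
  substr(., nchar(.) - 9, nchar(.)) %>%   # keep the last 10 digits
  gsub("(^\\d{3})(\\d{3})(\\d{4}$)", "(\\1)\\2-\\3", .)  # format

# The result is called like any other function.
result <- clean_phone(c("123.321.1234", "555-666-7777 x100"))
print(result)
```

A function built this way could then be used inside mutate() in the same way as a hand-written helper, for example mutate(Clean = clean_phone(Numbers)).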