Using mutate

Time: 2017-10-20 10:41:16

Tags: r dplyr

Some data:

x <- structure(list(X. = c("4,084", "4,084", "4,084", "4,084", "4,084"
), ADR = c("1,099.69", "68.66", "232.72", "195.66", "98"), hotel_id = c("2,313,076", 
"583,666", "1,251,372", "1,545,890", "298,160"), city_id = c("9,395", 
"17,193", "5,085", "16,808", "8,584"), star_rating = c(5, 2, 
3, 4, 4), accommodation_type_name = c("Hotel", "Bungalow", "Hotel", 
"Hotel", "Hotel"), chain_hotel = c("chain", "non-chain", "non-chain", 
"non-chain", "non-chain"), booking_date = c("10/5/2016", "12/4/2016", 
"11/6/2016", "10/22/2016", "12/11/2016"), checkin_date = c("10/27/2016", 
"12/9/2016", "11/18/2016", "11/3/2016", "12/11/2016"), checkout_date = c("10/30/2016", 
"12/12/2016", "11/20/2016", "11/4/2016", "12/12/2016"), city = c("A", 
"B", "C", "D", "E")), class = "data.frame", row.names = c(NA, 
-5L), .Names = c("X.", "ADR", "hotel_id", "city_id", "star_rating", 
"accommodation_type_name", "chain_hotel", "booking_date", "checkin_date", 
"checkout_date", "city"))

It looks like this:

> glimpse(x)
Observations: 27,298
Variables: 11
$ X.                      <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"...
$ ADR                     <chr> "71.06", "76.56", "153.88", "126.6", "115.08", "81.6", "77.16", "168.36",...
$ hotel_id                <chr> "297,388", "298,322", "2,313,076", "2,240,838", "2,240,838", "331,350", "...
$ city_id                 <chr> "9,395", "9,395", "9,395", "9,395", "9,395", "9,395", "9,395", "9,395", "...
$ star_rating             <dbl> 2.5, 3.0, 5.0, 3.5, 3.5, 3.0, 3.0, 5.0, 2.0, 3.0, 4.0, 2.0, 3.0, 2.0, 3.0...
$ accommadation_type_name <chr> "Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "...
$ chain_hotel             <chr> "non-chain", "non-chain", "chain", "non-chain", "non-chain", "non-chain",...
$ booking_date            <chr> "8/2/2016", "8/2/2016", "8/2/2016", "8/4/2016", "8/4/2016", "8/4/2016", "...
$ checkin_date            <chr> "10/1/2016", "10/1/2016", "10/1/2016", "10/2/2016", "10/2/2016", "10/3/20...
$ checkout_date           <chr> "10/2/2016", "10/2/2016", "10/2/2016", "10/3/2016", "10/3/2016", "10/5/20...
$ city                    <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"...

I want to mutate the columns ADR:star_rating. Specifically, I want to strip out any commas.

I tried:

x <- x %>% 
  mutate_each(ADR:star_rating, funs(gsub ",", ""))

But this throws an error:

Error: unexpected string constant in:
"x <- x %>% 
  mutate_each(ADR:star_rating, funs(gsub ",""

In base R I can do this:

vars <- c("ADR", "hotel_id", "city_id", "star_rating")
x[vars] <- lapply(x[vars], function(i) gsub(",", "", i))

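However, it would be convenient to do this inside a dplyr chain. It would also mean I don't have to write out every variable as I did when declaring vars; I could just use ADR:star_rating.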

How can I achieve this with mutate in dplyr?

2 Answers:

Answer 0: (score: 3)

I think you were almost there. I used mutate_at (I believe mutate_each is deprecated) and wrapped the variable names in vars():

library(dplyr)
x %>% mutate_at(vars(ADR:star_rating), funs(stringr::str_replace_all(., ",", "")))
#>      X.     ADR hotel_id city_id star_rating accommodation_type_name
#> 1 4,084 1099.69  2313076    9395           5                   Hotel
#> 2 4,084   68.66   583666   17193           2                Bungalow
#> 3 4,084  232.72  1251372    5085           3                   Hotel
#> 4 4,084  195.66  1545890   16808           4                   Hotel
#> 5 4,084      98   298160    8584           4                   Hotel
#>   chain_hotel booking_date checkin_date checkout_date city
#> 1       chain    10/5/2016   10/27/2016    10/30/2016    A
#> 2   non-chain    12/4/2016    12/9/2016    12/12/2016    B
#> 3   non-chain    11/6/2016   11/18/2016    11/20/2016    C
#> 4   non-chain   10/22/2016    11/3/2016     11/4/2016    D
#> 5   non-chain   12/11/2016   12/11/2016    12/12/2016    E
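As a side note, the error in the question is a plain syntax error: gsub is written without call parentheses and without a placeholder for the column. A minimal sketch of the same idea using base gsub instead of stringr, assuming dplyr 0.8.0 or later (where mutate_at accepts a ~ lambda directly); the small data frame df below is a hypothetical stand-in, not the asker's full data:

```r
library(dplyr)

# Hypothetical stand-in for the asker's data (a few columns only)
df <- data.frame(
  ADR         = c("1,099.69", "98"),
  hotel_id    = c("2,313,076", "298,160"),
  star_rating = c(5, 4),
  city        = c("A", "E"),
  stringsAsFactors = FALSE
)

# `.` stands for the current column; gsub() is called with parentheses
cleaned <- df %>%
  mutate_at(vars(ADR:star_rating), ~ gsub(",", "", .))

print(cleaned)
```

Note that gsub returns character vectors, so a numeric column inside the range (here star_rating) comes back as character too.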

Answer 1: (score: 1)

Please note: "funs() is deprecated as of dplyr 0.8.0".

With dplyr you now need to use list() together with ~ to apply the desired lambda function to several columns at once.

library(dplyr)
# The pipe must end the line; a line starting with %>% is a syntax error
x <- x %>%
  mutate_at(vars(ADR:star_rating), list(~ stringr::str_replace_all(., ",", "")))

print(x)
     X.     ADR hotel_id city_id star_rating accommodation_type_name chain_hotel booking_date checkin_date checkout_date city
1 4,084 1099.69  2313076    9395           5                   Hotel       chain    10/5/2016   10/27/2016    10/30/2016    A
2 4,084   68.66   583666   17193           2                Bungalow   non-chain    12/4/2016    12/9/2016    12/12/2016    B
3 4,084  232.72  1251372    5085           3                   Hotel   non-chain    11/6/2016   11/18/2016    11/20/2016    C
4 4,084  195.66  1545890   16808           4                   Hotel   non-chain   10/22/2016    11/3/2016     11/4/2016    D
5 4,084      98   298160    8584           4                   Hotel   non-chain   12/11/2016   12/11/2016    12/12/2016    E
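For newer installations, the same column range can probably be handled with across() inside a plain mutate(). This is a sketch assuming dplyr 1.0.0 or later, where across() was introduced; the data frame df is a hypothetical stand-in, and the numeric conversion in the second step is an optional extra that the question did not ask for:

```r
library(dplyr)

# Hypothetical stand-in for the asker's data (a few columns only)
df <- data.frame(
  X.          = c("4,084", "4,084"),
  ADR         = c("1,099.69", "68.66"),
  hotel_id    = c("2,313,076", "583,666"),
  city_id     = c("9,395", "17,193"),
  star_rating = c(5, 2),
  city        = c("A", "B"),
  stringsAsFactors = FALSE
)

# Step 1: strip commas from every column in the range ADR:star_rating
cleaned <- df %>%
  mutate(across(ADR:star_rating, ~ gsub(",", "", .x)))

# Step 2 (optional): convert the cleaned strings to numbers
cleaned_num <- cleaned %>%
  mutate(across(ADR:star_rating, as.numeric))

str(cleaned_num)
```

Columns outside the range, such as X., are left untouched and keep their commas.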