Formatting a tbl within a dplyr chain

Time: 2017-06-04 02:56:48

Tags: r dplyr

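I am trying to add thousands separators (commas) and a dollar sign to my data, e.g. $10,000.

I am using several dplyr commands along with tidyr's gather and spread functions. Here is what I tried:

Cut and paste this code block to generate the random data, "dataset", that I am working with: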

library(dplyr)
library(tidyr)
library(lubridate)

## Generate some data
channels <- c("Facebook", "Youtube", "SEM", "Organic", "Direct", "Email")
last_month <- Sys.Date() %m+% months(-1) %>% floor_date("month")
mts <- seq(from = last_month %m+% months(-23), to = last_month, by = "1 month") %>% as.Date()
dimvars <- expand.grid(Month = mts, Channel = channels, stringsAsFactors = FALSE)

# metrics
rws <- nrow(dimvars)
set.seed(42)

# generates variability in the random data
randwalk <- function(initial_val, ...){
  initial_val + cumsum(rnorm(...))
}
Sessions <- ceiling(randwalk(3000, n = rws, mean = 8, sd = 1500)) %>% abs()
Revenue <- ceiling(randwalk(10000, n = rws, mean = 0, sd = 3500)) %>% abs()

# make primary df
dataset <- cbind(dimvars, Revenue)

It looks like this:

> tbl_df(dataset)
# A tibble: 144 × 3
        Month  Channel Revenue
       <date>    <chr>   <dbl>
1  2015-06-01 Facebook    8552
2  2015-07-01 Facebook   12449
3  2015-08-01 Facebook   10765
4  2015-09-01 Facebook    9249
5  2015-10-01 Facebook   11688
6  2015-11-01 Facebook    7991
7  2015-12-01 Facebook    7849
8  2016-01-01 Facebook    2418
9  2016-02-01 Facebook    6503
10 2016-03-01 Facebook    5545
# ... with 134 more rows

Now I want to spread the months into columns to show the revenue trend by channel, month over month. I can do that like this:

revenueTable <- dataset %>% select(Month, Channel, Revenue) %>%
  group_by(Month, Channel) %>%
  summarise(Revenue = sum(Revenue)) %>%
  #mutate(Revenue = paste0("$", format(Revenue, big.interval = ","))) %>%
  gather(Key, Value, -Channel, -Month) %>%
  spread(Month, Value) %>%
  select(-Key)

It looks almost exactly like what I want:

> revenueTable
# A tibble: 6 × 25
   Channel `2015-06-01` `2015-07-01` `2015-08-01` `2015-09-01` `2015-10-01` `2015-11-01` `2015-12-01` `2016-01-01` `2016-02-01` `2016-03-01` `2016-04-01`
*    <chr>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1   Direct        11910         8417         4012          359         4473         2702         6261         6167         8630         5230         1394
2    Email         7244         3517          671         1339        10788        10575         8567         8406         7856         6345         7733
3 Facebook         8552        12449        10765         9249        11688         7991         7849         2418         6503         5545         3908
4  Organic         4191          978          219         4274         2924         4155         5981         9719         8220         8829         7024
5      SEM         2344         6873        10230         6429         5016         2964         3390         3841         3163         1994         2105
6  Youtube          186         2949         2144         5073         1035         4878         7905         7377         2305         4556         6247
# ... with 13 more variables: `2016-05-01` <dbl>, `2016-06-01` <dbl>, `2016-07-01` <dbl>, `2016-08-01` <dbl>, `2016-09-01` <dbl>, `2016-10-01` <dbl>,
#   `2016-11-01` <dbl>, `2016-12-01` <dbl>, `2017-01-01` <dbl>, `2017-02-01` <dbl>, `2017-03-01` <dbl>, `2017-04-01` <dbl>, `2017-05-01` <dbl>

Now for the part I am struggling with. I want to format the data as currency. I tried adding this to the chain between summarise() and gather():

mutate(Revenue = paste0("$", format(Revenue, big.interval = ","))) %>%

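This half works. The dollar sign is prepended, but the comma separators do not show up. I tried removing the paste0("$", part to see whether I could get the comma formatting to work on its own, without success.

How can I format my tbl as currency with dollar signs and commas, rounded to the nearest whole dollar (not $1.99, just $2)?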

2 Answers:

Answer 0: (score: 3)

I think you can accomplish this with dplyr::mutate_at() at the end of the chain.

revenueTable %>% mutate_at(vars(-Channel), funs(. %>% round(0) %>% scales::dollar()))

#> # A tibble: 6 x 25
#>    Channel `2015-06-01` `2015-07-01` `2015-08-01` `2015-09-01`
#>      <chr>        <chr>        <chr>        <chr>        <chr>
#> 1   Direct      $11,910       $8,417       $4,012         $359
#> 2    Email       $7,244       $3,517         $671       $1,339
#> 3 Facebook       $8,552      $12,449      $10,765       $9,249
#> 4  Organic       $4,191         $978         $219       $4,274
#> 5      SEM       $2,344       $6,873      $10,230       $6,429
#> 6  Youtube         $186       $2,949       $2,144       $5,073
#> # ... with 20 more variables: `2015-10-01` <chr>, `2015-11-01` <chr>,
#> #   `2015-12-01` <chr>, `2016-01-01` <chr>, `2016-02-01` <chr>,
#> #   `2016-03-01` <chr>, `2016-04-01` <chr>, `2016-05-01` <chr>,
#> #   `2016-06-01` <chr>, `2016-07-01` <chr>, `2016-08-01` <chr>,
#> #   `2016-09-01` <chr>, `2016-10-01` <chr>, `2016-11-01` <chr>,
#> #   `2016-12-01` <chr>, `2017-01-01` <chr>, `2017-02-01` <chr>,
#> #   `2017-03-01` <chr>, `2017-04-01` <chr>, `2017-05-01` <chr>
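
For readers on a newer dplyr, the same idea can probably be written with across(), since funs() has been deprecated since dplyr 0.8. The following is only a sketch, assuming dplyr 1.0.0 or later and the scales package are installed; revenue_small is a hypothetical two-row stand-in for revenueTable, built from values shown in the question.

```r
library(dplyr)

# Hypothetical stand-in for revenueTable, using values from the question
revenue_small <- tibble(
  Channel      = c("Direct", "Email"),
  `2015-06-01` = c(11910, 7244),
  `2015-07-01` = c(8417, 3517)
)

# Round every column except Channel, then format it as dollars.
# The formatted columns become character vectors.
formatted <- revenue_small %>%
  mutate(across(-Channel, ~ scales::dollar(round(.x))))

formatted
```

Because the result is character, it makes sense to keep this as the last step of the chain, after any numeric work such as sorting or summing.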

Answer 1: (score: 1)

We can use data.table:

library(data.table)
nm1 <- setdiff(names(revenueTable), 'Channel')
setDT(revenueTable)[, (nm1) := lapply(.SD, function(x) 
         scales::dollar(round(x))), .SDcols =  nm1]

revenueTable[, 1:3, with = FALSE]
#     Channel `2015-06-01` `2015-07-01`
#1:   Direct      $11,910       $8,417
#2:    Email       $7,244       $3,517
#3: Facebook       $8,552      $12,449
#4:  Organic       $4,191         $978
#5:      SEM       $2,344       $6,873
#6:  Youtube         $186       $2,949
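
Returning to the attempt in the question: the commas most likely never appeared because format() takes the separator character in its big.mark argument, while big.interval is the number of digits between separators. A base R sketch of the corrected call follows; to_dollars is a hypothetical helper name introduced here for illustration, not something from the original thread.

```r
# Hypothetical helper: round to whole dollars, insert thousands commas,
# and prepend a dollar sign. trim = TRUE drops the padding that format()
# would otherwise add to align values to a common width.
to_dollars <- function(x) {
  paste0("$", format(round(x), big.mark = ",", trim = TRUE, scientific = FALSE))
}

result <- to_dollars(c(10000, 359, 1.99))
result
# "$10,000" "$359" "$2"
```

Under these assumptions, mutate(Revenue = to_dollars(Revenue)) could be placed between summarise() and gather() in the original chain without needing an extra package.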