Renaming a summarised column in a Redshift dplyr operation

Date: 2018-05-14 17:22:46

Tags: r dplyr amazon-redshift dbplyr

I'm using dplyr to run some operations in Redshift so that I don't have to load the data into memory.

data <- tbl(conn, "customers") %>%
  filter(age >= 18)
subset <- data %>% 
  filter(eye_color != "brown") %>%
  group_by(gender, method, age, region) %>% 
  summarise(sum(purchases)) %>%  # will create a column called sum(purchases)
  full_join(data, by=c("region", "age", "method"))
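The problem can be reproduced without a Redshift cluster. The sketch below assumes the dbplyr package is installed and uses its lazy_frame() together with simulate_redshift(), which only generate SQL and never contact a database. The customers table and its single row of values are hypothetical stand-ins for tbl(conn, "customers"). The sketch shows the auto-generated column name that the unnamed summarise() call produces.

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for tbl(conn, "customers"): a lazy table backed by a
# simulated Redshift connection. Nothing is sent to a database.
customers <- lazy_frame(
  gender = "f", method = "online", age = 30L, region = "west",
  eye_color = "blue", purchases = 1,
  con = simulate_redshift()
)

summarised <- customers %>%
  filter(age >= 18, eye_color != "brown") %>%
  group_by(gender, method, age, region) %>%
  summarise(sum(purchases))  # left unnamed, as in the question

# The expression text itself becomes the column name, "sum(purchases)"
colnames(summarised)

# Print the SQL that would be sent to Redshift
show_query(summarised)
```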

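Now, when I look at the resulting data frame, I see a column called sum(purchases). I'd like to rename it to purchases, which would then produce the columns purchases.x and purchases.y after the join.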
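The .x/.y columns come from how dplyr joins handle non-key columns that exist in both tables. A small in-memory sketch with made-up values (summary_tbl and detail_tbl are hypothetical names) shows the default suffixes and the join's suffix argument, which can pick clearer names instead.

```r
library(dplyr)

# Hypothetical toy tables: both carry a non-key column named `purchases`,
# like the summarised table and `data` in the question.
summary_tbl <- tibble(region = "west", age = 30L, method = "online", purchases = 5)
detail_tbl  <- tibble(region = "west", age = 30L, method = "online", purchases = 2)

# Shared non-key columns get the default suffixes ".x" (left) and ".y" (right)
joined <- full_join(summary_tbl, detail_tbl, by = c("region", "age", "method"))
names(joined)

# The `suffix` argument chooses the suffixes explicitly
joined2 <- full_join(summary_tbl, detail_tbl,
                     by = c("region", "age", "method"),
                     suffix = c("_total", "_row"))
names(joined2)
```

The same join arguments apply to lazy dbplyr tables, so the renaming can stay in the database.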

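So far, most of the renaming approaches I've read about deal with in-memory data frames rather than lazily evaluated tables from dbplyr. I've tried different variations of rename, rename_, rename_at, and select. I've also tried the strategies from here and here, without luck.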

Is there a way to rename sum(purchases)? My only other option is to load the data frame into memory at some step.

1 Answer:

Answer 0 (score: 3)

You can assign names in summarise. I don't have your data so I can't triple-check, but I've used this in my own code before when calling summarise(n()). Something like...

summarise(your_column_name = sum(purchases))
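Applied to the question's pipeline, that means writing purchases = sum(purchases) inside summarise(). The sketch below checks the idea against dbplyr's simulated Redshift backend (simulate_redshift()) rather than a live cluster, and the table contents are hypothetical. na.rm = TRUE is added only to silence dbplyr's reminder that SQL aggregates always skip NULLs.

```r
library(dplyr)
library(dbplyr)

# Hypothetical lazy table on a simulated Redshift connection (SQL only)
customers <- lazy_frame(
  gender = "f", method = "online", age = 30L, region = "west",
  purchases = 1,
  con = simulate_redshift()
)

summarised <- customers %>%
  group_by(gender, method, age, region) %>%
  # Naming the expression sets the output column name directly
  summarise(purchases = sum(purchases, na.rm = TRUE))

colnames(summarised)  # "purchases" instead of "sum(purchases)"

# Render the generated SQL as a string to inspect the column alias
sql <- sql_render(summarised)
sql
```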

You can also give it a column name containing spaces; you just have to wrap the name in backticks:

summarise(`your column name` = sum(purchases))
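Below is a sketch of both uses of backticks on a lazy table, again assuming dbplyr's simulate_redshift() and a hypothetical customers table. The second half goes beyond the answer: if the summary was already left unnamed, the same backtick quoting should let rename() refer to the non-syntactic name sum(purchases). This has not been checked against a live Redshift connection.

```r
library(dplyr)
library(dbplyr)

# Hypothetical lazy table on a simulated Redshift connection (SQL only)
customers <- lazy_frame(
  gender = "f", region = "west", purchases = 1,
  con = simulate_redshift()
)

# 1. A name containing a space must be wrapped in backticks
spaced <- customers %>%
  group_by(gender, region) %>%
  summarise(`total purchases` = sum(purchases, na.rm = TRUE))
colnames(spaced)

# 2. Backticks also let rename() refer to the auto-generated name
#    (dbplyr may warn here that SQL aggregates always drop NULLs)
renamed <- customers %>%
  group_by(gender, region) %>%
  summarise(sum(purchases)) %>%
  rename(purchases = `sum(purchases)`)
colnames(renamed)
```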