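I'm using dplyr's database backend against an AWS Redshift database. Because some queries take forever to return, I'd like to cache them. I know the underlying data won't change, so if the query doesn't change, the result set won't either.
The approach I've taken elsewhere is to store each result in a {hash}.rds file, keyed by a hash of the query. I've been trying the same approach with dplyr. Unfortunately, the SQL query string that dplyr generates changes from run to run even when the operations stay the same: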
df %>%
select(week, person_id) %>%
group_by(person_id) %>%
mutate(weeks_active = n()) %>%
arrange(weeks_active) %>%
dplyr::sql_render()
produces
<SQL> SELECT *
FROM (SELECT "week", "person_id", COUNT(*) OVER (PARTITION BY "person_id") AS "weeks_active"
FROM (SELECT "week" AS "week", "person_id" AS "person_id"
FROM "fct_person_week") "zznunjjdwe") "ltyyfmiahu"
ORDER BY "weeks_active"
on the first run, and
<SQL> SELECT *
FROM (SELECT "week", "person_id", COUNT(*) OVER (PARTITION BY "person_id") AS "weeks_active"
FROM (SELECT "week" AS "week", "person_id" AS "person_id"
FROM "fct_person_week") "stxupavckd") "oaknuxjexc"
ORDER BY "weeks_active"
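on the second. Is there a way to keep the table aliases fixed? Is there some other digest of the query that stays the same across runs? Or should I consider a different way of caching?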
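To make the digest idea concrete, here is a minimal sketch in base R that rewrites the random subquery aliases to position-based names before hashing. It assumes the aliases are always ten lowercase letters in double quotes directly after a closing parenthesis, as in the two outputs above; that pattern is inferred from this output, not from any documented guarantee. The function names normalize_aliases, query_hash and cached_collect and the "cache" directory are hypothetical.

```r
# Replace random subquery aliases with stable names ("q01", "q02", ...).
# Assumption: an alias is ten lowercase letters, double-quoted, right after ")".
normalize_aliases <- function(sql) {
  sql <- paste(as.character(sql), collapse = "\n")
  found <- regmatches(sql, gregexpr('(?<=\\) ")[a-z]{10}(?=")', sql, perl = TRUE))[[1]]
  aliases <- unique(found)
  for (i in seq_along(aliases)) {
    # Rename every quoted occurrence, so references to the alias stay consistent.
    sql <- gsub(paste0('"', aliases[i], '"'), sprintf('"q%02d"', i), sql, fixed = TRUE)
  }
  sql
}

# MD5 of the normalized SQL, computed via a temporary file (base R only).
query_hash <- function(sql) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(normalize_aliases(sql), tmp)
  unname(tools::md5sum(tmp))
}

# Hypothetical caching wrapper: read {hash}.rds if present, else run and save.
cached_collect <- function(query, cache_dir = "cache") {
  hash <- query_hash(dbplyr::sql_render(query))
  path <- file.path(cache_dir, paste0(hash, ".rds"))
  if (file.exists(path)) return(readRDS(path))
  result <- dplyr::collect(query)
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Two renderings that differ only in the random alias hash identically.
run1 <- 'SELECT * FROM (SELECT "week" FROM "fct_person_week") "zznunjjdwe"'
run2 <- 'SELECT * FROM (SELECT "week" FROM "fct_person_week") "stxupavckd"'
identical(query_hash(run1), query_hash(run2))  # TRUE
```

A real column or table name that happens to match the same pattern would also be renamed, so the regular expression may need tightening for a given schema.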
Answer 0 (score: 0)
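You can use compute() to create a temporary table. Another option is to take the generated SQL and turn it into a view, so that R developers can simply refer to it by table name.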
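Both options can be sketched as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database (via the DBI and RSQLite packages) as a stand-in for Redshift so that it runs anywhere, and the names person_weeks_cached and person_weeks_view are hypothetical. The arrange() step from the question is left out because ordering inside a stored subquery is generally not preserved.

```r
library(dplyr)

# Stand-in connection; with Redshift, `con` would be the existing connection.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(
  con, "fct_person_week",
  data.frame(week = c(1, 2, 1), person_id = c(1, 1, 2))
)
df <- tbl(con, "fct_person_week")

query <- df %>%
  select(week, person_id) %>%
  group_by(person_id) %>%
  mutate(weeks_active = n()) %>%
  ungroup()

# Option 1: materialize the result in a temporary table.
cached <- compute(query, name = "person_weeks_cached", temporary = TRUE)

# Option 2: wrap the generated SQL in a view and query it by name.
sql <- dbplyr::sql_render(query)
DBI::dbExecute(con, paste0("CREATE VIEW person_weeks_view AS ", sql))
from_view <- tbl(con, "person_weeks_view")

collect(cached)
collect(from_view)
```

A temporary table lasts only for the current database session, and a plain view stores the query rather than its result, so the query still runs each time the view is read. Neither is a drop-in replacement for an on-disk cache that survives across sessions.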