Question

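What is the best way to store metric formulas in a database? Initially, I just fed raw columns into the visualization tool and let it calculate the metrics for me. I soon learned that, because of client requirements and the like, there are many (valid) exceptions to the standard rules. I am now considering whether I should create numerator and denominator columns in the ETL/database layer, or just before I send the data to the visualization tool.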

I am thinking of using pandas' DataFrame.eval in Python to evaluate a string stored in a Postgres table:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.eval.html

Forgive the formatting, but I have three columns below. One column ties to a specific project, followed by two example metrics.

id    productive_time                                                               productive_status
165   "productive_time = talk_time + hold_time + after_call_work_time"              "productive_status = status_3_time + status_4_time + status_5_time"
1911  "productive_time = talk_time + hold_time + after_call_work_time + ring_time"  "productive_status = status_7_time + status_8_time"

Then, in the visualization layer, the metric calculation would simply be SUM(productive_time) / SUM(call_count), instead of potentially dozens of calculations.

Does this make sense, and are there other best practices?

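I suppose the other approach is to use a large number of CASE WHEN statements. But there are several hundred to over a thousand IDs to cover, and 95% of them will be the same.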
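Since most IDs share the same rule, one option that avoids both a long CASE WHEN chain and a formula per row is a default formula plus a small table of exceptions. Below is a minimal sketch of that lookup in Python. The names DEFAULT_FORMULAS, OVERRIDES and formula_for are hypothetical, and treating the formula for ID 165 as the standard one is an assumption made for illustration.

```python
# Hypothetical default rule, used by every ID that has no exception.
DEFAULT_FORMULAS = {
    "productive_time": "talk_time + hold_time + after_call_work_time",
}

# Hypothetical exceptions table: only the IDs that deviate are listed.
OVERRIDES = {
    1911: {
        "productive_time": "talk_time + hold_time + after_call_work_time + ring_time",
    },
}


def formula_for(project_id, metric):
    """Return the override for this ID if one exists, else the default formula."""
    return OVERRIDES.get(project_id, {}).get(metric, DEFAULT_FORMULAS[metric])


standard = formula_for(165, "productive_time")
exception = formula_for(1911, "productive_time")
```

In Postgres the same idea could be a LEFT JOIN from the data to an overrides table with COALESCE falling back to the default, so only the roughly 5% of IDs that differ need a row.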

Edit:

for x in range(0, len(df['inbound_time_formula'].unique())):
    df.loc[df['inbound_time_formula'] == df['inbound_time_formula'].unique()[x], 'inbound_time'] = df.eval(df['inbound_time_formula'].unique()[x], inplace=True)

I tried df.eval on the dataframe, but it appears to apply to the whole dataframe rather than only to the rows where the formula belongs.
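One way to restrict the evaluation to the matching rows is to group by the formula string and call eval on each group, writing the result back by index. This is a sketch under assumptions: the sample data and the helper name apply_row_formulas are made up, and any "name = " prefix in the stored string is stripped so that eval returns a Series instead of assigning a column.

```python
import pandas as pd

# Made-up sample data; column names follow the question's examples.
df = pd.DataFrame({
    "id": [165, 165, 1911],
    "talk_time": [10, 20, 30],
    "hold_time": [1, 2, 3],
    "ring_time": [5, 5, 5],
    "inbound_time_formula": [
        "inbound_time = talk_time + hold_time",
        "inbound_time = talk_time + hold_time",
        "inbound_time = talk_time + hold_time + ring_time",
    ],
})


def apply_row_formulas(frame, formula_col, target_col):
    """Evaluate each distinct formula only on the rows that carry it."""
    out = frame.copy()
    out[target_col] = float("nan")
    for formula, group in out.groupby(formula_col):
        # Keep only the right-hand side so eval returns a Series.
        expression = formula.split("=", 1)[-1].strip()
        out.loc[group.index, target_col] = group.eval(expression)
    return out


result = apply_row_formulas(df, "inbound_time_formula", "inbound_time")
```

The loop runs once per distinct formula rather than once per row, so with 95% of rows sharing a rule the cost stays close to a single vectorized pass.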

Answer 1

Rules:

t=# create table rl(id serial,tm text, sm text);
CREATE TABLE
t=# insert into rl(tm,sm) values('a+b-c','a*b +c');
INSERT 0 1

Data:

t=# create table dt(i serial,a int,b int, c int);
CREATE TABLE
t=# insert into dt(a,b,c) select 1,2,3;
INSERT 0 1

Example:

t=# create or replace function rlf(rid int,did int) returns table (rsm int,rtm int) as $$
begin
return query execute format('select '||(select sm from rl where id=rid)||', '||(select tm from rl where id=rid)||' from dt where i=%s',did);
end;
$$ language plpgsql
;
CREATE FUNCTION
t=# select * from rlf(1,1);
 rsm | rtm
-----+-----
   5 |   0
(1 row)

This approach is highly questionable, because by definition you cannot avoid injection: you do not parse the rule, you execute it as is...
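If the rules were parsed before being executed, the injection risk could at least be narrowed. The following is a rough sketch, not part of the answer above: it uses Python's ast module to accept only arithmetic over a whitelist of column names. The names ALLOWED_COLUMNS and is_safe_formula are hypothetical, and it assumes the stored rules use only +, -, * and /, which are written the same way in Python and SQL.

```python
import ast

# Hypothetical whitelist: the columns of the dt table from the example.
ALLOWED_COLUMNS = {"a", "b", "c"}

# Only plain arithmetic is accepted; calls, attributes, subscripts etc. are not.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)


def is_safe_formula(formula, allowed_columns=ALLOWED_COLUMNS):
    """Return True only for arithmetic expressions over whitelisted columns."""
    try:
        tree = ast.parse(formula, mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Name) and node.id not in allowed_columns:
            return False
        if isinstance(node, ast.Constant) and type(node.value) not in (int, float):
            return False
    return True


checks = {
    "a+b-c": is_safe_formula("a+b-c"),
    "a*b +c": is_safe_formula("a*b +c"),
    "a; drop table dt": is_safe_formula("a; drop table dt"),
    "__import__('os')": is_safe_formula("__import__('os')"),
}
```

A check like this would run when a rule is inserted or updated, so that only validated strings ever reach the dynamic query. It reduces the attack surface but does not replace restricting who can write to the rules table.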

Store formulas in a table to calculate later?
