Percentage match between two JSONB columns

Time: 2020-06-19 11:55:10

Tags: sql json postgresql select sql-function

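I am trying to compare two JSONB columns in a table. The comparison is currently done in the application, but that makes proper searching, filtering, and sorting impossible without loading the whole data set. It would be better to do the comparison in the database.

Here is an example of the data and the calculation.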

employer = {
  "autism": "1",
  "social": "1",
  "dementia": "0",
  "domestic": "1"
}

employers_keys = ["autism","social","domestic"]

candidate = {
  "autism": "0",
  "social": "1",
  "dementia": "0",
  "domestic": "1"
}

candidate_keys = ["social","domestic"]

remainder_keys = employers_keys - candidate_keys = ["autism"]

1-(remainder_keys.length/employers_keys.length) = 1-(1/3) = 2/3 ≈ 67%


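In Ruby this is straightforward: jsonb -> array -> select -> calculate.

However, I would like to do this in SQL or in a database-level function, for example:

a function compare_json(employer, candidate) that returns a decimal.

More specifically: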

 Select candidates.id,
       st_distance_sphere(st_makepoint(employer.long, employer.lat), st_makepoint(candidates.long, candidates.lat)) /
       1000 / 8 * 5 as distance
from (select * from users where id = 8117) employer,
     (select * from users where role_id = 5) candidates
where st_distance_sphere(st_makepoint(employer.long, employer.lat), st_makepoint(candidates.long, candidates.lat)) /
      1000 / 8 * 5 < 25
order by distance

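The SQL above calculates the distance between a single employer and multiple candidates. The inline queries expose employer.skills (1 row) and candidates.skills (n rows).

So the output should be:

candidate ID, distance, skills match (employer.skills, candidates.skills)

As before the edit, any guidance is welcome.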

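2 answers:

Answer 0: (score: 0)

Here is a pure SQL approach. It works by turning the employer object into a set of records (one per key), then doing conditional aggregation over the keys the employer has set to 1: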

select 1 - avg( ((d.candidate ->> e.k)::int is distinct from 1)::int ) res
from (values(
    '{ "autism": "1", "social": "1", "dementia": "0", "domestic": "1" }'::jsonb,
    '{ "autism": "0", "social": "1", "dementia": "0", "domestic": "1" }'::jsonb
)) d(employer, candidate)
cross join lateral jsonb_each_text(d.employer) e(k, v)
where e.v::int = 1

You can easily turn this into a function by replacing the literal objects in the values() row constructor with parameters.

Demo on DB Fiddle

|                    res |
| ---------------------: |
| 0.66666666666666666667 |
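To see where the 2/3 comes from, it can help to look at the rows before they are averaged. The following is only an illustrative sketch built on the same literal objects as the query above; the column aliases skill, candidate_value and missing are names made up for this example.

```sql
-- One row per key that the employer has set to "1".
-- "missing" is 1 when the candidate does not have that key set to "1"
-- (including when the key is absent, because NULL is distinct from 1).
select e.k                                                  as skill,
       d.candidate ->> e.k                                  as candidate_value,
       ((d.candidate ->> e.k)::int is distinct from 1)::int as missing
from (values(
    '{ "autism": "1", "social": "1", "dementia": "0", "domestic": "1" }'::jsonb,
    '{ "autism": "0", "social": "1", "dementia": "0", "domestic": "1" }'::jsonb
)) d(employer, candidate)
cross join lateral jsonb_each_text(d.employer) e(k, v)
where e.v::int = 1
order by e.k;
```

With the sample data this should list autism, domestic and social, with missing = 1 only for autism. The average of missing is then 1/3, and 1 minus that gives the 0.666... shown above.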

Answer 1: (score: 0)

OK, this is what I ended up doing.

CREATE OR REPLACE FUNCTION JSON_COMPARE(employer_json jsonb, candidate_json jsonb, OUT _result numeric)
AS
$$
BEGIN
    select 1 - avg(((d.candidate ->> e.k)::int is distinct from 1)::int)
    into _result
    from (values (employer_json, candidate_json)) d(employer, candidate)
             cross join lateral jsonb_each_text(d.employer) e(k, v)
    where e.v::int = 1;
    RETURN;
END;
$$
    LANGUAGE PLPGSQL;

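This is a very small variation on GMB's super quick answer. With some indexing, and by properly limiting the size of the candidate list, we get reasonable performance.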
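For completeness, here is a rough sketch of how the function might be plugged into the distance query from the question. It assumes that users has a jsonb column named skills (the question refers to employer.skills and candidates.skills but does not show the table definition), and the skills_match alias is just a name chosen for this example.

```sql
-- Assumption: users.skills is a jsonb column; JSON_COMPARE is the function defined above.
select candidates.id,
       st_distance_sphere(st_makepoint(employer.long, employer.lat),
                          st_makepoint(candidates.long, candidates.lat))
           / 1000 / 8 * 5                               as distance,
       json_compare(employer.skills, candidates.skills) as skills_match
from (select * from users where id = 8117) employer,
     (select * from users where role_id = 5) candidates
where st_distance_sphere(st_makepoint(employer.long, employer.lat),
                         st_makepoint(candidates.long, candidates.lat))
          / 1000 / 8 * 5 < 25
order by distance;
```

Because the match is now an ordinary output column, sorting on it should just be a matter of changing the last line, for example to order by skills_match desc nulls last, distance. Note that the function returns NULL when the employer has no key set to "1", since avg() then has no rows to aggregate.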

I'm new to Stack, so my upvote for GMB doesn't show, but thanks again.