How to create a list of all possible anagrams of a word/string in PostgreSQL

Time: 2019-02-03 16:11:57

Tags: sql postgresql anagram

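How can I create a list of all possible anagrams of a word/string in PostgreSQL?

For example, if the string is 'act', the desired output should be:

act, atc, cta, cat, tac, tca

I have a table 'tbl_words' that contains a million words.

I then want to check this anagram list against the database table and keep only the valid words.

For the anagram list above, the valid words are: act, cat

Is there any way to do this?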

Update 1:

I need output like this (all permutations of the given word):

[image: expected output]

Any ideas?

2 answers:

Answer 0: (score: 2)

This query generates all permutations of a 3-element set:

with recursive numbers as (
    select generate_series(1, 3) as i
),
rec as (
    select i, array[i] as p
    from numbers
union all
    select n.i, p || n.i
    from numbers n
    join rec on cardinality(p) < 3 and not n.i = any(p)
)
select p as permutation
from rec
where cardinality(p) = 3
order by 1

 permutation 
-------------
 {1,2,3}
 {1,3,2}
 {2,1,3}
 {2,3,1}
 {3,1,2}
 {3,2,1}
(6 rows)

Modify the final query to generate the permutations of the letters of a given word:

with recursive numbers as (
    select generate_series(1, 3) as i
),
rec as (
    select i, array[i] as p
    from numbers
union all
    select n.i, p || n.i
    from numbers n
    join rec on cardinality(p) < 3 and not n.i = any(p)
)
select a[p[1]] || a[p[2]] || a[p[3]] as result
from rec
cross join regexp_split_to_array('act', '') as a
where cardinality(p) = 3
order by 1

 result 
--------
 act
 atc
 cat
 cta
 tac
 tca
(6 rows)    
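
The query above hardcodes a length of 3 and does not yet look anything up in tbl_words. A minimal sketch of how both could be handled follows. It assumes tbl_words has a text column named word (the question does not give the column name), and it derives the length from the input string instead of the literal 3. The number of permutations grows factorially with the word length, so this is only practical for short words.

```sql
-- Sketch only. Assumption: tbl_words has a text column named "word".
with recursive input as (
    select 'act'::text as str
),
numbers as (
    -- one index per character position of the input word
    select generate_series(1, length(str)) as i
    from input
),
rec as (
    select array[i] as p
    from numbers
union all
    -- extend each partial permutation with an index it does not contain yet;
    -- the recursion stops once every index has been used
    select p || n.i
    from numbers n
    join rec on not n.i = any(p)
),
anagrams as (
    -- keep full-length permutations and map the indexes back to letters;
    -- distinct removes duplicates caused by repeated letters
    select distinct (
        select string_agg(substr(str, idx, 1), '' order by ord)
        from unnest(p) with ordinality as u(idx, ord)
    ) as candidate
    from rec
    cross join input
    where cardinality(p) = length(str)
)
select w.word
from tbl_words w
join anagrams a on a.candidate = w.word;
```

Selecting from anagrams alone, without the join, returns the full list of permutations.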

Answer 1: (score: 1)

Here is one solution:

with recursive params as (
      select *
      from (values ('cata')) v(str)
     ),
     nums as (
      select str, 1 as n
      from params
      union all
      select str, 1 + n
      from nums
      where n < length(str)
     ),
     pos as (
      select str, array[n] as poses, array_remove(array_agg(n) over (partition by str), n) as rests, 1 as lev
      from nums
      union all
      select pos.str, array_append(pos.poses, nums.n), array_remove(rests, nums.n), lev + 1
      from pos join
           nums
           on pos.str = nums.str and array_position(pos.rests, nums.n) > 0
      where cardinality(rests) > 0
     )
select distinct pos.str , string_agg(substr(pos.str, thepos, 1), '')
from pos cross join lateral
     unnest(pos.poses) thepos
where cardinality(rests) = 0 
group by pos.str, pos.poses;

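This is quite tricky, particularly when the string contains repeated letters. The approach taken here generates all permutations of the numbers from 1 to n, where n is the length of the string. These are then used as indexes to extract the characters from the original string.

The keen-eyed will notice that this uses select distinct together with group by. That seems to be the simplest way to avoid duplicates in the resulting strings.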
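
One detail worth spelling out: the final query relies on string_agg receiving the characters in array order, but PostgreSQL leaves the input order of an aggregate unspecified unless the call has its own ORDER BY. The sketch below shows the index-to-character step in isolation with the order made explicit through WITH ORDINALITY. The index array {2,1,4,3} and the aliases s, u, idx and ord are chosen only for illustration.

```sql
-- Sketch: rebuild one string from an array of character positions,
-- ordering the characters by their position in the array.
select string_agg(substr(s.str, u.idx, 1), '' order by u.ord) as result
from (values ('cata', array[2, 1, 4, 3])) as s(str, poses)
cross join lateral unnest(s.poses) with ordinality as u(idx, ord)
group by s.str, s.poses;
-- result: acat
```

The same ORDER BY inside string_agg could be applied to the final select of the answer's query.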