How to retrieve the difference between two tsvectors in postgres?

Asked: 2014-04-23 09:27:25

Tags: postgresql tsvector

I have two varchar fields, and I want to get an array of the words that are present in one of them but not in the other, i.e.:

old_text := to_tsvector('The quick brown fox jumps over the lazy dog')
new_text := to_tsvector('The slow brown fox jumps over the quick dog at Friday')
-> new words: ARRAY['slow', 'at', 'Friday'] (the order of words doesn't matter)

I tried fiddling with tsvectors, but had no luck. Is there any other functionality in postgres that supports something like this?

1 answer:

Answer 0: (score: 1)

If you really want to involve text search, take a look at ts_parse():

SELECT token
FROM ts_parse('default', 'The slow brown fox jumps over the quick dog at Friday')
WHERE tokid != 12 -- blank
EXCEPT
SELECT token
FROM ts_parse('default', 'The quick brown fox jumps over the lazy dog')
WHERE tokid != 12 -- blank

-- will give you

"token"
--------
'slow'
'at'
'Friday'
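
The 12 in the query above is the token type id that the default parser assigns to blank tokens. As a small sketch, assuming the built-in 'default' parser, the id can be looked up by its alias with ts_token_type() instead of being hard-coded:

```sql
-- List the token type that the 'default' parser uses for whitespace.
SELECT tokid, alias, description
FROM ts_token_type('default')
WHERE alias = 'blank';

-- The same lookup used as a filter, so the magic number disappears.
SELECT token
FROM ts_parse('default', 'The slow brown fox jumps over the quick dog at Friday')
WHERE tokid <> (SELECT tokid
                FROM ts_token_type('default')
                WHERE alias = 'blank');
```

Whether the subquery is worth the extra verbosity over a commented constant is a matter of taste; it mainly helps if a different parser might be used later.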

Alternatively, you can use a regular expression:

SELECT *
FROM regexp_split_to_table('The slow brown fox jumps over the quick dog at Friday', '\s+')
EXCEPT
SELECT *
FROM regexp_split_to_table('The quick brown fox jumps over the lazy dog', '\s+')

Finally, if needed, use array_agg() to accumulate the results into an array.
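
As a rough sketch of that last step, the EXCEPT query can be wrapped in a subquery and aggregated. The aliases new_words and diff are only illustrative names, and the order of the elements in the resulting array is not guaranteed:

```sql
-- Collect the words that appear only in the new text into a single array.
SELECT array_agg(token) AS new_words
FROM (
    SELECT token
    FROM ts_parse('default', 'The slow brown fox jumps over the quick dog at Friday')
    WHERE tokid != 12 -- blank
    EXCEPT
    SELECT token
    FROM ts_parse('default', 'The quick brown fox jumps over the lazy dog')
    WHERE tokid != 12 -- blank
) AS diff;
```

The same wrapper should work around the regexp_split_to_table() variant, provided the split column is given an alias that array_agg() can reference.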