Combinations of column values in SQL based on a common ID

Time: 2020-08-29 17:29:08

Tags: sql-server string-agg

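Thanks for your help! I'm using MS SQL Server 17 and trying to group by ID and find common pairings in a second column based on the shared ID. Most other questions deal with finding any combination across multiple columns.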

Here is some sample data:

/* Create sample data */ 
 DROP TABLE IF EXISTS example
 CREATE TABLE example (
    PersonID int,
    Place varchar(50)
 )

 INSERT INTO example (PersonID, Place)
 VALUES (1, 'home'), (2, 'work'), (3, 'gym'), (1, 'grocery'), (1, 'home'), (2, 'gym'), (3, 'work'), 
        (4, 'school'), (2, 'gym'), (3, 'gym'), (4, 'home'), (4, 'school'), (4, 'work'), (5, 'bar')

 SELECT * FROM example
 Order by PersonID asc

Whenever a PersonID has more than one row, I'd like to see the common pairings of Place in the following format (for a Sankey chart):

from      | to       | count
____________________________
gym       | gym      | 2
gym       | work     | 2
school    | school   | 1
home      | home     | 1
school    | work     | 1
grocery   | home     | 1 

A pairing can be of the same place (for example, PersonID 1 went to 'home' twice), but I only need pairs of two places.

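So far I've tried the STRING_AGG function, but I'm struggling to limit it to pairs only. Any help is much appreciated, and I apologize if this is a simple question that has been answered before.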

Attempt:

/* Next, let's try to make our Sankey data (from, to, count) */
DROP TABLE IF EXISTS temp_example
SELECT t.combination, COUNT(*) AS value
INTO temp_example
FROM (SELECT STRING_AGG(Place, ',') within group (order by Place) combination 
           FROM example
           GROUP BY PersonID
           HAVING COUNT(*) >= 2
     ) t
GROUP BY t.combination
ORDER BY value desc

1 answer:

Answer 0 (score: 1)

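First, you need another column that records the order in which each person visited the places. SQL tables are unordered, so the order in which the rows were inserted isn't enough. Add a timestamp column, for example, or something similar.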
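A minimal sketch of such a column, assuming a hypothetical integer `VisitSeq` (not part of the question's schema) that numbers each person's visits in the order the question's INSERT happens to list them. A real timestamp column would serve the same purpose; this script drops and recreates the question's `example` table.

```sql
-- Hypothetical schema: VisitSeq is an assumed column recording visit order per person.
DROP TABLE IF EXISTS example;
CREATE TABLE example (
    PersonID int,
    Place    varchar(50),
    VisitSeq int           -- assumed: 1 = first visit, 2 = second visit, ...
);

-- Same rows as the question's sample data; VisitSeq follows the order in which
-- each person's rows appear in the original INSERT statement.
INSERT INTO example (PersonID, Place, VisitSeq)
VALUES (1, 'home', 1), (2, 'work', 1), (3, 'gym', 1), (1, 'grocery', 2), (1, 'home', 3),
       (2, 'gym', 2), (3, 'work', 2), (4, 'school', 1), (2, 'gym', 3), (3, 'gym', 3),
       (4, 'home', 2), (4, 'school', 3), (4, 'work', 4), (5, 'bar', 1);
```

With this column, `aTimestampOrSomething` in the answer's query would become `VisitSeq`, and PersonID 1 would contribute the directed pairs home → grocery and grocery → home.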

Then use LAG() to find, for each row, the place visited just before it. After that, it's a simple GROUP BY.

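WITH
  lagged AS
(
  SELECT
    *,
    -- aTimestampOrSomething is a placeholder: replace it with a real column
    -- that records the order of each person's visits
    LAG(Place) OVER (PARTITION BY PersonID ORDER BY aTimestampOrSomething) AS prevPlace
  FROM
    example
)
SELECT
  prevPlace AS [from],
  Place     AS [to],
  COUNT(*)  AS [count]
FROM
  lagged
WHERE
  prevPlace IS NOT NULL   -- a person's first visit has no previous place
GROUP BY
  prevPlace,
  Place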

(Sorry for any typos, I'm on my phone.)
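If no visit-order column is available, one possible fallback (an assumption, not part of the answer) is to order each person's places alphabetically, which is also how the question's `STRING_AGG(...) WITHIN GROUP (ORDER BY Place)` attempt orders them, and then apply the same LAG + GROUP BY. The sketch below inlines the question's sample rows in a CTE so it runs on its own, and writes the result into a temp table named `#sankey_pairs` (a hypothetical name), in the same spirit as the question's `temp_example`.

```sql
-- Sketch: Sankey rows (from, to, count) built by pairing each place with the
-- alphabetically previous place for the same PersonID. Ordering by Place is an
-- assumption, used only because the sample table has no visit-order column.
DROP TABLE IF EXISTS #sankey_pairs;

WITH sample_rows AS (
    -- Inline copy of the question's sample data so the sketch runs on its own;
    -- against the real table, select from example instead.
    SELECT PersonID, Place
    FROM (VALUES (1, 'home'), (2, 'work'), (3, 'gym'), (1, 'grocery'), (1, 'home'),
                 (2, 'gym'), (3, 'work'), (4, 'school'), (2, 'gym'), (3, 'gym'),
                 (4, 'home'), (4, 'school'), (4, 'work'), (5, 'bar')) AS v (PersonID, Place)
),
lagged AS (
    SELECT
        PersonID,
        Place,
        LAG(Place) OVER (PARTITION BY PersonID ORDER BY Place) AS prevPlace
    FROM sample_rows
)
SELECT
    prevPlace AS [from],
    Place     AS [to],
    COUNT(*)  AS [count]
INTO #sankey_pairs
FROM lagged
WHERE prevPlace IS NOT NULL        -- a person's first (or only) place has no pair
GROUP BY prevPlace, Place;

SELECT [from], [to], [count]
FROM #sankey_pairs
ORDER BY [count] DESC, [from], [to];
```

On the sample data this ordering yields every row of the question's expected table, plus home | school | 1 from PersonID 4 (home, school, school, work). With a real ordering column, replace `ORDER BY Place` inside LAG with that column, as in the answer.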