How to identify the missing numbers in a given series of numbers

Time: 2016-09-11 03:51:45

Tags: sql postgresql google-bigquery gaps-and-islands gaps-in-data

I want to get all the missing numbers in a series of numbers. Is there a better approach than the one below?

SELECT x
  FROM 
  (
    SELECT x,
           LAG(x,1) OVER ( ORDER BY x ) prev_x
      FROM 
       ( SELECT * FROM
        ( SELECT 1 AS x ),
        ( SELECT 2 AS x ),
        ( SELECT 3 AS x ),
        ( SELECT 4 AS x ),
        ( SELECT 5 AS x ),
        ( SELECT 6 AS x ),
        ( SELECT 8 AS x ),
        ( SELECT 10 AS x ),
        ( SELECT 11 AS x )
       )
  ) 
 WHERE x-prev_x > 1;

3 answers:

Answer 0 (score: 1)

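Let me be honest with you: any other working solution will be better than the one in the question, for one simple reason: the query is wrong. It does not return the missing numbers at all; it returns the number that follows each gap. That's all. (I hope you'll appreciate that I opened your eyes to this.)

Now, about better solutions: you have plenty of options. Note: the options below are for BigQuery only!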

  

Option 1

BigQuery Standard SQL - see How to Enable Standard SQL

WITH YourTable AS (
  SELECT 1 AS x UNION ALL
  SELECT 2 AS x UNION ALL
  SELECT 3 AS x UNION ALL
  SELECT 6 AS x UNION ALL
  SELECT 8 AS x UNION ALL
  SELECT 10 AS x UNION ALL
  SELECT 11 AS x
),
nums AS (
  SELECT num 
  FROM UNNEST(GENERATE_ARRAY((SELECT MIN(x) FROM YourTable), (SELECT MAX(x) FROM YourTable))) AS num
)
SELECT num FROM nums
LEFT JOIN YourTable ON num = x
WHERE x IS NULL
ORDER BY num
  

Option 2

You can try the following.

BigQuery Legacy SQL (here you need to set the start/min and end/max values in the select expression of the nums table)

SELECT num FROM (
  SELECT num FROM (
    SELECT ROW_NUMBER() OVER() AS num, * 
    FROM (FLATTEN((SELECT SPLIT(RPAD('', 11, '.'),'') AS h FROM (SELECT NULL)), h))
  ) WHERE num BETWEEN 1 AND 11
) AS nums
LEFT JOIN (
  SELECT x FROM
    (SELECT 1 AS x),
    (SELECT 2 AS x),
    (SELECT 3 AS x),
    (SELECT 6 AS x),
    (SELECT 8 AS x),
    (SELECT 10 AS x),
    (SELECT 11 AS x)
) AS YourTable
ON num = x
WHERE x IS NULL
  

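Option 3

BigQuery Legacy SQL - if you don't want to set the min and max values by hand, you can use the solution below. It only requires a maximum high enough to accommodate your expected growth (I used 1000 as an example).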

SELECT num FROM (
  SELECT num FROM (
    SELECT ROW_NUMBER() OVER() AS num, * 
    FROM (FLATTEN((SELECT SPLIT(RPAD('', 1000, '.'),'') AS h FROM (SELECT NULL)), h))
  ) WHERE num BETWEEN 1 AND 1000
) AS nums
LEFT JOIN YourTable
ON num = x
WHERE x IS NULL
AND num BETWEEN (SELECT MIN(x) FROM YourTable) AND (SELECT MAX(x) FROM YourTable) 
  

Option 4 (for some reason, my favorite so far)

BigQuery Standard SQL - no explicit join

WITH YourTable AS (
  SELECT 1 AS x UNION ALL
  SELECT 2 AS x UNION ALL
  SELECT 3 AS x UNION ALL
  SELECT 6 AS x UNION ALL
  SELECT 8 AS x UNION ALL
  SELECT 10 AS x UNION ALL
  SELECT 11 AS x
)
SELECT num
FROM (SELECT x, LEAD(x) OVER(ORDER BY x) AS next_x FROM YourTable),  
     UNNEST(GENERATE_ARRAY(x + 1,next_x - 1)) AS num
WHERE next_x - x > 1
ORDER BY x
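
For readers on Postgres rather than BigQuery, the same idea (pair each value with its successor, then generate the numbers in between) might be sketched as follows. This is only an illustrative analogue, not part of the original answer: it assumes `generate_series` with a `LATERAL` join in place of `UNNEST(GENERATE_ARRAY(...))`, and the CTE name `your_table` is a stand-in.

```sql
-- Postgres sketch of option 4 (assumed analogue, not BigQuery syntax).
WITH your_table(x) AS (
    VALUES (1), (2), (3), (6), (8), (10), (11)
)
SELECT num
FROM (
    -- pair every value with the next-higher value
    SELECT x, lead(x) OVER (ORDER BY x) AS next_x
    FROM your_table
) g
-- expand each gap into the numbers strictly between x and next_x
CROSS JOIN LATERAL generate_series(g.x + 1, g.next_x - 1) AS num
WHERE g.next_x - g.x > 1
ORDER BY num;
-- returns 4, 5, 7, 9
```

The last row has a NULL `next_x`, so `generate_series` produces no rows for it and the `WHERE` clause would filter it out in any case.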

Answer 1 (score: 1)

The shortest solution in Postgres uses standard SQL EXCEPT:

WITH tbl(x) AS (SELECT unnest ('{1,2,3,4,5,6,8,10,11}'::int[]))
-- the CTE provides a temp table - might be an actual table instead
SELECT generate_series(min(x), max(x)) FROM tbl
EXCEPT ALL
TABLE tbl;

The set-returning function unnest() is Postgres-specific and is used here only as the shortest syntax to supply the set of numbers as a table.

Also works with duplicates or NULL values in the data.

TABLE tbl is (standard SQL!) short syntax for SELECT * FROM tbl.
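
To make the claim about duplicates and NULL values concrete, here is a small sketch that runs the same EXCEPT ALL query against made-up sample data (the array contents are an assumption chosen for illustration, not taken from the question):

```sql
-- Sample data with a duplicate (1 twice) and a NULL; only 3 is missing.
WITH tbl(x) AS (SELECT unnest('{1,1,2,NULL,4}'::int[]))
SELECT generate_series(min(x), max(x)) FROM tbl  -- min/max ignore NULL: 1..4
EXCEPT ALL
TABLE tbl;                                       -- same as SELECT * FROM tbl
-- returns 3
```

The duplicate does no harm because the generated series contains each number only once, and the NULL never matches anything in the series.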


Answer 2 (score: 0)

Your query can be written more concisely:

SELECT x
FROM (
    SELECT x,
           lag(x, 1) OVER ( ORDER BY x ) prev_x
    FROM ( VALUES (1), (2), (3), (4), (5), (6), (8), (10), (11) ) v(x)
) sub
WHERE x-prev_x > 1;

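This returns the next-higher value after each gap (8, 10), not the missing values themselves (7, 9). But then, of course, you don't have those values handy: they are not stored anywhere in the data, so they have to be generated.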
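
If the boundaries of each gap are enough, the lag() query can at least be adapted to report them. The following is a minimal Postgres sketch under that assumption; the column names `gap_start` and `gap_end` are made up for illustration:

```sql
-- Report each gap as a range instead of the value that follows it.
SELECT prev_x + 1 AS gap_start,
       x - 1      AS gap_end
FROM (
    SELECT x,
           lag(x) OVER (ORDER BY x) AS prev_x
    FROM (VALUES (1), (2), (3), (4), (5), (6), (8), (10), (11)) v(x)
) sub
WHERE x - prev_x > 1
ORDER BY gap_start;
-- gap_start | gap_end
--         7 |       7
--         9 |       9
```

This still does not list every missing value when a gap is wider than one number; for that, the values need to be generated with generate_series.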

If you know the range of values in the sequence, you can use this:

SELECT s.x
FROM generate_series(<<min>>, <<max>>) s(x)
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL;

This returns the actual missing values.
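
As a self-contained illustration, the same query can be run against the sample data from the question. In this sketch a CTE stands in for my_table, and the bounds 1 and 11 are filled in by hand as an assumption about the known range:

```sql
-- my_table is simulated with a CTE so the example runs on its own.
WITH my_table(x) AS (
    VALUES (1), (2), (3), (4), (5), (6), (8), (10), (11)
)
SELECT s.x
FROM generate_series(1, 11) s(x)        -- known range: 1 to 11
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL                       -- keep numbers with no match
ORDER BY s.x;
-- returns 7 and 9
```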

If you don't know the range of values, you need to add a subquery:

SELECT s.x
FROM ( SELECT min(x), max(x) FROM my_table ) r
JOIN generate_series(r.min, r.max) s(x) ON true
LEFT JOIN my_table t ON s.x = t.x
WHERE t.x IS NULL;

Or, instead of the LEFT JOIN:

SELECT x
FROM ( SELECT min(x), max(x) FROM my_table ) r,
     generate_series(r.min, r.max) s(x)
WHERE NOT EXISTS (SELECT 1 FROM my_table t WHERE t.x = s.x);