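Need help with a SQL query for a data extract

Time: 2014-10-21 16:00:03

Tags: sql unique

I've written a SQL query that retrieves 3677 rows, in which the CustomerID field contains many duplicates. I want to write a query that returns each CustomerID only once, together with all the required fields below. We cannot simply use DISTINCT on CustomerID alone, because the other fields have different data types. Please help me solve this: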

  SELECT TimeMark,
        CustomerID,
        AccountId,
        TargetURL
    FROM BTILog
    WHERE timemark BETWEEN '20140926 00:00:00'
            AND '20141020 23:59:59'
        AND TargetURL LIKE '%/api/v1/cust/details%'
        AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
    ORDER BY TimeMark DESC

Please have a look at the data I am getting from the query above. There you can see a duplicate CustomerID, highlighted in blue. I want customer 57155299 to appear only once, with the latest TimeMark; similarly, any other customer who appears two, three, or more times should appear only once in my data extract.

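13 answers:

Answer 0: (score: 1)

You should use GROUP BY and an aggregate function on TimeMark. The following will give you unique customer ID records, each with its last entered time mark: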

SELECT max(TimeMark) TimeMark,
        CustomerID,
        AccountId,
        TargetURL
    FROM BTILog
    WHERE timemark BETWEEN '20140926 00:00:00'
            AND '20141020 23:59:59'
        AND TargetURL LIKE '%/api/v1/cust/details%'
        AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
    GROUP BY CustomerID, AccountId,TargetURL
    ORDER BY TimeMark DESC
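
One caveat with grouping on all three columns: a customer whose rows differ in AccountId or TargetURL still produces several groups, and so several rows. Here is a minimal sketch using Python's built-in sqlite3 module to show this. The rows, the second account id, and the cut-down schema are made up for illustration and are not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BTILog (TimeMark TEXT, CustomerID INTEGER, "
    "AccountId INTEGER, TargetURL TEXT)"
)
conn.executemany(
    "INSERT INTO BTILog VALUES (?, ?, ?, ?)",
    [
        ("2014-10-01 09:00:00", 57155299, 1, "/api/v1/cust/details"),
        ("2014-10-05 10:00:00", 57155299, 1, "/api/v1/cust/details"),
        # Hypothetical: same customer, different account.
        ("2014-10-07 11:00:00", 57155299, 2, "/api/v1/cust/details"),
        ("2014-10-02 08:00:00", 11111111, 3, "/api/v1/cust/details"),
    ],
)

# One output row per (CustomerID, AccountId, TargetURL) combination.
grouped = conn.execute(
    """
    SELECT MAX(TimeMark) AS TimeMark, CustomerID, AccountId, TargetURL
    FROM BTILog
    GROUP BY CustomerID, AccountId, TargetURL
    ORDER BY TimeMark DESC
    """
).fetchall()

for row in grouped:
    print(row)
```

With this sample data, customer 57155299 comes back twice, once per AccountId. If that matters, group on CustomerID alone and aggregate the other columns, or use one of the ranking approaches in the other answers.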

Answer 1: (score: 0)

I believe ranking your results would give you what you want. Something like this.

select TimeMark,
        CustomerID,
        AccountId,
        TargetURL
    from (
        select TimeMark,
            CustomerID,
            AccountId,
            TargetURL,
            rank() over (
                partition by CustomerID order by TimeMark desc
                ) rank_
        from BTILog
        where timemark between '20140926 00:00:00'
                and '20141020 23:59:59'
            and TargetURL like '%/api/v1/cust/details%'
            and Class like 'com.btfin.security.sso.SSODetailsFactory%'
        ) ranked
    where rank_ = 1;

You could also include AccountId in the partition clause, if that is the result you want.

rank() over (
                    partition by CustomerID, AccountId order by TimeMark desc
                    ) rank_

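Also, if there can be rows with the same CustomerID, AccountId, and TimeMark, you can use row_number instead of rank. It breaks the tie arbitrarily by giving one of the tied rows a higher number, so only one of them ends up with rank_ = 1.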
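
To make the difference between rank and row_number on ties concrete, here is a small sketch run through Python's sqlite3 module. It assumes SQLite 3.25 or later, which is needed for window functions. The rows and the helper name `latest_rows` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BTILog (TimeMark TEXT, CustomerID INTEGER, AccountId INTEGER)"
)
conn.executemany(
    "INSERT INTO BTILog VALUES (?, ?, ?)",
    [
        ("2014-10-05 10:00:00", 57155299, 1),
        ("2014-10-05 10:00:00", 57155299, 1),  # tied with the row above
        ("2014-10-01 09:00:00", 57155299, 1),
    ],
)


def latest_rows(window_fn):
    """Return the rows numbered 1 per customer by the given window function."""
    sql = f"""
        SELECT TimeMark, CustomerID
        FROM (
            SELECT TimeMark, CustomerID,
                   {window_fn}() OVER (
                       PARTITION BY CustomerID ORDER BY TimeMark DESC
                   ) AS rank_
            FROM BTILog
        ) AS ranked
        WHERE rank_ = 1
    """
    return conn.execute(sql).fetchall()


with_rank = latest_rows("rank")              # both tied rows get rank 1
with_row_number = latest_rows("row_number")  # exactly one row gets number 1

print(len(with_rank), len(with_row_number))
```

With the tied sample rows, rank keeps both of the latest rows while row_number keeps a single one.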

Answer 2: (score: 0)

Try this

Select distinct CustomerId, AccountId, TargetURL,
                Max(TimeMark) over (partition by CustomerId) as 'MaxTM'
  FROM BTILog
    WHERE timemark BETWEEN '20140926 00:00:00'
            AND '20141020 23:59:59'
        AND TargetURL LIKE '%/api/v1/cust/details%'
        AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
   -- group by CustomerId, AccountId, TargetUrl --I don't think this is needed (with it, TimeMark would have to be grouped or aggregated)
    ORDER BY MaxTM DESC

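Answer 3: (score: 0)

Every selected field must be either part of the GROUP BY or aggregated (with MAX, SUM, COUNT, etc.). In your case you have to group on CustomerID, AccountId, and possibly TargetURL (if the values are all the same, group by it; if not, perhaps MAX(TargetURL)), and decide what you want to do with TimeMark. MAX(TimeMark), perhaps?

You could try one of the following, depending on the nature of TargetURL: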

SELECT MAX(TimeMark) AS TimeMark, CustomerID, AccountId, TargetURL
FROM BTILog
WHERE timemark BETWEEN '20140926 00:00:00'
    AND '20141020 23:59:59'
    AND TargetURL LIKE '%/api/v1/cust/details%'
    AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
GROUP BY CustomerID, AccountId, TargetURL
ORDER BY TimeMark DESC

SELECT MAX(TimeMark) AS TimeMark, CustomerID, AccountId, MAX(TargetURL) AS TargetURL
FROM BTILog
WHERE timemark BETWEEN '20140926 00:00:00'
    AND '20141020 23:59:59'
    AND TargetURL LIKE '%/api/v1/cust/details%'
    AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
GROUP BY CustomerID, AccountId
ORDER BY TimeMark DESC

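Answer 4: (score: 0)

You need a subquery that returns the maximum TimeMark per CustomerID for your criteria, then join back to it on CustomerID and TimeMark. That restricts the results to the records you want.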

SELECT BTILog.TimeMark, BTILog.CustomerID, BTILog.AccountID, BTILog.TargetUrl
FROM BTILog
INNER JOIN (
  SELECT Max(BTILog.TimeMark) AS MaxOfTimeMark, BTILog.CustomerID
  FROM BTILog
  WHERE (((BTILog.TargetUrl) Like '%/api/v1/cust/details%')
    AND ((BTILog.Class) Like  'com.btfin.security.sso.SSODetailsFactory%')
    AND ((BTILog.TimeMark) BETWEEN '20140926 00:00:00' AND '20141020 23:59:59'))
  GROUP BY BTILog.CustomerID) AS T1
ON (BTILog.CustomerID = T1.CustomerID)
  AND (BTILog.TimeMark = T1.MaxOfTimeMark)
ORDER by BTILog.TimeMark DESC

I usually work in the MS Access query editor, so the syntax is different; I think I've changed everything that might cause it to "fall over".
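
The join-on-maximum pattern can be tried out with Python's sqlite3 module. This is only a sketch with invented rows and a cut-down schema, and the WHERE filters from the question are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BTILog (TimeMark TEXT, CustomerID INTEGER, AccountId INTEGER)"
)
conn.executemany(
    "INSERT INTO BTILog VALUES (?, ?, ?)",
    [
        ("2014-10-01 09:00:00", 57155299, 1),
        ("2014-10-05 10:00:00", 57155299, 1),
        ("2014-10-07 11:00:00", 57155299, 2),  # latest row for this customer
        ("2014-10-02 08:00:00", 11111111, 3),
    ],
)

# The derived table finds each customer's latest TimeMark;
# the join pulls back the full row carrying that TimeMark.
latest = conn.execute(
    """
    SELECT b.TimeMark, b.CustomerID, b.AccountId
    FROM BTILog AS b
    INNER JOIN (
        SELECT CustomerID, MAX(TimeMark) AS MaxOfTimeMark
        FROM BTILog
        GROUP BY CustomerID
    ) AS t1
        ON b.CustomerID = t1.CustomerID
       AND b.TimeMark = t1.MaxOfTimeMark
    ORDER BY b.TimeMark DESC
    """
).fetchall()

for row in latest:
    print(row)
```

Each customer appears once here, with the AccountId taken from the latest row. If two rows for a customer share the same maximum TimeMark, both are returned, so this approach has the same tie behaviour as rank.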

Answer 5: (score: 0)

Based on your sample data, and assuming your query is correct.

Select * from 
(SELECT TimeMark,
        CustomerID,
        AccountId,
        TargetURL,
        ROW_NUMBER() over (partition by CustomerID order by TimeMark desc) rownum
    FROM BTILog
    WHERE timemark BETWEEN '20140926 00:00:00'
            AND '20141020 23:59:59'
        AND TargetURL LIKE '%/api/v1/cust/details%'
        AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
    ) tbl where rownum=1
    ORDER BY TimeMark DESC

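Answer 6: (score: 0)

I don't think the rank function or the rownum trick will help, because you are not dealing with a sorted list on the same unique key; instead, for each { {1}} you are looking for the record with the maximum time mark within the specified period.

You could try this: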

CustomerID, A.AccountId

Answer 7: (score: 0)

In SQL Server I use date >= convert(nchar(10), first_date_value, 103) ............

Answer 8: (score: 0)

How about a subselect that counts the rows for the current CustomerID?

  SELECT TimeMark,
        CustomerID,
        AccountId,
        TargetURL
    FROM BTILog AS o
    WHERE timemark BETWEEN '20140926 00:00:00'
            AND '20141020 23:59:59'
        AND TargetURL LIKE '%/api/v1/cust/details%'
        AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
        AND (
                SELECT count(*) 
                FROM BTILog AS i
                WHERE i.timemark BETWEEN '20140926 00:00:00'
                AND '20141020 23:59:59'
                AND i.TargetURL LIKE '%/api/v1/cust/details%'
                AND i.Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
                AND i.CustomerID = o.CustomerID
        ) = 1

    ORDER BY TimeMark DESC
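
Note that a count-equals-one filter keeps only customers who appear exactly once, so customers with repeats drop out of the result entirely rather than being reduced to one row. Here is a quick sketch with Python's sqlite3 module; the rows are invented and the WHERE filters from the question are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BTILog (TimeMark TEXT, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO BTILog VALUES (?, ?)",
    [
        ("2014-10-01 09:00:00", 57155299),
        ("2014-10-05 10:00:00", 57155299),  # repeat customer
        ("2014-10-02 08:00:00", 11111111),  # appears once
    ],
)

# Correlated subquery: count the rows sharing the outer row's CustomerID.
only_singletons = conn.execute(
    """
    SELECT o.TimeMark, o.CustomerID
    FROM BTILog AS o
    WHERE (
        SELECT COUNT(*)
        FROM BTILog AS i
        WHERE i.CustomerID = o.CustomerID
    ) = 1
    ORDER BY o.TimeMark DESC
    """
).fetchall()

print(only_singletons)
```

With this sample data only customer 11111111 is returned; 57155299 is missing altogether, which is not what the question asks for.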

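Answer 9: (score: -1)

Use GROUP BY on CustomerID, and also add a duplicate column so you can track how many duplicates there are. You don't need TimeMark, since the times will be repeated.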

SELECT TimeMark,
        CustomerID,
        COUNT(CustomerID) AS duplicate,
        AccountId,
        TargetURL
    FROM BTILog
    WHERE timemark BETWEEN '20140926 00:00:00'
            AND '20141020 23:59:59'
        AND TargetURL LIKE '%/api/v1/cust/details%'
        AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
   GROUP BY(CustomerID)
    ORDER BY TimeMark DESC

Answer 10: (score: -1)

You can write a subquery to select only the DISTINCT CustomerID values.

 SELECT TimeMark, 
        (SELECT DISTINCT CustomerID 
        FROM BTILog
        WHERE timemark BETWEEN '20140926 00:00:00'
            AND '20141020 23:59:59'
        AND TargetURL LIKE '%/api/v1/cust/details%'
        AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%') 
        AS CustyID, 
        AccountId,
        TargetURL
    FROM BTILog
    WHERE timemark BETWEEN '20140926 00:00:00'
            AND '20141020 23:59:59'
        AND TargetURL LIKE '%/api/v1/cust/details%'
        AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
    ORDER BY TimeMark DESC

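Answer 11: (score: -1)

You most likely need to redesign your data structure, splitting the data into two tables: one defining the Customer and the other acting as a timestamp log. Each customer record would then be unique, with or without DISTINCT. The timestamp table would be linked to the customer table by a foreign key (a matching CustomerID field) and would hold the timestamp plus any other fields you need to record at each recurring event. Schematically:

Customer: CustomerID, Integer, Unique.
AccountID, Integer.
TargetURL, Text 250.

TimeStamp: TimeID, Integer, Unique.
TimeMark, DateTime.
CustomerID, Integer

The CustomerID in Customer and the TimeID in TimeStamp should be AutoIncrement 'Crash Number' fields. The CustomerID field in each table acts as the link. With this rationalised schema, your queries will readily start to fall into place, and you will be able to extract the data you need without complications like DISTINCT and UNIQUE ROWS.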
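
As a rough illustration of that two-table layout, the sketch below builds it in SQLite through Python's sqlite3 module and pulls the latest TimeMark per customer. The column types, the sample values, and the `LastTimeMark` alias are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,
        AccountID  INTEGER,
        TargetURL  TEXT
    );
    CREATE TABLE TimeStamp (
        TimeID     INTEGER PRIMARY KEY AUTOINCREMENT,
        TimeMark   TEXT,
        CustomerID INTEGER REFERENCES Customer (CustomerID)
    );
    """
)
# Two customers; SQLite assigns CustomerID 1 and 2.
conn.executemany(
    "INSERT INTO Customer (AccountID, TargetURL) VALUES (?, ?)",
    [(100, "/api/v1/cust/details"), (200, "/api/v1/cust/details")],
)
# Three logged events: two for customer 1, one for customer 2.
conn.executemany(
    "INSERT INTO TimeStamp (TimeMark, CustomerID) VALUES (?, ?)",
    [
        ("2014-10-01 09:00:00", 1),
        ("2014-10-05 10:00:00", 1),
        ("2014-10-02 08:00:00", 2),
    ],
)

# One row per customer, with the most recent event time.
last_seen = conn.execute(
    """
    SELECT c.CustomerID, c.AccountID, c.TargetURL,
           MAX(t.TimeMark) AS LastTimeMark
    FROM Customer AS c
    JOIN TimeStamp AS t ON t.CustomerID = c.CustomerID
    GROUP BY c.CustomerID, c.AccountID, c.TargetURL
    ORDER BY LastTimeMark DESC
    """
).fetchall()

for row in last_seen:
    print(row)
```

Because the customer attributes live in one row per customer, grouping on them cannot split a customer into several rows.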

Answer 12: (score: -1)

You should group by CustomerID and get the records HAVING COUNT(CustomerID) = 1; this should give you all the records where the CustomerID is unique.

SELECT TimeMark,
    CustomerID,
    AccountId,
    TargetURL
FROM BTILog
WHERE timemark BETWEEN '20140926 00:00:00'
        AND '20141020 23:59:59'
    AND TargetURL LIKE '%/api/v1/cust/details%'
    AND Class LIKE 'com.btfin.security.sso.SSODetailsFactory%'
GROUP BY CustomerID
HAVING COUNT(CustomerID) = 1
ORDER BY TimeMark DESC;