How can I use two SUM() aggregate functions in the same query in PostgreSQL?

Asked: 2018-11-01 19:39:50

Tags: sql postgresql group-by subquery aggregate-functions

I have a PostgreSQL query that produces the following results:

SELECT   o.order || '-' || osh.ordinal_number AS order, 
         o.company,
         o.order_total,
         SUM(osh.items) AS order_shipment_total,
         o.order_type
FROM     orders o
         JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE    o.order = [some order number]
GROUP BY o.order,
         o.company,
         o.order_total,
         o.order_type;

order   | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1   | A corp. | null        |  125.00              | new
123-2   | B corp. | null        |  100.00              | new

I need to replace o.order_total (which doesn't work correctly) with a sum of the order_shipment_total column, so that in the example above each row would show 225.00. I need the results above to look like this:

order   | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1   | A corp. | 225.00      |  125.00              | new
123-2   | B corp. | 225.00      |  100.00              | new

What I've tried

1.) To replace o.order_total, I tried SUM(SUM(osh.items)), but I got an error message saying that you cannot nest aggregate functions.

2.) I tried making the whole query a subquery and summing the order_shipment_total column, but when I do that, it just repeats the column itself. See below:

SELECT   order,
         company,
         SUM(order_shipment_total) AS order_total,
         order_shipment_total,
         order_type
FROM     (
    SELECT   o.order || '-' || osh.ordinal_number AS order, 
             o.company,
             o.order_total,
             SUM(osh.items) AS order_shipment_total,
             o.order_type
    FROM     orders o
             JOIN order_shipments osh ON o.order_id = osh.order_id
    WHERE    o.order = [some order number]
    GROUP BY o.order,
             o.company,
             o.order_total,
             o.order_type
) subquery
GROUP BY order,
         company,
         order_shipment_total,
         order_type;

order   | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1   | A corp. | 125.00      |  125.00              | new
123-2   | B corp. | 100.00      |  100.00              | new

3.) I tried including in the subquery/query example above only the columns I actually want to group by, because I feel like I was able to do that in Oracle SQL. But when I do that, I get an error message saying "column [name] must appear in the GROUP BY clause or be used in an aggregate function."

...
GROUP BY order,
         company,
         order_type;

ERROR:  column "[a column name]" must appear in the GROUP BY clause or be used in an aggregate function.

How can I accomplish this? I'm sure a subquery could be the answer, but I'm confused as to why this approach doesn't work.

2 answers:

Answer 0 (score: 2):

You should be able to use window functions:

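A minimal sketch of that approach against the question's tables, following the sum(SUM(...)) OVER (PARTITION BY ...) pattern described in the second answer:

SELECT   o.order || '-' || osh.ordinal_number AS order,
         o.company,
         SUM(SUM(osh.items)) OVER (PARTITION BY o.order) AS order_total,
         SUM(osh.items) AS order_shipment_total,
         o.order_type
FROM     orders o
         JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE    o.order = [some order number]
GROUP BY o.order,
         osh.ordinal_number,
         o.company,
         o.order_type;

The inner SUM(osh.items) is the ordinary GROUP BY aggregate for each shipment row; the outer SUM(...) OVER (PARTITION BY o.order) then totals those per-shipment sums across every row of the same order, so both rows show 225.00.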

Answer 1 (score: 2):

What you're not quite getting about your query/approach is that you actually want two different levels of grouping in the same row of results. The subquery approach is half right, but when you group a subquery inside another query that groups it again, you can only work with the data you got out of the subquery, and you have a choice: either keep it at the subquery's level of aggregation detail, or drop precision in favour of more grouping. You cannot keep the detail and also lose the detail for further summarising. In that practical sense, a plain nested subquery is relatively pointless, because you might as well group to the level you want in one hit:

SELECT groupkey1, sum(y) FROM
(SELECT groupkey1, groupkey2, sum(x) as y FROM table GROUP BY groupkey1, groupkey2) t
GROUP BY groupkey1

versus:

SELECT groupkey1, sum(x) FROM
table
GROUP BY groupkey1

Gordon's answer will probably work (apart from your error, where the grouping doesn't cover all of the columns), but it may not help your understanding much, because it's a code-only answer. Below is a detailed walk-through of how to work this problem out, using simpler data, leading up to the window function shown above.

Suppose you stock various types of apples and melons. You want a query that totals each particular variety of fruit, regardless of purchase date. You also want a column with the overall total for each general type of fruit:

Detail:

fruit | type             | purchasedate | count
apple | golden delicious | 2017-01-01   | 3
apple | golden delicious | 2017-01-02   | 4
apple | granny smith     | 2017-01-04   | 2
melon | honeydew         | 2017-01-01   | 1
melon | cantaloupe       | 2017-01-05   | 4
melon | cantaloupe       | 2017-01-06   | 2

That's 7 golden delicious, 2 granny smith, 1 honeydew and 6 cantaloupe, and also 9 apples and 7 melons overall.

You can't do this as one query*, because you want two different levels of grouping. You have to do it as two queries, and then (key point of understanding) join the less precise (apple/melon) results back onto the more precise detail (granny smith / golden delicious / honeydew / cantaloupe):

SELECT * FROM
(
  SELECT fruit, type, sum(count) as fruittypecount
  FROM fruit
  GROUP BY fruit, type
) fruittypesum
INNER JOIN
(
  SELECT fruit, sum(count) as fruitcount
  FROM fruit
  GROUP BY fruit
) fruitsum
ON
  fruittypesum.fruit = fruitsum.fruit

You get:

fruit | type             | fruittypecount | fruit | fruitcount
apple | golden delicious | 7              | apple | 9
apple | granny smith     | 2              | apple | 9
melon | honeydew         | 1              | melon | 7
melon | cantaloupe       | 6              | melon | 7

So, your query, with its different groups for detail and summary:

SELECT
    detail.order || '-' || detail.ordinal_number as order,
    detail.company,
    summary.order_total,
    detail.order_shipment_total,
    detail.order_type
FROM (
    SELECT   o.order,
             osh.ordinal_number, 
             o.company,
             SUM(osh.items) AS order_shipment_total,
             o.order_type
    FROM     orders o
             JOIN order_shipments osh ON o.order_id = osh.order_id
    WHERE    o.order = [some order number]
    GROUP BY o.order,
             osh.ordinal_number,
             o.company,
             o.order_type
) detail
INNER JOIN
(
    SELECT   o.order,
             SUM(osh.items) AS order_total
    FROM     orders o
             JOIN order_shipments osh ON o.order_id = osh.order_id
    --don't need the where clause; we'll join on order number
    GROUP BY o.order
) summary
ON
summary.order = detail.order

Gordon's query uses a window function to achieve the same effect; the window function runs after the grouping is done, and it establishes another level of grouping (PARTITION BY ordernumber), which is the equivalent of the GROUP BY ordernumber in the summary. The window-function summary data is inherently connected to the detail data by the order number. The implied query:

SELECT
  ordernumber,
  lineitemnumber,
  SUM(amount) linetotal,
  sum(SUM(amount)) over(PARTITION BY ordernumber) ordertotal
FROM lineitems  -- illustrative table name
GROUP BY
  ordernumber,
  lineitemnumber

...will have an ordertotal which is the sum of all the linetotals for the order: the GROUP BY prepares the data down to line-item-level detail, and the window function prepares data at order level only, repeating the order total as many times as needed to fill in every line item. I wrote the SUM that belongs to the GROUP BY operation in uppercase; the lowercase sum belongs to the partition operation. It has to be sum(SUM()) and cannot simply say sum(amount), because the amount column by itself isn't allowed - it isn't in the GROUP BY. Because amount by itself isn't allowed, and it has to be aggregated for the GROUP BY to work, we have to sum(SUM()) for the partitioning to run (which happens after the GROUP BY is done).

It behaves exactly the same as grouping at two different levels and joining the results together; in fact, I chose to explain it that way because it relates more clearly to what you already know about groups and joins.
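As a sketch, here is the fruit example from above rewritten with the window function; it produces the same per-fruit total without the explicit join:

SELECT fruit,
       type,
       SUM(count) AS fruittypecount,
       sum(SUM(count)) over(PARTITION BY fruit) AS fruitcount
FROM   fruit
GROUP BY fruit, type

Each output row keeps its per-variety detail, while the partition repeats the per-fruit total, exactly like the fruitcount column produced by the join earlier.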

Remember: JOINs grow a data set sideways, UNIONs grow it downwards. When you have some detail data and want some extra data alongside it (a summary), join it on. (If you wanted the totals to appear at the bottom of each column, you would union them instead.)
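For illustration, a minimal sketch of that union style against the fruit table above (ordering the total rows beneath their details is left out for simplicity):

SELECT fruit, type, SUM(count) AS count
FROM   fruit
GROUP BY fruit, type
UNION ALL
SELECT fruit, 'total', SUM(count)
FROM   fruit
GROUP BY fruit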


* You can do it as one query even without window functions, but it ends up not being worth it: the various tricks required make it hard to maintain and likely to cause confusion.
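For example, one such trick is a correlated scalar subquery in the select list - a sketch only, assuming order_id is the key of orders - which re-reads order_shipments for every output row:

SELECT   o.order || '-' || osh.ordinal_number AS order,
         o.company,
         (SELECT SUM(osh2.items)
          FROM   order_shipments osh2
          WHERE  osh2.order_id = o.order_id) AS order_total,
         SUM(osh.items) AS order_shipment_total,
         o.order_type
FROM     orders o
         JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE    o.order = [some order number]
GROUP BY o.order_id,
         o.order,
         osh.ordinal_number,
         o.company,
         o.order_type;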