How can I use two SUM() aggregate functions in the same query in PostgreSQL?

Asked: 2018-11-01 19:39:50

Tags: sql postgresql group-by subquery aggregate-functions

I have a PostgreSQL query that produces the following results:

SELECT   o.order || '-' || osh.ordinal_number AS order, 
         o.company,
         o.order_total,
         SUM(osh.items) AS order_shipment_total,
         o.order_type
FROM     orders o
         JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE    o.order = [some order number]
GROUP BY o.order,
         o.company,
         o.order_total,
         o.order_type;

order   | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1   | A corp. | null        |  125.00              | new
123-2   | B corp. | null        |  100.00              | new

I need to replace o.order_total (which doesn't work correctly) with a sum of the order_shipment_total column, so that in the example above each row would show 225.00. I need the results above to look like this:

order   | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1   | A corp. | 225.00      |  125.00              | new
123-2   | B corp. | 225.00      |  100.00              | new

What I've tried

1.) To replace o.order_total, I tried SUM(SUM(osh.items)), but I got an error message saying that you cannot nest aggregate functions.

2.) I tried making the whole query a subquery and summing the order_shipment_total column, but when I do that, it just repeats the column itself. See below:

SELECT   order,
         company,
         SUM(order_shipment_total) AS order_total,
         order_shipment_total,
         order_type
FROM     (
    SELECT   o.order || '-' || osh.ordinal_number AS order, 
             o.company,
             o.order_total,
             SUM(osh.items) AS order_shipment_total,
             o.order_type
    FROM     orders o
             JOIN order_shipments osh ON o.order_id = osh.order_id
    WHERE    o.order = [some order number]
    GROUP BY o.order,
             o.company,
             o.order_total,
             o.order_type
) subquery
GROUP BY order,
         company,
         order_shipment_total,
         order_type;

order   | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1   | A corp. | 125.00      |  125.00              | new
123-2   | B corp. | 100.00      |  100.00              | new

3.) I tried including in the subquery/query example above only the columns I actually want to group by, because I feel like I was able to do that in Oracle SQL. But when I do that, I get an error message saying "column [name] must appear in the GROUP BY clause or be used in an aggregate function."

...
GROUP BY order,
         company,
         order_type;

ERROR:  column "[a column name]" must appear in the GROUP BY clause or be used in an aggregate function.

How can I accomplish this? I'm sure a subquery could be the answer, but I'm confused as to why this approach doesn't work.

2 answers:

Answer 0 (score: 2):

You should be able to use window functions:

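A minimal sketch of that approach against the question's tables, following the sum(SUM(...)) OVER (PARTITION BY ...) pattern described in the second answer:

SELECT   o.order || '-' || osh.ordinal_number AS order,
         o.company,
         SUM(SUM(osh.items)) OVER (PARTITION BY o.order) AS order_total,
         SUM(osh.items) AS order_shipment_total,
         o.order_type
FROM     orders o
         JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE    o.order = [some order number]
GROUP BY o.order,
         osh.ordinal_number,
         o.company,
         o.order_type;

The inner SUM(osh.items) is the ordinary GROUP BY aggregate for each shipment row; the outer SUM(...) OVER (PARTITION BY o.order) then totals those per-shipment sums across every row of the same order, so both rows show 225.00.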

Answer 1 (score: 2):

What you're not quite getting about your query/approach is that you actually want two different levels of grouping in the same row of results. The subquery approach is half right, but when you group a subquery inside another query that groups it again, you can only work with the data you got out of the subquery, and you have a choice: either keep it at the subquery's level of aggregation detail, or drop precision in favour of more grouping. You cannot keep the detail and also lose the detail for further summarising. In that practical sense, a plain nested subquery is relatively pointless, because you might as well group to the level you want in one hit:

SELECT groupkey1, sum(y) FROM
(SELECT groupkey1, groupkey2, sum(x) as y FROM table GROUP BY groupkey1, groupkey2) t
GROUP BY groupkey1

versus:

SELECT groupkey1, sum(x) FROM
table
GROUP BY groupkey1

Gordon's answer will probably work (apart from your error, where the grouping doesn't cover all of the columns), but it may not help your understanding much, because it's a code-only answer. Below is a detailed walk-through of how to work this problem out, using simpler data, leading up to the window function shown above.

Suppose you stock various types of apples and melons. You want a query that totals each particular variety of fruit, regardless of purchase date. You also want a column with the overall total for each general type of fruit:

Detail:

fruit | type             | purchasedate | count
apple | golden delicious | 2017-01-01   | 3
apple | golden delicious | 2017-01-02   | 4
apple | granny smith     | 2017-01-04   | 2
melon | honeydew         | 2017-01-01   | 1
melon | cantaloupe       | 2017-01-05   | 4
melon | cantaloupe       | 2017-01-06   | 2

That's 7 golden delicious, 2 granny smith, 1 honeydew and 6 cantaloupe, and also 9 apples and 7 melons overall.

You can't do this as one query*, because you want two different levels of grouping. You have to do it as two queries, and then (key point of understanding) join the less precise (apple/melon) results back onto the more precise detail (granny smith / golden delicious / honeydew / cantaloupe):

SELECT * FROM
(
  SELECT fruit, type, sum(count) as fruittypecount
  FROM fruit
  GROUP BY fruit, type
) fruittypesum
INNER JOIN
(
  SELECT fruit, sum(count) as fruitcount
  FROM fruit
  GROUP BY fruit
) fruitsum
ON
  fruittypesum.fruit = fruitsum.fruit

You get:

fruit | type             | fruittypecount | fruit | fruitcount
apple | golden delicious | 7              | apple | 9
apple | granny smith     | 2              | apple | 9
melon | honeydew         | 1              | melon | 7
melon | cantaloupe       | 6              | melon | 7

So, your query, with its different groups for detail and summary:

SELECT
    detail.order || '-' || detail.ordinal_number as order,
    detail.company,
    summary.order_total,
    detail.order_shipment_total,
    detail.order_type
FROM (
    SELECT   o.order,
             osh.ordinal_number, 
             o.company,
             SUM(osh.items) AS order_shipment_total,
             o.order_type
    FROM     orders o
             JOIN order_shipments osh ON o.order_id = osh.order_id
    WHERE    o.order = [some order number]
    GROUP BY o.order,
             osh.ordinal_number,
             o.company,
             o.order_type
) detail
INNER JOIN
(
    SELECT   o.order,
             SUM(osh.items) AS order_total
    FROM     orders o
             JOIN order_shipments osh ON o.order_id = osh.order_id
    --don't need the where clause; we'll join on order number
    GROUP BY o.order
) summary
ON
summary.order = detail.order

Gordon's query uses a window function to achieve the same effect; the window function runs after the grouping is done, and it establishes another level of grouping (PARTITION BY ordernumber), which is the equivalent of the GROUP BY ordernumber in the summary. The window-function summary data is inherently connected to the detail data by the order number. The implied query:

SELECT
  ordernumber,
  lineitemnumber,
  SUM(amount) linetotal,
  sum(SUM(amount)) over(PARTITION BY ordernumber) ordertotal
FROM lineitems  -- illustrative table name
GROUP BY
  ordernumber,
  lineitemnumber

...will have an ordertotal which is the sum of all the linetotals for the order: the GROUP BY prepares the data down to line-item-level detail, and the window function prepares data at order level only, repeating the order total as many times as needed to fill in every line item. I wrote the SUM that belongs to the GROUP BY operation in uppercase; the lowercase sum belongs to the partition operation. It has to be sum(SUM()) and cannot simply say sum(amount), because the amount column by itself isn't allowed - it isn't in the GROUP BY. Because amount by itself isn't allowed, and it has to be aggregated for the GROUP BY to work, we have to sum(SUM()) for the partitioning to run (which happens after the GROUP BY is done).

It behaves exactly the same as grouping at two different levels and joining the results together; in fact, I chose to explain it that way because it relates more clearly to what you already know about groups and joins.
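As a sketch, here is the fruit example from above rewritten with the window function; it produces the same per-fruit total without the explicit join:

SELECT fruit,
       type,
       SUM(count) AS fruittypecount,
       sum(SUM(count)) over(PARTITION BY fruit) AS fruitcount
FROM   fruit
GROUP BY fruit, type

Each output row keeps its per-variety detail, while the partition repeats the per-fruit total, exactly like the fruitcount column produced by the join earlier.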

Remember: JOINs grow a data set sideways, UNIONs grow it downwards. When you have some detail data and want some extra data alongside it (a summary), join it on. (If you wanted the totals to appear at the bottom of each column, you would union them instead.)
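For illustration, a minimal sketch of that union style against the fruit table above (ordering the total rows beneath their details is left out for simplicity):

SELECT fruit, type, SUM(count) AS count
FROM   fruit
GROUP BY fruit, type
UNION ALL
SELECT fruit, 'total', SUM(count)
FROM   fruit
GROUP BY fruit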


* You can do it as one query even without window functions, but it ends up not being worth it: the various tricks required make it hard to maintain and likely to cause confusion.
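For example, one such trick is a correlated scalar subquery in the select list - a sketch only, assuming order_id is the key of orders - which re-reads order_shipments for every output row:

SELECT   o.order || '-' || osh.ordinal_number AS order,
         o.company,
         (SELECT SUM(osh2.items)
          FROM   order_shipments osh2
          WHERE  osh2.order_id = o.order_id) AS order_total,
         SUM(osh.items) AS order_shipment_total,
         o.order_type
FROM     orders o
         JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE    o.order = [some order number]
GROUP BY o.order_id,
         o.order,
         osh.ordinal_number,
         o.company,
         o.order_type;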