I have a PostgreSQL query that produces the following results:
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | null | 125.00 | new
123-2 | B corp. | null | 100.00 | new
I need to replace o.order_total (which isn't working correctly) with the total of the order_shipment_total column, so that in the example above each row would show 225.00 there. I need the results above to look like this instead:
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 225.00 | 125.00 | new
123-2 | B corp. | 225.00 | 100.00 | new
What I've tried
1.) To replace o.order_total, I tried SUM(SUM(osh.items)), but I got an error message saying that you cannot nest aggregate functions.
2.) I tried wrapping the whole query in a subquery and summing the order_shipment_total column, but when I do that, it simply repeats the column itself. See below:
SELECT order,
company,
SUM(order_shipment_total) AS order_shipment_total,
order_shipment_total,
order_type
FROM (
SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
o.order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
o.company,
o.order_total,
o.order_type
) subquery
GROUP BY order,
company,
order_shipment_total,
order_type;
order | company | order_total | order_shipment_total | order_type
-------------------------------------------------------------------
123-1 | A corp. | 125.00 | 125.00 | new
123-2 | B corp. | 100.00 | 100.00 | new
3.) In the subquery/query example above, I tried including only the columns I actually want to group by, because I feel like I was able to do that in Oracle SQL. But when I do, I get an error message saying "column [name] must appear in the GROUP BY clause or be used in an aggregate function."
...
GROUP BY order,
company,
order_type;
ERROR: column "[a column name]" must appear in the GROUP BY clause or be used in an aggregate function.
How do I accomplish this? I'm sure a subquery could be the answer, but I'm confused about why that approach isn't working.
Answer 0 (score: 2)
You should be able to use window functions:
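For example, a sketch against the tables from the question (not necessarily the exact query originally posted with this answer; the sum(SUM(...)) OVER (PARTITION BY ...) pattern is walked through in the next answer):

SELECT o.order || '-' || osh.ordinal_number AS order,
o.company,
-- the window runs after the GROUP BY, summing the per-shipment sums across the whole order
SUM(SUM(osh.items)) OVER (PARTITION BY o.order) AS order_total,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
osh.ordinal_number,
o.company,
o.order_type;

Every shipment row for an order then carries the same order_total (225.00 in your example), while order_shipment_total stays per shipment.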
Answer 1 (score: 2)
The thing you're not quite getting about your query/approach is that you actually want two different levels of grouping in the same row of results. The subquery approach is half right, but when you group a subquery inside another query that groups it, you can only work with the data you already got back from the subquery, and you can only either keep it at the level of detail it was aggregated to, or give up precision in favour of more grouping. You cannot both keep the detail and throw the detail away to summarize further. So in a practical sense the subquery is fairly pointless, because you might as well group to the level you need in one hit:
SELECT groupkey1, sum(y) FROM
(SELECT groupkey1, groupkey2, sum(x) as y FROM table GROUP BY groupkey1, groupkey2)
GROUP BY groupkey1
versus:
SELECT groupkey1, sum(x) FROM
table
GROUP BY groupkey1
Gordon's answer will probably do it (apart from an error: the grouping list is wrong and doesn't cover all the columns), but it may not help your understanding much, because it is a code-only answer. Below is a breakdown of how you can solve this, using simpler data, and of what the window function is doing.
Suppose you have different kinds of apples and melons in stock. You want a query that gives a total for each specific kind of fruit, regardless of purchase date. You also want a column with the total for each overall type of fruit:
The detail:
fruit | type | purchasedate | count
apple | golden delicious | 2017-01-01 | 3
apple | golden delicious | 2017-01-02 | 4
apple | granny smith | 2017-01-04 | 2
melon | honeydew | 2017-01-01 | 1
melon | cantaloupe | 2017-01-05 | 4
melon | cantaloupe | 2017-01-06 | 2
That's 7 golden delicious, 2 granny smith, 1 honeydew and 6 cantaloupe, which is also 9 apples and 7 melons.
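If you want to follow along, a minimal setup for this example data could look like the following (table and column names are taken straight from the listing above):

CREATE TABLE fruit (
    fruit        text,
    type         text,
    purchasedate date,
    count        integer
);

INSERT INTO fruit VALUES
    ('apple', 'golden delicious', '2017-01-01', 3),
    ('apple', 'golden delicious', '2017-01-02', 4),
    ('apple', 'granny smith',     '2017-01-04', 2),
    ('melon', 'honeydew',         '2017-01-01', 1),
    ('melon', 'cantaloupe',       '2017-01-05', 4),
    ('melon', 'cantaloupe',       '2017-01-06', 2);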
You cannot do this as one query*, because you want two different levels of grouping. You have to do it as two queries and then (this is the key point to understand) join the less precise results (apple/melon) back onto the more precise detail (granny smith / golden delicious / honeydew / cantaloupe):
SELECT * FROM
(
SELECT fruit, type, sum(count) as fruittypecount
FROM fruit
GROUP BY fruit, type
) fruittypesum
INNER JOIN
(
SELECT fruit, sum(count) as fruitcount
FROM fruit
GROUP BY fruit
) fruitsum
ON
fruittypesum.fruit = fruitsum.fruit
You get:
fruit | type | fruittypecount | fruit | fruitcount
apple | golden delicious | 7 | apple | 9
apple | granny smith | 2 | apple | 9
melon | honeydew | 1 | melon | 7
melon | cantaloupe | 6 | melon | 7
So, for your query, with its different groupings, detail and summary:
SELECT
detail.order || '-' || detail.ordinal_number as order,
detail.company,
summary.order_total,
detail.order_shipment_total,
detail.order_type
FROM (
SELECT o.order,
osh.ordinal_number,
o.company,
SUM(osh.items) AS order_shipment_total,
o.order_type
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
WHERE o.order = [some order number]
GROUP BY o.order,
osh.ordinal_number,
o.company,
o.order_type
) detail
INNER JOIN
(
SELECT o.order,
SUM(osh.items) AS order_total
FROM orders o
JOIN order_shipments osh ON o.order_id = osh.order_id
--don't need the where clause; we'll join on order number
GROUP BY o.order,
o.company,
o.order_type
) summary
ON
summary.order = detail.order
Gordon's query uses a window function to the same effect; window functions run after the grouping is done, and they set up another level of grouping (PARTITION BY ordernumber), which is equivalent to the GROUP BY ordernumber in the summary. The window-function summary data is inherently joined to the detail data by the order number. The implied query says:
SELECT
ordernumber,
lineitemnumber,
SUM(amount) linetotal,
sum(SUM(amount)) over(PARTITION BY ordernumber) ordertotal
GROUP BY
ordernumber,
lineitemnumber
..will have an ordertotal that is the sum of all the linetotals for the order: the GROUP BY prepares the data to line-item level of detail, while the window function prepares the data to order level only, repeating the total however many times is needed to pad out each line item. I've written the SUM that belongs to the GROUP BY operation in uppercase. The lowercase sum belongs to the partition operation. It has to be sum(SUM()); it cannot simply say sum(amount), because the amount column on its own is not allowed there, as it is not in the group by. Because amount on its own is not allowed, and it has to be aggregated for the group by to work, we have to sum(SUM()) for the partition (which runs after the group by has finished) to work.
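Applied to the fruit data above, for example, that pattern would look something like:

SELECT fruit,
       type,
       SUM(count) AS fruittypecount,
       sum(SUM(count)) OVER (PARTITION BY fruit) AS fruitcount
FROM fruit
GROUP BY fruit, type;

which gives the same fruittypecount/fruitcount pairs as the join of the two grouped subqueries above, without a second pass over the table.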
It behaves exactly the same as grouping to two different levels and joining them together; in fact, I chose to explain it that way because it relates more clearly to what you already understand about groups and joins.
Remember: JOINs grow data sets sideways, UNIONs grow them downwards. When you have some detail data and you want to grow some other data (a summary) onto it sideways, join it on. (If you wanted the totals to appear at the bottom of each column instead, you would union them.)
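For example, if you wanted the apple/melon totals to appear as extra rows under the detail rather than as an extra column, something along these lines against the fruit table would do it:

SELECT fruit, type, SUM(count) AS total
FROM fruit
GROUP BY fruit, type
UNION ALL
SELECT fruit, 'all types', SUM(count)
FROM fruit
GROUP BY fruit
ORDER BY fruit, type;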
* You can do it as one query without window functions, but it ends up not being worth it because of the various tricks required; it is confusing and hard to maintain.