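I am using CloudFormation to create a VIRTUAL_VIEW of my table. When I query the view in the AWS Athena console, it works and returns data, but when I try to use the same view as a dataset in AWS QuickSight (with SPICE), it throws the following error: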
"Unable to prepare this table.
Please try again or choose another table."
If I instead choose to run it in QuickSight using the "query" option, I get the following error:
region: us-east-1
timestamp: 1558718487000
requestId: 58e18321-7e48-11e9-9740-618021a5eae5
sourceErrorCode: 0
sourceErrorMessage: [Simba][JDBC](11380) Null pointer exception.
sourceErrorState: HY000
sourceException: java.sql.SQLException
sourceType: ATHENA
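The interesting part is that if I open the view through the "Show/edit query" option in the Athena web interface and run the "Alter" view command against it without changing anything, it starts working right away. This makes me believe that creating the view through my CloudFormation template leaves something out, or maybe something else is going on? Here is the CloudFormation template I use to create the database, table, and view.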
AWSTemplateFormatVersion: 2010-09-09
Description: Glue Athena database and table configuration
Parameters:
  Stage:
    Description: Stage name (dev, prod)
    Type: String
    MinLength: 3
  PartitionKey:
    Description: Partition key for the table (don't use dashes)
    Type: String
    Default: "modkey"
    MinLength: 3
Resources:
  GlueDatabase:
    Type: AWS::Glue::Database
    Properties:
      DatabaseInput:
        Name: !Sub
          - db_${Stage}_glue
          -
            Stage: !Ref Stage
      CatalogId: !Ref AWS::AccountId
  GlueTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: !Ref GlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: tbl_request
        TableType: EXTERNAL_TABLE
        Parameters:
          CrawlerSchemaDeserializerVersion: "1.0"
          CrawlerSchemaSerializerVersion: "1.0"
          classification: json
          compressionType: none
          typeOfData: file
        PartitionKeys:
          # Data is partitioned by this key
          - Name: !Ref PartitionKey
            Type: string
        StorageDescriptor:
          Compressed: false
          Location:
            Fn::Join:
              - ''
              - - 's3://'
                - Fn::ImportValue:
                    !Sub
                      - requests-${Stage}-s3
                      -
                        Stage: !Ref Stage
                - '/'
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          StoredAsSubDirectories: false
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          SerdeInfo:
            Parameters: {paths: 'Id,Module,Organization,Redirect,RequestTime,Suppressed,Template,TemplateData,ToAddresses,ToAddress,Events'}
            SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
          Columns:
            - {Name: id, Type: string}
            - {Name: organization, Type: string}
            - {Name: module, Type: string}
            - {Name: requesttime, Type: string}
            - {Name: templatedata, Type: string}
            - {Name: template, Type: string}
            - {Name: toaddress, Type: string}
            - {Name: toaddresses, Type: array<string>}
            - {Name: suppressed, Type: array<string>}
            - {Name: events, Type: array<string>}
            - {Name: redirect, Type: array<string>}
  ViewDeliverySample:
    Type: AWS::Glue::Table
    DependsOn: GlueTable
    Properties:
      DatabaseName: !Ref GlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: tbl_request_view
        TableType: VIRTUAL_VIEW
        ViewOriginalText:
          Fn::Join:
            - ''
            - - '/* Presto View: '
              - Fn::Base64: !Sub
                  - |
                    {
                      "originalSql": "WITH dataset AS ( WITH requests_dataset AS (SELECT * FROM ${TableName} ), basedataset AS (SELECT id, module, ${PartitionKey}, CAST( json_extract(event, '$.eventtype') AS VARCHAR ) AS eventtype, event AS detail FROM requests_dataset CROSS JOIN unnest(events) AS t(event) ), send_dataset AS (SELECT email, module, ${PartitionKey}, eventtype, CAST(json_extract(detail, '$.mail.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.mail.destination') AS ARRAY(VARCHAR))) AS t(email) WHERE eventtype = 'Send' ), delivery_dataset AS (SELECT email, module, ${PartitionKey}, eventtype, CAST(json_extract(detail, '$.delivery.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.delivery.recipients') AS ARRAY(VARCHAR))) AS t(email) WHERE eventtype = 'Delivery' ), bounce_dataset AS (SELECT CAST(rr['emailaddress'] AS VARCHAR )as email, module,${PartitionKey}, eventtype, CAST(json_extract(detail,'$.bounce.timestamp') AS VARCHAR ) AS time, id FROM basedataset CROSS JOIN unnest (CAST(json_extract(detail,'$.bounce.bouncedrecipients') AS ARRAY(MAP(VARCHAR,JSON))) ) AS t(rr) WHERE eventtype='Bounce' ), suppress_dataset AS (SELECT email, module, ${PartitionKey}, 'suppress' AS eventtype, requesttime AS time, id FROM requests_dataset CROSS JOIN unnest(suppressed) AS t(email) ) SELECT * FROM send_dataset UNION SELECT * FROM delivery_dataset UNION SELECT * FROM bounce_dataset UNION SELECT * FROM suppress_dataset ) SELECT * FROM dataset ORDER BY email, module, eventtype, time",
                      "catalog": "awsdatacatalog",
                      "schema": "${DatabaseName}",
                      "columns": [
                        {
                          "name": "email",
                          "type": "varchar"
                        },
                        {
                          "name": "module",
                          "type": "varchar"
                        },
                        {
                          "name": "modkey",
                          "type": "varchar"
                        },
                        {
                          "name": "eventtype",
                          "type": "varchar"
                        },
                        {
                          "name": "time",
                          "type": "varchar"
                        },
                        {
                          "name": "id",
                          "type": "varchar"
                        }
                      ]
                    }
                  - {
                      DatabaseName: !Ref GlueDatabase,
                      TableName: !Ref GlueTable,
                      PartitionKey: !Ref PartitionKey
                    }
              - ' */'
        ViewExpandedText: '/* Presto View */'
        Parameters:
          presto_view: true
          comment: "Presto View"
        StorageDescriptor:
          Compressed: false
          StoredAsSubDirectories: false
          SerdeInfo:
            Parameters: {paths: 'email,module,modkey,eventtype,time,id'}
            SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
          Columns:
            - {Name: email, Type: string}
            - {Name: module, Type: string}
            - {Name: modkey, Type: string}
            - {Name: eventtype, Type: string}
            - {Name: time, Type: string}
            - {Name: id, Type: string}
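One way to test the hunch that the CloudFormation-created view differs from the one Athena writes is to capture the Glue table definition before and after re-saving the view in Athena, then diff the two. The following is a minimal sketch, not a definitive tool: `diff_dicts` is a hypothetical helper, and the two sample dictionaries are made-up placeholders rather than real API output. The commented lines show how the inputs could be fetched with boto3's `glue.get_table`, assuming credentials are configured and the stage is `dev`.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # marker for a key present on only one side


def diff_dicts(before: Dict[str, Any], after: Dict[str, Any],
               prefix: str = "") -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted.path: (before_value, after_value)} for every difference."""
    changes: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(before) | set(after)):
        path = f"{prefix}{key}"
        old = before.get(key, MISSING)
        new = after.get(key, MISSING)
        if isinstance(old, dict) and isinstance(new, dict):
            # Recurse into nested structures such as StorageDescriptor.
            changes.update(diff_dicts(old, new, prefix=path + "."))
        elif old != new:
            changes[path] = (old, new)
    return changes


# Hypothetical placeholder data, NOT real Glue output.
sample_before = {
    "StorageDescriptor": {
        "SerdeInfo": {
            "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
        }
    }
}
sample_after = {
    "StorageDescriptor": {"SerdeInfo": {}},
    "PartitionKeys": [],
}
sample_changes = diff_dicts(sample_before, sample_after)

# Fetching the real definitions (requires AWS credentials):
# import boto3
# glue = boto3.client("glue")
# before = glue.get_table(DatabaseName="db_dev_glue",
#                         Name="tbl_request_view")["Table"]
# ... re-save the view in the Athena console ...
# after = glue.get_table(DatabaseName="db_dev_glue",
#                        Name="tbl_request_view")["Table"]
# for path, (old, new) in diff_dicts(before, after).items():
#     print(path, old, new)
```

Timestamps such as `UpdateTime` will always differ between the two snapshots, so those entries can be ignored when reading the result.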
Answer 0 (score: 0)
This can be fixed by removing the SerdeInfo settings (leaving SerdeInfo empty) and adding an empty PartitionKeys array.
  ViewDeliverySample:
    Description: some description here # change this
    Type: AWS::Glue::Table
    DependsOn: GlueTable
    Properties:
      DatabaseName: !Ref GlueDatabase
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: tbl_request_view
        TableType: VIRTUAL_VIEW
        Parameters:
          presto_view: true
        PartitionKeys: []
        ViewOriginalText:
          Fn::Join:
            - ''
            - - '/* Presto View: '
              - Fn::Base64: !Sub
                  - |
                    {
                      "originalSql": "my sql query here",
                      "catalog": "awsdatacatalog",
                      "schema": "${DatabaseName}",
                      "columns": [
                        {
                          "name": "email",
                          "type": "varchar"
                        },
                        {
                          "name": "module",
                          "type": "varchar"
                        },
                        {
                          "name": "${PartitionKey}",
                          "type": "varchar"
                        },
                        {
                          "name": "eventtype",
                          "type": "varchar"
                        },
                        {
                          "name": "time",
                          "type": "varchar"
                        },
                        {
                          "name": "id",
                          "type": "varchar"
                        }
                      ]
                    }
                  - {
                      DatabaseName: !Ref GlueDatabase,
                      TableName: !Ref GlueTable,
                      PartitionKey: !Ref PartitionKey
                    }
              - ' */'
        ViewExpandedText: '/* Presto View */'
        StorageDescriptor:
          SerdeInfo: {}
          Columns:
            - {Name: email, Type: string}
            - {Name: module, Type: string}
            - {Name: eventtype, Type: string}
            - {Name: time, Type: string}
            - {Name: id, Type: string}
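The template builds ViewOriginalText as the literal prefix `/* Presto View: `, followed by the Base64-encoded JSON definition, followed by ` */`. When debugging a view like this, it can help to decode that string and check the SQL and column list it actually carries. Below is a minimal sketch of that round trip; the function names and the sample definition are hypothetical, and only the prefix, suffix, and JSON keys are taken from the template.

```python
import base64
import json

PREFIX = "/* Presto View: "
SUFFIX = " */"


def encode_presto_view(definition: dict) -> str:
    """Wrap a view definition the same way the template's Fn::Join does."""
    payload = json.dumps(definition).encode("utf-8")
    return PREFIX + base64.b64encode(payload).decode("ascii") + SUFFIX


def decode_presto_view(view_original_text: str) -> dict:
    """Extract and parse the JSON definition from a ViewOriginalText value."""
    if not (view_original_text.startswith(PREFIX)
            and view_original_text.endswith(SUFFIX)):
        raise ValueError("not a Presto view comment")
    encoded = view_original_text[len(PREFIX):-len(SUFFIX)]
    return json.loads(base64.b64decode(encoded))


# Hypothetical sample definition, shaped like the JSON in the template.
sample_definition = {
    "originalSql": "SELECT 1",
    "catalog": "awsdatacatalog",
    "schema": "db_dev_glue",
    "columns": [{"name": "email", "type": "varchar"}],
}
sample_text = encode_presto_view(sample_definition)
sample_decoded = decode_presto_view(sample_text)
```

Passing the ViewOriginalText value of the deployed table to `decode_presto_view` makes it easy to compare the `columns` list in the JSON against the Columns declared under StorageDescriptor.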