I'm trying to turn a CSV file in S3 into a table in Athena. When I run the query in the Athena console it works, but when I run it from a SageMaker Jupyter notebook with a boto3 client, it returns:
"InvalidRequestException: An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:8: no viable alternative at input 'CREATE EXTERNAL'"
Here is my code:
import boto3

# S3 prefix where Athena writes query results
s3_output = 's3://path/to/s3output'

def run_query(query, s3_output):
    client = boto3.client('athena')
    response = client.start_query_execution(
        QueryString=query,
        ResultConfiguration={
            'OutputLocation': s3_output,
        }
    )
    print('Execution ID: ' + response['QueryExecutionId'])
    return response
createTable = \
"""CREATE EXTERNAL TABLE TestTable (
ID string,
CustomerId string,
Ip string,
MessageFilename string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\"',
'escapeChar' = '\\'
)
STORED AS TEXTFILE
LOCATION 's3://bucket_name/results/csv/'
TBLPROPERTIES ("skip.header.line.count"="1")"""
response = run_query(createTable, s3_output)
print(response)
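I have run a similar query through the boto3 client for JSON data (i.e., using ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe') and it worked fine, but this CSV one fails. I've tried changing the table name, the syntax, and the quotes, but nothing seems to help.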
Any suggestions would be appreciated, thanks!
Answer (score: 1)
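Thanks for sharing the complete example. The problem is the escaping in SERDEPROPERTIES. In an ordinary (non-raw) Python string, the sequence \" becomes " and \\ becomes a single \, so the DDL that boto3 sends to Athena is not the same text you type into the console. After modifying createTable as follows (doubling the backslashes, and also using lowercase, backtick-quoted column names), it works: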
createTable = \
"""CREATE EXTERNAL TABLE testtable (
`id` string,
`customerid` string,
`ip` string,
`messagefilename` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'separatorChar' = ',',
'quoteChar' = '\\\"',
'escapeChar' = '\\\\' )
STORED AS TEXTFILE
LOCATION 's3://bucket_name/results/csv/'
TBLPROPERTIES ("skip.header.line.count"="1");"""
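A quick way to see the escaping problem described above, without calling AWS at all, is to print the SerDe fragment exactly as Python hands it to boto3. The sketch below (the variable names are illustrative) compares the question's literal, the answer's doubled-escape literal, and, as an alternative that is an assumption rather than part of the answer, a raw string r"""...""" that passes the console-style escapes through unchanged:

```python
# Illustrative check of what SQL text each Python literal actually produces.

# Question's literal (non-raw): Python consumes one level of escaping,
# leaving a lone backslash inside the SQL string literal for escapeChar.
original = """'quoteChar' = '\"', 'escapeChar' = '\\'"""

# Answer's literal: doubled escapes survive Python's own escape processing,
# so Athena receives the same text that is typed in the console.
doubled = """'quoteChar' = '\\\"', 'escapeChar' = '\\\\'"""

# Hypothetical alternative: a raw string keeps every backslash verbatim,
# so the console-style escapes can be pasted in unchanged.
raw = r"""'quoteChar' = '\"', 'escapeChar' = '\\'"""

print(original)  # 'quoteChar' = '"', 'escapeChar' = '\'
print(doubled)   # 'quoteChar' = '\"', 'escapeChar' = '\\'
print(raw)       # 'quoteChar' = '\"', 'escapeChar' = '\\'
```

In the first output, the SQL literal '\' likely escapes its own closing quote, which fits the answer's diagnosis that the SERDEPROPERTIES escaping breaks the statement. Whether to double the backslashes or use a raw string is a matter of taste; the raw string keeps the DDL identical to what works in the console.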