I get different results when I run the same query against the same S3 source in Athena versus in a pyspark script on an EMR cluster (1 master x 10 workers). Athena returns the data, but all I get from the script is nulls. Any suggestions/ideas/guesses as to why?
Here is the Athena query:
SELECT <real_col1> as reg_plate, <real_col2> as model_num
FROM <my Athena table name>
WHERE partition_datetime LIKE '2019-01-01-14'
limit 10

Which returns this result:
reg_plate model_num
515355 961-824
515355 961-824
515355 961-824
515355 961-824
341243 047-891
727027 860-403
619656 948-977
576345 951-657
576345 951-657
113721 034-035

But when I run the same query in the pyspark script below, against the same S3 source, all I get is nulls:
# Define SQL query
load_qry = """SELECT <real_col1> as reg_plate, <real_col2> as model_num
FROM s3_table
WHERE partition_datetime LIKE '2019-01-01-14'
limit 10 """
df1 = spark.read.parquet("<s3:path to my data>")
df1.createOrReplaceTempView("s3_table")
sqlDF = spark.sql(load_qry)
sqlDF.show(10)
My cluster has 1 r3.xlarge master and 10 r3.xlarge workers. The output of sqlDF.show(10) is all nulls:
|reg_plate|model_num|
+---------+---------+
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
| null| null|
+---------+---------+
Answer 0 (score: 0)
I found a simple solution. Instead of

load_qry = """SELECT <real_col1> as reg_plate, <real_col2> as model_num
FROM s3_table WHERE partition_datetime LIKE '2019-01-01-14' limit 10 """
df1 = spark.read.parquet("<s3:path to my data>")
df1.createOrReplaceTempView("s3_table")

I used

load_qry = """SELECT <real_col1> as reg_plate, <real_col2> as model_num
FROM <my_athena_db>.table WHERE partition_datetime LIKE '2019-01-01-14'
limit 10 """
df1 = spark.sql(load_qry)

This works because Glue knows how to find "<my_athena_db>.table".
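For a spark.sql call to resolve <my_athena_db>.table, the EMR cluster has to be using the AWS Glue Data Catalog as its Hive metastore. On EMR this is normally enabled at cluster creation, either via the "Use Glue Data Catalog for table metadata" option or with a configuration classification along these lines (a sketch of the standard setting; the cluster in the question evidently had it enabled, since the lookup succeeded):

```json
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
```

With this in place, Spark resolves the table schema and partition metadata from the same catalog Athena uses, instead of inferring a schema directly from the Parquet files, which sidesteps the mismatch that produced the nulls.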