I have a Postgres database, and I want to run a query and load the table into a Spark DataFrame. Some of the columns in my database are arrays. For example:
=> select id, f_2 from raw limit 1;
returns
 id | f_2
----+--------------------------------------
  1 | {{140,130},{NULL,NULL},{NULL,NULL}}
What I want is to access the 140 (the first element of the inner array), which is easy in Postgres with this query:
=> select id, f_2[1][1] from raw limit 1;
 id | f_2
----+-----
  1 | 140
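For context, I expose the raw table to Spark SQL through the Postgres JDBC source. A minimal sketch of that setup (the URL, credentials and driver options below are placeholders, not my real configuration):
# Minimal sketch of how the raw table is registered as a temp view.
# Connection options are placeholders.
jdbc_df = sqlContext.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/mydb") \
    .option("dbtable", "raw") \
    .option("user", "user") \
    .option("password", "password") \
    .option("driver", "org.postgresql.Driver") \
    .load()
jdbc_df.createOrReplaceTempView("raw")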
But I want to get the same value into a Spark DataFrame. This is the code I use to query the data:
df = sqlContext.sql("""
select id as id,
f_2 as A
from raw
""")
and it returns this error:
Py4JJavaError: An error occurred while calling o560.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, localhost, executor driver): java.lang.ClassCastException: [Ljava.lang.Integer; cannot be cast to java.lang.Integer
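My guess is that the JDBC source reports the column as a flat array<int> (Postgres does not record the number of dimensions in the column type), so each value actually arrives as an Integer[] where Spark expects a single Integer. Printing the schema should show what Spark thinks the column is; a quick check along these lines:
# Check the inferred type of f_2; if my guess is right it shows
# array<int> rather than array<array<int>>, which would explain the
# Integer[] -> Integer ClassCastException above.
sqlContext.table("raw").printSchema()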
Then I tried this:
df = sqlContext.sql("""
select id as id,
f_2[0] as A
from raw
""")
and got the same error. Then I tried this:
df = sqlContext.sql("""
select id as id,
f_2[0][0] as A
from raw
""")
which returns this error:
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
AnalysisException: u"Can't extract value from f_2#32685[0];"
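The only workaround I can think of (sketch only, with placeholder connection options) is to push the element extraction down to Postgres by passing a subquery as the JDBC dbtable, but I would prefer to do this on the Spark side:
# Sketch of a possible workaround: let Postgres do the indexing and hand
# Spark an already-flattened column. Connection options are placeholders.
pushdown = "(select id, f_2[1][1] as A from raw) as raw_flat"
df = sqlContext.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://localhost:5432/mydb") \
    .option("dbtable", pushdown) \
    .option("user", "user") \
    .option("password", "password") \
    .option("driver", "org.postgresql.Driver") \
    .load()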