How to read rows from a database in Spark SQL

Time: 2019-07-11 14:07:30

Tags: apache-spark-sql

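I am trying to read column data from a DB where a single column holds multiple key-value pairs. I created a Dataset (Java) on the table and tried to read only the rows in which a specific key has a specific value, but the query fails in Spark SQL.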

The column data in the database looks like this; the column header is fact_value:

{"publication_identifier":"10.1016/j.addr.2012.01.011","publication_identifier_type":"DOI","publication_citation_text":"Synthetic Membrane Active Amphiphiles, Advanced Drug Delivery Reviews, 64, 2012, 784-796, 10.1016/j.addr.2012.01.011","publication_title":"","publication_authors":"George W. Gokel and Saeedeh Negin"}

In SQL Developer I run

select * from grantsqadb.ref_scholarly_view where fact_value->>'publication_identifier_type' = 'DOI';

and I get what I need.

In Spark SQL, I created a Dataset for the table, registered it as a temporary view with createOrReplaceTempView, and tried the following:

DatasetName.createOrReplaceTempView("forSenario1");

spark.sql("select * from forSenario1 where fact_value->>'publication_identifier_type' = 'PMID' limit 10").show(false);

The expected result is as follows:

{"publication_identifier":"10.1021/ja108564","publication_identifier_type":"DOI","publication_citation_text":"Pore Formation in Phospholipid Bilayers by Branched-Chain Pyrogallol[4]arenes, JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 133, 2011, 3234, http://wok-ws.isiknowledge.com/WoS?recid=195942747#000288410100005, 10.1021/ja108564","publication_title":"","publication_authors":"Negin, S; Daschbach, MM; Kulikov, OV; Rath, N; Gokel, GW"}

What I get now is:

mismatched input 'from' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 9)

SQL

select * from forSenario1 where fact_value->>'publication_identifier_type' = 'PMID' limit 10
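
One possible direction, sketched under assumptions rather than confirmed against this setup: `->>` is a JSON operator of the source database's SQL dialect (PostgreSQL supports it, for example), and Spark SQL's parser does not accept it. Spark SQL does provide the built-in function `get_json_object(column, '$.key')`, which extracts a field from a JSON string. The sketch below assumes that fact_value reaches Spark as a string column containing JSON. The class `JsonKeyFilter` and the method `jsonKeyEquals` are hypothetical helpers introduced only for illustration; they are not part of Spark. The example filters on 'DOI', matching the expected result shown above, rather than the 'PMID' used in the failing query.

```java
// Hypothetical helper, not part of Spark: it only builds the SQL text.
final class JsonKeyFilter {

    // Renders the predicate
    //   get_json_object(<column>, '$.<key>') = '<value>'
    // which plays the role of  <column>->>'<key>' = '<value>'  in Spark SQL,
    // assuming <column> is a string column that contains JSON.
    static String jsonKeyEquals(String column, String key, String value) {
        String identifier = "[A-Za-z_][A-Za-z0-9_]*";
        if (!column.matches(identifier) || !key.matches(identifier)) {
            throw new IllegalArgumentException("column and key must be plain identifiers");
        }
        // Reject characters that would need escaping inside a SQL string literal.
        if (value.contains("'") || value.contains("\\")) {
            throw new IllegalArgumentException("value must not contain quotes or backslashes");
        }
        return "get_json_object(" + column + ", '$." + key + "') = '" + value + "'";
    }

    public static void main(String[] args) {
        // View name and column name are taken from the question.
        String sql = "select * from forSenario1 where "
                + jsonKeyEquals("fact_value", "publication_identifier_type", "DOI")
                + " limit 10";
        System.out.println(sql);
    }
}
```

The resulting string can be passed to the same call used in the question, i.e. spark.sql(sql).show(false). If fact_value is not a string column in the Dataset, this predicate would need adjusting.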

0 answers:

No answers