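I am trying to read a column from a database where a single column holds JSON with multiple key-value pairs. I have created a Dataset (Java) over the table and am trying to select only the rows where a specific key has a specific value, but the query fails in Spark SQL.
The column data in the database looks like this (the column name is fact_value):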
{"publication_identifier":"10.1016/j.addr.2012.01.011","publication_identifier_type":"DOI","publication_citation_text":"Synthetic Membrane Active Amphiphiles, Advanced Drug Delivery Reviews, 64, 2012, 784-796, 10.1016/j.addr.2012.01.011","publication_title":"","publication_authors":"George W. Gokel and Saeedeh Negin"}
In SQL Developer I run:
select * from grantsqadb.ref_scholarly_view where fact_value->>'publication_identifier_type' = 'DOI';
and I get what I need.
In Spark SQL, I created a Dataset for the table, registered it with createOrReplaceTempView, and tried the following:
DatasetName.createOrReplaceTempView("forSenario1");
spark.sql("select * from forSenario1 where fact_value->>'publication_identifier_type' = 'PMID' limit 10").show(false);
The expected result looks like this:
{"publication_identifier":"10.1021/ja108564","publication_identifier_type":"DOI","publication_citation_text":"Pore Formation in Phospholipid Bilayers by Branched-Chain Pyrogallol[4]arenes, JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 133, 2011, 3234, http://wok-ws.isiknowledge.com/WoS?recid=195942747#000288410100005, 10.1021/ja108564","publication_title":"","publication_authors":"Negin, S; Daschbach, MM; Kulikov, OV; Rath, N; Gokel, GW"}
What I get instead is:
mismatched input 'from' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 9)
SQL
select * from forSenario1 where fact_value->>'publication_identifier_type' = 'PMID' limit 10
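One possible direction, offered as a sketch rather than a confirmed fix: the `->>` operator is PostgreSQL syntax, and Spark SQL has its own built-in function for this kind of lookup, get_json_object(column, '$.key'). The example below only builds the query string, so it runs without a Spark dependency. It assumes fact_value arrives in the Dataset as a plain JSON string column; the class name JsonFilterQuery and its build method are hypothetical helpers, not part of any library.

```java
// Sketch only: JsonFilterQuery and build(...) are hypothetical helper names.
final class JsonFilterQuery {

    // Builds a Spark SQL query that filters rows on one key of a JSON string column.
    // get_json_object(column, '$.key') is Spark SQL's built-in JSON path lookup.
    // Assumption: all arguments are trusted values; nothing is escaped here.
    static String build(String view, String column, String key, String value, int limit) {
        return "select * from " + view
            + " where get_json_object(" + column + ", '$." + key + "') = '" + value + "'"
            + " limit " + limit;
    }

    public static void main(String[] args) {
        String query = build("forSenario1", "fact_value",
                             "publication_identifier_type", "DOI", 10);
        System.out.println(query);
        // With a SparkSession named spark, as in the question, the string could be run as:
        // spark.sql(query).show(false);
    }
}
```

The same filter could also be written directly in the spark.sql call from the question by swapping the `fact_value->>'publication_identifier_type'` expression for the get_json_object form. Note that the SQL Developer query filters on 'DOI' while the Spark query filters on 'PMID', so the two would not return the same rows even once the syntax is accepted.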