I am trying to read a DataFrame schema from an external text file and use it to create a DataFrame. However, I don't understand how to convert/cast the string into a StructType.
I am using Spark 2.1 with Java. Here is the code snippet:
```java
BufferedReader br = new BufferedReader(new FileReader(new File("C:\\Users\\kt799f\\Desktop\\struct.txt")));
String struct2 = br.readLine();
session.sqlContext().read().schema(struct2).json("C:\\Users\\kt799f\\Desktop\\testJson.txt").show();
```
How do I cast `struct2` in the code above to a StructType?
The struct file contains this: `StructType(StructField(name,StringType,true),StructField(age,IntegerType,true))`
This is the exception I get:

```
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '{' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'FIRST', 'AFTER', 'LAST', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'DIRECTORY', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'COST', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IGNORE', 'BOTH', 'LEADING', 'TRAILING', 'IF', 'POSITION', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', 'GLOBAL', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0)

== SQL ==
{"type":"struct","fields":[{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"age","type":"long","nullable":true,"metadata":{}}]}
^^^

    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:239)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:115)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseTableSchema(ParseDriver.scala:64)
    at org.apache.spark.sql.types.StructType$.fromDDL(StructType.scala:425)
    at org.apache.spark.sql.DataFrameReader.schema(DataFrameReader.scala:84)
    at com.att.sparktest.PartitionReadLocal.main(PartitionReadLocal.java:37)
```
Answer 0 (score: 0):
First off, how did you create the schema file? It looks like you just called `.toString()` on the schema. What I'd suggest instead is to serialize the StructType to a string with a JSON library and save that string to the file. Then, to consume the file, read the string back in and deserialize it with the same JSON library to get the class back.
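For what it's worth, Spark can do this round trip itself, without an external JSON library: `DataType.json()` serializes a schema to exactly the `{"type":"struct",...}` form shown in the exception, and the static `DataType.fromJson()` parses it back into a `DataType` you can cast to `StructType`. Here is a minimal sketch along those lines, assuming the file paths from the question and a local `SparkSession`:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructType;

public class SchemaRoundTrip {
    public static void main(String[] args) throws Exception {
        SparkSession session = SparkSession.builder()
                .master("local[*]")
                .appName("SchemaRoundTrip")
                .getOrCreate();

        // Build the schema once in code...
        StructType schema = new StructType()
                .add("name", "string")
                .add("age", "integer");

        // ...and serialize it with Spark's own JSON support instead of toString().
        // schema.json() emits the {"type":"struct",...} form seen in the exception.
        Files.write(Paths.get("C:\\Users\\kt799f\\Desktop\\struct.txt"),
                schema.json().getBytes(StandardCharsets.UTF_8));

        // To consume the file, parse the JSON back into a StructType and pass
        // the StructType itself (not the raw string) to the reader.
        String json = new String(
                Files.readAllBytes(Paths.get("C:\\Users\\kt799f\\Desktop\\struct.txt")),
                StandardCharsets.UTF_8);
        StructType parsed = (StructType) DataType.fromJson(json);

        session.read()
                .schema(parsed)
                .json("C:\\Users\\kt799f\\Desktop\\testJson.txt")
                .show();
    }
}
```

Note also that the parser in your stack trace (`parseTableSchema` via `StructType$.fromDDL`) is trying to read the string as a DDL schema, so if you would rather keep passing a plain string to `schema(...)`, a file containing a DDL definition such as `name STRING, age INT` should parse where the JSON and `toString()` forms do not.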