Given the following:
val sqlTransformer_labeled = new SQLTransformer().setStatement("""SELECT *, CAST(var1 AS DOUBLE) AS label, var2 - var3 - var4 AS newvar1, var5 - var6 AS newvar2""")
How can I break a long SQL query across multiple lines? For readability, I would like the following:
val sqlTransformer_labeled = new SQLTransformer().setStatement("""SELECT *,
CAST(var1 AS DOUBLE) AS label,
var2 - var3 - var4 AS newvar1,
var5 - var6 AS newvar2""")
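Readable code will help me six months from now, and it will also help my collaborators and our iterative modeling workflow.
However, trying to run the latter, more readable version of the code above produces strange output, as shown below: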
scala> val sqlTransformer_labeled = new SQLTransformer().setStatement("""SELECT *,
|
<init> DRIVER_IDENTIFIER LEGACY_DRIVER_IDENTIFIER Pipeline PipelineModel RDD_SCOPE_KEY RDD_SCOPE_NO_OVERRIDE_KEY
RandomForestClassifier SPARK_JOB_DESCRIPTION SPARK_JOB_GROUP_ID SPARK_JOB_INTERRUPT_ON_CANCEL SQLTransformer StringIndexer StringToColumn
VectorAssembler _sqlContext abs acos add_months approxCountDistinct array
array_contains asc ascii asin atan atan2 avg
base64 bin bitwiseNOT boolToBoolWritable booleanWritableConverter broadcast bytesToBytesWritable
bytesWritableConverter callUDF callUdf cbrt ceil classOf clearActiveContext
clone coalesce col collect_list collect_set column concat
concat_ws conv corr cos cosh count countDistinct
crc32 cumeDist cume_dist current_date current_timestamp date_add date_format
date_sub datediff dayofmonth dayofyear decode denseRank dense_rank
desc doubleRDDToDoubleRDDFunctions doubleToDoubleWritable doubleWritableConverter encode eq equals
exp explode expm1 expr factorial finalize first
floatToFloatWritable floatWritableConverter floor format_number format_string from_unixtime from_utc_timestamp
getClass getOrCreate get_json_object greatest hashCode hex hh_features
hh_split hh_timeline_pair_luid hour hypot initcap inputFileName input_file_name
instr intRddToDataFrameHolder intToIntWritable intWritableConverter isNaN isTraceEnabled isnan
isnull jarOfClass jarOfObject json_tuple kurtosis lag last
last_day lead least length levenshtein lit localSeqToDataFrameHolder
localSeqToDatasetHolder locate log log10 log1p log2 logDebug
logError logInfo logName logTrace logWarning longRddToDataFrameHolder longToLongWritable
longWritableConverter lower lpad ltrim markPartiallyConstructed max md5
mean min minute model modeling_stack monotonicallyIncreasingId monotonically_increasing_id
month months_between nanvl ne negate newBooleanEncoder newByteEncoder
newDoubleEncoder newFloatEncoder newIntEncoder newLongEncoder newProductEncoder newShortEncoder newStringEncoder
next_day not notify notifyAll ntile numDriverCores numericRDDToDoubleRDDFunctions
percentRank percent_rank pipeline pmod pow prediction quarter
rand randn rank rddToAsyncRDDActions rddToDataFrameHolder rddToDatasetHolder rddToOrderedRDDFunctions
rddToPairRDDFunctions rddToSequenceFileRDDFunctions regexp_extract regexp_replace repeat reverse rf
rint round rowNumber row_number rpad rtrim sc
second setActiveContext sha1 sha2 shiftLeft shiftRight shiftRightUnsigned
signum sin sinh size skewness sort_array soundex
sparkPartitionId spark_partition_id split sql sqlContext sqlTransformer sqlTransformer_labeled
sqrt stddev stddev_pop stddev_samp stringIndexer stringRddToDataFrameHolder stringToText
stringWritableConverter struct substring substring_index sum sumDistinct symbolToColumn
synchronized tan tanh toDegrees toRadians toString to_date
to_utc_timestamp translate trim trunc tstack udf unbase64
unhex unix_timestamp updatedConf upper var_pop var_samp variance
wait weekofyear when writableWritableConverter year
| cast(same_variable as double) as label,
|
(the same list of completion candidates is printed again)
and so on.
Answer 0 (score: 1)
scala> """very long
| |sql query""".stripMargin
res2: String =
very long
sql query
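Applied to the query from the question, the same `stripMargin` approach might look like the sketch below. The object name `MultiLineSql` is only for illustration, and the `FROM __THIS__` clause is an addition that does not appear in the question: `__THIS__` is the placeholder the SQLTransformer documentation uses for the input table. The indentation uses spaces rather than tabs.

```scala
object MultiLineSql {
  // Each continuation line starts with the margin character '|'.
  // stripMargin removes the leading whitespace and the '|' itself,
  // so the code can be indented freely without changing the SQL text.
  val statement: String =
    """SELECT *,
      |  CAST(var1 AS DOUBLE) AS label,
      |  var2 - var3 - var4 AS newvar1,
      |  var5 - var6 AS newvar2
      |FROM __THIS__""".stripMargin

  def main(args: Array[String]): Unit =
    println(statement)
}
```

The resulting string could then be passed on as `new SQLTransformer().setStatement(MultiLineSql.statement)`.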
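Answer 1 (score: 1)
The problem here is that your SQL script contains tab characters, which trigger tab completion in the REPL. Remove the tabs from the script and it should work.
Alternatively, if you do not use tab completion, you can try disabling it entirely as described in this answer.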
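For the first option, removing the tabs, one way to clean a snippet before pasting it into the REPL is to replace every tab with spaces. This is a minimal sketch; `TabFree` and `expandTabs` are hypothetical names introduced here, and most editors can do the same conversion directly.

```scala
object TabFree {
  // Hypothetical helper: replace every tab character with `width` spaces.
  def expandTabs(text: String, width: Int = 2): String =
    text.replace("\t", " " * width)

  def main(args: Array[String]): Unit = {
    // A snippet whose second line is indented with a tab.
    val withTabs = "SELECT *,\n\tCAST(var1 AS DOUBLE) AS label"
    val cleaned = expandTabs(withTabs)
    println(cleaned.contains("\t")) // false
    println(cleaned)
  }
}
```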