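I'm new to U-SQL on Azure Data Lake Analytics. I want to do something I thought was very simple, but I'm running into trouble. Basically, I want a query that ignores empty strings. Using it in the SELECT works, but using it in the WHERE clause does not.
Below are the statement I'm running and the cryptic error I get: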
JOB
@xsel_res_1 =
EXTRACT
x_paper_id long,
x_Rank uint,
x_doi string,
x_doc_type string,
x_paper_title string,
x_original_title string,
x_book_title string,
x_paper_year int,
x_paper_date DateTime?,
x_publisher string,
x_journal_id long?,
x_conference_series_id long?,
x_conference_instance_id long?,
x_volume string,
x_issue string,
x_first_page string,
x_last_page string,
x_reference_count long,
x_citation_count long?,
x_estimated_citation int?
FROM @"adl://xmag.azuredatalakestore.net/graph/2018-02-02/Papers.txt"
USING Extractors.Tsv()
;
@xsel_res_2 =
SELECT
x_paper_id AS x_paper_id,
x_doi.ToLower() AS x_doi,
x_doi.Length AS x_doi_length
FROM @xsel_res_1
WHERE NOT string.IsNullOrEmpty(x_doi)
;
@xsel_res_3 =
SELECT
*
FROM @xsel_res_2
SAMPLE ANY (5)
;
OUTPUT @xsel_res_3
TO @"/graph/2018-02-02/x_output/x_papers_x6.tsv"
USING Outputters.Tsv();
Error
Vertex failed
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0][1] with error: Vertex user code error.
VertexFailedFast: Vertex failed with a fail-fast error
E_RUNTIME_USER_EXTRACT_ROW_ERROR: Error occurred while extracting row after processing 10 record(s) in the vertex' input split. Column index: 5, column name: 'x_original_title'.
E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD: Invalid character following the ending quote character in a quoted field.
Row selected
Component
RUNTIME
Message
Invalid character following the ending quote character in a quoted field.
Resolution
Column should be fully surrounded with double-quotes and double-quotes within the field escaped as two double-quotes.
Description
Invalid character is detected following the ending quote character in a quoted field. A column delimiter, row delimiter or EOF is expected. This error can occur if double-quotes within the field are not correctly escaped as two double-quotes.
Details
Row Delimiter: 0x0
Column Delimiter: 0x9
HEX: 61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A 20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72
Update: these operations work on other data sets, so as far as I can tell the problem is not the syntax:
//Define schema of file, must map all columns
@searchlog =
EXTRACT UserId int,
Start DateTime,
Region string,
Query string,
Duration int,
Urls string,
ClickedUrls string
FROM @"/Samples/Data/SearchLog.tsv"
USING Extractors.Tsv();
@searchlog_1 =
SELECT * FROM @searchlog
WHERE NOT string.IsNullOrEmpty(ClickedUrls );
OUTPUT @searchlog_1
TO @"/Samples/Output/SearchLog_output_x1.tsv"
USING Outputters.Tsv();
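Answer 0 (score: 3)
Unfortunately, the error message is confusing in this case. The job fails while the extractor is reading the input file (vertex SV1_Extract, E_RUNTIME_USER_EXTRACT_ROW_ERROR), not in the WHERE clause.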
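Assuming the text is UTF-8, you can use a site such as www.hexutf8.com to convert the hex to the following. The 09 byte before the first double quote is the tab column delimiter:
avni termin u povaljskoj listini i natpisu g 1185 "Po koncu" (stari hr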
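If you'd rather not paste the data into a website, you can do the same decoding locally. The following is a minimal Python sketch. It assumes the HEX value is space-separated bytes and that the `###` token in the error details is a position marker rather than data, so it skips that token. The function name `decode_hex_dump` is only illustrative.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_hex_dump(dump: str) -> str:
    """Decode a space-separated hex dump as UTF-8.

    Tokens that are not exactly two hex digits (such as the '###' marker
    in the error details) are assumed to be markers and are skipped.
    """
    data = bytearray()
    for token in dump.split():
        if len(token) == 2 and all(c in HEX_DIGITS for c in token):
            data.append(int(token, 16))
    # errors="replace" avoids a crash if the dump is cut off in the
    # middle of a multi-byte UTF-8 character.
    return data.decode("utf-8", errors="replace")


# HEX value copied from the error details in the question.
DUMP = (
    "61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A "
    "20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 "
    "09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72"
)

print(repr(decode_hex_dump(DUMP)))
# 'avni termin u povaljskoj listini i natpisu g 1185\t"Po koncu" (stari hr'
```

The `repr` makes the tab (`\t`) visible. That tab is the column delimiter between `x_paper_title` and `x_original_title`.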
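It looks like the input row contains at least one " character that is not escaped correctly. The x_original_title value starts with a double quote, so the extractor treats it as a quoted field. After the closing quote of "Po koncu", it finds a space instead of a column delimiter, row delimiter, or EOF. As the Resolution in the error says, the whole field should be surrounded by double quotes, and each embedded double quote should be written as two double quotes. The field should therefore start like this, with a single closing double quote at the very end of the field:
"""Po koncu"" (stari hr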
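If you can preprocess the file before the U-SQL job reads it, you can apply the escaping from the error's Resolution mechanically. Here is a minimal Python sketch, which is not part of U-SQL. It assumes each raw line splits cleanly on tabs (no value contains a literal tab). It uses the standard `csv` writer to wrap any field that contains a double quote and to double the embedded quotes. `requote_tsv_line` is a hypothetical helper name.

```python
import csv
import io


def requote_tsv_line(raw_line: str) -> str:
    """Re-emit one raw tab-separated line with CSV-style quoting.

    Fields containing a double quote are wrapped in double quotes and each
    embedded double quote is doubled. Assumes the raw line is split purely
    on tabs, i.e. no value contains a literal tab.
    """
    fields = raw_line.rstrip("\r\n").split("\t")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


fixed = requote_tsv_line('1185\t"Po koncu" (stari hr\n')
print(repr(fixed))  # '1185\t"""Po koncu"" (stari hr"\n'

# Reading the fixed line back with a strict parser recovers the original value.
print(next(csv.reader([fixed], delimiter="\t", strict=True)))
# ['1185', '"Po koncu" (stari hr']
```

Fields without double quotes are written unchanged, so rows that already parse correctly are not altered.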
Answer 1 (score: 3)
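@Saveenr's answer assumes that the values in your file are all quoted. If instead they are not quoted (and no value contains the column delimiter), then setting Extractors.Tsv(quoting:false) may also help.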
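The following analogy shows why turning quoting off helps. It uses Python's `csv` module, not U-SQL, and assumes that `quoting:false` in the U-SQL extractor behaves similarly. When quoting is disabled (`csv.QUOTE_NONE`), a double quote is just an ordinary character, so the problem row parses. When quoting is enabled and `strict=True` is set, the parser rejects the row for the same reason as the extractor. The row below is a shortened version of the failing line, and `parse` is an illustrative helper.

```python
import csv
from typing import List

# Shortened version of the failing row; the two fields are separated by a tab.
ROW = '1185\t"Po koncu" (stari hr'


def parse(row: str, quoting: int) -> List[str]:
    """Parse one tab-separated row; strict=True turns bad quoting into an error."""
    return next(csv.reader([row], delimiter="\t", quoting=quoting, strict=True))


# Quoting disabled: the double quotes are kept as ordinary characters.
print(parse(ROW, csv.QUOTE_NONE))  # ['1185', '"Po koncu" (stari hr']

# Quoting enabled: a quoted field followed by a space is rejected.
try:
    parse(ROW, csv.QUOTE_MINIMAL)
except csv.Error as exc:
    print("csv.Error:", exc)
```

The trade-off is that with quoting disabled, a value that genuinely contains a tab can no longer be protected by quotes. That is why this option only fits files whose values never contain the column delimiter.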