Question

I have a delimited text file whose delimiter is ~|^ . I need to ingest the file into MarkLogic using MLCP. I tried the MLCP import in two ways.

Using MLCP without an options file

mlcp.sh import -username admin -password admin -input_file_type delimited_text -delimiter "~|^" -document_type json -host localhost -database test -port 8052 -output_uri_prefix /test/data/ -generate_uri -output_uri_suffix .json -output_collections "Test" -input_file_path inputfile1.csv

Using MLCP with an options file

mlcp.sh import -username admin -password admin -options_file delim.opt -document_type json -host localhost -database test -port 8052 -output_uri_prefix /test/data/ -generate_uri -output_uri_suffix .json -output_collections "Test" -input_file_path inputfile1.csv

My options file (delim.opt) looks like this:

-input_file_type
delimited_text
-delimiter
"~|^"

In both cases MLCP fails with the following error:

java.lang.IllegalArgumentException: Invalid delimiter: ~|^

Can anyone help me ingest this kind of CSV file into MarkLogic through MLCP?

Answer 1

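I don't think MarkLogic Content Pump supports parsing multi-character delimiters. MarkLogic Content Pump uses the Apache Commons CSV library to parse delimited text. As of today, there appears to be an open issue about parsing delimited text with multi-character delimiters; see issue CSV-206.

For now, you can create a new delimited text file that uses a single-character delimiter. I often use sed on the command line to replace strings in files. If you go this route, note that you will need to escape every occurrence of the new delimiter inside the record values.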
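
As a rough sketch of the sed route, the script below rewrites ~|^ as a comma and wraps every field in double quotes, so commas that already occur in the values do not need separate escaping. The file names sample.txt and sample.csv and the sample rows are made up for illustration. It assumes one record per line (no embedded newlines) and that the CSV parser accepts double-quoted fields with "" as the escaped quote, which is worth checking on a small file before a full load.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own.
in=sample.txt
out=sample.csv

# Tiny made-up sample so the sketch runs on its own; skip this for real data.
printf '%s\n' 'id~|^name~|^note' '1~|^Ann~|^says "hi", twice' > "$in"

# 1. double any quote already present in the data
# 2. turn each ~|^ into "," (in a basic regex ~ and | are literal; ^ is escaped)
# 3. and 4. add the opening and closing quote of each line
sed -e 's/"/""/g' \
    -e 's/~|\^/","/g' \
    -e 's/^/"/' \
    -e 's/$/"/' "$in" > "$out"

cat "$out"
```

The converted file can then be passed to -input_file_path with -input_file_type delimited_text; since the new delimiter is a comma, the -delimiter option should not be needed.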