Problem with BlazingText jsonlines batch transform

Asked: 2019-09-30 10:57:58

Tags: amazon-sagemaker

I have a jsonlines file like this:

{"id":123,"source":"this is a text string"}
{"id":456,"source":"this is another text string"}
{"id":789,"source":"yet another string"}

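When I run a BlazingText batch transform job on a file that contains only the source field, it works. When I try to join the input with the output, I get: Customer Error: Unable to decode payload: Incorrect data format. (caused by AttributeError)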

Any suggestions?

Code:

bt_transformer = bt_model.transformer(
    instance_count = 1,
    instance_type = "ml.m4.xlarge",
    assemble_with = "Line",
    output_path = s3_batch_out_data,
    accept = "application/jsonlines"
)

bt_transformer.transform(
    s3_batch_in_data, 
    content_type = "application/jsonlines",
    split_type = "Line", 
    input_filter = "$.source", 
    join_source = "Input", 
    output_filter = "$['id', 'SageMakerOutput']"
)

bt_transformer.wait()

1 answer:

Answer 0 (score: 0)

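When "$.source" is applied to {"id":123,"source":"this is a text string"}, the output is the bare string "this is a text string", not the object {"source":"this is a text string"}. That is probably why you are getting the format error. I'm curious why you need this kind of filtering on the JSON input at all: doesn't the algorithm simply ignore JSON fields it doesn't recognize?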
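
To make the difference concrete, here is a minimal sketch in plain Python that imitates what the "$.source" filter selects from one record. It uses only the standard json module, not SageMaker's actual filtering code, and the helper name apply_source_filter is made up for illustration. The guess that the AttributeError comes from the container expecting an object and receiving a string is an assumption, not something confirmed in this thread.

```python
import json


def apply_source_filter(line: str):
    """Imitate what the JSONPath "$.source" selects from one JSON Lines record.

    Hypothetical helper for illustration only; SageMaker applies the
    filter internally and this is not its implementation.
    """
    record = json.loads(line)
    return record["source"]


line = '{"id":123,"source":"this is a text string"}'
filtered = apply_source_filter(line)

# After filtering, the payload is a bare JSON string, not an object.
payload_after_filter = json.dumps(filtered)

# The shape that worked in the question: an object with a "source" key.
payload_with_key = json.dumps({"source": filtered})

print(payload_after_filter)  # a JSON string
print(payload_with_key)      # a JSON object with a "source" key
```

If the container does expect the second shape, one thing to try is running the job without input_filter while keeping join_source and output_filter, and checking whether the extra "id" field is tolerated.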